NYUHSL Faculty Bibliography

npj health systems. 2026:3(1).DOI: 10.1038/s44401-025-00059-8

Enhancing the prediction of hospital discharge disposition with extraction-based language model classification

Small, William R; Crowley, Ryan J; Pariente, Chloe; Zhang, Jeff; Eaton, Kevin P; Jiang, Lavender Yao; Oermann, Eric; Aphinyanaphongs, Yindalon

Early identification of inpatient discharges to skilled nursing facilities (SNFs) facilitates care transition planning. Predictive information in admission history and physical notes (H&Ps) is dispersed across long documents. Language models adeptly predict clinical outcomes from text but have limitations: token length constraints, noisy inputs, and opaque outputs. Therefore, we developed extraction-based language model classification (ELC): generative language models distill H&Ps into task-relevant categories ("Structured Extracted Data") before summarizing them into a concise narrative ("AI Risk Snapshot"). We hypothesized that language models utilizing AI Risk Snapshots to predict SNF discharges would perform the best. In this retrospective observational study, nine language models predicted SNF discharges from unstructured predictors (raw H&P text, truncated assessment and plan) and ELC-derived predictors (Structured Extracted Data, AI Risk Snapshots). ELC substantially reduced input length (AI Risk Snapshot median 141 tokens vs raw H&P median 2,120 tokens) and improved average AUROC and AUPRC across models. The best performance was achieved by Bio+Clinical BERT fine-tuned on AI Risk Snapshots (AUROC = .851). AI Risk Snapshots enhanced interpretability by aligning with nurse case managers' risk assessments and facilitating prompt design. Structuring and summarizing H&Ps via ELC thus mitigates the practical limitations of language models and improves SNF discharge prediction.

PMCID:12789015

PMID: 41522677

ISSN: 3005-1959

CID: 5985892

Journal of the American College of Radiology : JACR. 2025:22(11S):S610-S624.DOI: 10.1016/j.jacr.2025.08.037

ACR Appropriateness Criteria® Screening, Locoregional Assessment, and Surveillance of Pancreatic Ductal Adenocarcinoma: 2025 Update

Fung, Alice; Zaheer, Atif; Porter, Kristin K; Bashir, Mustafa R; Cash, Brooks D; Chiorean, E Gabriela; Choi, Youngjee; Ejaz, Aslam; Gage, Kenneth L; Russo, Gregory K; Small, William; Smith, Elainea N; Thakrar, Kiran H; Vij, Abhinav; Wahab, Shaun A; Kim, David H

Pancreatic ductal adenocarcinoma is a highly lethal cancer that often presents with vague and indolent symptoms leading to advanced stage diagnosis. Imaging plays a crucial role in the diagnosis, assessment of locoregional and metastatic disease, surgical planning, and surveillance after neoadjuvant therapy and surgery. This document reviews available imaging modalities that are best used for these clinical scenarios, and a summary of current evidence is provided to support the use of the various modalities in each of the clinical contexts. The American College of Radiology Appropriateness Criteria are evidence-based guidelines for specific clinical conditions that are reviewed annually by a multidisciplinary expert panel. The guideline development and revision process support the systematic analysis of the medical literature from peer reviewed journals. Established methodology principles such as Grading of Recommendations Assessment, Development, and Evaluation or GRADE are adapted to evaluate the evidence. The RAND/UCLA Appropriateness Method User Manual provides the methodology to determine the appropriateness of imaging and treatment procedures for specific clinical scenarios. In those instances where peer reviewed literature is lacking or equivocal, experts may be the primary evidentiary source available to formulate a recommendation.

PMID: 41193048

ISSN: 1558-349x

CID: 5959922

Journal of the American College of Radiology : JACR. 2025:22(11S):S578-S585.DOI: 10.1016/j.jacr.2025.08.040

ACR Appropriateness Criteria® Male Breast Cancer Screening

Freer, Phoebe E; Neal, Colleen H; Brown, Ann; Bennett, Debbie L; Cassidy, Michael R; Chetlen, Alison; Dibble, Elizabeth H; Giordano, Sharon H; Greenwood, Heather I; Hurley, Janet; Ivansco, Lillian K; Malak, Sharp F; Rauch, Gaiane M; Reig, Beatriu; Singh, Puneet; Small, William; Yeh, Eren D; Slanetz, Priscilla J

Breast cancer screening recommendations have been established historically for women but have been less clearly outlined for men. For average-risk men and younger men less than 25 years of age, imaging is not usually appropriate as a screening test for breast cancer. For men of higher-than-average risk, screening with mammography as annual surveillance imaging is usually appropriate. The American College of Radiology Appropriateness Criteria are evidence-based guidelines for specific clinical conditions that are reviewed annually by a multidisciplinary expert panel. The guideline development and revision process support the systematic analysis of the medical literature from peer reviewed journals. Established methodology principles such as Grading of Recommendations Assessment, Development, and Evaluation or GRADE are adapted to evaluate the evidence. The RAND/UCLA Appropriateness Method User Manual provides the methodology to determine the appropriateness of imaging and treatment procedures for specific clinical scenarios. In those instances where peer reviewed literature is lacking or equivocal, experts may be the primary evidentiary source available to formulate a recommendation.

PMID: 41193045

ISSN: 1558-349x

CID: 5959912

Journal of the American College of Radiology : JACR. 2025:22(11S):S508-S530.DOI: 10.1016/j.jacr.2025.08.044

ACR Appropriateness Criteria® Female Breast Cancer Screening: 2025 Update

Yeh, Eren D; Brown, Ann; Freer, Phoebe E; Bahl, Manisha; Bennett, Debbie L; Darbha, Lalitha; Dibble, Elizabeth H; Greenwood, Heather I; Hill, Faihza M; Ivansco, Lillian K; Kremer, Mallory E; Minami, Christina A; Mullen, Lisa A; Neal, Colleen H; Newell, Mary S; Radhakrishnan, Archana; Rauch, Gaiane M; Reig, Beatriu; Shaughnessy, Elizabeth; Small, William; Ulaner, Gary A; Lewin, Alana A

Routine screening substantially reduces the risk of mortality and morbidity of breast cancer with early detection. Multiple different imaging modalities may be used to screen for breast cancer. Screening recommendations differ based on an individual's risk of developing breast cancer. Numerous factors contribute to breast cancer risk, which is frequently divided into three major categories: average, intermediate, and high risk. For patients assigned female at birth with native breast tissue, mammography and digital breast tomosynthesis are recommended for breast cancer screening in all risk categories. In high-risk patients, screening with breast MRI is recommended starting as early as 25 to 30 years of age and mammography and digital breast tomosynthesis with a variable starting age between 25 and 40 years of age, depending on the type of risk. The American College of Radiology Appropriateness Criteria are evidence-based guidelines for specific clinical conditions that are reviewed annually by a multidisciplinary expert panel. The guideline development and revision process support the systematic analysis of the medical literature from peer reviewed journals. Established methodology principles such as Grading of Recommendations Assessment, Development, and Evaluation or GRADE are adapted to evaluate the evidence. The RAND/UCLA Appropriateness Method User Manual provides the methodology to determine the appropriateness of imaging and treatment procedures for specific clinical scenarios. In those instances where peer reviewed literature is lacking or equivocal, experts may be the primary evidentiary source available to formulate a recommendation.

PMID: 41193041

ISSN: 1558-349x

CID: 5959892

JAMA network open. 2025:8(8).DOI: 10.1001/jamanetworkopen.2025.26339

Evaluating Hospital Course Summarization by an Electronic Health Record-Based Large Language Model

Small, William R; Austrian, Jonathan; O'Donnell, Luke; Burk-Rafel, Jesse; Hochman, Katherine A; Goodman, Adam; Zaretsky, Jonah; Martin, Jacob; Johnson, Stephen; Major, Vincent J; Jones, Simon; Henke, Christian; Verplanke, Benjamin; Osso, Jwan; Larson, Ian; Saxena, Archana; Mednick, Aron; Simonis, Choumika; Han, Joseph; Kesari, Ravi; Wu, Xinyuan; Heery, Lauren; Desel, Tenzin; Baskharoun, Samuel; Figman, Noah; Farooq, Umar; Shah, Kunal; Jahan, Nusrat; Kim, Jeong Min; Testa, Paul; Feldman, Jonah

IMPORTANCE:Hospital course (HC) summarization represents an increasingly onerous discharge summary component for physicians. Literature supports large language models (LLMs) for HC summarization, but whether physicians can effectively partner with electronic health record-embedded LLMs to draft HCs is unknown. OBJECTIVES:To compare the editing effort required by time-constrained resident physicians to improve LLM- vs physician-generated HCs toward a novel 4Cs (complete, concise, cohesive, and confabulation-free) HC. DESIGN, SETTING, AND PARTICIPANTS:Quality improvement study using a convenience sample of 10 internal medicine resident editors, 8 hospitalist evaluators, and randomly selected general medicine admissions in December 2023 lasting 4 to 8 days at New York University Langone Health. EXPOSURES:Residents and hospitalists reviewed randomly assigned patient medical records for 10 minutes. Residents blinded to author type edited each HC pair (physician and LLM) for quality in 3 minutes, followed by comparative ratings by attending hospitalists. MAIN OUTCOMES AND MEASURES:Editing effort was quantified by analyzing the edits that occurred on the HC pairs after controlling for length (percentage edited) and the degree to which the original HCs' meaning was altered (semantic change). Hospitalists compared edited HC pairs with A/B testing on the 4Cs (5-point Likert scales converted to 10-point bidirectional scales). RESULTS:Among 100 admissions, compared with physician HCs, residents edited a smaller percentage of LLM HCs (LLM mean [SD], 31.5% [16.6%] vs physicians, 44.8% [20.0%]; P < .001). Additionally, LLM HCs required less semantic change (LLM mean [SD], 2.4% [1.6%] vs physicians, 4.9% [3.5%]; P < .001). Attending physicians deemed LLM HCs to be more complete (mean [SD] difference LLM vs physicians on 10-point bidirectional scale, 3.00 [5.28]; P < .001), similarly concise (mean [SD], -1.02 [6.08]; P = .20), and cohesive (mean [SD], 0.70 [6.14]; P = .60), but with more confabulations (mean [SD], -0.98 [3.53]; P = .002). The composite scores were similar (mean [SD] difference LLM vs physician on 40-point bidirectional scale, 1.70 [14.24]; P = .46). CONCLUSIONS AND RELEVANCE:Electronic health record-embedded LLM HCs required less editing than physician-generated HCs to approach a quality standard, resulting in HCs that were comparably or more complete, concise, and cohesive, but contained more confabulations. Despite the potential influence of artificial time constraints, this study supports the feasibility of a physician-LLM partnership for writing HCs and provides a basis for monitoring LLM HCs in clinical practice.
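
The abstract above does not specify how "percentage edited" was computed. As a rough illustration only, one plausible length-normalized measure is the share of original words that do not survive into the edited text. The sketch below assumes a word-level diff using Python's standard difflib; the function name percent_edited is hypothetical and is not the study's method.

```python
import difflib


def percent_edited(original: str, edited: str) -> float:
    """Share of original words (in percent) not preserved in the edited text.

    Hypothetical measure for illustration; the study's actual definition
    of "percentage edited" is not given in the abstract.
    """
    orig_words = original.split()
    edit_words = edited.split()
    if not orig_words:
        return 0.0
    # Word-level alignment between the original and the edited draft.
    matcher = difflib.SequenceMatcher(a=orig_words, b=edit_words, autojunk=False)
    unchanged = sum(block.size for block in matcher.get_matching_blocks())
    # Normalizing by original length controls for longer drafts.
    return 100.0 * (len(orig_words) - unchanged) / len(orig_words)


draft = "Patient admitted with pneumonia and treated with antibiotics"
revised = "Patient admitted with sepsis and treated with antibiotics"
print(f"{percent_edited(draft, revised):.1f}% of original words changed")
```

Dividing by the original length is what makes drafts of different sizes comparable; semantic change would need a separate, meaning-aware measure that this sketch does not attempt.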

PMID: 40802185

ISSN: 2574-3805

CID: 5906762

Applied clinical informatics. 2025:16(4):1114-1120.DOI: 10.1055/a-2675-3510

Disappearing Text as a Clinical Decision Support Layer: A Case Series

Silberlust, Jared; Small, William; Shah, Darshi; Chakravartty, Eesha; Moawad, Katherine; Moawad, Andrew; Testa, Paul; Feldman, Jonah

OBJECTIVE:This case series aims to evaluate several applications of inline disappearing text (DT) clinical decision support (CDS) tools within clinician documentation. METHODS:DT blocks were created to prompt documentation for perioperative anticoagulation planning (Scenario 1), pre-discharge intravenous antibiotic planning (Scenario 2), and advanced care planning (Scenario 3). In Scenario 1, DT was the only intervention. In Scenario 2, DT was paired with a documentation SmartList. In Scenario 3, DT was paired with a documentation SmartList and an OurPractice Advisory. The number of documented perioperative anticoagulation plans, pre-discharge intravenous antibiotic plans, and Advanced Care Planning notes were measured pre- and post-intervention and compared using Chi-square analyses. RESULTS:In Scenario 1, there was no statistically significant change in the percentage of perioperative anticoagulation plans documented at 0-24 and 24-48 hours before surgery. In Scenario 2, documentation of antibiotic contingency planning in patients expected to be discharged within 24 hours increased from 60% (54 of 90 notes) to 93% (1,850 of 1,994 notes), χ² (1, N=2,084) = 113.1, p < 0.001. In Scenario 3, ACP note documentation by discharge in patients with a positive mandatory surprise question increased from 43% (821 of 1,909 encounters) to 52% (975 of 1,874 encounters), χ² (1, N=3,783) = 30.5, p < 0.001. CONCLUSIONS:Utilizing DT in conjunction with other forms of CDS was associated with an improvement of documentation quality in pre-discharge IV antibiotics and advanced care planning. A sociotechnical analysis explores how interactions between technology, people, workflow, and culture could contextualize how utilizing DT with other forms of CDS was more effective than DT alone.
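
The chi-square values reported above can be checked from the stated counts. The sketch below computes a 2x2 chi-square with the Yates continuity correction in plain Python. The use of the Yates correction is an inference from the numbers (it appears to match the reported 113.1 and 30.5); the abstract does not name the variant, and the function name yates_chi_square is illustrative.

```python
def yates_chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square statistic (1 degree of freedom) for the 2x2 table
    [[a, b], [c, d]] with Yates continuity correction.

    Assumption: the abstract does not say whether a continuity
    correction was applied; this variant is inferred from the counts.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # Shrink |ad - bc| by n/2 (Yates), never below zero.
    diff = max(abs(a * d - b * c) - n / 2, 0.0)
    return n * diff ** 2 / (row1 * row2 * col1 * col2)


# Scenario 2: documented vs not documented, pre (54 of 90) and post (1,850 of 1,994).
scenario2 = yates_chi_square(54, 90 - 54, 1850, 1994 - 1850)
# Scenario 3: pre (821 of 1,909) and post (975 of 1,874).
scenario3 = yates_chi_square(821, 1909 - 821, 975, 1874 - 975)
print(f"Scenario 2: {scenario2:.1f}, Scenario 3: {scenario3:.1f}")
```

With SciPy available, scipy.stats.chi2_contingency applies the same correction by default for 2x2 tables and also returns the p-value.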

PMID: 40763805

ISSN: 1869-0327

CID: 5905032

Journal of the American College of Radiology : JACR. 2025:22(5S):S359-S371.DOI: 10.1016/j.jacr.2025.02.016

ACR Appropriateness Criteria® Ovarian Cancer Screening: 2024 Update

Venkatesan, Aradhana M; Kilcoyne, Aoife; Akin, Esma A; Chuang, Linus; Hindman, Nicole M; Huang, Chenchan; McCourt, Carolyn Kay; Rauch, Gaiane M; Sattari, Maryam; Schoenborn, Nancy; Schultz, David; Sertic, Madeleine; Small, William; Stein, Erica B; Suarez-Weiss, Krista; Kang, Stella K

Ovarian cancer remains low in prevalence but has the highest mortality of all gynecologic malignancies. Population-based screening for ovarian cancer remains a topic of interest in contemporary practice, given that the majority of cancers encountered are high-grade aggressive malignancies, for which favorable survival is encountered in the setting of early-stage disease. This document summarizes a review of the available data from randomized and observational trials that have evaluated the role of imaging for ovarian cancer screening in average-risk and high-risk patients. When considering screening using pelvic ultrasound in average-risk patients, we found insufficient published evidence to recommend ovarian cancer screening. Randomized controlled trials have not demonstrated a mortality benefit in this setting. Screening with pelvic ultrasound may be appropriate for select patients at high risk, although the existing data remain limited as large, randomized trials have not been performed in this setting. The American College of Radiology Appropriateness Criteria are evidence-based guidelines for specific clinical conditions that are reviewed annually by a multidisciplinary expert panel. The guideline development and revision process support the systematic analysis of the medical literature from peer reviewed journals. Established methodology principles such as Grading of Recommendations Assessment, Development, and Evaluation or GRADE are adapted to evaluate the evidence. The RAND/UCLA Appropriateness Method User Manual provides the methodology to determine the appropriateness of imaging and treatment procedures for specific clinical scenarios. In those instances where peer reviewed literature is lacking or equivocal, experts may be the primary evidentiary source available to formulate a recommendation.

PMID: 40409887

ISSN: 1558-349x

CID: 5853732

Journal of the American College of Radiology : JACR. 2025:22(5S):S405-S423.DOI: 10.1016/j.jacr.2025.02.023

ACR Appropriateness Criteria® Supplemental Breast Cancer Screening Based on Breast Density: 2024 Update

Paulis, Lisa V; Lewin, Alana A; Weinstein, Susan P; Baron, Paul; Dayaratna, Sandra; Dodelzon, Katerina; Dogan, Basak E; Gulati, Abhishek; Kantor, Olga; Kasales, Claudia; Kunjummen, Jean M; Kuzmiak, Cherie M; Newell, Mary S; Salkowski, Lonie R; Sharpe, Richard E; Small, William; Ulaner, Gary A; Slanetz, Priscilla J

Screening mammography has been proven to reduce the mortality from breast cancer by approximately 30%; however, it is less sensitive in women with dense breast tissue and certain risk groups. Supplemental screening may be considered based on the patient's risk level and breast density. In all women, digital breast tomosynthesis improves screening sensitivity. Average-risk women with heterogeneously dense tissue may also benefit from breast MRI, abbreviated breast MRI (AB-MRI) or breast ultrasound (US). In intermediate-risk women with nondense tissue, breast MRI and AB-MRI may be appropriate. In intermediate-risk women with heterogeneously dense and extremely dense tissue, breast MRI and AB-MRI are usually appropriate, whereas US and contrast-enhanced mammography (CEM) may be appropriate. Breast MRI or AB-MRI is usually appropriate in all high-risk women, regardless of density. Screening breast US or CEM could be considered in this population. The American College of Radiology Appropriateness Criteria are evidence-based guidelines for specific clinical conditions that are reviewed annually by a multidisciplinary expert panel. The guideline development and revision process support the systematic analysis of the medical literature from peer reviewed journals. Established methodology principles such as Grading of Recommendations Assessment, Development, and Evaluation or GRADE are adapted to evaluate the evidence. The RAND/UCLA Appropriateness Method User Manual provides the methodology to determine the appropriateness of imaging and treatment procedures for specific clinical scenarios. In those instances where peer reviewed literature is lacking or equivocal, experts may be the primary evidentiary source available to formulate a recommendation.

PMID: 40409891

ISSN: 1558-349x

CID: 5853762

Journal of diabetes science & technology. 2025.DOI: 10.1177/19322968251324535

Classifying Continuous Glucose Monitoring Documents From Electronic Health Records

Zheng, Yaguang; Iturrate, Eduardo; Li, Lehan; Wu, Bei; Small, William R; Zweig, Susan; Fletcher, Jason; Chen, Zhihao; Johnson, Stephen B

BACKGROUND:Clinical use of continuous glucose monitoring (CGM) is increasing the storage of CGM-related documents in electronic health records (EHR); however, the standardization of CGM storage is lacking. We aimed to evaluate the sensitivity and specificity of CGM Ambulatory Glucose Profile (AGP) classification criteria. METHODS:We randomly chose 2244 (18.1%) documents from NYU Langone Health. Our document classification algorithm: (1) separated multiple-page documents into single-page images; (2) rotated all pages into an upright orientation; (3) determined types of devices using optical character recognition; and (4) tested for the presence of particular keywords in the text. Two experts in using CGM for research and clinical practice conducted an independent manual review of 62 (2.8%) reports. We calculated sensitivity (correct classification of CGM AGP report) and specificity (correct classification of non-CGM report) by comparing the classification algorithm against manual review. RESULTS:Among 2244 documents, 1040 (46.5%) were classified as CGM AGP reports (43.3% FreeStyle Libre and 56.7% Dexcom), 1170 (52.1%) non-CGM reports (eg, progress notes, CGM request forms, or physician letters), and 34 (1.5%) uncertain documents. The agreement for the evaluation of the documents between the two experts was 100% for sensitivity and 98.4% for specificity. When comparing the classification result between the algorithm and manual review, the sensitivity and specificity were 95.0% and 91.7%. CONCLUSIONS:Nearly half of CGM-related documents were AGP reports, which are useful for clinical practice and diabetes research; however, the remaining half are other clinical documents. Future work needs to standardize the storage of CGM-related documents in the EHR.
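
Steps (3) and (4) of the algorithm described above, together with the sensitivity and specificity calculation, can be sketched as follows. This is a minimal illustration that starts from already-extracted OCR text. The keyword lists, the decision rule, and the function names are all hypothetical, since the abstract does not publish them, and the page splitting, rotation, and OCR steps are not shown.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical keywords; the abstract does not list the actual ones.
DEVICE_KEYWORDS: Dict[str, List[str]] = {
    "FreeStyle Libre": ["freestyle libre"],
    "Dexcom": ["dexcom"],
}
AGP_KEYWORDS: List[str] = ["ambulatory glucose profile", "time in range"]

AGP_LABEL = "CGM AGP report"
NON_CGM_LABEL = "non-CGM report"


def classify_page(ocr_text: str) -> Tuple[str, str]:
    """Return (label, device) for one page of OCR text (assumed rule)."""
    text = ocr_text.lower()
    device = next(
        (name for name, kws in DEVICE_KEYWORDS.items()
         if any(kw in text for kw in kws)),
        "unknown",
    )
    has_agp = any(kw in text for kw in AGP_KEYWORDS)
    if has_agp and device != "unknown":
        return AGP_LABEL, device
    if has_agp or device != "unknown":
        return "uncertain", device  # only partial evidence on the page
    return NON_CGM_LABEL, device


def sensitivity_specificity(
    predicted: Sequence[str], reference: Sequence[str]
) -> Tuple[float, float]:
    """Compare algorithm labels against manual-review labels."""
    pairs = list(zip(predicted, reference))
    tp = sum(p == AGP_LABEL and r == AGP_LABEL for p, r in pairs)
    fn = sum(p != AGP_LABEL and r == AGP_LABEL for p, r in pairs)
    tn = sum(p != AGP_LABEL and r != AGP_LABEL for p, r in pairs)
    fp = sum(p == AGP_LABEL and r != AGP_LABEL for p, r in pairs)
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return sens, spec


print(classify_page("Dexcom Clarity: Ambulatory Glucose Profile, 14 days"))
```

Keeping an explicit "uncertain" label mirrors the three-way split in the reported results, where a small share of documents could not be assigned to either class.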

PMCID:11904921

PMID: 40071848

ISSN: 1932-2968

CID: 5808452

JAMA network open. 2025:8(8).DOI:

Evaluating Hospital Course Summarization by an Electronic Health Record-Based Large Language Model

Small, William R.; Austrian, Jonathan; O'Donnell, Luke; Burk-Rafel, Jesse; Hochman, Katherine A.; Goodman, Adam; Zaretsky, Jonah; Martin, Jacob; Johnson, Stephen; Major, Vincent J.; Jones, Simon; Henke, Christian; Verplanke, Benjamin; Osso, Jwan; Larson, Ian; Saxena, Archana; Mednick, Aron; Simonis, Choumika; Han, Joseph; Kesari, Ravi; Wu, Xinyuan; Heery, Lauren; Desel, Tenzin; Baskharoun, Samuel; Figman, Noah; Farooq, Umar; Shah, Kunal; Jahan, Nusrat; Kim, Jeong Min; Testa, Paul; Feldman, Jonah

ISI:001551557000002

ISSN: 2574-3805

CID: 5974192