ACR Appropriateness Criteria® Screening, Locoregional Assessment, and Surveillance of Pancreatic Ductal Adenocarcinoma: 2025 Update

Fung, Alice; Zaheer, Atif; Porter, Kristin K; Bashir, Mustafa R; Cash, Brooks D; Chiorean, E Gabriela; Choi, Youngjee; Ejaz, Aslam; Gage, Kenneth L; Russo, Gregory K; Small, William; Smith, Elainea N; Thakrar, Kiran H; Vij, Abhinav; Wahab, Shaun A; Kim, David H
Pancreatic ductal adenocarcinoma is a highly lethal cancer that often presents with vague and indolent symptoms leading to advanced stage diagnosis. Imaging plays a crucial role in the diagnosis, assessment of locoregional and metastatic disease, surgical planning, and surveillance after neoadjuvant therapy and surgery. This document reviews available imaging modalities that are best used for these clinical scenarios, and a summary of current evidence is provided to support the use of the various modalities in each of the clinical contexts. The American College of Radiology Appropriateness Criteria are evidence-based guidelines for specific clinical conditions that are reviewed annually by a multidisciplinary expert panel. The guideline development and revision process support the systematic analysis of the medical literature from peer reviewed journals. Established methodology principles such as Grading of Recommendations Assessment, Development, and Evaluation or GRADE are adapted to evaluate the evidence. The RAND/UCLA Appropriateness Method User Manual provides the methodology to determine the appropriateness of imaging and treatment procedures for specific clinical scenarios. In those instances where peer reviewed literature is lacking or equivocal, experts may be the primary evidentiary source available to formulate a recommendation.
PMID: 41193048
ISSN: 1558-349x
CID: 5959922

ACR Appropriateness Criteria® Male Breast Cancer Screening

Freer, Phoebe E; Neal, Colleen H; Brown, Ann; Bennett, Debbie L; Cassidy, Michael R; Chetlen, Alison; Dibble, Elizabeth H; Giordano, Sharon H; Greenwood, Heather I; Hurley, Janet; Ivansco, Lillian K; Malak, Sharp F; Rauch, Gaiane M; Reig, Beatriu; Singh, Puneet; Small, William; Yeh, Eren D; Slanetz, Priscilla J
Breast cancer screening recommendations have been established historically for women but have been less clearly outlined for men. For average-risk men and men younger than 25 years of age, imaging is not usually appropriate as a screening test for breast cancer. For men of higher-than-average risk, screening with mammography as annual surveillance imaging is usually appropriate. The American College of Radiology Appropriateness Criteria are evidence-based guidelines for specific clinical conditions that are reviewed annually by a multidisciplinary expert panel. The guideline development and revision process support the systematic analysis of the medical literature from peer reviewed journals. Established methodology principles such as Grading of Recommendations Assessment, Development, and Evaluation or GRADE are adapted to evaluate the evidence. The RAND/UCLA Appropriateness Method User Manual provides the methodology to determine the appropriateness of imaging and treatment procedures for specific clinical scenarios. In those instances where peer reviewed literature is lacking or equivocal, experts may be the primary evidentiary source available to formulate a recommendation.
PMID: 41193045
ISSN: 1558-349x
CID: 5959912

ACR Appropriateness Criteria® Female Breast Cancer Screening: 2025 Update

Yeh, Eren D; Brown, Ann; Freer, Phoebe E; Bahl, Manisha; Bennett, Debbie L; Darbha, Lalitha; Dibble, Elizabeth H; Greenwood, Heather I; Hill, Faihza M; Ivansco, Lillian K; Kremer, Mallory E; Minami, Christina A; Mullen, Lisa A; Neal, Colleen H; Newell, Mary S; Radhakrishnan, Archana; Rauch, Gaiane M; Reig, Beatriu; Shaughnessy, Elizabeth; Small, William; Ulaner, Gary A; Lewin, Alana A
Routine screening substantially reduces the risk of mortality and morbidity of breast cancer with early detection. Multiple different imaging modalities may be used to screen for breast cancer. Screening recommendations differ based on an individual's risk of developing breast cancer. Numerous factors contribute to breast cancer risk, which is frequently divided into three major categories: average, intermediate, and high risk. For patients assigned female at birth with native breast tissue, mammography and digital breast tomosynthesis are recommended for breast cancer screening in all risk categories. In high-risk patients, screening with breast MRI is recommended starting as early as 25 to 30 years of age and mammography and digital breast tomosynthesis with a variable starting age between 25 and 40 years of age, depending on the type of risk. The American College of Radiology Appropriateness Criteria are evidence-based guidelines for specific clinical conditions that are reviewed annually by a multidisciplinary expert panel. The guideline development and revision process support the systematic analysis of the medical literature from peer reviewed journals. Established methodology principles such as Grading of Recommendations Assessment, Development, and Evaluation or GRADE are adapted to evaluate the evidence. The RAND/UCLA Appropriateness Method User Manual provides the methodology to determine the appropriateness of imaging and treatment procedures for specific clinical scenarios. In those instances where peer reviewed literature is lacking or equivocal, experts may be the primary evidentiary source available to formulate a recommendation.
PMID: 41193041
ISSN: 1558-349x
CID: 5959892

Evaluating Hospital Course Summarization by an Electronic Health Record-Based Large Language Model

Small, William R; Austrian, Jonathan; O'Donnell, Luke; Burk-Rafel, Jesse; Hochman, Katherine A; Goodman, Adam; Zaretsky, Jonah; Martin, Jacob; Johnson, Stephen; Major, Vincent J; Jones, Simon; Henke, Christian; Verplanke, Benjamin; Osso, Jwan; Larson, Ian; Saxena, Archana; Mednick, Aron; Simonis, Choumika; Han, Joseph; Kesari, Ravi; Wu, Xinyuan; Heery, Lauren; Desel, Tenzin; Baskharoun, Samuel; Figman, Noah; Farooq, Umar; Shah, Kunal; Jahan, Nusrat; Kim, Jeong Min; Testa, Paul; Feldman, Jonah
IMPORTANCE: Hospital course (HC) summarization represents an increasingly onerous discharge summary component for physicians. Literature supports large language models (LLMs) for HC summarization, but whether physicians can effectively partner with electronic health record-embedded LLMs to draft HCs is unknown. OBJECTIVES: To compare the editing effort required by time-constrained resident physicians to improve LLM- vs physician-generated HCs toward a novel 4Cs (complete, concise, cohesive, and confabulation-free) HC. DESIGN, SETTING, AND PARTICIPANTS: Quality improvement study using a convenience sample of 10 internal medicine resident editors, 8 hospitalist evaluators, and randomly selected general medicine admissions in December 2023 lasting 4 to 8 days at New York University Langone Health. EXPOSURES: Residents and hospitalists reviewed randomly assigned patient medical records for 10 minutes. Residents blinded to author type edited each HC pair (physician and LLM) for quality in 3 minutes, followed by comparative ratings by attending hospitalists. MAIN OUTCOMES AND MEASURES: Editing effort was quantified by analyzing the edits that occurred on the HC pairs after controlling for length (percentage edited) and the degree to which the original HCs' meaning was altered (semantic change). Hospitalists compared edited HC pairs with A/B testing on the 4Cs (5-point Likert scales converted to 10-point bidirectional scales). RESULTS: Among 100 admissions, compared with physician HCs, residents edited a smaller percentage of LLM HCs (LLM mean [SD], 31.5% [16.6%] vs physicians, 44.8% [20.0%]; P < .001). Additionally, LLM HCs required less semantic change (LLM mean [SD], 2.4% [1.6%] vs physicians, 4.9% [3.5%]; P < .001). Attending physicians deemed LLM HCs to be more complete (mean [SD] difference LLM vs physicians on 10-point bidirectional scale, 3.00 [5.28]; P < .001), similarly concise (mean [SD], -1.02 [6.08]; P = .20), and cohesive (mean [SD], 0.70 [6.14]; P = .60), but with more confabulations (mean [SD], -0.98 [3.53]; P = .002). The composite scores were similar (mean [SD] difference LLM vs physician on 40-point bidirectional scale, 1.70 [14.24]; P = .46). CONCLUSIONS AND RELEVANCE: Electronic health record-embedded LLM HCs required less editing than physician-generated HCs to approach a quality standard, resulting in HCs that were comparably or more complete, concise, and cohesive, but contained more confabulations. Despite the potential influence of artificial time constraints, this study supports the feasibility of a physician-LLM partnership for writing HCs and provides a basis for monitoring LLM HCs in clinical practice.
PMID: 40802185
ISSN: 2574-3805
CID: 5906762
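
The "percentage edited" measure in the abstract above can be illustrated with a word-level diff. This is only a minimal sketch: the abstract does not describe how edits were counted, so the function name `percent_edited`, the whitespace tokenization, and the choice to count deleted or replaced words relative to the original length are all assumptions made for illustration, not the study's method.

```python
from difflib import SequenceMatcher


def percent_edited(original: str, edited: str) -> float:
    """Percentage of the original's words that were deleted or replaced.

    Hypothetical metric for illustration only; insertions are ignored here.
    """
    orig_words = original.split()
    edit_words = edited.split()
    if not orig_words:
        return 0.0
    matcher = SequenceMatcher(None, orig_words, edit_words, autojunk=False)
    # Words that survive unchanged, in order, between the two versions.
    unchanged = sum(block.size for block in matcher.get_matching_blocks())
    return 100.0 * (len(orig_words) - unchanged) / len(orig_words)


# Toy example (made-up text): one of four words is replaced.
print(percent_edited("a b c d", "a b x d"))
```

Dividing by the original word count is one simple way to control for length, so that long and short summaries can be compared on the same scale.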

Disappearing Text as a Clinical Decision Support Layer: A Case Series

Silberlust, Jared; Small, William; Shah, Darshi; Chakravartty, Eesha; Moawad, Katherine; Moawad, Andrew; Testa, Paul; Feldman, Jonah
OBJECTIVES: This case series aims to evaluate several applications of inline disappearing text (DT) clinical decision support (CDS) tools within clinician documentation. METHODS: DT blocks were created to prompt documentation for perioperative anticoagulation planning (Scenario 1), pre-discharge intravenous antibiotic planning (Scenario 2), and advanced care planning (Scenario 3). In Scenario 1, DT was the only intervention. In Scenario 2, DT was paired with a documentation SmartList. In Scenario 3, DT was paired with a documentation SmartList and an OurPractice Advisory. The numbers of documented perioperative anticoagulation plans, pre-discharge intravenous antibiotic plans, and Advanced Care Planning notes were measured pre- and post-intervention and compared using chi-square analyses. RESULTS: In Scenario 1, there was no statistically significant change in the percentage of perioperative anticoagulation plans documented at 0-24 and 24-48 hours before surgery. In Scenario 2, documentation of antibiotic contingency planning in patients expected to be discharged within 24 hours increased from 60% (54 of 90 notes) to 93% (1,850 of 1,994 notes), χ²(1, N = 2,084) = 113.1, p < 0.001. In Scenario 3, ACP note documentation by discharge in patients with a positive mandatory surprise question increased from 43% (821 of 1,909 encounters) to 52% (975 of 1,874 encounters), χ²(1, N = 3,783) = 30.5, p < 0.001. CONCLUSIONS: Utilizing DT in conjunction with other forms of CDS was associated with an improvement of documentation quality in pre-discharge IV antibiotics and advanced care planning. A sociotechnical analysis explores how interactions between technology, people, workflow, and culture could contextualize how utilizing DT with other forms of CDS was more effective than DT alone.
PMID: 40763805
ISSN: 1869-0327
CID: 5905032
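
The chi-square statistics reported in the abstract above can be recomputed from the stated counts. The sketch below assumes a 2x2 test with Yates continuity correction; the abstract does not say whether a correction was applied, so that choice is an assumption (the uncorrected statistic comes out somewhat larger). The function name `yates_chi_square` is illustrative.

```python
def yates_chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square with Yates continuity correction for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    # Shrink the cross-product difference by n/2, never below zero.
    corrected = max(abs(a * d - b * c) - n / 2, 0)
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return n * corrected ** 2 / denominator


# Rows: pre- and post-intervention; columns: documented, not documented.
scenario_2 = yates_chi_square(54, 90 - 54, 1850, 1994 - 1850)
scenario_3 = yates_chi_square(821, 1909 - 821, 975, 1874 - 975)
print(round(scenario_2, 1), round(scenario_3, 1))  # 113.1 30.5
```

Under this assumption, both values agree with the figures quoted in the abstract to one decimal place.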

ACR Appropriateness Criteria® Supplemental Breast Cancer Screening Based on Breast Density: 2024 Update

Paulis, Lisa V; Lewin, Alana A; Weinstein, Susan P; Baron, Paul; Dayaratna, Sandra; Dodelzon, Katerina; Dogan, Basak E; Gulati, Abhishek; Kantor, Olga; Kasales, Claudia; Kunjummen, Jean M; Kuzmiak, Cherie M; Newell, Mary S; Salkowski, Lonie R; Sharpe, Richard E; Small, William; Ulaner, Gary A; Slanetz, Priscilla J
Screening mammography has been proven to reduce the mortality from breast cancer by approximately 30%; however, it is less sensitive in women with dense breast tissue and certain risk groups. Supplemental screening may be considered based on the patient's risk level and breast density. In all women, digital breast tomosynthesis improves screening sensitivity. Average-risk women with heterogeneously dense tissue may also benefit from breast MRI, abbreviated breast MRI (AB-MRI), or breast ultrasound (US). In intermediate-risk women with nondense tissue, breast MRI and AB-MRI may be appropriate. In intermediate-risk women with heterogeneously dense and extremely dense tissue, breast MRI and AB-MRI are usually appropriate, whereas US and contrast-enhanced mammography (CEM) may be appropriate. Breast MRI or AB-MRI is usually appropriate in all high-risk women, regardless of density. Screening breast US or CEM could be considered in this population. The American College of Radiology Appropriateness Criteria are evidence-based guidelines for specific clinical conditions that are reviewed annually by a multidisciplinary expert panel. The guideline development and revision process support the systematic analysis of the medical literature from peer reviewed journals. Established methodology principles such as Grading of Recommendations Assessment, Development, and Evaluation or GRADE are adapted to evaluate the evidence. The RAND/UCLA Appropriateness Method User Manual provides the methodology to determine the appropriateness of imaging and treatment procedures for specific clinical scenarios. In those instances where peer reviewed literature is lacking or equivocal, experts may be the primary evidentiary source available to formulate a recommendation.
PMID: 40409891
ISSN: 1558-349x
CID: 5853762

ACR Appropriateness Criteria® Ovarian Cancer Screening: 2024 Update

Venkatesan, Aradhana M; Kilcoyne, Aoife; Akin, Esma A; Chuang, Linus; Hindman, Nicole M; Huang, Chenchan; McCourt, Carolyn Kay; Rauch, Gaiane M; Sattari, Maryam; Schoenborn, Nancy; Schultz, David; Sertic, Madeleine; Small, William; Stein, Erica B; Suarez-Weiss, Krista; Kang, Stella K
Ovarian cancer remains low in prevalence but has the highest mortality of all gynecologic malignancies. Population-based screening for ovarian cancer remains a topic of interest in contemporary practice, given that the majority of cancers encountered are high-grade aggressive malignancies, for which favorable survival is encountered in the setting of early-stage disease. This document summarizes a review of the available data from randomized and observational trials that have evaluated the role of imaging for ovarian cancer screening in average-risk and high-risk patients. When considering screening using pelvic ultrasound in average-risk patients, we found insufficient published evidence to recommend ovarian cancer screening. Randomized controlled trials have not demonstrated a mortality benefit in this setting. Screening with pelvic ultrasound may be appropriate for select patients at high risk, although the existing data remain limited as large, randomized trials have not been performed in this setting. The American College of Radiology Appropriateness Criteria are evidence-based guidelines for specific clinical conditions that are reviewed annually by a multidisciplinary expert panel. The guideline development and revision process support the systematic analysis of the medical literature from peer reviewed journals. Established methodology principles such as Grading of Recommendations Assessment, Development, and Evaluation or GRADE are adapted to evaluate the evidence. The RAND/UCLA Appropriateness Method User Manual provides the methodology to determine the appropriateness of imaging and treatment procedures for specific clinical scenarios. In those instances where peer reviewed literature is lacking or equivocal, experts may be the primary evidentiary source available to formulate a recommendation.
PMID: 40409887
ISSN: 1558-349x
CID: 5853732

Classifying Continuous Glucose Monitoring Documents From Electronic Health Records

Zheng, Yaguang; Iturrate, Eduardo; Li, Lehan; Wu, Bei; Small, William R; Zweig, Susan; Fletcher, Jason; Chen, Zhihao; Johnson, Stephen B
BACKGROUND: Clinical use of continuous glucose monitoring (CGM) is increasing storage of CGM-related documents in electronic health records (EHR); however, the standardization of CGM storage is lacking. We aimed to evaluate the sensitivity and specificity of CGM Ambulatory Glucose Profile (AGP) classification criteria. METHODS: We randomly chose 2244 (18.1%) documents from NYU Langone Health. Our document classification algorithm: (1) separated multiple-page documents into single-page images; (2) rotated all pages into an upright orientation; (3) determined types of devices using optical character recognition; and (4) tested for the presence of particular keywords in the text. Two experts in using CGM for research and clinical practice conducted an independent manual review of 62 (2.8%) reports. We calculated sensitivity (correct classification of CGM AGP report) and specificity (correct classification of non-CGM report) by comparing the classification algorithm against manual review. RESULTS: Among 2244 documents, 1040 (46.5%) were classified as CGM AGP reports (43.3% FreeStyle Libre and 56.7% Dexcom), 1170 (52.1%) as non-CGM reports (eg, progress notes, CGM request forms, or physician letters), and 34 (1.5%) as uncertain documents. The agreement for the evaluation of the documents between the two experts was 100% for sensitivity and 98.4% for specificity. When comparing the classification result between the algorithm and manual review, the sensitivity and specificity were 95.0% and 91.7%. CONCLUSIONS: Nearly half of CGM-related documents were AGP reports, which are useful for clinical practice and diabetes research; however, the remaining half are other clinical documents. Future work needs to standardize the storage of CGM-related documents in the EHR.
PMCID: 11904921
PMID: 40071848
ISSN: 1932-2968
CID: 5808452
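
The sensitivity and specificity defined in the abstract above (algorithm output compared against manual review) can be sketched as follows. The function name and the toy labels are hypothetical and chosen only to show the arithmetic; they are not data from the study.

```python
def sensitivity_specificity(predicted, reference):
    """Return (sensitivity, specificity) for paired boolean labels.

    predicted: classifier output, True = classified as CGM AGP report.
    reference: manual review, True = actually a CGM AGP report.
    """
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)
    fn = sum(1 for p, r in pairs if not p and r)
    tn = sum(1 for p, r in pairs if not p and not r)
    fp = sum(1 for p, r in pairs if p and not r)
    return tp / (tp + fn), tn / (tn + fp)


# Made-up labels for five documents, for illustration only.
algorithm = [True, True, False, False, True]
manual = [True, False, False, False, True]
sens, spec = sensitivity_specificity(algorithm, manual)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

The sketch assumes the reference labels contain at least one positive and one negative document; otherwise one of the ratios is undefined.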

Evaluating Hospital Course Summarization by an Electronic Health Record-Based Large Language Model

Small, William R.; Austrian, Jonathan; O'Donnell, Luke; Burk-Rafel, Jesse; Hochman, Katherine A.; Goodman, Adam; Zaretsky, Jonah; Martin, Jacob; Johnson, Stephen; Major, Vincent J.; Jones, Simon; Henke, Christian; Verplanke, Benjamin; Osso, Jwan; Larson, Ian; Saxena, Archana; Mednick, Aron; Simonis, Choumika; Han, Joseph; Kesari, Ravi; Wu, Xinyuan; Heery, Lauren; Desel, Tenzin; Baskharoun, Samuel; Figman, Noah; Farooq, Umar; Shah, Kunal; Jahan, Nusrat; Kim, Jeong Min; Testa, Paul; Feldman, Jonah
ISI: 001551557000002
ISSN: 2574-3805
CID: 5974192

Evaluating Large Language Models in extracting cognitive exam dates and scores

Zhang, Hao; Jethani, Neil; Jones, Simon; Genes, Nicholas; Major, Vincent J; Jaffe, Ian S; Cardillo, Anthony B; Heilenbach, Noah; Ali, Nadia Fazal; Bonanni, Luke J; Clayburn, Andrew J; Khera, Zain; Sadler, Erica C; Prasad, Jaideep; Schlacter, Jamie; Liu, Kevin; Silva, Benjamin; Montgomery, Sophie; Kim, Eric J; Lester, Jacob; Hill, Theodore M; Avoricani, Alba; Chervonski, Ethan; Davydov, James; Small, William; Chakravartty, Eesha; Grover, Himanshu; Dodson, John A; Brody, Abraham A; Aphinyanaphongs, Yindalon; Masurkar, Arjun; Razavian, Narges
Ensuring reliability of Large Language Models (LLMs) in clinical tasks is crucial. Our study assesses two state-of-the-art LLMs (ChatGPT and LlaMA-2) for extracting clinical information, focusing on cognitive tests like MMSE and CDR. Our data consisted of 135,307 clinical notes (Jan 12th, 2010 to May 24th, 2023) mentioning MMSE, CDR, or MoCA. After applying inclusion criteria, 34,465 notes remained, of which 765 were processed with ChatGPT (GPT-4) and LlaMA-2, and 22 experts reviewed the responses. ChatGPT successfully extracted MMSE and CDR instances with dates from 742 notes. We used 20 notes for fine-tuning and training the reviewers. The remaining 722 were assigned to reviewers, with 309 each assigned to two reviewers simultaneously. Inter-rater-agreement (Fleiss' Kappa), precision, recall, true/false negative rates, and accuracy were calculated. Our study follows TRIPOD reporting guidelines for model validation. For MMSE information extraction, ChatGPT (vs. LlaMA-2) achieved accuracy of 83% (vs. 66.4%), sensitivity of 89.7% (vs. 69.9%), true-negative rates of 96% (vs. 60.0%), and precision of 82.7% (vs. 62.2%). For CDR the results were lower overall, with accuracy of 87.1% (vs. 74.5%), sensitivity of 84.3% (vs. 39.7%), true-negative rates of 99.8% (vs. 98.4%), and precision of 48.3% (vs. 16.1%). We qualitatively evaluated the MMSE errors of ChatGPT and LlaMA-2 on double-reviewed notes. LlaMA-2 errors included 27 cases of total hallucination, 19 cases of reporting other scores instead of MMSE, 25 missed scores, and 23 cases of reporting only the wrong date. In comparison, ChatGPT's errors included only 3 cases of total hallucination, 17 cases of wrong test reported instead of MMSE, and 19 cases of reporting a wrong date. In this diagnostic/prognostic study of ChatGPT and LlaMA-2 for extracting cognitive exam dates and scores from clinical notes, ChatGPT exhibited high accuracy, with better performance compared to LlaMA-2. The use of LLMs could benefit dementia research and clinical care by identifying eligible patients for treatment initiation or clinical trial enrollment. Rigorous evaluation of LLMs is crucial to understanding their capabilities and limitations.
PMCID: 11634005
PMID: 39661652
ISSN: 2767-3170
CID: 5762692
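
The abstract above names Fleiss' kappa as its inter-rater agreement measure. The following is a minimal sketch of the standard formula, assuming every item is rated by the same number of raters; the function name and the toy rating table are hypothetical and are not taken from the study.

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for a table of per-item category counts.

    ratings[i][j] = number of raters who put item i in category j;
    every row must sum to the same number of raters (at least 2).
    """
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    n_categories = len(ratings[0])
    # Overall share of assignments that went to each category.
    p_j = [
        sum(row[j] for row in ratings) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement within each item.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ]
    p_bar = sum(p_i) / n_items
    p_e = sum(p * p for p in p_j)  # agreement expected by chance
    return (p_bar - p_e) / (1 - p_e)


# Toy table: 4 notes, 2 reviewers, categories (correct, incorrect).
print(fleiss_kappa([[2, 0], [0, 2], [2, 0], [0, 2]]))  # perfect agreement
```

The sketch does not guard against the degenerate case where every rating falls in a single category, in which chance agreement equals 1 and kappa is undefined.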