Evaluating Hospital Course Summarization by an Electronic Health Record-Based Large Language Model
Small, William R; Austrian, Jonathan; O'Donnell, Luke; Burk-Rafel, Jesse; Hochman, Katherine A; Goodman, Adam; Zaretsky, Jonah; Martin, Jacob; Johnson, Stephen; Major, Vincent J; Jones, Simon; Henke, Christian; Verplanke, Benjamin; Osso, Jwan; Larson, Ian; Saxena, Archana; Mednick, Aron; Simonis, Choumika; Han, Joseph; Kesari, Ravi; Wu, Xinyuan; Heery, Lauren; Desel, Tenzin; Baskharoun, Samuel; Figman, Noah; Farooq, Umar; Shah, Kunal; Jahan, Nusrat; Kim, Jeong Min; Testa, Paul; Feldman, Jonah
IMPORTANCE: Hospital course (HC) summarization represents an increasingly onerous discharge summary component for physicians. Literature supports large language models (LLMs) for HC summarization, but whether physicians can effectively partner with electronic health record-embedded LLMs to draft HCs is unknown.
OBJECTIVES: To compare the editing effort required by time-constrained resident physicians to improve LLM- vs physician-generated HCs toward a novel 4Cs (complete, concise, cohesive, and confabulation-free) HC.
DESIGN, SETTING, AND PARTICIPANTS: Quality improvement study using a convenience sample of 10 internal medicine resident editors, 8 hospitalist evaluators, and randomly selected general medicine admissions lasting 4 to 8 days in December 2023 at New York University Langone Health.
EXPOSURES: Residents and hospitalists reviewed randomly assigned patient medical records for 10 minutes. Residents blinded to author type edited each HC pair (physician and LLM) for quality in 3 minutes, followed by comparative ratings by attending hospitalists.
MAIN OUTCOMES AND MEASURES: Editing effort was quantified by analyzing the edits made to the HC pairs after controlling for length (percentage edited) and the degree to which the original HCs' meaning was altered (semantic change). Hospitalists compared edited HC pairs with A/B testing on the 4Cs (5-point Likert scales converted to 10-point bidirectional scales).
RESULTS: Among 100 admissions, residents edited a smaller percentage of LLM HCs than of physician HCs (LLM mean [SD], 31.5% [16.6%] vs physicians, 44.8% [20.0%]; P < .001). Additionally, LLM HCs required less semantic change (LLM mean [SD], 2.4% [1.6%] vs physicians, 4.9% [3.5%]; P < .001). Attending physicians deemed LLM HCs more complete (mean [SD] difference LLM vs physicians on 10-point bidirectional scale, 3.00 [5.28]; P < .001), similarly concise (mean [SD], -1.02 [6.08]; P = .20) and cohesive (mean [SD], 0.70 [6.14]; P = .60), but with more confabulations (mean [SD], -0.98 [3.53]; P = .002). Composite scores were similar (mean [SD] difference LLM vs physician on 40-point bidirectional scale, 1.70 [14.24]; P = .46).
CONCLUSIONS AND RELEVANCE: Electronic health record-embedded LLM HCs required less editing than physician-generated HCs to approach a quality standard, resulting in HCs that were comparably or more complete, concise, and cohesive, but that contained more confabulations. Despite the potential influence of artificial time constraints, this study supports the feasibility of a physician-LLM partnership for writing HCs and provides a basis for monitoring LLM HCs in clinical practice.
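The outcome measures above (percentage edited and the Likert-to-bidirectional conversion) can be illustrated with a brief sketch. The study's actual edit-distance computation and rating conversion are not specified in the abstract, so the character-level diff ratio and linear Likert rescaling below are assumptions for illustration only, not the authors' method.

```python
# Hypothetical sketch of the editing-effort measures described above.
# "Percentage edited" is approximated with a character-level diff ratio
# (difflib); the study's actual metric may differ.
from difflib import SequenceMatcher

def percent_edited(original: str, edited: str) -> float:
    """Share of the original draft altered by the editor, 0-100."""
    similarity = SequenceMatcher(None, original, edited).ratio()
    return (1.0 - similarity) * 100.0

def likert_to_bidirectional(rating: int) -> float:
    """Map a 1-5 comparative Likert rating (3 = no preference) onto a scale
    spanning 10 points (-5 to +5). The exact mapping used in the study is not
    given in the abstract; this linear rescaling is an assumption."""
    return (rating - 3) * 2.5

if __name__ == "__main__":
    draft = "Patient admitted with pneumonia, treated with antibiotics."
    revision = "Patient admitted with community-acquired pneumonia; treated with ceftriaxone."
    print(f"Percent edited: {percent_edited(draft, revision):.1f}%")
    print(f"Bidirectional completeness score: {likert_to_bidirectional(5):+.1f}")
```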
PMID: 40802185
ISSN: 2574-3805
CID: 5906762
Macy Foundation Innovation Report Part II: From Hype to Reality: Innovators' Visions for Navigating AI Integration Challenges in Medical Education
Gin, Brian C; LaForge, Kate; Burk-Rafel, Jesse; Boscardin, Christy K
PURPOSE: Artificial intelligence (AI) promises to significantly impact medical education, yet its implementation raises important questions about educational effectiveness, ethical use, and equity. In the second part of a 2-part innovation report, commissioned by the Josiah Macy Jr. Foundation to inform discussions at a conference on AI in medical education, the authors explore the perspectives of innovators actively integrating AI into medical education, examining their perceptions of the impacts, opportunities, challenges, and strategies for successful AI adoption and risk mitigation.
METHODS: Semi-structured interviews were conducted with 25 medical education AI innovators, including learners, educators, institutional leaders, and industry representatives, from June to August 2024. Interviews explored participants' perceptions of AI's influence on medical education, challenges to integration, and strategies for mitigating those challenges. Transcripts were analyzed using thematic analysis to identify themes and synthesize participants' recommendations for AI integration.
RESULTS: Innovators' responses were synthesized into 2 main thematic areas: (1) AI's impact on teaching, learning, and assessment, and (2) perceived threats and strategies for mitigating them. Participants identified AI's potential to enact precision education through virtual tutors and standardized patients, support active learning formats, enable centralized teaching, and facilitate cognitive offloading. AI-enhanced assessments could automate grading, predict learner trajectories, and integrate performance data from clinical interactions. Yet innovators expressed concerns over threats to transparency and validity, potential propagation of biases, risks of over-reliance and deskilling, and institutional disparities. Proposed mitigation strategies emphasized validating AI outputs, establishing foundational competencies, fostering collaboration and open-source sharing, enhancing AI literacy, and maintaining robust ethical standards.
CONCLUSIONS: AI innovators in medical education envision transformative opportunities for individualized learning and precision education, balanced against critical threats. Realizing these benefits requires proactive, collaborative efforts to establish rigorous validation frameworks; uphold foundational medical competencies; and prioritize ethical, equitable AI integration.
PMID: 40479503
ISSN: 1938-808x
CID: 5862832
Large Language Model-Augmented Strategic Analysis of Innovation Projects in Graduate Medical Education
Winkel, Abigail Ford; Burk-Rafel, Jesse; Terhune, Kyla; Garibaldi, Brian T; DeWaters, Ami L; Co, John Patrick T; Andrews, John S
PMCID:12080501
PMID: 40386486
ISSN: 1949-8357
CID: 5852792
How Data Analytics Can Be Leveraged to Enhance Graduate Clinical Skills Education
Garibaldi, Brian T; Hollon, McKenzie; Knopp, Michelle I; Winkel, Abigail Ford; Burk-Rafel, Jesse; Caretta-Weyer, Holly A
PMCID:12080502
PMID: 40386478
ISSN: 1949-8357
CID: 5852752
Artificial intelligence based assessment of clinical reasoning documentation: an observational study of the impact of the clinical learning environment on resident documentation quality
Schaye, Verity; DiTullio, David J; Sartori, Daniel J; Hauck, Kevin; Haller, Matthew; Reinstein, Ilan; Guzman, Benedict; Burk-Rafel, Jesse
BACKGROUND: Objective measures and large datasets are needed to determine which aspects of the clinical learning environment (CLE) impact the essential skill of clinical reasoning documentation. Artificial intelligence (AI) offers a solution. Here, the authors sought to determine what aspects of the CLE might be impacting resident clinical reasoning documentation quality as assessed by AI.
METHODS: In this observational, retrospective cross-sectional analysis of hospital admission notes from the electronic health record (EHR), all categorical internal medicine (IM) residents who wrote at least one admission note during the study period (July 1, 2018, to June 30, 2023) at two sites of NYU Grossman School of Medicine's IM residency program were included. Clinical reasoning documentation quality of admission notes was classified as low or high quality using a supervised machine learning model. From note-level data, the shift (day or night) and the note index within shift (whether a note was the first, second, etc. written during that shift) were calculated. These aspects of the CLE were included as potential markers of workload, which have been shown to have a strong relationship with resident performance. Patient data were also captured, including age, sex, Charlson Comorbidity Index, and primary diagnosis. The relationship between these variables and clinical reasoning documentation quality was analyzed using generalized estimating equations accounting for resident-level clustering.
RESULTS: Across 37,750 notes authored by 474 residents, patients who were older, had more pre-existing comorbidities, and presented with certain primary diagnoses (e.g., infectious and pulmonary conditions) were associated with higher clinical reasoning documentation quality. When controlling for these and other patient factors, variables associated with clinical reasoning documentation quality included academic year (adjusted odds ratio [aOR] for high quality, 1.10; 95% CI 1.06-1.15; P < .001), night shift (aOR 1.21; 95% CI 1.13-1.30; P < .001), and note index (aOR 0.93; 95% CI 0.90-0.95; P < .001).
CONCLUSIONS: AI can be used to assess complex skills such as clinical reasoning in authentic clinical notes, which can help elucidate the potential impact of the CLE on resident clinical reasoning documentation quality. Future work should explore residency program and systems interventions to optimize the CLE.
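As an illustration of the analytic approach summarized above, the following Python sketch fits a generalized estimating equation with resident-level clustering using statsmodels. The variable names, synthetic data, and simulated effect sizes are assumptions for demonstration; they are not the study's data or code.

```python
# Hypothetical sketch: a GEE relating note-level workload markers to a binary
# clinical reasoning documentation quality label, clustered by resident.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "resident_id": rng.integers(0, 50, n),    # 50 simulated residents
    "night_shift": rng.integers(0, 2, n),     # 1 = note written overnight
    "note_index": rng.integers(1, 6, n),      # position of note within shift
    "academic_year": rng.integers(1, 4, n),   # PGY level
    "patient_age": rng.normal(65, 15, n),
})
# Simulated outcome: odds of a high-quality note rise at night and with seniority
# and fall with later note index (loosely mirroring the reported aORs).
logit = -0.5 + 0.2 * df.night_shift + 0.1 * df.academic_year - 0.07 * df.note_index
df["high_quality"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

model = smf.gee(
    "high_quality ~ night_shift + note_index + academic_year + patient_age",
    groups="resident_id",
    data=df,
    family=sm.families.Binomial(),
    cov_struct=sm.cov_struct.Exchangeable(),
)
result = model.fit()
print(np.exp(result.params))  # exponentiated coefficients, i.e., adjusted odds ratios
```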
PMCID:12016287
PMID: 40264096
ISSN: 1472-6920
CID: 5830212
Large Language Model-Based Assessment of Clinical Reasoning Documentation in the Electronic Health Record Across Two Institutions: Development and Validation Study
Schaye, Verity; DiTullio, David; Guzman, Benedict Vincent; Vennemeyer, Scott; Shih, Hanniel; Reinstein, Ilan; Weber, Danielle E; Goodman, Abbie; Wu, Danny T Y; Sartori, Daniel J; Santen, Sally A; Gruppen, Larry; Aphinyanaphongs, Yindalon; Burk-Rafel, Jesse
BACKGROUND: Clinical reasoning (CR) is an essential skill, yet physicians often receive limited feedback on it. Artificial intelligence holds promise to fill this gap.
OBJECTIVE: We report the development of named entity recognition (NER), logic-based, and large language model (LLM)-based assessments of CR documentation in the electronic health record across 2 institutions (New York University Grossman School of Medicine [NYU] and University of Cincinnati College of Medicine [UC]).
METHODS: Model performance was evaluated using F1-scores for the NER, logic-based model and area under the receiver operating characteristic curve (AUROC) and area under the precision-recall curve (AUPRC) for the LLMs.
RESULTS: F1-scores for the NER, logic-based model were 0.80, 0.74, and 0.80 for D0, D1, and D2, respectively. The GatorTron LLM performed best for EA2 scores (AUROC/AUPRC 0.75/0.69).
CONCLUSIONS: This is the first multi-institutional study to apply LLMs for assessing CR documentation in the electronic health record. Such tools can enhance feedback on CR. Lessons learned by implementing these models at distinct institutions support the generalizability of this approach.
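The evaluation metrics named above (F1-score, AUROC, and AUPRC) can be computed with scikit-learn, as in the brief sketch below. The labels and predictions are synthetic placeholders, not the study's data; the variable names are illustrative assumptions.

```python
# Hypothetical sketch of the reported evaluation metrics: F1 for discrete
# NER/logic-based predictions, AUROC/AUPRC for probabilistic LLM scores.
import numpy as np
from sklearn.metrics import f1_score, roc_auc_score, average_precision_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 200)                                 # gold labels for one domain
ner_pred = np.where(rng.random(200) < 0.8, y_true, 1 - y_true)   # discrete predictions
llm_prob = np.clip(y_true * 0.6 + rng.normal(0.2, 0.25, 200), 0, 1)  # predicted probabilities

print(f"NER/logic F1: {f1_score(y_true, ner_pred):.2f}")
print(f"LLM AUROC:    {roc_auc_score(y_true, llm_prob):.2f}")
print(f"LLM AUPRC:    {average_precision_score(y_true, llm_prob):.2f}")
```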
PMID: 40117575
ISSN: 1438-8871
CID: 5813782
Community Racial and Ethnic Representation Among Physicians in US Internal Medicine Residency Programs
Kim, Jung G; Lett, Elle; Boscardin, Christy K; Hauer, Karen E; Chen, Isabel L; Henderson, Mark C; Hogan, Sean O; Yamazaki, Kenji; Burk-Rafel, Jesse; Fancher, Tonya; Nguyen, Mytien; Holmboe, Eric S; McDade, William; Boatright, Dowin H
IMPORTANCE: Increasing the number of underrepresented in medicine (URIM) physicians in historically underserved communities helps reduce health disparities. Concordance of URIM physicians with their communities improves access to care, particularly for American Indian and Alaska Native, Black, and Hispanic or Latinx individuals.
OBJECTIVES: To explore county-level racial and ethnic representation of US internal medicine (IM) residents, examine racial and ethnic concordance between residents and their communities, and assess whether representation varies by the presence of academic institutions or underserved settings.
DESIGN, SETTING, AND PARTICIPANTS: This retrospective cross-sectional study used data from the Association of American Medical Colleges, the Accreditation Council for Graduate Medical Education (ACGME), the Area Health Resources Files, and the US Department of Education on ACGME-accredited US IM residency programs and their associated county populations. Self-reported racial and ethnic data from 2018 for 4848 residents in 393 IM programs in 205 counties were used. Data were analyzed between February 15 and September 20, 2024.
EXPOSURE: County-level presence of academic health centers (AHCs), minority-serving institutions (MSIs), health professional shortage areas (HPSAs), and rurality.
MAIN OUTCOMES AND MEASURES: Main outcomes were representation quotients (RQs), the ratio of the proportion of IM residents in a racial or ethnic group to the proportion of the concordant county-level population. Quantile linear regression models on median representation were used to identify associations with URIM, Asian, and White residents by US Census division and county-level AHCs, MSIs, HPSAs, and rurality.
RESULTS: Among 4848 residents, 4 (0.08%) self-identified as American Indian or Alaska Native, 1709 (35.3%) as Asian, 289 (6.0%) as Black, 211 (4.4%) as Hispanic or Latinx, 2 (0.04%) as Native Hawaiian or Other Pacific Islander, and 2633 (54.3%) as White. A total of 761 (15.7%) were classified as URIM. Among URIM groups, American Indian and Alaska Native (mean [SE] RQ, 0.00 [0.04]), Black (mean [SE] RQ, 0.09 [0.20]), Hispanic and Latinx (mean [SE] RQ, 0.00 [0.04]), and Native Hawaiian and Other Pacific Islander (mean [SE] RQ, 0.00 [0.26]) residents were grossly underrepresented compared with their training sites' county-level representation. Fifty-one of 205 counties (24.8%) with IM programs had no URIM residents. Black and Hispanic or Latinx residents had higher representation in counties with more MSIs (mean [SD] RQ, 0.19 [0.24]; P = .04; and mean [SD] RQ, 0.15 [0.04]; P < .001, respectively), and Hispanic or Latinx residents were less represented in counties with more AHCs (mean [SD] RQ, 0.00 [0.06]; P < .001). Asian residents had lower RQs in counties with more MSIs (mean [SD] RQ, 6.00 [0.65]; P < .001), and White residents had higher representation in counties with a greater presence of AHCs (mean [SD] RQ, 0.77 [0.04]; P = .007).
CONCLUSIONS AND RELEVANCE: In this cross-sectional study, URIM IM residents remained underrepresented compared with their programs' county populations. These findings should inform racial and ethnic diversity policies to address the continuing underrepresentation among graduate medical education physicians, which adversely impacts the care of historically underserved communities.
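A minimal Python sketch of the representation quotient and the median (quantile) regression described above follows. The column names, county counts, and covariates are illustrative assumptions, not the study's dataset or code.

```python
# Hypothetical sketch: representation quotient (RQ) plus quantile regression
# on the median RQ against county-level exposures.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_counties = 120
df = pd.DataFrame({
    "prop_residents_group": rng.uniform(0.0, 0.2, n_counties),  # group's share of IM residents
    "prop_county_group": rng.uniform(0.05, 0.3, n_counties),    # group's share of county population
    "n_msi": rng.integers(0, 4, n_counties),                    # minority-serving institutions
    "n_ahc": rng.integers(0, 3, n_counties),                    # academic health centers
    "hpsa": rng.integers(0, 2, n_counties),                     # shortage-area indicator
})
# RQ: ratio of a group's share among residents to its share of the county population.
df["rq"] = df.prop_residents_group / df.prop_county_group

# Quantile regression at the median (q = 0.5), as in the described analysis.
median_model = smf.quantreg("rq ~ n_msi + n_ahc + hpsa", data=df).fit(q=0.5)
print(median_model.summary())
```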
PMCID:11783195
PMID: 39883461
ISSN: 2574-3805
CID: 5781162
Characterizing Residents' Clinical Experiences-A Step Toward Precision Education
Burk-Rafel, Jesse; Drake, Carolyn B; Sartori, Daniel J
PMID: 39693075
ISSN: 2574-3805
CID: 5764502
A Theoretical Foundation to Inform the Implementation of Precision Education and Assessment
Drake, Carolyn B; Heery, Lauren M; Burk-Rafel, Jesse; Triola, Marc M; Sartori, Daniel J
Precision education (PE) uses personalized educational interventions to empower trainees and improve learning outcomes. While PE has the potential to represent a paradigm shift in medical education, a theoretical foundation to guide the effective implementation of PE strategies has not yet been described. Here, the authors introduce a theoretical foundation for the implementation of PE, integrating key learning theories with the digital tools that allow them to be operationalized. Specifically, the authors describe how the master adaptive learner (MAL) model, transformative learning theory, and self-determination theory can be harnessed in conjunction with nudge strategies and audit and feedback dashboards to drive learning and meaningful behavior change. The authors also provide practical examples of these theories and tools in action by describing precision interventions already in use at one academic medical center, concretizing PE's potential in the current clinical environment. These examples illustrate how a firm theoretical grounding allows educators to most effectively tailor PE interventions to fit individual learners' needs and goals, facilitating efficient learning and, ultimately, improving patient and health system outcomes.
PMID: 38113440
ISSN: 1938-808x
CID: 5612362
Foreword: The Next Era of Assessment and Precision Education
Schumacher, Daniel J; Santen, Sally A; Pugh, Carla M; Burk-Rafel, Jesse
PMID: 38109655
ISSN: 1938-808x
CID: 5612462