Searched for:

in-biosketch:true

person:burkrj01

Total Results:

54


Large Language Model-Based Identification of Acute Coronary Syndrome Management Delays

Schaye, Verity; Rajput, Bijal; Signoriello, Lexi; Burk-Rafel, Jesse; Guzman, Benedict; Webster, Tyler; Sartori, Daniel J
Acute coronary syndrome (ACS) requires prompt treatment, yet management delays are difficult to identify. In this study, we developed a large language model (LLM) system to identify ACS management delays and characterized delay cases. Admissions to internal medicine residents at NYU July 2022-June 2025 (n=4,642) were included. Prompts were validated to determine if the resident admission note documented initiation of ACS management and if the initial cardiology consult note documented initiation of ACS management (ground truth) (n=161 for each). Discordant cases were reviewed by three physicians using a validated tool to confirm management delays. Demographics and key clinical findings of patients with and without delays were compared. The LLM identified management delays with a 52% positive predictive value (n=35/67). Patients who were older, who were female, or whose preferred language was other than English or Spanish were more likely to have a management delay (73.4 ± 15.3 vs 68.5 ± 12.6 years old, p=0.036, 56.8% vs 34% females, p=0.014, and 27.0% vs 15.5% other preferred language, p=0.046, in management delay vs non-management delay cases). The management delay group had longer average time in hours to receiving heparin, aspirin, and cardiac catheterization (56.91 ± 56.78 vs 18.97 ± 13.76, p<0.001, 13.94 ± 16.64 vs 8.23 ± 9.82, p=0.005, and 65.12 ± 51.65 vs 39.51 ± 44.19, p=0.006, respectively in management delay vs non-management delay cases). In conclusion, the LLM-based system we developed to identify ACS management delays can detect cases at scale to inform individual and systems-level interventions to improve quality of ACS care.
PMID: 42259441
ISSN: 1879-1913
CID: 6048182

Leveraging a Large Language Model to Generate Quality Improvement Feedback for Clinical Notes

Kim, Christopher J; Gelfinbein, Joseph; Gencerliler, Nihan; Jahan, Nusrat; Udaikumar, Jahnavi; Heery, Lauren M; Goodman, Adam; Ng, Sarah; Attard, Joel; Asha, Sharmin; Burk-Rafel, Jesse; Guzman, Benedict Vincent; Hochman, Katherine A; Testa, Paul; Feldman, Jonah
BACKGROUND:Poor documentation quality can significantly affect healthcare operations, but the feedback process for clinicians to improve clinical notes is time-consuming and often insufficient. Large language models (LLMs) such as Generative Pre-trained Transformer 4 (GPT-4) have the potential to streamline this process. OBJECTIVE:To determine whether an LLM can generate feedback to improve the medical contingency and discharge planning (MCDP) component of clinical documentation that is non-inferior to feedback by physicians. METHODS:A cross-sectional study of GPT-4 feedback and physician feedback on inpatient progress notes was conducted. A random sample of 64 inpatient progress notes identified by the validated AI Audit Tool as having a low likelihood of containing MCDP was included from adult general medicine patients hospitalized at New York University Langone Health (NYULH) in December 2023. Both the GPT-4 model and attending physicians generated feedback on these inpatient progress notes. A/B testing was then conducted on the measures of understandability, usefulness, acceptability, and impartiality. Evaluations employed 5-point Likert scales that were converted to 10-point bidirectional interval scales for interpretability, ranging from -10 (human suggestions significantly better) to +10 (GPT-4 suggestions significantly better), with a non-inferiority threshold set to -1 for the primary endpoint. RESULTS:64 inpatient progress notes were included, representing 55% female patients with a median age of 73. GPT-4 feedback was non-inferior to physician feedback in all measures: understandability (mean 1.27, 95% CI 0.73 to 1.8, P < 0.001), usefulness (mean 2.09, 95% CI 1.27 to 2.91, P < 0.001), acceptability (mean 2.07, 95% CI 1.33 to 2.81, P < 0.001), and impartiality (mean -0.20, 95% CI -0.52 to 0.12, P < 0.001). CONCLUSIONS:This study shows that an LLM can be leveraged to generate note quality feedback that is non-inferior to expert clinician feedback.
PMID: 41985489
ISSN: 1869-0327
CID: 6027922

Sex, Race, and Ethnicity Differences Among Residents With Exceptionally High Graduate Medical Education Ratings

Kim, Jung G; Hauer, Karen E; Boscardin, Christy K; Su, Jasmine I-Shin; Holmboe, Eric S; Konopasek, Lyuba; Chen, Isabel L; Gonzalez, Cristina M; Ogedegbe, Gbenga G; Burk-Rafel, Jesse; Nguyen, Mytien; Andrews, John S; Henderson, David D; Richardson, Judee; McDade, William; Boatright, Dowin
IMPORTANCE:Limited research exists on sex, racial, and ethnic disparities in required graduate medical education (GME) resident competency ratings across specialties during sensitive periods when career decision-making occurs. Rating disparities using an antideficit-based approach measured by exceptionally high ratings are underexplored in GME. OBJECTIVE:To assess the association of exceptionally high ratings in the Accreditation Council for Graduate Medical Education (ACGME) Milestones during time-sensitive training periods across specialties with differences among residents' characteristics, including sex, race, and ethnicity. DESIGN, SETTING, AND PARTICIPANTS:This cross-sectional analysis was conducted between March 15 and December 31, 2025, using 2018 to 2021 Association of American Medical Colleges and ACGME data. Postgraduate year (PGY) 2 residents training at US ACGME-accredited emergency medicine, family medicine, internal medicine, obstetrics and gynecology, pediatrics, and surgery residency programs between 2018 and 2021 who self-reported sex, race, or ethnicity were studied. EXPOSURE:Required Milestones ratings at the end of PGY-2 training associated with resident sex and race or ethnicity (underrepresented in medicine [URiM] and Asian), while controlling for preresidency Step 2 Clinical Knowledge examination scores. MAIN OUTCOMES AND MEASURES:Proportion and adjusted odds ratios (AORs) for exceptionally high resident-level ratings (80th percentile level) across competencies in interpersonal and communication skills, medical knowledge, patient care, practice-based learning and improvement, professionalism, and systems-based practice. RESULTS:Among 19 492 PGY-2 residents across 1754 programs, 10 384 (53.3%) were female, 28 (0.14%) American Indian or Alaskan Native, 4327 (22.2%) Asian, 1106 (5.7%) Black, 1008 (5.2%) Hispanic or Latinx, 3 (0.02%) Native Hawaiian or Pacific Islander, 12 269 (62.9%) White, 751 (3.9%) reporting 2 or more races, and 3423 (17.6%) classified as URiM. Exceptional rating differences were identified by sex, race, and ethnicity. Across all specialties, female residents had greater odds for 80th percentile ratings (AOR, 1.12; 95% CI, 1.05-1.21; P < .001), whereas, compared with White residents, URiM residents (AOR, 0.68; 95% CI, 0.62-0.76; P < .001) and Asian residents (AOR, 0.67; 95% CI, 0.60-0.74; P < .001) were less likely to have 80th percentile ratings. Within specialties, URiM residents in emergency medicine, family medicine, internal medicine, obstetrics and gynecology, and surgery were less likely to have 80th percentile ratings, whereas Asian residents in family medicine, internal medicine, pediatrics, and surgery were also less likely than White residents. CONCLUSION AND RELEVANCE:In this cross-sectional national study of residents, exceptionally high ratings were associated with differing resident characteristics during crucial career planning phases. These results suggest the need for more studies to explore factors of resident success during GME training.
PMCID:13036576
PMID: 41910971
ISSN: 2574-3805
CID: 6021292

Large language model-based identification of venous thromboembolism diagnostic delays

Schaye, Verity; Sartori, Daniel J; Signoriello, Lexi; Malhotra, Kiran; Guzman, Benedict; Rajput, Bijal; Reinstein, Ilan; Burk-Rafel, Jesse
BACKGROUND:Delayed diagnosis of venous thromboembolism (VTE) is prevalent among hospitalized patients, yet case identification is challenging and feedback limited. OBJECTIVE:To develop a large language model (LLM)-based electronic-trigger to identify VTE diagnostic delays. METHODS:All admissions to internal medicine (IM) residents at NYU Langone Health between January 2022 and December 2023 (n = 20,843) were included. Using an open-source LLM, prompts were validated to detect (1) residents considering VTE in admission notes and (2) VTE confirmation in five types of imaging reports (n = 100 for each prompt validation set). The validated prompts were applied to determine discordance between admission note differential omitting VTE and imaging report confirming VTE. Two hospitalists reviewed discordant cases using a validated tool to identify diagnostic delays. Hospitalizations were labeled as diagnostic delays, in-hospital complication, or false-positive. Based on in-hospital complication and false-positive patterns, exclusion criteria were implemented. Positive predictive value (PPV) and negative predictive value (NPV) were calculated. RESULTS:The LLM prompts correctly classified admission notes and VTE imaging studies with high accuracy (range 98%-100%, n = 699 VTE cases identified). Of the 137 diagnostic delays the LLM-based electronic-trigger identified, 31 were true-positives, 60 in-hospital complications, and 46 false-positives. 4.4% of all VTE hospitalizations had a diagnostic delay. With the exclusion criteria, the PPV was 48% (95% confidence interval [CI], 35%-62%) and NPV was 95% (95% CI, 87%-98%). CONCLUSIONS:We developed the first LLM-based electronic-trigger to identify VTE diagnostic delays, with higher performance than existing non-LLM electronic-triggers. LLM-based approaches can facilitate diagnostic performance feedback and are scalable to other conditions and institutions.
PMID: 41058083
ISSN: 1553-5606
CID: 5951832

The first step in visual diagnosis: a study of novices developing the ability to distinguish normal from abnormal cases

Oh, So-Young; Burk-Rafel, J; Reinstein, I; Hatala, R; Van Gerven, P W M; Smeenk, F W J M; Pusic, M V
PMID: 41427977
ISSN: 1573-1677
CID: 6035782

Evaluating Hospital Course Summarization by an Electronic Health Record-Based Large Language Model

Small, William R.; Austrian, Jonathan; O'Donnell, Luke; Burk-Rafel, Jesse; Hochman, Katherine A.; Goodman, Adam; Zaretsky, Jonah; Martin, Jacob; Johnson, Stephen; Major, Vincent J.; Jones, Simon; Henke, Christian; Verplanke, Benjamin; Osso, Jwan; Larson, Ian; Saxena, Archana; Mednick, Aron; Simonis, Choumika; Han, Joseph; Kesari, Ravi; Wu, Xinyuan; Heery, Lauren; Desel, Tenzin; Baskharoun, Samuel; Figman, Noah; Farooq, Umar; Shah, Kunal; Jahan, Nusrat; Kim, Jeong Min; Testa, Paul; Feldman, Jonah
ISI:001551557000002
ISSN: 2574-3805
CID: 5974192

The impact of shifting hospitalist switch days from Monday to Tuesday

Nguyen, Larry; Messing, Lauren; Hochman, Katherine A; Quiñones-Camacho, Adriana; Burk-Rafel, Jesse; Verplanke, Benjamin
There is limited data on which hospitalist switch day is optimal for hospital operations and throughput. A quality improvement intervention was implemented, changing the hospitalist switch day from Monday to Tuesday. Retrospective observational analysis revealed an increase in Monday discharges (1.3%, p = .01), a decrease in Tuesday discharges (-1.6%, p < .005), and a significant reduction in 30-day unplanned readmission rates (-1.5%, p = .003), with no significant changes in the average length of stay. Additional studies are needed to further verify these findings in different hospital settings and to consider other switch day patterns.
PMID: 41186934
ISSN: 1553-5606
CID: 5959692

Evaluating Hospital Course Summarization by an Electronic Health Record-Based Large Language Model

Small, William R; Austrian, Jonathan; O'Donnell, Luke; Burk-Rafel, Jesse; Hochman, Katherine A; Goodman, Adam; Zaretsky, Jonah; Martin, Jacob; Johnson, Stephen; Major, Vincent J; Jones, Simon; Henke, Christian; Verplanke, Benjamin; Osso, Jwan; Larson, Ian; Saxena, Archana; Mednick, Aron; Simonis, Choumika; Han, Joseph; Kesari, Ravi; Wu, Xinyuan; Heery, Lauren; Desel, Tenzin; Baskharoun, Samuel; Figman, Noah; Farooq, Umar; Shah, Kunal; Jahan, Nusrat; Kim, Jeong Min; Testa, Paul; Feldman, Jonah
IMPORTANCE:Hospital course (HC) summarization represents an increasingly onerous discharge summary component for physicians. Literature supports large language models (LLMs) for HC summarization, but whether physicians can effectively partner with electronic health record-embedded LLMs to draft HCs is unknown. OBJECTIVES:To compare the editing effort required by time-constrained resident physicians to improve LLM- vs physician-generated HCs toward a novel 4Cs (complete, concise, cohesive, and confabulation-free) HC. DESIGN, SETTING, AND PARTICIPANTS:Quality improvement study using a convenience sample of 10 internal medicine resident editors, 8 hospitalist evaluators, and randomly selected general medicine admissions in December 2023 lasting 4 to 8 days at New York University Langone Health. EXPOSURES:Residents and hospitalists reviewed randomly assigned patient medical records for 10 minutes. Residents blinded to author type edited each HC pair (physician and LLM) for quality in 3 minutes, followed by comparative ratings by attending hospitalists. MAIN OUTCOMES AND MEASURES:Editing effort was quantified by analyzing the edits that occurred on the HC pairs after controlling for length (percentage edited) and the degree to which the original HCs' meaning was altered (semantic change). Hospitalists compared edited HC pairs with A/B testing on the 4Cs (5-point Likert scales converted to 10-point bidirectional scales). RESULTS:Among 100 admissions, compared with physician HCs, residents edited a smaller percentage of LLM HCs (LLM mean [SD], 31.5% [16.6%] vs physicians, 44.8% [20.0%]; P < .001). Additionally, LLM HCs required less semantic change (LLM mean [SD], 2.4% [1.6%] vs physicians, 4.9% [3.5%]; P < .001). Attending physicians deemed LLM HCs to be more complete (mean [SD] difference LLM vs physicians on 10-point bidirectional scale, 3.00 [5.28]; P < .001), similarly concise (mean [SD], -1.02 [6.08]; P = .20), and cohesive (mean [SD], 0.70 [6.14]; P = .60), but with more confabulations (mean [SD], -0.98 [3.53]; P = .002). The composite scores were similar (mean [SD] difference LLM vs physician on 40-point bidirectional scale, 1.70 [14.24]; P = .46). CONCLUSIONS AND RELEVANCE:Electronic health record-embedded LLM HCs required less editing than physician-generated HCs to approach a quality standard, resulting in HCs that were comparably or more complete, concise, and cohesive, but contained more confabulations. Despite the potential influence of artificial time constraints, this study supports the feasibility of a physician-LLM partnership for writing HCs and provides a basis for monitoring LLM HCs in clinical practice.
PMID: 40802185
ISSN: 2574-3805
CID: 5906762

Macy Foundation Innovation Report Part II: From Hype to Reality: Innovators' Visions for Navigating AI Integration Challenges in Medical Education

Gin, Brian C; LaForge, Kate; Burk-Rafel, Jesse; Boscardin, Christy K
PURPOSE:Artificial intelligence (AI) promises to significantly impact medical education, yet its implementation raises important questions about educational effectiveness, ethical use, and equity. In the second part of a 2-part innovation report, which was commissioned by the Josiah Macy Jr. Foundation to inform discussions at a conference on AI in medical education, the authors explore the perspectives of innovators actively integrating AI into medical education, examining their perceptions regarding the impacts, opportunities, challenges, and strategies for successful AI adoption and risk mitigation. METHOD:Semi-structured interviews were conducted with 25 medical education AI innovators (including learners, educators, institutional leaders, and industry representatives) from June to August 2024. Interviews explored participants' perceptions of AI's influence on medical education, challenges to integration, and strategies for mitigating challenges. Transcripts were analyzed using thematic analysis to identify themes and synthesize participants' recommendations for AI integration. RESULTS:Innovators' responses were synthesized into 2 main thematic areas: (1) AI's impact on teaching, learning, and assessment, and (2) perceived threats and strategies for mitigating them. Participants identified AI's potential to enact precision education through virtual tutors and standardized patients, support active learning formats, enable centralized teaching, and facilitate cognitive offloading. AI-enhanced assessments could automate grading, predict learner trajectories, and integrate performance data from clinical interactions. Yet, innovators expressed concerns over threats to transparency and validity, potential propagation of biases, risks of over-reliance and deskilling, and institutional disparities. Proposed mitigation strategies emphasized validating AI outputs, establishing foundational competencies, fostering collaboration and open-source sharing, enhancing AI literacy, and maintaining robust ethical standards. CONCLUSIONS:AI innovators in medical education envision transformative opportunities for individualized learning and precision education, balanced against critical threats. Realizing these benefits requires proactive, collaborative efforts to establish rigorous validation frameworks; uphold foundational medical competencies; and prioritize ethical, equitable AI integration.
PMID: 40479503
ISSN: 1938-808x
CID: 5862832

Large Language Model-Augmented Strategic Analysis of Innovation Projects in Graduate Medical Education

Winkel, Abigail Ford; Burk-Rafel, Jesse; Terhune, Kyla; Garibaldi, Brian T; DeWaters, Ami L; Co, John Patrick T; Andrews, John S
PMCID:12080501
PMID: 40386486
ISSN: 1949-8357
CID: 5852792