Generative AI Summaries to Facilitate ED Handoff
Genes, Nicholas; Simon, Gregory; Koziatek, Christian; Kim, Jung G; Woo, Kar-Mun; Dahn, Cassidy; Chan, Leland; Wiesenfeld, Batia
Background Emergency Department (ED) handoff to inpatient teams is a potential source of error. Generative Artificial Intelligence (AI) has shown promise in succinctly summarizing large quantities of clinical data and may help improve ED handoff. Objectives Our objectives were to: 1) evaluate the accuracy, clinical utility, and safety of AI-generated ED-to-inpatient handoff summaries; 2) identify patient and visit characteristics influencing summary effectiveness; and 3) characterize potential error patterns to inform implementation strategies. Methods This exploratory study evaluated AI-generated handoff summaries at an urban academic ED (February-April 2024). A HIPAA-compliant GPT-4 model generated summaries aligned with the IPASS framework; ED providers assessed summary accuracy, usefulness, and safety through on-shift surveys. Results Among 50 cases, median quality and usefulness scores were 4/5 (SE = 0.13). Safety concerns arose in 6% of cases, with issues including data omissions and mischaracterizations. Consultation status significantly affected usefulness scores (p < 0.05). Omissions of relevant medications, laboratory results, and other essential details were noted (n=6), and EM clinicians disagreed with some AI characterizations of patient stability, vitals, and workup (n=8). The most common response was positive impressions of the technology incorporated into the handoff process (n=11). Conclusions This exploratory provider-in-the-loop model demonstrated clinical acceptability and highlighted areas for refinement. Future studies should incorporate recipient perspectives and examine clinical outcomes to scale and optimize AI implementation.
PMID: 40795949
ISSN: 1869-0327
CID: 5907202
Evaluation of GPT-4 ability to identify and generate patient instructions for actionable incidental radiology findings
Woo, Kar-Mun C; Simon, Gregory W; Akindutire, Olumide; Aphinyanaphongs, Yindalon; Austrian, Jonathan S; Kim, Jung G; Genes, Nicholas; Goldenring, Jacob A; Major, Vincent J; Pariente, Chloé S; Pineda, Edwin G; Kang, Stella K
OBJECTIVES/OBJECTIVE:To evaluate the proficiency of a HIPAA-compliant version of GPT-4 in identifying actionable, incidental findings from unstructured radiology reports of Emergency Department patients. To assess appropriateness of artificial intelligence (AI)-generated, patient-facing summaries of these findings. MATERIALS AND METHODS/METHODS:Radiology reports extracted from the electronic health record of a large academic medical center were manually reviewed to identify non-emergent, incidental findings with high likelihood of requiring follow-up, further sub-stratified as "definitely actionable" (DA) or "possibly actionable-clinical correlation" (PA-CC). Instruction prompts to GPT-4 were developed and iteratively optimized using a validation set of 50 reports. The optimized prompt was then applied to a test set of 430 unseen reports. GPT-4 performance was primarily graded on accuracy identifying either DA or PA-CC findings, then secondarily for DA findings alone. Outputs were reviewed for hallucinations. AI-generated patient-facing summaries were assessed for appropriateness via Likert scale. RESULTS:For the primary outcome (DA or PA-CC), GPT-4 achieved 99.3% recall, 73.6% precision, and 84.5% F-1. For the secondary outcome (DA only), GPT-4 demonstrated 95.2% recall, 77.3% precision, and 85.3% F-1. No findings were "hallucinated" outright. However, 2.8% of cases included generated text about recommendations that were inferred without specific reference. The majority of True Positive AI-generated summaries required no or minor revision. CONCLUSION/CONCLUSIONS:GPT-4 demonstrates proficiency in detecting actionable, incidental findings after refined instruction prompting. AI-generated patient instructions were most often appropriate, but rarely included inferred recommendations. While this technology shows promise to augment diagnostics, active clinician oversight via "human-in-the-loop" workflows remains critical for clinical implementation.
PMID: 38778578
ISSN: 1527-974X
CID: 5654832