Continuous learning and improvement cycles to improve first contact provider assignments at a large academic health system
Will, John; Kothari, Ulka; Blecker, Saul B; Roncoli, Thomas; Moeller, Ben; Testa, Paul; Feldman, Jonah
BACKGROUND:Communication failures are a leading cause of sentinel events in U.S. healthcare, often due to unclear provider contact identification. The electronic health record (EHR) system offers a solution by enabling the discrete assignment of a first contact provider (FCP), who oversees and coordinates patient care. However, adoption of this practice is inconsistent across many hospital settings. This study describes the impact of continuous learning and improvement cycles to address this challenge. METHODS:Following the Plan-Do-Study-Act (PDSA) lifecycle, we completed five quality improvement cycles. Each PDSA cycle included a technological intervention accompanied by evolving operational expectations for clinical staff. We evaluated improvement after each PDSA cycle by measuring the percent of a hospitalized patient's time with an assigned FCP. RESULTS:FCP coverage significantly improved from a baseline average of 5.1% to 59.0% after PDSA Cycle 1 (p < 0.001), 67.4% after Cycle 2 (p < 0.001), 79.7% after Cycle 3 (p < 0.001), 87.5% after Cycle 4 (p < 0.001), and 99.4% after Cycle 5 (p < 0.001). CONCLUSION:Having a reliable FCP at any point during a patient's hospital admission is an important safety practice. Continuous learning and improvement cycles, driven by a strong partnership between technology and operations, led to significant and sustained improvements in FCP assignments.
PMID: 42161113
ISSN: 1872-8243
CID: 6038302
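The coverage metric in the abstract above (percent of a hospitalized patient's time with an assigned FCP) can be illustrated with a minimal sketch. The abstract does not give the exact computation, so the function name `fcp_coverage`, the interval representation, and the rule that overlapping assignments are merged are all assumptions made for illustration.

```python
from datetime import datetime


def fcp_coverage(admit, discharge, assignments):
    """Fraction of an admission during which at least one FCP was assigned.

    Hypothetical sketch: `assignments` is a list of (start, end) datetime
    pairs. Intervals are clipped to the admission window and overlapping
    intervals are merged so that no time is counted twice.
    """
    if discharge <= admit:
        raise ValueError("discharge must be after admit")

    # Clip each assignment to the admission window and drop empty intervals.
    clipped = []
    for start, end in assignments:
        s, e = max(start, admit), min(end, discharge)
        if s < e:
            clipped.append((s, e))
    clipped.sort()

    # Merge overlapping intervals and accumulate the covered seconds.
    covered = 0.0
    cur_start = cur_end = None
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            if cur_end is not None:
                covered += (cur_end - cur_start).total_seconds()
            cur_start, cur_end = s, e
        else:
            cur_end = max(cur_end, e)
    if cur_end is not None:
        covered += (cur_end - cur_start).total_seconds()

    return covered / (discharge - admit).total_seconds()


# Illustrative data only: a 24-hour admission with three assignment records.
example = fcp_coverage(
    datetime(2024, 1, 1, 0),
    datetime(2024, 1, 2, 0),
    [
        (datetime(2024, 1, 1, 6), datetime(2024, 1, 1, 12)),
        (datetime(2024, 1, 1, 10), datetime(2024, 1, 1, 18)),
        (datetime(2023, 12, 31, 20), datetime(2024, 1, 1, 2)),
    ],
)
```

Averaging this per-admission fraction over all admissions in a period would give a figure comparable in spirit to the reported coverage percentages, although the study may aggregate differently.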
Design and Rationale of the Radial Access Insertion Sheath Evaluation via the Electronic Medical Record (RAISE-EMR) Study
Shah, Binita; Lerner, Johanna; Martin, Jacob; Patrick Crane, B; Andrade, Orwell; Li, Xiyue; Xia, Yuhe; Yu, Chang; Testa, Paul A; Rao, Sunil V
BACKGROUND:Decisions involving the purchase of procedural equipment at the health system level require balancing efficacy, safety, physician preference, and cost. The application of efficient and low-cost pragmatic study designs has the potential to rapidly generate data to inform health system operations. METHODS:The aim of the pragmatic RAISE-EMR study is to determine physician preference between two commercially available radial artery introducer sheaths, one of which has a higher acquisition cost, to guide inventory selection in the hospital system's catheterization laboratories. Patients undergoing coronary angiography using 6-French radial artery access were prospectively identified and randomized through the health system's electronic medical record (EMR). Among 1696 eligible unique patients, 554 patients (32.7%) were randomized over 37 days across three hospitals. Randomization took place through the EMR after the attending interventional cardiologist signed a mandated pre-procedure note. The study was deemed non-human subject research and approved by the NYU Langone Health Quality Improvement Oversight Committee. The primary endpoint, a physician satisfaction score, will be ascertained by a mandated semi-quantitative survey within the electronic procedure note. All data, including co-variables and clinical outcomes, will be ascertained using structured data within the EMR. CONCLUSIONS:The RAISE-EMR study is designed to determine physician preference of two commercially available radial artery introducer sheaths and potentially reduce supply costs using an entirely EMR-based randomized study design. Pragmatic study designs leveraging structured data within an EMR can be used to rapidly provide data to inform operational decision-making and have implications for the future of evidence generation.
PMID: 42106091
ISSN: 1097-6744
CID: 6031762
Leveraging a Large Language Model to Generate Quality Improvement Feedback for Clinical Notes
Kim, Christopher J; Gelfinbein, Joseph; Gencerliler, Nihan; Jahan, Nusrat; Udaikumar, Jahnavi; Heery, Lauren M; Goodman, Adam; Ng, Sarah; Attard, Joel; Asha, Sharmin; Burk-Rafel, Jesse; Guzman, Benedict Vincent; Hochman, Katherine A; Testa, Paul; Feldman, Jonah
BACKGROUND:Poor documentation quality can significantly affect healthcare operations, but the feedback process for clinicians to improve clinical notes is time-consuming and often insufficient. Large language models (LLMs) such as Generative Pre-trained Transformer 4 (GPT-4) have the potential to streamline this process. OBJECTIVES:To determine whether an LLM can generate feedback to improve the medical contingency and discharge planning (MCDP) component of clinical documentation that is non-inferior to feedback by physicians. METHODS:A cross-sectional study of GPT-4 feedback and physician feedback on inpatient progress notes was conducted. A random sample of 64 inpatient progress notes identified by the validated AI Audit Tool as having a low likelihood of containing MCDP was included from adult general medicine patients hospitalized at New York University Langone Health (NYULH) in December 2023. Both the GPT-4 model and attending physicians generated feedback on these inpatient progress notes. A/B testing was then conducted on the measures of understandability, usefulness, acceptability, and impartiality. Evaluations employed 5-point Likert scales that were converted to 10-point bidirectional interval scales for interpretability, ranging from -10 (human suggestions significantly better) to +10 (GPT-4 suggestions significantly better), with a non-inferiority threshold set to -1 for the primary endpoint. RESULTS:64 inpatient progress notes were included, representing 55% female patients with a median age of 73. GPT-4 feedback was non-inferior to physician feedback in all measures: understandability (mean 1.27, 95% CI 0.73 to 1.8, P < 0.001), usefulness (mean 2.09, 95% CI 1.27 to 2.91, P < 0.001), acceptability (mean 2.07, 95% CI 1.33 to 2.81, P < 0.001), and impartiality (mean -0.20, 95% CI -0.52 to 0.12, P < 0.001). CONCLUSIONS:This study shows that an LLM can be leveraged to generate note quality feedback that is non-inferior to expert clinician feedback.
PMID: 41985489
ISSN: 1869-0327
CID: 6027922
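The non-inferiority reading of the results above can be sketched as a simple check of each reported 95% CI against the stated threshold of -1. This uses the common confidence-interval decision rule (lower bound above the margin); the abstract reports P values and does not say which test was used, so the rule and the names `MARGIN`, `reported`, and `is_non_inferior` are assumptions for illustration.

```python
# Non-inferiority margin on the bidirectional scale, as stated in the abstract.
MARGIN = -1.0

# (mean, CI lower, CI upper) for each measure, copied from the reported results.
reported = {
    "understandability": (1.27, 0.73, 1.80),
    "usefulness": (2.09, 1.27, 2.91),
    "acceptability": (2.07, 1.33, 2.81),
    "impartiality": (-0.20, -0.52, 0.12),
}


def is_non_inferior(ci_lower, margin=MARGIN):
    """Assumed decision rule: the CI lower bound must lie above the margin."""
    return ci_lower > margin


results = {name: is_non_inferior(lower) for name, (_, lower, _) in reported.items()}

for name, ok in results.items():
    print(f"{name}: {'non-inferior' if ok else 'not shown non-inferior'}")
```

Under this rule impartiality still clears the margin even though its mean is slightly negative, because its lower bound of -0.52 is above -1.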
Accurate, fair, and generalisable scaling of injury severity score-based AI with demographics in terms of mortality in patients with trauma: multi-centre, multi-national retrospective cohort study
Choi, Yunjeong; Seok, Junepill; Oh, Thomas Young-Chul; Hsu, Jeremy; Kim, Do Wan; Yu, Byungchul; Cho, Jayun; Jang, Woocheol; Kim, Jina; Oh, Na-Eun; Ahn, Jehyeuk; Femia, Robert J; Testa, Paul A; Yon, Dong Keon; Sodickson, Daniel K; Kang, Wu Seong; Lee, Jinseok
BACKGROUND:Accurate and equitable prediction of trauma-related in-hospital mortality is critical for guiding clinical decisions and optimising trauma care resources. Traditional severity scoring systems like the Injury Severity Score (ISS) do not account for demographic factors, potentially limiting their fairness and generalisability across diverse populations. METHODS:We developed and externally validated an artificial intelligence (AI) model based on ISS and integrated demographic features (age and sex) to predict in-hospital mortality after trauma. Data from the Korean Trauma Data Bank were used for model development and internal validation, comprising 121,418 patients with trauma aged ≥15 years treated at 19 trauma centres in South Korea (2017-2022). External validation was performed on an independent cohort of 7458 patients from five trauma centres (four in South Korea and one in Australia, 2022-2024). The primary outcome was trauma-related in-hospital mortality. Predictive performance was assessed using area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, accuracy, and balanced accuracy. Fairness was evaluated by comparing AUROC differences across age (<65 vs ≥65 years) and sex (female vs male) subgroups. FINDINGS:The ISS-based AI model incorporating age and sex achieved high predictive performance (internal validation AUROC, 0.934; external validation AUROC range, 0.901-0.920), outperforming conventional ISS-based methods. The model also demonstrated improved fairness, showing reduced AUROC differences across subgroups (age: 0.068 vs 0.091; sex: 0.021 vs 0.046 for AI model vs ISS, respectively). INTERPRETATION:Scaling an ISS-based AI model through demographic integration yielded accurate, fair, and generalisable predictions of trauma-related in-hospital mortality. This approach may enhance trauma care decision-making and enable more equitable resource allocation across diverse clinical settings.
FUNDING:This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2025-RS-2024-00438239) and the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (RS-2024-00509257, Global AI Frontier Lab). In addition, this research was supported by the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (RS-2025-02220492).
PMCID: 13000556
PMID: 41830825
ISSN: 2352-3964
CID: 6016242
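The fairness measure described in the abstract above (the difference in AUROC between subgroups, such as <65 vs ≥65 years) can be sketched as follows. The pairwise AUROC estimator is standard, but the function names `auroc` and `auroc_gap` and the toy data are hypothetical, and the study's own implementation is not described in the abstract.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive case outscores a
    random negative case, with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auroc_gap(labels, scores, groups, group_a, group_b):
    """Absolute AUROC difference between two subgroups (smaller suggests
    more even discrimination across the subgroups)."""
    def subgroup_auroc(g):
        idx = [i for i, x in enumerate(groups) if x == g]
        return auroc([labels[i] for i in idx], [scores[i] for i in idx])

    return abs(subgroup_auroc(group_a) - subgroup_auroc(group_b))


# Toy data for illustration only (1 = died in hospital, 0 = survived).
labels = [1, 0, 1, 0, 1, 0, 1, 0]
scores = [0.9, 0.1, 0.8, 0.2, 0.6, 0.7, 0.9, 0.1]
groups = ["<65"] * 4 + [">=65"] * 4

gap = auroc_gap(labels, scores, groups, "<65", ">=65")
```

In the reported results the same kind of gap was smaller for the AI model than for ISS alone (age: 0.068 vs 0.091; sex: 0.021 vs 0.046).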
People, process, technology: a framework for clinical informatics fellowship applicants to evaluate programs
Silberlust, Jared; Solanki, Priyanka; Austrian, Jonathan; Testa, Paul; Genes, Nicholas
OBJECTIVES:To propose a structured framework for evaluating and comparing clinical informatics fellowship programs using the People, Process, and Technology (PPT) model. MATERIALS AND METHODS:We adapted Leavitt's organizational theory to create a three-pillar framework operationalized with features relevant to fellowship applicants and directors. We then applied this framework to a random sample of 18 program websites. RESULTS:The PPT framework categorizes key fellowship characteristics into People (eg, mentorship, co-fellows, diversity), Process (eg, clinical duties, research emphasis, education), and Technology (eg, EHR systems, technical training, remote work). A visual grid illustrates variation in operational versus research focus and levels of mentorship. Website analysis revealed inconsistent transparency and detail. DISCUSSION:The PPT framework provides a systematic, accessible approach for applicants to assess fellowship fit and for programs to communicate their strengths. CONCLUSION:Standardizing fellowship descriptions using the PPT model may improve alignment between applicant goals and program offerings, enhancing both the application process and training experience.
PMCID: 12831926
PMID: 41589219
ISSN: 2574-2531
CID: 6003142
Prehospital real-time AI for trauma mortality prediction: a multi-institutional and multi-national validation study
Oh, Na-Eun; Oh, Thomas Young-Chul; Hsu, Jeremy; Kim, Do Wan; Yu, Byungchul; Cho, Jayun; Seok, Junepill; Lee, Jin Young; Jang, Woocheol; Kim, Jina; Femia, Robert J; Testa, Paul A; Yon, Dong Keon; Sodickson, Daniel K; Kang, Wu Seong; Lee, Jinseok
Early identification of high-risk trauma patients in the prehospital setting is crucial for optimizing resource allocation and improving survival. We developed and externally validated a real-time AI model predicting emergency room mortality using 21 prehospital variables. Model development and internal validation utilized the Korean Trauma Data Bank (KTDB; 204,189 patients), and external validation included four South Korean trauma centers (8,358 patients) and one Australian Level 1 center (3,578 patients). Our Prehospital-AI model, an ensemble of XGBoost, LightGBM, and random forest, achieved an AUROC of 0.923 (sensitivity: 0.780, specificity: 0.880) on the test set, outperforming the shock index (AUROC: 0.712). External validation yielded AUROCs of 0.925-0.956 across South Korean centers and 0.895 in the Australian center. Here we show that the Prehospital-AI model enables accurate, real-time risk assessment in the prehospital setting, outperforming traditional triage tools and improving trauma system efficiency. Nonetheless, additional multinational studies are warranted to further evaluate its generalizability across diverse trauma care systems.
PMID: 41501064
ISSN: 2041-1723
CID: 5981072
Evaluating Hospital Course Summarization by an Electronic Health Record-Based Large Language Model
Small, William R; Austrian, Jonathan; O'Donnell, Luke; Burk-Rafel, Jesse; Hochman, Katherine A; Goodman, Adam; Zaretsky, Jonah; Martin, Jacob; Johnson, Stephen; Major, Vincent J; Jones, Simon; Henke, Christian; Verplanke, Benjamin; Osso, Jwan; Larson, Ian; Saxena, Archana; Mednick, Aron; Simonis, Choumika; Han, Joseph; Kesari, Ravi; Wu, Xinyuan; Heery, Lauren; Desel, Tenzin; Baskharoun, Samuel; Figman, Noah; Farooq, Umar; Shah, Kunal; Jahan, Nusrat; Kim, Jeong Min; Testa, Paul; Feldman, Jonah
ISI: 001551557000002
ISSN: 2574-3805
CID: 5974192
Leveraging Machine Learning and Robotic Process Automation to Identify and Convert Unstructured Colonoscopy Results Into Actionable Data: Proof-of-Concept Study
Stevens, Elizabeth R; Hartman, Jager; Testa, Paul; Mansukhani, Ajay; Monina, Casey; Shunk, Amelia; Ranson, David; Imberg, Yana; Cote, Ann; Prabhu, Dinesha; Szerencsy, Adam
BACKGROUND:With rising patient volumes and a focus on quality, our health system had the objective to create a more efficient way to ensure accurate documentation of colorectal cancer (CRC) screening intervals from inbound colonoscopy reports to ensure timely follow-up. We developed an integrated end-to-end workflow solution using machine learning (ML) and robotic process automation (RPA) to extract and update electronic health record (EHR) follow-up dates from unstructured data. OBJECTIVE:This study aimed to automate data extraction from external, free-text colonoscopy reports to identify and document recommended follow-up dates for CRC screening in structured EHR fields. METHODS:As proof of concept, we outline the process development, validity, and implementation of an approach that integrates available tools to automate data retrieval and entry within the EHR of a large academic health system. The health system uses Epic Systems as its EHR platform, and the ML model used was trained on health system patient colonoscopy reports. This proof-of-concept process study consisted of six stages: (1) identification of gaps in documenting recommendations for follow-up CRC screening from external colonoscopy reports, (2) defining process objectives, (3) identification of technologies, (4) creation of process architecture, (5) process validation, and (6) health system-wide implementation. A chart review was performed to validate process outcomes and estimate impact. RESULTS:We developed an automated process with 3 primary steps leveraging ML and RPA to create a fully orchestrated workflow to update CRC screening recall dates based on colonoscopy reports received from external sources. Process validity was assessed with 690 scanned colonoscopy reports.
During process validation, the overall automated process achieved an accuracy of 80.7% (557/690, 95% CI 77.8%-83.7%) for correctly identifying the presence or absence of a valid follow-up date and a follow-up date false negative identification rate of 32.9% (130/395, 95% CI 29.4%-36.4%). From the organization-wide go-live until December 31, 2024, the system processed 16,563 external colonoscopy reports. Of these, 35.3% (5841/16,563) had a follow-up date meeting the relevant ML model threshold and thus were identified as ready for RPA processing. CONCLUSIONS:Implementation of an automated workflow to extract and update CRC screening follow-up dates from colonoscopy reports is feasible and has the potential to improve accuracy in patient recall while reducing documentation burden. By standardizing data ingestion, extending this approach to various unstructured data types can address deficiencies in structured EHR documentation and solve for a lack of data integration and reporting for quality measures. Automated workflows leveraging ML and RPA offer practical solutions to overcome interoperability challenges and the use of unstructured data within health care systems.
PMCID: 12634012
PMID: 41264858
ISSN: 2291-9694
CID: 5969362
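The validation accuracy reported above (557 of 690 reports, 80.7%) and its 95% CI can be reproduced with a minimal sketch. The abstract does not name the interval method; a normal-approximation (Wald) interval is assumed here because it matches the reported 77.8%-83.7%. The function name `wald_ci` is hypothetical.

```python
import math


def wald_ci(successes, n, z=1.96):
    """Proportion with a normal-approximation (Wald) confidence interval.

    Assumed method: the source does not state how its interval was computed.
    Returns (proportion, lower bound, upper bound) as fractions.
    """
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, p - half_width, p + half_width


# Counts taken from the reported process validation.
accuracy, lower, upper = wald_ci(557, 690)
print(f"accuracy {accuracy:.1%} (95% CI {lower:.1%}-{upper:.1%})")
```

For proportions close to 0 or 1, or for small samples, a Wilson interval is usually preferred over the Wald interval.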
Evaluating Hospital Course Summarization by an Electronic Health Record-Based Large Language Model
Small, William R; Austrian, Jonathan; O'Donnell, Luke; Burk-Rafel, Jesse; Hochman, Katherine A; Goodman, Adam; Zaretsky, Jonah; Martin, Jacob; Johnson, Stephen; Major, Vincent J; Jones, Simon; Henke, Christian; Verplanke, Benjamin; Osso, Jwan; Larson, Ian; Saxena, Archana; Mednick, Aron; Simonis, Choumika; Han, Joseph; Kesari, Ravi; Wu, Xinyuan; Heery, Lauren; Desel, Tenzin; Baskharoun, Samuel; Figman, Noah; Farooq, Umar; Shah, Kunal; Jahan, Nusrat; Kim, Jeong Min; Testa, Paul; Feldman, Jonah
IMPORTANCE:Hospital course (HC) summarization represents an increasingly onerous discharge summary component for physicians. Literature supports large language models (LLMs) for HC summarization, but whether physicians can effectively partner with electronic health record-embedded LLMs to draft HCs is unknown. OBJECTIVES:To compare the editing effort required by time-constrained resident physicians to improve LLM- vs physician-generated HCs toward a novel 4Cs (complete, concise, cohesive, and confabulation-free) HC. DESIGN, SETTING, AND PARTICIPANTS:Quality improvement study using a convenience sample of 10 internal medicine resident editors, 8 hospitalist evaluators, and randomly selected general medicine admissions in December 2023 lasting 4 to 8 days at New York University Langone Health. EXPOSURES:Residents and hospitalists reviewed randomly assigned patient medical records for 10 minutes. Residents blinded to author type edited each HC pair (physician and LLM) for quality in 3 minutes, followed by comparative ratings by attending hospitalists. MAIN OUTCOMES AND MEASURES:Editing effort was quantified by analyzing the edits that occurred on the HC pairs after controlling for length (percentage edited) and the degree to which the original HCs' meaning was altered (semantic change). Hospitalists compared edited HC pairs with A/B testing on the 4Cs (5-point Likert scales converted to 10-point bidirectional scales). RESULTS:Among 100 admissions, compared with physician HCs, residents edited a smaller percentage of LLM HCs (LLM mean [SD], 31.5% [16.6%] vs physicians, 44.8% [20.0%]; P < .001). Additionally, LLM HCs required less semantic change (LLM mean [SD], 2.4% [1.6%] vs physicians, 4.9% [3.5%]; P < .001).
Attending physicians deemed LLM HCs to be more complete (mean [SD] difference LLM vs physicians on 10-point bidirectional scale, 3.00 [5.28]; P < .001), similarly concise (mean [SD], -1.02 [6.08]; P = .20), and cohesive (mean [SD], 0.70 [6.14]; P = .60), but with more confabulations (mean [SD], -0.98 [3.53]; P = .002). The composite scores were similar (mean [SD] difference LLM vs physician on 40-point bidirectional scale, 1.70 [14.24]; P = .46). CONCLUSIONS AND RELEVANCE:Electronic health record-embedded LLM HCs required less editing than physician-generated HCs to approach a quality standard, resulting in HCs that were comparably or more complete, concise, and cohesive, but contained more confabulations. Despite the potential influence of artificial time constraints, this study supports the feasibility of a physician-LLM partnership for writing HCs and provides a basis for monitoring LLM HCs in clinical practice.
PMID: 40802185
ISSN: 2574-3805
CID: 5906762
Disappearing Text as a Clinical Decision Support Layer: A Case Series
Silberlust, Jared; Small, William; Shah, Darshi; Chakravartty, Eesha; Moawad, Katherine; Moawad, Andrew; Testa, Paul; Feldman, Jonah
OBJECTIVES:This case series aims to evaluate several applications of inline disappearing text (DT) clinical decision support (CDS) tools within clinician documentation. METHODS:DT blocks were created to prompt documentation for perioperative anticoagulation planning (Scenario 1), pre-discharge intravenous antibiotic planning (Scenario 2), and advanced care planning (Scenario 3). In Scenario 1, DT was the only intervention. In Scenario 2, DT was paired with a documentation SmartList. In Scenario 3, DT was paired with a documentation SmartList and an OurPractice Advisory. The number of documented perioperative anticoagulation plans, pre-discharge intravenous antibiotic plans, and Advanced Care Planning notes were measured pre- and post-intervention and compared using Chi-square analyses. RESULTS:In Scenario 1, there was no statistically significant change in the percentage of perioperative anticoagulation plans documented at 0-24 and 24-48 hours before surgery. In Scenario 2, documentation of antibiotic contingency planning in patients expected to be discharged within 24 hours increased from 60% (54 of 90 notes) to 93% (1,850 of 1,994 notes), χ²(1, N=2,084) = 113.1, p < 0.001. In Scenario 3, ACP note documentation by discharge in patients with a positive mandatory surprise question increased from 43% (821 of 1,909 encounters) to 52% (975 of 1,874 encounters), χ²(1, N=3,783) = 30.5, p < 0.001. CONCLUSIONS:Utilizing DT in conjunction with other forms of CDS was associated with an improvement of documentation quality in pre-discharge IV antibiotics and advanced care planning. A sociotechnical analysis explores how interactions between technology, people, workflow, and culture could contextualize how utilizing DT with other forms of CDS was more effective than DT alone.
PMID: 40763805
ISSN: 1869-0327
CID: 5905032
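The chi-square statistics reported above for Scenarios 2 and 3 can be recomputed from the stated counts. The sketch below assumes a 2x2 test with Yates' continuity correction; the abstract says only "Chi-square analyses", but the corrected statistic reproduces the reported 113.1 and 30.5, whereas the uncorrected statistic gives roughly 117.3 and 30.9. The function name `chi2_yates` is hypothetical.

```python
def chi2_yates(a, b, c, d):
    """Chi-square statistic (1 degree of freedom) for the 2x2 table
    [[a, b], [c, d]] with Yates' continuity correction.

    Assumption: the source does not state whether a correction was applied.
    """
    n = a + b + c + d
    numerator = n * max(abs(a * d - b * c) - n / 2, 0.0) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Rows are pre- and post-intervention; columns are documented / not documented.
scenario_2 = chi2_yates(54, 90 - 54, 1850, 1994 - 1850)
scenario_3 = chi2_yates(821, 1909 - 821, 975, 1874 - 975)

print(f"Scenario 2: chi2 = {scenario_2:.1f}")
print(f"Scenario 3: chi2 = {scenario_3:.1f}")
```

The large difference in group sizes in Scenario 2 (90 vs 1,994 notes) is worth keeping in mind when reading the effect size alongside the test statistic.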