Evaluating Hospital Course Summarization by an Electronic Health Record-Based Large Language Model
Small, William R; Austrian, Jonathan; O'Donnell, Luke; Burk-Rafel, Jesse; Hochman, Katherine A; Goodman, Adam; Zaretsky, Jonah; Martin, Jacob; Johnson, Stephen; Major, Vincent J; Jones, Simon; Henke, Christian; Verplanke, Benjamin; Osso, Jwan; Larson, Ian; Saxena, Archana; Mednick, Aron; Simonis, Choumika; Han, Joseph; Kesari, Ravi; Wu, Xinyuan; Heery, Lauren; Desel, Tenzin; Baskharoun, Samuel; Figman, Noah; Farooq, Umar; Shah, Kunal; Jahan, Nusrat; Kim, Jeong Min; Testa, Paul; Feldman, Jonah
IMPORTANCE:Hospital course (HC) summarization represents an increasingly onerous discharge summary component for physicians. Literature supports large language models (LLMs) for HC summarization, but whether physicians can effectively partner with electronic health record-embedded LLMs to draft HCs is unknown. OBJECTIVE:To compare the editing effort required by time-constrained resident physicians to improve LLM- vs physician-generated HCs toward a novel 4Cs (complete, concise, cohesive, and confabulation-free) HC. DESIGN, SETTING, AND PARTICIPANTS:Quality improvement study using a convenience sample of 10 internal medicine resident editors, 8 hospitalist evaluators, and randomly selected general medicine admissions in December 2023 lasting 4 to 8 days at New York University Langone Health. EXPOSURES:Residents and hospitalists reviewed randomly assigned patient medical records for 10 minutes. Residents, blinded to author type, edited each HC pair (physician and LLM) for quality in 3 minutes, followed by comparative ratings by attending hospitalists. MAIN OUTCOMES AND MEASURES:Editing effort was quantified by analyzing the edits that occurred on the HC pairs after controlling for length (percentage edited) and the degree to which the original HCs' meaning was altered (semantic change). Hospitalists compared edited HC pairs with A/B testing on the 4Cs (5-point Likert scales converted to 10-point bidirectional scales). RESULTS:Among 100 admissions, compared with physician HCs, residents edited a smaller percentage of LLM HCs (LLM mean [SD], 31.5% [16.6%] vs physicians, 44.8% [20.0%]; P < .001). Additionally, LLM HCs required less semantic change (LLM mean [SD], 2.4% [1.6%] vs physicians, 4.9% [3.5%]; P < .001). Attending physicians deemed LLM HCs to be more complete (mean [SD] difference LLM vs physicians on 10-point bidirectional scale, 3.00 [5.28]; P < .001), similarly concise (mean [SD], -1.02 [6.08]; P = .20), and cohesive (mean [SD], 0.70 [6.14]; P = .60), but with more confabulations (mean [SD], -0.98 [3.53]; P = .002). The composite scores were similar (mean [SD] difference LLM vs physician on 40-point bidirectional scale, 1.70 [14.24]; P = .46). CONCLUSIONS AND RELEVANCE:Electronic health record-embedded LLM HCs required less editing than physician-generated HCs to approach a quality standard, resulting in HCs that were comparably or more complete, concise, and cohesive, but contained more confabulations. Despite the potential influence of artificial time constraints, this study supports the feasibility of a physician-LLM partnership for writing HCs and provides a basis for monitoring LLM HCs in clinical practice.
PMID: 40802185
ISSN: 2574-3805
CID: 5906762
Real-World Clinical Impact of High-Sensitivity Troponin for Chest Pain Evaluation in the Emergency Department
Martin, Jacob A; Zhang, Robert S; Rhee, Aaron J; Saxena, Archana; Akindutire, Olumide; Maqsood, M Haisum; Genes, Nicholas; Gollogly, Nathan; Smilowitz, Nathaniel R; Quinones-Camacho, Adriana
BACKGROUND:High-sensitivity cardiac troponin (hs-cTnI) assays can quantify troponin concentrations with low limits of detection, potentially expediting and enhancing myocardial infarction diagnoses. This study investigates the real-world impact of hs-cTnI implementation on operational metrics and downstream cardiac services in patients presenting to the emergency department with chest pain. METHODS AND RESULTS:[…] < 0.001) during the index encounter. CONCLUSIONS:Implementation of the hs-cTnI assay was associated with reduced hospital admissions, shorter length of stay, and decreases in most downstream cardiac testing.
PMID: 40240953
ISSN: 2047-9980
CID: 5828482
Evaluating Hospital Course Summarization by an Electronic Health Record-Based Large Language Model
Small, William R.; Austrian, Jonathan; O'Donnell, Luke; Burk-Rafel, Jesse; Hochman, Katherine A.; Goodman, Adam; Zaretsky, Jonah; Martin, Jacob; Johnson, Stephen; Major, Vincent J.; Jones, Simon; Henke, Christian; Verplanke, Benjamin; Osso, Jwan; Larson, Ian; Saxena, Archana; Mednick, Aron; Simonis, Choumika; Han, Joseph; Kesari, Ravi; Wu, Xinyuan; Heery, Lauren; Desel, Tenzin; Baskharoun, Samuel; Figman, Noah; Farooq, Umar; Shah, Kunal; Jahan, Nusrat; Kim, Jeong Min; Testa, Paul; Feldman, Jonah
ISI:001551557000002
ISSN: 2574-3805
CID: 5974192
Performance of a Protein Language Model for Variant Annotation in Cardiac Disease
Hochstadt, Aviram; Barbhaiya, Chirag; Aizer, Anthony; Bernstein, Scott; Cerrone, Marina; Garber, Leonid; Holmes, Douglas; Knotts, Robert J; Kushnir, Alex; Martin, Jacob; Park, David; Spinelli, Michael; Yang, Felix; Chinitz, Larry A; Jankelson, Lior
BACKGROUND:Genetic testing is a cornerstone in the assessment of many cardiac diseases. However, variants are frequently classified as variants of unknown significance, limiting the utility of testing. Recently, the DeepMind group (Google) developed AlphaMissense, a unique artificial intelligence-based model, based on language model principles, for the prediction of missense variant pathogenicity. We aimed to report on the performance of AlphaMissense, accessed by VarCard.io, an open web-based variant annotation engine, in a real-world cardiovascular genetics center. METHODS AND RESULTS:[…] <0.001). Genotype-phenotype concordance with VarCard.io predictions was high, at 95.9% (95% CI, 92.8-97.9). For 109 variants classified as pathogenic, likely pathogenic, benign, or likely benign by ClinVar, concordance with VarCard.io was high (90.5%). CONCLUSIONS:AlphaMissense, accessed via VarCard.io, may be a highly efficient tool for cardiac genetic variant interpretation. The engine's notable performance in assessing variants that are classified as variants of unknown significance in ClinVar demonstrates its potential to enhance cardiac genetic testing.
PMID: 39392163
ISSN: 2047-9980
CID: 5706292
Development and validation of a prediction model for actionable aspects of frailty in the text of clinicians' encounter notes
Martin, Jacob A; Crane-Droesch, Andrew; Lapite, Folasade C; Puhl, Joseph C; Kmiec, Tyler E; Silvestri, Jasmine A; Ungar, Lyle H; Kinosian, Bruce P; Himes, Blanca E; Hubbard, Rebecca A; Diamond, Joshua M; Ahya, Vivek; Sims, Michael W; Halpern, Scott D; Weissman, Gary E
OBJECTIVE:Frailty is a prevalent risk factor for adverse outcomes among patients with chronic lung disease. However, identifying frail patients who may benefit from interventions is challenging using standard data sources. We therefore sought to identify phrases in clinical notes in the electronic health record (EHR) that describe actionable frailty syndromes. MATERIALS AND METHODS:We used an active learning strategy to select notes from the EHR and annotated each sentence for 4 actionable aspects of frailty: respiratory impairment, musculoskeletal problems, fall risk, and nutritional deficiencies. We compared the performance of regression, tree-based, and neural network models to predict the labels for each sentence. We evaluated performance with the scaled Brier score (SBS), where 1 is perfect and 0 is uninformative, and the positive predictive value (PPV). RESULTS:We manually annotated 155 952 sentences from 326 patients. Elastic net regression had the best performance across all 4 frailty aspects (SBS 0.52, 95% confidence interval [CI] 0.49-0.54) followed by random forests (SBS 0.49, 95% CI 0.47-0.51), and multi-task neural networks (SBS 0.39, 95% CI 0.37-0.42). For the elastic net model, the PPV for identifying the presence of respiratory impairment was 54.8% (95% CI 53.3%-56.6%) at a sensitivity of 80%. DISCUSSION:Classification models using EHR notes can effectively identify actionable aspects of frailty among patients living with chronic lung disease. Regression performed better than random forest and neural network models. CONCLUSIONS:NLP-based models offer promising support to population health management programs that seek to identify and refer community-dwelling patients with frailty for evidence-based interventions.
PMCID: PMC8714261
PMID: 34791302
ISSN: 1527-974x
CID: 5743572