Searched for:

in-biosketch:yes

person:aphiny01

Total Results:

104


Generative Artificial Intelligence to Transform Inpatient Discharge Summaries to Patient-Friendly Language and Format

Zaretsky, Jonah; Kim, Jeong Min; Baskharoun, Samuel; Zhao, Yunan; Austrian, Jonathan; Aphinyanaphongs, Yindalon; Gupta, Ravi; Blecker, Saul B; Feldman, Jonah
IMPORTANCE:By law, patients have immediate access to discharge notes in their medical records. Technical language and abbreviations make notes difficult to read and understand for a typical patient. Large language models (LLMs [eg, GPT-4]) have the potential to transform these notes into patient-friendly language and format. OBJECTIVE:To determine whether an LLM can transform discharge summaries into a format that is more readable and understandable. DESIGN, SETTING, AND PARTICIPANTS:This cross-sectional study evaluated a sample of the discharge summaries of adult patients discharged from the General Internal Medicine service at NYU (New York University) Langone Health from June 1 to 30, 2023. Patients discharged as deceased were excluded. All discharge summaries were processed by the LLM between July 26 and August 5, 2023. INTERVENTIONS:A secure Health Insurance Portability and Accountability Act-compliant platform, Microsoft Azure OpenAI, was used to transform these discharge summaries into a patient-friendly format between July 26 and August 5, 2023. MAIN OUTCOMES AND MEASURES:Outcomes included readability as measured by Flesch-Kincaid Grade Level and understandability using Patient Education Materials Assessment Tool (PEMAT) scores. Readability and understandability of the original discharge summaries were compared with the transformed, patient-friendly discharge summaries created through the LLM. As balancing metrics, accuracy and completeness of the patient-friendly version were measured. RESULTS:Discharge summaries of 50 patients (31 female [62.0%] and 19 male [38.0%]) were included. The median patient age was 65.5 (IQR, 59.0-77.5) years. Mean (SD) Flesch-Kincaid Grade Level was significantly lower in the patient-friendly discharge summaries (6.2 [0.5] vs 11.0 [1.5]; P < .001). PEMAT understandability scores were significantly higher for patient-friendly discharge summaries (81% vs 13%; P < .001). Two physicians reviewed each patient-friendly discharge summary for accuracy on a 6-point scale, with 54 of 100 reviews (54.0%) giving the best possible rating of 6. Summaries were rated entirely complete in 56 reviews (56.0%). Eighteen reviews noted safety concerns, mostly involving omissions, but also several inaccurate statements (termed hallucinations). CONCLUSIONS AND RELEVANCE:The findings of this cross-sectional study of 50 discharge summaries suggest that LLMs can be used to translate discharge summaries into patient-friendly language and formats that are significantly more readable and understandable than discharge summaries as they appear in electronic health records. However, implementation will require improvements in accuracy, completeness, and safety. Given the safety concerns, initial implementation will require physician review.
PMID: 38466307
ISSN: 2574-3805
CID: 5678332
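
The readability outcome in the abstract above, Flesch-Kincaid Grade Level, is a fixed formula over word, sentence, and syllable counts. A minimal sketch of that formula follows; the function name and the example counts are illustrative assumptions, and the abstract does not say which tool the authors used to compute their scores.

```python
def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level from raw counts.

    Standard formula:
        0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    The function name and signature are hypothetical; counting words,
    sentences, and syllables is left to the caller.
    """
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Illustrative counts only (not taken from the study):
# 100 words in 10 sentences with 150 syllables.
example_grade = round(flesch_kincaid_grade(100, 10, 150), 2)
print(example_grade)
```

Shorter sentences and fewer syllables per word both lower the grade level, which is the direction of change the study reports for the transformed summaries.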

QTNet: Predicting Drug-Induced QT Prolongation With Artificial Intelligence-Enabled Electrocardiograms

Zhang, Hao; Tarabanis, Constantine; Jethani, Neil; Goldstein, Mark; Smith, Silas; Chinitz, Larry; Ranganath, Rajesh; Aphinyanaphongs, Yindalon; Jankelson, Lior
BACKGROUND:Prediction of drug-induced long QT syndrome (diLQTS) is of critical importance given its association with torsades de pointes. There is no reliable method for the outpatient prediction of diLQTS. OBJECTIVES:This study sought to evaluate the use of a convolutional neural network (CNN) applied to electrocardiograms (ECGs) to predict diLQTS in an outpatient population. METHODS:We identified all adult outpatients newly prescribed a QT-prolonging medication between January 1, 2003, and March 31, 2022, who had a 12-lead sinus ECG in the preceding 6 months. Using risk factor data and the ECG signal as inputs, the CNN QTNet was implemented in TensorFlow to predict diLQTS. RESULTS:Models were evaluated in a held-out test dataset of 44,386 patients (57% female) with a median age of 62 years. Compared with 3 other models relying on risk factors or ECG signal or baseline QTc alone, QTNet achieved the best (P < 0.001) performance with a mean area under the curve of 0.802 (95% CI: 0.786-0.818). In a survival analysis, QTNet also had the highest inverse probability of censorship-weighted area under the receiver-operating characteristic curve at day 2 (0.875; 95% CI: 0.848-0.904) and up to 6 months. In a subgroup analysis, QTNet performed best among males and patients ≤50 years or with baseline QTc <450 ms. In an external validation cohort of solely suburban outpatient practices, QTNet similarly maintained the highest predictive performance. CONCLUSIONS:An ECG-based CNN can accurately predict diLQTS in the outpatient setting while maintaining its predictive performance over time. In the outpatient setting, our model could identify higher-risk individuals who would benefit from closer monitoring.
PMID: 38703162
ISSN: 2405-5018
CID: 5658252

Evaluation of GPT-4 ability to identify and generate patient instructions for actionable incidental radiology findings

Woo, Kar-Mun C; Simon, Gregory W; Akindutire, Olumide; Aphinyanaphongs, Yindalon; Austrian, Jonathan S; Kim, Jung G; Genes, Nicholas; Goldenring, Jacob A; Major, Vincent J; Pariente, Chloé S; Pineda, Edwin G; Kang, Stella K
OBJECTIVES:To evaluate the proficiency of a HIPAA-compliant version of GPT-4 in identifying actionable, incidental findings from unstructured radiology reports of Emergency Department patients. To assess appropriateness of artificial intelligence (AI)-generated, patient-facing summaries of these findings. MATERIALS AND METHODS:Radiology reports extracted from the electronic health record of a large academic medical center were manually reviewed to identify non-emergent, incidental findings with high likelihood of requiring follow-up, further sub-stratified as "definitely actionable" (DA) or "possibly actionable-clinical correlation" (PA-CC). Instruction prompts to GPT-4 were developed and iteratively optimized using a validation set of 50 reports. The optimized prompt was then applied to a test set of 430 unseen reports. GPT-4 performance was primarily graded on accuracy identifying either DA or PA-CC findings, then secondarily for DA findings alone. Outputs were reviewed for hallucinations. AI-generated patient-facing summaries were assessed for appropriateness via Likert scale. RESULTS:For the primary outcome (DA or PA-CC), GPT-4 achieved 99.3% recall, 73.6% precision, and 84.5% F-1. For the secondary outcome (DA only), GPT-4 demonstrated 95.2% recall, 77.3% precision, and 85.3% F-1. No findings were "hallucinated" outright. However, 2.8% of cases included generated text about recommendations that were inferred without specific reference. The majority of True Positive AI-generated summaries required no or minor revision. CONCLUSIONS:GPT-4 demonstrates proficiency in detecting actionable, incidental findings after refined instruction prompting. AI-generated patient instructions were most often appropriate, but rarely included inferred recommendations. While this technology shows promise to augment diagnostics, active clinician oversight via "human-in-the-loop" workflows remains critical for clinical implementation.
PMID: 38778578
ISSN: 1527-974x
CID: 5654832
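
The F-1 values in the abstract above are the harmonic mean of precision and recall. A short sketch of that arithmetic, using the rounded percentages from the abstract as inputs, is given here; the function name is an illustrative assumption, and the study itself presumably computed F-1 from raw counts rather than from rounded rates.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Primary outcome (DA or PA-CC): 73.6% precision, 99.3% recall.
primary_f1 = round(100 * f1_score(0.736, 0.993), 1)

# Secondary outcome (DA only): 77.3% precision, 95.2% recall.
secondary_f1 = round(100 * f1_score(0.773, 0.952), 1)

print(primary_f1, secondary_f1)  # 84.5 85.3
```

Because the harmonic mean is pulled toward the smaller of its two inputs, the near-perfect recall does not mask the lower precision in either outcome.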

Development and external validation of a dynamic risk score for early prediction of cardiogenic shock in cardiac intensive care units using machine learning

Hu, Yuxuan; Lui, Albert; Goldstein, Mark; Sudarshan, Mukund; Tinsay, Andrea; Tsui, Cindy; Maidman, Samuel D; Medamana, John; Jethani, Neil; Puli, Aahlad; Nguy, Vuthy; Aphinyanaphongs, Yindalon; Kiefer, Nicholas; Smilowitz, Nathaniel R; Horowitz, James; Ahuja, Tania; Fishman, Glenn I; Hochman, Judith; Katz, Stuart; Bernard, Samuel; Ranganath, Rajesh
BACKGROUND:Myocardial infarction and heart failure are major cardiovascular diseases that affect millions of people in the US, with morbidity and mortality being highest among patients who develop cardiogenic shock. Early recognition of cardiogenic shock allows prompt implementation of treatment measures. Our objective is to develop a new dynamic risk score, called CShock, to improve early detection of cardiogenic shock in the cardiac intensive care unit (ICU). METHODS:We developed and externally validated a deep learning-based risk stratification tool, called CShock, for patients admitted into the cardiac ICU with acute decompensated heart failure and/or myocardial infarction to predict onset of cardiogenic shock. We prepared a cardiac ICU dataset using the MIMIC-III database by annotating it with physician-adjudicated outcomes. This dataset, which consisted of 1500 patients, 204 of whom had cardiogenic/mixed shock, was then used to train CShock. The features used to train the model for CShock included patient demographics, cardiac ICU admission diagnoses, routinely measured laboratory values and vital signs, and relevant features manually extracted from echocardiogram and left heart catheterization reports. We externally validated the risk model on the New York University (NYU) Langone Health cardiac ICU database, which was also annotated with physician-adjudicated outcomes. The external validation cohort consisted of 131 patients, with 25 patients experiencing cardiogenic/mixed shock. RESULTS:CShock achieved an area under the receiver operator characteristic curve (AUROC) of 0.821 (95% CI 0.792-0.850). CShock was externally validated in the more contemporary NYU cohort and achieved an AUROC of 0.800 (95% CI 0.717-0.884), demonstrating its generalizability in other cardiac ICUs. Having an elevated heart rate is most predictive of cardiogenic shock development based on Shapley values. The other top ten predictors are having an admission diagnosis of myocardial infarction with ST-segment elevation, having an admission diagnosis of acute decompensated heart failure, Braden Scale, Glasgow Coma Scale, blood urea nitrogen, systolic blood pressure, serum chloride, serum sodium, and arterial blood pH. CONCLUSIONS:The novel CShock score has the potential to provide automated detection and early warning for cardiogenic shock and improve the outcomes for the millions of patients who suffer from myocardial infarction and heart failure.
PMID: 38518758
ISSN: 2048-8734
CID: 5640892

NATURE

Jiang, Lavender Yao; Liu, Xujin Chris; Nejatian, Nima Pour; Nasir-Moin, Mustafa; Wang, Duo; Abidin, Anas; Eaton, Kevin; Riina, Howard Antony; Laufer, Ilya; Punjabi, Paawan; Miceli, Madeline; Kim, Nora C.; Orillac, Cordelia; Schnurman, Zane; Livia, Christopher; Weiss, Hannah; Kurland, David; Neifert, Sean; Dastagirzada, Yosef; Kondziolka, Douglas; Cheung, Alexander T. M.; Yang, Grace; Cao, Ming; Flores, Mona; Costa, Anthony B.; Aphinyanaphongs, Yindalon; Cho, Kyunghyun; Oermann, Eric Karl
ISI:001005804900017
ISSN: 0028-0836
CID: 5883642

Ten Years of Health Informatics Education for Physicians

Chapter by: Major, Vincent J.; Plottel, Claudia S.; Aphinyanaphongs, Yindalon
in: Proceedings - 2023 IEEE 11th International Conference on Healthcare Informatics, ICHI 2023
[S.l.] : Institute of Electrical and Electronics Engineers Inc., 2023
pp. 637-644
ISBN: 9798350302639
CID: 5630952

Enabling AI-Augmented Clinical Workflows by Accessing Patient Data in Real-Time with FHIR

Chapter by: Major, Vincent J.; Wang, Walter; Aphinyanaphongs, Yindalon
in: Proceedings - 2023 IEEE 11th International Conference on Healthcare Informatics, ICHI 2023
[S.l.] : Institute of Electrical and Electronics Engineers Inc., 2023
pp. 531-533
ISBN: 9798350302639
CID: 5630942

Marketing and US Food and Drug Administration Clearance of Artificial Intelligence and Machine Learning Enabled Software in and as Medical Devices: A Systematic Review

Clark, Phoebe; Kim, Jayne; Aphinyanaphongs, Yindalon
IMPORTANCE:The marketing of health care devices enabled for use with artificial intelligence (AI) or machine learning (ML) is regulated in the US by the US Food and Drug Administration (FDA), which is responsible for approving and regulating medical devices. Currently, there are no uniform guidelines set by the FDA to regulate AI- or ML-enabled medical devices, and discrepancies between FDA-approved indications for use and device marketing require articulation. OBJECTIVE:To explore any discrepancy between marketing and 510(k) clearance of AI- or ML-enabled medical devices. EVIDENCE REVIEW:This systematic review was a manually conducted survey of 510(k) approval summaries and accompanying marketing materials of devices approved between November 2021 and March 2022, conducted between March and November 2022, following the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) reporting guideline. Analysis focused on the prevalence of discrepancies between marketing and certification material for AI/ML enabled medical devices. FINDINGS:A total of 119 FDA 510(k) clearance summaries were analyzed in tandem with their respective marketing materials. The devices were taxonomized into 3 individual categories of adherent, contentious, and discrepant devices. A total of 15 devices (12.61%) were considered discrepant, 8 devices (6.72%) were considered contentious, and 96 devices (80.67%) were consistent between marketing and FDA 510(k) clearance summaries. Most devices were from the radiological approval committees (75 devices [63.03%]), with 62 of these devices (82.67%) adherent, 3 (4.00%) contentious, and 10 (13.33%) discrepant; followed by the cardiovascular device approval committee (23 devices [19.33%]), with 19 of these devices (82.61%) considered adherent, 2 contentious (8.70%) and 2 discrepant (8.70%). The difference between these 3 categories in cardiovascular and radiological devices was statistically significant (P < .001). CONCLUSIONS AND RELEVANCE:In this systematic review, low adherence rates within committees were observed most often in committees with few AI- or ML-enabled devices, and discrepancies between clearance documentation and marketing material were present in one-fifth of devices surveyed.
PMID: 37405771
ISSN: 2574-3805
CID: 5536832

Health system-scale language models are all-purpose prediction engines

Jiang, Lavender Yao; Liu, Xujin Chris; Nejatian, Nima Pour; Nasir-Moin, Mustafa; Wang, Duo; Abidin, Anas; Eaton, Kevin; Riina, Howard Antony; Laufer, Ilya; Punjabi, Paawan; Miceli, Madeline; Kim, Nora C; Orillac, Cordelia; Schnurman, Zane; Livia, Christopher; Weiss, Hannah; Kurland, David; Neifert, Sean; Dastagirzada, Yosef; Kondziolka, Douglas; Cheung, Alexander T M; Yang, Grace; Cao, Ming; Flores, Mona; Costa, Anthony B; Aphinyanaphongs, Yindalon; Cho, Kyunghyun; Oermann, Eric Karl
Physicians make critical time-constrained decisions every day. Clinical predictive models can help physicians and administrators make decisions by forecasting clinical and operational events. Existing structured data-based clinical predictive models have limited use in everyday practice owing to complexity in data processing, as well as model development and deployment [1-3]. Here we show that unstructured clinical notes from the electronic health record can enable the training of clinical language models, which can be used as all-purpose clinical predictive engines with low-resistance development and deployment. Our approach leverages recent advances in natural language processing [4,5] to train a large language model for medical language (NYUTron) and subsequently fine-tune it across a wide range of clinical and operational predictive tasks. We evaluated our approach within our health system for five such tasks: 30-day all-cause readmission prediction, in-hospital mortality prediction, comorbidity index prediction, length of stay prediction, and insurance denial prediction. We show that NYUTron has an area under the curve (AUC) of 78.7-94.9%, with an improvement of 5.36-14.7% in the AUC compared with traditional models. We additionally demonstrate the benefits of pretraining with clinical text, the potential for increasing generalizability to different sites through fine-tuning and the full deployment of our system in a prospective, single-arm trial. These results show the potential for using clinical language models in medicine to read alongside physicians and provide guidance at the point of care.
PMCID:10338337
PMID: 37286606
ISSN: 1476-4687
CID: 5536672

Methods and Impact for Using Federated Learning to Collaborate on Clinical Research

Cheung, Alexander T M; Nasir-Moin, Mustafa; Fred Kwon, Young Joon; Guan, Jiahui; Liu, Chris; Jiang, Lavender; Raimondo, Christian; Chotai, Silky; Chambless, Lola; Ahmad, Hasan S; Chauhan, Daksh; Yoon, Jang W; Hollon, Todd; Buch, Vivek; Kondziolka, Douglas; Chen, Dinah; Al-Aswad, Lama A; Aphinyanaphongs, Yindalon; Oermann, Eric Karl
BACKGROUND:The development of accurate machine learning algorithms requires sufficient quantities of diverse data. This poses a challenge in health care because of the sensitive and siloed nature of biomedical information. Decentralized algorithms through federated learning (FL) avoid data aggregation by instead distributing algorithms to the data before centrally updating one global model. OBJECTIVE:To establish a multicenter collaboration and assess the feasibility of using FL to train machine learning models for intracranial hemorrhage (ICH) detection without sharing data between sites. METHODS:Five neurosurgery departments across the United States collaborated to establish a federated network and train a convolutional neural network to detect ICH on computed tomography scans. The global FL model was benchmarked against a standard, centrally trained model using a held-out data set and was compared against locally trained models using site data. RESULTS:A federated network of practicing neurosurgeon scientists was successfully initiated to train a model for predicting ICH. The FL model achieved an area under the ROC curve of 0.9487 (95% CI 0.9471-0.9503) when predicting all subtypes of ICH compared with a benchmark (non-FL) area under the ROC curve of 0.9753 (95% CI 0.9742-0.9764), although performance varied by subtype. The FL model consistently achieved top three performance when validated on any site's data, suggesting improved generalizability. A qualitative survey described the experience of participants in the federated network. CONCLUSIONS:This study demonstrates the feasibility of implementing a federated network for multi-institutional collaboration among clinicians and using FL to conduct machine learning research, thereby opening a new paradigm for neurosurgical collaboration.
PMID: 36399428
ISSN: 1524-4040
CID: 5385002
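
The abstract above describes federated learning as sending the algorithm to each site's data and then centrally updating one global model. One common way to perform that central update is federated averaging, where each site's parameters are weighted by its number of training examples. The sketch below illustrates only that aggregation step; the abstract does not state which aggregation rule the study used, so federated averaging, the function name, and the toy numbers are all assumptions.

```python
from typing import List, Sequence


def federated_average(
    site_params: Sequence[Sequence[float]],
    site_sizes: Sequence[int],
) -> List[float]:
    """Combine per-site parameter vectors into one global vector.

    Each site's parameters are weighted by its local sample count.
    Hypothetical helper: real systems average full model tensors and
    repeat this step over many communication rounds.
    """
    if not site_params or len(site_params) != len(site_sizes):
        raise ValueError("need one sample count per site")
    n_params = len(site_params[0])
    if any(len(p) != n_params for p in site_params):
        raise ValueError("all sites must share the same parameter shape")
    total = sum(site_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    return [
        sum(p[i] * n for p, n in zip(site_params, site_sizes)) / total
        for i in range(n_params)
    ]


# Toy example: two sites, two parameters each; site B has 3x the data.
global_params = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
print(global_params)  # [2.5, 3.5]
```

Only parameter vectors and sample counts cross site boundaries in this step, which is what lets the sites collaborate without sharing patient data.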