NYUHSL Faculty Bibliography

Searched for:

in-biosketch:yes

person:aphiny01

Total Results:

Nature. 2023:619(7969):357-362.DOI: 10.1038/s41586-023-06160-y

Health system-scale language models are all-purpose prediction engines

Jiang, Lavender Yao; Liu, Xujin Chris; Nejatian, Nima Pour; Nasir-Moin, Mustafa; Wang, Duo; Abidin, Anas; Eaton, Kevin; Riina, Howard Antony; Laufer, Ilya; Punjabi, Paawan; Miceli, Madeline; Kim, Nora C; Orillac, Cordelia; Schnurman, Zane; Livia, Christopher; Weiss, Hannah; Kurland, David; Neifert, Sean; Dastagirzada, Yosef; Kondziolka, Douglas; Cheung, Alexander T M; Yang, Grace; Cao, Ming; Flores, Mona; Costa, Anthony B; Aphinyanaphongs, Yindalon; Cho, Kyunghyun; Oermann, Eric Karl

Physicians make critical time-constrained decisions every day. Clinical predictive models can help physicians and administrators make decisions by forecasting clinical and operational events. Existing structured data-based clinical predictive models have limited use in everyday practice owing to complexity in data processing, as well as model development and deployment^1-3. Here we show that unstructured clinical notes from the electronic health record can enable the training of clinical language models, which can be used as all-purpose clinical predictive engines with low-resistance development and deployment. Our approach leverages recent advances in natural language processing^4,5 to train a large language model for medical language (NYUTron) and subsequently fine-tune it across a wide range of clinical and operational predictive tasks. We evaluated our approach within our health system for five such tasks: 30-day all-cause readmission prediction, in-hospital mortality prediction, comorbidity index prediction, length of stay prediction, and insurance denial prediction. We show that NYUTron has an area under the curve (AUC) of 78.7-94.9%, with an improvement of 5.36-14.7% in the AUC compared with traditional models. We additionally demonstrate the benefits of pretraining with clinical text, the potential for increasing generalizability to different sites through fine-tuning and the full deployment of our system in a prospective, single-arm trial. These results show the potential for using clinical language models in medicine to read alongside physicians and provide guidance at the point of care.

PMCID:10338337

PMID: 37286606

ISSN: 1476-4687

CID: 5536672

Neurosurgery. 2023:92(2):431-438.DOI: 10.1227/neu.0000000000002198

Methods and Impact for Using Federated Learning to Collaborate on Clinical Research

Cheung, Alexander T M; Nasir-Moin, Mustafa; Fred Kwon, Young Joon; Guan, Jiahui; Liu, Chris; Jiang, Lavender; Raimondo, Christian; Chotai, Silky; Chambless, Lola; Ahmad, Hasan S; Chauhan, Daksh; Yoon, Jang W; Hollon, Todd; Buch, Vivek; Kondziolka, Douglas; Chen, Dinah; Al-Aswad, Lama A; Aphinyanaphongs, Yindalon; Oermann, Eric Karl

BACKGROUND: The development of accurate machine learning algorithms requires sufficient quantities of diverse data. This poses a challenge in health care because of the sensitive and siloed nature of biomedical information. Decentralized algorithms through federated learning (FL) avoid data aggregation by instead distributing algorithms to the data before centrally updating one global model. OBJECTIVE: To establish a multicenter collaboration and assess the feasibility of using FL to train machine learning models for intracranial hemorrhage (ICH) detection without sharing data between sites. METHODS: Five neurosurgery departments across the United States collaborated to establish a federated network and train a convolutional neural network to detect ICH on computed tomography scans. The global FL model was benchmarked against a standard, centrally trained model using a held-out data set and was compared against locally trained models using site data. RESULTS: A federated network of practicing neurosurgeon scientists was successfully initiated to train a model for predicting ICH. The FL model achieved an area under the ROC curve of 0.9487 (95% CI 0.9471-0.9503) when predicting all subtypes of ICH, compared with a benchmark (non-FL) area under the ROC curve of 0.9753 (95% CI 0.9742-0.9764), although performance varied by subtype. The FL model consistently achieved top-three performance when validated on any site's data, suggesting improved generalizability. A qualitative survey described the experience of participants in the federated network. CONCLUSION/CONCLUSIONS: This study demonstrates the feasibility of implementing a federated network for multi-institutional collaboration among clinicians and using FL to conduct machine learning research, thereby opening a new paradigm for neurosurgical collaboration.

PMID: 36399428

ISSN: 1524-4040

CID: 5385002
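The abstract describes federated learning only at a high level: the algorithm is sent to each site's data, and one global model is then updated centrally. A minimal sketch of that loop is given here, assuming FedAvg-style aggregation (a sample-size-weighted average of the weights returned by each site); the abstract does not name the aggregation rule. The study trained a convolutional network on CT scans, whereas this toy fits a two-parameter linear model to synthetic data. The functions `local_update` and `federated_round`, the learning rate, and the site sizes are all hypothetical.

```python
import numpy as np


def local_update(global_w, X, y, lr=0.1, epochs=5):
    """Hypothetical site-side step: a few epochs of full-batch gradient
    descent on a least-squares loss, using only this site's data."""
    w = global_w.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w


def federated_round(global_w, sites):
    """One FedAvg-style round (assumed aggregation rule): every site trains
    locally, then the server averages the returned weights, weighted by each
    site's sample count. Only weights move; raw (X, y) never leave a site."""
    updates = np.stack([local_update(global_w, X, y) for X, y in sites])
    sizes = np.array([len(y) for _, y in sites], dtype=float)
    return np.average(updates, axis=0, weights=sizes)


# Synthetic data for three hypothetical sites of unequal size.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
sites = []
for n in (200, 50, 120):
    X = rng.normal(size=(n, 2))
    y = X @ true_w + 0.01 * rng.normal(size=n)
    sites.append((X, y))

# Start from a zero global model and run 50 communication rounds.
w = np.zeros(2)
for _ in range(50):
    w = federated_round(w, sites)
print(np.round(w, 2))
```

Weighting by sample count keeps a small site from pulling the global model as hard as a large one. Benchmarking the federated model against a centrally trained model, as the study did, would amount to fitting the same model on all sites' data pooled together.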

Ten Years of Health Informatics Education for Physicians

Chapter by: Major, Vincent J.; Plottel, Claudia S.; Aphinyanaphongs, Yindalon

in: Proceedings - 2023 IEEE 11th International Conference on Healthcare Informatics, ICHI 2023 by

[S.l.] : Institute of Electrical and Electronics Engineers Inc., 2023

pp. 637-644

ISBN: 9798350302639

CID: 5630952

Enabling AI-Augmented Clinical Workflows by Accessing Patient Data in Real-Time with FHIR

Chapter by: Major, Vincent J.; Wang, Walter; Aphinyanaphongs, Yindalon

in: Proceedings - 2023 IEEE 11th International Conference on Healthcare Informatics, ICHI 2023 by

[S.l.] : Institute of Electrical and Electronics Engineers Inc., 2023

pp. 531-533

ISBN: 9798350302639

CID: 5630942

NATURE. 2023:619(7969):357-+.DOI:

Health system-scale language models are all-purpose prediction engines

Jiang, Lavender Yao; Liu, Xujin Chris; Nejatian, Nima Pour; Nasir-Moin, Mustafa; Wang, Duo; Abidin, Anas; Eaton, Kevin; Riina, Howard Antony; Laufer, Ilya; Punjabi, Paawan; Miceli, Madeline; Kim, Nora C.; Orillac, Cordelia; Schnurman, Zane; Livia, Christopher; Weiss, Hannah; Kurland, David; Neifert, Sean; Dastagirzada, Yosef; Kondziolka, Douglas; Cheung, Alexander T. M.; Yang, Grace; Cao, Ming; Flores, Mona; Costa, Anthony B.; Aphinyanaphongs, Yindalon; Cho, Kyunghyun; Oermann, Eric Karl

ISI:001005804900017

ISSN: 0028-0836

CID: 5883642

Nature machine intelligence. 2022:4(10):807-809.DOI: 10.1038/s42256-022-00544-x

AI model transferability in healthcare: a sociotechnical perspective

Wiesenfeld, Batia Mishan; Aphinyanaphongs, Yin; Nov, Oded

SCOPUS:85139986644

ISSN: 2522-5839

CID: 5350312

American journal of gastroenterology. 2022:Conference:(Annual).DOI: 10.14309/01.ajg.0000857436.54468.02

Predicting Post-Operative C. difficile Infection (CDI) With Automated Machine Learning (AutoML) Algorithms Using the American College of Surgeons National Surgical Quality Improvement Program (ACS NSQIP) Database [Meeting Abstract]

Thangirala, A; Li, T; Abaza, E; Aphinyanaphongs, Y; Axelrad, J; Chen, J; Kelleher, A; Oeding, J; Hu, E; Martin, J; Katz, G; Brejt, S; Castillo, G; Ostberg, N; Kan, K

Introduction: Clostridium difficile infection (CDI) is one of the most common hospital-acquired infections, leading to prolonged hospitalization and significant morbidity. Only a few prior studies have developed predictive risk models for CDI, and all but one have used logistic regression (LR) models to identify risk factors. Automated machine learning (AutoML) programs consistently outperform standard LR models in non-medical contexts. This study aims to investigate the utility of AutoML methods in developing a model for post-operative CDI prediction.
Method(s): We used AutoGluon v0.3.1, an AutoML system developed by Amazon, to evaluate how accurately post-surgical CDI could be predicted from the 2016-2018 ACS NSQIP database. A total of 3,049,617 patients and 79 pre-operative features were included in the model. Post-operative CDI was defined as CDI within 30 days of surgery. Models were trained for 4 hours to optimize the Brier score (lower is better). All performance metrics were validated on the 2019 NSQIP database.
Result(s): Post-operative CDI developed in 11,001 patients (0.36%). Brier scores were calculated for each model; the top-performing model was an ensembled neural network with a Brier score of 0.0027 on the test set. Its AUROC and AUC-PR were 0.840 and 0.015, respectively (Figure).
Conclusion(s): The models generated via AutoML to predict post-operative CDI had discriminatory characteristics equal to or greater than those of models reported in the literature. Future post-operative CDI models may benefit from automated machine learning techniques.

EMBASE:641287886

ISSN: 1572-0241

CID: 5514802
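The Brier score that this abstract uses as its optimization target is the mean of (p_i - y_i)^2 over all patients, where p_i is the predicted probability and y_i is the observed 0/1 outcome. When an outcome is rare, even an uninformative model scores low. A constant prediction equal to the prevalence p scores exactly p(1 - p), which is about 0.0036 at the 0.36% event rate reported above. The sketch computes that reference point on synthetic labels. `brier_score` is a hypothetical helper; scikit-learn's `sklearn.metrics.brier_score_loss` computes the same quantity.

```python
import numpy as np


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probabilities and 0/1
    outcomes. Lower is better; 0 is a perfect score."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    return float(np.mean((y_prob - y_true) ** 2))


# Synthetic cohort with the same event rate as the abstract (0.36%).
n, events = 10_000, 36
y = np.zeros(n)
y[:events] = 1.0
prevalence = events / n

# Uninformative baseline: predict the prevalence for every patient.
baseline = brier_score(y, np.full(n, prevalence))
print(f"constant-prevalence Brier score: {baseline:.5f}")  # equals p * (1 - p)
```

A reported Brier score is easiest to interpret next to this constant-prevalence baseline, which is also why discrimination metrics such as AUROC and AUC-PR are usually reported alongside it.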

Journal of general internal medicine. 2022:37(9):2230-2238.DOI: 10.1007/s11606-022-07526-0

Development and Validation of a Machine Learning Model for Automated Assessment of Resident Clinical Reasoning Documentation

Schaye, Verity; Guzman, Benedict; Burk-Rafel, Jesse; Marin, Marina; Reinstein, Ilan; Kudlowitz, David; Miller, Louis; Chun, Jonathan; Aphinyanaphongs, Yindalon

BACKGROUND: Residents receive infrequent feedback on their clinical reasoning (CR) documentation. While machine learning (ML) and natural language processing (NLP) have been used to assess CR documentation in standardized cases, no studies have described similar use in the clinical environment. OBJECTIVE: The authors developed an ML model for automated assessment of CR documentation quality in residents' admission notes and validated it using Kane's framework. DESIGN, PARTICIPANTS, MAIN MEASURES/UNASSIGNED: Internal medicine residents' and subspecialty fellows' admission notes at one medical center from July 2014 to March 2020 were extracted from the electronic health record. Using a validated CR documentation rubric, the authors rated 414 notes for the ML development dataset. Notes were truncated to isolate the relevant portion; NLP software (cTAKES) extracted disease/disorder named entities, and human review generated CR terms. The final model had three input variables and classified notes as demonstrating low- or high-quality CR documentation. The ML model was applied to a retrospective dataset (9591 notes) for human validation and data analysis. Reliability between human and ML ratings was assessed on 205 of these notes with Cohen's kappa. CR documentation quality by post-graduate year (PGY) was evaluated by the Mantel-Haenszel test of trend. KEY RESULTS/RESULTS: The top-performing logistic regression model had an area under the receiver operating characteristic curve of 0.88, a positive predictive value of 0.68, and an accuracy of 0.79. Cohen's kappa was 0.67. Of the 9591 notes, 31.1% demonstrated high-quality CR documentation; quality increased from 27.0% (PGY1) to 31.0% (PGY2) to 39.0% (PGY3) (p < .001 for trend). Validity evidence was collected in each domain of Kane's framework (scoring, generalization, extrapolation, and implications).
CONCLUSIONS: The authors developed and validated a high-performing ML model that classifies CR documentation quality in resident admission notes in the clinical environment, a novel application of ML and NLP with many potential use cases.

PMCID:9296753

PMID: 35710676

ISSN: 1525-1497

CID: 5277902
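Cohen's kappa, which this abstract uses to compare human and ML ratings, corrects raw agreement for the agreement two raters would reach by chance: kappa = (observed - expected) / (1 - expected). A minimal sketch of the standard unweighted two-rater formula follows. The ratings are invented for illustration, and `cohens_kappa` is a hypothetical helper; scikit-learn's `sklearn.metrics.cohen_kappa_score` computes the same statistic.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labeling the same items:
    (observed agreement - chance agreement) / (1 - chance agreement)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same, non-empty set of items")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability that both raters independently pick the same label.
    labels = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[k] * counts_b[k] for k in labels) / n**2
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of 100 notes: 25 both "high", 65 both "low", 10 disagreements.
human = ["high"] * 25 + ["high"] * 5 + ["low"] * 5 + ["low"] * 65
model = ["high"] * 25 + ["low"] * 5 + ["high"] * 5 + ["low"] * 65
print(round(cohens_kappa(human, model), 3))
```

In this toy example the raters agree on 90% of notes. With a 30/70 split in each rater's labels, however, 58% agreement is expected by chance alone, so kappa comes out near 0.76. This is why kappa can sit well below raw agreement or accuracy.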

Applied clinical informatics. 2022:13(3):632-640.DOI: 10.1055/s-0042-1750416

Evaluating the Effect of a COVID-19 Predictive Model to Facilitate Discharge: A Randomized Controlled Trial

Major, Vincent J; Jones, Simon A; Razavian, Narges; Bagheri, Ashley; Mendoza, Felicia; Stadelman, Jay; Horwitz, Leora I; Austrian, Jonathan; Aphinyanaphongs, Yindalon

BACKGROUND: We previously developed and validated a predictive model to help clinicians identify hospitalized adults with coronavirus disease 2019 (COVID-19) who may be ready for discharge given their low risk of adverse events. Whether this algorithm can prompt more timely discharge for stable patients in practice is unknown. OBJECTIVES/OBJECTIVE: The aim of the study is to estimate the effect of displaying risk scores on length of stay (LOS). METHODS: We integrated model output into the electronic health record (EHR) at four hospitals in one health system by displaying a green/orange/red score indicating low/moderate/high risk in a patient list column and a larger COVID-19 summary report visible for each patient. Display of the score was pseudo-randomized 1:1 into intervention and control arms using a patient identifier passed to the model execution code. Intervention effect was assessed by comparing LOS between intervention and control groups. Adverse safety outcomes of death, hospice, and re-presentation were tested separately and as a composite indicator. We tracked adoption and sustained use through daily counts of score displays. RESULTS: Enrolling 1,010 patients from May 15, 2020 to December 7, 2020, the trial found no detectable difference in LOS. The intervention had no impact on safety indicators of death, hospice, or re-presentation after discharge. The scores were displayed consistently throughout the study period, but the study lacks a causally linked process measure of provider actions based on the score. Secondary analysis revealed complex dynamics in LOS over time, by primary symptom, and by hospital location. CONCLUSION/CONCLUSIONS: An AI-based COVID-19 risk score displayed passively to clinicians during routine care of hospitalized adults with COVID-19 was safe but had no detectable impact on LOS.
Health technology challenges such as insufficient adoption, nonuniform use, and provider trust, compounded by temporal factors of the COVID-19 pandemic, may have contributed to the null result. TRIAL REGISTRATION/BACKGROUND: ClinicalTrials.gov identifier: NCT04570488.

PMCID:9329139

PMID: 35896506

ISSN: 1869-0327

CID: 5276672
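This abstract states only that score display was pseudo-randomized 1:1 using a patient identifier passed to the model execution code. It does not describe how identifiers were mapped to arms. One common way to get a stable, stateless 1:1 split is to hash the identifier, as sketched below. This is an assumption for illustration, not the trial's actual method. The salt, the identifier format, and `assign_arm` are hypothetical.

```python
import hashlib
from collections import Counter


def assign_arm(patient_id: str, salt: str = "example-trial-salt") -> str:
    """Deterministic 1:1 arm assignment from a patient identifier.

    Hashing the salted identifier yields a stable, roughly uniform bit, so the
    same patient always lands in the same arm on every model execution without
    storing an allocation table. The salt value is hypothetical.
    """
    digest = hashlib.sha256(f"{salt}:{patient_id}".encode("utf-8")).digest()
    return "intervention" if digest[0] % 2 == 0 else "control"


ids = [f"MRN{i:07d}" for i in range(10_000)]  # hypothetical identifiers
arms = Counter(assign_arm(pid) for pid in ids)
print(arms)
```

Because assignment is a pure function of the identifier, re-running the model for the same patient never flips the arm. A trial-specific salt keeps allocations from being trivially predictable by anyone who knows the identifier alone.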

Journal of biomedical informatics. 2022:130.DOI: 10.1016/j.jbi.2022.104086

Automated interpretable discovery of heterogeneous treatment effectiveness: A COVID-19 case study

Lengerich, Benjamin J; Nunnally, Mark E; Aphinyanaphongs, Yin; Ellington, Caleb; Caruana, Rich

Testing multiple treatments for heterogeneous (varying) effectiveness with respect to many underlying risk factors requires many pairwise tests; we would like to instead automatically discover and visualize patient archetypes and predictors of treatment effectiveness using multitask machine learning. In this paper, we present a method to estimate these heterogeneous treatment effects with an interpretable hierarchical framework that uses additive models to visualize expected treatment benefits as a function of patient factors (identifying personalized treatment benefits) and concurrent treatments (identifying combinatorial treatment benefits). This method achieves state-of-the-art predictive power for COVID-19 in-hospital mortality and interpretable identification of heterogeneous treatment benefits. We first validate this method on the large public MIMIC-IV dataset of ICU patients to test recovery of heterogeneous treatment effects. Next we apply this method to a proprietary dataset of over 3000 patients hospitalized for COVID-19, and find evidence of heterogeneous treatment effectiveness predicted largely by indicators of inflammation and thrombosis risk: patients with few indicators of thrombosis risk benefit most from treatments against inflammation, while patients with few indicators of inflammation risk benefit most from treatments against thrombosis. This approach provides an automated methodology to discover heterogeneous and individualized effectiveness of treatments.

PMCID:9055753

PMID: 35504543

ISSN: 1532-0480

CID: 5216082