Development and Validation of a Parsimonious Risk Stratification Model for Pancreatic Cancer
Mavromatis, Lucas A; Zlatanic, Viktor; Agarunov, Emil; Sanoba, Shenin A; Kluger, Michael D; Horwitz, Leora I; Razavian, Narges; Maitra, Anirban; Gonda, Tamas A; Grams, Morgan E
IMPORTANCE/UNASSIGNED:Pancreatic ductal adenocarcinoma (PDAC) is a leading cause of cancer deaths in the US. Although early detection improves survival, the rarity of the disease has made population-wide screening impractical. OBJECTIVE/UNASSIGNED:To develop and validate a parsimonious, interpretable, and generalizable model predicting incident PDAC, termed PRIME (PDAC Risk Model for Earlier Detection), using routinely available electronic health record (EHR) data. DESIGN, SETTING, AND PARTICIPANTS/UNASSIGNED:This cohort study used the Optum Labs Data Warehouse, a longitudinal, deidentified US EHR and claims database. Adults 40 years or older with an outpatient clinical encounter between 2016 and 2018 were included. Participants from 23 health systems (n = 4 859 833) comprised the training cohort; 31 additional systems (n = 5 619 091) served as the validation cohort. International validation was conducted in the UK Biobank (n = 498 754). Data analysis occurred July 2025 to January 2026. EXPOSURES/UNASSIGNED:Demographics, diagnosis codes, and routinely measured laboratory values were evaluated. Elastic-net regularization with 10-fold cross-validation selected the predictor set. MAIN OUTCOMES AND MEASURES/UNASSIGNED:Incident PDAC was identified by International Classification of Diseases, Ninth and Tenth Revisions (ICD-9/10) codes. Model performance was assessed using time-dependent area under the curve (AUC) and calibration metrics. RESULTS/UNASSIGNED:Overall, the study included more than 11 million adults (2.1% Asian individuals, 8.4% Black individuals, 4.3% Hispanic/Latino individuals, 82.7% White individuals, and 2.4% other race/ethnicity by EHR reporting). In the training cohort (mean [SD] age, 60.4 [11] years), 14 405 individuals were diagnosed with PDAC (incidence, 55 per 100 000 person-years) over a mean (SD) of 5.4 (2.5) years; in the validation cohort, 11 693 individuals were diagnosed with PDAC (54 per 100 000 person-years) over a mean (SD) of 3.9 (2.5) years. PRIME retained 19 predictors, including history of pancreatitis, gastrointestinal disorders, prior cancers, type 2 diabetes, elevated aspartate aminotransferase levels, smoking, non-type-O blood, and male sex. Discrimination was strong at the 36-month time horizon (AUC = 0.75 in both the training and validation cohorts), with good calibration. In the validation cohort, patients in the top 1% of predicted risk had substantially higher PDAC risk (HR, 7.63; 95% CI, 6.85-8.49) than average-risk patients. In the UK Biobank, PRIME achieved a 36-month AUC of 0.71 with good calibration. CONCLUSIONS AND RELEVANCE/UNASSIGNED:In this cohort study, PRIME, a transparent EHR-based model, effectively stratified PDAC risk across diverse US health systems and generalized internationally. Prospective studies should evaluate EHR-guided PDAC case-finding and integration with blood-based early-detection assays.
PMCID: 13022769
PMID: 41885821
ISSN: 2374-2445
CID: 6018542
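The predictor-selection step described above (elastic-net regularization with 10-fold cross-validation) can be sketched in a few lines of scikit-learn. This is a minimal illustration on synthetic data, not the authors' Optum Labs pipeline; the cohort size, feature count, and hyperparameter grid are assumptions.

```python
# Minimal sketch of elastic-net predictor selection with 10-fold CV.
# Illustrative only: features and labels are synthetic, not the study's
# Optum Labs data or its actual preprocessing.
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

rng = np.random.default_rng(0)
n, p = 5000, 40                      # hypothetical cohort and feature count
X = rng.normal(size=(n, p))          # demographics, diagnoses, lab values
y = rng.binomial(1, 0.01, size=n)    # rare incident-PDAC outcome

model = LogisticRegressionCV(
    Cs=10,
    cv=10,                           # 10-fold cross-validation
    penalty="elasticnet",
    solver="saga",                   # the solver supporting elastic net
    l1_ratios=[0.1, 0.5, 0.9],
    scoring="roc_auc",
    max_iter=5000,
)
model.fit(X, y)

# Features with nonzero coefficients form the retained predictor set.
selected = np.flatnonzero(model.coef_[0])
print(f"{selected.size} predictors retained:", selected)
```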
Catalyzing Health AI by Fixing Payment Systems
Razavian, Narges; Batchu-Green, Prem; Chowdhry, Vikas; Elemento, Olivier; Rajpurkar, Pranav; Saria, Suchi; Shah, Nigam H; Topol, Eric J
Despite rapid advances in artificial intelligence (AI) across sectors, health care remains one of the least transformed domains. This stagnation is not due to a lack of data, clinical need, or innovation, but rather to persistent regulatory and economic misalignment. Even AI tools cleared by the U.S. Food and Drug Administration that meet clinical efficacy standards often face major barriers to adoption, largely driven by outdated reimbursement frameworks and fragmented incentives among stakeholders. The result is a systemic failure to deploy technologies that could meaningfully reduce clinician workload, shorten wait times, and improve patients' lives. In this article, we examine the reimbursement landscape for health AI, focusing first on tools that fit existing regulatory pathways, outlining payment barriers and proposing policy reforms. These include resolving Current Procedural Terminology adoption bottlenecks, addressing integration overhead, and aligning pricing models with AI cost structures. We then extend the discussion to the emerging domain of generative AI in health care, highlighting the urgent need for prospective regulatory frameworks to ensure patient benefits. (Funded by the National Institutes of Health and the Leukemia and Lymphoma Society.)
PMCID: 12900248
PMID: 41695240
ISSN: 2836-9386
CID: 6004322
Robust Disease Prognosis via Diagnostic Knowledge Preservation: A Sequential Learning Approach
Rajamohan, Haresh Rengaraj; Xu, Yanqi; Zhu, Weicheng; Kijowski, Richard; Cho, Kyunghyun; Geras, Krzysztof J; Razavian, Narges; Deniz, Cem M
Accurate disease prognosis is essential for patient care but is often hindered by the lack of long-term data. This study explores deep learning training strategies that utilize large, accessible diagnostic datasets to pretrain models aimed at predicting future disease progression in knee osteoarthritis (OA), Alzheimer's disease (AD), and breast cancer (BC). While diagnostic pretraining improves prognostic task performance, naive fine-tuning for prognosis can cause 'catastrophic forgetting,' in which the model's original diagnostic accuracy degrades, a significant patient safety concern in real-world settings. To address this, we propose a sequential learning strategy with experience replay. We used cohorts with knee radiographs, brain MRIs, and digital mammograms to predict 4-year structural worsening in OA, 2-year cognitive decline in AD, and 5-year cancer diagnosis in BC. Our results showed that diagnostic pretraining on larger datasets improved prognosis model performance compared to standard baselines, boosting both the area under the receiver operating characteristic curve (AUROC) (e.g., knee OA external: 0.77 vs. 0.747; breast cancer: 0.874 vs. 0.848) and the area under the precision-recall curve (AUPRC) (e.g., Alzheimer's disease: 0.752 vs. 0.683). Additionally, a sequential learning approach with experience replay achieved prognostic performance comparable to dedicated single-task models (e.g., breast cancer AUROC 0.876 vs. 0.874) while also preserving diagnostic ability. This method maintained diagnostic accuracy (e.g., breast cancer balanced accuracy 50.4% vs. 50.9% for a dedicated diagnostic model), unlike simpler multitask methods prone to catastrophic forgetting (e.g., 37.7%). Our findings show that leveraging large diagnostic datasets is a reliable and data-efficient way to enhance prognostic models while maintaining essential diagnostic capability.
PMCID: 12486016
PMID: 41040735
CID: 5973072
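A minimal sketch of the experience-replay idea described above: while fine-tuning on the prognostic task, each batch is mixed with stored diagnostic examples so the diagnostic task keeps receiving gradient signal. The two-headed model interface, replay buffer, and loss weighting below are hypothetical placeholders, not the paper's exact architecture.

```python
# Sketch of sequential learning with experience replay (PyTorch).
# Fine-tune a pretrained diagnostic model on a prognostic task while
# replaying stored diagnostic examples to mitigate catastrophic forgetting.
# `model(x, task=...)` and `replay_buffer.sample(...)` are hypothetical
# interfaces for a two-headed network and a stored-example buffer.
import torch
import torch.nn.functional as F

def finetune_with_replay(model, prognosis_loader, replay_buffer,
                         epochs=5, replay_weight=0.5, lr=1e-4):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x_prog, y_prog in prognosis_loader:
            # Draw a matching batch of stored diagnostic examples.
            x_diag, y_diag = replay_buffer.sample(len(x_prog))

            prog_logits = model(x_prog, task="prognosis")
            diag_logits = model(x_diag, task="diagnosis")

            # Joint loss: new prognostic task plus replayed diagnostic task.
            loss = (F.cross_entropy(prog_logits, y_prog)
                    + replay_weight * F.cross_entropy(diag_logits, y_diag))

            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```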
Predicting hematoma expansion after intracerebral hemorrhage: a comparison of clinician prediction with deep learning radiomics models
Yu, Boyang; Melmed, Kara R; Frontera, Jennifer; Zhu, Weicheng; Huang, Haoxu; Qureshi, Adnan I; Maggard, Abigail; Steinhof, Michael; Kuohn, Lindsey; Kumar, Arooshi; Berson, Elisa R; Tran, Anh T; Payabvash, Seyedmehdi; Ironside, Natasha; Brush, Benjamin; Dehkharghani, Seena; Razavian, Narges; Ranganath, Rajesh
BACKGROUND:Early prediction of hematoma expansion (HE) following nontraumatic intracerebral hemorrhage (ICH) may inform preemptive therapeutic interventions. We sought to identify how accurately machine learning (ML) radiomics models predict HE compared with expert clinicians using head computed tomography (HCT). METHODS:We used data from 900 study participants with ICH enrolled in the Antihypertensive Treatment of Acute Cerebral Hemorrhage 2 Study. ML models were developed using baseline HCT images as well as admission clinical data in a training cohort (n = 621), and their performance was evaluated in an independent test cohort (n = 279) to predict HE (defined as hematoma growth by ≥ 33% or > 6 mL at 24 h). We simultaneously surveyed expert clinicians and asked them to predict HE using the same initial HCT images and clinical data. Areas under the receiver operating characteristic curve (AUCs) were compared between clinician predictions, ML models using radiomic data only (a random forest classifier and a deep learning imaging model), and ML models using both radiomic and clinical data (three random forest classifier models using different feature combinations). Kappa values assessing interrater reliability among expert clinicians were calculated. The best-performing model was compared with clinician prediction. RESULTS:The AUC for expert clinician prediction of HE was 0.591, with a kappa of 0.156 for interrater variability, compared with ML models using radiomic data only (a deep learning model using image input, AUC 0.680) and using both radiomic and clinical data (a random forest model, AUC 0.677). The intraclass correlation coefficient between clinical judgment and the best-performing ML model was 0.47 (95% confidence interval 0.23-0.75). CONCLUSIONS:We introduced supervised ML algorithms demonstrating that automated HE prediction may outperform practicing clinicians. Despite overall moderate AUCs, our results set a new relative benchmark for performance on these tasks, which even expert clinicians find challenging. These results emphasize the need for continued improvements and further enhanced clinical decision support to optimally manage patients with ICH.
PMID: 39920546
ISSN: 1556-0961
CID: 5784422
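As an illustration of the radiomics-plus-clinical modeling compared above, here is a minimal random forest sketch with a development/test split mirroring the study's cohort sizes. The feature matrices are synthetic stand-ins for the extracted radiomic and clinical variables, not the ATACH-2 data.

```python
# Illustrative sketch of a radiomics + clinical random forest classifier:
# train on a development cohort, report AUC on a held-out test cohort.
# Synthetic data; cohort sizes echo the study, features are placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X_train = rng.normal(size=(621, 120))     # radiomic + clinical features
y_train = rng.binomial(1, 0.3, size=621)  # hematoma expansion label
X_test = rng.normal(size=(279, 120))
y_test = rng.binomial(1, 0.3, size=279)

clf = RandomForestClassifier(n_estimators=500, random_state=0)
clf.fit(X_train, y_train)

# Evaluate discrimination on the independent test cohort.
auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
print(f"Test AUC: {auc:.3f}")
```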
Identification of patients at risk for pancreatic cancer in a 3-year timeframe based on machine learning algorithms
Zhu, Weicheng; Chen, Long; Aphinyanaphongs, Yindalon; Kastrinos, Fay; Simeone, Diane M; Pochapin, Mark; Stender, Cody; Razavian, Narges; Gonda, Tamas A
Early detection of pancreatic cancer (PC) remains challenging, largely due to the low population incidence and few known risk factors. However, screening in at-risk populations and detection of early cancer have the potential to significantly alter survival. In this study, we aimed to develop a predictive model to identify patients at risk of developing new-onset PC within a 2.5- to 3-year time frame. We used the electronic health records (EHR) of a large medical system from 2000 to 2021 (N = 537,410). The EHR data analyzed in this work consist of patients' demographic information, diagnosis records, and laboratory values, which were used to identify patients diagnosed with pancreatic cancer and the risk factors serving as inputs to the machine learning algorithm. We identified 73 risk factors for pancreatic cancer with a phenome-wide association study (PheWAS) on a matched case-control cohort and, based on these, built a large-scale machine learning model on the EHR data. A temporally stratified validation was performed on patients not included in any stage of model training. The model showed an AUROC of 0.742 [0.727, 0.757], which was similar in both the general population and the subset of the population with prior cross-sectional imaging. The rate of pancreatic cancer diagnosis among those in the top 1 percentile of the risk score was sixfold higher than in the general population. Our model leverages data extracted from a 6-month window of the electronic health record to identify patients at nearly sixfold higher-than-baseline risk of developing pancreatic cancer 2.5-3 years from evaluation. This approach offers an opportunity to define an enriched population entirely from static data, where current screening may be recommended.
PMID: 40188106
ISSN: 2045-2322
CID: 5819542
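The top-percentile enrichment reported above (roughly sixfold in the top 1% of risk scores) reduces to a simple ratio of outcome rates. A sketch on synthetic scores, where the risk-outcome relationship is invented purely for illustration:

```python
# Sketch: fold-enrichment of outcome rate in the top percentile of a risk
# score versus the overall population. Scores and labels are synthetic.
import numpy as np

def top_percentile_enrichment(scores, labels, pct=1.0):
    """Outcome rate in the top pct% of scores, relative to overall rate."""
    cutoff = np.percentile(scores, 100 - pct)
    return labels[scores >= cutoff].mean() / labels.mean()

rng = np.random.default_rng(0)
scores = rng.random(100_000)
# Invented risk-correlated outcome: higher score -> higher event probability.
labels = rng.binomial(1, np.clip(scores**4 / 50, 0, 1))
print(f"Top-1% enrichment: {top_percentile_enrichment(scores, labels):.1f}x")
```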
Evaluating Large Language Models in extracting cognitive exam dates and scores
Zhang, Hao; Jethani, Neil; Jones, Simon; Genes, Nicholas; Major, Vincent J; Jaffe, Ian S; Cardillo, Anthony B; Heilenbach, Noah; Ali, Nadia Fazal; Bonanni, Luke J; Clayburn, Andrew J; Khera, Zain; Sadler, Erica C; Prasad, Jaideep; Schlacter, Jamie; Liu, Kevin; Silva, Benjamin; Montgomery, Sophie; Kim, Eric J; Lester, Jacob; Hill, Theodore M; Avoricani, Alba; Chervonski, Ethan; Davydov, James; Small, William; Chakravartty, Eesha; Grover, Himanshu; Dodson, John A; Brody, Abraham A; Aphinyanaphongs, Yindalon; Masurkar, Arjun; Razavian, Narges
Ensuring the reliability of Large Language Models (LLMs) in clinical tasks is crucial. Our study assesses two state-of-the-art LLMs (ChatGPT and LlaMA-2) for extracting clinical information, focusing on cognitive tests such as the MMSE and CDR. Our data consisted of 135,307 clinical notes (Jan 12th, 2010 to May 24th, 2023) mentioning MMSE, CDR, or MoCA. After applying inclusion criteria, 34,465 notes remained, of which 765 were processed by ChatGPT (GPT-4) and LlaMA-2, with 22 experts reviewing the responses. ChatGPT successfully extracted MMSE and CDR instances with dates from 742 notes. We used 20 notes for fine-tuning and training the reviewers. The remaining 722 were assigned to reviewers, with 309 assigned to two reviewers simultaneously. Inter-rater agreement (Fleiss' kappa), precision, recall, true/false negative rates, and accuracy were calculated. Our study follows TRIPOD reporting guidelines for model validation. For MMSE information extraction, ChatGPT (vs. LlaMA-2) achieved accuracy of 83% (vs. 66.4%), sensitivity of 89.7% (vs. 69.9%), a true-negative rate of 96% (vs. 60.0%), and precision of 82.7% (vs. 62.2%). For CDR, the results were lower overall, with accuracy of 87.1% (vs. 74.5%), sensitivity of 84.3% (vs. 39.7%), a true-negative rate of 99.8% (vs. 98.4%), and precision of 48.3% (vs. 16.1%). We qualitatively evaluated the MMSE errors of ChatGPT and LlaMA-2 on double-reviewed notes. LlaMA-2 errors included 27 cases of total hallucination, 19 cases of reporting other scores instead of the MMSE, 25 missed scores, and 23 cases of reporting only the wrong date. In comparison, ChatGPT's errors included only 3 cases of total hallucination, 17 cases of reporting the wrong test instead of the MMSE, and 19 cases of reporting a wrong date. In this diagnostic/prognostic study of ChatGPT and LlaMA-2 for extracting cognitive exam dates and scores from clinical notes, ChatGPT exhibited high accuracy and better performance than LlaMA-2. The use of LLMs could benefit dementia research and clinical care by identifying patients eligible for treatment initiation or clinical trial enrollment. Rigorous evaluation of LLMs is crucial to understanding their capabilities and limitations.
PMCID: 11634005
PMID: 39661652
ISSN: 2767-3170
CID: 5762692
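The inter-rater agreement metric used in this study, Fleiss' kappa over double-reviewed notes, can be computed with statsmodels. A sketch on synthetic ratings; the two-category correctness coding below is an assumption for illustration:

```python
# Sketch: Fleiss' kappa for inter-rater agreement on double-reviewed notes.
# Ratings are synthetic (subjects x raters), coded 0 = incorrect, 1 = correct.
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

rng = np.random.default_rng(0)
# 309 double-reviewed notes, 2 reviewers each, mostly agreeing by design.
ratings = rng.binomial(1, 0.85, size=(309, 2))

# Convert raw ratings into a subjects x categories count table.
table, _ = aggregate_raters(ratings)
print(f"Fleiss' kappa: {fleiss_kappa(table):.3f}")
```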
Predicting Risk of Alzheimer's Diseases and Related Dementias with AI Foundation Model on Electronic Health Records
Zhu, Weicheng; Tang, Huanze; Zhang, Hao; Rajamohan, Haresh Rengaraj; Huang, Shih-Lun; Ma, Xinyue; Chaudhari, Ankush; Madaan, Divyam; Almahmoud, Elaf; Chopra, Sumit; Dodson, John A; Brody, Abraham A; Masurkar, Arjun V; Razavian, Narges
Early identification of Alzheimer's disease (AD) and AD-related dementias (ADRD) has high clinical significance, both because of the potential to slow decline by initiating FDA-approved therapies and managing modifiable risk factors, and to help persons living with dementia and their families plan before cognitive loss makes doing so challenging. However, substantial racial and ethnic disparities in early diagnosis currently lead to additional inequities in care, underscoring the need for accurate and inclusive risk assessment programs. In this study, we trained an artificial intelligence foundation model to represent electronic health record (EHR) data using a cohort of 1.2 million patients within a large health system. Building on this foundation EHR model, we developed a predictive Transformer model, named TRADE, capable of identifying risk for AD/ADRD and mild cognitive impairment (MCI) by analyzing sequences of past visit records. Among individuals 65 and older, our model generated risk predictions for various future timeframes. On the held-out validation set, the model achieved an area under the receiver operating characteristic curve (AUROC) of 0.772 (95% CI: 0.770, 0.773) for identifying AD/ADRD/MCI risk at 1 year, and an AUROC of 0.735 (95% CI: 0.734, 0.736) at 5 years. The positive predictive values (PPV) at 5 years among individuals with the top 1% and 5% highest estimated risks were 39.2% and 27.8%, respectively. These results demonstrate significant improvements over current EHR-based AD/ADRD/MCI risk assessment models, paving the way for better prognosis and management of AD/ADRD/MCI at scale.
PMCID: 11071573
PMID: 38712223
CID: 5662732
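The PPV-at-top-risk metric reported above has a direct implementation: take the top 1% or 5% of estimated risks and measure the observed outcome rate among those flagged. A sketch on synthetic scores and outcomes:

```python
# Sketch: positive predictive value among the highest-risk percentiles.
# Scores and outcomes are synthetic placeholders, not TRADE predictions.
import numpy as np

def ppv_at_top(scores, outcomes, pct):
    """Observed outcome rate among the top pct% of risk scores."""
    cutoff = np.percentile(scores, 100 - pct)
    return outcomes[scores >= cutoff].mean()

rng = np.random.default_rng(0)
scores = rng.random(50_000)
# Invented risk-correlated outcome for illustration.
outcomes = rng.binomial(1, 0.4 * scores**3)
for pct in (1, 5):
    print(f"PPV in top {pct}%: {ppv_at_top(scores, outcomes, pct):.1%}")
```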
Evaluating Large Language Models in Extracting Cognitive Exam Dates and Scores
Zhang, Hao; Jethani, Neil; Jones, Simon; Genes, Nicholas; Major, Vincent J; Jaffe, Ian S; Cardillo, Anthony B; Heilenbach, Noah; Ali, Nadia Fazal; Bonanni, Luke J; Clayburn, Andrew J; Khera, Zain; Sadler, Erica C; Prasad, Jaideep; Schlacter, Jamie; Liu, Kevin; Silva, Benjamin; Montgomery, Sophie; Kim, Eric J; Lester, Jacob; Hill, Theodore M; Avoricani, Alba; Chervonski, Ethan; Davydov, James; Small, William; Chakravartty, Eesha; Grover, Himanshu; Dodson, John A; Brody, Abraham A; Aphinyanaphongs, Yindalon; Masurkar, Arjun; Razavian, Narges
IMPORTANCE/UNASSIGNED:Large language models (LLMs) are increasingly used for medical tasks, and ensuring their reliability is vital to avoid false results. Our study assesses two state-of-the-art LLMs (ChatGPT and LlaMA-2) for extracting clinical information, focusing on cognitive tests such as the MMSE and CDR. OBJECTIVE/UNASSIGNED:To evaluate the performance of ChatGPT and LlaMA-2 in extracting MMSE and CDR scores, including their associated dates. METHODS/UNASSIGNED:Our data consisted of 135,307 clinical notes (Jan 12th, 2010 to May 24th, 2023) mentioning MMSE, CDR, or MoCA. After applying inclusion criteria, 34,465 notes remained, of which 765 were processed by ChatGPT (GPT-4) and LlaMA-2, with 22 experts reviewing the responses. ChatGPT successfully extracted MMSE and CDR instances with dates from 742 notes. We used 20 notes for fine-tuning and training the reviewers. The remaining 722 were assigned to reviewers, with 309 assigned to two reviewers simultaneously. Inter-rater agreement (Fleiss' kappa), precision, recall, true/false negative rates, and accuracy were calculated. Our study follows TRIPOD reporting guidelines for model validation. RESULTS/UNASSIGNED:For MMSE information extraction, ChatGPT (vs. LlaMA-2) achieved accuracy of 83% (vs. 66.4%), sensitivity of 89.7% (vs. 69.9%), a true-negative rate of 96% (vs. 60.0%), and precision of 82.7% (vs. 62.2%). For CDR, the results were lower overall, with accuracy of 87.1% (vs. 74.5%), sensitivity of 84.3% (vs. 39.7%), a true-negative rate of 99.8% (vs. 98.4%), and precision of 48.3% (vs. 16.1%). We qualitatively evaluated the MMSE errors of ChatGPT and LlaMA-2 on double-reviewed notes. LlaMA-2 errors included 27 cases of total hallucination, 19 cases of reporting other scores instead of the MMSE, 25 missed scores, and 23 cases of reporting only the wrong date. In comparison, ChatGPT's errors included only 3 cases of total hallucination, 17 cases of reporting the wrong test instead of the MMSE, and 19 cases of reporting a wrong date. CONCLUSIONS/UNASSIGNED:In this diagnostic/prognostic study of ChatGPT and LlaMA-2 for extracting cognitive exam dates and scores from clinical notes, ChatGPT exhibited high accuracy and better performance than LlaMA-2. The use of LLMs could benefit dementia research and clinical care by identifying patients eligible for treatment initiation or clinical trial enrollment. Rigorous evaluation of LLMs is crucial to understanding their capabilities and limitations.
PMCID: 10888985
PMID: 38405784
CID: 5722422
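For readers curious how such an extraction call might look in practice, below is a hedged sketch using the OpenAI Python client. The prompt wording, JSON schema, and model name are illustrative assumptions; the study's actual prompts and pipeline are not reproduced here.

```python
# Hedged illustration of score/date extraction with a chat model.
# Prompt text and output schema are assumptions for illustration only.
from openai import OpenAI

client = OpenAI()  # requires OPENAI_API_KEY in the environment

PROMPT = (
    "From the clinical note below, extract every MMSE and CDR score with "
    "its associated date. Respond as JSON: "
    '[{"test": ..., "score": ..., "date": ...}]. '
    "If no score is present, return []."
)

def extract_cognitive_scores(note_text: str) -> str:
    """Return the model's raw JSON-formatted extraction for one note."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": f"{PROMPT}\n\n{note_text}"}],
        temperature=0,  # deterministic output for extraction tasks
    )
    return response.choices[0].message.content
```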
Author Correction: Generalizable deep learning model for early Alzheimer's disease detection from structural MRIs
Liu, Sheng; Masurkar, Arjun V; Rusinek, Henry; Chen, Jingyun; Zhang, Ben; Zhu, Weicheng; Fernandez-Granda, Carlos; Razavian, Narges
PMID: 37783742
ISSN: 2045-2322
CID: 5735542
Deep learning integrates histopathology and proteogenomics at a pan-cancer level
Wang, Joshua M; Hong, Runyu; Demicco, Elizabeth G; Tan, Jimin; Lazcano, Rossana; Moreira, Andre L; Li, Yize; Calinawan, Anna; Razavian, Narges; Schraink, Tobias; Gillette, Michael A; Omenn, Gilbert S; An, Eunkyung; Rodriguez, Henry; Tsirigos, Aristotelis; Ruggles, Kelly V; Ding, Li; Robles, Ana I; Mani, D R; Rodland, Karin D; Lazar, Alexander J; Liu, Wenke; Fenyö, David; ,
We introduce an approach that integrates pathology imaging with transcriptomics and proteomics to identify predictive histology features associated with critical clinical outcomes in cancer. We utilize 2,755 H&E-stained histopathological slides from 657 patients across 6 cancer types from CPTAC. Our models effectively recapitulate distinctions readily made by human pathologists: tumor vs. normal (AUROC = 0.995) and tissue of origin (AUROC = 0.979). We further investigate predictive power on tasks not normally performed from H&E alone, including TP53 prediction and pathologic stage. Importantly, we describe predictive morphologies not previously utilized in a clinical setting. The incorporation of transcriptomics and proteomics identifies pathway-level signatures and cellular processes driving predictive histology features. Model generalizability and interpretability are confirmed using TCGA. We propose a classification system for these tasks and suggest potential clinical applications for this integrated human and machine learning approach. A publicly available web-based platform implements these models.
PMCID: 10518635
PMID: 37582371
ISSN: 2666-3791
CID: 5590072
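Slide-level evaluation in this kind of pipeline is commonly done by aggregating tile-level predictions; a minimal sketch, assuming mean-pooling of tile probabilities and synthetic data rather than the authors' CPTAC setup or architecture:

```python
# Sketch: aggregate tile-level tumor probabilities to slide level by mean
# pooling, then compute slide-level AUROC. Data are synthetic placeholders.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n_slides, tiles_per_slide = 200, 50
slide_labels = rng.binomial(1, 0.5, size=n_slides)  # 1 = tumor slide

# Invented tile probabilities: tumor slides skew high, normal slides low.
tile_probs = np.clip(
    slide_labels[:, None] * 0.6
    + rng.normal(0.2, 0.15, size=(n_slides, tiles_per_slide)),
    0, 1)

slide_probs = tile_probs.mean(axis=1)  # mean-pool tiles to slide level
print(f"Slide-level AUROC: {roc_auc_score(slide_labels, slide_probs):.3f}")
```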