NYUHSL Faculty Bibliography

Nature communications. 2026.DOI: 10.1038/s41467-026-73088-y

Multi-modal AI for comprehensive breast cancer prognostication

Witowski, Jan; Zeng, Ken G; Cappadona, Joseph; Elayoubi, Jailan; Choucair, Khalil; Chiru, Elena Diana; Chan, Nancy; Kang, Young-Joon; Howard, Frederick; Ostrovnaya, Irina; Fernandez-Granda, Carlos; Schnabel, Freya; Steinsnyder, Zoe; Ozerdem, Ugur; Liu, Kangning; Abdulsattar, Waleed; Zong, Yu; Daoud, Lina; Beydoun, Rafic; Saad, Anas M; Thakore, Nitya; Sadic, Mohammad; Yeung, Frank; Liu, Elisa; Hill, Theodore; Swett, Benjamin; Rigau, Danielle; Clayburn, Andrew J; Speirs, Valerie; Vetter, Marcus; Sojak, Lina; Muenst, Simone; Baumhoer, Daniel; Pan, Jia-Wern; Makmur, Haslina; Teo, Soo-Hwang; Pak, Linda M; Angel, Victor; Zilenaite-Petrulaitiene, Dovile; Laurinavicius, Arvydas; Klar, Natalie; Piening, Brian D; Bifulco, Carlo; Jun, Sun-Young; Yi, Jae Pak; Lim, Su Hyun; Brufsky, Adam; Esteva, Francisco J; Pusztai, Lajos; LeCun, Yann; Geras, Krzysztof J

Treatment selection in breast cancer is guided by risk assessment using molecular subtypes and clinicopathological characteristics. However, current approaches lack the precision required for optimal clinical decision-making. To address this, we use data from 8161 patients to develop and evaluate an AI test integrating digital pathology with clinical data. The AI test provides a robust method for predicting disease-free interval (C-index: 0.71 [0.68-0.75], HR: 3.63 [3.02-4.37, p < 0.001]). In a direct comparison, the AI test displays numerically higher discrimination (C-index: 0.67 [0.61-0.74]) than the standard-of-care 21-gene assay (C-index: 0.61 [0.49-0.73]). Across molecular subtypes, the AI test demonstrates robust prognostic performance, including in triple negative breast cancer (C-index: 0.71 [0.62-0.81], HR: 3.81 [2.35-6.17, p=0.02]), where no guideline-recommended assays currently exist. These findings highlight the potential of AI-based pathology tests as a promising tool for improved risk stratification across all major subtypes, with implications for clinical decision-making.

PMID: 42161927

ISSN: 2041-1723

CID: 6038332

PLoS one. 2026:21(5).DOI: 10.1371/journal.pone.0344600

Robust disease prognosis via diagnostic knowledge preservation: A sequential learning approach

Rajamohan, Haresh Rengaraj; Xu, Yanqi; Zhu, Weicheng; Kijowski, Richard; Cho, Kyunghyun; Geras, Krzysztof J; Razavian, Narges; Deniz, Cem M

Accurate disease prognosis is essential for patient care but is often hindered by the scarcity of longitudinal data. This study explores deep learning training strategies that utilize large, accessible diagnostic datasets to pretrain models aimed at predicting future disease progression in knee osteoarthritis (OA), Alzheimer's disease (AD), and breast cancer (BC). While diagnostic pretraining improves prognostic task performance, naive fine-tuning for prognosis can cause 'catastrophic forgetting,' where the model's original diagnostic accuracy degrades, a significant patient safety concern in real-world settings. To address this, we propose a sequential learning strategy with experience replay. We used cohorts with knee radiographs, brain MRIs, and digital mammograms to predict 4-year structural worsening in OA, 2-year cognitive decline in AD, and 5-year cancer diagnosis in BC. Our results showed that diagnostic pretraining on larger datasets improved prognosis model performance compared to standard baselines, boosting both the Area Under the Receiver Operating Characteristic curve (AUROC) (e.g., Knee OA external: 0.770 vs 0.747; Breast Cancer: 0.874 vs 0.848) and the Area Under the Precision-Recall Curve (AUPRC) (e.g., Alzheimer's Disease: 0.752 vs 0.683). Additionally, a sequential learning approach with experience replay achieved prognostic performance comparable to dedicated single-task models (e.g., Breast Cancer AUROC 0.876 vs 0.874) while also preserving diagnostic ability. This method maintained high diagnostic accuracy (e.g., Breast Cancer Balanced Accuracy 50.4% vs 50.9% for a dedicated diagnostic model), unlike simpler multitask methods prone to catastrophic forgetting (e.g., 37.7%). Our findings show that leveraging large diagnostic datasets is a reliable and data-efficient way to enhance prognostic models while maintaining essential diagnostic skills.

PMCID: 13148697

PMID: 42090385

ISSN: 1932-6203

CID: 6031322

medRxiv. 2025.DOI: 10.1101/2025.09.22.25336414

Robust Disease Prognosis via Diagnostic Knowledge Preservation: A Sequential Learning Approach

Rajamohan, Haresh Rengaraj; Xu, Yanqi; Zhu, Weicheng; Kijowski, Richard; Cho, Kyunghyun; Geras, Krzysztof J; Razavian, Narges; Deniz, Cem M

Accurate disease prognosis is essential for patient care but is often hindered by the lack of long-term data. This study explores deep learning training strategies that utilize large, accessible diagnostic datasets to pretrain models aimed at predicting future disease progression in knee osteoarthritis (OA), Alzheimer's disease (AD), and breast cancer (BC). While diagnostic pretraining improves prognostic task performance, naive fine-tuning for prognosis can cause 'catastrophic forgetting,' where the model's original diagnostic accuracy degrades, a significant patient safety concern in real-world settings. To address this, we propose a sequential learning strategy with experience replay. We used cohorts with knee radiographs, brain MRIs, and digital mammograms to predict 4-year structural worsening in OA, 2-year cognitive decline in AD, and 5-year cancer diagnosis in BC. Our results showed that diagnostic pretraining on larger datasets improved prognosis model performance compared to standard baselines, boosting both the Area Under the Receiver Operating Characteristic curve (AUROC) (e.g., Knee OA external: 0.77 vs 0.747; Breast Cancer: 0.874 vs 0.848) and the Area Under the Precision-Recall Curve (AUPRC) (e.g., Alzheimer's Disease: 0.752 vs 0.683). Additionally, a sequential learning approach with experience replay achieved prognostic performance comparable to dedicated single-task models (e.g., Breast Cancer AUROC 0.876 vs 0.874) while also preserving diagnostic ability. This method maintained high diagnostic accuracy (e.g., Breast Cancer Balanced Accuracy 50.4% vs 50.9% for a dedicated diagnostic model), unlike simpler multitask methods prone to catastrophic forgetting (e.g., 37.7%). Our findings show that leveraging large diagnostic datasets is a reliable and data-efficient way to enhance prognostic models while maintaining essential diagnostic skills.

PMCID: 12486016

PMID: 41040735

CID: 5973072

Studies in health technology & informatics. 2025:329:1884-1885.DOI: 10.3233/SHTI251263

Meta-Repository of Screening Mammography Classifiers

Chłędowski, Jakub; Stadnick, Benjamin; Witowski, Jan; Rajiv, Vishwaesh; Shamout, Farah E; Cho, Kyunghyun; Geras, Krzysztof J

We present a meta-repository that enables reproducible benchmarking of AI classifiers for breast cancer screening using mammography data. It includes 5 open-source models evaluated across 7 international datasets, addressing key challenges in model generalization and transparency. By providing a standardized evaluation framework and enabling cross-platform reproducibility, our work supports both research progress and clinical integration. The meta-repository is available at https://www.github.com/nyukat/mammography_metarepository.

PMID: 40776280

ISSN: 1879-8365

CID: 5905382

IEEE transactions on medical imaging. 2024:43(1):351-365.DOI: 10.1109/TMI.2023.3302799

An efficient deep neural network to classify large 3D images with small objects

Park, Jungkyu; Chledowski, Jakub; Jastrzebski, Stanislaw; Witowski, Jan; Xu, Yanqi; Du, Linda; Gaddam, Sushma; Kim, Eric; Lewin, Alana; Parikh, Ujas; Plaunova, Anastasia; Chen, Sardius; Millet, Alexandra; Park, James; Pysarenko, Kristine; Patel, Shalin; Goldberg, Julia; Wegener, Melanie; Moy, Linda; Heacock, Laura; Reig, Beatriu; Geras, Krzysztof J

3D imaging enables accurate diagnosis by providing spatial information about organ anatomy. However, using 3D images to train AI models is computationally challenging because they consist of 10x or 100x more pixels than their 2D counterparts. To be trained with high-resolution 3D images, convolutional neural networks resort to downsampling them or projecting them to 2D. We propose an effective alternative, a neural network that enables efficient classification of full-resolution 3D medical images. Compared to off-the-shelf convolutional neural networks, our network, 3D Globally-Aware Multiple Instance Classifier (3D-GMIC), uses 77.98%-90.05% less GPU memory and 91.23%-96.02% less computation. While it is trained only with image-level labels, without segmentation labels, it explains its predictions by providing pixel-level saliency maps. On a dataset collected at NYU Langone Health, including 85,526 patients with full-field 2D mammography (FFDM), synthetic 2D mammography, and 3D mammography, 3D-GMIC achieves an AUC of 0.831 (95% CI: 0.769-0.887) in classifying breasts with malignant findings using 3D mammography. This is comparable to the performance of GMIC on FFDM (0.816, 95% CI: 0.737-0.878) and synthetic 2D (0.826, 95% CI: 0.754-0.884), which demonstrates that 3D-GMIC successfully classified large 3D images despite focusing computation on a smaller percentage of its input compared to GMIC. Therefore, 3D-GMIC identifies and utilizes extremely small regions of interest from 3D images consisting of hundreds of millions of pixels, dramatically reducing associated computational challenges. 3D-GMIC generalizes well to BCS-DBT, an external dataset from Duke University Hospital, achieving an AUC of 0.848 (95% CI: 0.798-0.896).

PMID: 37590109

ISSN: 1558-254X

CID: 5588742

Exploring synthesizing 2D mammograms from 3D digital breast tomosynthesis images

Chapter by: Chledowski, Jakub; Park, Jungkyu; Geras, Krzysztof J.

in: 2023 International Conference on Digital Image Computing: Techniques and Applications, DICTA 2023

[S.l.] : Institute of Electrical and Electronics Engineers Inc., 2023

pp. 562-569

ISBN: 9798350382204

CID: 5701252

Research Square. 2023.DOI: 10.21203/rs.3.rs-3035772/v1

Improving Information Extraction from Pathology Reports using Named Entity Recognition

Zeng, Ken G; Dutt, Tarun; Witowski, Jan; Kranthi Kiran, G V; Yeung, Frank; Kim, Michelle; Kim, Jesi; Pleasure, Mitchell; Moczulski, Christopher; Lopez, L Julian Lechuga; Zhang, Hao; Harbi, Mariam Al; Shamout, Farah E; Major, Vincent J; Heacock, Laura; Moy, Linda; Schnabel, Freya; Pak, Linda M; Shen, Yiqiu; Geras, Krzysztof J

Pathology reports are considered the gold standard in medical research due to their comprehensive and accurate diagnostic information. Natural language processing (NLP) techniques have been developed to automate information extraction from pathology reports. However, existing studies suffer from two significant limitations. First, they typically frame their tasks as report classification, which restricts the granularity of extracted information. Second, they often fail to generalize to unseen reports due to variations in language, negation, and human error. To overcome these challenges, we propose a BERT (bidirectional encoder representations from transformers) named entity recognition (NER) system to extract key diagnostic elements from pathology reports. We also introduce four data augmentation methods to improve the robustness of our model. Trained and evaluated on 1438 annotated breast pathology reports, acquired from a large medical center in the United States, our BERT model trained with data augmentation achieves an entity F1-score of 0.916 on an internal test set, surpassing the BERT baseline (0.843). We further assessed the model's generalizability using an external validation dataset from the United Arab Emirates, where our model maintained satisfactory performance (F1-score 0.860). Our findings demonstrate that our NER systems can effectively extract fine-grained information from widely diverse medical reports, offering the potential for large-scale information extraction in a wide range of medical and AI research. We publish our code at https://github.com/nyukat/pathology_extraction.

PMCID: 10350195

PMID: 37461545

CID: 5588752

Nature medicine. 2023:29(7):1814-1820.DOI: 10.1038/s41591-023-02437-x

Enhancing the reliability and accuracy of AI-enabled diagnosis via complementarity-driven deferral to clinicians

Dvijotham, Krishnamurthy Dj; Winkens, Jim; Barsbey, Melih; Ghaisas, Sumedh; Stanforth, Robert; Pawlowski, Nick; Strachan, Patricia; Ahmed, Zahra; Azizi, Shekoofeh; Bachrach, Yoram; Culp, Laura; Daswani, Mayank; Freyberg, Jan; Kelly, Christopher; Kiraly, Atilla; Kohlberger, Timo; McKinney, Scott; Mustafa, Basil; Natarajan, Vivek; Geras, Krzysztof; Witowski, Jan; Qin, Zhi Zhen; Creswell, Jacob; Shetty, Shravya; Sieniek, Marcin; Spitz, Terry; Corrado, Greg; Kohli, Pushmeet; Cemgil, Taylan; Karthikesalingam, Alan

Predictive artificial intelligence (AI) systems based on deep learning have been shown to achieve expert-level identification of diseases in multiple medical imaging settings, but can make errors in cases accurately diagnosed by clinicians and vice versa. We developed Complementarity-Driven Deferral to Clinical Workflow (CoDoC), a system that can learn to decide between the opinion of a predictive AI model and a clinical workflow. CoDoC enhances accuracy relative to clinician-only or AI-only baselines in clinical workflows that screen for breast cancer or tuberculosis (TB). For breast cancer screening, compared to double reading with arbitration in a screening program in the UK, CoDoC reduced false positives by 25% at the same false-negative rate, while achieving a 66% reduction in clinician workload. For TB triaging, compared to standalone AI and clinical workflows, CoDoC achieved a 5-15% reduction in false positives at the same false-negative rate for three of five commercially available predictive AI systems. To facilitate the deployment of CoDoC in novel futuristic clinical settings, we present results showing that CoDoC's performance gains are sustained across several axes of variation (imaging modality, clinical setting and predictive AI system) and discuss the limitations of our evaluation and where further validation would be needed. We provide an open-source implementation to encourage further research and application.

PMID: 37460754

ISSN: 1546-170X

CID: 5535572

JAMA network open. 2023:6(2).DOI: 10.1001/jamanetworkopen.2023.0524

A Competition, Benchmark, Code, and Data for Using Artificial Intelligence to Detect Lesions in Digital Breast Tomosynthesis

Konz, Nicholas; Buda, Mateusz; Gu, Hanxue; Saha, Ashirbani; Yang, Jichen; Chledowski, Jakub; Park, Jungkyu; Witowski, Jan; Geras, Krzysztof J; Shoshan, Yoel; Gilboa-Solomon, Flora; Khapun, Daniel; Ratner, Vadim; Barkan, Ella; Ozery-Flato, Michal; Martí, Robert; Omigbodun, Akinyinka; Marasinou, Chrysostomos; Nakhaei, Noor; Hsu, William; Sahu, Pranjal; Hossain, Md Belayat; Lee, Juhun; Santos, Carlos; Przelaskowski, Artur; Kalpathy-Cramer, Jayashree; Bearce, Benjamin; Cha, Kenny; Farahani, Keyvan; Petrick, Nicholas; Hadjiiski, Lubomir; Drukker, Karen; Armato, Samuel G; Mazurowski, Maciej A

IMPORTANCE: An accurate and robust artificial intelligence (AI) algorithm for detecting cancer in digital breast tomosynthesis (DBT) could significantly improve detection accuracy and reduce health care costs worldwide. OBJECTIVES: To make training and evaluation data for the development of AI algorithms for DBT analysis available, to develop well-defined benchmarks, and to create publicly available code for existing methods. DESIGN, SETTING, AND PARTICIPANTS: This diagnostic study is based on a multi-institutional international grand challenge in which research teams developed algorithms to detect lesions in DBT. A data set of 22 032 reconstructed DBT volumes was made available to research teams. Phase 1, in which teams were provided 700 scans from the training set, 120 from the validation set, and 180 from the test set, took place from December 2020 to January 2021, and phase 2, in which teams were given the full data set, took place from May to July 2021. MAIN OUTCOMES AND MEASURES: The overall performance was evaluated by mean sensitivity for biopsied lesions using only DBT volumes with biopsied lesions; ties were broken by including all DBT volumes. RESULTS: A total of 8 teams participated in the challenge. The team with the highest mean sensitivity for biopsied lesions was the NYU B-Team, with 0.957 (95% CI, 0.924-0.984), and the second-place team, ZeDuS, had a mean sensitivity of 0.926 (95% CI, 0.881-0.964). When the results were aggregated, the mean sensitivity for all submitted algorithms was 0.879; for only those who participated in phase 2, it was 0.926. CONCLUSIONS AND RELEVANCE: In this diagnostic study, an international competition produced algorithms with high sensitivity for using AI to detect lesions on DBT images. A standardized performance benchmark for the detection task using publicly available clinical imaging data was released, with detailed descriptions and analyses of submitted algorithms accompanied by a public release of their predictions and code for selected methods. These resources will serve as a foundation for future research on computer-assisted diagnosis methods for DBT, significantly lowering the barrier of entry for new researchers.

PMCID: 9951043

PMID: 36821110

ISSN: 2574-3805

CID: 5448222

MedFuse: Multi-modal fusion with clinical time-series data and chest X-ray images

Chapter by: Hayat, Nasir; Geras, Krzysztof J.; Shamout, Farah E.

in: Proceedings of Machine Learning Research

[S.l.] : ML Research Press, 2022

pp. 479-503

CID: 5550582