An efficient deep neural network to classify large 3D images with small objects

Park, Jungkyu; Chledowski, Jakub; Jastrzebski, Stanislaw; Witowski, Jan; Xu, Yanqi; Du, Linda; Gaddam, Sushma; Kim, Eric; Lewin, Alana; Parikh, Ujas; Plaunova, Anastasia; Chen, Sardius; Millet, Alexandra; Park, James; Pysarenko, Kristine; Patel, Shalin; Goldberg, Julia; Wegener, Melanie; Moy, Linda; Heacock, Laura; Reig, Beatriu; Geras, Krzysztof J
3D imaging enables accurate diagnosis by providing spatial information about organ anatomy. However, using 3D images to train AI models is computationally challenging because they consist of 10x or 100x more pixels than their 2D counterparts. To be trained with high-resolution 3D images, convolutional neural networks resort to downsampling them or projecting them to 2D. We propose an effective alternative, a neural network that enables efficient classification of full-resolution 3D medical images. Compared to off-the-shelf convolutional neural networks, our network, 3D Globally-Aware Multiple Instance Classifier (3D-GMIC), uses 77.98%-90.05% less GPU memory and 91.23%-96.02% less computation. While it is trained only with image-level labels, without segmentation labels, it explains its predictions by providing pixel-level saliency maps. On a dataset collected at NYU Langone Health, including 85,526 patients with full-field 2D mammography (FFDM), synthetic 2D mammography, and 3D mammography, 3D-GMIC achieves an AUC of 0.831 (95% CI: 0.769-0.887) in classifying breasts with malignant findings using 3D mammography. This is comparable to the performance of GMIC on FFDM (0.816, 95% CI: 0.737-0.878) and synthetic 2D (0.826, 95% CI: 0.754-0.884), which demonstrates that 3D-GMIC successfully classified large 3D images despite focusing computation on a smaller percentage of its input compared to GMIC. Therefore, 3D-GMIC identifies and utilizes extremely small regions of interest from 3D images consisting of hundreds of millions of pixels, dramatically reducing associated computational challenges. 3D-GMIC generalizes well to BCS-DBT, an external dataset from Duke University Hospital, achieving an AUC of 0.848 (95% CI: 0.798-0.896).
PMID: 37590109
ISSN: 1558-254X
CID: 5588742

Improving Information Extraction from Pathology Reports using Named Entity Recognition

Zeng, Ken G; Dutt, Tarun; Witowski, Jan; Kranthi Kiran, G V; Yeung, Frank; Kim, Michelle; Kim, Jesi; Pleasure, Mitchell; Moczulski, Christopher; Lopez, L Julian Lechuga; Zhang, Hao; Harbi, Mariam Al; Shamout, Farah E; Major, Vincent J; Heacock, Laura; Moy, Linda; Schnabel, Freya; Pak, Linda M; Shen, Yiqiu; Geras, Krzysztof J
Pathology reports are considered the gold standard in medical research due to their comprehensive and accurate diagnostic information. Natural language processing (NLP) techniques have been developed to automate information extraction from pathology reports. However, existing studies suffer from two significant limitations. First, they typically frame their tasks as report classification, which restricts the granularity of extracted information. Second, they often fail to generalize to unseen reports due to variations in language, negation, and human error. To overcome these challenges, we propose a BERT (bidirectional encoder representations from transformers) named entity recognition (NER) system to extract key diagnostic elements from pathology reports. We also introduce four data augmentation methods to improve the robustness of our model. Trained and evaluated on 1438 annotated breast pathology reports, acquired from a large medical center in the United States, our BERT model trained with data augmentation achieves an entity F1-score of 0.916 on an internal test set, surpassing the BERT baseline (0.843). We further assessed the model's generalizability using an external validation dataset from the United Arab Emirates, where our model maintained satisfactory performance (F1-score 0.860). Our findings demonstrate that our NER systems can effectively extract fine-grained information from widely diverse medical reports, offering the potential for large-scale information extraction in a wide range of medical and AI research. We publish our code at
PMID: 37461545
CID: 5588752

Enhancing the reliability and accuracy of AI-enabled diagnosis via complementarity-driven deferral to clinicians

Dvijotham, Krishnamurthy Dj; Winkens, Jim; Barsbey, Melih; Ghaisas, Sumedh; Stanforth, Robert; Pawlowski, Nick; Strachan, Patricia; Ahmed, Zahra; Azizi, Shekoofeh; Bachrach, Yoram; Culp, Laura; Daswani, Mayank; Freyberg, Jan; Kelly, Christopher; Kiraly, Atilla; Kohlberger, Timo; McKinney, Scott; Mustafa, Basil; Natarajan, Vivek; Geras, Krzysztof; Witowski, Jan; Qin, Zhi Zhen; Creswell, Jacob; Shetty, Shravya; Sieniek, Marcin; Spitz, Terry; Corrado, Greg; Kohli, Pushmeet; Cemgil, Taylan; Karthikesalingam, Alan
Predictive artificial intelligence (AI) systems based on deep learning have been shown to achieve expert-level identification of diseases in multiple medical imaging settings, but can make errors in cases accurately diagnosed by clinicians and vice versa. We developed Complementarity-Driven Deferral to Clinical Workflow (CoDoC), a system that can learn to decide between the opinion of a predictive AI model and a clinical workflow. CoDoC enhances accuracy relative to clinician-only or AI-only baselines in clinical workflows that screen for breast cancer or tuberculosis (TB). For breast cancer screening, compared to double reading with arbitration in a screening program in the UK, CoDoC reduced false positives by 25% at the same false-negative rate, while achieving a 66% reduction in clinician workload. For TB triaging, compared to standalone AI and clinical workflows, CoDoC achieved a 5-15% reduction in false positives at the same false-negative rate for three of five commercially available predictive AI systems. To facilitate the deployment of CoDoC in novel futuristic clinical settings, we present results showing that CoDoC's performance gains are sustained across several axes of variation (imaging modality, clinical setting and predictive AI system) and discuss the limitations of our evaluation and where further validation would be needed. We provide an open-source implementation to encourage further research and application.
PMID: 37460754
ISSN: 1546-170X
CID: 5535572

A Competition, Benchmark, Code, and Data for Using Artificial Intelligence to Detect Lesions in Digital Breast Tomosynthesis

Konz, Nicholas; Buda, Mateusz; Gu, Hanxue; Saha, Ashirbani; Yang, Jichen; Chledowski, Jakub; Park, Jungkyu; Witowski, Jan; Geras, Krzysztof J; Shoshan, Yoel; Gilboa-Solomon, Flora; Khapun, Daniel; Ratner, Vadim; Barkan, Ella; Ozery-Flato, Michal; Martí, Robert; Omigbodun, Akinyinka; Marasinou, Chrysostomos; Nakhaei, Noor; Hsu, William; Sahu, Pranjal; Hossain, Md Belayat; Lee, Juhun; Santos, Carlos; Przelaskowski, Artur; Kalpathy-Cramer, Jayashree; Bearce, Benjamin; Cha, Kenny; Farahani, Keyvan; Petrick, Nicholas; Hadjiiski, Lubomir; Drukker, Karen; Armato, Samuel G; Mazurowski, Maciej A
IMPORTANCE: An accurate and robust artificial intelligence (AI) algorithm for detecting cancer in digital breast tomosynthesis (DBT) could significantly improve detection accuracy and reduce health care costs worldwide. OBJECTIVES: To make training and evaluation data for the development of AI algorithms for DBT analysis available, to develop well-defined benchmarks, and to create publicly available code for existing methods. DESIGN, SETTING, AND PARTICIPANTS: This diagnostic study is based on a multi-institutional international grand challenge in which research teams developed algorithms to detect lesions in DBT. A data set of 22 032 reconstructed DBT volumes was made available to research teams. Phase 1, in which teams were provided 700 scans from the training set, 120 from the validation set, and 180 from the test set, took place from December 2020 to January 2021, and phase 2, in which teams were given the full data set, took place from May to July 2021. MAIN OUTCOMES AND MEASURES: The overall performance was evaluated by mean sensitivity for biopsied lesions using only DBT volumes with biopsied lesions; ties were broken by including all DBT volumes. RESULTS: A total of 8 teams participated in the challenge. The team with the highest mean sensitivity for biopsied lesions was the NYU B-Team, with 0.957 (95% CI, 0.924-0.984), and the second-place team, ZeDuS, had a mean sensitivity of 0.926 (95% CI, 0.881-0.964). When the results were aggregated, the mean sensitivity for all submitted algorithms was 0.879; for only those who participated in phase 2, it was 0.926. CONCLUSIONS AND RELEVANCE: In this diagnostic study, an international competition produced algorithms with high sensitivity for using AI to detect lesions on DBT images. A standardized performance benchmark for the detection task using publicly available clinical imaging data was released, with detailed descriptions and analyses of submitted algorithms accompanied by a public release of their predictions and code for selected methods. These resources will serve as a foundation for future research on computer-assisted diagnosis methods for DBT, significantly lowering the barrier of entry for new researchers.
PMID: 36821110
ISSN: 2574-3805
CID: 5448222

Improving breast cancer diagnostics with deep learning for MRI

Witowski, Jan; Heacock, Laura; Reig, Beatriu; Kang, Stella K; Lewin, Alana; Pysarenko, Kristine; Patel, Shalin; Samreen, Naziya; Rudnicki, Wojciech; Łuczyńska, Elżbieta; Popiela, Tadeusz; Moy, Linda; Geras, Krzysztof J
Dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) has a high sensitivity in detecting breast cancer but often leads to unnecessary biopsies and patient workup. We used a deep learning (DL) system to improve the overall accuracy of breast cancer diagnosis and personalize management of patients undergoing DCE-MRI. On the internal test set (n = 3936 exams), our system achieved an area under the receiver operating characteristic curve (AUROC) of 0.92 (95% CI: 0.92 to 0.93). In a retrospective reader study, there was no statistically significant difference (P = 0.19) between five board-certified breast radiologists and the DL system (mean ΔAUROC, +0.04 in favor of the DL system). Radiologists' performance improved when their predictions were averaged with DL's predictions [mean ΔAUPRC (area under the precision-recall curve), +0.07]. We demonstrated the generalizability of the DL system using multiple datasets from Poland and the United States. An additional reader study on a Polish dataset showed that the DL system was as robust to distribution shift as radiologists. In subgroup analysis, we observed consistent results across different cancer subtypes and patient demographics. Using decision curve analysis, we showed that the DL system can reduce unnecessary biopsies in the range of clinically relevant risk thresholds. This would lead to avoiding biopsies yielding benign results in up to 20% of all patients with BI-RADS category 4 lesions. Last, we performed an error analysis, investigating situations where DL predictions were mostly incorrect. This exploratory work creates a foundation for deployment and prospective analysis of DL-based models for breast MRI.
PMID: 36170446
ISSN: 1946-6242
CID: 5334352

Estimation of the capillary level input function for dynamic contrast-enhanced MRI of the breast using a deep learning approach

Bae, Jonghyun; Huang, Zhengnan; Knoll, Florian; Geras, Krzysztof; Pandit Sood, Terlika; Feng, Li; Heacock, Laura; Moy, Linda; Kim, Sungheon Gene
PURPOSE: To develop a deep learning approach to estimate the local capillary-level input function (CIF) for pharmacokinetic model analysis of DCE-MRI. METHODS: A deep convolutional network was trained with numerically simulated data to estimate the CIF. The trained network was tested using simulated lesion data and used to estimate voxel-wise CIF for pharmacokinetic model analysis of breast DCE-MRI data using an abbreviated protocol from women with malignant (n = 25) and benign (n = 28) lesions. The estimated parameters were used to build a logistic regression model to detect the malignancy. RESULTS: The pharmacokinetic parameters estimated using the network-predicted CIF from our breast DCE data showed significant differences between the malignant and benign groups for all parameters. Testing the diagnostic performance with the estimated parameters, the conventional approach with arterial input function (AIF) showed an area under the curve (AUC) between 0.76 and 0.87, and the proposed approach with CIF demonstrated similar performance with an AUC between 0.79 and 0.81. CONCLUSION: This study shows the feasibility of estimating voxel-wise CIF using a deep neural network. The proposed approach could eliminate the need to measure AIF manually without compromising the diagnostic performance to detect the malignancy in the clinical setting.
PMID: 35001423
ISSN: 1522-2594
CID: 5118282

Differences between human and machine perception in medical diagnosis

Makino, Taro; Jastrzębski, Stanisław; Oleszkiewicz, Witold; Chacko, Celin; Ehrenpreis, Robin; Samreen, Naziya; Chhor, Chloe; Kim, Eric; Lee, Jiyon; Pysarenko, Kristine; Reig, Beatriu; Toth, Hildegard; Awal, Divya; Du, Linda; Kim, Alice; Park, James; Sodickson, Daniel K; Heacock, Laura; Moy, Linda; Cho, Kyunghyun; Geras, Krzysztof J
Deep neural networks (DNNs) show promise in image-based medical diagnosis, but cannot be fully trusted since they can fail for reasons unrelated to underlying pathology. Humans are less likely to make such superficial mistakes, since they use features that are grounded on medical science. It is therefore important to know whether DNNs use different features than humans. Towards this end, we propose a framework for comparing human and machine perception in medical diagnosis. We frame the comparison in terms of perturbation robustness, and mitigate Simpson's paradox by performing a subgroup analysis. The framework is demonstrated with a case study in breast cancer screening, where we separately analyze microcalcifications and soft tissue lesions. While it is inconclusive whether humans and DNNs use different features to detect microcalcifications, we find that for soft tissue lesions, DNNs rely on high frequency components ignored by radiologists. Moreover, these features are located outside of the region of the images found most suspicious by radiologists. This difference between humans and machines was only visible through subgroup analysis, which highlights the importance of incorporating medical domain knowledge into the comparison.
PMID: 35477730
ISSN: 2045-2322
CID: 5205672

MedFuse: Multi-modal fusion with clinical time-series data and chest X-ray images

Chapter by: Hayat, Nasir; Geras, Krzysztof J.; Shamout, Farah E.
in: Proceedings of Machine Learning Research
[S.l.] : ML Research Press, 2022
pp. 479-503
CID: 5550582

Generative multitask learning mitigates target-causing confounding

Chapter by: Makino, Taro; Geras, Krzysztof J.; Cho, Kyunghyun
in: Advances in Neural Information Processing Systems
[S.l.] : Neural information processing systems foundation, 2022
ISBN: 9781713871088
CID: 5550692

Reducing False-Positive Biopsies using Deep Neural Networks that Utilize both Local and Global Image Context of Screening Mammograms

Wu, Nan; Huang, Zhe; Shen, Yiqiu; Park, Jungkyu; Phang, Jason; Makino, Taro; Gene Kim, S; Cho, Kyunghyun; Heacock, Laura; Moy, Linda; Geras, Krzysztof J
Breast cancer is the most common cancer in women, and hundreds of thousands of unnecessary biopsies are done around the world at a tremendous cost. It is crucial to reduce the rate of biopsies that turn out to be benign tissue. In this study, we build deep neural networks (DNNs) to classify biopsied lesions as being either malignant or benign, with the goal of using these networks as second readers serving radiologists to further reduce the number of false-positive findings. We enhance the performance of DNNs that are trained to learn from small image patches by integrating global context provided in the form of saliency maps learned from the entire image into their reasoning, similar to how radiologists consider global context when evaluating areas of interest. Our experiments are conducted on a dataset of 229,426 screening mammography examinations from 141,473 patients. We achieve an AUC of 0.8 on a test set consisting of 464 benign and 136 malignant lesions.
PMID: 34731338
ISSN: 1618-727X
CID: 5038152