NYUHSL Faculty Bibliography

Searched for:

in-biosketch:true

person:oermae01

Total Results:

149

Nature medicine. 2025:31(2):618-626. DOI: 10.1038/s41591-024-03445-1

Medical large language models are vulnerable to data-poisoning attacks

Alber, Daniel Alexander; Yang, Zihao; Alyakin, Anton; Yang, Eunice; Rai, Sumedha; Valliani, Aly A; Zhang, Jeff; Rosenbaum, Gabriel R; Amend-Thomas, Ashley K; Kurland, David B; Kremer, Caroline M; Eremiev, Alexander; Negash, Bruck; Wiggan, Daniel D; Nakatsuka, Michelle A; Sangwon, Karl L; Neifert, Sean N; Khan, Hammad A; Save, Akshay Vinod; Palla, Adhith; Grin, Eric A; Hedman, Monika; Nasir-Moin, Mustafa; Liu, Xujin Chris; Jiang, Lavender Yao; Mankowski, Michal A; Segev, Dorry L; Aphinyanaphongs, Yindalon; Riina, Howard A; Golfinos, John G; Orringer, Daniel A; Kondziolka, Douglas; Oermann, Eric Karl

The adoption of large language models (LLMs) in healthcare demands a careful analysis of their potential to spread false medical knowledge. Because LLMs ingest massive volumes of data from the open Internet during training, they are potentially exposed to unverified medical knowledge that may include deliberately planted misinformation. Here, we perform a threat assessment that simulates a data-poisoning attack against The Pile, a popular dataset used for LLM development. We find that replacement of just 0.001% of training tokens with medical misinformation results in harmful models more likely to propagate medical errors. Furthermore, we discover that corrupted models match the performance of their corruption-free counterparts on open-source benchmarks routinely used to evaluate medical LLMs. Using biomedical knowledge graphs to screen medical LLM outputs, we propose a harm mitigation strategy that captures 91.9% of harmful content (F1 = 85.7%). Our algorithm provides a unique method to validate stochastically generated LLM outputs against hard-coded relationships in knowledge graphs. In view of current calls for improved data provenance and transparent LLM development, we hope to raise awareness of emergent risks from LLMs trained indiscriminately on web-scraped data, particularly in healthcare where misinformation can potentially compromise patient safety.

PMID: 39779928

ISSN: 1546-170x

CID: 5782182

Neurosurgery. 2025:97(2):387-398. DOI: 10.1227/neu.0000000000003354

Augmenting Large Language Models With Automated, Bibliometrics-Powered Literature Search for Knowledge Distillation: A Pilot Study for Common Spinal Pathologies

Kurland, David B; Alber, Daniel A; Palla, Adhith; de Souza, Daniel N; Lau, Darryl; Laufer, Ilya; Frempong-Boadu, Anthony K; Kondziolka, Douglas; Oermann, Eric K

BACKGROUND AND OBJECTIVES/OBJECTIVE: Scholarly output is accelerating in medical domains, making it challenging to keep up with the latest neurosurgical literature. The emergence of large language models (LLMs) has facilitated rapid, high-quality text summarization. However, LLMs cannot autonomously conduct literature reviews and are prone to hallucinating source material. We devised a novel strategy that combines Reference Publication Year Spectroscopy (a bibliometric technique for identifying foundational articles within a corpus) with LLMs to automatically summarize and cite salient details from articles. We demonstrate our approach for four common spinal conditions in a proof of concept. METHODS: Reference Publication Year Spectroscopy identified seminal articles from the corpora of literature for cervical myelopathy, lumbar radiculopathy, lumbar stenosis, and adjacent segment disease. The article text was split into 1024-token chunks. Queries from three knowledge domains (surgical management, pathophysiology, and natural history) were constructed. The most relevant article chunks for each query were retrieved from a vector database using chain-of-thought prompting. LLMs automatically summarized the literature into a comprehensive narrative with fully referenced facts and statistics. Information was verified through manual review, and spine surgery faculty were surveyed for qualitative feedback. RESULTS: Our tandem approach cost less than $1 for each condition and ran within 5 minutes. Generative Pre-trained Transformer-4 was the best-performing model, with a near-perfect 97.5% citation accuracy. Surveys of spine faculty helped refine the prompting scheme to improve the cohesion and accessibility of summaries. The final artificial intelligence-generated text provided high-fidelity summaries of each pathology's most clinically relevant information. CONCLUSION/CONCLUSIONS: We demonstrate the rapid, automated summarization of seminal articles for four common spinal pathologies, with a generalizable workflow implemented using consumer-grade hardware. Our tandem strategy fuses bibliometrics and artificial intelligence to bridge the gap toward fully automated knowledge distillation, obviating the need for manual literature review and article selection.

PMID: 40662770

ISSN: 1524-4040

CID: 5897082

World neurosurgery. 2024:192:246-247. DOI: 10.1016/j.wneu.2024.09.094

Predicting STA-MCA Anastomosis Success: Insights from FLOW 800 Hemodynamics [Letter]

Sangwon, Karl L; Oermann, Eric K; Nossek, Erez

PMID: 39307270

ISSN: 1878-8769

CID: 5766452

Journal of medical Internet research. 2024:26. DOI: 10.2196/64226

Economics and Equity of Large Language Models: Health Care Perspective

Nagarajan, Radha; Kondo, Midori; Salas, Franz; Sezgin, Emre; Yao, Yuan; Klotzman, Vanessa; Godambe, Sandip A; Khan, Naqi; Limon, Alfonso; Stephenson, Graham; Taraman, Sharief; Walton, Nephi; Ehwerhemuepha, Louis; Pandit, Jay; Pandita, Deepti; Weiss, Michael; Golden, Charles; Gold, Adam; Henderson, John; Shippy, Angela; Celi, Leo Anthony; Hogan, William R; Oermann, Eric K; Sanger, Terence; Martel, Steven

Large language models (LLMs) continue to exhibit noteworthy capabilities across a spectrum of areas, including emerging proficiencies across the health care continuum. Successful LLM implementation and adoption depend on digital readiness, modern infrastructure, a trained workforce, privacy, and an ethical regulatory landscape. These factors can vary significantly across health care ecosystems, dictating the choice of a particular LLM implementation pathway. This perspective discusses 3 LLM implementation pathways, namely the training from scratch pathway (TSP), fine-tuned pathway (FTP), and out-of-the-box pathway (OBP), as potential onboarding points for health systems while facilitating equitable adoption. The choice of a particular pathway is governed by needs as well as affordability. Therefore, the risks, benefits, and economics of these pathways across 4 major cloud service providers (Amazon, Microsoft, Google, and Oracle) are presented. While cost comparisons, such as on-demand and spot pricing across the cloud service providers for the 3 pathways, are presented for completeness, the usefulness of managed services and cloud enterprise tools is elucidated. Managed services can complement the traditional workforce and expertise, while enterprise tools, such as federated learning, can overcome sample size challenges when implementing LLMs using health care data. Of the 3 pathways, TSP is expected to be the most resource-intensive regarding infrastructure and workforce while providing maximum customization, enhanced transparency, and performance. Because TSP trains the LLM using enterprise health care data, it is expected to harness the digital signatures of the population served by the health care system with the potential to impact outcomes. The use of pretrained models in FTP is a limitation that may impact performance, because the training data used in the pretrained model may have hidden bias and may not necessarily be health care-related. However, FTP provides a balance between customization, cost, and performance. While OBP can be rapidly deployed, it provides minimal customization and transparency without guaranteeing long-term availability. OBP may also present challenges in interfacing seamlessly with downstream applications in health care settings with variations in pricing and use over time. Lack of customization in OBP can significantly limit its ability to impact outcomes. Finally, potential applications of LLMs in health care, including conversational artificial intelligence, chatbots, summarization, and machine translation, are highlighted. While the 3 implementation pathways discussed in this perspective have the potential to facilitate equitable adoption and democratization of LLMs, transitions between them may be necessary as the needs of health systems evolve. Understanding the economics and trade-offs of these onboarding pathways can guide their strategic adoption and demonstrate value while impacting health care outcomes favorably.

PMID: 39541580

ISSN: 1438-8871

CID: 5753562

Clinical transplantation. 2024:38(11). DOI: 10.1111/ctr.70018

Hospitalization and Hospitalized Delirium Are Associated With Decreased Access to Kidney Transplantation and Increased Risk of Waitlist Mortality

Long, Jane J; Hong, Jingyao; Liu, Yi; Nalatwad, Akanksha; Li, Yiting; Ghildayal, Nidhi; Johnston, Emily A; Schwartzberg, Jordan; Ali, Nicole; Oermann, Eric; Mankowski, Michal; Gelb, Bruce E; Chanan, Emily L; Chodosh, Joshua L; Mathur, Aarti; Segev, Dorry L; McAdams-DeMarco, Mara A

BACKGROUND: Kidney transplant (KT) candidates often experience hospitalizations, increasing their delirium risk. Hospitalizations and delirium are associated with worse post-KT outcomes, yet their relationship with pre-KT outcomes is less clear. Pre-KT delirium may worsen access to KT due to its negative impact on cognition and ability to maintain overall health. METHODS: Using a prospective cohort of 2374 KT candidates evaluated at a single center (2009-2020), we abstracted hospitalizations and associated delirium records after listing via chart review. We evaluated associations between waitlist mortality and likelihood of KT with hospitalizations and hospitalized delirium using competing risk models and tested whether associations differed by gerontologic factors. RESULTS: < 0.001), with those aged ≥65 having a 61% lower likelihood of KT. CONCLUSION/CONCLUSIONS: Hospitalization and delirium are associated with worse pre-KT outcomes and have serious implications for candidates' access to KT. Providers should work to reduce preventable instances of delirium.

PMID: 39498973

ISSN: 1399-0012

CID: 5766752

Clinical transplantation. 2024:38(10). DOI: 10.1111/ctr.15466

ChatGPT Solving Complex Kidney Transplant Cases: A Comparative Study With Human Respondents

Mankowski, Michal A; Jaffe, Ian S; Xu, Jingzhi; Bae, Sunjae; Oermann, Eric K; Aphinyanaphongs, Yindalon; McAdams-DeMarco, Mara A; Lonze, Bonnie E; Orandi, Babak J; Stewart, Darren; Levan, Macey; Massie, Allan; Gentry, Sommer; Segev, Dorry L

INTRODUCTION/BACKGROUND: ChatGPT has shown the ability to answer clinical questions in general medicine but may be constrained by the specialized nature of kidney transplantation. Thus, it is important to explore how ChatGPT can be used in kidney transplantation and how its knowledge compares to human respondents. METHODS: We prompted ChatGPT versions 3.5, 4, and 4 Visual (4 V) with 12 multiple-choice questions related to six kidney transplant cases from 2013 to 2015 American Society of Nephrology (ASN) fellowship program quizzes. We compared the performance of ChatGPT with US nephrology fellowship program directors, nephrology fellows, and the audience of the ASN's annual Kidney Week meeting. RESULTS: Overall, ChatGPT 4 V correctly answered 10 out of 12 questions, showing a performance level comparable to nephrology fellows (group majority correctly answered 9 of 12 questions) and training program directors (11 of 12). This surpassed ChatGPT 4 (7 of 12 correct) and 3.5 (5 of 12). All three ChatGPT versions failed to correctly answer questions where the consensus among human respondents was low. CONCLUSION/CONCLUSIONS: Each iterative version of ChatGPT performed better than the prior version, with version 4 V achieving performance on par with nephrology fellows and training program directors. While it shows promise in understanding and answering kidney transplantation questions, ChatGPT should be seen as a complementary tool to human expertise rather than a replacement.

PMCID: 11441623

PMID: 39329220

ISSN: 1399-0012

CID: 5714092

Nature communications. 2024:15(1). DOI: 10.1038/s41467-024-52414-2

Longitudinal deep neural networks for assessing metastatic brain cancer on a large open benchmark

Link, Katherine E; Schnurman, Zane; Liu, Chris; Kwon, Young Joon Fred; Jiang, Lavender Yao; Nasir-Moin, Mustafa; Neifert, Sean; Alzate, Juan Diego; Bernstein, Kenneth; Qu, Tanxia; Chen, Viola; Yang, Eunice; Golfinos, John G; Orringer, Daniel; Kondziolka, Douglas; Oermann, Eric Karl

The detection and tracking of metastatic cancer over the lifetime of a patient remains a major challenge in clinical trials and real-world care. Advances in deep learning combined with massive datasets may enable the development of tools that can address this challenge. We present NYUMets-Brain, the world's largest, longitudinal, real-world dataset of cancer consisting of the imaging, clinical follow-up, and medical management of 1,429 patients. Using this dataset we developed Segmentation-Through-Time, a deep neural network which explicitly utilizes the longitudinal structure of the data and obtained state-of-the-art results at small (<10 mm³) metastases detection and segmentation. We also demonstrate that the monthly rate of change of brain metastases over time is strongly predictive of overall survival (HR 1.27, 95% CI 1.18-1.38). We are releasing the dataset, codebase, and model weights for other cancer researchers to build upon these results and to serve as a public benchmark.

PMCID: 11408643

PMID: 39289405

ISSN: 2041-1723

CID: 5720652

Neurosurgery practice. 2024:5(3). DOI: 10.1227/neuprac.0000000000000092

The Evolution of Pediatric Spine Surgery: A Bibliometric Analysis of Publications From 1902 to 2023

Mir, Jamshaid M; Kurland, David B; Cheung, Alexander T M; Liu, Albert; Shlobin, Nathan A; Alber, Daniel; Rai, Sumedha; Jain, Vasvi; Rodriguez-Olaverri, Juan C; Anderson, Richard C E; Lau, Darryl; Kondziolka, Douglas; Oermann, Eric K

BACKGROUND AND OBJECTIVES/OBJECTIVE: Pediatric spine surgery has evolved considerably over the past century. No previous study has conducted a bibliometric analysis of the corpus of pediatric spine surgery. We used big data and advanced bibliometric analyses to evaluate trends in the progression of pediatric spine surgery as a distinct field since the beginning of the 20th century. METHODS: A Web of Science query was designed to capture the representative corpus of pediatric spine literature. Statistical and bibliometric analyses were performed using various Python packages and the Bibliometrix R package. RESULTS: The collection, published from 1902 to 2023, comprised a total of 11 861 articles from 61 journals and 32 715 unique authors. The overall annual growth rate for publications was 5.08%. An upsurge in publications was seen in the 1980s, after the advent of specialty and subspecialty journals. Illustratively, over 90% of all articles pertaining to pediatric spine surgery were published in the past 3 decades. International and domestic collaboration also increased exponentially over this time period. Reference publication year spectroscopy allowed us to identify 75 articles that comprise the historical roots of modern pediatric spine surgery. There was a recent lexical evolution of topics and terms toward alignment, outcomes, and patient-centric terms. Coauthorship among under-represented groups has increased since 1990 but remains low, with disparities persisting across journals. CONCLUSION/CONCLUSIONS: This comprehensive bibliometric analysis of the corpus of pediatric spine surgery offers insight into the evolving landscape of research, authorship, and publication trends over the past century. Advancements in the understanding of the natural history and technology have led the field to become increasingly outcomes focused, all of which have been fueled by pioneering authors. While diversity among authors is improving, under-representation of various groups persists, indicating a critical role for further outreach and promotion.

PMCID: 11783662

PMID: 39959902

ISSN: 2834-4383

CID: 5866242

Journal of neurosurgery. 2024:1-10. DOI: 10.3171/2024.4.JNS24713

Assessing superficial temporal artery-middle cerebral artery anastomosis patency using FLOW 800 hemodynamics

Sangwon, Karl L; Nguyen, Matthew; Wiggan, Daniel D; Negash, Bruck; Alber, Daniel A; Liu, Xujin Chris; Liu, Albert; Rabbin-Birnbaum, Corinne; Sharashidze, Vera; Baranoski, Jacob; Raz, Eytan; Shapiro, Maksim; Rutledge, Caleb; Nelson, Peter Kim; Riina, Howard; Russin, Jonathan; Oermann, Eric K; Nossek, Erez

OBJECTIVE: The objective of this study was to investigate the use of indocyanine green videoangiography with FLOW 800 hemodynamic parameters intraoperatively during superficial temporal artery-middle cerebral artery (STA-MCA) bypass surgery to predict patency prior to anastomosis performance. METHODS: A retrospective and exploratory data analysis was conducted using FLOW 800 software prior to anastomosis to assess four regions of interest (ROIs; proximal and distal recipients and adjacent and remote gyri) for four hemodynamic parameters (speed, delay, rise time, and time to peak). Medical records were used to classify patients into flow and no-flow groups based on immediate or perioperative anastomosis patency. Hemodynamic parameters were compared using univariate and multivariate analyses. Principal component analysis was used to identify high risk of no flow (HRnf) and low risk of no flow (LRnf) groups, correlated with prospective angiographic follow-ups. Machine learning models were fitted to predict patency using FLOW 800 features, and the a posteriori effect of complication risk of those features was computed. RESULTS: A total of 39 cases underwent STA-MCA bypass surgery with complete FLOW 800 data collection. Thirty-five cases demonstrated flow after anastomosis revascularization and were compared with 4 cases with no flow after revascularization. Proximal and distal recipient speeds were significantly different between the no-flow and flow groups (proximal: 238.3 ± 120.8 and 138.5 ± 93.6, respectively [p < 0.001]; distal: 241.0 ± 117.0 and 142.1 ± 103.8, respectively [p < 0.05]). Based on principal component analysis, the HRnf group (n = 10) was characterized by high-flow speed (> 75th percentile) in all ROIs, whereas the LRnf group (n = 10) had contrasting patterns. In prospective long-term follow-up, 6 of 9 cases in the HRnf group, including the original no-flow cases, had no or low flow, whereas 8 of 8 cases in the LRnf group maintained robust flow. Machine learning models predicted patency failure with a mean F1 score of 0.930 and consistently relied on proximal recipient speed as the most important feature. Computation of posterior likelihood showed a 95.29% chance of patients having long-term patency given a lower proximal speed. CONCLUSIONS: These results suggest that a high proximal speed measured in the recipient vessel prior to anastomosis can elevate the risk of perioperative no flow and long-term reduction of flow. With an increased dataset size, continued FLOW 800-based ROI metric analysis could be used to guide intraoperative anastomosis site selection prior to anastomosis and predict patency outcome.

PMID: 39151199

ISSN: 1933-0693

CID: 5727032

Patterns. 2024:5(8). DOI: 10.1016/j.patter.2024.101028

Concepts and applications of digital twins in healthcare and medicine

Zhang, Kang; Zhou, Hong-Yu; Baptista-Hon, Daniel T; Gao, Yuanxu; Liu, Xiaohong; Oermann, Eric; Xu, Sheng; Jin, Shengwei; Zhang, Jian; Sun, Zhuo; Yin, Yun; Razmi, Ronald M; Loupy, Alexandre; Beck, Stephan; Qu, Jia; Wu, Joseph

The digital twin (DT) is a concept widely used in industry to create digital replicas of physical objects or systems. The dynamic, bi-directional link between the physical entity and its digital counterpart enables a real-time update of the digital entity. It can predict perturbations related to the physical object's function. The obvious applications of DTs in healthcare and medicine are extremely attractive prospects that have the potential to revolutionize patient diagnosis and treatment. However, challenges including technical obstacles, biological heterogeneity, and ethical considerations make it difficult to achieve the desired goal. Advances in multi-modal deep learning methods, embodied AI agents, and the metaverse may mitigate some difficulties. Here, we discuss the basic concepts underlying DTs, the requirements for implementing DTs in medicine, and their current and potential healthcare uses. We also provide our perspective on five hallmarks for a healthcare DT system to advance research in this field.

PMCID: 11368703

PMID: 39233690

ISSN: 2666-3899

CID: 5688062