Searched for:

in-biosketch:true

person:oermae01

Total Results:

149


Neural and computational mechanisms underlying one-shot perceptual learning in humans

Hachisuka, Ayaka; Shor, Jonathan D; Liu, Xujin Chris; Friedman, Daniel; Dugan, Patricia; Saez, Ignacio; Panov, Fedor E; Wang, Yao; Doyle, Werner; Devinsky, Orrin; Oermann, Eric K; He, Biyu J
The ability to quickly learn and generalize is one of the brain's most impressive feats and recreating it remains a major challenge for modern artificial intelligence research. One of the most mysterious one-shot learning abilities displayed by humans is one-shot perceptual learning, whereby a single viewing experience drastically alters visual perception in a long-lasting manner. Where in the brain one-shot perceptual learning occurs and what mechanisms support it remain enigmatic. Combining psychophysics, 7 T fMRI, and intracranial recordings, we identify the high-level visual cortex as the most likely neural substrate wherein neural plasticity supports one-shot perceptual learning. We further develop a deep neural network model incorporating top-down feedback into a vision transformer, which recapitulates and predicts human behavior. The prior knowledge learnt by this model is highly similar to the neural code in the human high-level visual cortex. These results reveal the neurocomputational mechanisms underlying one-shot perceptual learning in humans.
PMCID: 12873369
PMID: 41639076
ISSN: 2041-1723
CID: 6000282

In Reply: Augmenting Large Language Models With Automated, Bibliometrics-Powered Literature Search for Knowledge Distillation: A Pilot Study for Common Spinal Pathologies

Kurland, David B; Alber, Daniel A; Oermann, Eric K
PMID: 41537755
ISSN: 1524-4040
CID: 5986532

Intraoperative Evaluation of Dural Arteriovenous Fistula Obliteration Using FLOW 800 Hemodynamic Analysis

Sangwon, Karl L; Grin, Eric A; Negash, Bruck; Wiggan, Daniel D; Lapierre, Cathryn; Raz, Eytan; Shapiro, Maksim; Laufer, Ilya; Sharashidze, Vera; Rutledge, Caleb; Riina, Howard A; Oermann, Eric K; Nossek, Erez
BACKGROUND AND OBJECTIVES: Dural arteriovenous fistula (dAVF) surgery is a microsurgical procedure that requires confirmation of obliteration using formal cerebral angiography, but the lack of intraoperative angiogram or need for postoperative angiogram in some settings necessitates a search for alternative, less invasive methods to verify surgical success. This study evaluates the use of indocyanine green videoangiography FLOW 800 hemodynamic analysis intraoperatively during cranial and spinal dAVF obliteration to confirm obliteration and predict surgical success. METHODS: A retrospective analysis was conducted using indocyanine green videoangiography FLOW 800 to intraoperatively measure 4 hemodynamic parameters (Delay Time, Speed, Time to Peak, and Rise Time) across venous drainage regions of interest pre/post-dAVF obliteration. Univariate and multivariate statistical analyses to evaluate and visualize presurgical vs postsurgical state hemodynamic changes included nonparametric statistical tests, logistic regression, and Bayesian analysis. RESULTS: A total of 14 venous drainage regions of interest from 8 patients who had successful spinal or cranial dAVF obliteration confirmed with intraoperative digital subtraction angiography were extracted. Significant hemodynamic changes were observed after dAVF obliteration, with median Speed decreasing from 13.5 to 5.5 s⁻¹ (P = .029) and Delay Time increasing from 2.07 to 7.86 s (P = .020). Bayesian logistic regression identified Delay Time as the strongest predictor of postsurgical state, with a 50% increase associated with 2.16 times higher odds of achieving obliteration (odds ratio = 4.59, 95% highest density interval: 1.07-19.95). Speed exhibited a trend toward a negative association with postsurgical state (odds ratio = 0.62, 95% highest density interval: 0.26-1.42). Receiver operating characteristic-area under the curve analysis using logistic regression demonstrated a score of 0.760, highlighting Delay Time and Speed as key features distinguishing preobliteration and postobliteration states. CONCLUSIONS: Our findings demonstrate that intraoperative FLOW 800 analysis reliably quantifies and visualizes immediate hemodynamic changes consistent with dAVF obliteration. Speed and Delay Time emerged as key indicators of surgical success, highlighting the potential of FLOW 800 as a noninvasive adjunct to traditional imaging techniques for confirming dAVF obliteration intraoperatively.
PMID: 40434390
ISSN: 2332-4260
CID: 5855352

Enhancing the prediction of hospital discharge disposition with extraction-based language model classification

Small, William R; Crowley, Ryan J; Pariente, Chloe; Zhang, Jeff; Eaton, Kevin P; Jiang, Lavender Yao; Oermann, Eric; Aphinyanaphongs, Yindalon
Early identification of inpatient discharges to skilled nursing facilities (SNFs) facilitates care transition planning. Predictive information in admission history and physical notes (H&Ps) is dispersed across long documents. Language models adeptly predict clinical outcomes from text but have limitations: token length constraints, noisy inputs, and opaque outputs. Therefore, we developed extraction-based language model classification (ELC): generative language models distill H&Ps into task-relevant categories ("Structured Extracted Data") before summarizing them into a concise narrative ("AI Risk Snapshot"). We hypothesized that language models utilizing AI Risk Snapshots to predict SNF discharges would perform the best. In this retrospective observational study, nine language models predicted SNF discharges from unstructured predictors (raw H&P text, truncated assessment and plan) and ELC-derived predictors (Structured Extracted Data, AI Risk Snapshots). ELC substantially reduced input length (AI Risk Snapshot median 141 tokens vs raw H&P median 2,120 tokens) and improved average AUROC and AUPRC across models. The best performance was achieved by Bio+Clinical BERT fine-tuned on AI Risk Snapshots (AUROC = .851). AI Risk Snapshots enhanced interpretability by aligning with nurse case managers' risk assessments and facilitating prompt design. Structuring and summarizing H&Ps via ELC thus mitigates the practical limitations of language models and improves SNF discharge prediction.
PMCID: 12789015
PMID: 41522677
ISSN: 3005-1959
CID: 5985892

Large-scale multi-omic biosequence transformers for modeling protein-nucleic acid interactions

Chen, Sully F; Steele, Robert J; Hocky, Glen M; Lemeneh, Beakal; Lad, Shivanand P; Oermann, Eric K
The transformer architecture has revolutionized bioinformatics and driven progress in the understanding and prediction of the properties of biomolecules. To date, most biosequence transformers have been trained on single-omic data, either proteins or nucleic acids, and have seen incredible success in downstream tasks in each domain, with particularly noteworthy breakthroughs in protein structural modeling. However, single-omic pretraining limits the ability of these models to capture cross-modal interactions. Here we present OmniBioTE, the largest open-source multi-omic model trained on over 250 billion tokens of mixed protein and nucleic acid data. We show that despite only being trained on unlabeled sequence data, OmniBioTE learns joint representations mapping genes to their corresponding protein sequences. We further demonstrate that OmniBioTE achieves state-of-the-art results predicting the change in Gibbs free energy (ΔG) of the binding interaction between a given nucleic acid and protein. Remarkably, we show that multi-omic biosequence transformers emergently learn useful structural information without any a priori structural training, allowing us to predict which protein residues are most involved in the protein-nucleic acid binding interaction. Compared to single-omic controls trained with identical compute, OmniBioTE also demonstrates superior performance-per-FLOP across both multi-omic and single-omic benchmarks. Together, these results highlight the power of a unified modeling approach for biological sequences and establish OmniBioTE as a foundation model for multi-omic discovery.
PMID: 41628239
ISSN: 1932-6203
CID: 5999602

Evaluating the Performance and Fragility of Large Language Models on the Self-Assessment for Neurological Surgeons

Vishwanath, Krithik; Alyakin, Anton; Ghosh, Mrigayu; Lee, Jin Vivian; Alber, Daniel Alexander; Sangwon, Karl L; Kondziolka, Douglas; Oermann, Eric Karl
BACKGROUND AND OBJECTIVES: The Congress of Neurological Surgeons Self-Assessment for Neurological Surgeons questions are widely used by neurosurgical residents to prepare for written board examinations. Recently, these questions have also served as benchmarks for evaluating large language models' (LLMs) neurosurgical knowledge. LLMs show significant promise for transforming neurosurgical practice; however, they are susceptible to in-text distractions and confounding factors. Given the increasing use of generative artificial intelligence and ambient dictation technologies, clinical text is at a larger risk for the inclusion of extraneous details. The aim of this study was to assess the performance of state-of-the-art LLMs on neurosurgery board-like questions and to evaluate their robustness to the inclusion of distractor statements. METHODS: A comprehensive evaluation was conducted using 28 state-of-the-art LLMs. These models were tested on 2904 neurosurgery board examination questions derived from the Congress of Neurological Surgeons Self-Assessment for Neurological Surgeons. In addition, the study introduced a distraction framework to assess the fragility of these models. The framework incorporated simple, irrelevant distractor statements containing polysemous words with clinical meanings used in nonclinical contexts to determine the extent to which such distractions degrade model performance on standard medical benchmarks. RESULTS: Six of the 28 tested LLMs achieved board-passing outcomes, with the top-performing models scoring over 15.7% above the passing threshold. When exposed to distractions, accuracy across various model architectures was significantly reduced, by as much as 20.4%, with 1 model failing that had previously passed. Both general-purpose and medical open-source models experienced greater performance declines compared with proprietary variants when subjected to the added distractors. CONCLUSIONS: While current LLMs demonstrate an impressive ability to answer neurosurgery board-like examination questions, their performance is markedly vulnerable to extraneous, distracting information. These findings underscore the critical need for developing novel mitigation strategies aimed at bolstering LLM resilience against in-text distractions, particularly for safe and effective clinical deployment.
PMID: 41358748
ISSN: 1524-4040
CID: 5977102

A full life cycle biological clock based on routine clinical data and its impact in health and diseases

Wang, Kai; Liu, Fei; Wu, Wei; Hu, Changxi; Shen, Xian; Wang, Meihao; Li, Gen; Zeng, Fanxin; Liu, Li; Wong, Io Nam; Liu, Sian; Zou, Zixing; Li, Bingzhou; Li, Jinghang; Huang, Xiaoying; Jin, Shengwei; Li, Zhuomin; Xu, Hui; Chen, Gang; Chen, Xiaodong; Zhu, Ying; Li, Ping; Feng, Zhe; Wang, Winston; Cheng, Linling; Yang, Mingqi; Hou, Qiang; Lu, Wenyang; Sun, Yiwen; Li, Kun; Zhong, Tian; Sun, Zhuo; Yin, Yun; Loupy, Alexandre; Oermann, Eric; Chen, Xiangmei; Zhang, Kang
Aging research has primarily focused on adult aging clocks, leaving a critical gap in understanding a biological clock across the full life cycle, particularly during infancy and childhood. Here we introduce LifeClock, a biological clock model that predicts biological age across all life stages using routine electronic health records and laboratory test data. To enhance individualized predictions, we integrated virtual patient representations from 24,633,025 heterogeneous longitudinal clinical visits across 9,680,764 individuals and projected them into a latent space. Our approach leverages EHRFormer, a time-series transformer-based model, to analyze developmental and aging dynamics with high precision and develop accurate biological age clocks spanning infancy to old age. Our findings reveal distinct biological clock patterns across different life stages. The pediatric clock is strongly associated with children's development and accurately predicts current and future risks of major pediatric diseases, including malnutrition, growth and developmental abnormalities. The adult clock is strongly associated with aging and accurately predicts current and future risks of major age-related diseases, such as diabetes, renal failure, stroke and cardiovascular diseases. This work therefore distinguishes pediatric development from adult aging, establishing a novel framework to advance precision health by leveraging routine clinical data across the entire lifespan.
PMID: 41145791
ISSN: 1546-170X
CID: 5961022

Neuro Data Hub: A New Approach for Streamlining Medical Clinical Research

Han, Xu; Alyakin, Anton; Ciprut, Shannon; Lapierre, Cathryn; Stryker, Jaden; Golfinos, John; Kondziolka, Douglas; Oermann, Eric Karl
BACKGROUND AND OBJECTIVES: Neurosurgical clinical research depends on medical data collection and evaluation that is often laborious, time consuming, and inefficient. The goal of this work was to implement and evaluate a novel departmental data infrastructure (Neuro Data Hub) designed to provide specialized data services for neurosurgical research. Data acquisition would become available purely by request. METHODS: The Neuro Data Hub was implemented through collaboration between Department Leadership and Medical Center Information Technology, integrating it with Institutional Review Board workflows and an existing Epic electronic health record Datalake infrastructure. The system implementation included monthly departmental meetings and an asynchronous Research Electronic Data Capture-based request system. Data requests submitted between August 2023 and November 2024 were analyzed and categorized as basic, complex, or Natural Language Processing (NLP)-augmented, with optional visualization and database creation services. Request volumes, types, and execution times were assessed. RESULTS: The Hub processed 39 research data requests (2.6/month), comprising 3 basic, 22 complex, and 14 NLP-augmented requests. Two complex requests included visualization services, and one NLP request included database creation. Average request execution time was 36.5 days, with NLP-augmented requests showing increasing adoption over time. CONCLUSIONS: The Neuro Data Hub represents a paradigm shift from centralized to department-level data services, providing specialized support for neurosurgical research and democratizing access to institutional data. While effective, implementation may be limited by institutional information technology infrastructure requirements. This model could serve as a template for any form of medical-clinical research program seeking to improve data accessibility and research capabilities.
PMCID: 12560744
PMID: 41163737
ISSN: 2834-4383
CID: 5961452

The pitfalls of multiple-choice questions in generative AI and medical education

Singh, Shrutika; Alyakin, Anton; Alber, Daniel Alexander; Stryker, Jaden; Tong, Ai Phuong S; Sangwon, Karl; Goff, Nicolas; De La Paz, Mathew; Hernandez-Rovira, Miguel; Park, Ki Yun; Leuthardt, Eric Claude; Oermann, Eric Karl
The performance of Large Language Models (LLMs) on multiple-choice question (MCQ) benchmarks is frequently cited as proof of their medical capabilities. We hypothesized that LLM performance on medical MCQs may in part be illusory and driven by factors beyond medical content knowledge and reasoning capabilities. To assess this, we created a novel benchmark of free-response questions with paired MCQs (FreeMedQA). Using this benchmark, we evaluated three state-of-the-art LLMs (GPT-4o, GPT-3.5, and Llama-3-70B-instruct) and found an average absolute deterioration of 39.43% in performance on free-response questions relative to multiple-choice (p = 1.3 × 10⁻⁵), which was greater than the human performance decline of 22.29%. To isolate the role of the MCQ format on performance, we performed a masking study, iteratively masking out parts of the question stem. At 100% masking, the average LLM multiple-choice performance was 6.70% greater than random chance (p = 0.002) with one LLM (GPT-4o) obtaining an accuracy of 37.34%. Notably, for all LLMs the free-response performance was near zero. Our results highlight the shortcomings in medical MCQ benchmarks for overestimating the capabilities of LLMs in medicine, and, broadly, the potential for improving both human and machine assessments using LLM-evaluated free-response questions.
PMCID: 12658246
PMID: 41298584
ISSN: 2045-2322
CID: 5968502

Most Roads Lead to Cushing: Mapping Neurosurgical Training Lineages in the United States

Kurland, David B; Park, Minjun; Gajjar, Avi A; Liu, Albert; Kondziolka, Douglas; Golfinos, John G; Alleyne, Cargill H; Oermann, Eric K
OBJECTIVE: Mentorship and training relationships shape the careers and influence of neurosurgeons. Network analysis can reveal structural characteristics and key individuals who support network connectivity and drive the field's development. This endeavor analyzed the U.S.-based neurosurgical training network derived from NeurosurGen.com. METHODS: A network graph was constructed representing neurosurgical training relationships, including chairperson-trainee, program director-trainee, and coresident connections. Graph- and node-level metrics, with a focus on centrality measures, were calculated for a trainer-trainee subgraph. RESULTS: The network consisted of 8840 neurosurgeons represented as nodes, and 382,143 relationships represented as edges. It evolved from an early small-world structure to a hierarchical and decentralized structure dominated by local clusters. Demographic shifts over time reflected increasing diversity and inclusion, with greater representation of female, Hispanic, Asian, and Black trainees across 285 training programs. Nodes were preferentially connected via residency, and the connectivity among underrepresented populations improved in concert with increased representation. Harvey W. Cushing was the quintessential neurosurgeon-influencer in the United States, ranking highly across most centrality measures over time. CONCLUSIONS: The neurosurgical training network is sparse but interconnected, typical of large real-world professional networks. While many small groups of neurosurgeons are closely tied within their immediate training hierarchy and peer group, in modern neurosurgery, each surgeon is only connected to a small fraction of the total network. Highly central individuals have played critical roles in linking disparate groups and shaping network structure. Increasing diversity in recent decades indicates progress toward inclusivity, although overall representation remains low.
PMID: 40914191
ISSN: 1878-8769
CID: 5966272