A Nationwide Study Characterizing the Risk and Outcome Profiles of Multilevel Fusion Procedures in Neuromuscular Scoliosis Patients with Neurofibromatosis Type 1
Price, Gabrielle; Martini, Michael L; Caridi, John M; Lau, Darryl; Oermann, Eric K; Neifert, Sean N
BACKGROUND:Spine abnormalities are a common manifestation of Neurofibromatosis Type 1 (NF1); however, the outcomes of surgical treatment for NF1-associated spinal deformity are not well explored. The purpose of this study was to investigate the outcome and risk profiles of multilevel fusion surgery for NF1 patients. METHODS:The National Inpatient Sample was queried for NF1 and non-NF1 patient populations with neuromuscular scoliosis who underwent multilevel fusion surgery involving eight or more vertebral levels between 2004 and 2017. Multivariate regression modeling was used to explore the relationship between perioperative variables and pertinent outcomes. RESULTS:Of the 55,485 patients with scoliosis, 533 patients (0.96%) had NF1. Patients with NF1 were more likely to have comorbid solid tumors (P < 0.0001), clinical depression (P < 0.0001), peripheral vascular disease (P < 0.0001), and hypertension (P < 0.001). Following surgery, NF1 patients had a higher incidence of hydrocephalus (0.6% vs. 1.9%, P = 0.002), seizures (4.9% vs. 5.7%, P = 0.006), and accidental vessel laceration (0.3% vs. 1.9%, P = 0.011). Although there were no differences in overall complication rates or in-hospital mortality, multivariate regression revealed NF1 patients had an increased probability of pulmonary (OR 0.5, 95% CI 0.3-0.8, P = 0.004) complications. There were no significant differences in utilization, including nonhome discharge or extended hospitalization; however, patients with NF1 had higher total hospital charges (mean -$18739, SE 3384, P < 0.0001). CONCLUSIONS:These findings indicate that NF1 is associated with certain complications following multilevel fusion surgery but does not appear to be associated with differences in quality or cost outcomes. These results provide some guidance to surgeons and other healthcare professionals in their perioperative decision making by raising awareness about risk factors for NF1 patients undergoing multilevel fusion surgery. We intend for this study to set the national baseline for complications after multilevel fusion in the NF1 population.
PMID: 36586581
ISSN: 1878-8769
CID: 5418972
Methods and Impact for Using Federated Learning to Collaborate on Clinical Research
Cheung, Alexander T M; Nasir-Moin, Mustafa; Fred Kwon, Young Joon; Guan, Jiahui; Liu, Chris; Jiang, Lavender; Raimondo, Christian; Chotai, Silky; Chambless, Lola; Ahmad, Hasan S; Chauhan, Daksh; Yoon, Jang W; Hollon, Todd; Buch, Vivek; Kondziolka, Douglas; Chen, Dinah; Al-Aswad, Lama A; Aphinyanaphongs, Yindalon; Oermann, Eric Karl
BACKGROUND:The development of accurate machine learning algorithms requires sufficient quantities of diverse data. This poses a challenge in health care because of the sensitive and siloed nature of biomedical information. Decentralized algorithms through federated learning (FL) avoid data aggregation by instead distributing algorithms to the data before centrally updating one global model. OBJECTIVE:To establish a multicenter collaboration and assess the feasibility of using FL to train machine learning models for intracranial hemorrhage (ICH) detection without sharing data between sites. METHODS:Five neurosurgery departments across the United States collaborated to establish a federated network and train a convolutional neural network to detect ICH on computed tomography scans. The global FL model was benchmarked against a standard, centrally trained model using a held-out data set and was compared against locally trained models using site data. RESULTS:A federated network of practicing neurosurgeon scientists was successfully initiated to train a model for predicting ICH. The FL model achieved an area under the ROC curve of 0.9487 (95% CI 0.9471-0.9503) when predicting all subtypes of ICH compared with a benchmark (non-FL) area under the ROC curve of 0.9753 (95% CI 0.9742-0.9764), although performance varied by subtype. The FL model consistently achieved top-three performance when validated on any site's data, suggesting improved generalizability. A qualitative survey described the experience of participants in the federated network. CONCLUSIONS:This study demonstrates the feasibility of implementing a federated network for multi-institutional collaboration among clinicians and using FL to conduct machine learning research, thereby opening a new paradigm for neurosurgical collaboration.
PMID: 36399428
ISSN: 1524-4040
CID: 5385002
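Illustrative note on the entry above: the abstract describes distributing algorithms to the data and then centrally updating one global model, but it does not name the aggregation rule. The sketch below assumes a sample-size-weighted average of per-site parameters (FedAvg-style), which is one common choice and not necessarily what the authors used. The function name `federated_average` and the example numbers are hypothetical.

```python
from typing import List, Sequence


def federated_average(
    site_params: Sequence[Sequence[float]],
    site_sizes: Sequence[int],
) -> List[float]:
    """Combine per-site parameter vectors into one global vector.

    Assumption: each site is weighted by its number of training samples
    (FedAvg-style). The cited study may have used a different rule.
    """
    if not site_params or len(site_params) != len(site_sizes):
        raise ValueError("need one sample count per site")
    dim = len(site_params[0])
    if any(len(p) != dim for p in site_params):
        raise ValueError("all sites must share the same parameter shape")
    total = sum(site_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    # Only parameters leave each site; the raw scans never do.
    return [
        sum(p[i] * n for p, n in zip(site_params, site_sizes)) / total
        for i in range(dim)
    ]


# Two hypothetical sites with 1 and 3 training samples respectively.
global_params = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

In a full training loop this aggregation would run once per communication round, after each site has updated its local copy of the model.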
Autoencoders for sample size estimation for fully connected neural network classifiers
Gulamali, Faris F; Sawant, Ashwin S; Kovatch, Patricia; Glicksberg, Benjamin; Charney, Alexander; Nadkarni, Girish N; Oermann, Eric
Sample size estimation is a crucial step in experimental design but is understudied in the context of deep learning. Currently, estimating the quantity of labeled data needed to train a classifier to a desired performance is largely based on prior experience with similar models and problems or on untested heuristics. In many supervised machine learning applications, data labeling can be expensive and time-consuming and would benefit from a more rigorous means of estimating labeling requirements. Here, we study the problem of estimating the minimum sample size of labeled training data necessary for training computer vision models as an exemplar for other deep learning problems. We consider the problem of identifying the minimal number of labeled data points to achieve a generalizable representation of the data, a minimum converging sample (MCS). We use autoencoder loss to estimate the MCS for fully connected neural network classifiers. At sample sizes smaller than the MCS estimate, fully connected networks fail to distinguish classes, and at sample sizes above the MCS estimate, generalizability strongly correlates with the loss function of the autoencoder. We provide an easily accessible, code-free, and dataset-agnostic tool to estimate sample sizes for fully connected networks. Taken together, our findings suggest that MCS and convergence estimation are promising methods to guide sample size estimates for data collection and labeling prior to training deep learning models in computer vision.
PMCID: 9747810
PMID: 36513729
ISSN: 2398-6352
CID: 5883392
Rescue therapy for vasospasm following aneurysmal subarachnoid hemorrhage: a propensity score-matched analysis with machine learning
Martini, Michael L; Neifert, Sean N; Shuman, William H; Chapman, Emily K; Schüpper, Alexander J; Oermann, Eric K; Mocco, J; Todd, Michael; Torner, James C; Molyneux, Andrew; Mayer, Stephan; Roux, Peter Le; Vergouwen, Mervyn D I; Rinkel, Gabriel J E; Wong, George K C; Kirkpatrick, Peter; Quinn, Audrey; Hänggi, Daniel; Etminan, Nima; van den Bergh, Walter M; Jaja, Blessing N R; Cusimano, Michael; Schweizer, Tom A; Suarez, Jose I; Fukuda, Hitoshi; Yamagata, Sen; Lo, Benjamin; Leonardo de Oliveira Manoel, Airton; Boogaarts, Hieronymus D; Macdonald, R Loch
OBJECTIVE:Rescue therapies have been recommended for patients with angiographic vasospasm (aVSP) and delayed cerebral ischemia (DCI) following subarachnoid hemorrhage (SAH). However, there is little evidence from randomized clinical trials that these therapies are safe and effective. The primary aim of this study was to apply game theory-based methods in explainable machine learning (ML) and propensity score matching to determine if rescue therapy was associated with better 3-month outcomes following post-SAH aVSP and DCI. The authors also sought to use these explainable ML methods to identify patient populations that were more likely to receive rescue therapy and factors associated with better outcomes after rescue therapy. METHODS:Data for patients with aVSP or DCI after SAH were obtained from 8 clinical trials and 1 observational study in the Subarachnoid Hemorrhage International Trialists repository. Gradient boosting ML models were constructed for each patient to predict the probability of receiving rescue therapy and the 3-month Glasgow Outcome Scale (GOS) score. Favorable outcome was defined as a 3-month GOS score of 4 or 5. Shapley Additive Explanation (SHAP) values were calculated for each patient-derived model to quantify feature importance and interaction effects. Variables with high SHAP importance in predicting rescue therapy administration were used in a propensity score-matched analysis of rescue therapy and 3-month GOS scores. RESULTS:The authors identified 1532 patients with aVSP or DCI. Predictive, explainable ML models revealed that aneurysm characteristics and neurological complications, but not admission neurological scores, carried the highest relative importance rankings in predicting whether rescue therapy was administered. Younger age and absence of cerebral ischemia/infarction were invariably linked to better rescue outcomes, whereas the other important predictors of outcome varied by rescue type (interventional or noninterventional). In a propensity score-matched analysis guided by SHAP-based variable selection, rescue therapy was associated with higher odds of 3-month GOS scores of 4-5 (OR 1.63, 95% CI 1.22-2.17). CONCLUSIONS:Rescue therapy may increase the odds of good outcome in patients with aVSP or DCI after SAH. Given the strong association between cerebral ischemia/infarction and poor outcome, trials focusing on preventative or therapeutic interventions in these patients may be most able to demonstrate improvements in clinical outcomes. Insights developed from these models may be helpful for improving patient selection and trial design.
PMID: 34214980
ISSN: 1933-0693
CID: 5883372
Editorial. The future of stroke care is remote and now [Editorial]
Oermann, Eric K; Riina, Howard A
PMID: 34560649
ISSN: 1933-0693
CID: 5883382
Deploying deep learning models on unseen medical imaging using adversarial domain adaptation
Valliani, Aly A; Gulamali, Faris F; Kwon, Young Joon; Martini, Michael L; Wang, Chiatse; Kondziolka, Douglas; Chen, Viola J; Wang, Weichung; Costa, Anthony B; Oermann, Eric K
The fundamental challenge in machine learning is ensuring that trained models generalize well to unseen data. We developed a general technique for ameliorating the effect of dataset shift using generative adversarial networks (GANs) on a dataset of 149,298 handwritten digits and a dataset of 868,549 chest radiographs obtained from four academic medical centers. Efficacy was assessed by comparing area under the curve (AUC) pre- and post-adaptation. On the digit recognition task, the baseline CNN achieved an average internal test AUC of 99.87% (95% CI, 99.87-99.87%), which decreased to an average external test AUC of 91.85% (95% CI, 91.82-91.88%), with an average salvage of 35% from baseline upon adaptation. On the lung pathology classification task, the baseline CNN achieved an average internal test AUC of 78.07% (95% CI, 77.97-78.17%) and an average external test AUC of 71.43% (95% CI, 71.32-71.60%), with a salvage of 25% from baseline upon adaptation. Adversarial domain adaptation leads to improved model performance on radiographic data derived from multiple out-of-sample healthcare populations. This work can be applied to other medical imaging domains to help shape the deployment toolkit of machine learning in medicine.
PMCID: 9565422
PMID: 36240135
ISSN: 1932-6203
CID: 5352202
Generating novel pituitary datasets from open-source imaging data and deep volumetric segmentation
Gologorsky, Rachel; Harake, Edward; von Oiste, Grace; Nasir-Moin, Mustafa; Couldwell, William; Oermann, Eric; Hollon, Todd
PURPOSE:The estimated incidence of pituitary adenomas in the general population is 10-30%, yet radiographic diagnosis remains a challenge. Diagnosis is complicated by the heterogeneity of radiographic features in both normal (e.g. complex anatomy, pregnancy) and pathologic states (e.g. primary endocrinopathy, hypophysitis). Clinical symptoms and laboratory testing are often equivocal, which can result in misdiagnosis or unnecessary specialist referrals. Computer vision models can aid in pituitary adenoma diagnosis; however, a major challenge to model development is the lack of dedicated pituitary imaging datasets. We hypothesized that deep volumetric segmentation models trained to extract the sellar and parasellar region from existing whole-brain MRI scans could be used to generate a novel dataset of pituitary imaging. METHODS:Six open-source whole-brain MRI datasets, created for research purposes, were included for model development. Deep learning-based volumetric segmentation models were trained using 318 manually annotated MRI scans from a single open-source MRI dataset. Out-of-distribution volumetric segmentation performance was then tested on 418 MRIs from five held-out research datasets. RESULTS:On our annotated images, agreement between manual and model volumetric segmentations was high. Dice scores (a measure of overlap) ranged from 0.76 to 0.82 for both in-distribution and out-of-distribution model testing. In total, 6,755 MRIs from six data sources were included in the final generated pituitary dataset. CONCLUSIONS:We present the first and largest dataset of pituitary imaging constructed using existing MRI data and deep volumetric segmentation models trained to identify sellar and parasellar anatomy. The model generalizes well across patient populations and MRI scanner types. We hope our pituitary dataset will be an integral part of future machine learning research on pituitary pathologies.
PMID: 35943676
ISSN: 1573-7403
CID: 5286832
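Illustrative note on the entry above: the abstract reports Dice scores as a measure of overlap between manual and model segmentations. A minimal sketch of that metric on flattened binary masks follows. The function name `dice_score`, the convention of returning 1.0 when both masks are empty, and the toy masks are assumptions made for illustration; the authors' evaluation code is not part of this record.

```python
from typing import Sequence


def dice_score(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice coefficient 2*|A and B| / (|A| + |B|) for flattened binary masks.

    Assumption: two empty masks count as perfect agreement (1.0).
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labeled as foreground in both masks.
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    # Total foreground voxels across the two masks.
    size = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    return 1.0 if size == 0 else 2.0 * overlap / size


# Toy 4-voxel masks: one shared foreground voxel, two foreground voxels each.
example = dice_score([1, 1, 0, 0], [1, 0, 1, 0])
```

A 3D segmentation would be flattened to one sequence of voxels before being passed in; the score ranges from 0 (no overlap) to 1 (identical masks).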
Population scale latent space cohort matching for the improved use and exploration of observational trial data
Gologorsky, Rachel; Somani, Sulaiman S; Neifert, Sean N; Valliani, Aly A; Link, Katherine E; Chen, Viola J; Costa, Anthony B; Oermann, Eric K
A significant amount of clinical research is observational by nature and derived from medical records, clinical trials, and large-scale registries. While there is no substitute for randomized, controlled experimentation, such experiments or trials are often costly, time consuming, and even ethically or practically impossible to execute. Combining classical regression and structural equation modeling with matching techniques can leverage the value of observational data. Nevertheless, identifying variables of greatest interest in high-dimensional data is frequently challenging, even with application of classical dimensionality reduction and/or propensity scoring techniques. Here, we demonstrate that projecting high-dimensional medical data onto a lower-dimensional manifold using deep autoencoders and post-hoc generation of treatment/control cohorts based on proximity in the lower-dimensional space results in better matching of confounding variables compared to classical propensity score matching (PSM) in the original high-dimensional space (P<0.0001) and performs similarly to PSM models constructed by experts with prior knowledge of the underlying pathology when evaluated on predicting risk ratios from real-world clinical data. Thus, in cases when the underlying problem is poorly understood and the data is high-dimensional in nature, matching in the autoencoder latent space might be of particular benefit.
PMID: 35730283
ISSN: 1551-0018
CID: 5278662
Pragmatic Prediction of Excessive Length of Stay After Cervical Spine Surgery With Machine Learning and Validation on a National Scale
Valliani, Aly A; Feng, Rui; Martini, Michael L; Neifert, Sean N; Kim, Nora C; Gal, Jonathan S; Oermann, Eric K; Caridi, John M
BACKGROUND:Extended postoperative hospital stays are associated with numerous clinical risks and increased economic cost. Accurate preoperative prediction of extended length of stay (LOS) can facilitate targeted interventions to mitigate clinical harm and resource utilization. OBJECTIVE:To develop a machine learning algorithm aimed at predicting extended LOS after cervical spine surgery on a national level and elucidate drivers of prediction. METHODS:Electronic medical records from a large, urban academic medical center were retrospectively examined to identify patients who underwent cervical spine fusion surgeries between 2008 and 2019 for machine learning algorithm development and in-sample validation. The National Inpatient Sample database was queried to identify cervical spine fusion surgeries between 2009 and 2017 for out-of-sample validation of algorithm performance. Gradient-boosted trees predicted LOS, and efficacy was assessed using the area under the receiver operating characteristic curve (AUROC). Shapley values were calculated to characterize preoperative risk factors for extended LOS and explain algorithm predictions. RESULTS:Gradient-boosted trees accurately predicted extended LOS across cohorts, achieving an AUROC of 0.87 (SD = 0.01) on the single-center validation set and an AUROC of 0.84 (SD = 0.00) on the nationwide National Inpatient Sample data set. Anterior approach only, elective admission status, age, and total number of Elixhauser comorbidities were important predictors that affected the likelihood of prolonged LOS. CONCLUSIONS:Machine learning algorithms accurately predict extended LOS across single-center and national patient cohorts and characterize key preoperative drivers of increased LOS after cervical spine surgery.
PMID: 35834322
ISSN: 1524-4040
CID: 5269342
Robust Prediction of Non-home Discharge After Thoracolumbar Spine Surgery with Ensemble Machine Learning and Validation on a Nationwide Cohort
Valliani, Aly A; Kim, Nora C; Martini, Michael L; Gal, Jonathan S; Neifert, Sean N; Feng, Rui; Geng, Eric E; Kim, Jun S; Cho, Samuel K; Oermann, Eric K; Caridi, John M
BACKGROUND:Delays in postoperative referrals to rehabilitation or skilled nursing facilities contribute toward extended hospital stays. Facilitating more efficient referrals through accurate preoperative prediction algorithms has the potential to reduce unnecessary economic burden and minimize risk of hospital-acquired complications. We develop a robust machine learning algorithm to predict non-home discharge after thoracolumbar spine surgery that generalizes to unseen populations and identifies markers for prediction. METHODS:Retrospective electronic health records were obtained from the single-center data warehouse (SCDW) to identify patients undergoing thoracolumbar spine surgeries between 2008 and 2019 for algorithm development and internal validation. The National Inpatient Sample (NIS) database was queried to identify thoracolumbar surgeries between 2009 and 2017 for out-of-sample validation. Ensemble decision trees were constructed for prediction, and area under the receiver operating characteristic curve (AUROC) was used to assess performance. SHAP values were derived to identify drivers of non-home discharge for interpretation of algorithm predictions. RESULTS:5,224 cases of thoracolumbar spine surgeries were isolated from the SCDW and 492,312 cases were identified from NIS. The model achieved AUROCs of 0.81 (SD=0.01) on the SCDW test set and 0.77 (SD=0.01) on the nationwide NIS dataset, thereby demonstrating robust prediction of non-home discharge across diverse patient cohorts. Age, total Elixhauser comorbidities, Medicare insurance, weighted Elixhauser score, and female gender were among the most important predictors of non-home discharge. CONCLUSIONS:Machine learning algorithms reliably predict non-home discharge after thoracolumbar spine surgery across single-center and national cohorts and identify preoperative features of importance that elucidate algorithm decision-making.
PMID: 35654334
ISSN: 1878-8769
CID: 5236162