Searched for: in-biosketch:yes person:stolog01
Microbiome preterm birth DREAM challenge: Crowdsourcing machine learning approaches to advance preterm birth research
Golob, Jonathan L; Oskotsky, Tomiko T; Tang, Alice S; Roldan, Alennie; Chung, Verena; Ha, Connie W Y; Wong, Ronald J; Flynn, Kaitlin J; Parraga-Leo, Antonio; Wibrand, Camilla; Minot, Samuel S; Oskotsky, Boris; Andreoletti, Gaia; Kosti, Idit; Bletz, Julie; Nelson, Amber; Gao, Jifan; Wei, Zhoujingpeng; Chen, Guanhua; Tang, Zheng-Zheng; Novielli, Pierfrancesco; Romano, Donato; Pantaleo, Ester; Amoroso, Nicola; Monaco, Alfonso; Vacca, Mirco; De Angelis, Maria; Bellotti, Roberto; Tangaro, Sabina; Kuntzleman, Abigail; Bigcraft, Isaac; Techtmann, Stephen; Bae, Daehun; Kim, Eunyoung; Jeon, Jongbum; Joe, Soobok; Theis, Kevin R; Ng, Sherrianne; Lee, Yun S; Diaz-Gimeno, Patricia; Bennett, Phillip R; MacIntyre, David A; Stolovitzky, Gustavo; Lynch, Susan V; Albrecht, Jake; Gomez-Lopez, Nardhy; Romero, Roberto; Stevenson, David K; Aghaeepour, Nima; Tarca, Adi L; Costello, James C; Sirota, Marina
Every year, 11% of infants are born preterm, with significant health consequences, and the vaginal microbiome is a risk factor for preterm birth. We crowdsource models to predict (1) preterm birth (PTB; <37 weeks) or (2) early preterm birth (ePTB; <32 weeks) from 9 vaginal microbiome studies representing 3,578 samples from 1,268 pregnant individuals, aggregated from public raw data via phylogenetic harmonization. The predictive models are validated on two independent unpublished datasets representing 331 samples from 148 pregnant individuals. The top-performing models (among 148 and 121 submissions from 318 teams) achieve area under the receiver operating characteristic curve (AUROC) scores of 0.69 and 0.87 for predicting PTB and ePTB, respectively. Alpha diversity, VALENCIA community state types, and composition are important features in the top-performing models, most of which are tree-based methods. This work is a model for translating microbiome data into clinically relevant predictive models and for better understanding preterm birth.
PMID: 38134931
ISSN: 2666-3791
CID: 5799442
Extracellular vesicles, RNA sequencing, and bioinformatic analyses: Challenges, solutions, and recommendations
Miceli, Rebecca T; Chen, Tzu-Yi; Nose, Yohei; Tichkule, Swapnil; Brown, Briana; Fullard, John F; Saulsbury, Marilyn D; Heyliger, Simon O; Gnjatic, Sacha; Kyprianou, Natasha; Cordon-Cardo, Carlos; Sahoo, Susmita; Taioli, Emanuela; Roussos, Panos; Stolovitzky, Gustavo; Gonzalez-Kozlova, Edgar; Dogra, Navneet
Extracellular vesicles (EVs) are heterogeneous entities secreted by cells into their microenvironment and systemic circulation. Circulating EVs carry functional small RNAs and other molecular footprints from their cell of origin, and thus have evident applications in liquid biopsy, therapeutics, and intercellular communication. Yet the complete transcriptomic landscape of EVs is poorly characterized owing to critical limitations, including variable protocols for EV-RNA extraction, quality control, cDNA library preparation, sequencing technologies, and bioinformatic analyses. Consequently, there is a gap in knowledge and a need for a standardized approach to delineating EV-RNAs. Here, we address these gaps by (1) focusing on the large canopy of EVs and particles (EVPs), which includes, but is not limited to, exosomes and other large and small EVs, lipoproteins, exomeres/supermeres, mitochondrial-derived vesicles, RNA binding proteins, and cell-free DNA/RNA/proteins; (2) examining the potential functional roles and biogenesis of EVPs; (3) discussing the transcriptomic methods and technologies used to uncover the cargoes of EVPs; (4) presenting a comprehensive list of RNA subtypes reported in EVPs; (5) describing EV-RNA databases and resources specific to EV-RNA species; (6) reviewing established bioinformatics pipelines and novel strategies for reproducible EV transcriptomics analyses; (7) emphasizing the significant need for a gold-standard approach to identifying EV-RNAs across studies; and (8) highlighting current challenges, discussing possible solutions, and presenting recommendations for robust and reproducible analyses of EVP-associated small RNAs. Overall, we seek to provide clarity on the transcriptomics landscape, sequencing technologies, and bioinformatic analyses of EVP-RNAs.
A detailed portrayal of the current state of EVP transcriptomics will lead to a better understanding of how the RNA cargo of EVPs can be used in modern, targeted diagnostics and therapeutics. For the particles discussed in this article, we use the terms large/small EVs, non-vesicular extracellular particles (NVEPs), EPs, and EVPs as defined in the MISEV guidelines of the International Society for Extracellular Vesicles (ISEV).
PMCID: 11613500
PMID: 39625409
ISSN: 2001-3078
CID: 5763732
Microbiome Preterm Birth DREAM Challenge: Crowdsourcing Machine Learning Approaches to Advance Preterm Birth Research
Golob, Jonathan L; Oskotsky, Tomiko T; Tang, Alice S; Roldan, Alennie; Chung, Verena; Ha, Connie W Y; Wong, Ronald J; Flynn, Kaitlin J; Parraga-Leo, Antonio; Wibrand, Camilla; Minot, Samuel S; Andreoletti, Gaia; Kosti, Idit; Bletz, Julie; Nelson, Amber; Gao, Jifan; Wei, Zhoujingpeng; Chen, Guanhua; Tang, Zheng-Zheng; Novielli, Pierfrancesco; Romano, Donato; Pantaleo, Ester; Amoroso, Nicola; Monaco, Alfonso; Vacca, Mirco; De Angelis, Maria; Bellotti, Roberto; Tangaro, Sabina; Kuntzleman, Abigail; Bigcraft, Isaac; Techtmann, Stephen; Bae, Daehun; Kim, Eunyoung; Jeon, Jongbum; Joe, Soobok; Theis, Kevin R; Ng, Sherrianne; Lee Li, Yun S; Diaz-Gimeno, Patricia; Bennett, Phillip R; MacIntyre, David A; Stolovitzky, Gustavo; Lynch, Susan V; Albrecht, Jake; Gomez-Lopez, Nardhy; Romero, Roberto; Stevenson, David K; Aghaeepour, Nima; Tarca, Adi L; Costello, James C; Sirota, Marina
Globally, about 11% of infants are born preterm every year (preterm birth is defined as birth before 37 weeks of gestation), with significant and lingering health consequences. Multiple studies have related the vaginal microbiome to preterm birth. We present a crowdsourcing approach to predict (a) preterm or (b) early preterm birth from 9 publicly available vaginal microbiome studies representing 3,578 samples from 1,268 pregnant individuals, aggregated from raw sequences via an open-source tool, MaLiAmPi. We validated the crowdsourced models on novel datasets representing 331 samples from 148 pregnant individuals. From 318 DREAM challenge participants, we received 148 and 121 submissions for our two separate prediction sub-challenges, with top-ranking submissions achieving bootstrapped AUROC scores of 0.69 and 0.87, respectively. Alpha diversity, VALENCIA community state types, and composition (via phylotype relative abundance) were important features in the top-performing models, most of which were tree-based methods. This work serves as the foundation for subsequent efforts to translate predictive tests into clinical practice and to better understand and prevent preterm birth.
PMID: 36945505
CID: 5824972
Detecting Ground Glass Opacity Features in Patients With Lung Cancer: Automated Extraction and Longitudinal Analysis via Deep Learning-Based Natural Language Processing
Lee, Kyeryoung; Liu, Zongzhi; Chandran, Urmila; Kalsekar, Iftekhar; Laxmanan, Balaji; Higashi, Mitchell K; Jun, Tomi; Ma, Meng; Li, Minghao; Mai, Yun; Gilman, Christopher; Wang, Tongyu; Ai, Lei; Aggarwal, Parag; Pan, Qi; Oh, William; Stolovitzky, Gustavo; Schadt, Eric; Wang, Xiaoyan
BACKGROUND: Ground-glass opacities (GGOs) appearing in computed tomography (CT) scans may indicate potential lung malignancy. Proper management of GGOs based on their features can prevent the development of lung cancer. Electronic health records are rich sources of information on GGO nodules and their granular features, but most of this valuable information is embedded in unstructured clinical notes. OBJECTIVE: We aimed to develop, test, and validate a deep learning-based natural language processing (NLP) tool that automatically extracts GGO features from large-scale radiology notes to inform the longitudinal trajectory of GGO status. METHODS: We developed a deep learning NLP pipeline based on a bidirectional long short-term memory network with a conditional random field to retrospectively extract GGOs and their granular features from the radiology notes of 13,216 lung cancer patients. We evaluated the pipeline with quality assessments and characterized the cohort by analyzing the longitudinal distribution of nodule features to assess changes in size and solidity over time. RESULTS: […]-scores on different GGO features. We deployed this GGO NLP model to extract and structure comprehensive characteristics of GGOs from 29,496 radiology notes of 4521 lung cancer patients. Longitudinal analysis revealed that, in the last note compared with the first, nodule size had increased in 16.8% (240/1424) of patients, decreased in 14.6% (208/1424), and remained unchanged in 68.5% (976/1424). Among 1127 patients who had longitudinal radiology notes of GGO status, 815 (72.3%) were reported to have stable status and 259 (23%) had increased/progressed status in subsequent notes. CONCLUSIONS: Our deep learning-based NLP pipeline can automatically extract granular GGO features at scale from electronic health records when this information is documented in radiology notes, and it can help inform the natural history of GGO. This opens the way for a new paradigm in lung cancer prevention and early detection.
PMCID: 11041451
PMID: 38875565
ISSN: 2817-1705
CID: 5799472
Analysis of real-world data to investigate evolving treatment sequencing patterns in advanced non-small cell lung cancers and their impact on survival
Liu, Zongzhi; Lee, Kyeryoung; Cohn, David; Zhang, Mingwei; Ai, Lei; Li, Minghao; Zhang, Xingming; Jun, Tomi; Higashi, Mitchell K; Pan, Qi; Oh, William; Stolovitzky, Gustavo; Schadt, Eric; Wang, Xiaoyan; Li, Shuyu D
BACKGROUND: Although optimal sequencing of systemic therapy in cancer care is critical to achieving maximal clinical benefit, there is a lack of analysis of treatment sequencing in advanced non-small cell lung cancer (aNSCLC) in real-world settings. METHODS: […] line of therapy (LOT). RESULTS: […] line chemotherapy alone, there was no statistically significant difference in time-to-next-treatment (TTNT) or in OS among the three patient groups. CONCLUSIONS: […] line setting.
PMID: 37324065
ISSN: 2072-1439
CID: 5799432
Extracellular vesicles carry distinct proteo-transcriptomic signatures that are different from their cancer cell of origin
Chen, Tzu-Yi; Gonzalez-Kozlova, Edgar; Soleymani, Taliah; La Salvia, Sabrina; Kyprianou, Natasha; Sahoo, Susmita; Tewari, Ashutosh K; Cordon-Cardo, Carlos; Stolovitzky, Gustavo; Dogra, Navneet
Circulating extracellular vesicles (EVs) contain molecular footprints (lipids, proteins, RNA, and DNA) from their cell of origin. Consequently, EV-associated RNA and proteins have gained widespread interest as liquid-biopsy biomarkers. Yet an integrative proteo-transcriptomic landscape of EVs, and its comparison with their cell of origin, remains obscure. Here, we report that EVs are enriched in a distinct proteo-transcriptome that does not linearly correlate with that of their cell of origin. We show that EVs are enriched in endosomal and extracellular proteins and in small RNAs (∼13-200 nucleotides) associated with cell differentiation, development, and Wnt signaling. EVs carry specific RNAs (RNY3, vtRNA, and MIRLET-7) and their complementary proteins (YBX1, IGF2BP2, and SRSF1/2). To ensure an unbiased and independent analysis, we studied 12 cancer cell lines, matching EVs (in-house and from the exRNA database), and serum EVs from patients with prostate cancer. Together, we show that EV-RNA-protein complexes may constitute a functional interaction network that protects and regulates molecular access until a function is achieved.
PMCID: 9157216
PMID: 35663013
ISSN: 2589-0042
CID: 5822842
A Crowdsourcing Approach to Develop Machine Learning Models to Quantify Radiographic Joint Damage in Rheumatoid Arthritis
Sun, Dongmei; Nguyen, Thanh M; Allaway, Robert J; Wang, Jelai; Chung, Verena; Yu, Thomas V; Mason, Michael; Dimitrovsky, Isaac; Ericson, Lars; Li, Hongyang; Guan, Yuanfang; Israel, Ariel; Olar, Alex; Pataki, Balint Armin; Stolovitzky, Gustavo; Guinney, Justin; Gulko, Percio S; Frazier, Mason B; Chen, Jake Y; Costello, James C; Bridges, S Louis
IMPORTANCE: An automated, accurate method is needed for the unbiased quantification of accrued joint space narrowing and erosions on radiographic images of the hands, wrists, and feet, for use in clinical trials, monitoring joint damage over time, and assisting rheumatologists with treatment decisions. Such a method has the potential to be integrated directly into electronic health records. OBJECTIVES: To design and implement an international crowdsourcing competition to catalyze the development of machine learning methods to quantify radiographic damage in rheumatoid arthritis (RA). DESIGN, SETTING, AND PARTICIPANTS: This diagnostic/prognostic study describes the Rheumatoid Arthritis 2-Dialogue for Reverse Engineering Assessment and Methods (RA2-DREAM Challenge), which used existing radiographic images and expert-curated Sharp-van der Heijde (SvH) scores from 2 clinical studies (674 radiographic sets from 562 patients) for training (367 sets), leaderboard (119 sets), and final evaluation (188 sets). Challenge participants were tasked with developing methods to automatically quantify overall damage (subchallenge 1), joint space narrowing (subchallenge 2), and erosions (subchallenge 3). The challenge concluded on June 30, 2020. MAIN OUTCOMES AND MEASURES: Scores derived from submitted algorithms were compared with the expert-curated SvH scores, and a baseline model was created for benchmark comparison. Performance was ranked using weighted root mean square error (RMSE). The performance and reproducibility of each algorithm were assessed using Bayes factors from bootstrapped data and further evaluated with a postchallenge independent validation data set. RESULTS: The RA2-DREAM Challenge received a total of 173 submissions from 26 participants or teams in 7 countries for the leaderboard round, and 13 submissions were included in the final evaluation.
The weighted RMSE metric showed that the winning algorithms produced scores very close to the expert-curated SvH scores. Top teams included Team Shirin for subchallenge 1 (weighted RMSE, 0.44), HYL-YFG (Hongyang Li and Yuanfang Guan) for subchallenge 2 (weighted RMSE, 0.38), and Gold Therapy for subchallenge 3 (weighted RMSE, 0.43). The bootstrapping/Bayes factor approach and the postchallenge independent validation confirmed reproducibility; the concordance index estimates between the final evaluation and postchallenge independent validation data sets were 0.71 for subchallenge 1, 0.78 for subchallenge 2, and 0.82 for subchallenge 3. CONCLUSIONS AND RELEVANCE: The RA2-DREAM Challenge resulted in the development of algorithms that provide feasible, quick, and accurate methods to quantify joint damage in RA. Ultimately, these methods could support research on RA joint damage and may be integrated into electronic health records to help clinicians serve patients better by providing timely, reliable, and quantitative information for treatment decisions aimed at preventing further damage.
PMID: 36036935
ISSN: 2574-3805
CID: 5822852
Crowdsourced mapping of unexplored target space of kinase inhibitors
Cichońska, Anna; Ravikumar, Balaguru; Allaway, Robert J; Wan, Fangping; Park, Sungjoon; Isayev, Olexandr; Li, Shuya; Mason, Michael; Lamb, Andrew; Tanoli, Ziaurrehman; Jeon, Minji; Kim, Sunkyu; Popova, Mariya; Capuzzi, Stephen; Zeng, Jianyang; Dang, Kristen; Koytiger, Gregory; Kang, Jaewoo; Wells, Carrow I; Willson, Timothy M; Oprea, Tudor I; Schlessinger, Avner; Drewry, David H; Stolovitzky, Gustavo; Wennerberg, Krister; Guinney, Justin; Aittokallio, Tero
Despite decades of intensive search for compounds that modulate the activity of particular protein targets, a large proportion of the human kinome remains undrugged. Effective approaches are therefore required to map the massive space of unexplored compound-kinase interactions for novel and potent activities. Here, we carry out a crowdsourced benchmarking of predictive algorithms for kinase inhibitor potencies across multiple kinase families, tested on unpublished bioactivity data. We find that the top-performing predictions are based on various models, including kernel learning, gradient boosting, and deep learning, and that their ensemble achieves a predictive accuracy exceeding that of single-dose kinase activity assays. We design experiments based on the model predictions and identify unexpected activities even for under-studied kinases, thereby accelerating experimental mapping efforts. The open-source prediction algorithms, together with the bioactivities between 95 compounds and 295 kinases, provide a resource for benchmarking prediction algorithms and for extending the druggable kinome.
PMCID: 8175708
PMID: 34083538
ISSN: 2041-1723
CID: 5822782
Evaluation of artificial intelligence systems for assisting neurologists with fast and accurate annotations of scalp electroencephalography data
Roy, Subhrajit; Kiral, Isabell; Mirmomeni, Mahtab; Mummert, Todd; Braz, Alan; Tsay, Jason; Tang, Jianbin; Asif, Umar; Schaffter, Thomas; Ahsen, Mehmet Eren; Iwamori, Toshiya; Yanagisawa, Hiroki; Poonawala, Hasan; Madan, Piyush; Qin, Yong; Picone, Joseph; Obeid, Iyad; Marques, Bruno De Assis; Maetschke, Stefan; Khalaf, Rania; Rosen-Zvi, Michal; Stolovitzky, Gustavo; Harrer, Stefan
BACKGROUND: Assistive automatic seizure detection can empower human annotators to shorten patient monitoring data review times. We present a proof of concept for a seizure detection system that is sensitive, automated, patient-specific, and tunable to maximize sensitivity while minimizing human annotation time. The system uses custom data preparation methods, deep learning analytics, and electroencephalography (EEG) data. METHODS: Scalp EEG data from 365 patients, containing 171,745 s of ictal and 2,185,864 s of interictal samples obtained from clinical monitoring systems, were analyzed as part of a crowdsourced artificial intelligence (AI) challenge. Participants were tasked with developing an ictal/interictal classifier with high sensitivity and low false alarm rates. We built a challenge platform that prevented participants from downloading or directly accessing the data while allowing crowdsourced model development. FINDINGS: The automatic detection system achieved tunable sensitivities between 75.00% and 91.60%, reducing the amount of raw EEG data to be reviewed by a human annotator by factors of 142x and 22x, respectively. The algorithm enables instantaneous, reviewer-managed optimization of the balance between sensitivity and the amount of raw EEG data to be reviewed. INTERPRETATION: This study demonstrates the utility of deep learning for patient-specific seizure detection in EEG data. Furthermore, deep learning combined with a human reviewer can provide the basis for an assistive data labeling system that reduces manual review time while maintaining human expert annotation performance. FUNDING: IBM employed all IBM Research authors. Temple University employed all Temple University authors. The Icahn School of Medicine at Mount Sinai employed Eren Ahsen.
The corresponding authors, Stefan Harrer and Gustavo Stolovitzky, declare that they had full access to all the data in the study and had final responsibility for the decision to submit for publication.
PMID: 33745882
ISSN: 2352-3964
CID: 5822772
A community challenge to evaluate RNA-seq, fusion detection, and isoform quantification methods for cancer discovery
Creason, Allison; Haan, David; Dang, Kristen; Chiotti, Kami E; Inkman, Matthew; Lamb, Andrew; Yu, Thomas; Hu, Yin; Norman, Thea C; Buchanan, Alex; van Baren, Marijke J; Spangler, Ryan; Rollins, M Rick; Spellman, Paul T; Rozanov, Dmitri; Zhang, Jin; Maher, Christopher A; Caloian, Cristian; Watson, John D; Uhrig, Sebastian; Haas, Brian J; Jain, Miten; Akeson, Mark; Ahsen, Mehmet Eren; Stolovitzky, Gustavo; Guinney, Justin; Boutros, Paul C; Stuart, Joshua M; Ellrott, Kyle
The accurate identification and quantitation of RNA isoforms present in the cancer transcriptome is key for analyses ranging from inferring the impacts of somatic variants to pathway analysis, biomarker development, and subtype discovery. The ICGC-TCGA DREAM Somatic Mutation Calling in RNA (SMC-RNA) challenge was a crowdsourced effort to benchmark methods for RNA isoform quantification and fusion detection from bulk cancer RNA sequencing (RNA-seq) data. It concluded in 2018 with a comparison of 77 fusion detection entries and 65 isoform quantification entries on 51 synthetic tumors and 32 cell lines with spiked-in fusion constructs. We report the entries used to build this benchmark, the leaderboard results, and the experimental features associated with the accurate prediction of RNA species. This challenge required submissions to be in the form of containerized workflows, so each of the entries described is easily reusable through CWL and Docker containers at https://github.com/SMC-RNA-challenge. A record of this paper's transparent peer review process is included in the supplemental information.
PMCID: 8376800
PMID: 34146471
ISSN: 2405-4720
CID: 5822792