Detecting Ground Glass Opacity Features in Patients With Lung Cancer: Automated Extraction and Longitudinal Analysis via Deep Learning-Based Natural Language Processing
Lee, Kyeryoung; Liu, Zongzhi; Chandran, Urmila; Kalsekar, Iftekhar; Laxmanan, Balaji; Higashi, Mitchell K; Jun, Tomi; Ma, Meng; Li, Minghao; Mai, Yun; Gilman, Christopher; Wang, Tongyu; Ai, Lei; Aggarwal, Parag; Pan, Qi; Oh, William; Stolovitzky, Gustavo; Schadt, Eric; Wang, Xiaoyan
BACKGROUND:Ground-glass opacities (GGOs) appearing in computed tomography (CT) scans may indicate potential lung malignancy. Proper management of GGOs based on their features can prevent the development of lung cancer. Electronic health records are rich sources of information on GGO nodules and their granular features, but most of the valuable information is embedded in unstructured clinical notes. OBJECTIVE:We aimed to develop, test, and validate a deep learning-based natural language processing (NLP) tool that automatically extracts GGO features to inform the longitudinal trajectory of GGO status from large-scale radiology notes. METHODS:We developed a bidirectional long short-term memory with a conditional random field-based deep-learning NLP pipeline to extract GGO and granular features of GGO retrospectively from radiology notes of 13,216 lung cancer patients. We evaluated the pipeline with quality assessments and analyzed cohort characterization of the distribution of nodule features longitudinally to assess changes in size and solidity over time. RESULTS:-scores on different GGO features. We deployed this GGO NLP model to extract and structure comprehensive characteristics of GGOs from 29,496 radiology notes of 4521 lung cancer patients. Longitudinal analysis revealed that size increased in 16.8% (240/1424) of patients, decreased in 14.6% (208/1424), and remained unchanged in 68.5% (976/1424) in their last note compared to the first note. Among 1127 patients who had longitudinal radiology notes of GGO status, 815 (72.3%) were reported to have stable status, and 259 (23%) had increased/progressed status in the subsequent notes. CONCLUSIONS:Our deep learning-based NLP pipeline can automatically extract granular GGO features at scale from electronic health records when this information is documented in radiology notes and help inform the natural history of GGO. This will open the way for a new paradigm in lung cancer prevention and early detection.
PMCID:11041451
PMID: 38875565
ISSN: 2817-1705
CID: 5799472
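The first-note versus last-note comparison reported in the entry above can be sketched as follows. This is only an illustration and not the authors' pipeline: the per-patient record layout, the `classify_size_change` and `summarize` helpers, and the toy sizes are all assumptions.

```python
from collections import Counter


def classify_size_change(first_mm, last_mm, tol=0.0):
    """Label the change in nodule size between a patient's first and last note."""
    delta = last_mm - first_mm
    if delta > tol:
        return "increased"
    if delta < -tol:
        return "decreased"
    return "unchanged"


def summarize(sizes_by_patient):
    """Return the fraction of patients in each change category.

    sizes_by_patient maps a patient ID to a chronologically ordered list of
    extracted sizes (hypothetical structure; values in millimetres).
    """
    labels = Counter(
        classify_size_change(sizes[0], sizes[-1])
        for sizes in sizes_by_patient.values()
        if len(sizes) >= 2  # at least two notes are needed for a comparison
    )
    total = sum(labels.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in labels.items()}


# Toy data, invented for illustration only.
toy = {"p1": [8.0, 8.0], "p2": [6.0, 9.0, 11.0], "p3": [12.0, 10.0], "p4": [5.0, 5.0]}
fractions = summarize(toy)
```

The `tol` parameter is an assumed knob for treating small measurement differences as "unchanged"; the abstract does not state whether such a tolerance was used.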
Analysis of real-world data to investigate evolving treatment sequencing patterns in advanced non-small cell lung cancers and their impact on survival
Liu, Zongzhi; Lee, Kyeryoung; Cohn, David; Zhang, Mingwei; Ai, Lei; Li, Minghao; Zhang, Xingming; Jun, Tomi; Higashi, Mitchell K; Pan, Qi; Oh, William; Stolovitzky, Gustavo; Schadt, Eric; Wang, Xiaoyan; Li, Shuyu D
BACKGROUND:Although optimal sequencing of systemic therapy in cancer care is critical to achieving maximal clinical benefit, there is a lack of analysis of treatment sequencing in advanced non-small cell lung cancer (aNSCLC) in real-world settings. METHODS:line of therapy (LOT). RESULTS:line chemotherapy alone, there was no statistically significant difference in time-to-next treatment (TTNT) and in OS among the three patient groups. CONCLUSIONS:line setting.
PMID: 37324065
ISSN: 2072-1439
CID: 5799432
Microbiome Preterm Birth DREAM Challenge: Crowdsourcing Machine Learning Approaches to Advance Preterm Birth Research
Golob, Jonathan L; Oskotsky, Tomiko T; Tang, Alice S; Roldan, Alennie; Chung, Verena; Ha, Connie W Y; Wong, Ronald J; Flynn, Kaitlin J; Parraga-Leo, Antonio; Wibrand, Camilla; Minot, Samuel S; Andreoletti, Gaia; Kosti, Idit; Bletz, Julie; Nelson, Amber; Gao, Jifan; Wei, Zhoujingpeng; Chen, Guanhua; Tang, Zheng-Zheng; Novielli, Pierfrancesco; Romano, Donato; Pantaleo, Ester; Amoroso, Nicola; Monaco, Alfonso; Vacca, Mirco; De Angelis, Maria; Bellotti, Roberto; Tangaro, Sabina; Kuntzleman, Abigail; Bigcraft, Isaac; Techtmann, Stephen; Bae, Daehun; Kim, Eunyoung; Jeon, Jongbum; Joe, Soobok; Theis, Kevin R; Ng, Sherrianne; Lee Li, Yun S; Diaz-Gimeno, Patricia; Bennett, Phillip R; MacIntyre, David A; Stolovitzky, Gustavo; Lynch, Susan V; Albrecht, Jake; Gomez-Lopez, Nardhy; Romero, Roberto; Stevenson, David K; Aghaeepour, Nima; Tarca, Adi L; Costello, James C; Sirota, Marina
Globally, every year about 11% of infants are born preterm, defined as a birth prior to 37 weeks of gestation, with significant and lingering health consequences. Multiple studies have related the vaginal microbiome to preterm birth. We present a crowdsourcing approach to predict (a) preterm or (b) early preterm birth from 9 publicly available vaginal microbiome studies representing 3,578 samples from 1,268 pregnant individuals, aggregated from raw sequences via an open-source tool, MaLiAmPi. We validated the crowdsourced models on novel datasets representing 331 samples from 148 pregnant individuals. From 318 DREAM challenge participants we received 148 and 121 submissions for our two separate prediction sub-challenges, with top-ranking submissions achieving bootstrapped AUROC scores of 0.69 and 0.87, respectively. Alpha diversity, VALENCIA community state types, and composition (via phylotype relative abundance) were important features in the top-performing models, most of which were tree-based methods. This work serves as the foundation for subsequent efforts to translate predictive tests into clinical practice, and to better understand and prevent preterm birth.
PMID: 36945505
CID: 5824972
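The entry above ranks submissions by bootstrapped AUROC. A minimal sketch of that kind of metric is shown below; the challenge's actual scoring code is not described in the abstract, so the resampling scheme, the function names, and the toy labels and scores are assumptions.

```python
import random


def auroc(labels, scores):
    """AUROC as the chance that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auroc(labels, scores, n_boot=1000, seed=0):
    """Resample (label, score) pairs with replacement; return one AUROC per replicate."""
    rng = random.Random(seed)
    n = len(labels)
    values = []
    while len(values) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip replicates that lack one of the two classes
        values.append(auroc(ys, [scores[i] for i in idx]))
    return values


# Toy labels (1 = preterm) and model scores, invented for illustration.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.5, 0.2, 0.1, 0.7]
point = auroc(labels, scores)
reps = bootstrap_auroc(labels, scores, n_boot=200)
mean_boot = sum(reps) / len(reps)
```

The spread of `reps` gives a rough sense of how stable a ranking between two submissions is, which is one reason a bootstrapped score can be preferable to a single point estimate.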
A Crowdsourcing Approach to Develop Machine Learning Models to Quantify Radiographic Joint Damage in Rheumatoid Arthritis
Sun, Dongmei; Nguyen, Thanh M; Allaway, Robert J; Wang, Jelai; Chung, Verena; Yu, Thomas V; Mason, Michael; Dimitrovsky, Isaac; Ericson, Lars; Li, Hongyang; Guan, Yuanfang; Israel, Ariel; Olar, Alex; Pataki, Balint Armin; Stolovitzky, Gustavo; Guinney, Justin; Gulko, Percio S; Frazier, Mason B; Chen, Jake Y; Costello, James C; Bridges, S Louis
IMPORTANCE:An automated, accurate method is needed for unbiased quantification of the accrual of joint space narrowing and erosions on radiographic images of the hands, wrists, and feet for clinical trials, for monitoring joint damage over time, and for assisting rheumatologists with treatment decisions. Such a method has the potential to be directly integrated into electronic health records. OBJECTIVES:To design and implement an international crowdsourcing competition to catalyze the development of machine learning methods to quantify radiographic damage in rheumatoid arthritis (RA). DESIGN, SETTING, AND PARTICIPANTS:This diagnostic/prognostic study describes the Rheumatoid Arthritis 2-Dialogue for Reverse Engineering Assessment and Methods (RA2-DREAM Challenge), which used existing radiographic images and expert-curated Sharp-van der Heijde (SvH) scores from 2 clinical studies (674 radiographic sets from 562 patients) for training (367 sets), leaderboard (119 sets), and final evaluation (188 sets). Challenge participants were tasked with developing methods to automatically quantify overall damage (subchallenge 1), joint space narrowing (subchallenge 2), and erosions (subchallenge 3). The challenge concluded on June 30, 2020. MAIN OUTCOMES AND MEASURES:Scores derived from submitted algorithms were compared with the expert-curated SvH scores, and a baseline model was created for benchmark comparison. Performances were ranked using weighted root mean square error (RMSE). The performance and reproducibility of each algorithm were assessed using the Bayes factor from bootstrapped data and further evaluated with a postchallenge independent validation data set. RESULTS:The RA2-DREAM Challenge received a total of 173 submissions from 26 participants or teams in 7 countries for the leaderboard round, and 13 submissions were included in the final evaluation. 
The weighted RMSE metric showed that the winning algorithms produced scores that were very close to the expert-curated SvH scores. Top teams included Team Shirin for subchallenge 1 (weighted RMSE, 0.44), HYL-YFG (Hongyang Li and Yuanfang Guan) for subchallenge 2 (weighted RMSE, 0.38), and Gold Therapy for subchallenge 3 (weighted RMSE, 0.43). The bootstrapping/Bayes factor approach and the postchallenge independent validation confirmed the reproducibility, and the estimation concordance indices between the final evaluation and postchallenge independent validation data sets were 0.71 for subchallenge 1, 0.78 for subchallenge 2, and 0.82 for subchallenge 3. CONCLUSIONS AND RELEVANCE:The RA2-DREAM Challenge resulted in the development of algorithms that provide feasible, quick, and accurate methods to quantify joint damage in RA. Ultimately, these methods could help research studies on RA joint damage and may be integrated into electronic health records to help clinicians serve patients better by providing timely, reliable, and quantitative information for making treatment decisions to prevent further damage.
PMID: 36036935
ISSN: 2574-3805
CID: 5822852
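The RA2-DREAM entry above ranks algorithms by weighted RMSE against expert-curated SvH scores. The abstract does not specify how the weights were chosen, so the sketch below uses a generic weighted RMSE with hypothetical weights and toy scores purely to show the form of the metric.

```python
import math


def weighted_rmse(predicted, expert, weights):
    """Square root of the weight-normalized mean squared error."""
    if not (len(predicted) == len(expert) == len(weights)):
        raise ValueError("inputs must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    squared_error = sum(
        w * (p - e) ** 2 for p, e, w in zip(predicted, expert, weights)
    )
    return math.sqrt(squared_error / total_weight)


# Toy scores and weights, invented for illustration (not challenge data).
expert = [0.0, 10.0, 40.0]
predicted = [1.0, 12.0, 37.0]
weights = [1.0, 2.0, 3.0]  # hypothetical: heavier weight on higher-damage cases
score = weighted_rmse(predicted, expert, weights)
```

With equal weights the function reduces to the ordinary RMSE, so the weighting only changes how much each radiographic set contributes to the ranking.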
Extracellular vesicles carry distinct proteo-transcriptomic signatures that are different from their cancer cell of origin
Chen, Tzu-Yi; Gonzalez-Kozlova, Edgar; Soleymani, Taliah; La Salvia, Sabrina; Kyprianou, Natasha; Sahoo, Susmita; Tewari, Ashutosh K; Cordon-Cardo, Carlos; Stolovitzky, Gustavo; Dogra, Navneet
Circulating extracellular vesicles (EVs) contain molecular footprints (lipids, proteins, RNA, and DNA) from their cell of origin. Consequently, EV-associated RNA and proteins have gained widespread interest as liquid-biopsy biomarkers. Yet, an integrative proteo-transcriptomic landscape of EVs and comparison with their cell of origin remains obscure. Here, we report that EVs enrich a distinct proteo-transcriptome that does not linearly correlate with their cell of origin. We show that EVs enrich endosomal and extracellular proteins and small RNA (∼13-200 nucleotides) associated with cell differentiation, development, and Wnt signaling. EVs carry specific RNAs (RNY3, vtRNA, and MIRLET-7) and their complementary proteins (YBX1, IGF2BP2, and SRSF1/2). To ensure an unbiased and independent analysis, we studied 12 cancer cell lines, matching EVs (in-house and exRNA database), and serum EVs of patients with prostate cancer. Together, we show that EV-RNA-protein complexes may constitute a functional interaction network to protect and regulate molecular access until a function is achieved.
PMCID:9157216
PMID: 35663013
ISSN: 2589-0042
CID: 5822842
The Fermi-Dirac distribution provides a calibrated probabilistic output for binary classifiers
Kim, Sung-Cheol; Arun, Adith S; Ahsen, Mehmet Eren; Vogel, Robert; Stolovitzky, Gustavo
Binary classification is one of the central problems in machine-learning research and, as such, investigations of its general statistical properties are of interest. We studied the ranking statistics of items in binary classification problems and observed that there is a formal and surprising relationship between the probability of a sample belonging to one of the two classes and the Fermi-Dirac distribution determining the probability that a fermion occupies a given single-particle quantum state in a physical system of noninteracting fermions. Using this equivalence, it is possible to compute a calibrated probabilistic output for binary classifiers. We show that the area under the receiver operating characteristic curve (AUC) in a classification problem is related to the temperature of an equivalent physical system. In a similar manner, the optimal decision threshold between the two classes is associated with the chemical potential of an equivalent physical system. Using our framework, we also derive a closed-form expression to calculate the variance for the AUC of a classifier. Finally, we introduce FiDEL (Fermi-Dirac-based ensemble learning), an ensemble learning algorithm that uses the calibrated nature of the classifier's output probability to combine possibly very different classifiers.
PMCID:8403970
PMID: 34413191
ISSN: 1091-6490
CID: 5822822
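To make the rank-to-probability idea in the entry above concrete, here is a small sketch. It is not the authors' code or the FiDEL implementation. It assumes the class probability at rank r takes the Fermi-Dirac form 1 / (1 + exp(beta * (r - mu))), with rank 1 as the top-scored item, and it assumes two fitting constraints chosen for this illustration: the probabilities sum to the number of positives, and the mean rank of positives reproduces a target AUC. The function names and the bisection bounds are likewise assumptions.

```python
import math


def fermi_dirac(rank, beta, mu):
    """Assumed probability that the item at `rank` is positive (rank 1 = top)."""
    x = beta * (rank - mu)
    if x >= 0:  # numerically stable evaluation of 1 / (1 + exp(x))
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))


def solve_mu(beta, n, n_pos):
    """Bisect for mu so that the probabilities over ranks 1..n sum to n_pos."""
    lo, hi = -1e9, 1e9
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        total = sum(fermi_dirac(r, beta, mid) for r in range(1, n + 1))
        if total < n_pos:
            lo = mid  # raising mu raises every probability
        else:
            hi = mid
    return 0.5 * (lo + hi)


def implied_auc(beta, mu, n, n_pos):
    """AUC implied by the rank probabilities, via the expected rank of positives."""
    mean_rank_pos = sum(r * fermi_dirac(r, beta, mu) for r in range(1, n + 1)) / n_pos
    n_neg = n - n_pos
    return 1.0 + (n_pos + 1) / (2.0 * n_neg) - mean_rank_pos / n_neg


def fit(n, n_pos, auc):
    """Bisect on beta (an inverse 'temperature') until the implied AUC matches."""
    lo, hi = 1e-4, 50.0
    for _ in range(60):
        beta = 0.5 * (lo + hi)
        mu = solve_mu(beta, n, n_pos)
        if implied_auc(beta, mu, n, n_pos) < auc:
            lo = beta  # sharper distribution needed for a higher AUC
        else:
            hi = beta
    beta = 0.5 * (lo + hi)
    return beta, solve_mu(beta, n, n_pos)


n, n_pos, target_auc = 100, 30, 0.8
beta, mu = fit(n, n_pos, target_auc)
probs = [fermi_dirac(r, beta, mu) for r in range(1, n + 1)]
```

In this sketch a larger `beta` (a lower "temperature") gives a sharper drop in probability around rank `mu`, which corresponds to a higher AUC, in line with the temperature and chemical-potential analogy the abstract describes.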
A community challenge to evaluate RNA-seq, fusion detection, and isoform quantification methods for cancer discovery
Creason, Allison; Haan, David; Dang, Kristen; Chiotti, Kami E; Inkman, Matthew; Lamb, Andrew; Yu, Thomas; Hu, Yin; Norman, Thea C; Buchanan, Alex; van Baren, Marijke J; Spangler, Ryan; Rollins, M Rick; Spellman, Paul T; Rozanov, Dmitri; Zhang, Jin; Maher, Christopher A; Caloian, Cristian; Watson, John D; Uhrig, Sebastian; Haas, Brian J; Jain, Miten; Akeson, Mark; Ahsen, Mehmet Eren; Stolovitzky, Gustavo; Guinney, Justin; Boutros, Paul C; Stuart, Joshua M; Ellrott, Kyle
The accurate identification and quantitation of RNA isoforms present in the cancer transcriptome is key for analyses ranging from the inference of the impacts of somatic variants to pathway analysis to biomarker development and subtype discovery. The ICGC-TCGA DREAM Somatic Mutation Calling in RNA (SMC-RNA) challenge was a crowd-sourced effort to benchmark methods for RNA isoform quantification and fusion detection from bulk cancer RNA sequencing (RNA-seq) data. It concluded in 2018 with a comparison of 77 fusion detection entries and 65 isoform quantification entries on 51 synthetic tumors and 32 cell lines with spiked-in fusion constructs. We report the entries used to build this benchmark, the leaderboard results, and the experimental features associated with the accurate prediction of RNA species. This challenge required submissions to be in the form of containerized workflows, meaning each of the entries described is easily reusable through CWL and Docker containers at https://github.com/SMC-RNA-challenge. A record of this paper's transparent peer review process is included in the supplemental information.
PMCID:8376800
PMID: 34146471
ISSN: 2405-4720
CID: 5822792
COSIFER: a Python package for the consensus inference of molecular interaction networks
Manica, Matteo; Bunne, Charlotte; Mathis, Roland; Cadow, Joris; Ahsen, Mehmet Eren; Stolovitzky, Gustavo A; Martínez, María Rodríguez
SUMMARY:The advent of high-throughput technologies has provided researchers with measurements of thousands of molecular entities and enabled the investigation of the internal regulatory apparatus of the cell. However, network inference from high-throughput data is far from being a solved problem. While a plethora of different inference methods have been proposed, they often lead to non-overlapping predictions, and many of them lack user-friendly implementations to enable their broad utilization. Here, we present Consensus Interaction Network Inference Service (COSIFER), a package and a companion web-based platform to infer molecular networks from expression data using state-of-the-art consensus approaches. COSIFER includes a selection of state-of-the-art methodologies for network inference and different consensus strategies to integrate the predictions of individual methods and generate robust networks. AVAILABILITY AND IMPLEMENTATION:COSIFER Python source code is available at https://github.com/PhosphorylatedRabbits/cosifer. The web service is accessible at https://ibm.biz/cosifer-aas. SUPPLEMENTARY INFORMATION:Supplementary data are available at Bioinformatics online.
PMCID:8337002
PMID: 33241320
ISSN: 1367-4811
CID: 5822762
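The idea of integrating the predictions of several inference methods into one consensus network, as described in the entry above, can be illustrated with a simple mean-rank scheme. This sketch does not use the COSIFER API and does not claim to reproduce any of its consensus strategies; the function names, the method names, and the edge scores are hypothetical.

```python
def rank_edges(scores):
    """Map each edge to its rank (1 = strongest) within one method's scores."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {edge: i + 1 for i, edge in enumerate(ordered)}


def consensus_by_mean_rank(method_scores):
    """Order edges by their average rank across methods (best first).

    An edge that a method does not report receives that method's worst
    rank plus one, so missing predictions count against the edge.
    """
    ranked = {name: rank_edges(scores) for name, scores in method_scores.items()}
    all_edges = set().union(*(scores.keys() for scores in method_scores.values()))
    mean_rank = {}
    for edge in all_edges:
        ranks = [
            ranked[name].get(edge, len(method_scores[name]) + 1)
            for name in method_scores
        ]
        mean_rank[edge] = sum(ranks) / len(ranks)
    # Break ties deterministically by edge name.
    return sorted(mean_rank, key=lambda e: (mean_rank[e], e))


# Hypothetical edge scores from three inference methods over genes A, B, C.
method_scores = {
    "correlation": {("A", "B"): 0.9, ("B", "C"): 0.5, ("A", "C"): 0.1},
    "mutual_info": {("A", "B"): 0.7, ("A", "C"): 0.6, ("B", "C"): 0.2},
    "regression": {("B", "C"): 0.8, ("A", "B"): 0.75},
}
consensus = consensus_by_mean_rank(method_scores)
```

Averaging ranks rather than raw scores sidesteps the fact that different methods report scores on different scales, which is one common motivation for rank-based consensus.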
Unannotated small RNA clusters associated with circulating extracellular vesicles detect early stage liver cancer
von Felden, Johann; Garcia-Lezana, Teresa; Dogra, Navneet; Gonzalez-Kozlova, Edgar; Ahsen, Mehmet Eren; Craig, Amanda; Gifford, Stacey; Wunsch, Benjamin; Smith, Joshua T; Kim, Sungcheol; Diaz, Jennifer E L; Chen, Xintong; Labgaa, Ismail; Haber, Philipp; Olsen, Reena; Han, Dan; Restrepo, Paula; D'Avola, Delia; Hernandez-Meza, Gabriela; Allette, Kimaada; Sebra, Robert; Saberi, Behnam; Tabrizian, Parissa; Asgharpour, Amon; Dieterich, Douglas; Llovet, Josep M; Cordon-Cardo, Carlos; Tewari, Ash; Schwartz, Myron; Stolovitzky, Gustavo; Losic, Bojan; Villanueva, Augusto
OBJECTIVE:Surveillance tools for early detection of cancers, including hepatocellular carcinoma (HCC), are suboptimal, and biomarkers are urgently needed. Extracellular vesicles (EVs) have gained increasing scientific interest due to their involvement in tumour initiation and metastasis; however, most extracellular RNA (exRNA) blood-based biomarker studies are limited to annotated genomic regions. DESIGN/METHODS:EVs were isolated with differential ultracentrifugation and integrated nanoscale deterministic lateral displacement arrays (nanoDLD) and quality assessed by electron microscopy, immunoblotting, nanoparticle tracking and deconvolution analysis. Genome-wide sequencing of the largely unexplored small exRNA landscape, including unannotated transcripts, identified and reproducibly quantified small RNA clusters (smRCs). Their key genomic features were delineated across biospecimens and EV isolation techniques in prostate cancer and HCC. Three independent exRNA cancer datasets with a total of 479 samples from 375 patients, including longitudinal samples, were used for this study. RESULTS:ExRNA smRCs were dominated by uncharacterised, unannotated small RNA with a consensus sequence of 20 nt. An unannotated 3-smRC signature was significantly overexpressed in plasma exRNA of patients with HCC (p<0.01, n=157). An independent validation in a phase 2 biomarker case-control study revealed 86% sensitivity and 91% specificity for the detection of early HCC from controls at risk (n=209) (area under the receiver operating characteristic curve (AUC): 0.87). The 3-smRC signature was independent of alpha-fetoprotein (p<0.0001) and a composite model yielded an increased AUC of 0.93. CONCLUSIONS:These findings directly lead to the prospect of a minimally invasive, blood-only, operator-independent clinical tool for HCC surveillance, thus highlighting the potential of unannotated smRCs for biomarker research in cancer.
PMID: 34321221
ISSN: 1468-3288
CID: 5822812
Crowdsourcing assessment of maternal blood multi-omics for predicting gestational age and preterm birth
Tarca, Adi L; Pataki, Bálint Ármin; Romero, Roberto; Sirota, Marina; Guan, Yuanfang; Kutum, Rintu; Gomez-Lopez, Nardhy; Done, Bogdan; Bhatti, Gaurav; Yu, Thomas; Andreoletti, Gaia; Chaiworapongsa, Tinnakorn; Hassan, Sonia S; Hsu, Chaur-Dong; Aghaeepour, Nima; Stolovitzky, Gustavo; Csabai, Istvan; Costello, James C
Identification of pregnancies at risk of preterm birth (PTB), the leading cause of newborn deaths, remains challenging given the syndromic nature of the disease. We report a longitudinal multi-omics study coupled with a DREAM challenge to develop predictive models of PTB. The findings indicate that whole-blood gene expression predicts ultrasound-based gestational ages in normal and complicated pregnancies (r = 0.83) and, using data collected before 37 weeks of gestation, also predicts the delivery date in both normal pregnancies (r = 0.86) and those with spontaneous preterm birth (r = 0.75). Based on samples collected before 33 weeks in asymptomatic women, our analysis suggests that expression changes preceding preterm prelabor rupture of the membranes are consistent across time points and cohorts and involve leukocyte-mediated immunity. Models built from plasma proteomic data predict spontaneous preterm delivery with intact membranes with higher accuracy and earlier in pregnancy than transcriptomic models (AUROC = 0.76 versus AUROC = 0.6 at 27-33 weeks of gestation).
PMID: 34195686
ISSN: 2666-3791
CID: 5822802