Crowdsourced mapping of unexplored target space of kinase inhibitors
Cichońska, Anna; Ravikumar, Balaguru; Allaway, Robert J; Wan, Fangping; Park, Sungjoon; Isayev, Olexandr; Li, Shuya; Mason, Michael; Lamb, Andrew; Tanoli, Ziaurrehman; Jeon, Minji; Kim, Sunkyu; Popova, Mariya; Capuzzi, Stephen; Zeng, Jianyang; Dang, Kristen; Koytiger, Gregory; Kang, Jaewoo; Wells, Carrow I; Willson, Timothy M; Oprea, Tudor I; Schlessinger, Avner; Drewry, David H; Stolovitzky, Gustavo; Wennerberg, Krister; Guinney, Justin; Aittokallio, Tero
Despite decades of intensive search for compounds that modulate the activity of particular protein targets, a large proportion of the human kinome remains as yet undrugged. Effective approaches are therefore required to map the massive space of unexplored compound-kinase interactions for novel and potent activities. Here, we carry out a crowdsourced benchmarking of predictive algorithms for kinase inhibitor potencies across multiple kinase families tested on unpublished bioactivity data. We find the top-performing predictions are based on various models, including kernel learning, gradient boosting and deep learning, and their ensemble leads to a predictive accuracy exceeding that of single-dose kinase activity assays. We design experiments based on the model predictions and identify unexpected activities even for under-studied kinases, thereby accelerating experimental mapping efforts. The open-source prediction algorithms together with the bioactivities between 95 compounds and 295 kinases provide a resource for benchmarking prediction algorithms and for extending the druggable kinome.
PMCID: 8175708
PMID: 34083538
ISSN: 2041-1723
CID: 5822782
Evaluation of artificial intelligence systems for assisting neurologists with fast and accurate annotations of scalp electroencephalography data
Roy, Subhrajit; Kiral, Isabell; Mirmomeni, Mahtab; Mummert, Todd; Braz, Alan; Tsay, Jason; Tang, Jianbin; Asif, Umar; Schaffter, Thomas; Ahsen, Mehmet Eren; Iwamori, Toshiya; Yanagisawa, Hiroki; Poonawala, Hasan; Madan, Piyush; Qin, Yong; Picone, Joseph; Obeid, Iyad; Marques, Bruno De Assis; Maetschke, Stefan; Khalaf, Rania; Rosen-Zvi, Michal; Stolovitzky, Gustavo; Harrer, Stefan
BACKGROUND: Assistive automatic seizure detection can empower human annotators to shorten patient monitoring data review times. We present a proof-of-concept for a seizure detection system that is sensitive, automated, patient-specific, and tunable to maximize sensitivity while minimizing human annotation times. The system uses custom data preparation methods, deep learning analytics and electroencephalography (EEG) data. METHODS: Scalp EEG data of 365 patients containing 171,745 s ictal and 2,185,864 s interictal samples obtained from clinical monitoring systems were analysed as part of a crowdsourced artificial intelligence (AI) challenge. Participants were tasked to develop an ictal/interictal classifier with high sensitivity and low false alarm rates. We built a challenge platform that prevented participants from downloading or directly accessing the data while allowing crowdsourced model development. FINDINGS/RESULTS: The automatic detection system achieved tunable sensitivities between 75.00% and 91.60%, allowing a reduction in the amount of raw EEG data to be reviewed by a human annotator by factors between 142x and 22x, respectively. The algorithm enables instantaneous reviewer-managed optimization of the balance between sensitivity and the amount of raw EEG data to be reviewed. INTERPRETATION/CONCLUSIONS: This study demonstrates the utility of deep learning for patient-specific seizure detection in EEG data. Furthermore, deep learning in combination with a human reviewer can provide the basis for an assistive data labelling system lowering the time of manual review while maintaining human expert annotation performance. FUNDING/BACKGROUND: IBM employed all IBM Research authors. Temple University employed all Temple University authors. The Icahn School of Medicine at Mount Sinai employed Eren Ahsen. The corresponding authors Stefan Harrer and Gustavo Stolovitzky declare that they had full access to all the data in the study and that they had final responsibility for the decision to submit for publication.
PMID: 33745882
ISSN: 2352-3964
CID: 5822772
Open Problems in Extracellular RNA Data Analysis: Insights From an ERCC Online Workshop
Alexander, Roger P; Kitchen, Robert R; Tosar, Juan Pablo; Roth, Matthew; Mestdagh, Pieter; Max, Klaas E A; Rozowsky, Joel; Kaczor-Urbanowicz, Karolina Elżbieta; Chang, Justin; Balaj, Leonora; Losic, Bojan; Van Nostrand, Eric L; LaPlante, Emily; Mateescu, Bogdan; White, Brian S; Yu, Rongshan; Milosavljevic, Aleksander; Stolovitzky, Gustavo; Spengler, Ryan M
We now know RNA can survive the harsh environment of biofluids when encapsulated in vesicles or by associating with lipoproteins or RNA binding proteins. These extracellular RNA (exRNA) play a role in intercellular signaling, serve as biomarkers of disease, and form the basis of new strategies for disease treatment. The Extracellular RNA Communication Consortium (ERCC) hosted a two-day online workshop (April 19-20, 2021) on the unique challenges of exRNA data analysis. The goal was to foster an open dialog about best practices and discuss open problems in the field, focusing initially on small exRNA sequencing data. Video recordings of workshop presentations and discussions are available (https://exRNA.org/exRNAdata2021-videos/). There were three target audiences: experimentalists who generate exRNA sequencing data, computational and data scientists who work with those groups to analyze their data, and experimental and data scientists new to the field. Here we summarize issues explored during the workshop, including progress on an effort to develop an exRNA data analysis challenge to engage the community in solving some of these open problems.
PMCID: 8762274
PMID: 35047007
ISSN: 1664-8021
CID: 5822832
Gene selection for optimal prediction of cell position in tissues from single-cell transcriptomics data
Tanevski, Jovan; Nguyen, Thin; Truong, Buu; Karaiskos, Nikos; Ahsen, Mehmet Eren; Zhang, Xinyu; Shu, Chang; Xu, Ke; Liang, Xiaoyu; Hu, Ying; Pham, Hoang Vv; Xiaomei, Li; Le, Thuc D; Tarca, Adi L; Bhatti, Gaurav; Romero, Roberto; Karathanasis, Nestoras; Loher, Phillipe; Chen, Yang; Ouyang, Zhengqing; Mao, Disheng; Zhang, Yuping; Zand, Maryam; Ruan, Jianhua; Hafemeister, Christoph; Qiu, Peng; Tran, Duc; Nguyen, Tin; Gabor, Attila; Yu, Thomas; Guinney, Justin; Glaab, Enrico; Krause, Roland; Banda, Peter; Stolovitzky, Gustavo; Rajewsky, Nikolaus; Saez-Rodriguez, Julio; Meyer, Pablo
Single-cell RNA-sequencing (scRNAseq) technologies are rapidly evolving. Although very informative, in standard scRNAseq experiments, the spatial organization of the cells in the tissue of origin is lost. Conversely, spatial RNA-seq technologies designed to maintain cell localization have limited throughput and gene coverage. Mapping scRNAseq to genes with spatial information increases coverage while providing spatial location. However, methods to perform such mapping have not yet been benchmarked. To fill this gap, we organized the DREAM Single-Cell Transcriptomics challenge focused on the spatial reconstruction of cells from the Drosophila embryo from scRNAseq data, leveraging, as a silver standard, genes with in situ hybridization data from the Berkeley Drosophila Transcription Network Project reference atlas. The 34 participating teams used diverse algorithms for gene selection and location prediction, while being able to correctly localize clusters of cells. Selection of predictor genes was essential for this task. Predictor genes showed a relatively high expression entropy, high spatial clustering and included prominent developmental genes such as gap and pair-rule genes and tissue markers. Application of the top 10 methods to a zebrafish embryo dataset yielded performance and statistical properties of the selected genes similar to those in the Drosophila data. This suggests that methods developed in this challenge are able to extract generalizable properties of genes that are useful to accurately reconstruct the spatial arrangement of cells in tissues.
PMCID: 7536825
PMID: 32972997
ISSN: 2575-1077
CID: 5822752
Deterministic Lateral Displacement: Challenges and Perspectives
Hochstetter, Axel; Vernekar, Rohan; Austin, Robert H; Becker, Holger; Beech, Jason P; Fedosov, Dmitry A; Gompper, Gerhard; Kim, Sung-Cheol; Smith, Joshua T; Stolovitzky, Gustavo; Tegenfeldt, Jonas O; Wunsch, Benjamin H; Zeming, Kerwin K; Krüger, Timm; Inglis, David W
The advent of microfluidics in the 1990s promised a revolution in multiple industries from healthcare to chemical processing. Deterministic lateral displacement (DLD) is a continuous-flow microfluidic particle separation method discovered in 2004 that has been applied successfully and widely to the separation of blood cells, yeast, spores, bacteria, viruses, DNA, droplets, and more. Deterministic lateral displacement is conceptually simple and can deliver consistent performance over a wide range of flow rates and particle concentrations. Despite wide use and in-depth study, DLD has not yet been fully elucidated or optimized, with different approaches to the same problem yielding varying results. We endeavor here to provide up-to-date expert opinion on the state of the art and current fundamental, practical, and commercial challenges with DLD as well as describe experimental and modeling opportunities. Because these challenges and opportunities arise from constraints on hydrodynamics, fabrication, and operation at the micro- and nanoscale, we expect this Perspective to serve as a guide for the broader micro- and nanofluidic community to identify and to address open questions in the field.
PMID: 32844655
ISSN: 1936-086X
CID: 5822732
The transcriptomic response of cells to a drug combination is more than the sum of the responses to the monotherapies
Diaz, Jennifer El; Ahsen, Mehmet Eren; Schaffter, Thomas; Chen, Xintong; Realubit, Ronald B; Karan, Charles; Califano, Andrea; Losic, Bojan; Stolovitzky, Gustavo
Our ability to discover effective drug combinations is limited, in part by insufficient understanding of how the transcriptional response of two monotherapies results in that of their combination. We analyzed matched time course RNAseq profiling of cells treated with single drugs and their combinations and found that the transcriptional signature of the synergistic combination was unique relative to that of either constituent monotherapy. The sequential activation of transcription factors in time in the gene regulatory network was implicated. The nature of this transcriptional cascade suggests that drug synergy may ensue when the transcriptional responses elicited by two unrelated individual drugs are correlated. We used these results as the basis of a simple prediction algorithm attaining an AUROC of 0.77 in the prediction of synergistic drug combinations in an independent dataset.
PMCID: 7546737
PMID: 32945258
ISSN: 2050-084X
CID: 5822742
R/PY-SUMMA: An R/Python Package for Unsupervised Ensemble Learning for Binary Classification Problems in Bioinformatics
Ahsen, Mehmet Eren; Vogel, Robert; Stolovitzky, Gustavo A
The increasing availability of complex data in biology and medicine has promoted the use of machine learning in classification tasks to address important problems in translational and fundamental science. Two important obstacles, however, may limit the unraveling of the full potential of machine learning in these fields: the lack of generalization of the resulting models and the limited number of labeled data sets in some applications. To address these important problems, we developed an unsupervised ensemble algorithm called strategy for unsupervised multiple method aggregation (SUMMA). By virtue of being an ensemble method, SUMMA is more robust to generalization than the predictions it combines. By virtue of being unsupervised, SUMMA does not require labeled data. SUMMA receives as input predictions from a diversity of models and estimates their classification performance even when labeled data are unavailable. It then uses these performance estimates to combine these different predictions into an ensemble model. SUMMA can be applied to a variety of binary classification problems in bioinformatics including but not limited to gene network inference, cancer diagnostics, drug response prediction, somatic mutation, and differential expression calling. In this application note, we introduce the R/PY-SUMMA packages, available in R or Python, that implement the SUMMA algorithm.
PMID: 31905016
ISSN: 1557-8666
CID: 5822722
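The SUMMA abstract above describes three steps: take predictions from several models, estimate each model's performance without labels, and use those estimates to weight an ensemble. The block below is a simplified sketch of that general idea, not the R/PY-SUMMA API. It assumes rank-transformed predictions, base classifiers that are conditionally independent given the true class and mostly better than chance, and at least three classifiers. Under those assumptions the off-diagonal covariances of the ranks are approximately a product of per-classifier terms, so a rank-one fit to the off-diagonal entries gives weights that grow with each classifier's performance. All function names are hypothetical.

```python
from statistics import mean


def to_ranks(column):
    """Rank values 1..N (1 = lowest score); ties are broken by input order."""
    order = sorted(range(len(column)), key=lambda i: column[i])
    ranks = [0.0] * len(column)
    for r, i in enumerate(order, start=1):
        ranks[i] = float(r)
    return ranks


def covariance(x, y):
    """Sample covariance of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def estimate_weights(rank_columns, n_iter=200):
    """Fit cov[i][j] ~ w[i] * w[j] for i != j by alternating least squares.

    The diagonal is ignored because it holds each classifier's own rank
    variance, which carries no information about performance.
    Starting from positive weights encodes the assumption that most
    classifiers are better than chance.
    """
    m = len(rank_columns)
    cov = [[covariance(rank_columns[i], rank_columns[j]) for j in range(m)]
           for i in range(m)]
    w = [1.0] * m
    for _ in range(n_iter):
        for i in range(m):
            num = sum(cov[i][j] * w[j] for j in range(m) if j != i)
            den = sum(w[j] ** 2 for j in range(m) if j != i)
            w[i] = num / den
    return w


def ensemble_scores(score_columns):
    """Return (weights, combined score per sample); higher = more likely positive."""
    rank_columns = [to_ranks(col) for col in score_columns]
    weights = estimate_weights(rank_columns)
    n = len(rank_columns[0])
    center = (n + 1) / 2  # mean of the ranks 1..N
    combined = [
        sum(w * (col[k] - center) for w, col in zip(weights, rank_columns))
        for k in range(n)
    ]
    return weights, combined
```

Here `score_columns` is a list with one list of scores per classifier, all over the same samples. No labels enter the computation; they would only be needed afterwards to check how well the ensemble performed.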
NeTFactor, a framework for identifying transcriptional regulators of gene expression-based biomarkers
Ahsen, Mehmet Eren; Chun, Yoojin; Grishin, Alexander; Grishina, Galina; Stolovitzky, Gustavo; Pandey, Gaurav; Bunyavanich, Supinda
Biological and regulatory mechanisms underlying many multi-gene expression-based disease biomarkers are often not readily evident. We describe an innovative framework, NeTFactor, that combines network analyses with gene expression data to identify transcription factors (TFs) that significantly and maximally regulate such a biomarker. NeTFactor uses a computationally-inferred context-specific gene regulatory network and applies topological, statistical, and optimization methods to identify regulator TFs. Application of NeTFactor to a multi-gene expression-based asthma biomarker identified ETS translocation variant 4 (ETV4) and peroxisome proliferator-activated receptor gamma (PPARG) as the biomarker's most significant TF regulators. siRNA-based knock down of these TFs in an airway epithelial cell line model demonstrated significant reduction of cytokine expression relevant to asthma, validating NeTFactor's top-scoring findings. While PPARG has been associated with airway inflammation, ETV4 has not yet been implicated in asthma, thus indicating the possibility of novel, disease-relevant discovery by NeTFactor. We also show that NeTFactor's results are robust when the gene regulatory network and biomarker are derived from independent data. Additionally, our application of NeTFactor to a different disease biomarker identified TF regulators of interest. These results illustrate that the application of NeTFactor to multi-gene expression-based biomarkers could yield valuable insights into regulatory mechanisms and biological processes underlying disease.
PMCID: 6737052
PMID: 31506535
ISSN: 2045-2322
CID: 5822712
Reproducible biomedical benchmarking in the cloud: lessons from crowd-sourced data challenges [Letter]
Ellrott, Kyle; Buchanan, Alex; Creason, Allison; Mason, Michael; Schaffter, Thomas; Hoff, Bruce; Eddy, James; Chilton, John M; Yu, Thomas; Stuart, Joshua M; Saez-Rodriguez, Julio; Stolovitzky, Gustavo; Boutros, Paul C; Guinney, Justin
Challenges are achieving broad acceptance for addressing many biomedical questions and enabling tool assessment. But ensuring that the methods evaluated are reproducible and reusable is complicated by the diversity of software architectures, input and output file formats, and computing environments. To mitigate these problems, some challenges have leveraged new virtualization and compute methods, requiring participants to submit cloud-ready software packages. We review recent data challenges with innovative approaches to model reproducibility and data sharing, and outline key lessons for improving quantitative biomedical data analysis through crowd-sourced benchmarking challenges.
PMCID: 6737594
PMID: 31506093
ISSN: 1474-760X
CID: 5822702
Assessment of network module identification across complex diseases
Choobdar, Sarvenaz; Ahsen, Mehmet E; Crawford, Jake; Tomasoni, Mattia; Fang, Tao; Lamparter, David; Lin, Junyuan; Hescott, Benjamin; Hu, Xiaozhe; Mercer, Johnathan; Natoli, Ted; Narayan, Rajiv; Subramanian, Aravind; Zhang, Jitao D; Stolovitzky, Gustavo; Kutalik, Zoltán; Lage, Kasper; Slonim, Donna K; Saez-Rodriguez, Julio; Cowen, Lenore J; Bergmann, Sven; Marbach, Daniel
Many bioinformatics methods have been proposed for reducing the complexity of large gene or protein networks into relevant subnetworks or modules. Yet, how such methods compare to each other in terms of their ability to identify disease-relevant modules in different types of network remains poorly understood. We launched the 'Disease Module Identification DREAM Challenge', an open competition to comprehensively assess module identification methods across diverse protein-protein interaction, signaling, gene co-expression, homology and cancer-gene networks. Predicted network modules were tested for association with complex traits and diseases using a unique collection of 180 genome-wide association studies. Our robust assessment of 75 module identification methods reveals top-performing algorithms, which recover complementary trait-associated modules. We find that most of these modules correspond to core disease-relevant pathways, which often comprise therapeutic targets. This community challenge establishes biologically interpretable benchmarks, tools and guidelines for molecular network analysis to study human disease biology.
PMCID: 6719725
PMID: 31471613
ISSN: 1548-7105
CID: 5822692