NYUHSL Faculty Bibliography

Searched for: in-biosketch:yes person:stolog01

Total Results: 139

Nature communications. 2021:12(1). DOI: 10.1038/s41467-021-23165-1

Crowdsourced mapping of unexplored target space of kinase inhibitors

Cichońska, Anna; Ravikumar, Balaguru; Allaway, Robert J; Wan, Fangping; Park, Sungjoon; Isayev, Olexandr; Li, Shuya; Mason, Michael; Lamb, Andrew; Tanoli, Ziaurrehman; Jeon, Minji; Kim, Sunkyu; Popova, Mariya; Capuzzi, Stephen; Zeng, Jianyang; Dang, Kristen; Koytiger, Gregory; Kang, Jaewoo; Wells, Carrow I; Willson, Timothy M; Oprea, Tudor I; Schlessinger, Avner; Drewry, David H; Stolovitzky, Gustavo; Wennerberg, Krister; Guinney, Justin; Aittokallio, Tero

Despite decades of intensive search for compounds that modulate the activity of particular protein targets, a large proportion of the human kinome remains as yet undrugged. Effective approaches are therefore required to map the massive space of unexplored compound-kinase interactions for novel and potent activities. Here, we carry out a crowdsourced benchmarking of predictive algorithms for kinase inhibitor potencies across multiple kinase families tested on unpublished bioactivity data. We find the top-performing predictions are based on various models, including kernel learning, gradient boosting and deep learning, and their ensemble leads to a predictive accuracy exceeding that of single-dose kinase activity assays. We design experiments based on the model predictions and identify unexpected activities even for under-studied kinases, thereby accelerating experimental mapping efforts. The open-source prediction algorithms together with the bioactivities between 95 compounds and 295 kinases provide a resource for benchmarking prediction algorithms and for extending the druggable kinome.

PMCID: 8175708

PMID: 34083538

ISSN: 2041-1723

CID: 5822782

Cell systems. 2021:12(8):827-838.e5. DOI: 10.1016/j.cels.2021.05.021

A community challenge to evaluate RNA-seq, fusion detection, and isoform quantification methods for cancer discovery

Creason, Allison; Haan, David; Dang, Kristen; Chiotti, Kami E; Inkman, Matthew; Lamb, Andrew; Yu, Thomas; Hu, Yin; Norman, Thea C; Buchanan, Alex; van Baren, Marijke J; Spangler, Ryan; Rollins, M Rick; Spellman, Paul T; Rozanov, Dmitri; Zhang, Jin; Maher, Christopher A; Caloian, Cristian; Watson, John D; Uhrig, Sebastian; Haas, Brian J; Jain, Miten; Akeson, Mark; Ahsen, Mehmet Eren; Stolovitzky, Gustavo; Guinney, Justin; Boutros, Paul C; Stuart, Joshua M; Ellrott, Kyle

The accurate identification and quantitation of RNA isoforms present in the cancer transcriptome is key for analyses ranging from the inference of the impacts of somatic variants to pathway analysis to biomarker development and subtype discovery. The ICGC-TCGA DREAM Somatic Mutation Calling in RNA (SMC-RNA) challenge was a crowd-sourced effort to benchmark methods for RNA isoform quantification and fusion detection from bulk cancer RNA sequencing (RNA-seq) data. It concluded in 2018 with a comparison of 77 fusion detection entries and 65 isoform quantification entries on 51 synthetic tumors and 32 cell lines with spiked-in fusion constructs. We report the entries used to build this benchmark, the leaderboard results, and the experimental features associated with the accurate prediction of RNA species. This challenge required submissions to be in the form of containerized workflows, meaning each of the entries described is easily reusable through CWL and Docker containers at https://github.com/SMC-RNA-challenge. A record of this paper's transparent peer review process is included in the supplemental information.

PMCID: 8376800

PMID: 34146471

ISSN: 2405-4720

CID: 5822792

Gut. 2021. DOI: 10.1136/gutjnl-2021-325036

Unannotated small RNA clusters associated with circulating extracellular vesicles detect early stage liver cancer

von Felden, Johann; Garcia-Lezana, Teresa; Dogra, Navneet; Gonzalez-Kozlova, Edgar; Ahsen, Mehmet Eren; Craig, Amanda; Gifford, Stacey; Wunsch, Benjamin; Smith, Joshua T; Kim, Sungcheol; Diaz, Jennifer E L; Chen, Xintong; Labgaa, Ismail; Haber, Philipp; Olsen, Reena; Han, Dan; Restrepo, Paula; D'Avola, Delia; Hernandez-Meza, Gabriela; Allette, Kimaada; Sebra, Robert; Saberi, Behnam; Tabrizian, Parissa; Asgharpour, Amon; Dieterich, Douglas; Llovet, Josep M; Cordon-Cardo, Carlos; Tewari, Ash; Schwartz, Myron; Stolovitzky, Gustavo; Losic, Bojan; Villanueva, Augusto

OBJECTIVE: Surveillance tools for early detection of cancers, including hepatocellular carcinoma (HCC), are suboptimal, and biomarkers are urgently needed. Extracellular vesicles (EVs) have gained increasing scientific interest due to their involvement in tumour initiation and metastasis; however, most extracellular RNA (exRNA) blood-based biomarker studies are limited to annotated genomic regions. DESIGN/METHODS: EVs were isolated with differential ultracentrifugation and integrated nanoscale deterministic lateral displacement arrays (nanoDLD) and quality assessed by electron microscopy, immunoblotting, nanoparticle tracking and deconvolution analysis. Genome-wide sequencing of the largely unexplored small exRNA landscape, including unannotated transcripts, identified and reproducibly quantified small RNA clusters (smRCs). Their key genomic features were delineated across biospecimens and EV isolation techniques in prostate cancer and HCC. Three independent exRNA cancer datasets with a total of 479 samples from 375 patients, including longitudinal samples, were used for this study. RESULTS: ExRNA smRCs were dominated by uncharacterised, unannotated small RNA with a consensus sequence of 20 nt. An unannotated 3-smRC signature was significantly overexpressed in plasma exRNA of patients with HCC (p<0.01, n=157). An independent validation in a phase 2 biomarker case-control study revealed 86% sensitivity and 91% specificity for the detection of early HCC from controls at risk (n=209) (area under the receiver operating curve (AUC): 0.87). The 3-smRC signature was independent of alpha-fetoprotein (p<0.0001) and a composite model yielded an increased AUC of 0.93. CONCLUSIONS: These findings directly lead to the prospect of a minimally invasive, blood-only, operator-independent clinical tool for HCC surveillance, thus highlighting the potential of unannotated smRCs for biomarker research in cancer.

PMID: 34321221

ISSN: 1468-3288

CID: 5822812

Proceedings of the National Academy of Sciences of the United States of America (PNAS). 2021:118(34). DOI: 10.1073/pnas.2100761118

The Fermi-Dirac distribution provides a calibrated probabilistic output for binary classifiers

Kim, Sung-Cheol; Arun, Adith S; Ahsen, Mehmet Eren; Vogel, Robert; Stolovitzky, Gustavo

Binary classification is one of the central problems in machine-learning research and, as such, investigations of its general statistical properties are of interest. We studied the ranking statistics of items in binary classification problems and observed that there is a formal and surprising relationship between the probability of a sample belonging to one of the two classes and the Fermi-Dirac distribution determining the probability that a fermion occupies a given single-particle quantum state in a physical system of noninteracting fermions. Using this equivalence, it is possible to compute a calibrated probabilistic output for binary classifiers. We show that the area under the receiver operating characteristics curve (AUC) in a classification problem is related to the temperature of an equivalent physical system. In a similar manner, the optimal decision threshold between the two classes is associated with the chemical potential of an equivalent physical system. Using our framework, we also derive a closed-form expression to calculate the variance for the AUC of a classifier. Finally, we introduce FiDEL (Fermi-Dirac-based ensemble learning), an ensemble learning algorithm that uses the calibrated nature of the classifier's output probability to combine possibly very different classifiers.

PMCID: 8403970

PMID: 34413191

ISSN: 1091-6490

CID: 5822822
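
The abstract of the entry above links the class probability of a ranked item to the Fermi-Dirac distribution. The following is a minimal sketch of that functional form only, not the FiDEL implementation: the parameterization in terms of a raw rank, the names `fermi_dirac`, `beta` and `mu`, and the numeric values are assumptions made for illustration, and the paper's exact definitions may differ. Here `beta` plays the role of an inverse temperature and `mu` the role of the chemical potential, i.e. the rank at which the probability equals 0.5.

```python
import math


def fermi_dirac(rank, beta, mu):
    """Fermi-Dirac form: probability that the item at `rank` is in the positive class.

    `beta` (inverse temperature) and `mu` (chemical potential) are
    illustrative parameters, not values taken from the paper.
    """
    x = beta * (rank - mu)
    # math.exp overflows for arguments above roughly 709, so clamp the tail.
    if x > 700:
        return 0.0
    return 1.0 / (1.0 + math.exp(x))


n_items = 100
mu = 30.0    # hypothetical rank at which the probability is exactly 0.5
beta = 0.2   # hypothetical inverse temperature

# Probability assigned to each rank, from the best-ranked item (1) to the worst.
probs = [fermi_dirac(r, beta, mu) for r in range(1, n_items + 1)]

# A larger beta (lower "temperature") gives a sharper transition around mu.
sharp = fermi_dirac(20, 1.0, mu)
soft = fermi_dirac(20, beta, mu)
```

In this sketch the probability decreases monotonically with rank, and a sharper transition corresponds to a classifier whose ranking separates the two classes more cleanly, which is consistent with the abstract's statement that the AUC is related to the temperature.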

Frontiers in genetics. 2021:12. DOI: 10.3389/fgene.2021.778416

Open Problems in Extracellular RNA Data Analysis: Insights From an ERCC Online Workshop

Alexander, Roger P; Kitchen, Robert R; Tosar, Juan Pablo; Roth, Matthew; Mestdagh, Pieter; Max, Klaas E A; Rozowsky, Joel; Kaczor-Urbanowicz, Karolina Elżbieta; Chang, Justin; Balaj, Leonora; Losic, Bojan; Van Nostrand, Eric L; LaPlante, Emily; Mateescu, Bogdan; White, Brian S; Yu, Rongshan; Milosavljevic, Aleksander; Stolovitzky, Gustavo; Spengler, Ryan M

We now know RNA can survive the harsh environment of biofluids when encapsulated in vesicles or by associating with lipoproteins or RNA binding proteins. These extracellular RNA (exRNA) play a role in intercellular signaling, serve as biomarkers of disease, and form the basis of new strategies for disease treatment. The Extracellular RNA Communication Consortium (ERCC) hosted a two-day online workshop (April 19-20, 2021) on the unique challenges of exRNA data analysis. The goal was to foster an open dialog about best practices and discuss open problems in the field, focusing initially on small exRNA sequencing data. Video recordings of workshop presentations and discussions are available (https://exRNA.org/exRNAdata2021-videos/). There were three target audiences: experimentalists who generate exRNA sequencing data, computational and data scientists who work with those groups to analyze their data, and experimental and data scientists new to the field. Here we summarize issues explored during the workshop, including progress on an effort to develop an exRNA data analysis challenge to engage the community in solving some of these open problems.

PMCID: 8762274

PMID: 35047007

ISSN: 1664-8021

CID: 5822832

Life science alliance. 2020:3(11). DOI: 10.26508/lsa.202000867

Gene selection for optimal prediction of cell position in tissues from single-cell transcriptomics data

Tanevski, Jovan; Nguyen, Thin; Truong, Buu; Karaiskos, Nikos; Ahsen, Mehmet Eren; Zhang, Xinyu; Shu, Chang; Xu, Ke; Liang, Xiaoyu; Hu, Ying; Pham, Hoang Vv; Xiaomei, Li; Le, Thuc D; Tarca, Adi L; Bhatti, Gaurav; Romero, Roberto; Karathanasis, Nestoras; Loher, Phillipe; Chen, Yang; Ouyang, Zhengqing; Mao, Disheng; Zhang, Yuping; Zand, Maryam; Ruan, Jianhua; Hafemeister, Christoph; Qiu, Peng; Tran, Duc; Nguyen, Tin; Gabor, Attila; Yu, Thomas; Guinney, Justin; Glaab, Enrico; Krause, Roland; Banda, Peter; Stolovitzky, Gustavo; Rajewsky, Nikolaus; Saez-Rodriguez, Julio; Meyer, Pablo

Single-cell RNA-sequencing (scRNAseq) technologies are rapidly evolving. Although very informative, in standard scRNAseq experiments, the spatial organization of the cells in the tissue of origin is lost. Conversely, spatial RNA-seq technologies designed to maintain cell localization have limited throughput and gene coverage. Mapping scRNAseq to genes with spatial information increases coverage while providing spatial location. However, methods to perform such mapping have not yet been benchmarked. To fill this gap, we organized the DREAM Single-Cell Transcriptomics challenge focused on the spatial reconstruction of cells from the Drosophila embryo from scRNAseq data, leveraging as a silver standard genes with in situ hybridization data from the Berkeley Drosophila Transcription Network Project reference atlas. The 34 participating teams used diverse algorithms for gene selection and location prediction, while being able to correctly localize clusters of cells. Selection of predictor genes was essential for this task. Predictor genes showed a relatively high expression entropy, high spatial clustering and included prominent developmental genes such as gap and pair-rule genes and tissue markers. Application of the top 10 methods to a zebrafish embryo dataset yielded performance and statistical properties of the selected genes similar to those in the Drosophila data. This suggests that methods developed in this challenge are able to extract generalizable properties of genes that are useful to accurately reconstruct the spatial arrangement of cells in tissues.

PMCID: 7536825

PMID: 32972997

ISSN: 2575-1077

CID: 5822752
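
The abstract of the entry above notes that predictor genes showed a relatively high expression entropy. As a rough illustration of what such a criterion could look like, the sketch below ranks genes by the Shannon entropy of a binarized (on/off) expression pattern across cells. This is an assumption-laden toy example, not the challenge's scoring code: the binarization, the function name `binary_entropy`, the gene names and the expression vectors are all hypothetical.

```python
import math


def binary_entropy(expressed_flags):
    """Shannon entropy (in bits) of an on/off expression pattern across cells."""
    n = len(expressed_flags)
    p = sum(expressed_flags) / n  # fraction of cells in which the gene is "on"
    if p in (0.0, 1.0):
        return 0.0  # a gene that is always on or always off carries no information
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


# Hypothetical binarized expression of three genes across eight cells.
genes = {
    "gene_a": [1, 0, 1, 0, 1, 0, 1, 0],  # on in half of the cells
    "gene_b": [1, 1, 1, 1, 1, 1, 1, 0],  # on in almost all cells
    "gene_c": [0, 0, 0, 0, 0, 0, 0, 0],  # never on
}

# Rank genes from most to least informative under this toy criterion.
ranked = sorted(genes, key=lambda g: binary_entropy(genes[g]), reverse=True)
```

Under this criterion a gene expressed in about half of the cells scores highest, while a gene that is uniformly on or off scores zero and would be a poor predictor of position.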

Journal of computational biology. 2020:27(9):1337-1340. DOI: 10.1089/cmb.2019.0348

R/PY-SUMMA: An R/Python Package for Unsupervised Ensemble Learning for Binary Classification Problems in Bioinformatics

Ahsen, Mehmet Eren; Vogel, Robert; Stolovitzky, Gustavo A

The increasing availability of complex data in biology and medicine has promoted the use of machine learning in classification tasks to address important problems in translational and fundamental science. Two important obstacles, however, may limit the unraveling of the full potential of machine learning in these fields: the lack of generalization of the resulting models and the limited number of labeled data sets in some applications. To address these important problems, we developed an unsupervised ensemble algorithm called strategy for unsupervised multiple method aggregation (SUMMA). By virtue of being an ensemble method, SUMMA is more robust to generalization than the predictions it combines. By virtue of being unsupervised, SUMMA does not require labeled data. SUMMA receives as input predictions from a diversity of models and estimates their classification performance even when labeled data are unavailable. It then uses these performance estimates to combine these different predictions into an ensemble model. SUMMA can be applied to a variety of binary classification problems in bioinformatics including but not limited to gene network inference, cancer diagnostics, drug response prediction, somatic mutation, and differential expression calling. In this application note, we introduce the R/PY-SUMMA packages, available in R or Python, that implement the SUMMA algorithm.

PMID: 31905016

ISSN: 1557-8666

CID: 5822722

ACS nano. 2020:14(9):10784-10795. DOI: 10.1021/acsnano.0c05186

Deterministic Lateral Displacement: Challenges and Perspectives

Hochstetter, Axel; Vernekar, Rohan; Austin, Robert H; Becker, Holger; Beech, Jason P; Fedosov, Dmitry A; Gompper, Gerhard; Kim, Sung-Cheol; Smith, Joshua T; Stolovitzky, Gustavo; Tegenfeldt, Jonas O; Wunsch, Benjamin H; Zeming, Kerwin K; Krüger, Timm; Inglis, David W

The advent of microfluidics in the 1990s promised a revolution in multiple industries from healthcare to chemical processing. Deterministic lateral displacement (DLD) is a continuous-flow microfluidic particle separation method discovered in 2004 that has been applied successfully and widely to the separation of blood cells, yeast, spores, bacteria, viruses, DNA, droplets, and more. Deterministic lateral displacement is conceptually simple and can deliver consistent performance over a wide range of flow rates and particle concentrations. Despite wide use and in-depth study, DLD has not yet been fully elucidated or optimized, with different approaches to the same problem yielding varying results. We endeavor here to provide up-to-date expert opinion on the state-of-art and current fundamental, practical, and commercial challenges with DLD as well as describe experimental and modeling opportunities. Because these challenges and opportunities arise from constraints on hydrodynamics, fabrication, and operation at the micro- and nanoscale, we expect this Perspective to serve as a guide for the broader micro- and nanofluidic community to identify and to address open questions in the field.

PMID: 32844655

ISSN: 1936-086X

CID: 5822732

eLife. 2020:9. DOI: 10.7554/eLife.52707

The transcriptomic response of cells to a drug combination is more than the sum of the responses to the monotherapies

Diaz, Jennifer E L; Ahsen, Mehmet Eren; Schaffter, Thomas; Chen, Xintong; Realubit, Ronald B; Karan, Charles; Califano, Andrea; Losic, Bojan; Stolovitzky, Gustavo

Our ability to discover effective drug combinations is limited, in part by insufficient understanding of how the transcriptional response of two monotherapies results in that of their combination. We analyzed matched time course RNAseq profiling of cells treated with single drugs and their combinations and found that the transcriptional signature of the synergistic combination was unique relative to that of either constituent monotherapy. The sequential activation of transcription factors in time in the gene regulatory network was implicated. The nature of this transcriptional cascade suggests that drug synergy may ensue when the transcriptional responses elicited by two unrelated individual drugs are correlated. We used these results as the basis of a simple prediction algorithm attaining an AUROC of 0.77 in the prediction of synergistic drug combinations in an independent dataset.

PMCID: 7546737

PMID: 32945258

ISSN: 2050-084X

CID: 5822742
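
The abstract of the entry above suggests that synergy may arise when the transcriptional responses to two individual drugs are correlated, and reports an AUROC for a simple predictor built on that idea. The sketch below shows one way such a score and its evaluation could be set up; it is not the authors' algorithm. It assumes the score is the Pearson correlation between two monotherapy response vectors, and every number in it (response values and synergy labels) is invented toy data, not data from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length response vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def auroc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy data: (response to drug 1, response to drug 2, hypothetical synergy label).
pairs = [
    ([1.0, 2.0, -1.0, 0.5], [0.8, 1.9, -1.2, 0.4], 1),
    ([1.0, 2.0, -1.0, 0.5], [-0.5, 0.1, 0.9, -0.2], 0),
    ([0.2, -0.3, 1.1, 0.7], [0.1, -0.4, 1.0, 0.9], 1),
    ([0.2, -0.3, 1.1, 0.7], [0.6, 0.5, -0.2, 0.0], 0),
]

# Score each combination by how correlated the two monotherapy responses are.
scores = [pearson(d1, d2) for d1, d2, _ in pairs]
labels = [label for _, _, label in pairs]
toy_auroc = auroc(scores, labels)
```

The pairwise-comparison definition of AUROC used here is quadratic in the number of samples, which is acceptable for a small illustration; a rank-based formula would be preferable for large datasets.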

Lab on a chip. 2019:19(9):1567-1578. DOI: 10.1039/c8lc01408f

Gel-on-a-chip: continuous, velocity-dependent DNA separation using nanoscale lateral displacement

Wunsch, Benjamin H; Kim, Sung-Cheol; Gifford, Stacey M; Astier, Yann; Wang, Chao; Bruce, Robert L; Patel, Jyotica V; Duch, Elizabeth A; Dawes, Simon; Stolovitzky, Gustavo; Smith, Joshua T

We studied the trajectories of polymers being advected while diffusing in a pressure driven flow along a periodic pillar nanostructure known as nanoscale deterministic lateral displacement (nanoDLD) array. We found that polymers follow different trajectories depending on their length, flow velocity and pillar array geometry, demonstrating that nanoDLD devices can be used as a continuous polymer fractionation tool. As a model system, we used double-stranded DNA (dsDNA) with various contour lengths and demonstrated that dsDNA in the range of 100-10 000 base pairs (bp) can be separated with a size-selective resolution of 200 bp. In contrast to spherical colloids, a polymer elongates by shear flow and the angle of polymer trajectories with respect to the mean flow direction decreases as the mean flow velocity increases. We developed a phenomenological model that explains the qualitative dependence of the polymer trajectories on the gap size and on the flow velocity. Using this model, we found the optimal separation conditions for dsDNA of different sizes and demonstrated the separation and extraction of dsDNA fragments with over 75% recovery and 3-fold concentration. Importantly, this velocity dependence provides a means of fine-tuning the separation efficiency and resolution, independent of the nanoDLD pillar geometry.

PMID: 30920559

ISSN: 1473-0189

CID: 5822652