NYUHSL Faculty Bibliography

Searched for: in-biosketch:yes person:stolog01

Total Results: 139

Scientific data. 2014:1.DOI: 10.1038/sdata.2014.9

The species translation challenge-a systems biology perspective on human and rat bronchial epithelial cells

Poussin, Carine; Mathis, Carole; Alexopoulos, Leonidas G; Messinis, Dimitris E; Dulize, Rémi H J; Belcastro, Vincenzo; Melas, Ioannis N; Sakellaropoulos, Theodore; Rhrissorrakrai, Kahn; Bilal, Erhan; Meyer, Pablo; Talikka, Marja; Boué, Stéphanie; Norel, Raquel; Rice, John J; Stolovitzky, Gustavo; Ivanov, Nikolai V; Peitsch, Manuel C; Hoeng, Julia

The biological response to external cues such as drugs, chemicals, viruses and hormones is an essential question in biomedicine and in the field of toxicology, and cannot be easily studied in humans. Thus, biomedical research has continuously relied on animal models for studying the impact of these compounds and attempted to 'translate' the results to humans. In this context, the SBV IMPROVER (Systems Biology Verification for Industrial Methodology for PROcess VErification in Research) collaborative initiative, which uses crowd-sourcing techniques to address fundamental questions in systems biology, invited scientists to deploy their own computational methodologies to make predictions on species translatability. A multi-layer systems biology dataset was generated that comprised phosphoproteomics, transcriptomics and cytokine data derived from normal human (NHBE) and rat (NRBE) bronchial epithelial cells exposed in parallel to more than 50 different stimuli under identical conditions. The present manuscript describes in detail the experimental settings, generation, processing and quality control analysis of the multi-layer omics dataset accessible in public repositories for further intra- and inter-species translation studies.

PMCID:4322580

PMID: 25977767

ISSN: 2052-4463

CID: 5822402

Nature biotechnology. 2013:31(2):126-34.DOI: 10.1038/nbt.2486

Evaluation of methods for modeling transcription factor sequence specificity

Weirauch, Matthew T; Cote, Atina; Norel, Raquel; Annala, Matti; Zhao, Yue; Riley, Todd R; Saez-Rodriguez, Julio; Cokelaer, Thomas; Vedenko, Anastasia; Talukder, Shaheynoor; ,; Bussemaker, Harmen J; Morris, Quaid D; Bulyk, Martha L; Stolovitzky, Gustavo; Hughes, Timothy R

Genomic analyses often involve scanning for potential transcription factor (TF) binding sites using models of the sequence specificity of DNA binding proteins. Many approaches have been developed to model and learn a protein's DNA-binding specificity, but these methods have not been systematically compared. Here we applied 26 such approaches to in vitro protein binding microarray data for 66 mouse TFs belonging to various families. For nine TFs, we also scored the resulting motif models on in vivo data, and found that the best in vitro-derived motifs performed similarly to motifs derived from the in vivo data. Our results indicate that simple models based on mononucleotide position weight matrices trained by the best methods perform similarly to more complex models for most TFs examined, but fall short in specific cases (<10% of the TFs examined here). In addition, the best-performing motifs typically have relatively low information content, consistent with widespread degeneracy in eukaryotic TF sequence preferences.

PMCID:3687085

PMID: 23354101

ISSN: 1546-1696

CID: 5822142

Nanotechnology. 2013:24(19).DOI: 10.1088/0957-4484/24/19/195702

An electro-hydrodynamics-based model for the ionic conductivity of solid-state nanopores during DNA translocation

Luan, Binquan; Stolovitzky, Gustavo

A solid-state nanopore can be used to sense DNA (or other macromolecules) by monitoring ion-current changes that result from translocation of the molecule through the pore. When transiting a nanopore, the highly negatively charged DNA interacts with a nanopore both electrically and hydrodynamically, causing a current blockage or a current enhancement at different ion concentrations. This effect was previously characterized using a phenomenological model that can be considered as the limit of the electro-hydrodynamics model presented here. We show theoretically that the effect of surface charge of a nanopore (or electro-osmotic effect) can be equivalently treated as modifications of electrophoretic mobilities of ions in the pore, providing an improved physical understanding of the current blockage (or enhancement).

PMCID:3681960

PMID: 23579206

ISSN: 1361-6528

CID: 5822162

Science translational medicine. 2013:5(181).DOI: 10.1126/scitranslmed.3006112

Systematic analysis of challenge-driven improvements in molecular prognostic models for breast cancer

Margolin, Adam A; Bilal, Erhan; Huang, Erich; Norman, Thea C; Ottestad, Lars; Mecham, Brigham H; Sauerwine, Ben; Kellen, Michael R; Mangravite, Lara M; Furia, Matthew D; Vollan, Hans Kristian Moen; Rueda, Oscar M; Guinney, Justin; Deflaux, Nicole A; Hoff, Bruce; Schildwachter, Xavier; Russnes, Hege G; Park, Daehoon; Vang, Veronica O; Pirtle, Tyler; Youseff, Lamia; Citro, Craig; Curtis, Christina; Kristensen, Vessela N; Hellerstein, Joseph; Friend, Stephen H; Stolovitzky, Gustavo; Aparicio, Samuel; Caldas, Carlos; Børresen-Dale, Anne-Lise

Although molecular prognostics in breast cancer are among the most successful examples of translating genomic analysis to clinical applications, optimal approaches to breast cancer clinical risk prediction remain controversial. The Sage Bionetworks-DREAM Breast Cancer Prognosis Challenge (BCC) is a crowdsourced research study for breast cancer prognostic modeling using genome-scale data. The BCC provided a community of data analysts with a common platform for data access and blinded evaluation of model accuracy in predicting breast cancer survival on the basis of gene expression data, copy number data, and clinical covariates. This approach offered the opportunity to assess whether a crowdsourced community Challenge would generate models of breast cancer prognosis commensurate with or exceeding current best-in-class approaches. The BCC comprised multiple rounds of blinded evaluations on held-out portions of data on 1981 patients, resulting in more than 1400 models submitted as open source code. Participants then retrained their models on the full data set of 1981 samples and submitted up to five models for validation in a newly generated data set of 184 breast cancer patients. Analysis of the BCC results suggests that the best-performing modeling strategy outperformed previously reported methods in blinded evaluations; model performance was consistent across several independent evaluations; and aggregating community-developed models achieved performance on par with the best-performing individual models.

PMID: 23596205

ISSN: 1946-6242

CID: 5822172

Journal of computational biology. 2013:20(5):373-4.DOI: 10.1089/cmb.2013.008p

Preface: RECOMB systems biology, regulatory genomics, and DREAM 2012 special issue [Editorial]

Califano, Andrea; Kellis, Manolis; Stolovitzky, Gustavo

PMID: 23641866

ISSN: 1557-8666

CID: 5822182

PLoS computational biology. 2013:9(5).DOI: 10.1371/journal.pcbi.1003047

Improving breast cancer survival analysis through competition-based multidimensional modeling

Bilal, Erhan; Dutkowski, Janusz; Guinney, Justin; Jang, In Sock; Logsdon, Benjamin A; Pandey, Gaurav; Sauerwine, Benjamin A; Shimoni, Yishai; Moen Vollan, Hans Kristian; Mecham, Brigham H; Rueda, Oscar M; Tost, Jorg; Curtis, Christina; Alvarez, Mariano J; Kristensen, Vessela N; Aparicio, Samuel; Børresen-Dale, Anne-Lise; Caldas, Carlos; Califano, Andrea; Friend, Stephen H; Ideker, Trey; Schadt, Eric E; Stolovitzky, Gustavo A; Margolin, Adam A

Breast cancer is the most common malignancy in women and is responsible for hundreds of thousands of deaths annually. As with most cancers, it is a heterogeneous disease and different breast cancer subtypes are treated differently. Understanding the difference in prognosis for breast cancer based on its molecular and phenotypic features is one avenue for improving treatment by matching the proper treatment with molecular subtypes of the disease. In this work, we employed a competition-based approach to modeling breast cancer prognosis using large datasets containing genomic and clinical information and an online real-time leaderboard program used to speed feedback to the modeling team and to encourage each modeler to work towards achieving a higher ranked submission. We find that machine learning methods combined with molecular features selected based on expert prior knowledge can improve survival predictions compared to current best-in-class methodologies and that ensemble models trained across multiple user submissions systematically outperform individual models within the ensemble. We also find that model scores are highly consistent across multiple independent evaluations. This study serves as the pilot phase of a much larger competition open to the whole research community, with the goal of understanding general strategies for model optimization using clinical and molecular profiling data and providing an objective, transparent system for assessing prognostic models.

PMCID:3649990

PMID: 23671412

ISSN: 1553-7358

CID: 5822192

Genome research. 2013:23(11):1928-37.DOI: 10.1101/gr.157420.113

Inferring gene expression from ribosomal promoter sequences, a crowdsourcing approach

Meyer, Pablo; Siwo, Geoffrey; Zeevi, Danny; Sharon, Eilon; Norel, Raquel; ,; Segal, Eran; Stolovitzky, Gustavo

The Gene Promoter Expression Prediction challenge consisted of predicting gene expression from promoter sequences in a previously unknown experimentally generated data set. The challenge was presented to the community in the framework of the sixth Dialogue for Reverse Engineering Assessments and Methods (DREAM6), a community effort to evaluate the status of systems biology modeling methodologies. Nucleotide-specific promoter activity was obtained by measuring fluorescence from promoter sequences fused upstream of a gene for yellow fluorescent protein and inserted in the same genomic site of the yeast Saccharomyces cerevisiae. Twenty-one teams submitted results predicting the expression levels of 53 different promoters from yeast ribosomal protein genes. Analysis of participant predictions shows that accurate values for low-expressed and mutated promoters were difficult to obtain, although in the latter case, only when the mutation induced a large change in promoter activity compared to the wild-type sequence. As in previous DREAM challenges, we found that aggregation of participant predictions provided robust results, but did not fare better than the three best algorithms. Finally, this study not only provides a benchmark for the assessment of methods predicting activity of a specific set of promoters from their sequence, but it also shows that the top performing algorithm, which used machine-learning approaches, can be improved by the addition of biological features such as transcription factor binding sites.

PMCID:3814892

PMID: 23950146

ISSN: 1549-5469

CID: 5822202

Bioinformatics. 2013:29(22):2892-9.DOI: 10.1093/bioinformatics/btt492

Strengths and limitations of microarray-based phenotype prediction: lessons learned from the IMPROVER Diagnostic Signature Challenge

Tarca, Adi L; Lauria, Mario; Unger, Michael; Bilal, Erhan; Boue, Stephanie; Kumar Dey, Kushal; Hoeng, Julia; Koeppl, Heinz; Martin, Florian; Meyer, Pablo; Nandy, Preetam; Norel, Raquel; Peitsch, Manuel; Rice, Jeremy J; Romero, Roberto; Stolovitzky, Gustavo; Talikka, Marja; Xiang, Yang; Zechner, Christoph; ,

MOTIVATION/BACKGROUND: After more than a decade since microarrays were used to predict phenotype of biological samples, real-life applications for disease screening and identification of patients who would best benefit from treatment are still emerging. The interest of the scientific community in identifying best approaches to develop such prediction models was reaffirmed in a competition style international collaboration called IMPROVER Diagnostic Signature Challenge whose results we describe herein. RESULTS: Fifty-four teams used public data to develop prediction models in four disease areas including multiple sclerosis, lung cancer, psoriasis and chronic obstructive pulmonary disease, and made predictions on blinded new data that we generated. Teams were scored using three metrics that captured various aspects of the quality of predictions, and best performers were awarded. This article presents the challenge results and introduces to the community the approaches of the best overall three performers, as well as an R package that implements the approach of the best overall team. The analyses of model performance data submitted in the challenge as well as additional simulations that we have performed revealed that (i) the quality of predictions depends more on the disease endpoint than on the particular approaches used in the challenge; (ii) the most important modeling factor (e.g. data preprocessing, feature selection and classifier type) is problem dependent; and (iii) for optimal results datasets and methods have to be carefully matched. Biomedical factors such as the disease severity and confidence in diagnostic were found to be associated with the misclassification rates across the different teams. AVAILABILITY/BACKGROUND: The lung cancer dataset is available from Gene Expression Omnibus (accession, GSE43580). The maPredictDSC R package implementing the approach of the best overall team is available at www.bioconductor.org or http://bioinformaticsprb.med.wayne.edu/.

PMCID:3810846

PMID: 23966112

ISSN: 1367-4811

CID: 5822212

Bioinformatics & biology insights. 2013:7:307-25.DOI: 10.4137/BBI.S12932

On Crowd-verification of Biological Networks

,; Ansari, Sam; Binder, Jean; Boue, Stephanie; Di Fabio, Anselmo; Hayes, William; Hoeng, Julia; Iskandar, Anita; Kleiman, Robin; Norel, Raquel; O'Neel, Bruce; Peitsch, Manuel C; Poussin, Carine; Pratt, Dexter; Rhrissorrakrai, Kahn; Schlage, Walter K; Stolovitzky, Gustavo; Talikka, Marja

Biological networks with a structured syntax are a powerful way of representing biological information generated from high density data; however, they can become unwieldy to manage as their size and complexity increase. This article presents a crowd-verification approach for the visualization and expansion of biological networks. Web-based graphical interfaces allow visualization of causal and correlative biological relationships represented using Biological Expression Language (BEL). Crowdsourcing principles enable participants to communally annotate these relationships based on literature evidence. Gamification principles are incorporated to further engage domain experts throughout biology to gather robust peer-reviewed information from which relationships can be identified and verified. The resulting network models will represent the current status of biological knowledge within the defined boundaries, here processes related to human lung disease. These models are amenable to computational analysis. For some period following conclusion of the challenge, the published models will remain available for continuous use and expansion by the scientific community.

PMCID:3798292

PMID: 24151423

ISSN: 1177-9322

CID: 5822222

Nanoscale. 2012:4(4):1068-77.DOI: 10.1039/c1nr11201e

Slowing and controlling the translocation of DNA in a solid-state nanopore

Luan, Binquan; Stolovitzky, Gustavo; Martyna, Glenn

DNA sequencing methods based on nanopores could potentially represent a low-cost and high-throughput pathway to practical genomics, by replacing current sequencing methods based on synthesis that are limited in speed and cost. The success of nanopore sequencing techniques requires the solution to two fundamental problems: (1) sensing each nucleotide of a DNA strand, in sequence, as it passes through a nanopore; (2) delivering each nucleotide in a DNA strand, in turn, to a sensing site within the nanopore in a controlled manner. It has been demonstrated that a DNA nucleotide can be sensed using electric signals, such as ionic current changes caused by nucleotide blockage at a constriction region in a protein pore or a tunneling current through the nucleotide-bridged gap of two nanoelectrodes built near a solid-state nanopore. However, it is not yet clear how each nucleotide in a DNA strand can be delivered in turn to a sensing site and held there for a sufficient time to ensure high fidelity sensing. This latter problem has been addressed by modifying macroscopic properties, such as solvent viscosity, ion concentration or temperature. Also, the DNA transistor, a solid-state nanopore dressed with a series of metal-dielectric layers, has been proposed as a solution. Molecular dynamics simulations provide the means to study and to understand DNA transport in nanopores microscopically. In this article, we review computational studies on how to slow down and control the DNA translocation through a solid-state nanopore.

PMCID:3543692

PMID: 22081018

ISSN: 2040-3372

CID: 5822092