A noise model for mass spectrometry based proteomics
Du, Peicheng; Stolovitzky, Gustavo; Horvatovich, Peter; Bischoff, Rainer; Lim, Jihyeon; Suits, Frank
MOTIVATION/BACKGROUND: Mass spectrometry data are subjected to considerable noise. Good noise models are required for proper detection and quantification of peptides. We have characterized noise in both quadrupole time-of-flight (Q-TOF) and ion trap data, and have constructed models for the noise. RESULTS: We find that the noise in Q-TOF data from Applied Biosystems QSTAR fits well to a combination of multinomial and Poisson model with detector dead-time correction. In comparison, ion trap noise from Agilent MSD-Trap-SL is larger than the Q-TOF noise and is proportional to Poisson noise. We then demonstrate that the noise model can be used to improve deisotoping for peptide detection, by estimating appropriate cutoffs of the goodness of fit parameter at prescribed error rates. The noise models also have implications in noise reduction, retention time alignment and significance testing for biomarker discovery.
PMID: 18353791
ISSN: 1367-4811
CID: 5821922
Ordered cyclic motifs contribute to dynamic stability in biological and engineered networks
Ma'ayan, Avi; Cecchi, Guillermo A; Wagner, John; Rao, A Ravi; Iyengar, Ravi; Stolovitzky, Gustavo
Representation and analysis of complex biological and engineered systems as directed networks is useful for understanding their global structure/function organization. Enrichment of network motifs, which are over-represented subgraphs in real networks, can be used for topological analysis. Because counting network motifs is computationally expensive, only characterization of 3- to 5-node motifs has been previously reported. In this study we used a supercomputer to analyze cyclic motifs made of 3-20 nodes for 6 biological and 3 technological networks. Using tools from statistical physics, we developed a theoretical framework for characterizing the ensemble of cyclic motifs in real networks. We have identified a generic property of real complex networks, antiferromagnetic organization, which is characterized by minimal directional coherence of edges along cyclic subgraphs, such that consecutive links tend to have opposing direction. As a consequence, we find that the lack of directional coherence in cyclic motifs leads to depletion in feedback loops, where the number of nodes affected by feedback loops appears to be at a local minimum compared with surrogate shuffled networks. This topology provides more dynamic stability in large networks.
PMCID: 2614745
PMID: 19033453
ISSN: 1091-6490
CID: 5821932
A single nucleotide polymorphism in the MDM2 gene disrupts the oscillation of p53 and MDM2 levels in cells
Hu, Wenwei; Feng, Zhaohui; Ma, Lan; Wagner, John; Rice, J Jeremy; Stolovitzky, Gustavo; Levine, Arnold J
Oscillations of both p53 and MDM2 proteins have been observed in cells after exposure to stress. A mathematical model describing these oscillations predicted that oscillations occur only at selected levels of p53 and MDM2 proteins. This model prediction suggests that oscillations will disappear in cells containing high levels of MDM2 as observed with a single nucleotide polymorphism in the MDM2 gene (SNP309). The effect of SNP309 upon the p53-MDM2 oscillation was examined in various human cell lines and the oscillations were observed in the cells with at least one wild-type allele for SNP309 (T/T or T/G) but not in cells homozygous for SNP309 (G/G). Furthermore, estrogen preferentially stimulated the transcription of MDM2 from SNP309 G allele and increased the levels of MDM2 protein in estrogen-responsive cells homozygous for SNP309 (G/G). These results suggest the possibility that SNP309 G allele may contribute to gender-specific tumorigenesis through further elevating the MDM2 levels and disrupting the p53-MDM2 oscillation. Furthermore, using the H1299-HW24 cells expressing wild-type p53 under a tetracycline-regulated promoter, the p53-MDM2 oscillation was observed only when p53 levels were in a specific range, and DNA damage was found to be necessary for triggering the p53-MDM2 oscillation. This study shows that higher levels of MDM2 in cells homozygous for SNP309 (G/G) do not permit coordinated p53-MDM2 oscillation after stress, which might contribute to decreased efficiency of the p53 pathway and correlates with a clinical phenotype (i.e., the development of cancers at earlier age of onset in female).
PMID: 17363597
ISSN: 0008-5472
CID: 5821882
Total ancestry measure: quantifying the similarity in tree-like classification, with genomic applications
Yu, Haiyuan; Jansen, Ronald; Stolovitzky, Gustavo; Gerstein, Mark
MOTIVATION/BACKGROUND: Many classifications of protein function such as Gene Ontology (GO) are organized in directed acyclic graph (DAG) structures. In these classifications, the proteins are terminal leaf nodes; the categories 'above' them are functional annotations at various levels of specialization and the computation of a numerical measure of relatedness between two arbitrary proteins is an important proteomics problem. Moreover, analogous problems are important in other contexts in large-scale information organization--e.g. the Wikipedia online encyclopedia and the Yahoo and DMOZ web page classification schemes. RESULTS: Here we develop a simple probabilistic approach for computing this relatedness quantity, which we call the total ancestry method. Our measure is based on counting the number of leaf nodes that share exactly the same set of 'higher up' category nodes in comparison to the total number of classified pairs (i.e. the chance for the same total ancestry). We show such a measure is associated with a power-law distribution, allowing for the quick assessment of the statistical significance of shared functional annotations. We formally compare it with other quantitative functional similarity measures (such as, shortest path within a DAG, lowest common ancestor shared and Azuaje's information-theoretic similarity) and provide concrete metrics to assess differences. Finally, we provide a practical implementation for our total ancestry measure for GO and the MIPS functional catalog and give two applications of it in specific functional genomics contexts. AVAILABILITY/BACKGROUND: The implementations and results are available through our supplementary website at: http://gersteinlab.org/proj/funcsim. SUPPLEMENTARY INFORMATION/BACKGROUND: Supplementary data are available at Bioinformatics online.
PMID: 17540677
ISSN: 1367-4811
CID: 5821892
Transcription factor expression in lipopolysaccharide-activated peripheral-blood-derived mononuclear cells
Roach, Jared C; Smith, Kelly D; Strobe, Katie L; Nissen, Stephanie M; Haudenschild, Christian D; Zhou, Daixing; Vasicek, Thomas J; Held, G A; Stolovitzky, Gustavo A; Hood, Leroy E; Aderem, Alan
Transcription factors play a key role in integrating and modulating biological information. In this study, we comprehensively measured the changing abundances of mRNAs over a time course of activation of human peripheral-blood-derived mononuclear cells ("macrophages") with lipopolysaccharide. Global and dynamic analysis of transcription factors in response to a physiological stimulus has yet to be achieved in a human system, and our efforts significantly advanced this goal. We used multiple global high-throughput technologies for measuring mRNA levels, including massively parallel signature sequencing and GeneChip microarrays. We identified 92 of 1,288 known human transcription factors as having significantly measurable changes during our 24-h time course. At least 42 of these changes were previously unidentified in this system. Our data demonstrate that some transcription factors operate in a functional range below 10 transcripts per cell, whereas others operate in a range three orders of magnitude greater. The highly reproducible response of many mRNAs indicates feedback control. A broad range of activation kinetics was observed; thus, combinatorial regulation by small subsets of transcription factors would permit almost any timing input to cis-regulatory elements controlling gene transcription.
PMCID: 2042192
PMID: 17913878
ISSN: 0027-8424
CID: 5821902
Dialogue on reverse-engineering assessment and methods: the DREAM of high-throughput pathway inference
Stolovitzky, Gustavo; Monroe, Don; Califano, Andrea
The biotechnological advances of the last decade have confronted us with an explosion of genetics, genomics, transcriptomics, proteomics, and metabolomics data. These data need to be organized and structured before they may provide a coherent biological picture. To accomplish this formidable task, the availability of an accurate map of the physical interactions in the cell that are responsible for cellular behavior and function would be exceedingly helpful, as these data are ultimately the result of such molecular interactions. However, all we have at this time is, at best, a fragmentary and only partially correct representation of the interactions between genes, their byproducts, and other cellular entities. If we want to succeed in our quest for understanding the biological whole as more than the sum of the individual parts, we need to build more comprehensive and cell-context-specific maps of the biological interaction networks. DREAM, the Dialogue on Reverse Engineering Assessment and Methods, is fostering a concerted effort by computational and experimental biologists to understand the limitations and to enhance the strengths of the efforts to reverse engineer cellular networks from high-throughput data. In this chapter we will discuss the salient arguments of the first DREAM conference. We will highlight both the state of the art in the field of reverse engineering as well as some of its challenges and opportunities.
PMID: 17925349
ISSN: 0077-8923
CID: 5821912
Comparison of Amersham and Agilent microarray technologies through quantitative noise analysis
Held, G A; Duggar, Keith; Stolovitzky, Gustavo
We carried out a series of replicate experiments on DNA microarrays using two cell lines and two technologies--the Agilent Human 1A Microarray and the GE Amersham Codelink Uniset Human 20K I Bioarray. We demonstrated that quantifying the noise level as a function of signal strength allows identification of the absolute and differential mRNA expression levels at which biological variability can be resolved above measurement noise. This represents a new formulation of a sensitivity threshold that can be used to compare platforms. It was found that the correlation in expression level between platforms is considerably worse than the correlation between replicate measurements taken using the same platform. In addition, we carried out replicate measurements at different stages of sample processing. This novel approach enables us to quantify the noise introduced into the measurements at each step of the experimental protocol. We demonstrated how this information can be used to determine the most efficient means of using replicates to reduce experimental uncertainty.
PMID: 17233562
ISSN: 1536-2310
CID: 5821872
Reconstructing biological networks using conditional correlation analysis
Rice, John Jeremy; Tu, Yuhai; Stolovitzky, Gustavo
MOTIVATION/BACKGROUND: One of the present challenges in biological research is the organization of the data originating from high-throughput technologies. One way in which this information can be organized is in the form of networks of influences, physical or statistical, between cellular components. We propose an experimental method for probing biological networks, analyzing the resulting data and reconstructing the network architecture. METHODS: We use networks of known topology consisting of nodes (genes), directed edges (gene-gene interactions) and a dynamics for the genes' mRNA concentrations in terms of the gene-gene interactions. We proposed a network reconstruction algorithm based on the conditional correlation of the mRNA equilibrium concentration between two genes given that one of them was knocked down. Using simulated gene expression data on networks of known connectivity, we investigated how the reconstruction error is affected by noise, network topology, size, sparseness and dynamic parameters. RESULTS: Errors arise from correlation between nodes connected through intermediate nodes (false positives) and when the correlation between two directly connected nodes is obscured by noise, non-linearity or multiple inputs to the target node (false negatives). Two critical components of the method are as follows: (1) the choice of an optimal correlation threshold for predicting connections and (2) the reduction of errors arising from indirect connections (for which a novel algorithm is proposed). With these improvements, we can reconstruct networks with the topology of the transcriptional regulatory network in Escherichia coli with a reasonably low error rate.
PMID: 15486043
ISSN: 1367-4803
CID: 5821782
Statistical analysis of MPSS measurements: application to the study of LPS-activated macrophage gene expression
Stolovitzky, G A; Kundaje, A; Held, G A; Duggar, K H; Haudenschild, C D; Zhou, D; Vasicek, T J; Smith, K D; Aderem, A; Roach, J C
Massively Parallel Signature Sequencing (MPSS), a recently developed high-throughput transcription profiling technology, has the ability to profile almost every transcript in a sample without requiring prior knowledge of the sequence of the transcribed genes. As is the case with DNA microarrays, effective data analysis depends crucially on understanding how noise affects measurements. We analyze the sources of noise in MPSS and present a quantitative model describing the variability between replicate MPSS assays. We use this model to construct statistical hypotheses that test whether an observed change in gene expression in a pair-wise comparison is significant. This analysis is then extended to the determination of the significance of changes in expression levels measured over the course of a time series of measurements. We apply these analytic techniques to the study of a time series of MPSS gene expression measurements on LPS-stimulated macrophages. To evaluate our statistical significance metrics, we compare our results with published data on macrophage activation measured by using Affymetrix GeneChips.
PMCID: 547838
PMID: 15668391
ISSN: 0027-8424
CID: 5821792
Lasting impressions: motifs in protein-protein maps may provide footprints of evolutionary events [Comment]
Rice, J Jeremy; Kershenbaum, Aaron; Stolovitzky, Gustavo
PMID: 15728355
ISSN: 0027-8424
CID: 5821802