Searched for: person:statna01
Total Results: 60


A comprehensive evaluation of multicategory classification methods for microbiomic data

Statnikov, Alexander; Henaff, Mikael; Narendra, Varun; Konganti, Kranti; Li, Zhiguo; Yang, Liying; Pei, Zhiheng; Blaser, Martin J; Aliferis, Constantin F; Alekseyenko, Alexander V
BACKGROUND: Recent advances in next-generation DNA sequencing enable rapid high-throughput quantitation of microbial community composition in human samples, opening up a new field of microbiomics. One of the promises of this field is linking abundances of microbial taxa to phenotypic and physiological states, which can inform development of new diagnostic, personalized medicine, and forensic modalities. Prior research has demonstrated the feasibility of applying machine learning methods to perform body site and subject classification with microbiomic data. However, it is currently unknown which classifiers perform best among the many available alternatives for classification with microbiomic data. RESULTS: In this work, we performed a systematic comparison of 18 major classification methods, 5 feature selection methods, and 2 accuracy metrics using 8 datasets spanning 1,802 human samples and various classification tasks: body site and subject classification and diagnosis. CONCLUSIONS: We found that random forests, support vector machines, kernel ridge regression, and Bayesian logistic regression with Laplace priors are the most effective machine learning techniques for performing accurate classification from these microbiomic data.
PMCID:3960509
PMID: 24456583
ISSN: 2049-2618
CID: 764032

Strategic applications of gene expression: from drug discovery/development to bedside

Bai, Jane P F; Alekseyenko, Alexander V; Statnikov, Alexander; Wang, I-Ming; Wong, Peggy H
Gene expression is useful for identifying the molecular signature of a disease and for correlating a pharmacodynamic marker with dose-dependent cellular responses to drug exposure. Gene expression can help guide drug discovery by showing engagement of the desired cellular pathways/networks, as well as avoidance of toxicological pathways. Successful use of gene-expression signatures in the later stages of drug development depends on their linkage to clinically meaningful phenotypic characteristics and requires a biologically meaningful mechanism combined with stringent statistical rigor. Much of the success in clinical drug development hinges on predefining signature genes that are fit for the intended application. Specific examples are highlighted to illustrate the breadth and depth of the potential utility of gene-expression signatures, from drug discovery and clinical development to targeted therapeutics at the bedside.
PMCID:3675744
PMID: 23319288
ISSN: 1550-7416
CID: 470392

Algorithms for Discovery of Multiple Markov Boundaries

Statnikov, Alexander; Lytkin, Nikita I; Lemeire, Jan; Aliferis, Constantin F
Algorithms for Markov boundary discovery from data constitute an important recent development in machine learning, primarily because they offer a principled solution to the variable/feature selection problem and give insight into local causal structure. Over the last decade many sound algorithms have been proposed to identify a single Markov boundary of the response variable. Even though faithful distributions and, more broadly, distributions that satisfy the intersection property always have a single Markov boundary, other distributions/data sets may have multiple Markov boundaries of the response variable. The latter distributions/data sets are common in practical data-analytic applications, and there are several reasons why it is important to induce multiple Markov boundaries from such data. However, there are currently no sound and efficient algorithms that can accomplish this task. This paper describes a family of algorithms, TIE*, that can discover all Markov boundaries in a distribution. The broad applicability as well as efficiency of the new algorithmic family is demonstrated in an extensive benchmarking study that involved comparison with 26 state-of-the-art algorithms/variants in 15 data sets from a diversity of application domains.
PMCID:4184048
PMID: 25285052
ISSN: 1532-4435
CID: 1299682

Microbiomic signatures of psoriasis: feasibility and methodology comparison

Statnikov, Alexander; Alekseyenko, Alexander V; Li, Zhiguo; Henaff, Mikael; Perez-Perez, Guillermo I; Blaser, Martin J; Aliferis, Constantin F
Psoriasis is a common chronic inflammatory disease of the skin. We sought to use bacterial community abundance data to assess the feasibility of developing multivariate molecular signatures for differentiation of cutaneous psoriatic lesions, clinically unaffected contralateral skin from psoriatic patients, and similar cutaneous loci in matched healthy control subjects. Using 16S rRNA high-throughput DNA sequencing, we assayed the cutaneous microbiome for 51 such matched specimen triplets including subjects of both genders, different age groups, ethnicities and multiple body sites. None of the subjects had recently received relevant treatments or antibiotics. We found that molecular signatures for the diagnosis of psoriasis result in significant accuracy ranging from 0.75 to 0.89 AUC, depending on the classification task. We also found a significant effect of DNA sequencing and downstream analysis protocols on the accuracy of molecular signatures. Our results demonstrate that it is feasible to develop accurate molecular signatures for the diagnosis of psoriasis from microbiomic data.
PMCID:3965359
PMID: 24018484
ISSN: 2045-2322
CID: 529142

Co-expression network analysis identifies Spleen Tyrosine Kinase (SYK) as a candidate oncogenic driver in a subset of small-cell lung cancer

Udyavar, Akshata R; Hoeksema, Megan D; Clark, Jonathan E; Zou, Yong; Tang, Zuojian; Li, Zhiguo; Li, Ming; Chen, Heidi; Statnikov, Alexander; Shyr, Yu; Liebler, Daniel C; Field, John; Eisenberg, Rosana; Estrada, Lourdes; Massion, Pierre P; Quaranta, Vito
BACKGROUND: Oncogenic mechanisms in small-cell lung cancer (SCLC) remain poorly understood, leaving this tumor with the worst prognosis among all lung cancers. Unlike in other cancer types, genomic sequencing approaches have had limited success in SCLC; i.e., no mutated oncogenes with potential driver characteristics have emerged, as is the case for activating mutations of the epidermal growth factor receptor in non-small-cell lung cancer. Differential gene expression analysis has also produced SCLC signatures of limited application, since they are generally not robust across datasets. Nonetheless, additional genomic approaches are warranted, given the increasing availability of suitable SCLC datasets. Gene co-expression network approaches are a recent and promising avenue, since they have been successful in identifying gene modules that drive phenotypic traits in several biological systems, including other cancer types. RESULTS: We derived an SCLC-specific classifier from weighted gene co-expression network analysis (WGCNA) of a lung cancer dataset. The classifier, termed SCLC-specific hub network (SSHN), robustly separates SCLC from other lung cancer types across multiple datasets and multiple platforms, including RNA-seq and shotgun proteomics. The classifier was also conserved in SCLC cell lines. SSHN is enriched for co-expressed signaling network hubs strongly associated with the SCLC phenotype. Twenty of these hubs are actionable kinases with oncogenic potential, among which spleen tyrosine kinase (SYK) exhibits one of the highest overall statistical associations with SCLC. In patient tissue microarrays and cell lines, SCLC can be separated into SYK-positive and SYK-negative subsets. SYK siRNA decreases the proliferation rate and increases cell death of SYK-positive SCLC cell lines, suggesting a role for SYK as an oncogenic driver in a subset of SCLC. CONCLUSIONS: SCLC treatment has thus far been limited to chemotherapy and radiation. Our WGCNA analysis identifies SYK both as a candidate biomarker to stratify SCLC patients and as a potential therapeutic target. In summary, WGCNA represents an alternative strategy to large-scale sequencing for the identification of potential oncogenic drivers, based on a systems view of signaling networks. This strategy is especially useful in cancer types where no actionable mutations have emerged.
PMCID:4029366
PMID: 24564859
ISSN: 1752-0509
CID: 830022

Multicriteria Engineering Optimization Problems: Statement, Solution and Applications

Statnikov, Roman; Matusov, Josef; Statnikov, Alexander
ISI:000310968500001
ISSN: 0022-3239
CID: 198192

In silico prediction of the neutralization range of human anti-HIV monoclonal antibodies [Meeting Abstract]

Shmelkov, E.; Krachmarov, C.; Grigoryan, A.; Agarwal, A.; Statnikov, A.; Cardozo, T.
ISI:000309472100405
ISSN: 1742-4690
CID: 181582

Inflammatory Genomic and Plasma Biomarkers Predict Progression of Symptomatic Knee OA (SKOA) [Meeting Abstract]

Attur, M.; Statnikov, A.; Aliferis, C. F.; Li, Z.; Krasnokutsky, S.; Samuels, J.; Greenberg, J. D.; Patel, J.; Oh, C.; Lu, Q. A.; Ramirez, R.; Todd, J.; Abramson, S. B.
ISI:000303223300079
ISSN: 1063-4584
CID: 166845

Regression of atherosclerosis is characterized by broad changes in the plaque macrophage transcriptome

Feig, Jonathan E; Vengrenyuk, Yuliya; Reiser, Vladimir; Wu, Chaowei; Statnikov, Alexander; Aliferis, Constantin F; Garabedian, Michael J; Fisher, Edward A; Puig, Oscar
We have developed a mouse model of atherosclerotic plaque regression in which an atherosclerotic aortic arch from a hyperlipidemic donor is transplanted into a normolipidemic recipient, resulting in rapid elimination of cholesterol and monocyte-derived macrophage cells (CD68+) from transplanted vessel walls. To gain a comprehensive view of the differences in gene expression patterns in macrophages associated with regressing compared with progressing atherosclerotic plaque, we compared mRNA expression patterns in CD68+ macrophages extracted from plaque in aortic arches transplanted into normolipidemic or into hyperlipidemic recipients. In CD68+ cells from regressing plaque, we observed that genes associated with the contractile apparatus responsible for cellular movement (e.g., actin and myosin) were up-regulated, whereas genes related to cell adhesion (e.g., cadherins, vinculin) were down-regulated. In addition, CD68+ cells from regressing plaque were characterized by enhanced expression of genes associated with an anti-inflammatory M2 macrophage phenotype, including arginase I, CD163 and the C-lectin receptor. Our analysis suggests that in regressing plaque CD68+ cells preferentially express genes that reduce cellular adhesion, enhance cellular motility, and overall act to suppress inflammation.
PMCID:3384622
PMID: 22761902
ISSN: 1932-6203
CID: 171139

New methods for separating causes from effects in genomics data

Statnikov, Alexander; Henaff, Mikael; Lytkin, Nikita I; Aliferis, Constantin F
BACKGROUND: The discovery of molecular pathways is a challenging problem, and its solution relies on the identification of causal molecular interactions in genomics data. Causal molecular interactions can be discovered using randomized experiments; however, such experiments are often costly, infeasible, or unethical. Fortunately, algorithms that infer causal interactions from observational data have been in development for decades, predominantly in the quantitative sciences, and many of them have recently been applied to genomics data. While these algorithms can infer unoriented causal interactions between involved molecular variables (i.e., without specifying which one is the cause and which one is the effect), causally orienting all inferred molecular interactions was assumed to be an unsolvable problem until recently. In this work, we use transcription factor-target gene regulatory interactions in three different organisms to evaluate a new family of methods that, given observational data for just two causally related variables, can determine which one is the cause and which one is the effect. RESULTS: We have found that a particular family of causal orientation methods (IGCI Gaussian) is often able to accurately infer the directionality of causal interactions, and that these methods usually outperform other causal orientation techniques. We also introduced a novel ensemble technique for causal orientation that combines the decisions of individual causal orientation methods. The ensemble method was found to be more accurate than the best individual causal orientation method in the tested data. CONCLUSIONS: This work represents a first step towards establishing context for practical use of causal orientation methods in the genomics domain. We have found that some causal orientation methodologies yield accurate predictions of causal orientation in genomics data, and we have improved on this capability with a novel ensemble method. Our results suggest that these methods have the potential to facilitate reconstruction of molecular pathways by minimizing the number of randomized experiments required to find causal directionality and by avoiding experiments that are infeasible and/or unethical.
PMCID:3535696
PMID: 23282373
ISSN: 1471-2164
CID: 211162