Mining gold dust under the genome wide significance level: a two-stage approach to analysis of GWAS
Shi, Gang; Boerwinkle, Eric; Morrison, Alanna C; Gu, C Charles; Chakravarti, Aravinda; Rao, D C
We propose a two-stage approach to analyze genome-wide association data in order to identify a set of promising single-nucleotide polymorphisms (SNPs). In stage one, we select a list of top signals from single SNP analyses by controlling false discovery rate. In stage two, we use the least absolute shrinkage and selection operator (LASSO) regression to reduce false positives. The proposed approach was evaluated using simulated quantitative traits based on genome-wide SNP data on 8,861 Caucasian individuals from the Atherosclerosis Risk in Communities (ARIC) Study. Our first stage, targeted at controlling false negatives, yields better power than using Bonferroni-corrected significance level. The LASSO regression reduces the number of significant SNPs in stage two: it reduces false-positive SNPs and it reduces true-positive SNPs also at simulated causal loci due to linkage disequilibrium. Interestingly, the LASSO regression preserves the power from stage one, i.e., the number of causal loci detected from the LASSO regression in stage two is almost the same as in stage one, while reducing false positives further. Real data on systolic blood pressure in the ARIC study was analyzed using our two-stage approach which identified two significant SNPs, one of which was reported to be genome-significant in a meta-analysis containing a much larger sample size. On the other hand, a single SNP association scan did not yield any significant results.
PMCID:3624896
PMID: 21254218
ISSN: 1098-2272
CID: 2747342
A multilevel model to address batch effects in copy number estimation using SNP arrays
Scharpf, Robert B; Ruczinski, Ingo; Carvalho, Benilton; Doan, Betty; Chakravarti, Aravinda; Irizarry, Rafael A
Submicroscopic changes in chromosomal DNA copy number dosage are common and have been implicated in many heritable diseases and cancers. Recent high-throughput technologies have a resolution that permits the detection of segmental changes in DNA copy number that span thousands of base pairs in the genome. Genomewide association studies (GWAS) may simultaneously screen for copy number phenotype and single nucleotide polymorphism (SNP) phenotype associations as part of the analytic strategy. However, genomewide array analyses are particularly susceptible to batch effects as the logistics of preparing DNA and processing thousands of arrays often involves multiple laboratories and technicians, or changes over calendar time to the reagents and laboratory equipment. Failure to adjust for batch effects can lead to incorrect inference and requires inefficient post hoc quality control procedures to exclude regions that are associated with batch. Our work extends previous model-based approaches for copy number estimation by explicitly modeling batch and using shrinkage to improve locus-specific estimates of copy number uncertainty. Key features of this approach include the use of biallelic genotype calls from experimental data to estimate batch-specific and locus-specific parameters of background and signal without the requirement of training data. We illustrate these ideas using a study of bipolar disease and a study of chromosome 21 trisomy. The former has batch effects that dominate much of the observed variation in the quantile-normalized intensities, while the latter illustrates the robustness of our approach to a data set in which approximately 27% of the samples have altered copy number. Locus-specific estimates of copy number can be plotted on the copy number scale to investigate mosaicism and guide the choice of appropriate downstream approaches for smoothing the copy number as a function of physical position. 
The software is open source and implemented in the R package crlmm at Bioconductor (http://www.bioconductor.org).
PMCID:3006124
PMID: 20625178
ISSN: 1468-4357
CID: 2747382
SNPs and other features as they predispose to complex disease: genome-wide predictive analysis of a quantitative phenotype for hypertension
Won, Joong-Ho; Ehret, Georg; Chakravarti, Aravinda; Olshen, Richard A
Though recently they have fallen into some disrepute, genome-wide association studies (GWAS) have been formulated and applied to understanding essential hypertension. The principal goal here is to use data gathered in a GWAS to gauge the extent to which SNPs and their interactions with other features can be combined to predict mean arterial blood pressure (MAP) in 3138 pre-menopausal and naturally post-menopausal white women. More precisely, we quantify the extent to which data as described permit prediction of MAP beyond what is possible from traditional risk factors such as blood cholesterol levels and glucose levels. Of course, these traditional risk factors are genetic, though typically not explicitly so. In all, there were 44 such risk factors/clinical variables measured and 377,790 single nucleotide polymorphisms (SNPs) genotyped. Data for women we studied are from first visit measurements taken as part of the Atherosclerotic Risk in Communities (ARIC) study. We begin by assessing non-SNP features in their abilities to predict MAP, employing a novel regression technique with two stages, first the discovery of main effects and next discovery of their interactions. The long list of SNPs genotyped is reduced to a manageable list for combining with non-SNP features in prediction. We adapted Efron's local false discovery rate to produce this reduced list. Selected non-SNP and SNP features and their interactions are used to predict MAP using adaptive linear regression. We quantify quality of prediction by an estimated coefficient of determination (R(2)). We compare the accuracy of prediction with and without information from SNPs.
PMCID:3227593
PMID: 22140480
ISSN: 1932-6203
CID: 2747162
Quantifying and modeling birth order effects in autism
Turner, Tychele; Pihur, Vasyl; Chakravarti, Aravinda
Autism is a complex genetic disorder with multiple etiologies whose molecular genetic basis is not fully understood. Although a number of rare mutations and dosage abnormalities are specific to autism, these explain no more than 10% of all cases. The high heritability of autism and low recurrence risk suggest multifactorial inheritance from numerous loci, but other factors also intervene to modulate risk. In this study, we examine the effect of birth rank on disease risk, which is not expected for purely hereditary genetic models. We analyzed the data from three publicly available autism family collections in the USA for potential birth order effects and studied the statistical properties of three tests to show that adequate power to detect these effects exists. We detect statistically significant, yet varying, patterns of birth order effects across these collections. In multiplex families, we identify V-shaped effects where middle births are at high risk; in simplex families, we demonstrate linear effects where risk increases with each additional birth. Moreover, the birth order effect is gender-dependent in the simplex collection. It is currently unknown whether these patterns arise from ascertainment biases or biological factors. Nevertheless, further investigation of parental age-dependent risks yields patterns similar to those observed and could potentially explain part of the increased risk. A search for genes considering these patterns is likely to increase statistical power and uncover novel molecular etiologies.
PMCID:3198479
PMID: 22039484
ISSN: 1932-6203
CID: 2747182
Copy number variants in candidate genes are genetic modifiers of Hirschsprung disease
Jiang, Qian; Ho, Yen-Yi; Hao, Li; Nichols Berrios, Courtney; Chakravarti, Aravinda
Hirschsprung disease (HSCR) is a neurocristopathy characterized by absence of intramural ganglion cells along variable lengths of the gastrointestinal tract. The HSCR phenotype is highly variable with respect to gender, length of aganglionosis, familiality and the presence of additional anomalies. By molecular genetic analysis, a minimum of 11 neuro-developmental genes (RET, GDNF, NRTN, SOX10, EDNRB, EDN3, ECE1, ZFHX1B, PHOX2B, KIAA1279, TCF4) are known to harbor rare, high-penetrance mutations that confer a large risk to the bearer. In addition, two other genes (RET, NRG1) harbor common, low-penetrance polymorphisms that contribute only partially to risk and can act as genetic modifiers. To broaden this search, we examined whether a set of 67 proven and candidate HSCR genes harbored additional modifier alleles. In this pilot study, we utilized a custom-designed array CGH with approximately 33,000 test probes at an average resolution of approximately 185 bp to detect gene-sized or smaller copy number variants (CNVs) within these 67 genes in 18 heterogeneous HSCR patients. Using stringent criteria, we identified CNVs at three loci (MAPK10, ZFHX1B, SOX2) that are novel, involve regulatory and coding sequences of neuro-developmental genes, and show association with HSCR in combination with other congenital anomalies. Additional CNVs are observed under relaxed criteria. Our research suggests a role for CNVs in HSCR and, importantly, emphasizes the role of variation in regulatory sequences. A much larger study will be necessary both for replication and for identifying the full spectrum of small CNV effects.
PMCID:3119685
PMID: 21712996
ISSN: 1932-6203
CID: 2747262
Common variants in 22 loci are associated with QRS duration and cardiac ventricular conduction
Sotoodehnia, Nona; Isaacs, Aaron; de Bakker, Paul I W; Dorr, Marcus; Newton-Cheh, Christopher; Nolte, Ilja M; van der Harst, Pim; Muller, Martina; Eijgelsheim, Mark; Alonso, Alvaro; Hicks, Andrew A; Padmanabhan, Sandosh; Hayward, Caroline; Smith, Albert Vernon; Polasek, Ozren; Giovannone, Steven; Fu, Jingyuan; Magnani, Jared W; Marciante, Kristin D; Pfeufer, Arne; Gharib, Sina A; Teumer, Alexander; Li, Man; Bis, Joshua C; Rivadeneira, Fernando; Aspelund, Thor; Kottgen, Anna; Johnson, Toby; Rice, Kenneth; Sie, Mark P S; Wang, Ying A; Klopp, Norman; Fuchsberger, Christian; Wild, Sarah H; Mateo Leach, Irene; Estrada, Karol; Volker, Uwe; Wright, Alan F; Asselbergs, Folkert W; Qu, Jiaxiang; Chakravarti, Aravinda; Sinner, Moritz F; Kors, Jan A; Petersmann, Astrid; Harris, Tamara B; Soliman, Elsayed Z; Munroe, Patricia B; Psaty, Bruce M; Oostra, Ben A; Cupples, L Adrienne; Perz, Siegfried; de Boer, Rudolf A; Uitterlinden, Andre G; Volzke, Henry; Spector, Timothy D; Liu, Fang-Yu; Boerwinkle, Eric; Dominiczak, Anna F; Rotter, Jerome I; van Herpen, Ge; Levy, Daniel; Wichmann, H-Erich; van Gilst, Wiek H; Witteman, Jacqueline C M; Kroemer, Heyo K; Kao, W H Linda; Heckbert, Susan R; Meitinger, Thomas; Hofman, Albert; Campbell, Harry; Folsom, Aaron R; van Veldhuisen, Dirk J; Schwienbacher, Christine; O'Donnell, Christopher J; Volpato, Claudia Beu; Caulfield, Mark J; Connell, John M; Launer, Lenore; Lu, Xiaowen; Franke, Lude; Fehrmann, Rudolf S N; te Meerman, Gerard; Groen, Harry J M; Weersma, Rinse K; van den Berg, Leonard H; Wijmenga, Cisca; Ophoff, Roel A; Navis, Gerjan; Rudan, Igor; Snieder, Harold; Wilson, James F; Pramstaller, Peter P; Siscovick, David S; Wang, Thomas J; Gudnason, Vilmundur; van Duijn, Cornelia M; Felix, Stephan B; Fishman, Glenn I; Jamshidi, Yalda; Stricker, Bruno H Ch; Samani, Nilesh J; Kaab, Stefan; Arking, Dan E
The QRS interval, from the beginning of the Q wave to the end of the S wave on an electrocardiogram, reflects ventricular depolarization and conduction time and is a risk factor for mortality, sudden death and heart failure. We performed a genome-wide association meta-analysis in 40,407 individuals of European descent from 14 studies, with further genotyping in 7,170 additional Europeans, and we identified 22 loci associated with QRS duration (P < 5 x 10(-8)). These loci map in or near genes in pathways with established roles in ventricular conduction such as sodium channels, transcription factors and calcium-handling proteins, but also point to previously unidentified biologic processes, such as kinase inhibitors and genes related to tumorigenesis. We demonstrate that SCN10A, a candidate gene at the most significantly associated locus in this study, is expressed in the mouse ventricular conduction system, and treatment with a selective SCN10A blocker prolongs QRS duration. These findings extend our current knowledge of ventricular depolarization and conduction.
PMCID:3338195
PMID: 21076409
ISSN: 1546-1718
CID: 137023
Diabetes and the risk of sudden cardiac death, the Atherosclerosis Risk in Communities study
Kucharska-Newton, Anna M; Couper, David J; Pankow, James S; Prineas, Ronald J; Rea, Thomas D; Sotoodehnia, Nona; Chakravarti, Aravinda; Folsom, Aaron R; Siscovick, David S; Rosamond, Wayne D
Studies suggest that diabetes may specifically elevate the risk of sudden cardiac death in excess of other heart disease outcomes. In this study, we examined the association of type 2 diabetes with the incidence of sudden cardiac death when compared to the incidence of non-sudden cardiac death and non-fatal myocardial infarction (MI). We used data from the Atherosclerosis Risk in Communities (ARIC) study to examine the incidence of sudden and non-sudden cardiac death and non-fatal MI among persons with and without diabetes in follow-up from the baseline data collection (1987-1989) through December 31, 2001. There were 209 cases of sudden cardiac death, 119 of non-sudden cardiac death, and 739 of non-fatal MI identified in this cohort over an average 12.4 years of follow-up. In analyses adjusted for age, race/ARIC center, gender, and smoking, the Cox proportional hazard ratio of the association of baseline diabetes was 3.77 (95% CI 2.82, 5.05) for sudden cardiac death, 3.78 (95% CI 2.57, 5.53) for non-sudden cardiac death, and 3.20 (95% CI 2.71, 3.78) for non-fatal MI. Elevated risk for each of the three outcomes associated with diabetes was independent of adjustment for measures of blood pressure, lipids, inflammation, hemostasis, and renal function. Among those with diabetes, the risk of cardiac death, but not of non-fatal MI, was similar for men and women. Findings from this prospective, population-based cohort investigation indicate that diabetes does not confer a specific excess risk of sudden cardiac death. Our results suggest that diabetes attenuates gender differences in the risk of fatal cardiac events.
PMCID:3064263
PMID: 19855920
ISSN: 1432-5233
CID: 2747462
Variation in the checkpoint kinase 2 gene is associated with type 2 diabetes in multiple populations
North, Kari E; Franceschini, Nora; Avery, Christy L; Baird, Lisa; Graff, Mariaelisa; Leppert, Mark; Chung, Jay H; Zhang, Jinghui; Hanis, Craig; Boerwinkle, Eric; Volcik, Kelly A; Grove, Megan L; Mosley, Thomas H; Gu, Charles; Heiss, Gerardo; Pankow, James S; Couper, David J; Ballantyne, Christie M; Kao, W H Linda; Weder, Alan B; Cooper, Richard S; Ehret, Georg B; O'Connor, Ashley A; Chakravarti, Aravinda; Hunt, Steven C
Identification and characterization of the genetic variants underlying type 2 diabetes susceptibility can provide important understanding of the etiology and pathogenesis of type 2 diabetes. We previously identified strong evidence of linkage for type 2 diabetes on chromosome 22 among 3,383 Hypertension Genetic Epidemiology Network (HyperGEN) participants from 1,124 families. The checkpoint 2 (CHEK2) gene, an important mediator of cellular responses to DNA damage, is located 0.22 Mb from this linkage peak. In this study, we tested the hypothesis that the CHEK2 gene contains one or more polymorphic variants that are associated with type 2 diabetes in HyperGEN individuals. In addition, we replicated our findings in two other Family Blood Pressure Program (FBPP) populations and in the population-based Atherosclerosis Risk in Communities (ARIC) study. We genotyped 1,584 African-American and 1,531 white HyperGEN participants, 1,843 African-American and 1,569 white GENOA participants, 871 African-American and 1,009 white GenNet participants, and 4,266 African-American and 11,478 white ARIC participants for four single nucleotide polymorphisms (SNPs) in CHEK2. Using additive models, we evaluated the association of CHEK2 SNPs with type 2 diabetes in participants within each study population stratified by race, and in a meta-analysis, adjusting for age, age(2), sex, sex-by-age interaction, study center, and relatedness. One CHEK2 variant, rs4035540, was associated with an increased risk of type 2 diabetes in HyperGEN participants, two replication samples, and in the meta-analysis. These results may suggest a new pathway in the pathogenesis of type 2 diabetes that involves pancreatic beta-cell damage and apoptosis.
PMCID:2965317
PMID: 19855918
ISSN: 1432-5233
CID: 2747472
Diversity of human copy number variation and multicopy genes
Sudmant, Peter H; Kitzman, Jacob O; Antonacci, Francesca; Alkan, Can; Malig, Maika; Tsalenko, Anya; Sampas, Nick; Bruhn, Laurakay; Shendure, Jay; Eichler, Evan E; [Chakravarti, Aravinda]
Copy number variants affect both disease and normal phenotypic variation, but those lying within heavily duplicated, highly identical sequence have been difficult to assay. By analyzing short-read mapping depth for 159 human genomes, we demonstrated accurate estimation of absolute copy number for duplications as small as 1.9 kilobase pairs, ranging from 0 to 48 copies. We identified 4.1 million "singly unique nucleotide" positions informative in distinguishing specific copies and used them to genotype the copy and content of specific paralogs within highly duplicated gene families. These data identify human-specific expansions in genes associated with brain development, reveal extensive population genetic diversity, and detect signatures consistent with gene conversion in the human species. Our approach makes ~1000 genes accessible to genetic studies of disease association.
PMID: 21030649
ISSN: 1095-9203
CID: 3984382
A map of human genome variation from population-scale sequencing
Abecasis, Gonzalo R; Altshuler, David; Auton, Adam; Brooks, Lisa D; Durbin, Richard M; Gibbs, Richard A; Hurles, Matt E; McVean, Gil A; [Chakravarti, Aravinda]
The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. We undertook three projects: low-coverage whole-genome sequencing of 179 individuals from four populations; high-coverage sequencing of two mother-father-child trios; and exon-targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants, most of which were previously undescribed. We show that, because we have catalogued the vast majority of common variation, over 95% of the currently accessible variants found in any individual are present in this data set. On average, each person is found to carry approximately 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios, we directly estimate the rate of de novo germline base substitution mutations to be approximately 10(-8) per base pair per generation. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research.
PMID: 20981092
ISSN: 1476-4687
CID: 3984372