Searched for:

person:mostah02

in-biosketch:yes

Total Results:

12


Systematic differences in discovery of genetic effects on gene expression and complex traits

Mostafavi, Hakhamanesh; Spence, Jeffrey P; Naqvi, Sahin; Pritchard, Jonathan K
Most signals in genome-wide association studies (GWAS) of complex traits implicate noncoding genetic variants with putative gene regulatory effects. However, currently identified regulatory variants, notably expression quantitative trait loci (eQTLs), explain only a small fraction of GWAS signals. Here, we show that GWAS and cis-eQTL hits are systematically different: eQTLs cluster strongly near transcription start sites, whereas GWAS hits do not. Genes near GWAS hits are enriched in key functional annotations, are under strong selective constraint and have complex regulatory landscapes across different tissue/cell types, whereas genes near eQTLs are depleted of most functional annotations, show relaxed constraint, and have simpler regulatory landscapes. We describe a model to understand these observations, including how natural selection on complex traits hinders discovery of functionally relevant eQTLs. Our results imply that GWAS and eQTL studies are systematically biased toward different types of variant, and support the use of complementary functional approaches alongside the next generation of eQTL studies.
PMID: 37857933
ISSN: 1546-1718
CID: 5607672

Variable prediction accuracy of polygenic scores within an ancestry group

Mostafavi, Hakhamanesh; Harpak, Arbel; Agarwal, Ipsita; Conley, Dalton; Pritchard, Jonathan K; Przeworski, Molly
Fields as diverse as human genetics and sociology are increasingly using polygenic scores based on genome-wide association studies (GWAS) for phenotypic prediction. However, recent work has shown that polygenic scores have limited portability across groups of different genetic ancestries, restricting the contexts in which they can be used reliably and potentially creating serious inequities in future clinical applications. Using the UK Biobank data, we demonstrate that even within a single ancestry group (i.e., when there are negligible differences in linkage disequilibrium or in causal allele frequencies), the prediction accuracy of polygenic scores can depend on characteristics such as the socio-economic status, age or sex of the individuals in which the GWAS and the prediction were conducted, as well as on the GWAS design. Our findings highlight both the complexities of interpreting polygenic scores and underappreciated obstacles to their broad use.
PMCID: 7067566
PMID: 31999256
ISSN: 2050-084X
CID: 4481502

Identifying genetic variants that affect viability in large cohorts

Mostafavi, Hakhamanesh; Berisa, Tomaz; Day, Felix R; Perry, John R B; Przeworski, Molly; Pickrell, Joseph K
A number of open questions in human evolutionary genetics would become tractable if we were able to directly measure evolutionary fitness. As a step towards this goal, we developed a method to examine whether individual genetic variants, or sets of genetic variants, currently influence viability. The approach consists in testing whether the frequency of an allele varies across ages, accounting for variation in ancestry. We applied it to the Genetic Epidemiology Research on Adult Health and Aging (GERA) cohort and to the parents of participants in the UK Biobank. Across the genome, we found only a few common variants with large effects on age-specific mortality: tagging the APOE ε4 allele and near CHRNA3. These results suggest that when large, even late-onset effects are kept at low frequency by purifying selection. Testing viability effects of sets of genetic variants that jointly influence 1 of 42 traits, we detected a number of strong signals. In participants of the UK Biobank of British ancestry, we found that variants that delay puberty timing are associated with a longer parental life span (P ~ 6.2 × 10⁻⁶ for fathers and P ~ 2.0 × 10⁻³ for mothers), consistent with epidemiological studies. Similarly, variants associated with later age at first birth are associated with a longer maternal life span (P ~ 1.4 × 10⁻³). Signals are also observed for variants influencing cholesterol levels, risk of coronary artery disease (CAD), body mass index, as well as risk of asthma. These signals exhibit consistent effects in the GERA cohort and among participants of the UK Biobank of non-British ancestry. We also found marked differences between males and females, most notably at the CHRNA3 locus, and variants associated with risk of CAD and cholesterol levels. Beyond our findings, the analysis serves as a proof of principle for how upcoming biomedical data sets can be used to learn about selection effects in contemporary humans.
PMCID: 5584811
PMID: 28873088
ISSN: 1545-7885
CID: 5494902

Measuring intolerance to mutation in human genetics

Fuller, Zachary L; Berg, Jeremy J; Mostafavi, Hakhamanesh; Sella, Guy; Przeworski, Molly
In numerous applications, from working with animal models to mapping the genetic basis of human disease susceptibility, knowing whether a single disrupting mutation in a gene is likely to be deleterious is useful. With this goal in mind, a number of measures have been developed to identify genes in which protein-truncating variants (PTVs), or other types of mutations, are absent or kept at very low frequency in large population samples, that is, genes that appear 'intolerant' to mutation. One measure in particular, the probability of being loss-of-function intolerant (pLI), has been widely adopted. This measure was designed to classify genes into three categories, null, recessive and haploinsufficient, on the basis of the contrast between observed and expected numbers of PTVs. Such population-genetic approaches can be useful in many applications. As we clarify, however, they reflect the strength of selection acting on heterozygotes and not dominance or haploinsufficiency.
PMCID: 6615471
PMID: 30962618
ISSN: 1546-1718
CID: 5494922

Genetic interactions drive heterogeneity in causal variant effect sizes for gene expression and complex traits

Patel, Roshni A; Musharoff, Shaila A; Spence, Jeffrey P; Pimentel, Harold; Tcheandjieu, Catherine; Mostafavi, Hakhamanesh; Sinnott-Armstrong, Nasa; Clarke, Shoa L; Smith, Courtney J; Durda, Peter P; Taylor, Kent D; Tracy, Russell; Liu, Yongmei; Johnson, W Craig; Aguet, Francois; Ardlie, Kristin G; Gabriel, Stacey; Smith, Josh; Nickerson, Deborah A; Rich, Stephen S; Rotter, Jerome I; Tsao, Philip S; Assimes, Themistocles L; Pritchard, Jonathan K
Despite the growing number of genome-wide association studies (GWASs), it remains unclear to what extent gene-by-gene and gene-by-environment interactions influence complex traits in humans. The magnitude of genetic interactions in complex traits has been difficult to quantify because GWASs are generally underpowered to detect individual interactions of small effect. Here, we develop a method to test for genetic interactions that aggregates information across all trait-associated loci. Specifically, we test whether SNPs in regions of European ancestry shared between European American and admixed African American individuals have the same causal effect sizes. We hypothesize that in African Americans, the presence of genetic interactions will drive the causal effect sizes of SNPs in regions of European ancestry to be more similar to those of SNPs in regions of African ancestry. We apply our method to two traits: gene expression in 296 African Americans and 482 European Americans in the Multi-Ethnic Study of Atherosclerosis (MESA) and low-density lipoprotein cholesterol (LDL-C) in 74K African Americans and 296K European Americans in the Million Veteran Program (MVP). We find significant evidence for genetic interactions in our analysis of gene expression; for LDL-C, we observe a similar point estimate, although this is not significant, most likely due to lower statistical power. These results suggest that gene-by-gene or gene-by-environment interactions modify the effect sizes of causal variants in human complex traits.
PMCID: 9300878
PMID: 35716666
ISSN: 1537-6605
CID: 5494942

Simple scaling laws control the genetic architectures of human complex traits

Simons, Yuval B; Mostafavi, Hakhamanesh; Smith, Courtney J; Pritchard, Jonathan K; Sella, Guy
ORIGINAL:0016889
ISSN: 2692-8205
CID: 5495012

Bayesian estimation of gene constraint from an evolutionary model with gene features

Zeng, Tony; Spence, Jeffrey P; Mostafavi, Hakhamanesh; Pritchard, Jonathan K
ORIGINAL:0016888
ISSN: 2692-8205
CID: 5495002

Bayesian model comparison for rare-variant association studies

Venkataraman, Guhan Ram; DeBoever, Christopher; Tanigawa, Yosuke; Aguirre, Matthew; Ioannidis, Alexander G; Mostafavi, Hakhamanesh; Spencer, Chris C A; Poterba, Timothy; Bustamante, Carlos D; Daly, Mark J; Pirinen, Matti; Rivas, Manuel A
Whole-genome sequencing studies applied to large populations or biobanks with extensive phenotyping raise new analytic challenges. The need to consider many variants at a locus or group of genes simultaneously and the potential to study many correlated phenotypes with shared genetic architecture provide opportunities for discovery not addressed by the traditional one variant, one phenotype association study. Here, we introduce a Bayesian model comparison approach called MRP (multiple rare variants and phenotypes) for rare-variant association studies that considers correlation, scale, and direction of genetic effects across a group of genetic variants, phenotypes, and studies, requiring only summary statistic data. We apply our method to exome sequencing data (n = 184,698) across 2,019 traits from the UK Biobank, aggregating signals in genes. MRP demonstrates an ability to recover signals such as associations between PCSK9 and LDL cholesterol levels. We additionally find MRP effective in conducting meta-analyses in exome data. Non-biomarker findings include associations between MC1R and red hair color and skin color, IL17RA and monocyte count, and IQGAP2 and mean platelet volume. Finally, we apply MRP in a multi-phenotype setting; after clustering the 35 biomarker phenotypes based on genetic correlation estimates, we find that joint analysis of these phenotypes results in substantial power gains for gene-trait associations, such as in TNFRSF13B in one of the clusters containing diabetes- and lipid-related traits. Overall, we show that the MRP model comparison approach improves upon useful features from widely used meta-analysis approaches for rare-variant association analyses and prioritizes protective modifiers of disease risk.
PMCID: 8715195
PMID: 34822764
ISSN: 1537-6605
CID: 5494932

Reduced signal for polygenic adaptation of height in UK Biobank

Berg, Jeremy J; Harpak, Arbel; Sinnott-Armstrong, Nasa; Joergensen, Anja Moltke; Mostafavi, Hakhamanesh; Field, Yair; Boyle, Evan August; Zhang, Xinjun; Racimo, Fernando; Pritchard, Jonathan K; Coop, Graham
Several recent papers have reported strong signals of selection on European polygenic height scores. These analyses used height effect estimates from the GIANT consortium and replication studies. Here, we describe a new analysis based on the UK Biobank (UKB), a large, independent dataset. We find that the signals of selection using UKB effect estimates are strongly attenuated or absent. We also provide evidence that previous analyses were confounded by population stratification. Therefore, the conclusion of strong polygenic adaptation now lacks support. Moreover, these discrepancies highlight (1) that methods for correcting for population stratification in GWAS may not always be sufficient for polygenic trait analyses, and (2) that claims of differences in polygenic scores between populations should be treated with caution until these issues are better understood. EDITORIAL NOTE: This article has been through an editorial process in which the authors decide how to respond to the issues raised during peer review. The Reviewing Editor's assessment is that all the issues have been addressed (see decision letter).
PMID: 30895923
ISSN: 2050-084X
CID: 5494912

Scaling the discrete-time Wright-Fisher model to biobank-scale datasets

Spence, Jeffrey P; Zeng, Tony; Mostafavi, Hakhamanesh; Pritchard, Jonathan K
The discrete-time Wright-Fisher (DTWF) model and its diffusion limit are central to population genetics. These models can describe the forward-in-time evolution of allele frequencies in a population resulting from genetic drift, mutation, and selection. Computing likelihoods under the diffusion process is feasible, but the diffusion approximation breaks down for large samples or in the presence of strong selection. Existing methods for computing likelihoods under the DTWF model do not scale to current exome sequencing sample sizes in the hundreds of thousands. Here, we present a scalable algorithm that approximates the DTWF model with provably bounded error. Our approach relies on two key observations about the DTWF model. The first is that transition probabilities under the model are approximately sparse. The second is that transition distributions for similar starting allele frequencies are extremely close as distributions. Together, these observations enable approximate matrix-vector multiplication in linear (as opposed to the usual quadratic) time. We prove similar properties for Hypergeometric distributions, enabling fast computation of likelihoods for subsamples of the population. We show theoretically and in practice that this approximation is highly accurate and can scale to population sizes in the tens of millions, paving the way for rigorous biobank-scale inference. Finally, we use our results to estimate the impact of larger samples on estimating selection coefficients for loss-of-function variants. We find that increasing sample sizes beyond existing large exome sequencing cohorts will provide essentially no additional information except for genes with the most extreme fitness effects.
PMID: 37724741
ISSN: 1943-2631
CID: 5610982