NYUHSL Faculty Bibliography

Searched for:

in-biosketch:yes

person:baeleg01

Total Results:

109

BMC bioinformatics. 2014:15.DOI:

πBUSS: a parallel BEAST/BEAGLE utility for sequence simulation under complex evolutionary scenarios

Bielejec, Filip; Lemey, Philippe; Carvalho, Luiz Max; Baele, Guy; Rambaut, Andrew; Suchard, Marc A.

ISI:000335988300001

ISSN: 1471-2105

CID: 5170952

Bayesian model selection in phylogenetics and genealogy-based population genetics

Chapter by: Baele, Guy; Lemey, Philippe

in: BAYESIAN PHYLOGENETICS: METHODS, ALGORITHMS, AND APPLICATIONS by

pp. 59-93

ISBN: 978-1-4665-0082-2

CID: 5171342

Bioinformatics. 2013:29(16):1970-9.DOI: 10.1093/bioinformatics/btt340

Bayesian evolutionary model testing in the phylogenomics era: matching model complexity with computational efficiency

Baele, Guy; Lemey, Philippe

MOTIVATION/BACKGROUND:The advent of new sequencing technologies has led to increasing amounts of data being available to perform phylogenetic analyses, with genomic data giving rise to the field of phylogenomics. High-performance computing is becoming an indispensable research tool to fit complex evolutionary models, which take into account specific genomic properties, to large datasets. Here, we perform an extensive Bayesian phylogenetic model selection study, comparing codon and nucleotide substitution models, including codon position partitioning for nucleotide data as well as gene-specific substitution models for both data types. For the best fitting partitioned models, we also compare independent partitioning with standard diffuse prior specification to conditional partitioning via hierarchical prior specification. To compare the different models, we use state-of-the-art marginal likelihood estimation techniques, including path sampling and stepping-stone sampling. RESULTS:We show that a full codon model best describes the features of a whole mitochondrial genome dataset, consisting of 12 protein-coding genes, but only when each gene is allowed to evolve under a separate codon model. However, when using hierarchical prior specification for the partition-specific parameters instead of independent diffuse priors, codon position partitioned nucleotide models can still outperform standard codon models. We demonstrate the feasibility of fitting such a combination of complex models using the BEAGLE library for BEAST in combination with recent graphics cards. We argue that development and use of such models needs to be accompanied by state-of-the-art marginal likelihood estimators because the more traditional and computationally less demanding estimators do not offer adequate accuracy.

PMID: 23766415

ISSN: 1367-4811

CID: 5170002

BMC bioinformatics. 2013:14.DOI: 10.1186/1471-2105-14-85

Make the most of your samples: Bayes factor estimators for high-dimensional models of sequence evolution

Baele, Guy; Lemey, Philippe; Vansteelandt, Stijn

BACKGROUND:Accurate model comparison requires extensive computation times, especially for parameter-rich models of sequence evolution. In the Bayesian framework, model selection is typically performed through the evaluation of a Bayes factor, the ratio of two marginal likelihoods (one for each model). Recently introduced techniques to estimate (log) marginal likelihoods, such as path sampling and stepping-stone sampling, offer increased accuracy over the traditional harmonic mean estimator at an increased computational cost. Most often, each model's marginal likelihood will be estimated individually, which leads the resulting Bayes factor to suffer from errors associated with each of these independent estimation processes. RESULTS:We here assess the original 'model-switch' path sampling approach for direct Bayes factor estimation in phylogenetics, as well as an extension that uses more samples, to construct a direct path between two competing models, thereby eliminating the need to calculate each model's marginal likelihood independently. Further, we provide a competing Bayes factor estimator using an adaptation of the recently introduced stepping-stone sampling algorithm and set out to determine appropriate settings for accurately calculating such Bayes factors, with context-dependent evolutionary models as an example. While we show that modest efforts are required to roughly identify the increase in model fit, only drastically increased computation times ensure the accuracy needed to detect more subtle details of the evolutionary process. CONCLUSIONS:We show that our adaptation of stepping-stone sampling for direct Bayes factor calculation outperforms the original path sampling approach as well as an extension that exploits more samples. Our proposed approach for Bayes factor estimation also has preferable statistical properties over the use of individual marginal likelihood estimates for both models under comparison. 
Assuming a sigmoid function to determine the path between two competing models, we provide evidence that a single well-chosen sigmoid shape value requires less computational efforts in order to approximate the true value of the (log) Bayes factor compared to the original approach. We show that the (log) Bayes factors calculated using path sampling and stepping-stone sampling differ drastically from those estimated using either of the harmonic mean estimators, supporting earlier claims that the latter systematically overestimate the performance of high-dimensional models, which we show can lead to erroneous conclusions. Based on our results, we argue that highly accurate estimation of differences in model fit for high-dimensional models requires much more computational effort than suggested in recent studies on marginal likelihood estimation.

PMCID: 3651733

PMID: 23497171

ISSN: 1471-2105

CID: 5169992

Molecular biology & evolution. 2013:30(2):239-43.DOI: 10.1093/molbev/mss243

Accurate model selection of relaxed molecular clocks in bayesian phylogenetics

Baele, Guy; Li, Wai Lok Sibon; Drummond, Alexei J; Suchard, Marc A; Lemey, Philippe

Recent implementations of path sampling (PS) and stepping-stone sampling (SS) have been shown to outperform the harmonic mean estimator (HME) and a posterior simulation-based analog of Akaike's information criterion through Markov chain Monte Carlo (AICM), in bayesian model selection of demographic and molecular clock models. Almost simultaneously, a bayesian model averaging approach was developed that avoids conditioning on a single model but averages over a set of relaxed clock models. This approach returns estimates of the posterior probability of each clock model through which one can estimate the Bayes factor in favor of the maximum a posteriori (MAP) clock model; however, this Bayes factor estimate may suffer when the posterior probability of the MAP model approaches 1. Here, we compare these two recent developments with the HME, stabilized/smoothed HME (sHME), and AICM, using both synthetic and empirical data. Our comparison shows reassuringly that MAP identification and its Bayes factor provide similar performance to PS and SS and that these approaches considerably outperform HME, sHME, and AICM in selecting the correct underlying clock model. We also illustrate the importance of using proper priors on a large set of empirical data sets.

PMCID: 3548314

PMID: 23090976

ISSN: 1537-1719

CID: 5169982

Reconstruction of an HIV Transmission History in a Bayesian Coalescent Framework

Chapter by: Vrancken, Bram; Rambaut, Andrew; Baele, Guy; Vandamme, Anne-Mieke; Van Laethem, Kristel; Van Wijngaerden, Eric; Drummond, Alexei; Suchard, Marc; Lemey, Philippe

in: PROCEEDINGS IWBBIO 2013: INTERNATIONAL WORK-CONFERENCE ON BIOINFORMATICS AND BIOMEDICAL ENGINEERING by

pp. 555-

ISBN: 978-84-15814-13-9

CID: 5171362

Molecular biology & evolution. 2012:29(9):2157-67.DOI: 10.1093/molbev/mss084

Improving the accuracy of demographic and molecular clock model comparison while accommodating phylogenetic uncertainty

Baele, Guy; Lemey, Philippe; Bedford, Trevor; Rambaut, Andrew; Suchard, Marc A; Alekseyenko, Alexander V

Recent developments in marginal likelihood estimation for model selection in the field of Bayesian phylogenetics and molecular evolution have emphasized the poor performance of the harmonic mean estimator (HME). Although these studies have shown the merits of new approaches applied to standard normally distributed examples and small real-world data sets, not much is currently known concerning the performance and computational issues of these methods when fitting complex evolutionary and population genetic models to empirical real-world data sets. Further, these approaches have not yet seen widespread application in the field due to the lack of implementations of these computationally demanding techniques in commonly used phylogenetic packages. We here investigate the performance of some of these new marginal likelihood estimators, specifically, path sampling (PS) and stepping-stone (SS) sampling for comparing models of demographic change and relaxed molecular clocks, using synthetic data and real-world examples for which unexpected inferences were made using the HME. Given the drastically increased computational demands of PS and SS sampling, we also investigate a posterior simulation-based analogue of Akaike's information criterion (AIC) through Markov chain Monte Carlo (MCMC), a model comparison approach that shares with the HME the appealing feature of having a low computational overhead over the original MCMC analysis. We confirm that the HME systematically overestimates the marginal likelihood and fails to yield reliable model classification and show that the AICM performs better and may be a useful initial evaluation of model choice but that it is also, to a lesser degree, unreliable. We show that PS and SS sampling substantially outperform these estimators and adjust the conclusions made concerning previous analyses for the three real-world data sets that we reanalyzed. 
The methods used in this article are now available in BEAST, a powerful user-friendly software package to perform Bayesian evolutionary analyses.

PMCID: 3424409

PMID: 22403239

ISSN: 0737-4038

CID: 177226

Evolutionary biology. 2012:39(1):61-82.DOI:

Context-Dependent Evolutionary Models for Non-Coding Sequences: An Overview of Several Decades of Research and an Analysis of Laurasiatheria and Primate Evolution [Review]

Baele, Guy

ISI:000300579700006

ISSN: 0071-3260

CID: 5170902

Nature. 2011:479(7374):487-92.DOI: 10.1038/nature10640

The genome of Tetranychus urticae reveals herbivorous pest adaptations

Grbić, Miodrag; Van Leeuwen, Thomas; Clark, Richard M; Rombauts, Stephane; Rouzé, Pierre; Grbić, Vojislava; Osborne, Edward J; Dermauw, Wannes; Ngoc, Phuong Cao Thi; Ortego, Félix; Hernández-Crespo, Pedro; Diaz, Isabel; Martinez, Manuel; Navajas, Maria; Sucena, Élio; Magalhães, Sara; Nagy, Lisa; Pace, Ryan M; Djuranović, Sergej; Smagghe, Guy; Iga, Masatoshi; Christiaens, Olivier; Veenstra, Jan A; Ewer, John; Villalobos, Rodrigo Mancilla; Hutter, Jeffrey L; Hudson, Stephen D; Velez, Marisela; Yi, Soojin V; Zeng, Jia; Pires-daSilva, Andre; Roch, Fernando; Cazaux, Marc; Navarro, Marie; Zhurov, Vladimir; Acevedo, Gustavo; Bjelica, Anica; Fawcett, Jeffrey A; Bonnet, Eric; Martens, Cindy; Baele, Guy; Wissler, Lothar; Sanchez-Rodriguez, Aminael; Tirry, Luc; Blais, Catherine; Demeestere, Kristof; Henz, Stefan R; Gregory, T Ryan; Mathieu, Johannes; Verdon, Lou; Farinelli, Laurent; Schmutz, Jeremy; Lindquist, Erika; Feyereisen, René; Van de Peer, Yves

The spider mite Tetranychus urticae is a cosmopolitan agricultural pest with an extensive host plant range and an extreme record of pesticide resistance. Here we present the completely sequenced and annotated spider mite genome, representing the first complete chelicerate genome. At 90 megabases T. urticae has the smallest sequenced arthropod genome. Compared with other arthropods, the spider mite genome shows unique changes in the hormonal environment and organization of the Hox complex, and also reveals evolutionary innovation of silk production. We find strong signatures of polyphagy and detoxification in gene families associated with feeding on different hosts and in new gene families acquired by lateral gene transfer. Deep transcriptome analysis of mites feeding on different plants shows how this pest responds to a changing host environment. The T. urticae genome thus offers new insights into arthropod evolution and plant-herbivore interactions, and provides unique opportunities for developing novel plant protection strategies.

PMCID: 4856440

PMID: 22113690

ISSN: 1476-4687

CID: 5171322

BMC evolutionary biology. 2011:11.DOI: 10.1186/1471-2148-11-145

Context-dependent codon partition models provide significant increases in model fit in atpB and rbcL protein-coding genes

Baele, Guy; Van de Peer, Yves; Vansteelandt, Stijn

BACKGROUND:Accurate modelling of substitution processes in protein-coding sequences is often hampered by the computational burdens associated with full codon models. Lately, codon partition models have been proposed as a viable alternative, mimicking the substitution behaviour of codon models at a low computational cost. Such codon partition models however impose independent evolution of the different codon positions, which is overly restrictive from a biological point of view. Given that empirical research has provided indications of context-dependent substitution patterns at four-fold degenerate sites, we take those indications into account in this paper. RESULTS:We present so-called context-dependent codon partition models to assess previous empirical claims that the evolution of four-fold degenerate sites is strongly dependent on the composition of its two flanking bases. To this end, we have estimated and compared various existing independent models, codon models, codon partition models and context-dependent codon partition models for the atpB and rbcL genes of the chloroplast genome, which are frequently used in plant systematics. Such context-dependent codon partition models employ a full dependency scheme for four-fold degenerate sites, whilst maintaining the independence assumption for the first and second codon positions. CONCLUSIONS:We show that, both in the atpB and rbcL alignments of a collection of land plants, these context-dependent codon partition models significantly improve model fit over existing codon partition models. Using Bayes factors based on thermodynamic integration, we show that in both datasets the same context-dependent codon partition model yields the largest increase in model fit compared to an independent evolutionary model. 
Context-dependent codon partition models hence perform closer to codon models, which remain the best performing models at a drastically increased computational cost, compared to codon partition models, but remain computationally interesting alternatives to codon models. Finally, we observe that the substitution patterns in both datasets are drastically different, leading to the conclusion that combined analysis of these two genes using a single model may not be advisable from a context-dependent point of view.

PMCID: 3126739

PMID: 21619569

ISSN: 1471-2148

CID: 5169972