Searched for:

in-biosketch:yes

person:baeleg01

Total Results:

109


Sampling bias and incorrect rooting make phylogenetic network tracing of SARS-COV-2 infections unreliable [Comment]

Mavian, Carla; Pond, Sergei Kosakovsky; Marini, Simone; Magalis, Brittany Rife; Vandamme, Anne-Mieke; Dellicour, Simon; Scarpino, Samuel V; Houldcroft, Charlotte; Villabona-Arenas, Julian; Paisie, Taylor K; Trovão, Nídia S; Boucher, Christina; Zhang, Yun; Scheuermann, Richard H; Gascuel, Olivier; Lam, Tommy Tsan-Yuk; Suchard, Marc A; Abecasis, Ana; Wilkinson, Eduan; de Oliveira, Tulio; Bento, Ana I; Schmidt, Heiko A; Martin, Darren; Hadfield, James; Faria, Nuno; Grubaugh, Nathan D; Neher, Richard A; Baele, Guy; Lemey, Philippe; Stadler, Tanja; Albert, Jan; Crandall, Keith A; Leitner, Thomas; Stamatakis, Alexandros; Prosperi, Mattia; Salemi, Marco
PMID: 32381734
ISSN: 1091-6490
CID: 5170462

Online Bayesian Phylodynamic Inference in BEAST with Application to Epidemic Reconstruction

Gill, Mandev S; Lemey, Philippe; Suchard, Marc A; Rambaut, Andrew; Baele, Guy
Reconstructing pathogen dynamics from genetic data as they become available during an outbreak or epidemic represents an important statistical scenario in which observations arrive sequentially in time and one is interested in performing inference in an "online" fashion. Widely used Bayesian phylogenetic inference packages are not set up for this purpose, generally requiring one to recompute trees and evolutionary model parameters de novo when new data arrive. To accommodate increasing data flow in a Bayesian phylogenetic framework, we introduce a methodology to efficiently update the posterior distribution with newly available genetic data. Our procedure is implemented in the BEAST 1.10 software package, and relies on a distance-based measure to insert new taxa into the current estimate of the phylogeny and imputes plausible values for new model parameters to accommodate growing dimensionality. This augmentation creates informed starting values and re-uses optimally tuned transition kernels for posterior exploration of growing data sets, reducing the time necessary to converge to target posterior distributions. We apply our framework to data from the recent West African Ebola virus epidemic and demonstrate a considerable reduction in time required to obtain posterior estimates at different time points of the outbreak. Beyond epidemic monitoring, this framework easily finds other applications within the phylogenetics community, where changes in the data, in terms of alignment changes, sequence addition or removal, present common scenarios that can benefit from online inference.
PMCID: 7253210
PMID: 32101295
ISSN: 1537-1719
CID: 5170432

Incorporating heterogeneous sampling probabilities in continuous phylogeographic inference - Application to H5N1 spread in the Mekong region

Dellicour, Simon; Lemey, Philippe; Artois, Jean; Lam, Tommy T; Fusaro, Alice; Monne, Isabella; Cattoli, Giovanni; Kuznetsov, Dmitry; Xenarios, Ioannis; Dauphin, Gwenaelle; Kalpravidh, Wantanee; Von Dobschuetz, Sophie; Claes, Filip; Newman, Scott H; Suchard, Marc A; Baele, Guy; Gilbert, Marius
MOTIVATION: The potentially low precision associated with the geographic origin of sampled sequences represents an important limitation for spatially explicit (i.e. continuous) phylogeographic inference of fast-evolving pathogens such as RNA viruses. A substantial proportion of publicly available sequences is geo-referenced at broad spatial scale such as the administrative unit of origin, rather than more precise locations (e.g. geographic coordinates). Most frequently, such sequences are either discarded prior to continuous phylogeographic inference or arbitrarily assigned to the geographic coordinates of the centroid of their administrative area of origin for lack of a better alternative. RESULTS: We here implement and describe a new approach that allows to incorporate heterogeneous prior sampling probabilities over a geographic area. External data, such as outbreak locations, are used to specify these prior sampling probabilities over a collection of sub-polygons. We apply this new method to the analysis of highly pathogenic avian influenza H5N1 clade data in the Mekong region. Our method allows to properly include, in continuous phylogeographic analyses, H5N1 sequences that are only associated with large administrative areas of origin and assign them with more accurate locations. Finally, we use continuous phylogeographic reconstructions to analyse the dispersal dynamics of different H5N1 clades and investigate the impact of environmental factors on lineage dispersal velocities. AVAILABILITY AND IMPLEMENTATION: Our new method allowing heterogeneous sampling priors for continuous phylogeographic inference is implemented in the open-source multi-platform software package BEAST 1.10. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
PMCID: 7141868
PMID: 31790143
ISSN: 1367-4811
CID: 5170402

Accounting for population structure reveals ambiguity in the Zaire Ebolavirus reservoir dynamics

Vrancken, Bram; Wawina-Bokalanga, Tony; Vanmechelen, Bert; Martí-Carreras, Joan; Carroll, Miles W; Nsio, Justus; Kapetshi, Jimmy; Makiala-Mandanda, Sheila; Muyembe-Tamfum, Jean-Jacques; Baele, Guy; Vermeire, Kurt; Vergote, Valentijn; Ahuka-Mundeke, Steve; Maes, Piet
Ebolaviruses pose a substantial threat to wildlife populations and to public health in Africa. Evolutionary analyses of virus genome sequences can contribute significantly to elucidate the origin of new outbreaks, which can help guide surveillance efforts. The reconstructed between-outbreak evolutionary history of Zaire ebolavirus so far has been highly consistent. By removing the confounding impact of population growth bursts during local outbreaks on the free mixing assumption that underlies coalescent-based demographic reconstructions, we find, contrary to what previous results indicated, that the circulation dynamics of Ebola virus in its animal reservoir are highly uncertain. Our findings also accentuate the need for a more fine-grained picture of the Ebola virus diversity in its reservoir to reliably infer the reservoir origin of outbreak lineages. In addition, the recent appearance of slower-evolving variants is in line with latency as a survival mechanism and with bats as the natural reservoir host.
PMCID: 7075637
PMID: 32130210
ISSN: 1935-2735
CID: 5170442

In Search of Covariates of HIV-1 Subtype B Spread in the United States - A Cautionary Tale of Large-Scale Bayesian Phylogeography

Hong, Samuel L; Dellicour, Simon; Vrancken, Bram; Suchard, Marc A; Pyne, Michael T; Hillyard, David R; Lemey, Philippe; Baele, Guy
Infections with HIV-1 group M subtype B viruses account for the majority of the HIV epidemic in the Western world. Phylogeographic studies have placed the introduction of subtype B in the United States in New York around 1970, where it grew into a major source of spread. Currently, it is estimated that over one million people are living with HIV in the US and that most are infected with subtype B variants. Here, we aim to identify the drivers of HIV-1 subtype B dispersal in the United States by analyzing a collection of 23,588 pol sequences, collected for drug resistance testing from 45 states during 2004-2011. To this end, we introduce a workflow to reduce this large collection of data to more computationally-manageable sample sizes and apply the BEAST framework to test which covariates associate with the spread of HIV-1 across state borders. Our results show that we are able to consistently identify certain predictors of spread under reasonable run times across datasets of up to 10,000 sequences. However, the general lack of phylogenetic structure and the high uncertainty associated with HIV trees make it difficult to interpret the epidemiological relevance of the drivers of spread we are able to identify. While the workflow we present here could be applied to other virus datasets of a similar scale, the characteristic star-like shape of HIV-1 phylogenies poses a serious obstacle to reconstructing a detailed evolutionary and spatial history for HIV-1 subtype B in the US.
PMCID: 7077180
PMID: 32033422
ISSN: 1999-4915
CID: 5170422

Radiation of the coralline red algae (Corallinophycidae, Rhodophyta) crown group as inferred from a multilocus time-calibrated phylogeny

Pena, Viviana; Vieira, Christophe; Carlos Braga, Juan; Aguirre, Julio; Roesler, Anja; Baele, Guy; De Clerck, Olivier; Le Gall, Line
ISI:000552610200005
ISSN: 1055-7903
CID: 5171152

Hamiltonian Monte Carlo sampling to estimate past population dynamics using the skygrid coalescent model in a Bayesian phylogenetics framework

Baele, Guy; Gill, Mandev S; Lemey, Philippe; Suchard, Marc A
Nonparametric coalescent-based models are often employed to infer past population dynamics over time. Several of these models, such as the skyride and skygrid models, are equipped with a block-updating Markov chain Monte Carlo sampling scheme to efficiently estimate model parameters. The advent of powerful computational hardware along with the use of high-performance libraries for statistical phylogenetics has, however, made the development of alternative estimation methods feasible. We here present the implementation and performance assessment of a Hamiltonian Monte Carlo gradient-based sampler to infer the parameters of the skygrid model. The skygrid is a popular and flexible coalescent-based model for estimating population dynamics over time and is available in BEAST 1.10.5, a widely-used software package for Bayesian phylogenetic and phylodynamic analysis. Taking into account the increased computational cost of gradient evaluation, we report substantial increases in effective sample size per time unit compared to the established block-updating sampler. We expect gradient-based samplers to assume an increasingly important role for different classes of parameters typically estimated in Bayesian phylogenetic and phylodynamic analyses.
PMCID: 7463299
PMID: 32923688
ISSN: 2398-502X
CID: 5170522

Pliocene colonization of the Mediterranean by Great White Shark inferred from fossil records, historical jaws, phylogeographic and divergence time analyses

Leone, Agostino; Puncher, Gregory N.; Ferretti, Francesco; Sperone, Emilio; Tripepi, Sandro; Micarelli, Primo; Gambarelli, Andrea; Sara, Maurizio; Arculeo, Marco; Doria, Giuliano; Garibaldi, Fulvio; Bressi, Nicola; Dall'Asta, Andrea; Minelli, Daniela; Cilli, Elisabetta; Vanni, Stefano; Serena, Fabrizio; Diaz-Jaimes, Pindaro; Baele, Guy; Cariani, Alessia; Tinti, Fausto
ISI:000512529700001
ISSN: 0305-0270
CID: 5171122

Distinct rates and patterns of spread of the major HIV-1 subtypes in Central and East Africa

Faria, Nuno R; Vidal, Nicole; Lourenco, José; Raghwani, Jayna; Sigaloff, Kim C E; Tatem, Andy J; van de Vijver, David A M; Pineda-Peña, Andrea-Clemencia; Rose, Rebecca; Wallis, Carole L; Ahuka-Mundeke, Steve; Muyembe-Tamfum, Jean-Jacques; Muwonga, Jérémie; Suchard, Marc A; Rinke de Wit, Tobias F; Hamers, Raph L; Ndembi, Nicaise; Baele, Guy; Peeters, Martine; Pybus, Oliver G; Lemey, Philippe; Dellicour, Simon
Since the ignition of the HIV-1 group M pandemic in the beginning of the 20th century, group M lineages have spread heterogeneously throughout the world. Subtype C spread rapidly through sub-Saharan Africa and is currently the dominant HIV lineage worldwide. Yet the epidemiological and evolutionary circumstances that contributed to its epidemiological expansion remain poorly understood. Here, we analyse 346 novel pol sequences from the DRC to compare the evolutionary dynamics of the main HIV-1 lineages, subtypes A1, C and D. Our results place the origins of subtype C in the 1950s in Mbuji-Mayi, the mining city of southern DRC, while subtypes A1 and D emerged in the capital city of Kinshasa, and subtypes H and J in the less accessible port city of Matadi. Following a 15-year period of local transmission in southern DRC, we find that subtype C spread at least three-fold faster than other subtypes circulating in Central and East Africa. In conclusion, our results shed light on the origins of HIV-1 main lineages and suggest that socio-historical rather than evolutionary factors may have determined the epidemiological fate of subtype C in sub-Saharan Africa.
PMCID: 6897401
PMID: 31809523
ISSN: 1553-7374
CID: 5170412

BEAGLE 3: Improved Performance, Scaling, and Usability for a High-Performance Computing Library for Statistical Phylogenetics

Ayres, Daniel L; Cummings, Michael P; Baele, Guy; Darling, Aaron E; Lewis, Paul O; Swofford, David L; Huelsenbeck, John P; Lemey, Philippe; Rambaut, Andrew; Suchard, Marc A
BEAGLE is a high-performance likelihood-calculation library for phylogenetic inference. The BEAGLE library defines a simple, but flexible, application programming interface (API), and includes a collection of efficient implementations for calculation under a variety of evolutionary models on different hardware devices. The library has been integrated into recent versions of popular phylogenetics software packages including BEAST and MrBayes and has been widely used across a diverse range of evolutionary studies. Here, we present BEAGLE 3 with new parallel implementations, increased performance for challenging data sets, improved scalability, and better usability. We have added new OpenCL and central processing unit-threaded implementations to the library, allowing the effective utilization of a wider range of modern hardware. Further, we have extended the API and library to support concurrent computation of independent partial likelihood arrays, for increased performance of nucleotide-model analyses with greater flexibility of data partitioning. For better scalability and usability, we have improved how phylogenetic software packages use BEAGLE in multi-GPU (graphics processing unit) and cluster environments, and introduced an automated method to select the fastest device given the data set, evolutionary model, and hardware. For application developers who wish to integrate the library, we also have developed an online tutorial. To evaluate the effect of the improvements, we ran a variety of benchmarks on state-of-the-art hardware. For a partitioned exemplar analysis, we observe run-time performance improvements as high as 5.9-fold over our previous GPU implementation. BEAGLE 3 is free, open-source software licensed under the Lesser GPL and available at https://beagle-dev.github.io.
PMCID: 6802572
PMID: 31034053
ISSN: 1076-836X
CID: 5170322