NYUHSL Faculty Bibliography

Searched for:

in-biosketch:yes

person:baeleg01

Total Results:

109

European journal of human genetics. 2010:18(10):1127-32. DOI: 10.1038/ejhg.2010.48

A screening methodology based on Random Forests to improve the detection of gene-gene interactions

De Lobel, Lizzy; Geurts, Pierre; Baele, Guy; Castro-Giner, Francesc; Kogevinas, Manolis; Van Steen, Kristel

The search for susceptibility loci in gene-gene interactions imposes a methodological and computational challenge for statisticians because of the large dimensionality inherent to the modelling of gene-gene interactions or epistasis. In an era in which genome-wide scans have become relatively common, new powerful methods are required to handle the huge number of feasible gene-gene interactions and to weed out false positives and negatives from these results. One solution to the dimensionality problem is to reduce data by preliminary screening of markers to select the best candidates for further analysis. Ideally, this screening step is statistically independent of the testing phase. Initially developed for small numbers of markers, the Multifactor Dimensionality Reduction (MDR) method is a nonparametric, model-free data reduction technique to associate sets of markers with optimal predictive properties to disease. In this study, we examine the power of MDR in larger data sets and compare it with other approaches that are able to identify gene-gene interactions. Under various interaction models (purely and not purely epistatic), we use a Random Forest (RF)-based prescreening method, before executing MDR, to improve its performance. We find that the power of MDR increases when noisy SNPs are first removed, by creating a collection of candidate markers with RFs. We validate our technique by extensive simulation studies and by application to asthma data from the European Committee of Respiratory Health Study II.
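As a rough illustration of the data reduction step that MDR performs, the following sketch pools multi-locus genotype cells into high-risk and low-risk groups and scores each pair of markers by training accuracy. It is a simplified toy, not the method evaluated in the paper: the cross-validation used in practice and the Random Forest prescreening are both omitted, and the function names `mdr_accuracy` and `best_pair`, the `threshold` parameter, and the data layout are assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations


def mdr_accuracy(genotypes, status, loci, threshold=1.0):
    """Training accuracy of the high/low-risk rule for one set of loci.

    genotypes: list of per-subject tuples of genotype codes (assumed layout)
    status:    list of 0/1 flags, 1 = case, 0 = control
    loci:      tuple of column indices defining the multi-locus cell
    """
    cases = defaultdict(int)
    controls = defaultdict(int)
    for row, is_case in zip(genotypes, status):
        cell = tuple(row[j] for j in loci)
        if is_case:
            cases[cell] += 1
        else:
            controls[cell] += 1

    correct = 0
    for cell in set(cases) | set(controls):
        # A cell is called high-risk when its case/control ratio
        # reaches the threshold; everyone in it is then predicted a case.
        high_risk = cases[cell] >= threshold * controls[cell]
        correct += cases[cell] if high_risk else controls[cell]
    return correct / len(status)


def best_pair(genotypes, status):
    """Exhaustively score all two-locus models and return the best pair."""
    n_loci = len(genotypes[0])
    return max(
        combinations(range(n_loci), 2),
        key=lambda loci: mdr_accuracy(genotypes, status, loci),
    )
```

The exhaustive loop in `best_pair` grows quadratically with the number of markers (and faster for higher-order models), which is the cost that a prescreening step is meant to reduce.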

PMCID:2987456

PMID: 20461113

ISSN: 1476-5438

CID: 5169942

BMC evolutionary biology. 2010:10. DOI: 10.1186/1471-2148-10-244

Modelling the ancestral sequence distribution and model frequencies in context-dependent models for primate non-coding sequences

Baele, Guy; Van de Peer, Yves; Vansteelandt, Stijn

BACKGROUND: Recent approaches for context-dependent evolutionary modelling assume that the evolution of a given site depends upon its ancestor and that ancestor's immediate flanking sites. Because such a dependency pattern cannot be imposed on the root sequence, we consider the use of different orders of Markov chains to model dependence at the ancestral root sequence. Root distributions which are coupled to the context-dependent model across the underlying phylogenetic tree are deemed more realistic than decoupled Markov chain models, as the evolutionary process is responsible for shaping the composition of the ancestral root sequence.
RESULTS: We find strong support, in terms of Bayes Factors, for using a second-order Markov chain at the ancestral root sequence along with a context-dependent model throughout the remainder of the phylogenetic tree in an ancestral repeats dataset, and for using a first-order Markov chain at the ancestral root sequence in a pseudogene dataset. Relaxing the assumption of a single context-independent set of independent model frequencies, as presented in previous work, yields a further drastic increase in model fit. We show that the substitution rates associated with the CpG-methylation-deamination process can be modelled through context-dependent model frequencies and that their accuracy depends on the (order of the) Markov chain imposed at the ancestral root sequence. In addition, we provide evidence that this approach (which assumes that root distribution and evolutionary model are decoupled) outperforms an approach inspired by the work of Arndt et al., where the root distribution is coupled to the evolutionary model. We show that the continuous-time approximation of Hwang and Green has stronger support in terms of Bayes Factors, but the parameter estimates show minimal differences.
CONCLUSIONS: We show that the combination of a dependency scheme at the ancestral root sequence and a context-dependent evolutionary model across the remainder of the tree allows for accurate estimation of the model's parameters. The different assumptions tested in this manuscript clearly show that designing accurate context-dependent models is a complex process, with many different assumptions that require validation. Further, these assumptions are shown to change across different datasets, making the search for an adequate model for a given dataset quite challenging.
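To make the idea of Markov chains of different orders on a single sequence concrete, the sketch below scores a nucleotide string under an order-k chain. It is only an illustration under stated assumptions: the transition probabilities are estimated from the sequence itself with add-one smoothing and compared by plain log-likelihood, whereas the paper compares models with Bayes Factors inside a full phylogenetic analysis. The names `markov_loglik`, `ALPHABET`, and `pseudo` are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def markov_loglik(seq, order, pseudo=1.0):
    """Log-likelihood of seq under an order-`order` Markov chain.

    Transition probabilities are estimated from seq with add-`pseudo`
    smoothing. The first `order` bases are conditioned on, not scored.
    order=0 reduces to independent sites with a shared base composition.
    """
    ctx_counts = Counter()
    pair_counts = Counter()
    for i in range(order, len(seq)):
        ctx = seq[i - order:i]          # the preceding `order` bases
        ctx_counts[ctx] += 1
        pair_counts[(ctx, seq[i])] += 1

    loglik = 0.0
    for (ctx, base), count in pair_counts.items():
        prob = (count + pseudo) / (ctx_counts[ctx] + pseudo * len(ALPHABET))
        loglik += count * math.log(prob)
    return loglik
```

Higher orders always have more free parameters, so a raw log-likelihood comparison favours them; a criterion that penalises model complexity, such as the Bayes Factors used in the study, is needed for a fair comparison.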

PMCID:2928787

PMID: 20698960

ISSN: 1471-2148

CID: 5169962

Journal of molecular evolution. 2010:71(1):34-50. DOI: 10.1007/s00239-010-9362-y

Using non-reversible context-dependent evolutionary models to study substitution patterns in primate non-coding sequences

Baele, Guy; Van de Peer, Yves; Vansteelandt, Stijn

We discuss the importance of non-reversible evolutionary models when analyzing context-dependence. Given the inherent non-reversible nature of the well-known CpG-methylation-deamination process in mammalian evolution, non-reversible context-dependent evolutionary models may be well able to accurately model such a process. In particular, the lack of constraints on non-reversible substitution models might allow for more accurate estimation of context-dependent substitution parameters. To demonstrate this, we have developed different time-homogeneous context-dependent evolutionary models to analyze a large genomic dataset of primate ancestral repeats based on existing independent evolutionary models. We have calculated the difference in model fit for each of these models using Bayes Factors obtained via thermodynamic integration. We find that non-reversible context-dependent models can drastically increase model fit when compared to independent models, and that this holds on two primate non-coding datasets. We also show that further improvements are possible by clustering similar parameters across contexts.
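The principle behind thermodynamic integration can be sketched on a toy problem. The example below is not the authors' phylogenetic implementation: it assumes a hypothetical model with observations y_i ~ Normal(theta, 1) and prior theta ~ Normal(0, 1), chosen because the expected log-likelihood under each power posterior is available in closed form. In a real analysis that expectation would be estimated by MCMC at each value of beta. All function names are assumptions made for this illustration.

```python
import math


def expected_loglik(y, beta):
    """E[log L(theta)] under the power posterior L(theta)^beta * prior(theta)
    for y_i ~ Normal(theta, 1) and prior theta ~ Normal(0, 1)."""
    n, s = len(y), sum(y)
    var = 1.0 / (1.0 + beta * n)        # power-posterior variance
    mean = beta * s * var               # power-posterior mean
    # E[(y_i - theta)^2] = (y_i - mean)^2 + var, summed over observations
    sq = sum((yi - mean) ** 2 for yi in y) + n * var
    return -0.5 * n * math.log(2 * math.pi) - 0.5 * sq


def log_marginal_ti(y, steps=2000):
    """Log marginal likelihood as the integral of E_beta[log L] over
    beta in [0, 1], approximated with the trapezoidal rule."""
    h = 1.0 / steps
    vals = [expected_loglik(y, k * h) for k in range(steps + 1)]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))


def log_marginal_exact(y):
    """Closed-form log marginal likelihood of the same toy model,
    used only to check the integration."""
    n, s = len(y), sum(y)
    quad = sum(yi * yi for yi in y) - s * s / (1.0 + n)
    return (-0.5 * n * math.log(2 * math.pi)
            - 0.5 * math.log(1.0 + n) - 0.5 * quad)


def loglik_theta_zero(y):
    """Log-likelihood of a competing model that fixes theta = 0; with no
    free parameters this is also its log marginal likelihood."""
    n = len(y)
    return -0.5 * n * math.log(2 * math.pi) - 0.5 * sum(yi * yi for yi in y)


def log_bayes_factor(y):
    """Log Bayes Factor of the free-theta model against theta = 0."""
    return log_marginal_ti(y) - loglik_theta_zero(y)
```

A log Bayes Factor is the difference between two log marginal likelihoods, so a positive value of `log_bayes_factor(y)` indicates support for the model with a free `theta` over the model with `theta` fixed at zero.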

PMID: 20623275

ISSN: 1432-1432

CID: 5169952

Adaptive Control Mechanisms

Chapter by: Baele, Guy; Yao, Yao; Van de Peer, Yves; Winfield, Alan; Kernbach, Serge

in: SYMBIOTIC MULTI-ROBOT ORGANISMS: RELIABILITY, ADAPTABILITY, EVOLUTION

pp. 229-336

ISBN: 978-3-642-11691-9

CID: 5171352

BMC evolutionary biology. 2009:9. DOI: 10.1186/1471-2148-9-87

Efficient context-dependent model building based on clustering posterior distributions for non-coding sequences

Baele, Guy; Van de Peer, Yves; Vansteelandt, Stijn

BACKGROUND: Many recent studies that relax the assumption of independent evolution of sites have done so at the expense of a drastic increase in the number of substitution parameters. While additional parameters cannot be avoided to model context-dependent evolution, a large increase in model dimensionality is only justified when accompanied by careful model-building strategies that guard against overfitting. An increased dimensionality leads to increases in numerical computations of the models, increased convergence times in Bayesian Markov chain Monte Carlo algorithms and even more tedious Bayes Factor calculations.
RESULTS: We have developed two model-search algorithms which reduce the number of Bayes Factor calculations by clustering posterior densities to decide on the equality of substitution behavior in different contexts. The selected model's fit is evaluated using a Bayes Factor, which we calculate via model-switch thermodynamic integration. To reduce computation time and to increase the precision of this integration, we propose to split the calculations over different computers and to appropriately calibrate the individual runs. Using the proposed strategies, we find, in a dataset of primate Ancestral Repeats, that careful modeling of context-dependent evolution may increase model fit considerably and that the combination of a context-dependent model with the assumption of varying rates across sites offers even larger improvements in terms of model fit. Using a smaller nuclear SSU rRNA dataset, we show that context-dependence may only become detectable upon applying model-building strategies.
CONCLUSION: While context-dependent evolutionary models can increase the model fit over traditional independent evolutionary models, such complex models will often contain too many parameters. Justification for the added parameters is thus required so that only those parameters that model evolutionary processes previously unaccounted for are added to the evolutionary model. To obtain an optimal balance between the number of parameters in a context-dependent model and the performance in terms of model fit, we have designed two parameter-reduction strategies and we have shown that model fit can be greatly improved by reducing the number of parameters in a context-dependent evolutionary model.

PMCID:2695821

PMID: 19405957

ISSN: 1471-2148

CID: 5169932

Open-ended On-board Evolutionary Robotics for Robot Swarms

Chapter by: Baele, Guy; Bredeche, Nicolas; Haasdijk, Evert; Maere, Steven; Michiels, Nico; Van de Peer, Yves; Schmickl, Thomas; Schwarzer, Christopher; Thenius, Ronald

in: 2009 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION

pp. 1123-

ISBN: 978-1-4244-2958-5

CID: 5171382

DOI: 10.1109/ComputationWorld.2009.9

On adaptive self-organization in artificial robot organisms [Meeting Abstract]

Kernbach, Serge; Hamann, Heiko; Stradner, Juergen; Thenius, Ronald; Schmickl, Thomas; Crailsheim, Karl; van Rossum, A. C.; Sebag, Michele; Bredeche, Nicolas; Yao, Yao; Baele, Guy; Van de Peer, Yves; Timmis, Jon; Mohktar, Maizura; Tyrrell, Andy; Eiben, A. E.; McKibbin, S. P.; Liu, Wenguo; Winfield, Alan F. T.

ISI:000277313700006

CID: 5171372

Systematic biology. 2008:57(5):675-92. DOI: 10.1080/10635150802422324

A model-based approach to study nearest-neighbor influences reveals complex substitution patterns in non-coding sequences

Baele, Guy; Van de Peer, Yves; Vansteelandt, Stijn

In this article, we present a likelihood-based framework for modeling site dependencies. Our approach builds upon standard evolutionary models but incorporates site dependencies across the entire tree by letting the evolutionary parameters in these models depend upon the ancestral states at the neighboring sites. It thus avoids the need for introducing new and high-dimensional evolutionary models for site-dependent evolution. We propose a Markov chain Monte Carlo approach with data augmentation to infer the evolutionary parameters under our model. Although our approach allows for wide-ranging site dependencies, we illustrate its use, in two non-coding datasets, in the case of nearest-neighbor dependencies (i.e., evolution directly depending only upon the immediate flanking sites). The results reveal that the general time-reversible model with nearest-neighbor dependencies substantially improves the fit to the data as compared to the corresponding model with site independence. Using the parameter estimates from our model, we elaborate on the importance of the 5-methylcytosine deamination process (i.e., the CpG effect) and show that this process also depends upon the 5' neighboring base identity. We hint at the possibility of a so-called TpA effect and show that the observed substitution behavior is very complex in the light of dinucleotide estimates. We also discuss the presence of CpG effects in a nuclear small subunit dataset and find significant evidence that evolutionary models incorporating context-dependent effects perform substantially better than independent-site models and in some cases even outperform models that incorporate varying rates across sites.

PMID: 18853356

ISSN: 1076-836x

CID: 5169922

Molecular biology & evolution. 2006:23(7):1397-405. DOI: 10.1093/molbev/msl006

An improved statistical method for detecting heterotachy in nucleotide sequences

Baele, Guy; Raes, Jeroen; Van de Peer, Yves; Vansteelandt, Stijn

The principle of heterotachy states that the substitution rate of sites in a gene can change through time. In this article, we propose a powerful statistical test to detect sites that evolve according to the process of heterotachy. We apply this test to an alignment of 1289 eukaryotic rRNA molecules to 1) determine how widespread the phenomenon of heterotachy is in ribosomal RNA, 2) test whether these heterotachous sites are nonrandomly distributed, that is, linked to secondary structure features of ribosomal RNA, and 3) determine the impact of heterotachous sites on the bootstrap support of monophyletic groupings. Our study revealed that with 21 monophyletic taxa, approximately two-thirds of the sites in the considered set of sequences are heterotachous. Although the detected heterotachous sites do not appear bound to specific structural features of the small subunit rRNA, their presence is shown to have a large beneficial influence on the bootstrap support of monophyletic groups. Using extensive testing, we show that this may not be due to heterotachy itself but merely due to the increased substitution rate at the detected heterotachous sites.

PMID: 16672284

ISSN: 0737-4038

CID: 5169912