NYUHSL Faculty Bibliography

Searched for: in-biosketch:yes person:yanaii01

Total Results: 110

Genetics. 2015:200(3):975-89. DOI: 10.1534/genetics.115.175950

Remarkably Divergent Regions Punctuate the Genome Assembly of the Caenorhabditis elegans Hawaiian Strain CB4856

Thompson, Owen A; Snoek, L Basten; Nijveen, Harm; Sterken, Mark G; Volkers, Rita J M; Brenchley, Rachel; Van't Hof, Arjen; Bevers, Roel P J; Cossins, Andrew R; Yanai, Itai; Hajnal, Alex; Schmid, Tobias; Perkins, Jaryn D; Spencer, David; Kruglyak, Leonid; Andersen, Erik C; Moerman, Donald G; Hillier, LaDeana W; Kammenga, Jan E; Waterston, Robert H

The Hawaiian strain (CB4856) of Caenorhabditis elegans is one of the most divergent from the canonical laboratory strain N2 and has been widely used in developmental, population, and evolutionary studies. To enhance the utility of the strain, we have generated a draft sequence of the CB4856 genome, exploiting a variety of resources and strategies. When compared against the N2 reference, the CB4856 genome has 327,050 single nucleotide variants (SNVs) and 79,529 insertion-deletion events that result in a total of 3.3 Mb of N2 sequence missing from CB4856 and 1.4 Mb of sequence present in CB4856 but not present in N2. As previously reported, the density of SNVs varies along the chromosomes, with the arms of chromosomes showing greater average variation than the centers. In addition, we find 61 regions totaling 2.8 Mb, distributed across all six chromosomes, which have a greatly elevated SNV density, ranging from 2 to 16% SNVs. A survey of other wild isolates shows that the two alternative haplotypes for each region are widely distributed, suggesting they have been maintained by balancing selection over long evolutionary times. These divergent regions contain an abundance of genes from large rapidly evolving families encoding F-box, MATH, BATH, seven-transmembrane G-protein-coupled receptors, and nuclear hormone receptors, suggesting that they provide selective advantages in natural environments. The draft sequence makes available a comprehensive catalog of sequence differences between the CB4856 and N2 strains that will facilitate the molecular dissection of their phenotypic differences. Our work also emphasizes the importance of going beyond simple alignment of reads to a reference genome when assessing differences between genomes.

PMCID:4512556

PMID: 25995208

ISSN: 1943-2631

CID: 2049862

Nature. 2015:519(7542):219-22. DOI: 10.1038/nature13996

Spatiotemporal transcriptomics reveals the evolutionary history of the endoderm germ layer

Hashimshony, Tamar; Feder, Martin; Levin, Michal; Hall, Brian K; Yanai, Itai

The concept of germ layers has been one of the foremost organizing principles in developmental biology, classification, systematics and evolution for 150 years (refs 1-3). Of the three germ layers, the mesoderm is found in bilaterian animals but is absent in species in the phyla Cnidaria and Ctenophora, which has been taken as evidence that the mesoderm was the final germ layer to evolve. The origin of the ectoderm and endoderm germ layers, however, remains unclear, with models supporting the antecedence of each as well as a simultaneous origin. Here we determine the temporal and spatial components of gene expression spanning embryonic development for all Caenorhabditis elegans genes and use these data to determine the evolutionary ages of the germ layers. The gene expression program of the mesoderm is induced after those of the ectoderm and endoderm, thus making it the last germ layer both to evolve and to develop. Strikingly, the C. elegans endoderm and ectoderm expression programs do not co-induce; rather, the endoderm activates earlier, and this is also observed in the expression of endoderm orthologues during the embryology of the frog Xenopus tropicalis, the sea anemone Nematostella vectensis and the sponge Amphimedon queenslandica. Querying the phylogenetic ages of specifically expressed genes reveals that the endoderm comprises older genes. Taken together, we propose that the endoderm program dates back to the origin of multicellularity, whereas the ectoderm originated as a secondary germ layer freed from ancestral feeding functions.

PMCID:4359913

PMID: 25487147

ISSN: 1476-4687

CID: 2049872

Scientific reports. 2014:4. DOI: 10.1038/srep07387

Natural RNA interference directs a heritable response to the environment

Schott, Daniel; Yanai, Itai; Hunter, Craig P

RNA interference can induce heritable gene silencing, but it remains unexplored whether similar mechanisms play a general role in responses to cues that occur in the wild. We show that transient, mild heat stress in the nematode Caenorhabditis elegans results in changes in messenger RNA levels that last for more than one generation. The affected transcripts are enriched for genes targeted by germline siRNAs downstream of the piRNA pathway, and worms defective for germline RNAi are defective for these heritable effects. Our results demonstrate that a specific siRNA pathway transmits information about variable environmental conditions between generations.

PMCID:4894413

PMID: 25552271

ISSN: 2045-2322

CID: 2049882

Genome research. 2014:24(9):1497-503. DOI: 10.1101/gr.169722.113

Gene length and expression level shape genomic novelties

Grishkevich, Vladislav; Yanai, Itai

Gene duplication and alternative splicing are important mechanisms in the production of genomic novelties. Previous work has shown that a gene's family size and the number of splice variants it produces are inversely related, although the underlying reason is not well understood. Here, we report that gene length and expression level together explain this relationship. We found that gene lengths correlate with both gene duplication and alternative splicing: longer genes are less likely to produce duplicates and more likely to exhibit alternative splicing. We show that gene length is a dynamic property, increasing with evolutionary time (due in part to the insertion of transposable elements) and decreasing following partial gene duplications. However, gene length alone does not account for the relationship between alternative splicing and gene duplication. A gene's expression level appears both to impose a strong constraint on its length and to restrict gene duplications. Furthermore, high gene expression promotes alternative splicing, in particular for long genes, whereas short genes with low expression levels have large gene families. Our analysis of the human and mouse genomes shows that gene length and expression level are primary genic properties that together account for the relationship between gene duplication and alternative splicing and bias the origin of novelties in the genome.

PMCID:4158763

PMID: 25015383

ISSN: 1549-5469

CID: 2049892

Genome biology. 2014:15(3). DOI: 10.1186/gb4169

Seeing is believing: new methods for in situ single-cell transcriptomics [Comment]

Avital, Gal; Hashimshony, Tamar; Yanai, Itai

New methods employ RNA-seq to study single cells within complex tissues by in situ sequencing or mRNA capture from single photoactivated cells.

PMCID:4053714

PMID: 25000927

ISSN: 1474-760X

CID: 2049902

Development. 2014:141(5):1161-6. DOI: 10.1242/dev.105288

BLIND ordering of large-scale transcriptomic developmental timecourses

Anavy, Leon; Levin, Michal; Khair, Sally; Nakanishi, Nagayasu; Fernandez-Valverde, Selene L; Degnan, Bernard M; Yanai, Itai

RNA-Seq enables the efficient transcriptome sequencing of many samples from small amounts of material, but the analysis of these data remains challenging. In particular, in developmental studies, RNA-Seq is challenged by the morphological staging of samples, such as embryos, since these often lack clear markers at any particular stage. In such cases, the automatic identification of the stage of a sample would enable previously infeasible experimental designs. Here we present the 'basic linear index determination of transcriptomes' (BLIND) method for ordering samples comprising different developmental stages. The method is an implementation of a traveling salesman algorithm to order the transcriptomes according to their inter-relationships as defined by principal components analysis. To establish the direction of the ordered samples, we show that an appropriate indicator is the entropy of transcriptomic gene expression levels, which increases over developmental time. Using BLIND, we correctly recover the annotated order of previously published embryonic transcriptomic timecourses for frog, mosquito, fly and zebrafish. We further demonstrate the efficacy of BLIND by collecting 59 embryos of the sponge Amphimedon queenslandica and ordering their transcriptomes according to developmental stage. BLIND is thus useful in establishing the temporal order of samples within large datasets and is of particular relevance to the study of organisms with asynchronous development and when morphological staging is difficult.
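The abstract names the two ingredients of BLIND: a traveling-salesman ordering of transcriptomes and the entropy of expression levels as the indicator of developmental direction. The sketch below illustrates both ideas under simplifying assumptions and is not the published implementation: it skips the principal components step and measures Euclidean distance on raw expression profiles, it finds the shortest open path by brute force (practical only for a handful of samples), and the function names are hypothetical.

```python
import math
from itertools import permutations


def shannon_entropy(levels):
    """Shannon entropy (bits) of one sample's expression profile,
    after normalizing the expression levels to sum to 1."""
    total = sum(levels)
    probs = [x / total for x in levels if x > 0]
    return -sum(p * math.log2(p) for p in probs)


def distance(a, b):
    """Euclidean distance between two expression profiles
    (assumption: raw profiles stand in for PCA coordinates)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def path_length(order, samples):
    """Total distance along an open path visiting samples in the given order."""
    return sum(distance(samples[i], samples[j]) for i, j in zip(order, order[1:]))


def blind_like_order(samples):
    """Hypothetical helper: order samples along the shortest open path
    (exact brute-force TSP, small inputs only), then orient the path so
    that expression entropy increases from the first to the last sample."""
    best = list(min(permutations(range(len(samples))),
                    key=lambda order: path_length(order, samples)))
    if shannon_entropy(samples[best[0]]) > shannon_entropy(samples[best[-1]]):
        best.reverse()
    return best


if __name__ == "__main__":
    # Toy profiles (4 genes) whose expression spreads out over time, given shuffled.
    shuffled = [[40, 30, 30, 0], [100, 0, 0, 0], [25, 25, 25, 25], [70, 30, 0, 0]]
    print(blind_like_order(shuffled))
```

On the toy data, the recovered order starts at the low-entropy profile (all expression in one gene) and ends at the uniform profile. Brute force keeps the example exact and short; any realistic number of embryos would need a TSP heuristic instead.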

PMID: 24504336

ISSN: 1477-9129

CID: 2049912

Trends in genetics. 2013:29(8):479-87. DOI: 10.1016/j.tig.2013.05.006

The genomic determinants of genotype × environment interactions in gene expression

Grishkevich, Vladislav; Yanai, Itai

Predicting phenotype from genotype is greatly complicated by the polygenic nature of most traits and by the complex interactions between phenotype and the environment. Here, we review recent whole-genome approaches to understand the underlying principles, mechanisms, and evolutionary impacts of genotype × environment (G×E) interactions, defined as genotype-specific phenotypic responses to different environments. There is accumulating evidence that G×E interactions are ubiquitous, accounting perhaps for the greater part of the phenotypic variation seen across genotypes. Such interactions appear to be the consequence of changes to upstream regulators as opposed to local changes to promoters. Moreover, genes are not equally likely to exhibit G×E interactions; promoter architecture, expression level, regulatory complexity, and essentiality correlate with the differential regulation of a gene by the environment. One implication of this correlation is that expression variation across genotypes alone could be used as a proxy for G×E interactions in those experimental cases where identifying environmental variation is costly or impossible.

PMID: 23769209

ISSN: 0168-9525

CID: 2049922

Bioinformatics. 2013:29(11):1455-7. DOI: 10.1093/bioinformatics/btt169

ELOPER: elongation of paired-end reads as a pre-processing tool for improved de novo genome assembly

Silver, David H; Ben-Elazar, Shay; Bogoslavsky, Alexei; Yanai, Itai

MOTIVATION: Paired-end sequencing resulting in gapped short reads is commonly used for de novo genome assembly. Assembly methods use paired-end sequences in a two-step process, first treating each read-end independently, only later invoking the pairing to join the contiguous assemblies (contigs) into gapped scaffolds. Here, we present ELOPER, a pre-processing tool for paired-end sequences that produces a better read library for assembly programs. RESULTS: ELOPER proceeds by simultaneously considering both ends of paired reads, generating elongated reads. We show that ELOPER theoretically doubles read lengths while halving the number of reads. We provide evidence that pre-processing read libraries using ELOPER leads to considerably improved assemblies, as predicted from the Lander-Waterman model. AVAILABILITY: http://sourceforge.net/projects/eloper SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
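The Lander-Waterman prediction mentioned in the abstract can be illustrated numerically. This is a minimal sketch, not ELOPER's code, assuming the classic Lander-Waterman expression for the expected number of contigs, N·exp(-c·σ), with coverage c = NL/G and σ = 1 - T/L (N reads of length L, genome size G, minimum detectable overlap T). It also assumes the idealized case the abstract calls theoretical: every read pair merges into one read of twice the length. The function name and all parameter values are hypothetical.

```python
import math


def expected_contigs(n_reads, read_len, genome_len, min_overlap):
    """Lander-Waterman expected number of contigs: N * exp(-c * sigma),
    where c = N * L / G is the coverage and sigma = 1 - T / L."""
    coverage = n_reads * read_len / genome_len
    sigma = 1 - min_overlap / read_len
    return n_reads * math.exp(-coverage * sigma)


if __name__ == "__main__":
    # Hypothetical library: 40,000 reads of 100 bp on a 1 Mb genome, 30 bp minimum overlap.
    before = expected_contigs(40_000, 100, 1_000_000, 30)
    # Same coverage after ideal elongation: half as many reads, each twice as long.
    after = expected_contigs(20_000, 200, 1_000_000, 30)
    print(f"expected contigs: {before:.0f} -> {after:.0f}")
```

Halving N while doubling L leaves the coverage c unchanged but raises σ, because a fixed minimum overlap is a smaller fraction of a longer read. Both changes lower the expected contig count, which is the direction of improvement the abstract reports.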

PMID: 23603334

ISSN: 1367-4811

CID: 2049932

Nucleic acids research. 2013:41(4):2191-201. DOI: 10.1093/nar/gks1360

Spatial localization of co-regulated genes exceeds genomic gene clustering in the Saccharomyces cerevisiae genome

Ben-Elazar, Shay; Yakhini, Zohar; Yanai, Itai

While it has long been recognized that genes are not randomly positioned along the genome, the degree to which the genome's 3D structure influences the arrangement of genes has remained elusive. In particular, several lines of evidence suggest that actively transcribed genes are spatially co-localized, forming transcription factories; however, a generalized systematic test has hitherto not been described. Here we reveal transcription factories using a rigorous definition of genomic structure based on Saccharomyces cerevisiae chromosome conformation capture data, coupled with an experimental design controlling for the primary gene order. We develop a data-driven method for the interpolation and the embedding of such datasets and introduce statistics that enable the comparison of the spatial and genomic densities of genes. Combining these, we report evidence that co-regulated genes are clustered in space, beyond their observed clustering in the context of gene order along the genome, and show that this phenomenon is significant for 64 out of 117 transcription factors. Furthermore, we show that transcription factors whose targets are highly spatially co-localized are expressed at higher levels than those whose targets are not spatially clustered. Collectively, our results support the notion that, at a given time, the physical density of genes is intimately related to regulatory activity.

PMCID:3575811

PMID: 23303780

ISSN: 1362-4962

CID: 2369442

Methods in molecular biology. 2013:1038:1-26. DOI: 10.1007/978-1-62703-514-9_1

An introduction to high-throughput sequencing experiments: design and bioinformatics analysis

Normand, Rachelly; Yanai, Itai

The dramatic fall in the cost of DNA sequencing has revolutionized the experiments within reach in the life sciences. Here we provide an introduction to the domains of analysis possible using high-throughput sequencing, distinguishing between "counting" and "reading" applications. We discuss the steps in designing a high-throughput sequencing experiment, introduce the most widely used applications, and describe basic sequencing concepts. We review the various software programs available for many of the bioinformatics analyses required to make sense of the sequencing data. We hope that this introduction will be accessible to biologists with no previous background in bioinformatics, yet with a keen interest in applying the power of high-throughput sequencing in their research.

PMID: 23872966

ISSN: 1940-6029

CID: 2049952