NYUHSL Faculty Bibliography

Science. 2017:355(6329):1040-1044. DOI: 10.1126/science.aaf4557

Design of a synthetic yeast genome

Richardson, Sarah M; Mitchell, Leslie A; Stracquadanio, Giovanni; Yang, Kun; Dymond, Jessica S; DiCarlo, James E; Lee, Dongwon; Huang, Cheng Lai Victor; Chandrasegaran, Srinivasan; Cai, Yizhi; Boeke, Jef D; Bader, Joel S

We describe complete design of a synthetic eukaryotic genome, Sc2.0, a highly modified Saccharomyces cerevisiae genome reduced in size by nearly 8%, with 1.1 megabases of the synthetic genome deleted, inserted, or altered. Sc2.0 chromosome design was implemented with BioStudio, an open-source framework developed for eukaryotic genome design, which coordinates design modifications from nucleotide to genome scales and enforces version control to systematically track edits. To achieve complete Sc2.0 genome synthesis, individual synthetic chromosomes built by Sc2.0 Consortium teams around the world will be consolidated into a single strain by "endoreduplication intercross." Chemically synthesized genomes like Sc2.0 are fully customizable and allow experimentalists to ask otherwise intractable questions about chromosome structure, function, and evolution with a bottom-up design strategy.

PMID: 28280199

ISSN: 1095-9203

CID: 2477422

Cell. 2016:167(2):355-368.e10. DOI: 10.1016/j.cell.2016.09.005

Enhancer Variants Synergistically Drive Dysfunction of a Gene Regulatory Network In Hirschsprung Disease

Chatterjee, Sumantra; Kapoor, Ashish; Akiyama, Jennifer A; Auer, Dallas R; Lee, Dongwon; Gabriel, Stacey; Berrios, Courtney; Pennacchio, Len A; Chakravarti, Aravinda

Common sequence variants in cis-regulatory elements (CREs) are suspected etiological causes of complex disorders. We previously identified an intronic enhancer variant in the RET gene disrupting SOX10 binding and increasing Hirschsprung disease (HSCR) risk 4-fold. We now show that two other functionally independent CRE variants, one binding Gata2 and the other binding Rarb, also reduce Ret expression and increase risk 2- and 1.7-fold. By studying human and mouse fetal gut tissues and cell lines, we demonstrate that reduced RET expression propagates throughout its gene regulatory network, exerting effects on both its positive and negative feedback components. We also provide evidence that the presence of a combination of CRE variants synergistically reduces RET expression and its effects throughout the GRN. These studies show how the effects of functionally independent non-coding variants in a coordinated gene regulatory network amplify their individually small effects, providing a model for complex disorders.

PMCID: 5113733

PMID: 27693352

ISSN: 1097-4172

CID: 2746572

Bioinformatics. 2016:32(14):2205-7. DOI: 10.1093/bioinformatics/btw203

gkmSVM: an R package for gapped-kmer SVM

Ghandi, Mahmoud; Mohammad-Noori, Morteza; Ghareghani, Narges; Lee, Dongwon; Garraway, Levi; Beer, Michael A

We present a new R package for training gapped-kmer SVM classifiers for DNA and protein sequences. We describe an improved algorithm for kernel matrix calculation that speeds run time by about 2 to 5-fold over our original gkmSVM algorithm. This package supports several sequence kernels, including: gkmSVM, kmer-SVM, mismatch kernel and wildcard kernel. AVAILABILITY AND IMPLEMENTATION: gkmSVM package is freely available through the Comprehensive R Archive Network (CRAN), for Linux, Mac OS and Windows platforms. The C++ implementation is available at www.beerlab.org/gkmsvm. CONTACT: mghandi@gmail.com or mbeer@jhu.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

PMCID: 4937197

PMID: 27153639

ISSN: 1367-4811

CID: 4133172

Bioinformatics. 2016:32(14):2196-8. DOI: 10.1093/bioinformatics/btw142

LS-GKM: a new gkm-SVM for large-scale datasets

Lee, Dongwon

gkm-SVM is a sequence-based method for predicting and detecting the regulatory vocabulary encoded in functional DNA elements, and is a commonly used tool for studying gene regulatory mechanisms. Here we introduce new software, LS-GKM, which removes several limitations of our previous releases, enabling training on much larger scale (LS) datasets. LS-GKM also provides additional advanced gapped k-mer based kernel functions. With these improvements, LS-GKM achieves considerably higher accuracy than the original gkm-SVM. AVAILABILITY AND IMPLEMENTATION: C/C++ source codes and related scripts are freely available from http://github.com/Dongwon-Lee/lsgkm/, and supported on Linux and Mac OS X. CONTACT: dwlee@jhu.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

PMCID: 4937189

PMID: 27153584

ISSN: 1367-4811

CID: 4133092

Scientific reports. 2016:6. DOI: 10.1038/srep28356

Rare coding TTN variants are associated with electrocardiographic QT interval in the general population

Kapoor, Ashish; Bakshy, Kiranmayee; Xu, Linda; Nandakumar, Priyanka; Lee, Dongwon; Boerwinkle, Eric; Grove, Megan L; Arking, Dan E; Chakravarti, Aravinda

We have shown previously that noncoding variants mapping around a specific set of 170 genes encoding cardiomyocyte intercalated disc (ID) proteins are more enriched for associations with QT interval than observed for genome-wide comparisons. At a false discovery rate (FDR) of 5%, we had identified 28 such ID protein-encoding genes. Here, we assessed whether coding variants at these 28 genes affect QT interval in the general population as well. We used exome sequencing in 4,469 European American (EA) and 1,880 African American (AA) ancestry individuals from the population-based ARIC (Atherosclerosis Risk In Communities) Study cohort to focus on rare (allele frequency <1%) potentially deleterious (nonsynonymous, stop-gain, splice) variants (n = 2,398 for EA; n = 1,693 for AA) and tested their effects on standardized QT interval residuals. We identified 27 nonsynonymous variants associated with QT interval (FDR 5%), 22 of which were in TTN. Taken together with the mapping of a QT interval GWAS locus near TTN, our observation of rare deleterious coding variants in TTN associated with QT interval show that TTN plays a role in regulation of cardiac electrical conductance and coupling, and is a risk factor for cardiac arrhythmias and sudden cardiac death.

PMCID: 4913250

PMID: 27321809

ISSN: 2045-2322

CID: 2746612

Nature genetics. 2015:47(8):955-61. DOI: 10.1038/ng.3331

A method to predict the impact of regulatory variants from DNA sequence

Lee, Dongwon; Gorkin, David U; Baker, Maggie; Strober, Benjamin J; Asoni, Alessandro L; McCallion, Andrew S; Beer, Michael A

Most variants implicated in common human disease by genome-wide association studies (GWAS) lie in noncoding sequence intervals. Despite the suggestion that regulatory element disruption represents a common theme, identifying causal risk variants within implicated genomic regions remains a major challenge. Here we present a new sequence-based computational method to predict the effect of regulatory variation, using a classifier (gkm-SVM) that encodes cell type-specific regulatory sequence vocabularies. The induced change in the gkm-SVM score, deltaSVM, quantifies the effect of variants. We show that deltaSVM accurately predicts the impact of SNPs on DNase I sensitivity in their native genomic contexts and accurately predicts the results of dense mutagenesis of several enhancers in reporter assays. Previously validated GWAS SNPs yield large deltaSVM scores, and we predict new risk-conferring SNPs for several autoimmune diseases. Thus, deltaSVM provides a powerful computational approach to systematically identify functional regulatory variants.

PMID: 26075791

ISSN: 1546-1718

CID: 4133102
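
The abstract above describes deltaSVM as the change in classifier score induced by a variant. A minimal sketch of that idea follows, under a simplifying assumption that is not taken from the record itself: the sequence score is treated as a sum of per-k-mer weights. The function names, the toy weight table, and the choice of k = 3 are all hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def sequence_score(seq, weights, k):
    """Sum the weights of every k-mer in seq (unknown k-mers count as 0)."""
    return sum(weights.get(seq[i:i + k], 0.0) for i in range(len(seq) - k + 1))


def delta_score(ref_seq, alt_seq, weights, k):
    """Score change when the reference allele is replaced by the alternate allele.

    Both sequences must cover the same window around the variant.
    """
    if len(ref_seq) != len(alt_seq):
        raise ValueError("reference and alternate windows must have equal length")
    return sequence_score(alt_seq, weights, k) - sequence_score(ref_seq, weights, k)


# Hypothetical weights for illustration only; real weights would come
# from a trained classifier.
toy_weights = {"GAT": 1.5, "ATA": 0.5, "GCT": -1.0}

ref = "AGATAA"  # reference window
alt = "AGCTAA"  # same window with a single-base substitution (A to C)
print(delta_score(ref, alt, toy_weights, k=3))
```

Only the k-mers that overlap the substituted base differ between the two windows, so the difference depends solely on those k-mers. A negative value in this toy setup means the substitution removes positively weighted k-mers or introduces negatively weighted ones.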

Genome research. 2014:24(12):1932-44. DOI: 10.1101/gr.164178.113

Divergent functions of hematopoietic transcription factors in lineage priming and differentiation during erythro-megakaryopoiesis

Pimkin, Maxim; Kossenkov, Andrew V; Mishra, Tejaswini; Morrissey, Christapher S; Wu, Weisheng; Keller, Cheryl A; Blobel, Gerd A; Lee, Dongwon; Beer, Michael A; Hardison, Ross C; Weiss, Mitchell J

Combinatorial actions of relatively few transcription factors control hematopoietic differentiation. To investigate this process in erythro-megakaryopoiesis, we correlated the genome-wide chromatin occupancy signatures of four master hematopoietic transcription factors (GATA1, GATA2, TAL1, and FLI1) and three diagnostic histone modification marks with the gene expression changes that occur during development of primary cultured megakaryocytes (MEG) and primary erythroblasts (ERY) from murine fetal liver hematopoietic stem/progenitor cells. We identified a robust, genome-wide mechanism of MEG-specific lineage priming by a previously described stem/progenitor cell-expressed transcription factor heptad (GATA2, LYL1, TAL1, FLI1, ERG, RUNX1, LMO2) binding to MEG-associated cis-regulatory modules (CRMs) in multipotential progenitors. This is followed by genome-wide GATA factor switching that mediates further induction of MEG-specific genes following lineage commitment. Interaction between GATA and ETS factors appears to be a key determinant of these processes. In contrast, ERY-specific lineage priming is biased toward GATA2-independent mechanisms. In addition to its role in MEG lineage priming, GATA2 plays an extensive role in late megakaryopoiesis as a transcriptional repressor at loci defined by a specific DNA signature. Our findings reveal important new insights into how ERY and MEG lineages arise from a common bipotential progenitor via overlapping and divergent functions of shared hematopoietic transcription factors.

PMCID: 4248311

PMID: 25319996

ISSN: 1549-5469

CID: 4133152

Nature. 2014:515(7527):355-64. DOI: 10.1038/nature13992

A comparative encyclopedia of DNA elements in the mouse genome

Yue, Feng; Cheng, Yong; Breschi, Alessandra; Vierstra, Jeff; Wu, Weisheng; Ryba, Tyrone; Sandstrom, Richard; Ma, Zhihai; Davis, Carrie; Pope, Benjamin D; Shen, Yin; Pervouchine, Dmitri D; Djebali, Sarah; Thurman, Robert E; Kaul, Rajinder; Rynes, Eric; Kirilusha, Anthony; Marinov, Georgi K; Williams, Brian A; Trout, Diane; Amrhein, Henry; Fisher-Aylor, Katherine; Antoshechkin, Igor; DeSalvo, Gilberto; See, Lei-Hoon; Fastuca, Meagan; Drenkow, Jorg; Zaleski, Chris; Dobin, Alex; Prieto, Pablo; Lagarde, Julien; Bussotti, Giovanni; Tanzer, Andrea; Denas, Olgert; Li, Kanwei; Bender, M A; Zhang, Miaohua; Byron, Rachel; Groudine, Mark T; McCleary, David; Pham, Long; Ye, Zhen; Kuan, Samantha; Edsall, Lee; Wu, Yi-Chieh; Rasmussen, Matthew D; Bansal, Mukul S; Kellis, Manolis; Keller, Cheryl A; Morrissey, Christapher S; Mishra, Tejaswini; Jain, Deepti; Dogan, Nergiz; Harris, Robert S; Cayting, Philip; Kawli, Trupti; Boyle, Alan P; Euskirchen, Ghia; Kundaje, Anshul; Lin, Shin; Lin, Yiing; Jansen, Camden; Malladi, Venkat S; Cline, Melissa S; Erickson, Drew T; Kirkup, Vanessa M; Learned, Katrina; Sloan, Cricket A; Rosenbloom, Kate R; Lacerda de Sousa, Beatriz; Beal, Kathryn; Pignatelli, Miguel; Flicek, Paul; Lian, Jin; Kahveci, Tamer; Lee, Dongwon; Kent, W James; Ramalho Santos, Miguel; Herrero, Javier; Notredame, Cedric; Johnson, Audra; Vong, Shinny; Lee, Kristen; Bates, Daniel; Neri, Fidencio; Diegel, Morgan; Canfield, Theresa; Sabo, Peter J; Wilken, Matthew S; Reh, Thomas A; Giste, Erika; Shafer, Anthony; Kutyavin, Tanya; Haugen, Eric; Dunn, Douglas; Reynolds, Alex P; Neph, Shane; Humbert, Richard; Hansen, R Scott; De Bruijn, Marella; Selleri, Licia; Rudensky, Alexander; Josefowicz, Steven; Samstein, Robert; Eichler, Evan E; Orkin, Stuart H; Levasseur, Dana; Papayannopoulou, Thalia; Chang, Kai-Hsin; Skoultchi, Arthur; Gosh, Srikanta; Disteche, Christine; Treuting, Piper; Wang, Yanli; Weiss, Mitchell J; Blobel, Gerd A; Cao, Xiaoyi; Zhong, Sheng; Wang, Ting; Good, Peter J; Lowdon, Rebecca F; Adams, Leslie B; Zhou, Xiao-Qiao; Pazin, Michael J; Feingold, Elise A; Wold, Barbara; Taylor, James; Mortazavi, Ali; Weissman, Sherman M; Stamatoyannopoulos, John A; Snyder, Michael P; Guigo, Roderic; Gingeras, Thomas R; Gilbert, David M; Hardison, Ross C; Beer, Michael A; Ren, Bing

The laboratory mouse shares the majority of its protein-coding genes with humans, making it the premier model organism in biomedical research, yet the two mammals differ in significant ways. To gain greater insights into both shared and species-specific transcriptional and cellular regulatory programs in the mouse, the Mouse ENCODE Consortium has mapped transcription, DNase I hypersensitivity, transcription factor binding, chromatin modifications and replication domains throughout the mouse genome in diverse cell and tissue types. By comparing with the human genome, we not only confirm substantial conservation in the newly annotated potential functional sequences, but also find a large degree of divergence of sequences involved in transcriptional regulation, chromatin state and higher order chromatin organization. Our results illuminate the wide range of evolutionary forces acting on genes and their regulatory regions, and provide a general resource for research into mammalian biology and mechanisms of human diseases.

PMCID: 4266106

PMID: 25409824

ISSN: 1476-4687

CID: 4133162

PLoS computational biology. 2014:10(7). DOI: 10.1371/journal.pcbi.1003711

Enhanced regulatory sequence prediction using gapped k-mer features

Ghandi, Mahmoud; Lee, Dongwon; Mohammad-Noori, Morteza; Beer, Michael A

Oligomers of length k, or k-mers, are convenient and widely used features for modeling the properties and functions of DNA and protein sequences. However, k-mers suffer from the inherent limitation that if the parameter k is increased to resolve longer features, the probability of observing any specific k-mer becomes very small, and k-mer counts approach a binary variable, with most k-mers absent and a few present once. Thus, any statistical learning approach using k-mers as features becomes susceptible to noisy training set k-mer frequencies once k becomes large. To address this problem, we introduce alternative feature sets using gapped k-mers, a new classifier, gkm-SVM, and a general method for robust estimation of k-mer frequencies. To make the method applicable to large-scale genome wide applications, we develop an efficient tree data structure for computing the kernel matrix. We show that compared to our original kmer-SVM and alternative approaches, our gkm-SVM predicts functional genomic regulatory elements and tissue specific enhancers with significantly improved accuracy, increasing the precision by up to a factor of two. We then show that gkm-SVM consistently outperforms kmer-SVM on human ENCODE ChIP-seq datasets, and further demonstrate the general utility of our method using a Naïve-Bayes classifier. Although developed for regulatory sequence analysis, these methods can be applied to any sequence classification problem.

PMID: 25033408

ISSN: 1553-7358

CID: 4133112
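
The gapped k-mer features mentioned in the abstract above can be illustrated with a small counting routine. This is a brute-force sketch only, not the tree-based algorithm the abstract refers to. It assumes a gapped k-mer is a window of length l in which exactly k positions are kept and the remaining l - k are masked. The function names, the "-" mask character, and the default l = 4, k = 2 are hypothetical choices for illustration.

```python
from collections import Counter
from itertools import combinations


def gapped_kmer_counts(seq, l=4, k=2):
    """Count gapped k-mers: each length-l window, with every choice of
    k kept positions and the other l - k positions masked by '-'."""
    counts = Counter()
    for start in range(len(seq) - l + 1):
        window = seq[start:start + l]
        for kept in combinations(range(l), k):
            pattern = "".join(window[i] if i in kept else "-" for i in range(l))
            counts[pattern] += 1
    return counts


def gapped_kmer_similarity(seq_a, seq_b, l=4, k=2):
    """Unnormalized similarity: dot product of the two gapped k-mer count vectors."""
    counts_a = gapped_kmer_counts(seq_a, l, k)
    counts_b = gapped_kmer_counts(seq_b, l, k)
    return sum(n * counts_b[pattern] for pattern, n in counts_a.items())


# Two windows that differ at one position still share several gapped patterns.
print(sorted(gapped_kmer_counts("ACGT")))
print(gapped_kmer_similarity("ACGT", "ACGA"))
```

Two length-l windows that agree at m positions share exactly C(m, k) masked patterns, so near-matches still contribute to the similarity. An exact-match count over the full window would score them as unrelated, which is the sparsity problem the abstract describes for large k.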

Nucleic acids research. 2013:41(Web Server issue):W544-56. DOI: 10.1093/nar/gkt519

kmer-SVM: a web server for identifying predictive regulatory sequence features in genomic data sets

Fletez-Brant, Christopher; Lee, Dongwon; McCallion, Andrew S; Beer, Michael A

Massively parallel sequencing technologies have made the generation of genomic data sets a routine component of many biological investigations. For example, Chromatin immunoprecipitation followed by sequence assays detect genomic regions bound (directly or indirectly) by specific factors, and DNase-seq identifies regions of open chromatin. A major bottleneck in the interpretation of these data is the identification of the underlying DNA sequence code that defines, and ultimately facilitates prediction of, these transcription factor (TF) bound or open chromatin regions. We have recently developed a novel computational methodology, which uses a support vector machine (SVM) with kmer sequence features (kmer-SVM) to identify predictive combinations of short transcription factor-binding sites, which determine the tissue specificity of these genomic assays (Lee, Karchin and Beer, Discriminative prediction of mammalian enhancers from DNA sequence. Genome Res. 2011; 21:2167-80). This regulatory information can (i) give confidence in genomic experiments by recovering previously known binding sites, and (ii) reveal novel sequence features for subsequent experimental testing of cooperative mechanisms. Here, we describe the development and implementation of a web server to allow the broader research community to independently apply our kmer-SVM to analyze and interpret their genomic datasets. We analyze five recently published data sets and demonstrate how this tool identifies accessory factors and repressive sequence elements. kmer-SVM is available at http://kmersvm.beerlab.org.

PMCID: 3692045

PMID: 23771147

ISSN: 1362-4962

CID: 4133122