New database links regulatory DNA to its target genes
Recent big genomics projects aid disease studies
By Elizabeth Pennisi
Scientists have known for years that the genome is much more than a set of codes for proteins. It is also a giant switchboard, riddled with sequences that control gene activity. This extra layer of complexity has hampered searches for the genetic basis of diseases and for drugs that would target just the DNA at fault. In the past few months, however, several major research consortia have delivered what amount to user’s manuals for the genome, mapping the locations of thousands of those switches, the specific genes they control, and where in the body they are turned on or off.

On pages 648, 660, and 666, the latest and arguably boldest of these big biology efforts reports preliminary results. By analyzing genetic material gleaned from more than 100 people who had died just hours before, the Genotype-Tissue Expression (GTEx) project catches gene regulation in action, identifying the genes switched on or off by subtle changes in DNA within 2 million bases of any gene. By evaluating multiple tissues from each body, it also charts the reach of those regulatory sequences across cell types: Some affect a gene in all tissues; others are influential in a few tissues or just one.

Three efforts reported earlier took other approaches to mapping the genome’s many switches. Two, called BLUEPRINT and the NIH Roadmap Epigenomics Project, chased down the locations of DNA and its associated proteins that are the target of chemical modifications called epigenetic marks, which determine whether a gene can be activated. A third, the latest iteration of a 20-year effort called FANTOM (Functional ANnoTation Of the Mammalian genome), provides an extensive catalog of the beginnings of genes and of their control sequences. Thanks to these four efforts, “we are on the cusp of learning a lot more about genome function,” says Jonathan Pritchard, a geneticist at Stanford University in Palo Alto, California.

Not everyone is persuaded that these massive data-gathering efforts offer much practical help to biologists. “I am not a fan of big science,” says Dan Graur, an evolutionary geneticist at the University of Houston in Texas. Simon Xi, a computational biologist in Cambridge, Massachusetts, who is using the GTEx data in his work on drug development, believes the databases are vital but says they could be more user-friendly: “The challenge for us is how to integrate all those data.”

The new work aims to address an ongoing source of frustration among disease researchers. A decade ago, geneticists set out to link specific DNA sequences to common diseases. In so-called genome-wide association studies (GWAS), massive consortia pooled tens of thousands of patients and came up with thousands of subtle genetic changes, called single nucleotide polymorphisms (SNPs), which appeared to increase the risk of inflammatory bowel disease, schizophrenia, autism, and a whole host of other common disorders. Puzzlingly, many of these changes occurred outside genes (Science, 27 May 2011, p. 1031). The mutations presumably affected gene expression. But how? The impasse “suggested we needed to get moving on understanding regulatory variation,” recalls Nancy Cox, a quantitative human geneticist at Vanderbilt University in Nashville.

FANTOM5, a $100 million effort led by the RIKEN institute in Japan, has provided part of the answer by mapping two kinds of regulatory sequences in the genome: “promoters,” which help kick off transcription and sit at the start of a gene, and “enhancers,” regulatory DNA that can lie far from the genes it acts on. The project developed technology to capture RNA right as it starts to form off the DNA, which pinpoints promoters. It catalogs enhancers as well, because these pieces of regulatory DNA are also transcribed into RNA. Led by RIKEN’s Yoshihide Hayashizaki, FANTOM5 surveyed RNA in every major human organ, in hundreds of cancer cell lines and more than 200 purified primary cell types, and in cells at various stages of differentiation.
Earlier this year, the team described 201,000 human promoters and 65,000 human enhancers, showing that genes often contain several promoters that are activated differently in various tissues (Science, 27 February, p. 1010). The effort is “absolutely unprecedented,” says Bing Ren, a molecular geneticist at the San Diego, California, branch of the Ludwig Institute for Cancer Research. “This is really a significant resource.”

The $300 million NIH Roadmap Epigenomics Project took a different approach to identifying enhancers: It mapped the epigenetic changes associated with them. For each cell type studied, assays of chemical marks such as methylation, along with other changes in chromatin (the matrix of DNA and proteins that packages the genome), helped pinpoint enhancers. Based on the enhancers’ sequences, investigators were also able to identify the proteins that help those enhancers turn on genes. The study, reported on 19 February in Nature, included 127 reference epigenomes (each a map of all the epigenetic marks on a genome) for various embryonic and adult tissues and cell types, including immune, brain, heart, muscle, gut, fat, and skin cells.

The European Union’s €30 million BLUEPRINT project took an even deeper look into epigenomes, focusing on white and red blood cells. It determined the epigenomes of primary blood stem cells and of those cells at various stages of their differentiation into mature white or red cells. Among other goals, BLUEPRINT is looking for differences between the cellular epigenomes of healthy individuals and those of people with leukemia, whose blood cells proliferate uncontrollably. “BLUEPRINT is going to show, together with Roadmap, the rules of regulation of gene expression,” says BLUEPRINT’s Willem Ouwehand, an experimental hematologist at the University of Cambridge in the United Kingdom.

Once a GWAS identifies a SNP, data from Roadmap, BLUEPRINT, or FANTOM can provide further evidence that it might influence health by showing whether the variation falls in a regulatory region.
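Checking whether a GWAS hit lands in a cataloged regulatory region is, at its core, an interval-overlap test. The Python sketch below illustrates the idea under stated assumptions. The coordinates, region labels, and SNP names are invented for illustration and are not real FANTOM, Roadmap, or BLUEPRINT entries. Positions are assumed to be 0-based with half-open intervals, as in the BED convention. A real analysis over millions of regions would use an indexed interval structure rather than this linear scan.

```python
from typing import NamedTuple

class Region(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive (BED-style convention, assumed)
    end: int     # 0-based, exclusive
    kind: str    # e.g. "promoter" or "enhancer"

class Snp(NamedTuple):
    rsid: str    # hypothetical identifier
    chrom: str
    pos: int     # 0-based position (assumed)

def annotate(snps, regions):
    """Map each SNP ID to the kinds of regulatory regions it falls inside."""
    hits = {}
    for snp in snps:
        hits[snp.rsid] = [
            r.kind for r in regions
            if r.chrom == snp.chrom and r.start <= snp.pos < r.end
        ]
    return hits

# Hypothetical toy data: NOT real FANTOM, Roadmap, or BLUEPRINT coordinates.
regions = [
    Region("chr1", 1_000, 1_500, "promoter"),
    Region("chr1", 40_000, 40_800, "enhancer"),
    Region("chr2", 5_000, 5_600, "enhancer"),
]
snps = [
    Snp("snpA", "chr1", 40_250),  # inside the chr1 enhancer
    Snp("snpB", "chr1", 20_000),  # between annotated regions
    Snp("snpC", "chr2", 5_000),   # first base of the chr2 enhancer
]

for rsid, kinds in annotate(snps, regions).items():
    print(rsid, kinds or "no regulatory overlap")
```

An overlap like snpA's only suggests a regulatory role. It does not show which gene the region controls, which is the gap the GTEx approach addresses.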
NEWS | IN DEPTH

sciencemag.org SCIENCE
8 MAY 2015 • VOL 348 ISSUE 6235
Published by AAAS

Gene regulation cataloged
Studies linking DNA to disease find that 80% of the genetic risk factors lie outside of genes themselves. Multiple large-scale efforts are helping geneticists home in on what DNA really matters in determining when, where, and how much a gene is active.
FANTOM5 pulls out the very beginning of RNA as it is being transcribed, identifying gene promoters, which kick off transcription, and enhancers, which control that kickoff. The project has data for every major mouse and human organ, as well as from more than 200 cancer cell lines and purified cell types.
NIH Genotype-Tissue Expression (GTEx) has so far examined gene activity in up to 43 tissues of 175 people right after they died. The RNA samples analyzed show how each gene’s activity is shaped by sequence variations outside genes.
NIH Roadmap Epigenomics Project has cataloged chemical modifications to the genome, such as methylation, that alter how accessible genes are for activation. It looked at 127 cell and tissue types.
ENCODE carried out biochemical assays on hundreds of mouse and human cell types to identify elements of the genome that may play a role in gene expression.
BLUEPRINT catalogs chemical modifications to DNA and associated proteins, like the NIH Roadmap, but focuses on development and disease in the various blood cells, such as immune cells. The 100 cellular “epigenomes” mapped come from healthy people and people with leukemia, a blood cancer.
ILLUSTRATION: V. ALTOUNIAN/SCIENCE

GTEx goes a step further: It pins down how genetic variation, particularly in noncoding DNA, affects a gene’s activity across different parts of the body. To measure that activity, the project extracts the RNA transcribed in various tissues from multiple individuals. The researchers can then correlate the levels of specific RNA transcripts, an indicator of how active particular genes are, with SNPs or other DNA sequence variations. “It will help the researchers narrow down the discoveries they have made in GWAS studies,” says Simona Volpi, a pharmacologist at the National Human Genome Research Institute based in Rockville, Maryland, who helps coordinate GTEx.

Because the researchers needed multiple tissue samples from internal organs, too many to collect from living people, they turned to recently deceased people whose kin donated their bodies for research. The ultimate goal of the $100 million NIH-funded project is to collect and analyze about 25,000 tissues from 900 individuals; the data published so far include RNA from up to 43 tissue sites from 175 people. With those samples, “GTEx has the power to compare tissues within individuals and across individuals,” Volpi adds. “This is something that nobody has been able to do before at this scale.”

Xi, who is tracking down drug targets for depression, schizophrenia, and Alzheimer’s and Parkinson’s diseases, is already turning to GTEx data to follow up on SNPs previously implicated in those brain disorders. “Having GTEx data helps connect the dots,” he says. The data also enable him to check whether DNA sequences implicated by GWAS are active only in the brain. That could make them especially promising drug targets, because hitting them would be less likely to cause broad side effects.

Graur, who made a name for himself with his withering criticisms of a predecessor genomics project called ENCODE, distrusts the GTEx results because the project relied on postmortem samples. RNA degrades quickly, he notes. “If you want to do [gene] expression, you have to have live organisms,” he says. GTEx researchers counter that they have validated that samples taken within 6 hours of death faithfully reflect natural gene activity.

Still, GTEx and the other genomic projects have shortcomings. They don’t comprehensively cover all tissues, disappointing researchers eager for data on their favorite cell types. Diabetes researcher Mark McCarthy of the University of Oxford in the United Kingdom, for example, is running his own mini Roadmap and GTEx projects on pancreatic islet cells, which the large-scale efforts omitted.
And tapping these genomic databases can also be challenging. “I see people going to the sites and struggling,” says Chris Tyler-Smith, an evolutionary geneticist at the Wellcome Trust Sanger Institute in Hinxton, U.K. Nonetheless, Tyler-Smith welcomes GTEx and the other genomic projects, saying they represent “getting together in large groups to do things that one could hardly do in one’s lab.”

And for people like McCarthy, there is newfound hope of unraveling the complex genomic networks that underlie diabetes and other diseases. “I’m more optimistic than 4 to 5 years ago, when it really wasn’t clear how we were going to deal with these regulatory signals,” he says. “We now have quite a few clues.” ■
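GTEx's core analysis, correlating transcript levels with DNA sequence variants across individuals and tissues, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the GTEx pipeline. Genotypes are coded as 0, 1, or 2 copies of an alternate allele. Expression values are assumed to be already normalized. The donor IDs and numbers are invented. The sketch tests one SNP against one gene; a full scan would repeat this for every variant near each gene. Real expression quantitative trait locus (eQTL) analyses also adjust for covariates and assess statistical significance, which this sketch omits.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences; nan if either is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return math.nan
    return cov / (sx * sy)

def eqtl_by_tissue(genotypes, expression, min_donors=3):
    """Correlate one SNP's genotype with one gene's expression in each tissue.

    genotypes:  {donor: number of alternate alleles carried (0, 1, or 2)}
    expression: {tissue: {donor: normalized transcript level}}
    Only donors with both a genotype and a sample from the tissue are used,
    because not every donor contributes every tissue.
    """
    results = {}
    for tissue, levels in expression.items():
        donors = sorted(d for d in levels if d in genotypes)
        if len(donors) < min_donors:
            results[tissue] = math.nan  # too few paired samples to estimate r
            continue
        g = [genotypes[d] for d in donors]
        e = [levels[d] for d in donors]
        results[tissue] = pearson(g, e)
    return results

# Hypothetical toy data for one SNP and one gene (invented, not GTEx values).
genotypes = {"donor1": 0, "donor2": 1, "donor3": 2, "donor4": 1, "donor5": 0}
expression = {
    # Toy liver: each extra alternate allele raises expression by 1 unit.
    "liver": {"donor1": 1.0, "donor2": 2.0, "donor3": 3.0, "donor4": 2.0, "donor5": 1.0},
    # Toy brain: expression does not track genotype; donor5 gave no sample.
    "brain": {"donor1": 4.0, "donor2": 5.0, "donor3": 4.0, "donor4": 3.0},
}

for tissue, r in eqtl_by_tissue(genotypes, expression).items():
    print(f"{tissue}: r = {r:.2f}")
```

On the toy data, the liver correlation comes out at 1 and the brain correlation at 0. This is the kind of tissue-specific pattern the article describes, in which a variant influences a gene in some tissues but not in others.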