REVIEWS

Nature Reviews Genetics | AOP, published online 13 May 2014; doi:10.1038/nrg3681

Insights into RNA structure and function from genome-wide studies

Stefanie A. Mortimer1,2*, Mary Anne Kidwell1* and Jennifer A. Doudna1,3–5

Abstract | A comprehensive understanding of RNA structure will provide fundamental insights into the cellular function of both coding and non-coding RNAs. Although many RNA structures have been analysed by traditional biophysical and biochemical methods, the low-throughput nature of these approaches has prevented investigation of the vast majority of cellular transcripts. Triggered by advances in sequencing technology, genome-wide approaches for probing the transcriptome are beginning to reveal how RNA structure affects each step of protein expression and RNA stability. In this Review, we discuss the emerging relationships between RNA structure and the regulation of gene expression.

Secondary structures
Two-dimensional non-covalent interactions within an RNA molecule that consist of contiguous base pairs and loops. RNA molecules contain both canonical Watson–Crick base pairs and many non-canonical base pairs.

1 Department of Molecular and Cell Biology, University of California, Berkeley. 2 Present address: Guardant Health, Inc., 2686 Middlefield Road, Redwood City, California 94063, USA. 3 Howard Hughes Medical Institute, University of California, Berkeley. 4 Department of Chemistry, University of California, Berkeley, California 94720, USA. 5 Physical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA. *These authors contributed equally to this work. Correspondence to J.A.D. e‑mail: [email protected]

RNA molecules influence all steps of gene expression and regulation1. Once thought to be only a cellular messenger between DNA and proteins, RNA is now known to be involved in many aspects of cellular physiology through activities that are attributable to its secondary structures and tertiary structures. The ability of RNA strands to fold back on themselves to form stable three-dimensional architectures is fundamental to RNA transcription, splicing, translation, localization and turnover2–8. Classic techniques for biochemical interrogation of RNA structure focus on probing one type of RNA molecule at a time and have proved highly valuable to our understanding of RNA in biology and engineering applications9–12. These methods use enzymes or small molecules that react with the RNA in a structure-specific manner and are read out using gel electrophoresis. However, given the number and types of coding and non-coding RNAs that constitute the transcriptome, these methods have lagged far behind the demand for knowledge about RNA structure and its role in cellular activities. To meet these demands, computational methods have been developed to accurately predict the formation of loops and simple helices13–18, and the recent advent of crowdsourcing has further aided the prediction of more complex motifs19. Although computational approaches have helped to increase the speed of RNA structure modelling, the computationally predicted structures still need to be validated by analysing individual molecules. Recently, several different high-throughput techniques have been developed to enable more facile and accurate structure prediction of large numbers of transcripts20–29. The field of high-throughput RNA structure probing and determination (BOX 1) is moving at a rapid pace, and further information can be found in recent reviews30,31. In simple terms, these high-throughput techniques involve the coupling of traditional probes of RNA structure with new advances in next-generation sequencing (NGS). The application of these methods to transcriptomes has yielded the first glimpse of the ‘RNA structurome’ and has highlighted, on a genome-wide scale, how structured regions control RNA functions and gene expression.

Using genome-wide analyses of RNA secondary structures, it is possible to identify structural patterns and motifs in coding and non-coding RNAs from various organisms. This Review focuses on new insights into the regulation of gene expression as determined from investigations of the nature and abundance of structured regions within the transcriptomes of cells and viruses. We focus on the roles of these RNA structures in RNA translation, localization, stability, microRNA (miRNA)-mediated regulation and splicing. We also explore the future of these techniques for examining structures in vivo.

RNA structures in translational control

It has long been postulated that RNA structure helps to regulate each step of translation from initiation to termination5. With recent methods using high-throughput sequencing, it has become possible to closely examine the relationship between RNA structure and translational control in eukaryotic systems (BOX 2). Here, we discuss the contribution of RNA structure to various aspects of translation, including efficiency, initiation, ribosome pausing and termination.

NATURE REVIEWS | GENETICS

ADVANCE ONLINE PUBLICATION | 1 © 2014 Macmillan Publishers Limited. All rights reserved

Tertiary structures
Non-covalent interactions of separate secondary structures within an RNA molecule. These interactions occur between the bases and 2ʹOH ribose to create precise three-dimensional structures.

Next-generation sequencing (NGS)
A high-throughput technology that simultaneously sequences millions of different DNA molecules by non-Sanger sequencing methods (most commonly sequencing by synthesis).

RNA structurome
The collective group of secondary and tertiary structures formed within the transcriptome of a given organism.

Overall structure in the coding and non-coding regions of mRNAs. Systematic differences between the coding and non-coding regions of mRNAs can include patterns in sequence content, phylogenetic conservation and predicted secondary structures. Such evolutionarily conserved structural patterns in coding sequences imply functional importance that, at least in some cases, has been linked to translational regulation32. Global themes began to emerge in the first whole-genome structure analysis, which was carried out on genomic RNA extracted from HIV‑1 virions33. Using selective 2ʹ‑hydroxyl acylation analysed by primer extension (SHAPE)9,34, together with traditional Sanger sequencing methods read out through capillary electrophoresis, local nucleotide flexibility was assessed and used to derive a near-complete secondary-structure map of the single-stranded RNA (ssRNA) HIV‑1 genome. In this study, both the 5ʹ untranslated region (UTR) and the 3ʹUTR were found to be highly structured, whereas the protein-coding region contained periodic sections of highly structured and relatively unstructured RNA (FIG. 1a). This periodicity of structure in the coding region has important potential implications for ribosome pausing and consequences on protein folding (see below).

A range of NGS-based methods have been applied to eukaryotic cells to analyse the structural content of their transcriptomes. These strategies use either combined nucleases (to cleave ssRNA or double-stranded RNA (dsRNA) bases20–22,24–26,29) or chemical probes with NGS23,27,28,35 (BOX 1). Using parallel analysis of RNA structure (PARS), ds/ssRNA-seq, fragmentation sequencing (FragSeq) and chemical probing, transcriptomes from various organisms and cell types have been analysed to determine whether particular regions of the mRNAs contained characteristic structural features20–29. In results from separate PARS and ds/ssRNA-seq experiments, the structural content of the 5ʹUTRs and 3ʹUTRs relative to the coding regions was found to vary from organism to organism. The 5ʹUTRs and 3ʹUTRs were less structured than the coding regions on average for Saccharomyces cerevisiae and Arabidopsis thaliana, whereas opposite results were obtained for HIV‑1, Drosophila melanogaster, Caenorhabditis elegans and human mRNAs20,24,25,29,33 (FIG. 1). The lack of a universally conserved pattern could reflect functional differences in these regions, which are possibly influenced by the diversity of RNA-binding proteins and miRNAs that have evolved within each organism. Although there is no obvious trend in the overall architecture of the UTRs, structures at the end of the 5ʹUTRs and the beginning of the 3ʹUTRs are well conserved. For all five eukaryotic species, there is a significant decrease in mRNA secondary structure near both the start and the stop codons of the protein-coding sequence, and the implications of these observations are discussed below. Notably, the data from these NGS methods are consistent with previous whole-genome computational predictions for both mouse and human mRNAs36.
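The regional comparison described above (UTRs versus coding region) can be illustrated with a short sketch. It assumes that a per-nucleotide structure score is already available for one transcript, with higher values meaning more structure, and that the coding-sequence boundaries are known. The function name `region_means`, the coordinate convention and the toy values are hypothetical and are not taken from any of the cited studies.

```python
def region_means(scores, cds_start, cds_end):
    """Mean structure score for the 5'UTR, coding sequence and 3'UTR.

    scores    -- per-nucleotide structure scores for one transcript
    cds_start -- 0-based index of the first coding nucleotide (assumed convention)
    cds_end   -- 0-based index one past the last coding nucleotide
    """
    regions = {
        "5'UTR": scores[:cds_start],
        "CDS": scores[cds_start:cds_end],
        "3'UTR": scores[cds_end:],
    }
    # An empty region (for example, a transcript without an annotated UTR)
    # yields None rather than a division by zero.
    return {
        name: (sum(values) / len(values) if values else None)
        for name, values in regions.items()
    }


# Toy transcript: two 5'UTR positions, three coding positions, two 3'UTR positions.
toy_scores = [0.5, 0.25, 1.0, 1.5, 0.5, 0.25, 0.25]
toy_means = region_means(toy_scores, cds_start=2, cds_end=5)
```

In this toy case the coding region has the highest mean score, which is the pattern reported for S. cerevisiae and A. thaliana; a transcriptome-wide comparison would average such values over many transcripts.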

Box 1 | Methods for high-throughput RNA structure determination Recent experimental approaches for RNA structure determination have combined classic nuclease and chemical probing of RNA with next-generation sequencing (NGS) technologies. So far, there have been several iterations of these methods, each with their own advantages and disadvantages, which have combined either nucleases20–22,24–26,29 or chemical probes with NGS23,27,28,35. All of these methods enable more information-rich experiments involving RNA structure than has previously been possible. The table below summarizes the recent high-throughput methods for RNA structure probing and highlights the choice of structure probe, RNA population probed and NGS platform. The nucleases that were used to cleave at either double-stranded RNA (dsRNA) or single-stranded RNA (ssRNA) are indicated.

Methods | Probe | RNA pool | RNA folding | Organisms | NGS platform
PARS20,29 and PARTE26 | RNase V1 (dsRNA) and S1 nuclease (ssRNA) | Poly(A) selection | In vitro | Saccharomyces cerevisiae and humans | ABI SOLiD and Illumina HiSeq2000
ds/ssRNA-seq21,24,25 | RNase V1 (dsRNA) and RNase 1 (ssRNA) | Ribosomal RNA depleted | In vitro | Arabidopsis thaliana, Drosophila melanogaster and Caenorhabditis elegans | Illumina GAIIx and HiSeq2000
FragSeq22 | P1 nuclease (ssRNA) | Nuclear RNA | In vitro | Mice | ABI SOLiD3
SHAPE–seq23,102 | 1M7 | In vitro | In vitro | Bacillus subtilis | Illumina GAIIx
DMS chemical probing27,28 | DMS | Poly(A) selection | In vivo | A. thaliana, S. cerevisiae and humans | Illumina HiSeq2000
MAP-seq35 | DMS, CMCT and 1M7 | In vitro | In vitro | Synthetic system | Illumina MiSeq

1M7, 1‑methyl‑7‑nitroisatoic anhydride; ABI SOLiD, Applied Biosystems sequencing by oligonucleotide ligation and detection; CMCT, 1‑cyclohexyl-(2‑morpholinoethyl)carbodiimide metho‑p‑toluene sulphonate; DMS, dimethyl sulphate; FragSeq, fragmentation sequencing; GAIIx, Genome Analyzer IIx; MAP-seq, multiplexed accessibility probing read out through sequencing; PARS, parallel analysis of RNA structure; PARTE, parallel analysis of RNA structures with temperature elevation; SHAPE–seq, selective 2ʹ‑hydroxyl acylation analysed by primer extension followed by sequencing.
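To make the nuclease-based readout in Box 1 more concrete, the sketch below turns paired read counts into a per-nucleotide structure score. It assumes a PARS-style score defined as the log2 ratio of double-strand-specific (RNase V1) reads to single-strand-specific (S1 nuclease) reads at each position; the text above does not spell out this formula. The pseudocount, the function name `pars_scores` and the toy counts are hypothetical, and normalization for sequencing depth is deliberately omitted.

```python
import math


def pars_scores(v1_reads, s1_reads, pseudocount=1.0):
    """Per-nucleotide log2 ratio of V1 (paired) to S1 (unpaired) read counts.

    Positive values suggest a base-paired nucleotide, negative values an
    unpaired one. The pseudocount is an assumption of this sketch that
    avoids taking the logarithm of zero.
    """
    if len(v1_reads) != len(s1_reads):
        raise ValueError("read-count vectors must have the same length")
    return [
        math.log2((v + pseudocount) / (s + pseudocount))
        for v, s in zip(v1_reads, s1_reads)
    ]


# Toy counts for four nucleotides.
v1 = [7, 3, 0, 1]
s1 = [0, 3, 7, 1]
scores = pars_scores(v1, s1)
```

With these toy counts the first position scores as strongly paired, the third as strongly unpaired, and the two positions with equal counts score zero.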


Shine–Dalgarno sequences
Ribosome-binding sites in bacterial mRNA. They are generally located eight bases upstream of the AUG start codon and contain the six-base consensus sequence AGGAGG.

Folding energies
The Gibbs free energies associated with particular RNA folds or structures.

Ribosome profiling
Qualitative and quantitative sequencing of the RNAs attached to ribosomes as a signature of RNAs that are being translated.

Analyses of the coding regions revealed striking trends that warrant further investigation. In the PARS study using >3,000 mRNAs from S. cerevisiae, a periodic structure signal across the coding regions with a cycle of three nucleotides was detected20 (FIG. 1b). This periodicity has been interpreted to mean that, on average, the first nucleotide of each codon is the least likely, and the second nucleotide is the most likely, to be involved in secondary structural interactions. The amplitude of this periodicity is associated with ribosome density37 and suggests a functional role in translation. In addition, similar to the results from the HIV‑1 analysis, larger secondary structures were found to have a regulatory role in protein translation. In the HIV‑1 genome, these larger RNA structures correspond to inter-protein linkers in polyprotein precursors and unstructured peptide loops that link protein domains, which could affect ribosome pausing33 (see below).

Structure around start codons affects translational efficiency. A relationship between mRNA structure around the translation start site and translational efficiency has previously been shown using a combination of experimental and computational approaches in bacteria. From a synthetic library, >100 synonymous variants of the GFP gene were expressed in Escherichia coli. In this study, mRNA structure near the start codon (nucleotides –4 to +37) correlated with more than half of the observed variations in protein levels in this library38. Another study using a similar approach showed that mRNAs lacking ribosome binding sites (that is, Shine–Dalgarno sequences) in α-proteobacteria, γ‑proteobacteria, cyanobacteria and plastids also display a pronounced lack of mRNA secondary structure at the start codons. This finding was functionally validated using reporter gene constructs in E. coli and suggests that start codon accessibility is a major factor even for translation initiation that is independent of Shine–Dalgarno sequences39. However, although the expression levels in these studies were experimentally determined, 5ʹ mRNA folding energies were computationally predicted, and experimental validation of these folding energies is hence required.

Transcriptome-wide RNA structure probing has provided some experimental support for the proposed correlation between mRNA structure and translational efficiency. In S. cerevisiae, a small but significant anticorrelation was observed between RNA structure of the region located ten nucleotides upstream of the translation start site and ribosome density throughout the transcript20. This finding was determined by ribosome profiling37, which is an indicator of translational efficiency. Notably, transcripts that showed less structure in their 5ʹUTRs than in the beginning of their coding region also tended to have a higher ribosome density20 (FIG. 2a). This is reflected in the level of codon usage, in which AT‑rich codons are preferentially selected over GC‑rich codons40–42. These results provided the first comprehensive experimental evidence to validate the hypothesis that mRNA secondary structure around the start codon reduces translational efficiency.

An extension of the PARS method, parallel analysis of RNA structures with temperature elevation (PARTE), has allowed the genome-wide measurement of RNA folding energies of S. cerevisiae mRNAs by probing the secondary structure at temperatures ranging from 23 °C to 75 °C26. The PARTE profile of S. cerevisiae mRNAs confirms that the two local regions with the weakest structural pairings are at the start and stop codons. Regions that are ~20 nucleotides upstream of the start codon and ~10 nucleotides downstream of the stop codon have RNA structures that are significantly

Box 2 | Comparison of RNA structures in translational control The genome-wide structural studies have generated many important, but occasionally conflicting, observations about how RNAs are folded. The table below summarizes five features that have emerged from these studies. Structures in the 5ʹ or 3ʹ untranslated regions (UTRs) vary widely and are likely to represent regulatory elements, such as the pseudoknot in HIV‑1 that binds to the tRNALys3 primer33. Generally, there is a lack of structure surrounding the start codon (which increases translational efficiency) and a corresponding lack of structure around the stop codon. Interestingly, a periodic three-nucleotide signal in the coding sequence has been observed for a few species, and this may facilitate translation20,27,29. Overall, mRNAs have a complex structure landscape, which varies from organism to organism and reflects the distinct regulation pathways that have evolved in each system.

Species | 5ʹUTR more structured than coding region | Structure near start codon | Three-nucleotide periodicity | Structure near stop codon | 3ʹUTR more structured than coding region | Refs
HIV‑1 | Yes | NA | NA | NA | Yes | 33
Saccharomyces cerevisiae | No | No | Yes | No | No | 20,26
Arabidopsis thaliana | No | No | Yes | No | No | 25,27
Drosophila melanogaster | Yes | No | NA | No | Yes | 24
Caenorhabditis elegans | Yes | No | NA | No | Yes | 24
Humans | Yes | No | Yes | No | Yes | 29,36

NA, not applicable (that is, not evaluated in current genome-wide studies).
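The three-nucleotide periodicity listed in Box 2 can be checked on a per-nucleotide structure profile in two simple ways: averaging the scores by codon position, and measuring the strength of the Fourier component at a period of three nucleotides. The sketch below is an illustration under stated assumptions rather than the analysis used in the cited studies. It assumes that the profile starts at the first nucleotide of the start codon, and the function names and toy profile are hypothetical.

```python
import cmath
import math


def mean_by_codon_position(scores):
    """Average score at codon positions 1, 2 and 3.

    Assumes scores[0] is the first nucleotide of the start codon, so that
    index i falls on codon position (i % 3) + 1.
    """
    sums = [0.0, 0.0, 0.0]
    counts = [0, 0, 0]
    for i, score in enumerate(scores):
        sums[i % 3] += score
        counts[i % 3] += 1
    return [sums[k] / counts[k] for k in range(3)]


def power_at_period(scores, period):
    """Normalized power of the mean-centred profile at a given period (in nucleotides)."""
    n = len(scores)
    mean = sum(scores) / n
    component = sum(
        (score - mean) * cmath.exp(-2j * math.pi * i / period)
        for i, score in enumerate(scores)
    )
    return abs(component) ** 2 / n ** 2


# Toy coding-region profile of 30 codons: the first codon position is the
# least structured and the second the most structured.
toy_profile = [-1.0, 1.0, 0.0] * 30
position_means = mean_by_codon_position(toy_profile)
period3_power = power_at_period(toy_profile, 3)
period4_power = power_at_period(toy_profile, 4)
```

For a profile with genuine codon-level periodicity, the power at period 3 stands well above that at neighbouring periods, which is the kind of peak shown in the inset of FIG. 1b.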


more stable26. This polarity of structural stability may be a consequence of secondary or higher-order structures of RNAs and could function in regulating translation initiation, elongation or termination.

In addition to structure around the start codon, evolutionarily conserved RNA structures and codon usage directly after the translation start site can affect protein expression43. For example, genome-wide studies of RNA structure have revealed an increase in RNA structure downstream of the start codon for genes that localize to distinct cellular domains20. For codon usage, rare codons are frequently found at the beginning of the gene, which perhaps act to slow translating ribosomes to help ribosome allocation and to prevent jamming44. However, this may also be a result of RNA structure, as rare codons in E. coli are frequently AT‑rich at the third position42. Overall, these results may be useful for optimizing gene expression levels when designing expression constructs for various applications, and further experiments using controlled synthetic libraries could help to dissect the effects of codon usage versus RNA structure on translational efficiency.

Figure 1 | Structural motifs within the transcriptome. [Figure panels not reproduced. a: ‘Highly structured 5ʹUTR and 3ʹUTR relative to coding sequence’ (HIV‑1, D. melanogaster, C. elegans and humans); y-axis, HIV‑1 genome SHAPE reactivity. b: ‘Poorly structured 5ʹUTR and 3ʹUTR relative to coding sequence’ (S. cerevisiae and A. thaliana); y-axis, S. cerevisiae mRNA average PARS score; inset, ‘Periodicity in coding sequence’, frequency versus period (number of nucleotides).] a | RNA structure of the HIV‑1 single-stranded RNA genome33 is shown. The selective 2ʹ‑hydroxyl acylation analysed by primer extension (SHAPE) reactivity is indicated in red, and higher SHAPE reactivity correlates with less structure. Horizontal blue lines denote the average SHAPE reactivity for the coding sequence. The highly structured untranslated regions (UTRs) are highlighted, and similar trends are apparent for the Drosophila melanogaster, Caenorhabditis elegans and human transcriptome24,29,33. b | The average parallel analysis of RNA structure (PARS) score across >3,000 Saccharomyces cerevisiae mRNA transcripts is shown for the 5ʹUTR, the coding sequence and the 3ʹUTR20. Horizontal blue lines denote the average PARS score for the coding sequence. The poorly structured UTRs are highlighted, and similar trends are apparent for the Arabidopsis thaliana transcriptome20,25. The inset shows the three-nucleotide periodicity that is only found in the protein-coding sequences of these mRNAs20. Figure reproduced with permission from: a, REF. 33, Nature Publishing Group; b, REF. 20, Nature Publishing Group.

Figure 2 | Structure around start codons and translational efficiency. [Figure panels not reproduced. a: ‘Accessibility of the 5ʹUTR increases translation rate’; ‘Structures near the start codon and in the coding sequence reduce translation rate’. b: HIV‑1 genome SHAPE reactivity, with inter-protein linkers and protein domain junctions marked; labels include Matrix, Capsid, Nucleocapsid and p6.] a | Accessibility of the 5ʹ untranslated region (UTR) increases translation rate. Structure around the 5ʹUTR directly upstream of the start codon has a large influence on translation initiation rates and protein abundance. Decreased structure in this region allows efficient ribosome binding and start codon scanning. b | The RNA folding energy of different segments of the coding region is associated with protein structure33. Increased RNA structure in the coding region correlates with inter-protein linkers and protein domain junctions. The increased structure of these regions promotes ribosome pausing and assists in protein domain folding103,104. The protein domains shown are Protein Data Bank identifiers 2GOL and 1A43. SHAPE, selective 2ʹ‑hydroxyl acylation analysed by primer extension. Figure reproduced with permission from: a, REF. 30, Nature Publishing Group; b, REF. 33, Nature Publishing Group.

Structure in coding regions: ribosome pausing and translational efficiency. Many proteins are known to fold co‑translationally, and folding intermediates may be formed for almost every protein45. The rate of translation, which can vary by several orders of magnitude across a transcript, can influence the protein folding pathway46,47. Aside from physical obstacles to translation, such as RNA-binding proteins and ribosome pausing due to rare codon usage or low tRNA abundance, RNA structure can have a profound effect on the rate of translation48,49. Previous studies have focused on individual transcripts, which highlighted how mutations can change the structure of an mRNA and lead to downstream changes in protein levels50. A recurring pattern of RNA structure located near or after the regions of the genome that encode autonomously folding protein domains was discovered in the HIV‑1 genome. This observation led to the proposal that protein structure is encoded by both the primary sequence and the higher-order structure of an mRNA33. Ribosome pausing due to highly structured RNA could facilitate the folding of individual domains within a single protein or a polyprotein. Analyses of the ribosome pause sites in the HIV‑1 genome revealed that structured regions of the RNA were overrepresented at pause sites. This co‑translational pausing could be a common feature among eukaryotic mRNAs51,52 (FIG. 2b).
Although there are variations in overall structure of coding regions in the mRNAs from different organisms, in both S. cerevisiae and A. thaliana there is a correlation between highly structured features in mRNAs and increased protein translation20,25. A study comparing S. cerevisiae PARS-determined mRNA folding strength with protein abundance came to the surprising conclusion that mRNAs with highly structured features in their coding regions were more highly expressed53. To determine global protein abundance in S. cerevisiae, this study used the average of four previously published quantitative data sets54–56; for ribosome densities on mRNAs, two genome-wide measurements were considered, one of which was generated by ribosome profiling37 and the other by microarray57. With the exception of codon bias, RNA secondary structure stability was the feature with the strongest correlation to protein abundance and remained significant when controlling for mRNA levels. This study also found a strong positive correlation between RNA folding strength and ribosome density. Therefore, it seemed that highly structured RNAs might be expressed at higher levels, an observation that is not easily accounted for by mRNA half-lives because the correlation between mRNA folding strengths and half-lives is weak. One explanation for this observation could be that RNA structure assists in protein folding and therefore enables more productive expression levels. More experiments on a genome-wide scale with detailed biochemical follow‑up are needed to fully understand the effect of increased mRNA structure and other important factors on protein expression levels.
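One common way to ask whether a correlation between folding strength and protein abundance survives "controlling for mRNA levels" is a first-order partial correlation. The sketch below illustrates that idea only. It is not claimed to be the statistical procedure of the cited study, and the function names and toy vectors are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def partial_correlation(x, y, z):
    """Correlation of x and y after removing the linear effect of z on both."""
    r_xy = pearson(x, y)
    r_xz = pearson(x, z)
    r_yz = pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Toy values: folding strength, protein abundance and mRNA level for four genes.
# The mRNA levels are chosen to be uncorrelated with the other two variables,
# so the partial correlation equals the plain correlation.
folding_strength = [1.0, 2.0, 3.0, 4.0]
protein_abundance = [2.0, 1.0, 4.0, 3.0]
mrna_level = [1.0, -1.0, -1.0, 1.0]

plain_r = pearson(folding_strength, protein_abundance)
partial_r = partial_correlation(folding_strength, protein_abundance, mrna_level)
```

If mRNA level did explain most of the shared variation, the partial correlation would shrink towards zero, whereas a value that stays close to the plain correlation indicates an association that is not simply carried by transcript abundance.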

Signal peptide
A short peptide of 5–30 amino acids at the amino terminus of newly synthesized proteins in the secretory pathway.

RNA structures guiding RNA localization

RNA localization was previously thought to be a specialized mechanism limited to a small number of transcripts, but it is now known that >70% of mRNAs can be asymmetrically localized in certain cell types58. RNA localization is an important aspect of gene expression and provides both temporal and spatial control that is crucial for cell migration and differentiation59. One method to control RNA localization is through cis-acting RNA elements known as zipcodes, which are found primarily within the 3ʹUTRs of mRNAs6. Transported mRNAs can contain multiple zipcodes, ranging from a few nucleotides to >1 kb in size, that act together to recruit a combination of proteins60. These proteins then associate with cytoskeletal motors and regulate transport throughout the cell61. Thus, levels of structure across the transcriptome could reveal whether genes with shared cellular localization sites tend to have similar degrees of structure.

The PARS study in S. cerevisiae found increased RNA structure in transcripts encoding proteins that localize to specific cellular domains20, which suggests the presence of distinct structural localization elements. Surprisingly, this increased structure was prevalent within the coding regions. Although less common, localization elements can be found within the coding region of mRNAs, such as the well-characterized ASH1 mRNA from S. cerevisiae61, and this study implies that we may need to rethink where localization elements are typically found. Some mRNAs, particularly those encoding secreted proteins, contain sequences in the 5ʹ end of the coding region that affect RNA localization62. The amino-terminal hydrophobic amino acid sequence (that is, the signal peptide) in a nascent polypeptide guides the protein and mRNA co‑translationally to the endoplasmic reticulum; however, the mRNA region that encodes the signal peptide also contains important information at the nucleotide level for mRNA localization. The 5ʹUTRs and the first 30 coding nucleotides of these mRNAs are important for nuclear export62. From the PARS data, these regions in mRNAs encoding secreted proteins were less structured on average than the same regions in other mRNAs20. In many cases, three-dimensional RNA structure may be crucial for recognition. Several prominent examples support this hypothesis, such as the region of RNA recognized by the RNA-binding protein Egalitarian and the unusual A‑form helix recognized for transport by dynein63,64. Further analyses of structure, combined with new proteomic and high-resolution cell imaging data, could provide valuable insights in the future.

RNA thermometers and RNA stability RNA structure and folding energies can influence both the translation and the stability of an mRNA under stress conditions in vivo. During periods of stress, protein translation is sustained by mechanisms that facilitate translational initiation or elongation47. An early example of this phenomenon comes from bacteria, in which temperature-sensitive structures known as RNA thermometers affect mRNA translation65,66. At low or physio­ logical temperatures, these RNA structures are stable and block ribosome binding to the 5ʹUTR. At elevated temperatures, such structures unfold to expose ribosome binding sites that are normally occluded, thus promoting the expression of factors that stabilize unfolded proteins67 (FIG. 3a). Although RNA thermometers have been found predominantly in bacteria, this mechanism may also exist in eukaryotes68,69. The folding energies of the entire S. cerevisiae transcriptome were determined using PARTE, which revealed that thermodynamically unstable structures are enriched in ribosome binding sites of mRNA 5ʹUTRs26. This suggested that RNA thermometers can function as an evolutionarily conserved heat shock mechanism in eukaryotes; however, specific examples that affect translation have yet to be verified biochemically. Whereas the 5ʹUTRs contain thermodynamically unstable structures to allow translation initiation in general, mRNAs that were thermodynamically more stable across the entire transcript were more abundant during heat shock in vivo 26,70. These structures could prevent degradation by the exosome complex, which is the major 3ʹ-to‑5ʹ exonuclease in eukaryotes and requires a 3ʹ ssRNA region of ~30 nucleotides in its targets. The exosome was inhibited if the 3ʹ end of the RNA was structured or contained only a short 3ʹ overhang 71 (FIG.  3b) . Mutational inactivation of the exosome in S. cerevisiae decreased the degradation of all mRNAs, and especially those with unstable RNA structures26. 
These results were recapitulated in an in vitro assay, in which a reconstituted exosome was able to rapidly degrade an mRNA encoding a ribosomal protein, which has a less thermodynamically stable structure overall. Many of these housekeeping genes are generally repressed during heat shock to prevent the transcription, translation and accumulation of misfolded proteins72. By contrast, an mRNA encoding a key activator of the unfolded protein response with a stable 3ʹUTR structure showed a much longer half-life26. The exosome complex thus seemed to discriminate between structured and unstructured substrates for decay by selecting unfolded RNA structures for degradation in a temperature-dependent manner (FIG. 3b).

The situation differs in A. thaliana, in which a significant negative correlation between RNA secondary structure and mRNA abundance was found by comparing RNA sequencing and ds/ssRNA-seq data25. RNAs with low levels of secondary structure were on average more abundant in the transcriptome and vice versa. This finding was contrary to the observed relationship in S. cerevisiae mRNAs and could be explained by the increased tendency of structured RNAs to be processed into small regulatory RNAs in plants25. As viral RNAs tend to be highly structured33, there could be selective pressure in A. thaliana to differentiate between foreign and endogenous RNA molecules.

Figure 3 | RNA structure and stability under heat shock. a | RNA thermometers are in equilibrium between folded and unfolded states. At low temperatures the RNA is predominantly in the folded state, which obscures the ribosome binding site (RBS) and thus inhibits translation. As the temperature increases, the RNA adopts a less structured conformation, which allows ribosome binding and translation of heat shock response proteins. RNA thermometers are well characterized in bacteria, such as the transcript of the rpoH heat shock gene66. In Saccharomyces cerevisiae, bases that melt between 30 °C and 37 °C in the 5ʹ untranslated region (UTR) are found most densely at –10 and –20 nucleotides upstream of the start codon, which indicates that a similar mechanism might operate in eukaryotes26. b | Structured mRNAs are stabilized in S. cerevisiae. RNA in unstable structures becomes unfolded at higher temperatures, which allows degradation by the exosome. This promotes the expression of chaperones and unfolded response proteins (for example, HAC1 and PTC2) while shutting down other processes (for example, transcription and translation) to conserve energy expenditure26. ΔT indicates the change in temperature.
[Figure labels: a, Bacteria; b, Yeast; increased temperature; increased expression; decreased expression; heat shock response protein-encoding mRNAs (e.g. rpoH); unfolded response protein-encoding mRNAs (e.g. HAC1 and PTC2); ribosomal protein-encoding mRNAs (e.g. RPL1A); coding sequence; RBS; 5′UTR; 3′UTR; exosome; ribosome.]
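The negative relationship between structure and abundance described above for A. thaliana is a rank-style association, and a minimal sketch of how such an association could be quantified is given here. The function names, the per-transcript structure scores and the abundance values are all hypothetical and are not taken from the cited study25; Spearman's rank correlation is used purely as an illustrative choice of statistic.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical per-transcript values: structure score and read abundance.
structure = [0.9, 0.7, 0.5, 0.2]
abundance = [5, 20, 40, 100]
rho = spearman(structure, abundance)
```

In this toy input the most structured transcript is the least abundant and so on down the list, so the ranks are perfectly reversed; real transcriptome data would give a weaker coefficient and would need a significance test.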

RNA structure in gene regulation by small RNAs

It is becoming increasingly clear that RNA structure contributes substantially to gene regulation by miRNAs: structure in miRNA precursor transcripts controls their processing, and structure within mRNAs influences susceptibility to miRNA targeting. In the canonical miRNA biogenesis pathway, dsRNA regions of precursors are recognized by Dicer (an RNase III type enzyme) and are processed into 20–30‑nucleotide miRNAs. These small RNA duplexes are loaded onto an Argonaute (AGO) protein, which is part of an RNA-induced silencing complex (RISC), and the miRNA guides RISC to complementary regions in target mRNAs. Following RISC binding, the translation levels of these mRNAs are reduced, and the mRNA may be cleaved for complete translational repression73.

As most regulatory small RNAs are produced from dsRNA precursors, mRNA secondary structure could, in principle, affect the total levels of small RNAs generated by the processing machinery of the cell. In C. elegans, D. melanogaster and A. thaliana, the highly base-paired regions in all interrogated RNA categories, including functional RNAs (for example, ribosomal RNAs, small nuclear RNAs (snRNAs), small nucleolar RNAs (snoRNAs) and tRNAs) and pre-mRNAs, were found to be preferentially processed into small RNAs21,24,25. This finding suggested that transcripts containing long base-paired stem–loops, either isolated or in clusters, are recognized as small RNA precursors (FIG. 4a). Additionally, long dsRNAs (which are common in the life cycles of various viruses) are known to activate an antiviral innate immune response in mammals, which induces cleavage of these non-self RNAs74. Thus, the need to avoid cleavage by cellular machineries is one reason that the protein-coding regions in many mRNAs maintain low levels of secondary structure.

In addition to influencing the biogenesis of these small regulatory RNAs, structure also affects the accessibility and stability of miRNA binding sites. Until recently, the incorporation of thermodynamics into algorithms that predict miRNAs and their targets was limited by a lack of experimental data75. Given the availability of methods to probe mRNA structure on a genome-wide scale, strategies for predicting and validating functional miRNA binding sites are now becoming increasingly accurate76. Investigating the structure within 3ʹUTRs of C. elegans and D. melanogaster mRNAs on a transcriptome-wide scale was the first interrogation of the structural patterns within miRNA binding sites24. Overall, increased secondary structure in the 3ʹUTRs of animal mRNAs relative to S. cerevisiae transcripts20 led to the hypothesis that such structures mask miRNA binding sites to prevent or reduce silencing of transcripts, as miRNA-mediated gene regulation is active in multicellular animals but not in budding yeast77. This hypothesis was tested by determining the structure at predicted miRNA binding sites, as well as the 50‑bp sequence immediately upstream and downstream78,79. This analysis revealed an overall decrease in the tendency to detect structured RNA across the length of miRNA binding sites in C. elegans target mRNAs (FIG. 4b). However, the opposite trend was observed for D. melanogaster miRNA binding sites. In this case, a significant increase in secondary structure specifically within the seed-pairing region of miRNA binding sites was observed. Together, these findings suggest that miRNA regulatory complexes encounter different mRNA structural contexts in these two animals; nonetheless, the findings in D. melanogaster seem to be the exception rather than the rule75. Recent findings from additional genome-wide structure probing studies in humans provide strong experimental evidence that miRNA binding sites tend to be less structured than their flanking regions, which allows efficient targeting by small regulatory RNAs29.

Currently, miRNAs and their targets are difficult to predict de novo owing to a lack of conservation in the sequence of small RNAs and to competing structural elements in their targets. Degradome sequencing, which detects RNAs in the process of being degraded, has facilitated the discovery of novel miRNA–target RNA pairs. Degradome sequencing takes advantage of the fact that adaptor ligation during the library preparation step of Illumina sequencing requires RNAs with a free 5ʹ‑monophosphate, which is generated naturally by cellular pathways of RNA decay. RNA intermediates of mRNA turnover can therefore be ligated and sequenced to find novel miRNA–target RNA pairs by working in reverse from the cleaved transcripts80–83. In A. thaliana, in which miRNAs are completely complementary to their target sites, strong signals are observed for sites of small-RNA-guided cleavage83 (FIG. 4c). A complementary method for organisms in which small-RNA-guided cleavage sites are more challenging to determine is AGO high-throughput sequencing of RNA isolated by crosslinking immunoprecipitation (HITS–CLIP) and its variants84–87; this approach can simultaneously isolate miRNAs and mRNA target sites that are bound by AGO proteins. Linking both methods with transcriptome-wide RNA structure predictions may help to accurately predict small RNAs and their targets.

Figure 4 | mRNA structure involved in the regulation of translation by small RNAs. a | RNA structure is involved in the processing of small RNAs (shown in red). Long double-stranded RNAs or hairpin RNAs are recognized by Dicer (an RNase III type enzyme) and are processed into 20–25‑nucleotide small RNAs. Other RNases can also bind to these RNA structures to generate additional small RNAs for gene regulation. b | mRNA structure affects microRNA (miRNA) binding to target sites. Small RNAs are loaded onto Argonaute (AGO), which is a key component of the RNA-induced silencing complex (RISC) that binds to target mRNAs to silence expression. Structure around the AGO-binding sites in target mRNAs prevents this post-transcriptional regulation. c | Degradome sequencing reveals novel miRNA binding sites in mRNAs. Reads that map to the cleavage sites are enriched for miRNA binding sites (shown in blue)83.
[Figure labels: Dicer; other RNases; AGO; miRNA; mRNA; normalized reads; degraded mRNA sequences; upstream mRNA; miRNA site; downstream mRNA.]
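The comparison described above, between structure at a predicted miRNA binding site and structure in the 50 positions immediately upstream and downstream, can be sketched as follows. This is an illustration under stated assumptions rather than the analysis used in the cited studies: the function name `site_vs_flank`, the per-nucleotide structure scores and the site coordinates are all hypothetical.

```python
def site_vs_flank(scores, site_start, site_end, flank=50):
    """Mean structure score in a site minus the mean in its flanks.

    scores     -- per-nucleotide structure scores (higher = more paired);
                  the scoring scheme is a hypothetical placeholder
    site_start -- 0-based index of the first site position
    site_end   -- 0-based index one past the last site position
    flank      -- number of positions taken on each side of the site
    """
    if not 0 <= site_start < site_end <= len(scores):
        raise ValueError("site coordinates fall outside the transcript")
    site = scores[site_start:site_end]
    upstream = scores[max(0, site_start - flank):site_start]
    downstream = scores[site_end:site_end + flank]
    flanks = list(upstream) + list(downstream)
    if not flanks:
        raise ValueError("no flanking positions available")
    return sum(site) / len(site) - sum(flanks) / len(flanks)


# Toy transcript: a 22-nt accessible site inside structured flanks.
toy_scores = [1.0] * 50 + [0.2] * 22 + [1.0] * 50
difference = site_vs_flank(toy_scores, 50, 72)
```

A negative difference corresponds to a site that is less structured than its surroundings, which is the pattern reported for C. elegans and human targets; a positive difference would correspond to the D. melanogaster observation.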

RNA structure in vivo

Until recently, high-throughput cellular RNA structural studies used in vitro re‑folded RNA, which may not reflect the true structure in vivo, where RNAs interact with other cellular components. An important challenge is to adapt current technologies for applications in vivo using reagents that can penetrate cell membranes to provide readouts with enhanced sensitivity. Various compounds have been developed for examining intracellular RNA structures88–92, and the first global in vivo probing was recently carried out using dimethyl sulphate (DMS) on the A. thaliana, S. cerevisiae and human transcriptomes27,28. DMS methylates adenine and cytosine bases that are not in Watson–Crick base pairs in unprotected RNAs; these modifications are then detected by reverse transcription and NGS93. A caveat of this and other chemical probing methods is that dsRNA and protein-bound ssRNA may appear the same in the analysis. A complementary approach to these studies was carried out using PARS under near-in vivo conditions by deproteinizing natively folded RNA molecules from lymphoblastoid cells of a human parent–offspring trio29.


A primary observation from the A. thaliana transcriptome study27 was the presence of a three-nucleotide periodic repeat pattern in the coding region of mRNAs. This pattern was especially prominent for highly translated genes and is the first in vivo demonstration of triplet periodicity in a multicellular organism. Notably, these results are consistent with in vitro observations from S. cerevisiae and with computational predictions in humans20,36, which suggests a universal feature of translated mRNAs. Two additional features revealed from the study on A. thaliana were a pattern of strong and weak RNA structures at alternative polyadenylation cleavage sites, and stable RNA structures at the 5ʹ splice sites of unspliced pre-mRNAs, which suggests that secondary structure inhibits the first step of splicing. These results were corroborated by PARS on human transcripts, in which decreased RNA structure was found at the 5ʹ splice sites29. The A. thaliana study also observed increased structure at the 3ʹ splice sites, which supports the influence of RNA structure on mRNA maturation in vivo.

A future direction for global RNA structural studies is to examine how genetic variation alters RNA structure to control gene regulation, thus providing insights into how RNA structure contributes to inter-individual differences in gene expression. An advantage of using cells from a human family trio is the ability to evaluate how single-nucleotide variants (SNVs) influence RNA structure29. SNVs that alter RNA structure have been termed riboSNitches94. Analysing the genomic distribution of these riboSNitches provides information on how evolution suppresses SNVs that disrupt functionally important RNA structure. For example, riboSNitches are depleted in 3ʹUTRs, particularly around miRNA binding sites. This study was the first global evaluation of riboSNitches, and further in vivo studies can investigate more detailed mechanisms for how riboSNitches lead to regulation of gene expression at the RNA level.

Finally, a surprising result from the pair of in vivo DMS probing studies was that the mRNAs in vivo were modified by DMS to a greater extent than expected, which suggests that they were less structured or more dynamic than those observed in vitro27,28. In particular, A. thaliana mRNAs encoding proteins involved in stress responses tended to have more single-stranded regions when the structure was probed in vivo27. There are several hypotheses to explain why RNAs in vivo are less structured than those predicted from in vitro data. The first is that cellular RNAs are remodelled by energy-dependent processes such as protein translation. The second is that this in vivo ‘snapshot’ is an average of multiple RNA structures that represent different stages in the life cycle of an RNA molecule. Specific examples of how RNA structure changes in response to protein binding and other cellular factors have yet to be validated to clarify this global picture of the in vivo RNA structure landscape.

RiboSNitches: Single-nucleotide variants that alter RNA structure and that potentially affect gene regulation.
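The three-nucleotide periodicity reported for coding regions can be illustrated with a simple calculation: average a per-nucleotide structure or reactivity signal separately at each of the three codon positions and compare the averages. The sketch below assumes a profile that starts in frame at the first codon position; the function names and the input values are hypothetical, and this is not the statistic used in the cited in vivo study27.

```python
def codon_position_means(scores):
    """Mean signal at codon positions 1, 2 and 3.

    scores is a per-nucleotide profile over a coding sequence that is
    assumed to begin at the first position of a codon. Trailing
    positions that do not complete a codon are ignored.
    """
    usable = len(scores) - len(scores) % 3
    if usable == 0:
        raise ValueError("profile must cover at least one full codon")
    codons = usable // 3
    return [sum(scores[pos:usable:3]) / codons for pos in range(3)]


def periodicity_strength(scores):
    """Spread between the highest and lowest codon-position means."""
    means = codon_position_means(scores)
    return max(means) - min(means)


# Toy profiles: one with a repeating triplet pattern, one flat.
periodic_profile = [0.1, 0.5, 0.9] * 10
flat_profile = [0.4] * 30
periodic_strength = periodicity_strength(periodic_profile)
flat_strength = periodicity_strength(flat_profile)
```

A spread near zero indicates no frame-dependent signal, whereas a clearly positive spread indicates that one codon position is systematically more or less reactive than the others. An autocorrelation or Fourier analysis would be a more sensitive alternative for noisy data.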

Tertiary RNA structure

It has been difficult to assess levels of RNA tertiary structure on a genome-wide scale. This aspect of structure determination lags behind secondary structure detection owing to inherent challenges in accurately decoding long-range interactions by chemical and enzymatic probing methods. Methods such as X‑ray crystallography, NMR and cryo-electron microscopy provide high-resolution information but require preparation of biochemically ‘well-behaved’ samples, which limits their throughput. Small-angle X‑ray scattering can provide information about conformational states and changes in large sets of RNAs that are analysed individually, but it does not produce atomic-resolution structural models95,96. Recent strategies for the detection of three-dimensional RNA structures have coupled experimental and computational methods to allow much higher throughput for generating atomic-level RNA models97, but their output remains to be validated experimentally98. Tertiary RNA structure determination may give new insights into the growing class of long non-coding RNAs (lncRNAs), which have been mostly omitted from genome-wide structural studies carried out so far99,100. Unlike mRNAs, lncRNAs are low in abundance and difficult to enrich for sequencing. In the PARTE study26, non-coding RNAs (such as rRNA, tRNA, snoRNA and snRNA) had a higher average melting temperature per nucleotide than mRNAs, which suggests that non-coding RNAs contain more higher-order structures101. With improved sensitivity and modelling, lncRNAs and other poorly understood RNA families can be explored to gain insights into their function.

Conclusions and future perspectives

The rapid pace of new transcript discovery has driven efforts to detect and analyse the molecular structures of RNA more quickly than has been possible using traditional biophysical approaches. The combination of classic RNA chemical probing methods and advances in sequencing technology has enabled large-scale studies on whole transcriptomes and on intact viral genomes, which has provided comprehensive insights into RNA structure27–29,33. Three major findings have emerged from such studies so far. The first is that, unlike overcooked spaghetti, many mRNAs include regions of stable secondary and possibly tertiary structure. This is true regardless of organism; although these regions are not necessarily conserved evolutionarily, existing evidence supports their role in regulating access to ribosomes, miRNAs and the RNA degradation machinery20,24,25,79–81. The second is that the levels of detected structure correspond to the levels of precursor RNA processing. Thus, plant transcripts that are not substrates for dsRNA-processing enzymes (which are hence less structured overall) tend to be more abundant in the transcriptome21,25,80. The opposite situation occurs in S. cerevisiae, which does not have as much dsRNA processing (as it lacks miRNAs) and also has a more highly structured transcriptome20. The third major finding is that RNAs are less structured, or at least more structurally dynamic, in vivo than in vitro27,28. Plant, yeast and mammalian transcripts share this property, possibly owing to intracellular protein binding kinetics and/or to association and rearrangements within large multicomponent complexes.


Future efforts will expand the use of genome-wide RNA structure analysis methods, particularly by improving approaches to handling and interpreting large data sets. These global studies currently report an averaged snapshot of the entire RNA population in various biological processes, and experimental strategies for deconvoluting the single-molecule states need to be developed. Furthermore, experimental strategies for preserving and detecting both native protein–RNA interactions and co‑transcriptional folding of RNA are important. Computational methods are indispensable in the era of genome-wide probing, and we need to understand and minimize the types of errors that can arise in computational prediction of RNA structures, even with the incorporation of experimental constraints. Additionally, computational studies of evolutionary constraints across the genome may help to decipher the results from genome-wide experimental studies by assessing whether the reported signals of RNA structure are indeed under direct selection. Finally, RNA tertiary structure needs to be considered. Tertiary structure analyses and predictions are improving with the increased knowledge of RNA structural motifs from atomic-resolution models and with the increased knowledge of RNA folding principles. Together, such analyses coupled to functional studies and applied to a more diverse range of systems will enable better comparisons and more in‑depth analyses of regulatory functions of RNA structure in biology.

1. Sharp, P. A. The centrality of RNA. Cell 136, 577–580 (2009).
2. Cruz, J. A. & Westhof, E. The dynamic landscapes of RNA architecture. Cell 136, 604–609 (2009).
3. Warf, M. B. & Berglund, J. A. Role of RNA structure in regulating pre-mRNA splicing. Trends Biochem. Sci. 35, 169–178 (2010).
4. McManus, C. J. & Graveley, B. R. RNA structure and the mechanisms of alternative splicing. Curr. Opin. Genet. Dev. 21, 373–379 (2011).
5. Kozak, M. Regulation of translation via mRNA structure in prokaryotes and eukaryotes. Gene 361, 13–37 (2005).
6. Martin, K. C. & Ephrussi, A. mRNA localization: gene expression in the spatial dimension. Cell 136, 719–730 (2009).
7. Garneau, N. L., Wilusz, J. & Wilusz, C. J. The highways and byways of mRNA decay. Nature Rev. Mol. Cell Biol. 8, 113–126 (2007).
8. Mauger, D. M., Siegfried, N. A. & Weeks, K. M. The genetic code as expressed through relationships between mRNA structure and protein function. FEBS Lett. 587, 1180–1188 (2013).
9. Merino, E. J., Wilkinson, K. A., Coughlan, J. L. & Weeks, K. M. RNA structure analysis at single nucleotide resolution by selective 2ʹ‑hydroxyl acylation and primer extension (SHAPE). J. Am. Chem. Soc. 127, 4223–4231 (2005).
10. Laederach, A., Shcherbakova, I., Jonikas, M. A., Altman, R. B. & Brenowitz, M. Distinct contribution of electrostatics, initial conformational ensemble, and macromolecular stability in RNA folding. Proc. Natl Acad. Sci. 104, 7045–7050 (2007).
11. Adilakshmi, T., Bellur, D. L. & Woodson, S. A. Concurrent nucleation of 16S folding and induced fit in 30S ribosome assembly. Nature 455, 1268–1272 (2008).
12. Weeks, K. M. Advances in RNA structure analysis by chemical probing. Curr. Opin. Struct. Biol. 20, 295–304 (2010).
13. Hofacker, I. L. Vienna RNA secondary structure server. Nucleic Acids Res. 31, 3429–3431 (2003).
14. Zuker, M. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 31, 3406–3415 (2003).
15. Disney, M. D., Childs, J. L., Schroeder, S. J., Zuker, M. & Turner, D. H. Incorporating chemical modification constraints into a dynamic programming algorithm for prediction of RNA secondary structure. Proc. Natl Acad. Sci. 101, 7287–7292 (2004).
16. Das, R. et al. Structural inference of native and partially folded RNA by high-throughput contact mapping. Proc. Natl Acad. Sci. 105, 4144–4149 (2008).
17. Ouyang, Z., Snyder, M. P. & Chang, H. Y. SeqFold: genome-scale reconstruction of RNA secondary structure integrating high-throughput sequencing data. Genome Res. 23, 377–387 (2013).
18. Smith, M. A., Gesell, T., Stadler, P. F. & Mattick, J. S. Widespread purifying selection on RNA structure in mammals. Nucleic Acids Res. 41, 8220–8236 (2013).
19. Lee, J. et al. RNA design rules from a massive open laboratory. Proc. Natl Acad. Sci. 111, 2122–2127 (2014).

20. Kertesz, M. et al. Genome-wide measurement of RNA secondary structure in yeast. Nature 467, 103–107 (2010). This paper reports the development of PARS and the secondary structure profile of budding yeast mRNA.
21. Zheng, Q. et al. Genome-wide double-stranded RNA sequencing reveals the functional significance of base-paired RNAs in Arabidopsis. PLoS Genet. 6, e1001141 (2010).
22. Underwood, J. G. et al. FragSeq: transcriptome-wide RNA structure probing using high-throughput sequencing. Nature Methods 7, 995–1001 (2010). This paper reports the development of FragSeq and uses it to map the structure of the mouse nuclear transcriptome.
23. Lucks, J. B. et al. Multiplexed RNA structure characterization with selective 2ʹ‑hydroxyl acylation analyzed by primer extension sequencing (SHAPE–seq). Proc. Natl Acad. Sci. 108, 11063–11068 (2011). This study presents the first multiplexed approach to structure probing using SHAPE with NGS readout.
24. Li, F. et al. Global analysis of RNA secondary structure in two metazoans. Cell Rep. 1, 69–82 (2012).
25. Li, F. et al. Regulatory impact of RNA secondary structure across the Arabidopsis transcriptome. Plant Cell 24, 4346–4359 (2012). References 24 and 25 apply ds/ssRNA-seq to the A. thaliana, D. melanogaster and C. elegans transcriptomes.
26. Wan, Y. et al. Genome-wide measurement of RNA folding energies. Mol. Cell 48, 169–181 (2012).
27. Ding, Y. et al. In vivo genome-wide profiling of RNA secondary structure reveals novel regulatory features. Nature 505, 696–700 (2013).
28. Rouskin, S., Zubradt, M., Washietl, S., Kellis, M. & Weissman, J. S. Genome-wide probing of RNA structure reveals active unfolding of mRNA structures in vivo. Nature 505, 701–705 (2013).
29. Wan, Y. et al. Landscape and variation of RNA secondary structure across the human transcriptome. Nature 505, 706–709 (2014). References 27–29 investigate the structure of RNA in vivo.
30. Wan, Y., Kertesz, M., Spitale, R. C., Segal, E. & Chang, H. Y. Understanding the transcriptome through RNA structure. Nature Rev. Genet. 12, 641–655 (2011).
31. Silverman, I. M., Li, F. & Gregory, B. D. Genomic era analyses of RNA secondary structure and RNA-binding proteins reveal their significance to posttranscriptional regulation in plants. Plant Science 205–206, 55–62 (2013).
32. Katz, L. & Burge, C. B. Widespread selection for local RNA secondary structure in coding regions of bacterial genes. Genome Res. 13, 2042–2051 (2003).
33. Watts, J. M. et al. Architecture and secondary structure of an entire HIV‑1 RNA genome. Nature 460, 711–716 (2009). This is the first genome-wide study of RNA structure using SHAPE on ex virio HIV‑1 RNA.
34. Mortimer, S. A. & Weeks, K. M. A fast-acting reagent for accurate analysis of RNA secondary and tertiary structure by SHAPE chemistry. J. Am. Chem. Soc. 129, 4144–4145 (2007).


35. Seetin, M., Kladwang, W., Bida, J. P. & Das, R. Massively parallel RNA chemical mapping with a reduced bias MAP–seq protocol. Methods Mol. Biol. 1086, 95–117 (2014).
36. Shabalina, S. A. A periodic pattern of mRNA secondary structure created by the genetic code. Nucleic Acids Res. 34, 2428–2437 (2006).
37. Ingolia, N. T., Ghaemmaghami, S., Newman, J. R. S. & Weissman, J. S. Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science 324, 218–223 (2009).
38. Kudla, G., Murray, A. W., Tollervey, D. & Plotkin, J. B. Coding-sequence determinants of gene expression in Escherichia coli. Science 324, 255–258 (2009).
39. Scharff, L. B., Childs, L., Walther, D. & Bock, R. Local absence of secondary structure permits translation of mRNAs that lack ribosome-binding sites. PLoS Genet. 7, e1002155 (2011).
40. Gu, W., Zhou, T. & Wilke, C. O. A universal trend of reduced mRNA stability near the translation-initiation site in prokaryotes and eukaryotes. PLoS Comput. Biol. 6, e1000664 (2010).
41. Bentele, K., Saffert, P., Rauscher, R., Ignatova, Z. & Blüthgen, N. Efficient translation initiation dictates codon usage at gene start. Mol. Syst. Biol. 9, 675 (2013).
42. Goodman, D. B., Church, G. M. & Kosuri, S. Causes and effects of N‑terminal codon bias in bacterial genes. Science 342, 475–479 (2013).
43. Mao, Y., Wang, W., Cheng, N., Li, Q. & Tao, S. Universally increased mRNA stability downstream of the translation initiation site in eukaryotes and prokaryotes. Gene 517, 230–235 (2013).
44. Tuller, T., Waldman, Y. Y., Kupiec, M. & Ruppin, E. Translation efficiency is determined by both codon bias and folding energy. Proc. Natl Acad. Sci. 107, 3645–3650 (2010).
45. Komar, A. A. A pause for thought along the co‑translational folding pathway. Trends Biochem. Sci. 34, 16–24 (2009).
46. Wolin, S. L. & Walter, P. Ribosome pausing and stacking during translation of a eukaryotic mRNA. EMBO J. 7, 3559–3569 (1988).
47. Shah, P., Ding, Y., Niemczyk, M., Kudla, G. & Plotkin, J. Rate-limiting steps in yeast protein translation. Cell 153, 1589–1601 (2013).
48. Meyer, I. M. Statistical evidence for conserved, local secondary structure in the coding regions of eukaryotic mRNAs and pre-mRNAs. Nucleic Acids Res. 33, 6338–6348 (2005).
49. Dana, A. & Tuller, T. Determinants of translation elongation speed and ribosomal profiling biases in mouse embryonic stem cells. PLoS Comput. Biol. 8, e1002755 (2012).
50. Nackley, A. G. et al. Human catechol-O‑methyltransferase haplotypes modulate protein expression by altering mRNA secondary structure. Science 314, 1930–1933 (2006).
51. Wen, J. et al. Following translation by single ribosomes one codon at a time. Nature 452, 598–603 (2008).
52. Han, Y. et al. Monitoring cotranslational protein folding in mammalian cells at codon resolution. Proc. Natl Acad. Sci. 109, 12467–12472 (2012).
53. Zur, H. & Tuller, T. Strong association between mRNA folding strength and protein abundance in S. cerevisiae. EMBO Rep. 13, 272–277 (2012).


54. Ghaemmaghami, S. et al. Global analysis of protein expression in yeast. Nature 425, 737–741 (2003).
55. Newman, J. R. S. et al. Single-cell proteomic analysis of S. cerevisiae reveals the architecture of biological noise. Nature 441, 840–846 (2006).
56. Lee, M. V. et al. A dynamic model of proteome changes reveals new roles for transcript alteration in yeast. Mol. Syst. Biol. 7, 514 (2011).
57. Arava, Y. et al. Genome-wide analysis of mRNA translation profiles in Saccharomyces cerevisiae. Proc. Natl Acad. Sci. 100, 3889–3894 (2003).
58. Lécuyer, E. et al. Global analysis of mRNA localization reveals a prominent role in organizing cellular architecture and function. Cell 131, 174–187 (2007).
59. Holt, C. E. & Bullock, S. L. Subcellular mRNA localization in animal cells and why it matters. Science 326, 1212–1216 (2009).
60. Jambhekar, A. & DeRisi, J. Cis-acting determinants of asymmetric, cytoplasmic RNA transport. RNA 13, 625–642 (2007).
61. Chartrand, P., Meng, X. H., Huttelmaier, S., Donato, D. & Singer, R. H. Asymmetric sorting of ash1p in yeast results from inhibition of translation by localization elements in the mRNA. Mol. Cell 10, 1319–1330 (2002).
62. Palazzo, A. F. et al. The signal sequence coding region promotes nuclear export of mRNA. PLoS Biol. 5, e322 (2007).
63. Dienstbier, M., Boehl, F., Li, X. & Bullock, S. L. Egalitarian is a selective RNA-binding protein linking mRNA localization signals to the dynein motor. Genes Dev. 23, 1546–1558 (2009).
64. Bullock, S. L., Ringel, I., Ish-Horowicz, D. & Lukavsky, P. J. Aʹ-form RNA helices are required for cytoplasmic mRNA transport in Drosophila. Nature Struct. Mol. Biol. 17, 703–709 (2010).
65. Skripkin, E. A., Adhin, M. R., de Smit, M. H. & van Duin, J. Secondary structure of the central region of bacteriophage MS2 RNA. Conservation and biological significance. J. Mol. Biol. 211, 447–463 (1990).
66. Kortmann, J. & Narberhaus, F. Bacterial RNA thermometers: molecular zippers and switches. Nature Rev. Microbiol. 10, 255–265 (2012).
67. Chowdhury, S., Maris, C., Allain, F. H. & Narberhaus, F. Molecular basis for temperature sensing by an RNA thermometer. EMBO J. 25, 2487–2497 (2006).
68. Meyer, M., Plass, M., Pérez-Valle, J., Eyras, E. & Vilardell, J. Deciphering 3ʹSS selection in the yeast genome reveals an RNA thermosensor that mediates alternative splicing. Mol. Cell 43, 1033–1039 (2011).
69. Vandivier, L. et al. Arabidopsis mRNA secondary structure correlates with protein function and domains. Plant Signal. Behav. 8, e24301 (2013).
70. Gasch, A. P. et al. Genomic expression programs in the response of yeast cells to environmental changes. Mol. Biol. Cell 11, 4241–4257 (2000).
71. Bonneau, F. et al. The yeast exosome functions as a macromolecular cage to channel RNA substrates for degradation. Cell 139, 547–559 (2009).
72. Panniers, R. Translational control during heat shock. Cell 76, 737–747 (1994).

73. Wilson, R. C. & Doudna, J. A. Molecular mechanisms of RNA interference. Annu. Rev. Biophys. 42, 217–239 (2013).
74. Gantier, M. P. & Williams, B. R. G. The response of mammalian cells to double-stranded RNA. Cytokine Growth Factor Rev. 18, 363–371 (2007).
75. Kertesz, M., Iovino, N., Unnerstall, U., Gaul, U. & Segal, E. The role of site accessibility in microRNA target recognition. Nature Genet. 39, 1278–1284 (2007).
76. Low, J. T. et al. SHAPE-directed discovery of potent shRNA inhibitors of HIV‑1. Mol. Ther. 20, 820–828 (2012).
77. Bartel, D. P. MicroRNAs: target recognition and regulatory functions. Cell 136, 215–233 (2009).
78. Ruby, J. G. et al. Evolution, biogenesis, expression, and target predictions of a substantially expanded set of Drosophila microRNAs. Genome Res. 17, 1850–1864 (2007).
79. Dai, X., Zhuang, Z. & Zhao, P. X. Computational analysis of miRNA targets in plants: current status and challenges. Brief. Bioinform. 12, 115–121 (2011).
80. Addo-Quaye, C., Eshoo, T. W., Bartel, D. P. & Axtell, M. J. Endogenous siRNA and miRNA targets identified by sequencing of the Arabidopsis degradome. Curr. Biol. 18, 758–762 (2008).
81. German, M. A. et al. Global identification of microRNA–target RNA pairs by parallel analysis of RNA ends. Nature Biotech. 26, 941–946 (2008).
82. Gregory, B. D. et al. A link between RNA metabolism and silencing affecting Arabidopsis development. Dev. Cell 14, 854–866 (2008).
83. Willmann, M. R., Berkowitz, N. D. & Gregory, B. D. Improved genome-wide mapping of uncapped and cleaved transcripts in eukaryotes—GMUCT 2.0. Methods http://dx.doi.org/10.1016/j.ymeth.2013.07.003 (2013).
84. Chi, S. W., Zang, J. B., Mele, A. & Darnell, R. B. Argonaute HITS–CLIP decodes microRNA–mRNA interaction maps. Nature 460, 479–486 (2009).
85. Zisoulis, D. G. et al. Comprehensive discovery of endogenous Argonaute binding sites in Caenorhabditis elegans. Nature Struct. Mol. Biol. 17, 173–179 (2010).
86. Yang, J. et al. starBase: a database for exploring microRNA–mRNA interaction maps from Argonaute CLIP–seq and degradome–seq data. Nucleic Acids Res. 39, D202–209 (2011).
87. Li, J., Liu, S., Zhou, H., Qu, L. & Yang, J. starBase v2.0: decoding miRNA–ceRNA, miRNA–ncRNA and protein–RNA interaction networks from large-scale CLIP–seq data. Nucleic Acids Res. 42, D92–D97 (2014).
88. Wells, S. E., Hughes, J. M., Igel, A. H. & Ares, M. Jr. Use of dimethyl sulfate to probe RNA structure in vivo. Methods Enzymol. 318, 479–493 (2000).
89. Ziehler, W. A. & Engelke, D. R. in Current Protocols in Nucleic Acid Chemistry Unit 6.1 (Wiley, 2001).
90. Spitale, R. C. et al. RNA SHAPE analysis in living cells. Nature Chem. Biol. 9, 18–20 (2012).


91. Tyrrell, J., McGinnis, J. L., Weeks, K. M. & Pielak, G. J. The cellular environment stabilizes adenine riboswitch RNA structure. Biochemistry 52, 8777–8785 (2013).
92. Kwok, C. K., Ding, Y., Tang, Y., Assmann, S. M. & Bevilacqua, P. C. Determination of in vivo RNA structure in low-abundance transcripts. Nature Commun. 4, 2971 (2013).
93. Tijerina, P., Mohr, S. & Russell, R. DMS footprinting of structured RNAs and RNA–protein complexes. Nature Protoc. 2, 2608–2623 (2007).
94. Halvorsen, M., Martin, J. S., Broadaway, S. & Laederach, A. Disease-associated mutations that alter the RNA structural ensemble. PLoS Genet. 6, e1001074 (2010).
95. Stoddard, C. D. et al. Free state conformational sampling of the SAM‑I riboswitch aptamer domain. Structure 18, 787–797 (2010).
96. Fang, X. et al. An unusual topological structure of the HIV‑1 rev response element. Cell 155, 594–605 (2013).
97. Kladwang, W., VanLang, C. C., Cordero, P. & Das, R. A two-dimensional mutate-and-map strategy for non-coding RNA structure. Nature Chem. 3, 954–962 (2011).
98. Kielpinski, L. J. & Vinther, J. Massive parallel-sequencing-based hydroxyl radical probing of RNA accessibility. Nucleic Acids Res. 42, e70 (2014).
99. Mercer, T. R. & Mattick, J. S. Structure and function of long noncoding RNAs in epigenetic regulation. Nature Struct. Mol. Biol. 20, 300–307 (2013).
100. Novikova, I. V., Hennelly, S. P., Tung, C. & Sanbonmatsu, K. Y. Rise of the RNA machines: exploring the structure of long non-coding RNAs. J. Mol. Biol. 425, 3731–3746 (2013).
101. Clote, P. Structural RNA has lower folding energy than random RNA of the same dinucleotide frequency. RNA 11, 578–591 (2005).
102. Aviran, S. et al. Modeling and automation of sequencing-based characterization of RNA structure. Proc. Natl Acad. Sci. 108, 11069–11074 (2011).
103. Kelly, B. N. et al. Implications for viral capsid assembly from crystal structures of HIV‑1 Gag1–278 and CAN133–278. Biochemistry 45, 11257–11266 (2006).
104. Worthylake, D. K., Wang, H., Yoo, S., Sundquist, W. I. & Hill, C. P. Structures of the HIV‑1 capsid protein dimerization domain at 2.6 Å resolution. Acta Crystallogr. D Biol. Crystallogr. 55, 85–92 (1999).

Acknowledgements

The authors apologize to colleagues whose work was not cited owing to space limitation. They thank Y. Bai, R. Wilson, S. Floor, M. Hammond and members of J.A.D.'s laboratory for discussions; K. Weeks for sharing HIV-1 SHAPE data; and J. Ji for reading the manuscript. This work was supported in part by a grant from the US National Institutes of Health (to J.A.D.). J.A.D. is a Howard Hughes Medical Institute Investigator.

Competing interests statement

The authors declare no competing interests.
