MBE Advance Access published November 15, 2013

Matching of Soulmates: Coevolution of snoRNAs and Their Targets Stephanie Kehr,*,1 Sebastian Bartschat,1 Hakim Tafer,1 Peter F. Stadler,1,2,3,4,5,6 and Jana Hertel1 1

Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, University of Leipzig, Leipzig, Germany 2 Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany 3 RNomics Group, Fraunhofer Institute for Cell Therapy and Immunology, Leipzig, Germany 4 Institute for Theoretical Chemistry, University of Vienna, Vienna, Austria 5 Center for Non-Coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark 6 The Santa Fe Institute *Corresponding author: E-mail: [email protected]. Associate editor: Ryan Hernandez

Ribosomal and small nuclear RNAs (snRNAs) comprise numerous modified nucleotides. The modification patterns are retained during evolution, making it even possible to project them from yeast onto human. The stringent conservation of modification sites and the slow evolution of rRNAs and snRNAs contradicts the rapid evolution of small nucleolar RNA (snoRNA) sequences. To explain this discrepancy, we investigated the coevolution of snoRNAs and their targeted sites throughout vertebrates. To measure and evaluate the conservation of RNA-RNA interactions, we defined the interaction conservation index (ICI). It combines the quality of individual interaction with the scope of its conservation in a set of species and serves as an efficient measure to evaluate the conservation of the interaction of snoRNA and target. We show that functions of homologous snoRNAs are evolutionarily stable, thus, members of the same snoRNA family guide equivalent modifications. The conservation of snoRNA sequences is high at target binding regions while the remaining sequence varies significantly. In addition to elucidating principles of correlated evolution, we were able, with the help of the ICI measure, to assign functions to previously orphan snoRNAs and to associate snoRNAs as partners to known chemical modifications unassigned to a given snoRNA. Furthermore, we used predictions of snoRNA functions in conjunction with sequence conservation to identify distant homologies. Because of the high overall entropy of snoRNA sequences, such relationships are hard to detect by means of sequence homology search methods alone. Key words: target prediction, RNA-RNA interaction, ICI.

Small nucleolar RNAs (snoRNAs) compose a special class of noncoding RNA genes responsible for guiding protein complexes to specific positions in other RNA molecules. Those snoRNPs introduce chemical modifications of single residues in the target RNA. Precise targeting is achieved by means of sequence complementarity, while the protein components stabilize the snoRNPs and catalyze the chemical reactions (Bachellerie et al. 2002). There are two main types of snoRNAs: box H/ACA and box C/D snoRNAs. While the first guide pseudouridylation, the latter direct 20 -O-ribose methylation of any kind of nucleotide. Both modifications are introduced concurrently or immediately after transcription of the rRNA operon, before cleavage of the 45S rRNA (primary transcription product of the rRNA operons). These modifications are essential for maturation of rRNAs. The special snoRNAs U3, U8, U14, U17, and U22 direct cleavage steps of the 45S rRNA rather than chemical modifications (Atzorn et al. 2004). Small Cajal body-specific RNAs (scaRNAs) constitute a third type of snoRNA. These snoRNAs guide the modification

of small nuclear RNAs (snRNAs) in the Cajal body of eukaryotic cells (Darzacq et al. 2002). Some scaRNAs combine features of both snoRNA classes, see Marz et al. (2011) for more details on these atypical RNAs. Beyond these regular functions, there exist snoRNAs with different tasks. The brain specific box C/D snoRNA SNORD115 (HBII-52), for example, has been reported to affect editing and/or alternative splicing of mRNAs in human and mouse (Kishore and Stamm 2006; Doe et al. 2009; Kishore et al. 2010; Soneo et al. 2010). A box H/ACA snoRNA domain has been detected at the 30 -end of vertebrate telomerase RNAs (Mitchell et al. 1999). Taft et al. (2009) and Ender et al. (2008) proved that some snoRNAs may be processed into smaller RNAs, so-called sdRNAs, that act in a microRNA-like fashion in posttranscriptional gene silencing. In fact, most box H/ACA snoRNA snoRNAs, but not box C/D snoRNA snoRNA, are substrates for Dicer (Langenberger et al. 2012). Despite this diversity of snoRNA function, we limit our attention here to the common modification tasks, pseudouridylation, and 20 -O-ribose methylation.

ß The Author 2013. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: [email protected]

Mol. Biol. Evol. doi:10.1093/molbev/mst209 Advance Access publication October 24, 2013

1

Article

Introduction

Downloaded from http://mbe.oxfordjournals.org/ at McGill University Libraries on November 21, 2013

Abstract

Kehr et al. . doi:10.1093/molbev/mst209

target RNA that pairs with the fifth base of the ASE is methylated. For box H/ACA snoRNAs, the situation is more complex. Here the ASE, which has a total length of 6–20 nt, is bipartite and located in an interior loop of the hairpin(s), the pseudouridylation pocket. The modified uridine is anchored, unpaired, underneath the upper stem, central to the two parts of the ASE (Ganot et al. 1997). Again, only a few mismatches but no bulges are tolerated. Efficient software tools to predict targets based on these principles are PLEXY (Kehr et al. 2011) for box C/D snoRNAs and RNAsnoop (Tafer et al. 2010) for box H/ACA snoRNAs. Nevertheless, there are still about 40 orphan snoRNA families without known or predicted target sites (Bachellerie et al. 2002). The stringent conservation of modification sites as well as the slow evolution of rRNAs and snRNAs are at odds with the rapid evolution of snoRNA sequences (Weber 2006; Luo and Li 2007; Schmitz et al. 2008; Hoeppner and Poole 2012). To better understand this apparent contradiction, we investigate here the coevolution of snoRNAs with their target sites across vertebrates in detail. Thus, we clarify

A

B

Fig. 1. (A) In box C/D snoRNAs a short terminal helix encloses a large unstructured loop. Box motifs C and D and variant copies C0 and D0 are located within the loop. Target RNAs form small duplexes with the ASEs adjacent to D and D0 box, respectively. The “core region” of the interaction ranges from 3rd to 11th position. (B) Box H/ACA snoRNAs have a hairpin-hinge–hairpin-tail architecture, with box H located in the hinge region, and box ACA in the tail. Target RNAs bind into the pseudouridylation pockets forming small duplexes with the bipartite ASE.

2

Downloaded from http://mbe.oxfordjournals.org/ at McGill University Libraries on November 21, 2013

The positions of chemical modifications in rRNAs and snRNAs are evolutionary highly conserved. For example, all modified residues of the large subunit of rRNA (LSU) in human can be aligned to the respectively modified residues of LSU in mouse (Ofengand and Bakin 1997). Many of these sites are even conserved in yeast (Saccharomyces cerevisiae) (Maden 1986, 1996). A correspondence of the (predicted) target sites was used successfully to identify box C/D snoRNAs in the fruitfly genome (Accardo et al. 2004). Most known modified sites in rRNAs and snRNAs (101 of 113 pseudouridines and 112 of 130 methylations) correspond to at least one known snoRNA (Lestrade and Weber 2006). snoRNAs form characteristic secondary structures and encode antisense elements (ASEs) at specific positions. These are responsible for the recognition of the target (see fig. 1). In case of box C/D snoRNAs, a target sequence can directly be inferred from the ASE, adjacent to the 50 -end of the box D sequence element(s). The snoRNA/target duplex has a length of 7–20 nt, no bulges, and may only be interrupted by a few mismatches (Chen et al. 2007). The nucleotide in the

MBE

Coevolution of SnoRNAs and Their Targets . doi:10.1093/molbev/mst209

whether snoRNAs are stable interaction partners for a certain modification or whether a changeover of the RNA guide is the common pattern in vertebrates.

New Approaches To study conservation of interaction, we started by computing targets for each individual snoRNA sequence (see Materials and Methods), and subsequently, we evaluated their conservation in other vertebrate species. To formally investigate the conservation, we develop here the interaction conservation index (ICI).

Interaction Conservation Index

"ðt, s, kÞ ¼

min Emfe ðx, yt,k Þ,

ð1Þ

x2Xðt, s, kÞ

where Emfe ðx, yt,k Þ is the minimum free energy (mfe) of the interaction between the snoRNA x and the sequence yt, k centered at the target t in species k, computed as minimum free energy of the interaction given by PLEXY or RNAsnoop. Hence, if we have more than one paralog in the same family s, we consider only the one with the best interaction energy. The average predicted P interaction energy for target t in  kÞ ¼ s2Sðt, kÞ "ðt, s, kÞ= j Sðt, kÞ j , while the species k is "ðt, average interaction energy of snoRNA s with all its putative P ^ kÞ ¼ t2Tðs, kÞ "ðt, s, kÞ= jTðs, kÞj . targets t in species k is "ðs, Averaging over all species in which the interaction is predicted, we introduce the two normalized parameters  kÞ, icimod ðt, s, kÞ ¼ "ðt, s, kÞ="ðt, ^ kÞ, icisno ðt, s, kÞ ¼ "ðt, s, kÞ="ðs,

ð2Þ

For k 2 = Oðt, sÞ, these energy-based scores are not defined because no member of family s interacts with target t in species k. To summarize these data over all species, we define the ICIs P  kÞ ICImod ðt, sÞ ¼ "ðt, s, kÞ="ðt, k2Oðt, sÞ P ð3Þ ^ kÞ "ðt, s, kÞ="ðs, ICIsno ðt, sÞ ¼ k2Oðt, sÞ

for the modification (target) and the snoRNA, respectively. Both scores measure how much better s fits the target t compared with the predicted alternatives for which an interaction is also feasible. Large values of ICIðt, sÞ 1 suggest that

t is consistently a target of snoRNA family s. The parameter ICImod ðt, sÞ emphasizes the conservation of the modification site, while ICIsno ðt, sÞ emphasizes the conservation of the snoRNAs ASE.

Results Target Prediction RNAsnoop returned 59 known human box H/ACA snoRNA interactions although it has been trained on yeast only. Deactivation of the yeast model and scoring predictions based on interaction energy only, yielded 86 of the 112 known interactions, including all those predicted with the yeast model. Surprisingly, the recovery of known human box C/D snoRNA targets performed better without the use of accessibility information. While considering the internal structure of the target RNA, 103 of the 115 known interactions were recovered, and neglection of accessibility information recovered 111. This is in agreement with the observation that accessibility around methylated site does, in contrast to pseudouridylated residues, not significantly differ from the average accessibility of nucleotides in the ribosomal RNAs (data not shown). Subsequently, predicted pseudouridylated and methylated positions were mapped to the corresponding columns of the target RNA alignments.

Conservation of Known Modifications Vertebrate rRNAs have a highly conserved modification pattern and a high level of sequence conservation in the vicinity of the modified nucleotides. By contrast, snoRNAs are famous for their overall high-sequence entropy. Nevertheless, this study confirms that in general the same modifications are guided by the same snoRNA families (Hoeppner and Poole 2012). In fact, together with the functional sequence boxes, the ASEs form the regions of strongest sequence conservation within the snoRNAs, disclosing coevolution of snoRNAs and their targets. Furthermore, the interactions are maintained by compensatory mutations within the snoRNA sequences preserving the base pairing. Known Human Interactions First, we investigated conservation of the known human interactions (data retrieved from snoRNA-LBME-db [snoRNAbase, https://www-snorna.biotoul.fr//, last accessed November 7, 2013]). Therefore, we evaluated the individual target predictions from all investigated species using the new ICI scores (see Materials and Methods). We recovered 87% of the interactions as conserved within vertebrate species with known snoRNA and target RNA sequences. In 18S rRNA, all 35 reported human interactions with box C/D snoRNAs are conserved at least in Eutheria (fig. 2). For 18S rRNA and box H/ACA snoRNAs, we find 31 of the 39 reported human interactions to be evolutionarily conserved at least in mammals. Five s in 18S rRNA are reported to have two matching box H/ACA guides. We predicted conserved function of both guides only in one case. For the remaining doubly guided modifications, only a single conserved snoRNA guide (supplementary fig. S1.2) was identified. 3

Downloaded from http://mbe.oxfordjournals.org/ at McGill University Libraries on November 21, 2013

A target t specifies a particular column in the alignments of the target sequences. A snoRNA family s is defined by homology, that is, it may contain more than one paralog. We denote by Xðt, s, kÞ the set of all snoRNAs from family s in species k that are predicted to target t in species k. Furthermore, we write Sðt, kÞ for the set of snoRNA families predicted to target t in species k, that is, Sðt, kÞ ¼ ½s j Xðt, s, kÞ 6¼ ;. Similarly, Oðt, sÞ ¼ ½k j Xðt, s, kÞ 6¼ ; denotes the set of species in which family s has a representative that targets t, and Tðs, kÞ ¼ ½t j Xðt, s, kÞ 6¼ ; is the set of targets of the snoRNA family s in species k. The interaction is scored at the level of families

MBE

Kehr et al. . doi:10.1093/molbev/mst209

MBE

Figure 2 shows the conservation pattern of interactions between box C/D snoRNA guides and methylated sites in the 18S rRNA. The header row lists the abbreviated species names of investigated vertebrates. The subsequent rows provide detailed information about the interactions of a modification site (first column) and certain snoRNAs (2nd last column). For each target site, 3 snoRNAs are shown. These are ordered by their ICImod scores (last column). The color intensity of each field is correlated to the predicted minimum free energy of the individual interaction. A field is crossed out if the snoRNA (from lower left to upper right) or rRNA sequence (from upper left to lower right) is not available for the species. For empty white fields, no interaction was 4

predicted. The rows are ordered according to the range of conservation, that is, interactions with stable partners in all vertebrates appear in higher rows than interactions conserved only within mammals. In general, we observe that once an snoRNA family has occurred, its function is conserved. This results in three main groups. The first group of snoRNAs emerged at the root of vertebrates and accordingly its function is conserved in all Vertebrata, the second group of interactions appeared at within Teleostomi, and the third main group arose in Amniota. Supplementary figures S1.2–S1.6 analogously show conservation of box C/D snoRNA-, box H/ACA snoRNA-, and scaRNA- interactions with 18S rRNA, 28S rRNA, 5.8 rRNA, and snRNAs, respectively.

Downloaded from http://mbe.oxfordjournals.org/ at McGill University Libraries on November 21, 2013

Fig. 2. Interaction conservation of box C/D snoRNA and targets in 18S rRNA. For each target t  3, snoRNA families are displayed in different colors. Interaction energies "ðt, s, kÞ determine the saturation of the color. ASEs of both D and the D0 box are considered. Crossed-out fields indicate a missing part of the rRNA (from upper left to lower right) and a missing snoRNA (from lower left to upper right). Empty (white) fields indicate that no interaction was predicted. For details see text.

MBE

Coevolution of SnoRNAs and Their Targets . doi:10.1093/molbev/mst209

ICI Scores The distributions for known human interactions are given in the left column of figure 3. The median ICImod values for box C/D snoRNAs interactions with the small subunit (SSU), large subunit (LSU), and snRNAs are 1.04, 0.81, and 0.85, respectively. In analogy, median ICIsno values are 1.64, 1.14, and 1.13. Box H/ACA snoRNAs interacting with SSU, LSU, and snRNAs yield median ICImod scores of 1.07, 0.73, and 0.75, respectively. Here, median ICIsno values are 1.08, 0.69, and 0.89 (table 1). We missed only one of 98 interaction between box C/D snoRNAs and ribosomal RNAs and 2 of 18 with snRNAs. Because of a more complicated interaction structure of box H/ACA snoRNAs and target RNAs, RNAsnoop is less sensitive. Missed interactions (15 of 93) explain the zero peaks in box H/ACA snoRNA concerning ICI curves (solid lines).

For all interactions with targets on 18S rRNA, the scores are >1, displaying high conservation of the interactions. Because of lower alignment quality and therefore putative unaligned modification sites, we obtain lower average ICI scores for 28S rRNA interactions. This emphasizes the importance of high-quality alignments for reliable statements on targets and target conservation. For many target sites, more than one snoRNA family is predicted to have an appropriate ASE to interact. Comparing the ICImod scores for these families show that there is almost always a single dominating family whose ICImod scores can clearly be discriminated from those of alternative predictions, see figure 3 right part. The ICIsno scores show a similar behavior. SNORD68 and 18S-484 SNORD68 and 18S-484 is a good example for a conserved interaction. The region containing the methylated uridine at 18S-484 (corresponding alignment position to 18S-428 in the human sequence) and the D0 -ASE of SNORD68 comprises Table 1. Median ICI Values for the Interactions Listed in snoRNAbase. Score ICImod

rRNA 18S 28S snRNAs

All 1.05 0.76 0.81

C/D 1.04 0.81 0.85

H/ACA 1.07 0.73 0.75

ICIsno

18S 28S snRNAs

1.23 0.90 0.91

1.64 1.14 1.13

1.08 0.69 0.89

Fig. 3. ICImod and ICIsno scores. The plots separately show the values of both types of ICI score (sno and mod) and both types of snoRNA (C/D and H/ ACA) on the putative targets 18S rRNA, 28S rRNA, and snRNAs. The left part shows the density distribution of the ICI scores. The right part shows a comparison of the best three snoRNA families according to ICI scores for each target site.

5

Downloaded from http://mbe.oxfordjournals.org/ at McGill University Libraries on November 21, 2013

For LSU (28S and 5.8S rRNAs), we found 61 of 62 human box C/D snoRNA interactions and 47 of 54 box H/ACA snoRNA interactions conserved in vertebrates (supplementary figs. S1.3, S1.4, and S1.5). In LSU for one of two doubly guided 20 -O-ribose methylations and three of four s, only a single guide was found to be conserved. One position (alignment site: 28S-4754, corresponding to human 28S-3797) is reported as methylated and pseudouridylated. Interestingly, high ICI scores agree with both interactions. In snRNAs, we recover 16 of 18 box C/D snoRNA interactions and 17 of the 20 box H/ACA snoRNA interactions (supplementary fig. S1.6). Two s and two methylations are reported to have two interacting scaRNAs. We recovered both guides for the methylated residues but only one guiding scaRNA for both pseudouridylations.

MBE

Kehr et al. . doi:10.1093/molbev/mst209

H_sapiens-1 P_troglodytes-1 G_gorilla-2 P_pygmaeus-1 C_jacchus-1 M_murinus-1 R_norvegicus-1 M_musculus-1 S_tridecemlineatus-1 C_porcellus-1 O_cuniculus-1 F_catus-1 C_familiaris-1 A_melanoleuca-1 B_taurus-1 E_caballus-1 M_lucifugus-1 P_vampyrus-1 E_europaeus-1 L_africana-1 E_telfairi-1 D_novemcinctus-2 M_domestica-1 O_anatinus-1 A_carolinensis-1 M_gallopavo-1 G_gallus-1 X_tropicalis-1 L_chalumnae-2 T_nigroviridis-1 G_aculeatus-1 O_latipes-1 D_rerio-3 P_marinus-1 *O_garnetti-1

0.09 and 0.07, and low average "ðt, s, kÞ of 9.76 and 10:2 imply low stability and weak conservation of these interactions, although the snoRNA families are present in Eutheria and Amniota, respectively. Furthermore, SNORD68 is a doubly guiding box C/D snoRNA. The ASE upstream of the D-box is complementary to alignment site 28S-3267. This interaction is well conserved in vertebrates. An ICI value of 0.94 and an average mfe of 20.01 indicate that this interaction is very stable. Special Cases We observed several modifications for which at some point in evolution a second snoRNA guide occurs and is retained. A feasible explanation may be differential snoRNA expression so that a back up of the modification under different cellular conditions is necessary. We found several examples supporting this hypothesis. Pseudouridylation of site 28S-5501 (corresponding to human 28S-4491) has been reported to be guided by SNORA10. With the snoStrip, pipeline homologs have been identified in supraprimates, carnivores, cow, and horse. For four of five species, where snoRNA and LSU sequences are available, the interaction is predicted with ICImod ¼ 0:73. As additional guide matching 28S-5501, we found SNORA63. The interaction is conserved throughout vertebrates with ICImod ¼ 0:81. The two snoRNAs are encoded in introns (sense direction) of RPS2 (ribosomal

...... CGAUUCCGGAGAGGG&CAUUCUCCGGAAUCGCUGU GGAUUCUGGAGAGGG&CAUUCUCCGGAAUCGCUGU CGAUUCUGGAGAGGG&CAUUCUCCGGAAUCGCUGU CCAUUCUGGAGAGGG&CAUUCUCCGGAAUCGCUGU CGAUUCCGGAGAGGG&CAUUCUCCGGAAUCGCUGU CGAUUCCGGAGAGGG&CAGUCUCCGGAAUCGCUGC CGAUUCCGGAGAGGG&CAUUCUCCGGAAUCGCUGU CGAUUCCGGAGAGGG&CAUUCUCCGGAAUCGCUGU CGAUUCCGGAGAGGG&CAUUCUCCGGAAUCGCUGU CGAUUCCGGAGAGGG&AAUUCUCCGGAAUCGCUGU CGAUUCCGGAGAGGG&CAGUCUCCGGAAUCGCUGC CGAUUCCGGAGAGGG&CAUUCUCCGGAAUCGCUGC CGAUUCCGGAGAGGG&CAGUCUCCGGAAUCGCUGU CGAUUCCGGAGAGGG&CAGUCUCCGGAAUCGCUGU CGAUUCCGGAGAGGG&CAUUCUCCGGAAUCGCUGC CGAUUCCGGAGAGGG&CAGUCUCCGGAAUCUCUGC CGAUUCCGGAGAGGG&CAUUCUCCGGAAUCGCUGU CGAUUCCGGAGAGGG&CAUUCUCCGGAAUCGCUGU CGAUUCCGGAGAGGG&AAUUCUCCGGAAUCGCUGC CGAUUCCGGAGAGGG&CAGUCUCCGGAAUCGCUGU CGAUUCCGGAGAGGG&CAAUCUCCGGAAUCGCUGC CGAUUCCGGAGAGGG&CAUUCUCCGGAAUCACUGU CGAUUCCGGAGAGGG&AAUUCUCCGGAAUCUCUGU CGAUUCCGGAGAGGG&CAUUCUCCGGAAUCUCUGU CGAUUCCGGAGAGGG&AAUUCUCCGGAAUCUCUGU CGAUUCCGGAGAGGG&AAGUCUCCGGAAUCGCUGU CGAUUCCGGAGAGGG&AAGUCUCCGGAAUCGCUGU CGAUUCCGGAGAGGG&CUUUCUCCGGAAUCUCUGC CGAUUCCGGAGAGGG&CUUUCUCCGGAAUCUCUGU CGAUUCCGGAGAGGG&AAAUCUCCGGAAUCUCUGU CGAUUCCGGGAGAGG&AAAACUCCGGAAUCUCUGU CGAUUCCGGAGAGGG&ACAUCUCCGGAAUCUCUGU CGAUUCCGGAGAGGG&CUUUCUCCGGAAUCUCUGC CGAUUCCGGAGAGGG&GCUUCUCCGGAAUCUCUGU CGAUUCCGGAGAGGG&CAUUGUCUGAAAUUGCUGA |---M--rRNA---|&|--ASE-SNORD68-|D’|

-22.3 -19.3 -20.1 -15.1 -22.3 -21.6 -22.3 -22.3 -22.3 -22.3 -21.6 -22.3 -21.6 -21.6 -22.3 -20.6 -22.3 -22.3 -22.3 -21.6 -21.7 -21.8 -21.3 -21.3 -21.3 -21.6 -21.6 -24.9 -24.9 -20.7 -15.1 -20.8 -24.6 -24.3 NA

Fig. 4. Conservation of the interaction between the region upstream of D0 -box of snoRNA family SNORD68 (right side) and the region around the 20 O-methylated uridine at alignment column 484 (left side). Target RNA segment and ASE are separated by &. The methylated residue is marked with M in the last row. The first row provides information about the structure of the RNA-RNA interaction. Intermolecular base pairs are indicated by arrow brackets. The blue box marks the D0 -box. Red and green columns highlight conservation of the RNA-RNA interaction. Completely conserved base pairs are shown in red. Green columns mark base pairs with compensatory mutations. Lighter colors indicate loss of base pairs in individual species. The gray bars at the bottom correspond to the degree of sequence conservation computed by RNAalifold (Bernhart et al. 2008). mfes for the interactions ["ðt, s, kÞ] as predicted by PLEXY are provided in the last column. In O. garnettii no interaction can occur due to additional mutations in the core region; therefore, it is marked with asterisk.

6

Downloaded from http://mbe.oxfordjournals.org/ at McGill University Libraries on November 21, 2013

11 nt complementarity. The two interacting sequence segments are almost completely conserved from human to sea lamprey (figs. 2 and 4). Two alignment columns show mutations on the rRNA side, but these mutated nucleotides can still form G-U pairs with the corresponding snoRNA nucleotides. The snoRNA region not involved in the interaction display lower sequence conservation. The ICI reflects the good conservation of the interaction: ICImod ð18S  484, SNORD68 D0 Þ ¼ 1:03. The average individual mfe "ðt, s, kÞ of the interaction is 21:6 kcal=mol. In macaques, no interaction could have been predicted because the rRNA sequence of the crucial segment is missing. For elephant, shark, duck, zebra finch, wallaby, shrew, pig, kangaroo rat, and tree shrew, no SNORD68 homolog has been detected with snoStrip. In bush baby, no interaction with 18S-484 was predicted with PLEXY although both the rRNA sequence and the snoRNA homolog are present. Though the ASE of the snoRNA comprises four mutations, two of which do not disrupt base pairing at the respective positions but result in wobble pairs. Nevertheless, the other two mutations lead to mismatches within the “core region” of the interaction, wherein only one mismatch is tolerated according to Chen et al. (2007). Two other snoRNA ASEs show a certain amount of complementarity to the 18S-484 target region in chimp, gorilla, and orangutan. Our low ICI values

MBE

Coevolution of SnoRNAs and Their Targets . doi:10.1093/molbev/mst209

Case Studies We used the ICI score to investigate conservation of the few published experimentally verified interactions between an snoRNA and its guided modification. Human Experimentally Verified Interactions Xiao et al. (2009) tested 16 predicted interactions of box H/ACA snoRNAs and s in human rRNAs. Although 12 have been verified, four predictions have been rejected in their study. We measured the conservation throughout vertebrates of all 16 interactions using our ICI score (see table 2). High ICI values agree with the experimentally verified interactions. Two of the four negative results are not conserved at all so that no ICI could be computed. Of the remaining two, one cannot be conclusively resolved by our method, leaving a single case where our predictions disagree with the experimental results. Because of high sequence similarity, snoRNA families SNORA50 and SNORA76 as well as SNORA42 and SNORA80, respectively, have been merged into one family each in our snoRNA data set by snoStrip, denoted SNORA50-76 and SNORA42-80 in following (see supplementary material S3.1 for the alignments). In terms of guiding potential, we treat snoRNA families as entities and do not distinguish between guiding potential of single paralogous sequences. This is not problematic for family SNORA50-76, where both sequences have been verified to guide 18S-46 (human 18S-34) and 18S-118 (human 18S105) in human by Xiao et al. (2009). In the second case, however, SNORA42 and SNORA80 both have been verified to guide 18S-122 (human 18S-109), but only SNORA80 interacts with 18S-634 (human 18S-572) while SNORA42 does not. The resolution of our analysis correctly predicts that family SNORA42-80 targets 18S-572. The confirmed interaction between SNORA24 and modification corresponding to 18S-961 (human 18S-863) yields a very low value of ICImod ¼ 0:38. Nevertheless, SNORA24 homologs are predicted to interact with the verified target in elephant, opossum, armadillo, wallaby, kangaroo rat, black

Table 2. Comparison of ICImod Values with Experimentally Tested Interactions between Box H/ACA snoRNAs and Ribosomal RNAs.

Fig. 5. Expression of SNORA10 and SNORA63 host genes EIF4A2 and RPS2 is shown at different developmental stages (left) and in different organism parts (right). Arrows above the bars indicate down and up regulation, respectively. Figures are taken from the Gene Expression Atlas.

Alignment 18S-46 18S-118 18S-122 18S-634 18S-673 18S-908 18S-961 18S-964

Human 18S-34 18S-105 18S-109 18S-572 18S-609 18S-815 18S-863 18S-866

Guiding Families SNORA50,SNORA76 SNORA50,SNORA76 SNORA80,SNORA42 SNORA80,SNORA42 SNORA24 SNORA28 SNORA24,SNORA19 SNORA28,SNORA19

Verified +, + +, + +, + +,   + +,  +, 

28S-4573 28S-4665

28S-3618 28S-3709

SNORA19 SNORA19

+ +

ICImod 1.47 1.33 1.47 1.11  0.93 0.38, 0.86 0.89,  0.76 0.58

NOTE.—The SNORA50 and SNORA76 as well as SNORA80 and SNORA42 are paralogs and hence members of the same family in our survey. + , interaction verified; , interaction rejected.

7

Downloaded from http://mbe.oxfordjournals.org/ at McGill University Libraries on November 21, 2013

protein 2) and EIF4A2 (eukaryotic translation initiation factor), respectively. The expression of these proteins in human is antagonistically up- and downregulated in fetus and adult state as well as in blood and umbilical cord tissue (according to data from the Gene Expression Atlas (Kapushesky et al. 2012) E-GEOD-6236 (Goh et al. 2007), http://www.ebi.ac.uk/gxa/experiment/E-GEOD-6236/, last accessed November 7, 2013; see fig. 5). As a consequence, the expression patterns of these host genes ensure that at least one of the two snoRNAs is present in the different tissues and for different developmental stages. Another example of redundant guides is the modification of site 18S-761 (corresponding to human 18S-683) by SNORD19 and SNORD69. Both interactions are conserved in amniotes with ICI values of 1.41 and 0.7, respectively. Both snoRNAs reside in introns of guanine nucleotide binding protein-like 3 (GNL3). This protein comprises different isoforms. All transcripts include the intron encoding SNORD19 but at least one isoform ends in front of the intron hosting SNORD69. Thus, SNORD69 expression is regulated by alternative splicing of GNL3. Additional data are compiled in supplementary table S2. A comprehensive analysis of host gene expression patterns, however, will be presented elsewhere. Another exception from the general pattern of canonical snoRNA functionality is a changeover of the snoRNA guide for a single modification. Consider, for example, the methylation of 18S-1457 (corresponding to sequence position 1328 in human and 1286 in chicken). The site matches the 30 -ASE of the mammalian-specific SNORD32 family. In aves, on the other hand, the modification is addressed by the 50 -ASE of bird-specific snoRNA family GGgCD25. We discovered this interesting behavior by high-scoring ICIs of 1.1 and 1.08 for the same modification. Zemann et al. (2006) reported similar observations in nematodes.

MBE

Kehr et al. . doi:10.1093/molbev/mst209

rat, marmoset, zebra finch, turkey, chicken, lizard, and coelacanth with an average interaction energy of 26.96. SNORA19, however, is the best scoring interaction for the modification at 18S-961 (18S-863 in publication). Although this interaction seems to be conserved throughout vertebrates, it has been rejected by Xiao et al. (2009). In the remaining two cases (18S-673 [human 18S-609] and SNORA24 and 18S-964 [human 18S-866] and SNORA19), our analysis agrees with the published negative experimental results.

The Two Most Conserved Cs The two most conserved s are modified by SNORA74 (U19) at least in vertebrates. The modifications at alignment positions 28S-4697 and 28S-4699 (corresponding to human residues 28S-3741 and 28S-3743) are conserved even in bacteria. There, these modifications are produced by the specialized pseudouridine synthase RluD (Ofengand 2002; Ofengand and Bakin 1997). Both s are located in the decoding center, a central region of the ribosome contacting the SSU and the passing tRNAs. SNORA74 has an exceptional three-hairpin structure conserved from yeast to human (Badis et al. 2003). The computed ICImod values of 0.95 and 0.92 confirm the complementarity of corresponding alignment sites and ASEs in the 50 - and 30 hairpins in all vertebrates (see supplementary material S1.4 and S3.1)

New snoRNAs for Known Human Modifications Although most 20 -O-ribose methylations and pseudouridylations in human rRNAs can already be assigned to snoRNA guides, some cases do not match with a given snoRNA. We suggest that snoRNA families with above average ICImod values are most likely the unrecognized conserved snoRNA guides for these modifications. Thus, we predicted appropriate snoRNA guides for 8 of 21 rRNA sites and 2 of 10 unaccounted for U2 snRNA modifications. The results are summarized in table 3 and supplementary figure S.4.1.1 and illustrated in supplementary figures S4.1.2 and S4.1.3. For the rest of the modifications unassigned to known snoRNAs, we suggest that the vertebrate genomes still harbor a few undiscovered snoRNA families containing matching ASEs. 8

Alignment 18S-526 18S-756 18S-890 18S-1026 18S-1513

Human 18S-468 18S-681 18S-797 18S-918 18S-1383

Type Am  Cm  Am

snoRNA SNORD83 SNORA55 GGgCD20 SNORA61 SNORD30

ASE D HP2 D HP1 D0

ICImod 1.16 1.14 0.98 1.13 1.1

28S-2722 28S-4824 28S-5263

28S-1849 28S-3863 28S-4266

  

SNORA51 SNORA84 SNORA78

HP1 HP2 HP1

1.09 0.86 0.83

U2-15 U2-47

 Um

GGoACA7 GGgCD76

HP2 D

0.83 0.9

U2-23 U2-79

NOTE.—HP1 and HP2 stand for ASE in 50 -hairpin or 30 -hairpin of the box H/ACA snoRNA, respectively.

SSU rRNA For three methylations and two s in 18S with previously unassigned snoRNAs, we predict appropriate guides (supplementary fig. S4.1.2). The 30 hairpin of SNORA55 putatively guides pseudouridylation of 18S-756 (human 18S-681) in tetrapods and the 50 hairpin of SNORA61 pseudouridylation of 18S-1026 (human 18S-918) in tetrapods and coelacanth. The interactions have ICI scores of 1.14 and 1.13, respectively. We suppose that these hairpins are double guides, as both hairpins have alternative targets listed in snoRNAbase. These interactions are also widely conserved in amniotes and in tetrapods and coelacanth with ICImod values of 1.24 and 1.05, respectively). We observe a double guiding function of a target binding region not only in box H/ACA snoRNAs but also within the box C/D snoRNAfamily SNORD30. The reported target to the ASE adjacent to the D0 -box is 28S-4761 (28S-3804 in human). We support this interaction by conservation and suggest additional conserved guiding potential for the unassigned methylation of the cytosine corresponding to alignment column 18S-1513 (18S-1383 in human). For orphan methylation of the cytosine at 18S-890 (18S-797 in human), we detected conserved complementarity (ICImod ¼ 0:98) from coelacanth throughout tetrapods to chicken snoRNA GGgCD20. For this snoRNA family, no vertebrate homologs were known previously to the snoStrip search. At last, we identified an interaction between residue 18S-526 (18S-468 in human) and the orphan snoRNA family SNORD83. This interaction is conserved in Teleostomi and has not been reported earlier.

LSU rRNA We find three box H/ACA snoRNAs with ICImod value above average as putative guides for modified nucleotides with unknown guide in 28S (supplementary fig. S4.1.3). The first hairpin of SNORA78 is able to interact with the pseudouridylated residue at 28S-5263 (28S-4266 in human), while the other hairpin of this snoRNA is known to target position 28S5339 (28S-4331 in human). The fact that box H/ACA snoRNAs often guide nearby pseudouridylation enhances our prediction. The two other box H/ACA snoRNAs

Downloaded from http://mbe.oxfordjournals.org/ at McGill University Libraries on November 21, 2013

Zebra Fish snoRNAs Zebra fish snoRNAs that are essential during embryonal development have recently been identified by Higa-Nakamine et al. (2012). Three methylated residues in the rRNAs are guided by snoRNA families SNORD44, SNORD78, and SNORD26, respectively. Our method identified all three interactions as conserved within vertebrates. The interaction between SNORD44 and site 18S-180 (18S-163 in publication) yields a high ICImod score of 1.04. SNORD78 guides 20 -O-ribose methylation of the according guanine 28S-5615 (28S-3745 in publication) in vertebrates with ICImod ¼ 0:85. We recover SNORD26 as a conserved guide for modification of adenosine at alignment position 28S-939 (28S-398 in publication) with ICImod ¼ 0:62. The lower value is probably explained by the lower quality of this alignment.

Table 3. Predictions for Known Modifications without Assigned snoRNA Guides.

Coevolution of SnoRNAs and Their Targets . doi:10.1093/molbev/mst209

interacting with the previously unknown guided s 28S-2722 (28S-1849 in human) and 28S-4824 (28S-3863 in human) are SNORA51 and SNORA84, respectively. Both families are reported as orphan snoRNAs, so far. SNORA51 is identified as a homolog of orphan chicken snoRNA GGoACA9 during snoStrip run, and our predicted interaction also turned out to be conserved in chicken.

U2 snRNA

Functions for Orphan snoRNAs The function of 41 human snoRNAs is still unknown. We used our ICIsno score to identify conserved complementarity of these orphan snoRNAs to rRNAs or snRNAs. In table 4, we summarize all predictions where at least one of the ICI scores is positive, as well as those where the predicted target site is known to be modified. The interactions are illustrated in supplementary materials S4.2.2 and S4.2.3.

Orphan Guides for Modifications without Matching snoRNA We identified four orphan snoRNAs as conserved guides for known modifications with previously unknown guides (GGg CD20 and 18S-890, SNORD83 and 18S-468, SNORA51 and 28S-2722, and SNORA84 and 28S-4824). These interactions were already described in the previous section.

Table 4. Predictions for Orphan snoRNA Families. snoRNA GGgCD20 SNORD109 SNORD116 SNORD125 SNORD83 SNORD86

ASE D D0 D0 D D D

Alignment 18S-890 28S-5424 18S-1286 18S-1623 18S-526 18S-1345

Human 18S-797 28S-4414 18S-1162 18S-1440 18S-468 18S-1219

Type Cm* m5 C C C Am* C

SNORA49

HP1 HP2 HP1 HP2

28S-3725 28S-3729 28S-2722 28S-4824

28S-2826 28S-2830 28S-1849 28S-3863

U U * *

SNORA51 SNORA84

ICImod 0.97 0.9 1.09 0.88 1.16 1.1

ICIsno 1.49 0.93 1.21 1.77 1.86 1.64

0.75 0.9 1.1 0.86

0.71 0.87 1.07 0.72

NOTE.—Known modifications are marked by asterisk. HP1 and HP2 stand for ASE in 50 -hairpin or 30 -hairpin of the box H/ACA snoRNA, respectively.

Orphan snoRNA targets m5 C Orphan SNORD109 has two paralogs in human encoded in introns of the paternally expressed SNURF-SNRNP locus. SNORD109 is expressed in brain and kidney and at lower levels in lung and muscle (Runte et al. 2001). Surprisingly, our analysis revealed that ten nucleotides conserved complementarity of the D0 -ASE to target site 28S-5424 (corresponding to human 28S-4414) with ICImod and ICIsno values of 0.9 and 0.93, respectively (supplementary figs. S3.2 and S4.2). This nucleotide is reported as 5-methylcytidine (m5 C) in 3D Ribosomal Modification Maps Database (http:// people.biochem.umass.edu/fournierlab/3dmodmap/humlsu2 dframes.php, last accessed November 7, 2013). The methylation of the nucleobase instead of the associated ribose is a chemical modification not normally associated with snoRNAs. It should be kept in mind, however, that also other SNURFSNRNP-snoRNA complexes, such as SNORD115, have been shown to exhibit noncanonical behavior.

Orphan snoRNAs and Intricate Structures Furthermore, we predict complementarities between stretches of ribosomal RNA without reported modifications and orphan snoRNAs. A careful examination of how differential expression of snoRNAs affects rRNA modification has not been carried out so far (Xue and Barna 2012). Hence, we cannot exclude the possibility of undetected modifications occurring only under certain conditions. High-scoring interactions are predicted for SNORD116 and 18S-1286 (human position 18S-1162) and SNORD86 and 18S1345 (human 18S-1219). The putatively targeted nucleotides are located next to pseudoknotted RNA stretches (according to 3D Ribosomal Modification Maps Database, http://people.biochem.umass.edu/fournierlab/3dmodmap/ humssu2dframes.php, last accessed November 7, 2013). Because modifications are capable of stabilizing intricate structures, our predictions are plausible.

Orphan snoRNA and Extensively Modified Regions High scores of ICImod ¼ 0:88 and ICIsno ¼ 1:77 are computed for the ASE upstream of the D-box of SNORD125 as a guide for modification of 18S-1623 (human 18S-1440). It is located in a functional region of the SSU, close to tRNA binding sites, where other reported modifications are proximal. For SNORA49, we predict targets for both hairpins. The putative modifications are 28S-3725 and 28S-3729 (human 28S-2826 and 28S-2830). They are located in a small helical structure with multiple reported modifications (http:// people.biochem.umass.edu/fournierlab/3dmodmap/humlsu2 dframes.php, last accessed November 7, 2013). A complete list of predictions of rRNA and snRNA targets for the orphan snoRNAs is provided in the supplementary table S4.2.1. The remaining snoRNAs that lack complementarity to rRNAs and snRNAs might have alternative functions, for example, cleavage of the primary rRNA transcript, targeting mRNA and altering their alternative splicing, or translational control in a miRNA-like fashion after being processed into smaller fragment (sdRNAs). 9

Downloaded from http://mbe.oxfordjournals.org/ at McGill University Libraries on November 21, 2013

U2 snRNA is pseudouridylated at position 23 (U2-15 in human). We predict an interaction of this orphan  with snoRNAs that belong to chicken-annotated orphan GGoACA7 that is conserved throughout vertebrates with ICImod ¼ 0:83 (supplementary fig. S4.1.4). A high-scoring snoRNA family for unassigned 20 -O-ribose methylation at site 79 (corresponding to human 47) is chicken snoRNA GGgCD76. The chicken interaction has already been predicted by Shao et al. (2009) without respect to any conservation issues. With ICImod ¼ 0:9, this interaction is conserved in vertebrates since coelacanth.

MBE

MBE

Identification of Distant Homologs

and the known and predicted interactions were studied in various aspects. We introduced the ICI to measure snoRNA and target coevolution. Our ICI score combines thermodynamic stability of the RNA-RNA duplex with its evolutionary preservation. Evaluation of this measure on all known human interactions recovered ~87% (scaRNAs-snRNAs), 83% (box H/ACA snoRNAs-rRNAs), and nearly 100% (box C/D snoRNAs-rRNAs) as evolutionary conserved (at least) in Eutheria. The benefit of our ICI measure is further supported by consistently high values for experimentally verified interactions. The lack of published negative experimental results makes the estimation of false-positive rates infeasible, because we cannot exclude the functionality of high-scoring interactions. Surprisingly, complete 28S rRNA sequences are not available for many of the species investigated here. Results concerning snoRNA guides targeting the LSU therefore remain incomplete. Ribosomal RNA operons exist in tandem arrays of different sequence variants. Because of their repetitive nature, they are not included in genome assemblies. However, specific mutations in rRNAs may disable or enable snoRNA interactions. This may expand the complex system of snoRNA families with various slightly different paralogs modifying “special” nucleotides in slightly different rRNAs. Such variations may contribute to the biogenesis of specialized ribosomes, highly adaptive to certain cellular demands (Xue and Barna 2012). When rRNA operons are fully included in genome assemblies, studies like this can be completed. There have been assumptions that snoRNAs frequently change targets and have lineage-specific functions (Zemann et al. 2006), while other studies assume conserved functions for snoRNA families (Hoeppner and Poole 2012). For a number of reasons, previous studies were unable to investigate snoRNA function over time in such detail. They were based on a more limited set of species, and/or snoRNA sequences, were limited only to box C/D snoRNA interactions, and had no formal method to evaluate conservation of targets. Using the ICI, we traced back the evolutionary history of all snoRNA families and observed stable partnerships between snoRNAs and their associated target sites throughout vertebrates. This is at odds with the stringent conservation of modification sites and the slow evolution of rRNAs and snRNAs on the one hand, while snoRNAs show high variation in their sequences on the other. Closer inspection of the snoRNA sequences revealed high selective pressure on the ASEs preserving the complementarity to the target. Redundant guides and less common changeovers of guides could be resolved using our evaluation method. In individual cases, we found that redundant guides might be processed from host genes with anticorrelated expression profiles, suggesting how such redundant snoRNAs could evolutionarily be maintained. This is in agreement with the hypothesis of Hoeppner and Poole (2012), who suggest constrained drift during the evolution of snoRNAs: An ongoing mobility of snoRNA genes with occupation of genomic locations that maintain the adequate expression pattern.

Early studies into snoRNAs frequently used homologous target sites as an argument for the homology of the snoRNAs themselves. The chicken snoRNAs GGgCD3, GGg CD4, GGgCD14, GGgCD24, GGgCD29, GGgCD63, GGgCD64, and GGgCD66 reported by Shao et al. (2009), all members of the box C/D class, may serve as a good example for the validity of this approach. According to the Basic Local Alignment Search Tool (BLAST)-based homology search procedure implemented in snoStrip, they are specific to the avian lineage. Target prediction with PLEXY and their ICImod scores identified them as conserved guides for methylations of the target alignment positions 18S-129, 18S-134, 18S-653, 18S-1415, 18S-1892, 28S-5348, 28S-5371, and 28S5474 (the coordinates refer to the homologous nucleotides in the human RNAs). These positions are targeted by human snoRNA families SNORD42, SNORD4, SNORD62, SNORD110, SNORD43, SNORD60, SNORD1, and SNORD69, respectively. Homologs of these families were readily identified by snoStrip in other mammals but not in sauropsids. They can be combined into common alignments, and a detailed inspection shows that they are indeed homologs. (Alignments are provided in supplementary material S3.2.) Although most platypus snoRNA sequences from the study of Schmitz et al. (2008) could be merged to mammalian families during the snoStrip search procedure, some appeared as species specific. Analogous to the avian snoRNA families, platypus snoRNA sequences Oa1759, Oa2916, and Oa2126 could also be identified as homologs of the vertebrate snoRNA families SNORD110, SNORD96, and SNORD4 respectively. In case of box H/ACA snoRNAs, the inference of a evolutionary origin from functional homology was possible only for SNORA64 and GGgACA47, both guiding pseudouridylation of 28S-6029 (28S-4975 in human). Particularly divergent sequences were observed for the family containing human SNORD62, chicken GGgCD14, and platypus bOaCD62i. The alignment reveals a large deletion in aves lineage in the 30 -part of the snoRNA. The selective pressure is focused on the ASE downstream of the prime box to retain complementarity to the target in the SSU. The aves lineage has 12 nucleotides perfectly complementary to the region around the modified adenine 653 (18S-590 in human), while in mammals the interaction is with 14 nucleotides longer but comprises a mismatch at the eighth position. (Alignment and detailed figure of interaction is provided on the supplementary page S3.2.) Alignments of avian, platypus, and mammalian families can readily be combined by manual inspection and editing according to the list of correspondences in table 5, demonstrating that these families are indeed homologs.

Discussion Starting from known snoRNA sequences reported in chicken, platypus, and human, we compiled a data set of 723 snoRNAs categorized into 259 snoRNA families analyzing 47 vertebrate species. Target predictions were computed for all of them, 10

Downloaded from http://mbe.oxfordjournals.org/ at McGill University Libraries on November 21, 2013

Kehr et al. . doi:10.1093/molbev/mst209

MBE

Coevolution of SnoRNAs and Their Targets . doi:10.1093/molbev/mst209

Table 5. Based on Conserved Targets, Distant Homologies Could be Identified between Chicken snoRNAs Thought to be Avian Specific, Platypus Sequences Thought to be Species Specific, and Mammalian snoRNA Families. RFAM RF00150

Chicken

Platypus

rRNA 18S

aln 129

hsa 116

gga 116

oan 95

18S

134

121

121

100

18S

653

590

551

594

18S

1415

1288

1246

1293

18S

1892

1705

1658

1706

5.8S

88

75

74

74

28S

5348

4340

3825

NA

28S 28S 28S

5371 5474 6029

4362 4464 5285

3847 3949 4282

NA NA NA

SNORD42

GGgCD3a

SNORD4

GGgCD4a,c

SNORD62

GGgCD14c

Oa1817a,b Oa2691a,b Oa2132a,b Oa2126a,b bOaCD62ia,b Oa2054a,b

SNORD110

GGgCD24

Oa1759

SNORD43

GGgCD29a,c

SNORD96

Oa2961

SNORD60

GGgCD63

SNORD1 SNORD69 SNORA64

GGgCD64a,c GGgCD66a GGgACA47a,c

Oa1765a,b Oa2470a,b

Modified Position in

NOTE.—Sequence ahomology identifiable by RFAM search, bpresent in RFAM alignment, and chomology confirmed by Makarova and Kramerov (2011). The third column provides a mapping of the alignment column (aln) and modified positions in the human (hsa), chicken (gga), and platypus (oan) rRNA sequences.

Hoeppner and Poole (2012) also suggest an artifactual split into separate snoRNA families through RNA family database (RFAM) covariance models. We agree with that idea and were even able to use conserved targets, identified by high ICI values to unravel unrecognized distant homologies. We applied the ICI to all putative modifications and all snoRNAs and could add new edges to the network of snoRNAs and the network of interactions. We merged snoRNA families, reported in distantly related organisms, based on sequence similarity and function. We further assigned guides to 10 of the 31 modifications with so-far unknown snoRNAs in rRNA and U2 snRNA, designated putative functions for 9 of 41 orphan snoRNAs. Thus, the ICI proved to be a very useful measure to match so far lonesome soulmates. Further, the method gives the opportunity to incorporate different aspects of snoRNA evolution. For example, as suggested by Hoeppner et al. (2009), one could investigate whether more mobile snoRNA genes might have adopted evolutionary younger functions or snoRNA mobility and conservation of snoRNA function are unrelated. We suspect that our approach can also be generalized to other types of RNA-RNA interactions such as prokaryotic mRNA-sRNA interactions.

Materials and Methods snoRNA Data Set We have collected a comprehensive set of 723 snoRNAs assigned to 259 different snoRNA families in 47 vertebrate species (supplementary fig. S5.1) employing the snoStrip pipeline (Bartschat et al. 2013). As queries, we have combined human sequences from snoRNAbase (Lestrade and Weber 2006), chicken snoRNAs reported by Shao et al. (2009), and platypus snoRNAs from Schmitz et al. (2008). Based on

sequence similarity, snoRNAs initially distinct in the start set have been identified as homologous during the search procedure. These have been merged into single families. Because the standard search method BLAST used by snoStrip has not been able to detect all members of the snoRNA families located within host genes GAS5 and U22HG, we have completed the search with GotohScan to reveal more distantly related genes. Curated muscle (Edgar 2004) alignments of the final snoRNA families are provided in the supplementary material S3.1 and S3.2.

Target Set of rRNAs and snRNAs The sequences of all parts of the rRNA operon have been collected for each of the investigated species as putative targets. In particular, available sequences for 18S, 5.8S, and 28S rRNA have been retrieved from the comprehensive ribosomal RNA databases SILVA (http://www.arb-silva.de/, last accessed November 7, 2013) (Pruesse et al. 2007), Ensembl (http://www.ensembl.org, last accessed November 7, 2013), and NCBI (National Center for Biotechnology Information) nucleotide (http://www.ncbi.nlm.nih.gov/nuccore, last accessed November 7, 2013) databases. Using the RNAmmer (Lagesen et al. 2007) approach, we have successfully identified some of the sequences not annotated in the databases so far. For human and chicken, we have used the same rRNA sequences as in Lestrade and Weber (2006) and Shao et al. (2009) for verification reasons. To determine homologous positions of known or predicted modification sites between different species, high-quality alignments of the vertebrate rRNAs are required. We have used RNAsalsa (Stocsits et al. 2009) to compute sequence structure alignments for 18S, 28S, and 5.8S sequences. The initial input alignments for RNAsalsa have been computed with muscle. Secondary structure information of human 18S and 28S rRNA sequences 11

Downloaded from http://mbe.oxfordjournals.org/ at McGill University Libraries on November 21, 2013

RF00266 CL00053 RF00153 CL00068 RF00610 CL00076 RF00221 CL00059 RF00055 CL00072 RF00271 CL00066 RF00213 RF00574 RF00264 CL00041

Human

MBE

Kehr et al. . doi:10.1093/molbev/mst209

Target Prediction For each single snoRNA sequence of an snoRNA family, an independent target prediction was performed. The internal structure of the target RNA determines the accessibility of a prospective binding region and thus has a large influence on thermodynamics and kinetics of hybridization. These effects can be captured at the level of secondary structures by modified RNA folding algorithms such as RNAup (Mu¨ckstein et al. 2006) or intaRNA (Busch et al. 2008). These programs compute the probability P that a certain sequence interval is unpaired and hence provide the energy RN ln P necessary to make the interaction site available for binding. Here, we used RNAup to determine the opening energies for intervals of up to 30 nt for all rRNAs and snRNAs contained in our target 12

data set. Incorporating such accessibility information into RNAsnoop (Tafer et al. 2010) from Vienna RNA Package

1.7 improved the prediction of box H/ACA snoRNAs targets. Target prediction for box C/D snoRNAs was performed with PLEXY (Kehr et al. 2011). (The exact parameters for the program calls are given in the supplemental material S7.)

Acknowledgments This work was supported in part by the European Union under the auspices of the FP-7 QUANTOMICS project and by the Deutsche Foschungsgemeinschaft under the auspices of SPP1174 “Deep Metazoan Phylogeny” STA850/2-3 project.

References Accardo MC, Giordano E, Riccardo S, Digilio FA, Iazzetti G, Calogero RA, Furia M. 2004. A computational search for box C/D snoRNA genes in the D. melanogaster genome. Bioinformatics 20:3293–3301. Atzorn V, Fragapane P, Kiss T. 2004. U17/snR30 is a ubiquitous snoRNA with two conserved sequence motifs essential ribosome assembly. Cell 72:443–457. Bachellerie JP, Cavaill J, Hu¨ttenhofer A. 2002. The expanding snoRNA world. Biochimie 84:775–790. Badis G, Fromont-Racine M, Jacquier A. 2003. A snoRNA that guides the two most conserved pseudouridine modifications within rRNA confers a growth advantage in yeast. RNA 9:771–779. Bartschat S, Kehr S, Tafer H, Stadler PF, Hertel J. 2013. snoStrip: A snoRNA annotation pipeline. Bioinformatics. Advance Access published October 29, 2013, doi: 10.1093/bioinformatics/btt604. Bernhart SH, Hofacker IL, Will S, Gruber AR, Stadler PF. 2008. RNAalifold: improved consensus structure prediction for RNA alignments. BMC Bioinformatics 9:474. Busch A, Richter AS, Backofen R. 2008. IntaRNA: efficient prediction of bacterial sRNA targets incorporating target site accessibility and seed regions. Bioinformatics 24:2849–2856. Cannone JJ, Subramanian S, Schnare MN, et al. (14 co-authors). 2002. The comparative RNA web (CRW) site: an online database of comparative sequence and structure information for ribosomal, intron, and other RNAs. BMC Bioinformatics 3:2. Cantara WA, Crain PF, Rozenski J, McCloskey JA, Harris KA, Zhang X, Vendeix FA, Fabris D, Agris PF. 2011. The RNA modification database, RNAMDB: 2011 update. Nucleic Acids Res. 39: D195–D201. Chen CL, Perasso R, Qu LH, Amar L. 2007. Exploration of pairing constraints identifies a 9 base-pair core within box C/D snoRNA-rRNA duplexes. J Mol Biol. 369:771–783. Darzacq X, Ja´dy BE, Verheggen C, Kiss AM, Bertrand E, Kiss T. 2002. Cajal body-specific small nuclear RNAs: a novel class of 20 -O-methylation and pseudouridylation guide RNAs. EMBO J. 21:2746–2756. Doe CM, Relcovic D, Garfield AS, Dalley JW, Theobald DE, Humby T, Wilkinson LS, Isles AR. 2009. Loss of imprinted snoRNA mbii-52 leads to increased 5htr2c pre-RNA editing and altered 5HT2CRmediated behaviour. Hum Mol Genet. 18:2140–2148. Do¨nmez G, Hartmuth K, Lu¨hrmann R. 2004. Modified nucleotides at the 50 end of human U2 snRNA are required for spliceosomal E-complex formation. RNA 10:1925–1933. Edgar RC. 2004. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 5:113. Ender C, Krek A, Friedla¨nder MR, Beitzinger M, Weinmann L, Chen W, Pfeffer S, Rajewsky N, Meister G. 2008. A human snoRNA with microRNA-like functions. Mol Cell. 32:519–528. Ganot P, Bortolin ML, Kiss T. 1997. Site-specific pseudouridine formation in preribosomal RNA is guided by small nucleolar RNAs. Cell 89: 799–809.

Downloaded from http://mbe.oxfordjournals.org/ at McGill University Libraries on November 21, 2013

(Cannone et al. 2002) have been used as initial structural constraints. The initial structure constraint for human 5.8S rRNA has been predicted using RNAfold (Hofacker et al. 1994). Unfortunately, rRNA operons are often excluded from genome assemblies. At the same time, in particular, LSU rRNAs have been rarely cloned and sequenced in independent studies, for example, for phylogenetic purposes. As a consequence, we have better coverage of snoRNAs than of rRNAs in many of the recently sequenced vertebrates. Although we have detected a majority of SSU RNAs (44/47), for LSU, only 17 of 47 nearly full-length sequences could be used for comparative analyses of modification sites. A suitable 5.8 rRNA sequence could be found in 41 species. Nevertheless, the phylogenetic range of the sequences in all three alignments spans the vertebrates from lamprey to human. All alignments are provided in the supplementary material S6.1, S6.2, and S6.3. Spliceosomal RNAs (U1, U2, U4, U5, U6, U11, U12, U4atac, and U6atac) have been taken from Marz et al. (2008), where manually curated alignments are provided. Experimentally verified positions of chemical modifications within rRNAs and snRNAs have been collected from Maden (1986, 1996), Ofengand and Bakin (1997), the SSU rRNA Modification Database (http://library.med.utah.edu/ SSUmods/, last accessed November 7, 2013) (McCloskey and Rozenski 2005), and the RNA Modification Database (http://mods.rna.albany.edu/, last accessed November 7, 2013) (Cantara et al. 2011). Modifications in U2 snRNA have been taken from Do¨nmez et al. (2004) and Yu et al. (1998). Predicted and verified interactions between snoRNAs and their targets have been collected from snoRNAbase (Lestrade and Weber 2006) and from the literature (Badis et al. 2003; Xiao et al. 2009; Higa-Nakamine et al. 2012). Mapping between positions in the human rRNA and snRNA sequences and their corresponding positions in the alignments has been realized with the BioPerl packages AlignIO and SimpleAlign (Stajich et al. 2002). A table of correspondences between putatively modified positions in rRNAs and snRNAs is provided in the supplementary table S6.4.

Coevolution of SnoRNAs and Their Targets . doi:10.1093/molbev/mst209

Mitchell JR, Cheng J, Collins K. 1999. A box H/ACA small nucleolar RNAlike domain at the human telomerase RNA 30 end. Mol Cell Biol. 19: 567–576. Mu¨ckstein U, Tafer H, Hackermu¨ller J, Bernhart SH, Stadler PF, Hofacker IL. 2006. Thermodynamics of RNA-RNA binding. Bioinformatics 22: 1177–1182. Ofengand J. 2002. Ribosomal RNA pseudouridines and pseudouridine synthases. FEBS Lett. 514:17–25. Ofengand J, Bakin A. 1997. Mapping to nucleotide resolution of pseudouridine residues in large subunit ribosomal RNAs from representative eukaryotes, prokaryotes, archaebacteria, mitochondria and chloroplasts. J Mol Biol. 266:246–268. Pruesse E, Quast C, Knittel K, Fuchs BM, Ludwig W, Peplies J, Glo¨ckner FO. 2007. SILVA: a comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB. Nucleic Acids Res. 35:7188–7196. Runte M, Hu¨ttenhofer A, Gross S, Kiefmann M, Horsthemke B, Buiting K. 2001. The IC-SNURF-SNRPN transcript serves as a host for multiple small nucleolar RNA species and as an antisense RNA for UBE3A. Hum Mol Genet. 10:2687–2700. Schmitz J, Zemann A, Churakov G, Kuhl H, Gru¨tzner F, Reinhardt R, Brosius J. 2008. Retroposed SNOfall—a mammalian-wide comparison of platypus snoRNAs. Genome Res. 18:1005–1010. Shao P, Yang JH, Zhou H, Guan DG, Qu LH. 2009. Genome-wide analysis of chicken snoRNAs provides unique implications for the evolution of vertebrate snoRNAs. BMC Genomics 10:86. Soneo Y, Taya Y, Stasyk T, Huber LA, Aoba T, Hu¨ttenhofer A. 2010. Identification of novel ribonucleo-protein complexes from the brain-specific snoRNA MBII-52. RNA 16:1293–1300. Stajich JE, Block D, Boulez K, et al. (21 co-authors). 2002. The bioperl toolkit: Perl modules for the life sciences. Genome Res. 12:1611–1618. Stocsits RR, Letsch H, Hertel J, Misof B, Stadler PF. 2009. Accurate and efficient reconstruction of deep phylogenies from structured RNAs. Nucleic Acids Res. 37:6184–6193. Tafer H, Kehr S, Hertel J, Hofacker IL, Stadler PF. 2010. RNAsnoop: efficient target prediction for H/ACA snoRNAs. Bioinformatics 26: 610–616. Taft RJ, Glazov EA, Lassmann T, Hayashizaki Y, Carninci P, Mattick JS. 2009. Small RNAs derived from snoRNAs. RNA 15:1233–1240. Weber MJ. 2006. Mammalian small nucleolar RNAs are mobile genetic elements. PLoS Genet. 2:e205. Xiao M, Yang C, Schattner P, Yu YT. 2009. Functionality and substrate specificity of human box H/ACA guide RNAs. RNA 15: 176–186. Xue S, Barna M. 2012. Specialized ribosomes: a new frontier in gene regulation and organismal biology. Nat Rev Mol Cell Biol. 13:355–369. Yu YT, Shu MD, Steitz JA. 1998. Modifications of U2 snRNA are required for snRNP assembly and pre-mRNA splicing. EMBO J. 17: 5783–5795. Zemann A, op de Bekke A, Kiefmann M, Brosius J, Schmitz J. 2006. Evolution of small nucleolar RNAs in nematodes. Nucleic Acids Res. 34:2676–2685.

13

Downloaded from http://mbe.oxfordjournals.org/ at McGill University Libraries on November 21, 2013

Goh SH, Josleyn M, Lee YT, Danner RL, Gherman RB, Cam MC, Miller JL. 2007. The human reticulocyte transcriptome. Physiol Genomics 30(2):172–178. Higa-Nakamine S, Suzuki T, Uechi T, Chakraborty A, Nakajima Y, Nakamura M, Hirano N, Suzuki T, Kenmochi N. 2012. Loss of ribosomal RNA modification causes developmental defects in zebrafish. Nucleic Acids Res. 40:391–398. Hoeppner MP, Poole AM. 2012. Comparative genomics of eukaryotic small nucleolar RNAs reveals deep evolutionary ancestry amidst ongoing intragenomic mobility. BMC Evol Biol. 12:183. Hoeppner MP, White S, Jeffares DC, Poole AM. 2009. Evolutionarily stable association of intronic snoRNAs and microRNAs with their host genes. Genome Biol Evol. 1:420–428. Hofacker IL, Fontana W, Stadler PF, Bonhoeffer LS, Tacker M, Schuster P. 1994. Fast folding and comparison of RNA secondary structures (the Vienna RNA package). Monatshefte f Chemie 2:167–188. Kapushesky M, Adamusiak T, Burdett T, et al. (22 co-authors). 2012. Gene Expression Atlas update—a value-added database of microarray and sequencing-based functional genomics experiments. Nucl. Acids Res. 40(D1):D1077–D1081. Kehr S, Bartschat S, Stadler PF, Tafer H. 2011. PLEXY: efficient target prediction for box C/D snoRNAs. Bioinformatics 27:279–280. Kishore S, Khanna A, Zhang Z, Hui J, Balwierz PJ, Stefan M, Beach C, Nicholls RD, Zavolan M, Stamm S. 2010. The snoRNA MBII-52 (SNORD115) is processed into smaller RNAs and regulates alternative splicing. Hum Mol Genet. 19:1153–1164. Kishore S, Stamm S. 2006. The snoRNA HBII-52 regulates alternative splicing of the serotonin receptor 2C. Science 311:230–232. Lagesen K, Hallin P, Rdland EA, Staerfeldt HH, Rognes T, Ussery DW. 2007. RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res. 35:3100–3108. Langenberger D, C¸akir MV, Hoffmann S, Stadler PF. 2012. Dicer-processed small RNAs: rules and exceptions. J Exp Zool: Mol Dev Evol. 320:35–46. Lestrade L, Weber MJ. 2006. snoRNA-LBME-db, a comprehensive database of human H/ACA and C/D box snoRNAs. Nucleic Acids Res. 34: D158–D162. Luo Y, Li S. 2007. Genome-wide analyses of retrogenes derived from the human box H/ACA snoRNAs. Nucleic Acids Res. 35:559–571. Maden BE. 1986. Identification of the locations of the methyl groups in 18S ribosomal RNA from Xenopus laevis and man. J Mol Biol. 189: 681–699. Maden T. 1996. Ribosomal RNA. Click here for methylation. Nature 383: 675–676. Makarova JA, Kramerov DA. 2011. SNOntology: myriads of novel snoRNAs or just a mirage? BMC Genomics 12:543. Marz M, Gruber AR, Ho¨ner zu Siederdissen C, et al. (16 co-authors). 2011. Animal snoRNAs and scaRNAs with exceptional structures. RNA Biol. 8:938–946. Marz M, Kirsten T, Stadler PF. 2008. Evolution of spliceosomal snRNA genes in metazoan animals. J Mol Evol. 67:594–607. McCloskey JA, Rozenski J. 2005. The small subunit rRNA modification database. Nucleic Acids Res. 33:D135–D138.

MBE

Matching of Soulmates: coevolution of snoRNAs and their targets.

Ribosomal and small nuclear RNAs (snRNAs) comprise numerous modified nucleotides. The modification patterns are retained during evolution, making it e...
759KB Sizes 0 Downloads 0 Views