J. Mol. Biol. (1991) 222, 219-232

Spliceosomal Small Nuclear RNAs of Tetrahymena thermophila and Some Possible snRNA-snRNA Base-pairing Interactions

Henrik Ørum†‡, Henrik Nielsen and Jan Engberg

Department of Biochemistry B, Panum Institute, University of Copenhagen, Blegdamsvej 3, DK-2200 Copenhagen N, Denmark

(Received 15 April 1991; accepted 15 July 1991)

We have identified and characterized the full set of spliceosomal small nuclear RNAs (snRNAs; U1, U2, U4, U5 and U6) from the ciliated protozoan Tetrahymena thermophila. With the exception of U4 snRNA, the sizes of the T. thermophila snRNAs are closely similar to their metazoan homologues. The T. thermophila snRNAs all have unique 5’ ends, which start with an adenine residue. In contrast, with the exception of U6, their 3’ ends show some size heterogeneity. The primary sequences of the T. thermophila snRNAs contain the sequence motifs shown, or proposed, to be of functional importance in other organisms. Furthermore, secondary structures closely similar to phylogenetically proven models can be inferred from the T. thermophila data. Analysis of the snRNA sequences identifies three potential snRNA-snRNA base-pairing interactions, all of which are consistent with available phylogenetic data. Two of these occur between U2 and U6, whereas the third occurs between U1 and U2. The proposed interactions locate the intron 5’ splice-site close to the intron branch-site nucleotide as well as to the most highly conserved domain of U6. We envisage that these interactions may facilitate the first step of pre-mRNA splicing.

Keywords: Tetrahymena thermophila; spliceosomal snRNAs; RNA secondary structure; snRNA-snRNA base-pairing; pre-mRNA splicing

1. Introduction

The removal of introns from nuclear pre-mRNA occurs by two sequential transesterifications and requires a complex and ordered interplay between cis-acting intron elements, four snRNA-protein complexes, termed snRNPs§ (small nuclear ribonucleoproteins) and numerous protein factors (for recent reviews, see Zieve & Sauterer, 1990; Woolford, 1989). The snRNPs are composed of either a single snRNA (U1, U2 and U5) or two snRNAs (U4/U6) and a set of up to nine different proteins (for a review, see Lührmann, 1988). The snRNAs have been characterized from a range of different species and, with the exception of some unusually large snRNAs in Saccharomyces cerevisiae, the sizes of homologous snRNAs are quite similar (for a review, see Reddy & Busch, 1988). The sequence of U6 snRNA is remarkably conserved throughout evolution (Brow & Guthrie, 1988) whereas the sequences of U1, U2, U4 and U5 snRNA exhibit considerable inter-species variation. However, all five snRNAs contain a few small sequence motifs that are highly conserved throughout evolution (Guthrie & Patterson, 1988). One of these, termed the Sm-site, is found in all the snRNAs, except U6, and constitutes the binding site for the common snRNP proteins (for a review, see Mattaj, 1988). Despite differences in primary sequences, the secondary structures of homologous snRNAs are highly conserved throughout evolution and to a considerable extent this also applies to the large snRNAs of S. cerevisiae (Liao et al., 1990; Guthrie & Patterson, 1988).

† Author to whom all correspondence should be addressed.
‡ Present address: The Royal Danish School of Pharmacy, Department of Biological Sciences, Universitetsparken 4, DK-2100, Copenhagen Ø, Denmark.
§ Abbreviations used: snRNPs, small nuclear ribonucleoproteins; snRNAs, small nuclear ribonucleic acids; Sm-site, binding site for the common snRNPs; DTT, dithiothreitol; bp, base-pair(s); PCR, polymerase chain reaction; nt, nucleotides; BPS, branch-point sequence; AMV, avian myeloblastosis virus.

It has been known for quite some time that the conserved cis-acting elements at the 5’ and 3’ ends of pre-mRNA introns bear similarities to the corresponding elements of the group II self-splicing introns and mechanistically the two splicing reactions are similar (for a review, see Jacquier, 1990).

© 1991 Academic Press Limited


This has prompted the suggestion that pre-mRNA introns and group II introns share a common ancestor and that pre-mRNA splicing is an RNA-catalyzed reaction in which parts of the RNA catalysts (perhaps former cis-acting elements) are supplied in trans in the form of the snRNAs (Cech, 1986). Thus, several cis-acting elements of both group I and II self-splicing introns have been shown to be capable of carrying out catalytic reactions in vitro when supplied in trans (van der Horst et al., 1991; Doudna & Szostak, 1989; Altura et al., 1989). Likewise, the trans-acting U1 snRNA in pre-mRNA splicing can be functionally substituted by a cis-acting element (Bruzik & Steitz, 1990). If splicing is indeed an RNA-catalyzed reaction, then it is reasonable to assume that base-pairing interactions between the various RNA components (pre-mRNA and snRNAs) are important in forming the catalytic structure. As indicated by the observation that conformational changes occur in the active splicing complex (Ryder et al., 1990; Rymond & Rosbash, 1988), this may not be a single structure, but rather several, the transition between which could be catalyzed by the recently characterized RNA-dependent helicase/ATPase splicing proteins (Company et al., 1990; Schwer & Guthrie, 1991). There is substantial evidence that U1 and U2 snRNAs base-pair to the intron 5’ splice-site (Zhuang & Weiner, 1986; Siliciano & Guthrie, 1988; Seraphin et al., 1988) and branch-site sequence (Parker et al., 1987; Zhuang & Weiner, 1989; Wu & Manley, 1989), respectively, whereas little is known about snRNA-snRNA base-pairing interactions during splicing. In fact, the only well-documented snRNA-snRNA base-pairing interaction occurs between the U4 and U6 snRNA in the U4/U6 snRNP (Brow & Guthrie, 1988) and this interaction is either partly, or fully, resolved before or during the first transesterification step (Lamond et al., 1988; Blencowe et al., 1989).
Recently, in vitro psoralen-crosslinking experiments in HeLa cell extracts have demonstrated a base-pairing interaction between a small domain at the 5’ and 3’ end of U2 and U6, respectively (Hausner et al., 1990). The ability to form a related helix is retained in all species for which the sequence of U2 and U6 is known, thus lending phylogenetic support to the interaction.

Several features of the snRNAs of Tetrahymena thermophila have been published (Pedersen et al., 1985). A subset of these snRNAs was proposed, based on a number of physical and chemical properties, to represent the homologues of mammalian U1, U2, U5 and U6. However, a U4 homologue could not be identified. We report here that T. thermophila contains the full set of spliceosomal snRNAs, including U4. The sequences and possible secondary structures of these snRNAs are presented. Analysis of the snRNAs identifies three potential snRNA-snRNA base-pairing interactions that are consistent with the available phylogenetic data.


2. Materials and Methods

(a) Isolation of individual snRNAs from T. thermophila

Total snRNA from T. thermophila was isolated from purified nuclei as described by Pedersen et al. (1985) and fractionated by electrophoresis on a Sq/, polyacrylamide preparative sequencing gel. The snRNAs were detected by staining with ethidium bromide and individual snRNAs were cut out of the gel and eluted overnight at 37°C in 10 mM-Tris.HCl (pH 7.5), 0.25 M-NaCl and 1 mM-EDTA. The eluate was extracted twice with phenol, twice with chloroform and the snRNAs were precipitated with 3 vols 96% (v/v) ethanol.

Isolated snRNAs were 3’ end labelled with [32P]pCp and phage T4 RNA ligase as described by England et al. (1977). Labelled RNAs were electrophoresed on 6 or 8% polyacrylamide preparative sequencing gels and detected by autoradiography. Bands corresponding to full-length snRNAs were cut out and eluted as described above. Chemical sequencing was performed according to Peattie (1979).

(c) Dideoxy sequencing of snRNAs

Our chemical sequencing data were used to design oligonucleotide primers complementary to the 3’ end of the snRNAs, which were synthesized using a Cyclone synthesizer. All primers had tm values higher than 65°C when calculated from the equation: tm = 2(A+T) + 4(C+G). Primers were labelled in a 20 μl reaction vol. containing 100 ng primer, 50 μCi [γ-32P]ATP, 100 mM-Tris.HCl (pH 8.0), 10 mM-MgCl2, 5 mM-dithiothreitol (DTT) and 1.5 units phage T4 polynucleotide kinase (BRL). The reaction was incubated at 37°C for 40 min followed by incubation at 65°C for 10 min. Labelled primers were recovered by precipitation with 1 vol. 4 M-ammonium acetate and 3 vols 96% (v/v) ethanol. Total nuclear RNA (approx. 3 μg) was mixed with 50 ng labelled primers in a 10 μl reaction vol. containing 7 mM-Tris.HCl (pH 7.5), 7 mM-MgCl2, 30 mM-NaCl, 10 mM-DTT and 0.1 mM-EDTA. The annealing reaction was heated at 95°C for 5 min, centrifuged for 5 s and further incubated for 1 h at 55°C. Following this incubation the sample was centrifuged for 5 s and 2 μl samples were transferred to each of 5 tubes marked G, A, T, C and X (primer extension reaction) containing 33 μl reverse transcription buffer (24 mM-Tris.HCl (pH 8.3), 16 mM-MgCl2, 10 mM-DTT, 0.8 mM-dGTP, 0.4 mM-dATP, 0.4 mM-dTTP, 0.4 mM-dCTP), 1 unit/μl AMV reverse transcriptase (Boehringer-Mannheim) and 1 μl of 1 mM-ddGTP (G-tube), 1 mM-ddATP (A-tube), 2 mM-ddTTP (T-tube) and 1 mM-ddCTP (C-tube). The samples were incubated for 1 h at 55°C and terminated by the addition of 4 μl loading buffer (100% formamide, 0.3% (v/v) bromophenol blue and 0.3% xylene cyanol). The samples were heated at 95°C for 3 min and 1 μl of each sample was loaded on 6 or 8% sequencing gels. In some cases the gels contained 40% (v/v) formamide in order to prevent sequence compression artefacts. Following electrophoresis the gels were subjected to autoradiography.

(d) Dideoxy sequencing of first strand snRNA cDNA

Total nuclear RNA (approx. 30 μg) was polyadenylated in a 100 μl reaction vol. containing 50 mM-Tris.HCl (pH 7.9), 2.5 mM-MnCl2, 0.25 M-NaCl, 10 mM-MgCl2, 0.3 mM-ATP and 5 units poly(A) polymerase (Pharmacia). The reaction was incubated for 40 min at 37°C, terminated by phenol and chloroform extractions and precipitated with 3 vols 96% (v/v) ethanol. Recovered polyadenylated RNA was mixed with 3.5 μg poly(dT) primer in a 50 μl reaction vol. containing 50 mM-Tris.HCl (pH 8.3), 40 mM-KCl, 7 mM-MgCl2 and 10 mM-DTT. The mixture was heated at 95°C for 5 min, centrifuged for 5 s and incubated on ice for 10 min. AMV reverse transcriptase (30 units) and dNTPs to a final concentration of 1 mM each were added to the reaction and incubation was continued at 42°C for 60 min. RNase H (Amersham) was added to 1 unit/μl and incubation was continued for an additional 60 min at 37°C. The reaction was terminated by phenol and chloroform extractions and cDNA was recovered by precipitation with 1 vol. 4 M-ammonium acetate and 3 vols 96% (v/v) ethanol. Dideoxy sequencing was performed exactly as described above, except that primers complementary to the cDNA were used. This method is referred to as reverse dideoxy sequencing.

(e) Dideoxy sequencing-chase with terminal deoxynucleotidyl transferase

Material corresponding to the full-length snRNA in lanes G, A, T, C and X (primer extension) was cut out of the sequencing gel and eluted overnight at 37°C in 0.5 M-ammonium acetate (pH 7.5). The eluate was extracted twice with phenol and chloroform and the eluted cDNA was recovered by precipitation with 3 vols 96% (v/v) ethanol. Half of the recovered cDNA was incubated at 37°C for 2 h in a 10 μl reaction vol. containing 100 mM-potassium cacodylate (pH 7.2), 1 mM-CoCl2, 1 mM-DTT, 1 mM-dATP and 15 units terminal deoxynucleotidyl transferase (BRL). At the end of the incubation period both the chased reaction and the unchased half (control) of the cDNA were precipitated with carrier tRNA (1 μg), 1 vol. 4 M-ammonium acetate and 3 vols 96% (v/v) ethanol. Recovered cDNA was dissolved in 2 μl loading buffer, denatured at 95°C for 3 min and loaded on 6 or 8% sequencing gels. Following electrophoresis the gel was subjected to autoradiography.

(f) snRNA gene probes

Specific probes, about 100 bp long, for the T. thermophila snRNA genes were generated by PCR amplification using total genomic DNA and sets of snRNA-specific phosphorylated oligonucleotide primers. The crude amplificate was precipitated with 1 vol. 4 M-ammonium acetate and 3 vols 96% (v/v) ethanol, filled-in with Klenow polymerase and dNTPs and then cloned in the plasmid Bluescript SK+, which had been linearized with SmaI and treated with calf-intestinal phosphatase. Plasmids isolated from selected clones of recombinant Escherichia coli JM103 were sequenced with the phage T7 sequencing kit (Pharmacia). Purified plasmids were used to synthesize 32P-labelled probes by the random primer method according to the supplier’s instructions (Boehringer-Mannheim).
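The primer selection rule quoted in section (c), tm = 2(A+T) + 4(C+G) with a cutoff of 65°C, can be illustrated with a minimal Python sketch. The function names and the example primer are hypothetical and are not taken from the paper; only the equation and the 65°C threshold come from the text.

```python
def wallace_tm(primer: str) -> int:
    """Estimate tm (degrees C) of a DNA primer as 2(A+T) + 4(C+G)."""
    seq = primer.upper()
    unexpected = set(seq) - set("ACGT")
    if unexpected:
        raise ValueError(f"unexpected characters in primer: {sorted(unexpected)}")
    at_count = seq.count("A") + seq.count("T")
    gc_count = seq.count("C") + seq.count("G")
    return 2 * at_count + 4 * gc_count


def passes_cutoff(primer: str, cutoff: int = 65) -> bool:
    """True when the estimated tm is higher than the cutoff (65 in the text)."""
    return wallace_tm(primer) > cutoff


# Hypothetical 24-mer (6 each of A, T, G, C), used for illustration only.
example_primer = "ATGCATGCATGCATGCATGCATGC"
print(wallace_tm(example_primer))     # 2*12 + 4*12 = 72
print(passes_cutoff(example_primer))  # True
```

The rule counts bases only, so it ignores salt and primer concentration; it is best read as a quick screen for primer length and composition rather than a measured melting temperature.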

3. Results

(a) The snRNAs

T. thermophila contains several abundant snRNAs (more than 2 × 10^5 copies/cell), some of which have previously been shown to have physical and chemical properties similar to those of the major U snRNAs of mammalian cells (Pedersen et al., 1985). Without sequence data, however, it was not possible to make conclusive assignments and the snRNAs were therefore named T1 to T7 according to their gel-electrophoretic mobilities. T3 is a collective term for several snRNAs in the size range of 125 to 140 nucleotides, whereas the terms T1, T2, T4 and T7 relate to apparently unique snRNAs (T1 approx. 110 nt to T7 approx. 256 nt). In the present study, snRNAs belonging to the T3 cluster and the snRNAs previously suggested to be the homologues of mammalian U1 (T4), U2 (T5), U5 (T2) and U6 (T1) were isolated from total nuclear RNA by preparative gel electrophoresis. The recovered snRNAs were labelled at their 3’ ends with [32P]pCp and phage T4 RNA ligase and repurified by a second round of electrophoresis. Labelled, full-length snRNAs were isolated from the gel and sequenced using the chemical sequencing method. In this way sequence information was obtained corresponding to at least 40 to 50 nucleotides close to the 3’ end of T2, T4, T5 and several T3 snRNAs, but not T1 (see below). To obtain sequence information on the remaining part of the snRNAs, oligonucleotides complementary to a region close to the 3’ end were synthesized and used as primers in the dideoxy sequencing method. The complete sequences were subsequently compared with the sequences of mammalian U snRNAs and from this comparison it was possible to identify T2, T4, T5 and one of the T3 snRNA molecules as being the T. thermophila homologues of mammalian U5, U1, U2 and U4, respectively. It was suggested that T1 represents the T. thermophila homologue of the mammalian U6 snRNA (Pedersen et al., 1985). However, [32P]pCp-labelled, full-length T1 snRNAs consistently gave unreadable sequences when sequenced by the chemical method. We therefore switched to an alternative strategy in order to identify the T. thermophila U6 homologue.
It has been noted that a highly conserved sequence motif in U6 is always able to base-pair with two highly conserved regions of the cognate U4 (Brow & Guthrie, 1988). Since the two conserved sequences could be identified in the T. thermophila U4 homologue, it suggested to us that the T. thermophila U6 might also contain the corresponding highly conserved U6 sequence motif. On the basis of this assumption, we designed a "U6" oligonucleotide primer, taking into account both the known phylogenetic variation of the U6 sequence motif and the sequence of the T. thermophila U4. When this oligonucleotide was used as primer in dideoxy sequencing, using total nuclear RNA as template, a unique sequence was obtained that could be identified as the 5’ half of U6. The remaining part of the U6 sequence was subsequently determined by reverse dideoxy sequencing (see below) and the combined results showed U6 to have a size of 106 nucleotides, which corresponds to the estimated size of T1. Thus, the T1 material isolated from the preparative gels undoubtedly contained the T. thermophila U6 homologue, as well as at least one unrelated snRNA.

[Figure 1 sequence panel: columns "snRNP" and "RNA sequence", with snRNA and gene sequences for U1, U2, U4, U5 and U6; the sequences themselves are not recoverable from the extracted text.]

Figure 1. The sequences of the T. thermophila spliceosomal snRNAs and of the coding region of some of their genes. Variable positions in U2 and U5 snRNA genes are boxed. In the case of U5, only transcripts arising from the U5-1 gene are detected as snRNA sequence. An X designates the 5’ cap structure, which although unknown in T. thermophila does not appear to be the usual (2,2,7) trimethylguanosine (Pedersen et al., 1985). The N at the 5’ end of U4 and U6 designates that the identity of these nucleotides has not been established with certainty. Nucleotides in parentheses indicate snRNA 3’ end extensions as determined by reverse dideoxy sequencing (see Materials and Methods, section (d) for details). N.C designates that the corresponding gene has not been cloned and N.E designates that the corresponding snRNA does not appear to be expressed.

The T. thermophila U1, U2 and U5 snRNAs migrate as distinct bands in polyacrylamide gels and material isolated from these bands does not contain other snRNAs as evidenced by the chemical sequencing data. Thus, several physical and chemical features of these snRNAs can be inferred by reference to earlier results on snRNAs isolated from these bands (Pedersen et al., 1985). This implies that U1, U2 and U5 are present in the nucleoplasm in roughly 8 × 10^5, 9 × 10^5 and 6 × 10^5 copies per cell, respectively, and that they contain several O-ribose methylated nucleotides and pseudouridines. No inferences can be made in the case of U6 and U4, which comigrate with at least one unrelated snRNA and with several T3 snRNAs, respectively. We were concerned that some of the modified nucleotides in the snRNAs might have altered base-pairing properties that would affect the accuracy of the sequence data obtained using the dideoxy sequencing method.
To address this question, we have cloned and sequenced one U1, two U2 and two U5 genes (in the case of U5 this corresponds to all of the genes as evidenced by Southern hybridization; data not shown). Gene/snRNA sequence comparisons revealed that the sequence of the U1 gene, of both U2 genes and of one of the U5 genes (U5-1) were colinear with the corresponding snRNA sequences, indicating that the modified nucleotides showed normal base-pairing characteristics during the dideoxy sequencing reactions. The coding regions of the two U2 genes (U2-1 and 2) and the two U5 genes (U5-1 and 2) exhibit minor differences, all of which are single nucleotide substitutions. The differences between the two U2 genes occur at positions 121 and 162, and transcripts arising from both genes could be clearly identified in the sequence of the U2 snRNA. The two U5 genes differ at four positions (24, 59, 69 and 70) and only transcripts arising from the U5-1 gene could be detected in the sequence of the U5 snRNA. This suggests that the U5-2 gene was either not expressed under the experimental growth conditions (exponential growth) or the steady-state amount of the corresponding transcript was below the level of detection. The possibility that U5-2 is a pseudogene appears unlikely for two reasons. First, one would expect sequence variations in the coding region to be random and this does not seem to be the case, since the observed variations comply with the predicted U5 snRNA secondary structure map (see below). Second, the U5-2 gene contains the same promoter elements located 5’ and 3’ of the coding region as can be found in the expressed gene (H. Ørum, unpublished results). Thus, we favour


the possibility that some T. thermophila snRNA genes are differentially expressed and this is currently the subject of investigation. Figure 1 shows the sequences of the T. thermophila spliceosomal snRNAs, which we propose to rename according to the mammalian U snRNA nomenclature. Also shown is the putative transcript of the apparently non-expressed U5-2 gene.

(b) The 3’ end of the snRNAs

The 3’ ends of the snRNAs and the sequence of the 3’ part of U6 were determined using a new procedure: total nuclear RNA was polyadenylated using E. coli poly(A) polymerase and this material was subsequently used as template to generate first strand cDNA using oligo(dT) primers and AMV reverse transcriptase. Oligonucleotides complementary to selected regions of the snRNA-specific cDNA were then used as primers in dideoxy sequencing reactions of the first strand cDNA. This method is referred to as reverse dideoxy sequencing. In the case of U1 the 3’-end cDNA sequence reads GCTA/CAn (A/C indicates a variable position; see Fig. 2). The corresponding region in the U1 gene reads GCTCAC (data not shown) and we therefore assign the 3’ end of the shortest U1 snRNA to the T residue. The C residue at the variable A/C position and the first A in the following poly(A) sequence are consistent with the genomic sequence, which indicates that there is some size heterogeneity at the 3’ end of the U1 snRNA (whether the heterogeneity extends to the A residue cannot be determined because the RNA was polyadenylated prior to first strand cDNA synthesis). The 3’ ends of the other snRNAs were determined in a similar way and, with the exception of U6, 3’-end heterogeneity was observed with U2 (additional AC), U4 (additional CU) and U5 (additional CAC).
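The reasoning used for the U1 3’ end (compare the cDNA read with the gene, and treat everything after the point of divergence as added poly(A)) can be sketched in Python. This is an illustrative sketch under stated assumptions, not the authors' procedure: the function name is hypothetical, the reads are written as sense-strand DNA, and positions are counted within the six-nucleotide gene stretch GCTCAC quoted in the text.

```python
from typing import Tuple


def map_three_prime_end(gene: str, read: str) -> Tuple[int, int]:
    """Return (shortest, longest) possible 3' end, as 1-based gene positions.

    `read` is the sequence deduced from polyadenylated RNA, so it ends in an
    added poly(A) tail. An encoded A directly before the tail cannot be told
    apart from the tail, which is why a range is returned.
    """
    # Length of the stretch over which the read still follows the gene.
    matched = 0
    while matched < min(len(gene), len(read)) and gene[matched] == read[matched]:
        matched += 1
    # Everything after the divergence point must be poly(A).
    if set(read[matched:]) - {"A"}:
        raise ValueError("read diverges from the gene before the poly(A) tail")
    # Trailing A residues in the matched stretch may be encoded or added.
    shortest = matched
    while shortest > 0 and gene[shortest - 1] == "A":
        shortest -= 1
    return shortest, matched


gene_stretch = "GCTCAC"  # U1 gene region quoted in the text
print(map_three_prime_end(gene_stretch, "GCTAAAA"))   # (3, 3): ends at the T
print(map_three_prime_end(gene_stretch, "GCTCAAAA"))  # (4, 5): ends at C or at the following A
```

The second call reproduces the ambiguity noted in the text: the read with C at the variable position could end at the C or at the encoded A that follows it.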
In mammalian cells, several of the snRNAs have been shown to be transcribed as slightly larger precursors that are subsequently processed to the mature-sized species (for a review, see Zieve & Sauterer, 1990), and it seems likely that the observed 3’ end heterogeneity in T. thermophila indicates that a similar process is operating in this organism.

(c) The 5’ end of the snRNAs

Figure 2. Characterization of the 3’ end of U1 by reverse dideoxy sequencing (see Materials and Methods, section (d) for details). The deduced sequence at the extreme 3’ end is indicated.

The 5’ ends of the individual snRNAs identified by primer extension and dideoxy sequencing appeared homogeneous, suggesting that transcription initiates at a single defined nucleotide. In vertebrates, initiation of snRNA gene transcription occurs with great precision at the nucleotide corresponding to the 5’ nucleotide of the mature U snRNA (for a review, see Dahlberg & Lund, 1988) and in all organisms this nucleotide is a purine residue, preferentially an adenine. In order to determine the 5’ nucleotide of the T. thermophila snRNAs, we isolated full-length


Figure 3. Determination of the 5’ nucleotide of U1. (a) The sequence to the left (termed control) represents the isolated U1 cDNA material. (b) The sequence in the middle (termed chase) represents the isolated U1 cDNA material treated with terminal transferase. (c) The sequence at the right is the initial U1 snRNA dideoxy sequence. A subset of "background" bands in the control sequence are not removed by the terminal transferase treatment (compare (a) and (b)). Since these bands correspond to the actual sequence minus one, we conclude that they result from primer heterogeneity.

cDNA corresponding to the five RNA-sequence reactions (G, A, T, C and primer extension) from a preparative sequencing gel. Each of the five isolates was subsequently treated with the enzyme terminal transferase, which will extend cDNA molecules carrying a free 3’-OH group, but will not extend cDNA molecules terminated by a dideoxy nucleotide. In the case of U1 such treatment shows that the dideoxy-terminated cDNA molecules corresponding to full-length U1 are located in the T reaction, indicating that U1 snRNA starts with an A residue (see Fig. 3). Consistent with this result, the 5’ end of U1 determined by primer extension mapped to an A residue in the cloned U1 gene. Similarly, the 5’ ends of U2 and U5, determined by primer extension, mapped to an A residue in the corresponding genes. In all cases, this A residue is the first purine downstream from a highly conserved sequence motif in the snRNA genes (H. Ørum, unpublished results), strongly indicating that primer extension mapping defines the correct 5’ end of the snRNAs.
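The logic of the terminal transferase chase can be summarized in a short Python sketch. It assumes, as described above, that full-length cDNA ending in a dideoxy nucleotide resists the chase and is found only in the lane whose ddNTP is complementary to the RNA 5’ nucleotide. The function names and the observation table are hypothetical illustrations, not data from the paper.

```python
# Complement of each ddNTP lane, read as the templating RNA base.
RNA_COMPLEMENT = {"G": "C", "A": "U", "T": "A", "C": "G"}


def chase_resistant_lane(band_remains_after_chase: dict) -> str:
    """Return the single lane whose full-length band survives the chase."""
    resistant = [lane for lane, remains in band_remains_after_chase.items() if remains]
    if len(resistant) != 1:
        raise ValueError("expected exactly one chase-resistant lane")
    return resistant[0]


def five_prime_nucleotide(lane: str) -> str:
    """Infer the RNA 5' nucleotide from the chase-resistant ddNTP lane."""
    if lane not in RNA_COMPLEMENT:
        raise ValueError(f"not a dideoxy lane: {lane!r}")
    return RNA_COMPLEMENT[lane]


# Hypothetical observations mirroring the U1 result described in the text:
# only the T lane keeps a full-length band; X is the primer extension lane.
observed = {"G": False, "A": False, "T": True, "C": False, "X": False}
lane = chase_resistant_lane(observed)
print(lane, five_prime_nucleotide(lane))  # T A
```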

(d) The possible secondary structure of the snRNAs

Phylogenetically supported secondary structure models have been worked out for the spliceosomal snRNAs (Guthrie & Patterson, 1988). We show here that the T. thermophila snRNAs can be folded into similar structures. Furthermore, the sequence and location of most of the highly conserved sequence motifs shown to be, or proposed to be, of functional importance are also found in the T. thermophila snRNAs.

(i) U1

Four evolutionarily conserved U1 sequence motifs have been shown to be of functional importance. These are the strictly conserved 5’ splice-site recognition sequence, the Sm-site and the sequences of


loops I and II, which are the major determinants for binding of the U1-specific proteins (Scherly et al., 1989; Patton et al., 1989; Hamm et al., 1988). All of these sequence motifs are present in T. thermophila U1 and occur at phylogenetically conserved positions in the proposed secondary structure (see Fig. 4). The 5’ splice-site recognition sequence of Tetrahymena is strictly conserved. This was not entirely expected because the sequence of the intron 5’ splice-site is somewhat unusual in T. thermophila (Csank et al., 1990). In all other organisms characterized so far, positions +5 and +6 of the 5’ splice-sites are dominated by G and U residues, respectively, both of which are complementary to the U1 sequence (Jacob & Gallinaro, 1989). In contrast, these positions are dominated by non-complementary A and A/U residues in T. thermophila. As a result, a significant number of predicted 5’ splice-site/U1 helices contain as few as four non-consecutive base-pairs. How this affects splicing in T. thermophila is not known at present. The sequence of loop II is highly conserved in all published U1 snRNAs. However, the T. thermophila U1 sequence in this region is totally different from the consensus sequence, as is the recently published sequence of Physarum polycephalum U1 (Myslinski et al., 1989). Thus, the sequence of this loop is not a phylogenetically conserved feature.

(ii) U2

Phylogenetic sequence variation in the 5’ part of U2 is consistent with any of three different structures (Guthrie & Patterson, 1988), and these can all be formed in the T. thermophila U2. Recently, several trypanosomatid U2 snRNAs have been characterized in which only one of these structures can be formed (Hartshorne & Agabian, 1990). This structure was also arrived at by in vitro chemical structure probing and in vivo mutational analysis of U2 of S. cerevisiae (Ares & Igel, 1990). Therefore, the T. thermophila U2 is presented in accordance with this structure (see Fig. 4).
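The effect of the unusual +5 and +6 positions on 5’ splice-site/U1 pairing, discussed under (i), can be illustrated with a small Python sketch that counts Watson-Crick pairs between intron positions +1 to +6 and the U1 segment ACUUAC read from the 5’ end of U1 in Fig. 4. Everything else here is an assumption made for illustration: the function name, the exclusion of G-U pairs, the restriction to six intron positions, and the example splice-site sequences, which are toy inputs rather than sequences from the paper.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def count_base_pairs(splice_site: str, u1_segment: str) -> int:
    """Count Watson-Crick pairs between two RNA strands of equal length.

    Both strands are written 5' to 3'; they pair antiparallel, so the
    splice site is compared against the reversed U1 segment.
    """
    if len(splice_site) != len(u1_segment):
        raise ValueError("strands must have the same length")
    return sum(
        (a, b) in WATSON_CRICK
        for a, b in zip(splice_site.upper(), reversed(u1_segment.upper()))
    )


U1_SEGMENT = "ACUUAC"  # conserved 5' splice-site recognition segment (5' to 3')

print(count_base_pairs("GUAAGU", U1_SEGMENT))  # 6: G at +5 and U at +6 both pair
print(count_base_pairs("GUAAAA", U1_SEGMENT))  # 4: A at +5 and A at +6 do not pair
```

A fuller model would also consider exon positions upstream of the splice site and G-U pairs, which this sketch leaves out.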
The sequence of loop IIa is conserved among species (consensus U/GUAAYN) and it contains a hitherto invariant UAA trinucleotide. Deletion or substitution of the entire loop sequence produced a lethal phenotype in S. cerevisiae (Ares & Igel, 1990). However, several smaller mutations within the loop sequence (including mutations in the UAA triplet) had no apparent effect on either cell viability or U2 snRNP assembly (Ares & Igel, 1990); this suggests that the precise sequence was not required for U2 function. The sequence of the T. thermophila U2 loop IIa (GUGAAA) does not contain the invariant UAA triplet and, furthermore, differs from the consensus sequence by having an A residue at a position otherwise occupied by a pyrimidine residue. Several evolutionarily conserved sequence motifs are found in U2 including the branch-site recognition sequence (BPS), an Sm-site motif and conserved sequences in loops III and IV; these are also present in T. thermophila U2 (see Fig. 4). Loop IV contains the major determinant for binding of


the U2-specific proteins U2-A’ and B” (Scherly et al., 1990). Interestingly, one of the differences between the two expressed T. thermophila U2 snRNAs falls within this putative protein-binding motif (position 162) (see Fig. 4).

(iii) U5

Figure 4 shows the predicted secondary structure of the T. thermophila U5 snRNA, including the variations related to the putative transcript of the U5-2 gene. Three of the variations fall within single-stranded regions and one (position 59) falls within the predicted stem Ib and constitutes a neutral mutation. U5 from all species contains two conserved sequence motifs, an Sm-site that is located in the single-stranded region connecting hairpin I and II and a sequence motif that forms stem/loop Ic and the 3’ part of the internal bulge, IL1. Both of these elements are found in the T. thermophila U5 and except for the basal base-pair of stem/loop Ic, the sequence of this entire structure is identical between T. thermophila and mammals.

(iv) U4/U6

The most notable feature of U4 and U6 is their capability of forming two intermolecular helices (Brow & Guthrie, 1988) and these structures can also be formed between T. thermophila U4 and U6. In comparison with other organisms, the regions of the T. thermophila U4 and U6 that are predicted to form stem II contain several base changes, thus providing additional phylogenetic support for this structure (see Fig. 4). In contrast, the sequences of U4 and U6 that form stem I are not informative, since they fall within the phylogenetic variations already known (Guthrie & Patterson, 1988; Reddy & Gupta, 1990). The T. thermophila U4 snRNA is the second smallest U4 snRNA characterized so far. Nevertheless, the predicted secondary structure is closely similar to the U4 consensus structure and it contains the evolutionarily conserved sequence motif in the 5’ hairpin loop and a putative Sm-site.
It has been noted that the sequence of U6 is remarkably conserved in evolution and this observation can now be extended to T. thermophila. Thus, with the exception of a single variation (an insertion of a G residue at position 66) the sequence from positions 23 to 79 is identical between T. thermophila and human U6. The only evolutionarily conserved U6 secondary structure is a small hairpin at the 5’ end (Brow & Guthrie, 1988) and this structure can also be formed with T. thermophila U6.

4. Discussion

(a) Possible snRNA-snRNA base-pairing interactions

It has been speculated for some time that pre-mRNA splicing is an RNA-catalysed reaction (Cech, 1986; Jacquier, 1990) and by analogy to splicing of


Figure 4. The predicted secondary structure of the T. thermophila spliceosomal snRNAs based on a comparison to phylogenetically supported models (Guthrie & Patterson, 1988). The putative Sm-site of U1, U2, U4 and U5, the strictly conserved sequence at the 5’ end of U1 (5’ splice-site recognition sequence; 5’ ss rec) and the conserved branch point sequence of U2 (BPS) are underlined. Alternative bases in U2 and U5 are indicated with a subscript that denotes the corresponding genes (see Fig. 1). The graphical presentation and designation of structural elements are presented according to Guthrie & Patterson (1988) except U2, which is presented according to Ares & Igel (1990) and Hartshorne & Agabian (1990). 5’-X designates the cap structure and an N designates any unspecified nucleotide (see legend to Fig. 1 for details). Y designates a pseudoknot. Numbering starts at the 1st encoded nucleotide. Domains in U2 (positions 4 to 11) and U6 (positions 86 to 93) that are proposed to base-pair are boxed (see Discussion for details).
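As an illustration of how paired domains such as those boxed in Figure 4 can be located computationally, the following minimal Python sketch scans two RNA sequences for the longest run of consecutive antiparallel base pairs and reports its length and 1-based start positions. It is not the procedure used in this study: the function name `longest_helix`, the decision to accept G-U wobble pairs alongside Watson-Crick pairs, and both example sequences are assumptions made for illustration only (the sequences are hypothetical toy strings, not snRNA sequences).

```python
# Assumption: Watson-Crick pairs plus the G-U wobble pair count as base pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def longest_helix(rna_a, rna_b):
    """Return (length, start_a, start_b) for the longest run of consecutive
    antiparallel base pairs between two sequences written 5'->3'.

    start_a and start_b are the 1-based positions of the 5'-most paired
    nucleotide in rna_a and rna_b. Returns (0, 0, 0) if nothing pairs.
    """
    rev_b = rna_b[::-1]  # read rna_b 3'->5' so that it runs antiparallel to rna_a
    best = (0, 0, 0)
    prev = [0] * (len(rev_b) + 1)
    for i in range(1, len(rna_a) + 1):
        cur = [0] * (len(rev_b) + 1)
        for j in range(1, len(rev_b) + 1):
            if (rna_a[i - 1], rev_b[j - 1]) in PAIRS:
                # Extend the run that ended one step earlier on the same diagonal.
                cur[j] = prev[j - 1] + 1
                if cur[j] > best[0]:
                    start_a = i - cur[j] + 1
                    start_b = len(rna_b) - j + 1  # map back to 5'->3' numbering
                    best = (cur[j], start_a, start_b)
        prev = cur
    return best


# Hypothetical toy sequences; positions 4-9 of toy_a complement positions 3-8 of toy_b.
toy_a = "CCCGAUGCACCC"
toy_b = "AAUGCAUCAAAA"
print(longest_helix(toy_a, toy_b))  # (6, 4, 3)
```

Reporting positions in the 5’ to 3’ numbering of each sequence mirrors the way the paired domains are described in the legend (for example, positions 4 to 11 in U2 and 86 to 93 in U6).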

group I and II introns, base-pairing between the various RNA components (pre-mRNA and snRNAs) is thought to be important in creating the catalytic structure(s). We have used the available sequence data of the T. thermophila spliceosomal snRNAs in an attempt to identify snRNA-snRNA interactions that would be consistent with both phylogenetic data and the current consensus structures of the snRNAs (Guthrie & Patterson, 1988; Hartshorne & Agabian, 1990). In particular, we have searched for interactions that would bring the intron 5’ splice-site into close proximity with the intron branch-site sequence, a proximity that is likely to be required for the first transesterification reaction in splicing. So far our analysis has enabled us to identify three putative snRNA-snRNA base-pairing interactions. Two of these occur between U2 and U6, termed U2/U6-I and -II, and one occurs between U1 and U2, termed U1/U2-I.

(i) U2/U6-I

This interaction occurs between the 5’ end of U2 and the 3’ end of U6 and corresponds to the interaction that has recently been identified between U2 and U6 by psoralen-crosslinking in HeLa cell extracts (Hausner et al., 1990). As noted by these authors, a similar interaction is possible between sets of U2 and U6 from all other species. However, the precise number of base-pairs in the predicted U2/U6-I helix varies considerably among species (6 consecutive base-pairs in humans; 11 base-pairs in S. cerevisiae) as does the precise position of the U2 domain (nt positions 1 to 7 in T. brucei; nt positions 6 to 13 in Pisum sativum) and the U6 domain (9 to 13 nt 3’ to a highly conserved sequence motif, consensus AGUA/CGGAA) (Hausner et al., 1990). Thus, the interaction is not phylogenetically supported in the sense that neutral and compensatory base changes correlate with a defined and precisely located structure. As discussed by Hausner et al. (1990), several studies, mainly from yeast, indicate that the U2/U6-I interaction is dispensable for both splicing and viability. However, Datta & Weiner (1991) and Wu & Manley (1991) have recently provided genetic evidence in support of the proposed structure in HeLa cells by demonstrating the rescue of a U2 splicing mutant by a compensatory base-pair change in U6. In T. thermophila, the U2/U6-I interaction occurs between nucleotides at positions 4 to 11 in U2 and nucleotides at positions 86 to 93 in U6 (boxed in Fig. 4). The predicted helix contains eight consecutive base-pairs. This interaction overlaps partly with a previously proposed interaction (McPheeters et al., 1989). The position of the U2/U6-II domain is highly conserved in evolution. As depicted in Figure 5, the U2 domain (14 nt) is located immediately downstream from the U2 branch-site sequence and the U6 domain (15 nt) is located three nucleotides upstream from the U6 domain involved in U4/U6 stem I formation. The proposed U2/U6-II helix


consists of either 11 or 12 base-pairs, and in all organisms but T. brucei, the structure of the helix is extremely well conserved. Phylogenetic support for the proposed interaction is good, four base-paired positions being supported by neutral base changes in U6 and one being supported by a neutral base change in U2 (marked by asterisks in Fig. 5). The 5’ part of the predicted T. brucei structure is quite different from the consensus structure shown in Figure 5. However, an alternative consensus structure can be drawn in which the invariant 5’ part of the U2 domain (5’ UCUGU 3’) is base-paired to the invariant 3’ part of the U6 domain (5’ ACAGA 3’) and this alternative structure closely resembles the predicted T. brucei structure shown in Figure 5. This rearrangement, however, disrupts two base-pairs that are supported by neutral base changes in U6. Thus, we favour the consensus structure depicted in Figure 5.

The available experimental data relating to the proposed U2/U6-II interaction are difficult to interpret unambiguously. Consistent with the proposed interaction, deletions that cover part or all of the involved U2 or U6 domains abolish spliceosome assembly and splicing (Fabrizio et al., 1989; Bindereif et al., 1990; Hamm et al., 1989; McPheeters et al., 1989). However, none of these experiments specifically investigated the U2 and U6 domains in question, and as such the data cannot be interpreted exclusively in terms of the proposed interaction. Point mutations in the S. cerevisiae U6 domain (positions 37, 38, 41, 42 and 44) and U2 domain (positions 50 and 53) that disrupt single base-pairs have little or no apparent effect on cell viability or in vitro splicing (Madhani et al., 1990; Fabrizio & Abelson, 1990), suggesting either that the U2/U6-II interaction is tolerant of small changes, or that the effects of these changes are too subtle to be detected by the assay used. Likewise, a triple deletion (38UGA40) in U6 of S. cerevisiae, predicted to disrupt two base-pairs, has no effect on splicing in vitro (Fabrizio & Abelson, 1990). In contrast, a triple mutation in Xenopus laevis U6 (35AAC37 to 35UUG37), predicted to cause some structural rearrangements of the helix, has an intermediate effect on splicing (Vankan et al., 1990). Point mutations in the strictly conserved U6 sequence, ACAGA (positions 47 to 51), result in a range of growth phenotypes and in vitro splicing defects in S. cerevisiae (Madhani et al., 1990; Fabrizio & Abelson, 1990) that do not fully correlate with the predicted effect on base-pairs at positions 47, 50 and 51. However, given the strict conservation of this U6 sequence motif, it seems likely that it has a role in splicing beyond the ability to form the proposed helix. In U2, part of the domain (positions 48 to 53) forms half of stem IIa in the U2 snRNP (see Fig. 4), and the integrity of this structure is important for cell viability and splicing in S. cerevisiae (Ares & Igel, 1990). However, mutations in stem IIa affected splicing in a way that was not unambiguously correlated with the expected severity of helical disruption, suggesting


Figure 5. Graphical representation of the proposed base-pairing interaction, U2/U6-II. Phylogenetic data were obtained from Guthrie & Patterson (1988), Reddy (1988), Reddy & Gupta (1990) and Hanley & Schuler (1991; Pisum sativum). Phylogenetically supported base-pairs in the consensus structure are marked by an asterisk. R and Y designate purine and pyrimidine residues, respectively. An A/U in the consensus structure designates an A or a U residue at this position. In all organisms the 3’ end of the U2 domain reads AG. In C. elegans U2 the A nucleotide is lacking, thereby moving the terminal G 1 nucleotide to the 5’ and shortening the predicted 3’ half interaction by 1 base-pair. The 3’ end of the proposed U6 domain is located 3 nt 5’ to the 1st base-pair in U4/U6 stem I, except in T. brucei where it is located 4 nt 5’ to the stem.

that the nucleotides involved in formation of this stem have additional roles in splicing (Ares & Igel, 1990). It is interesting to note that the only stem IIa-disruptive mutation that did not appear to affect cell viability and splicing corresponds to a U to A change of the bulged residue at position 49 in the proposed U2/U6-II interaction, and this is a phylogenetically established change (see Fig. 5). The U2/U6-II interaction was not found in the in vitro crosslinking experiments by Hausner et al. (1990) that revealed the U2/U6-I interaction. The reason for this could be that the latter structure is more accessible to crosslinking. It has been noted that the relaxed helix geometry of a terminal duplex, such as U2/U6-I, favors psoralen-crosslinking (Cimino et al., 1985).

(iii) U1/U2-I

There are several indications of a U1 and U2 snRNP interaction. Thus both snRNPs can be immunoprecipitated by monospecific antibodies against either U1 or U2 snRNP (Mattaj et al., 1986) and U1 snRNP has a role in mediating U2 snRNP binding to the pre-mRNA in vitro (Barabino et al., 1990). Most interestingly, co-immunoprecipitation of U2 snRNPs by U1 monospecific antibodies is abolished by deletions of U2 snRNA sequences that are not required for U2 snRNP assembly (Mattaj & de Robertis, 1985), suggesting that part of the U1/U2 snRNP interaction may be caused by direct snRNA-snRNA base-pairing interactions. In our analysis of possible base-pairing interactions between U1 and U2 snRNA, we have noted a small interaction that is consistent with both phylogenetic data and the current snRNA consensus secondary structures. It involves the dinucleotide immediately 3’ to the U1 long-range interaction helix and the invariant UC dinucleotide immediately upstream from the U2 branch-site sequence (see Fig. 6). In most organisms, the U1 dinucleotides read AU, but in D. melanogaster and T. thermophila they read GU and AC, respectively, both of which constitute neutral base changes with respect to the U2 dinucleotide. In Caenorhabditis elegans, the U1 dinucleotide reads UU, and base-pairing to U2 would therefore require shifting the interaction one nucleotide 5’ in U2, to the following GA dinucleotide (see Fig. 6). Clearly, suggesting a


Figure 6. Graphical representation of the proposed bipartite interaction between the 5’ end of U2 and 3’ end of U1 (U1/U2-I). Phylogenetic data were obtained from Guthrie & Patterson (1988), Reddy (1988), Reddy & Gupta (1990) and partial pea U1 sequences from Krol & Ebel (1983). “Bean species” designates that the U1 and U2 sequences are from Phaseolus vulgaris (U1) and Vicia faba (U2). R and Y in the consensus structure designate purine and pyrimidine residues, respectively. In C. elegans, the U1 dinucleotide reads UU, and base-pairing to U2 is predicted to shift 1 nucleotide 5’ to the following GA dinucleotide. The phylogenetically variable 2nd part of the interaction is indicated by filled bars in the U1 and U2 snRNA connected by broken lines. The position of the U1 Sm-site and U2 BPS are indicated by bars.

dinucleotide interaction based on the limited amount of phylogenetic data available at present is highly speculative and, in fact, the interspecies variation in the U1 dinucleotide is consistent with any of several alternative interactions. However, the dinucleotide interaction is part of a larger interaction that can occur between domains further 3’ and 5’ in U1 and U2, respectively (see Fig. 6). The second interaction exhibits considerable phylogenetic variation with respect to the precise location of the interacting domains, in the structure of the helix and in the number of base-pairs involved. However, this is also the case with the experimentally supported U2/U6-I interaction (Hausner et al., 1990; Datta & Weiner, 1991; Wu & Manley, 1991). Precise deletion of the X. laevis U2 domain (positions 14 to 24) involved in the second half of the U1/U2-I interaction abolishes U1/U2 snRNP co-immunoprecipitation (Mattaj et al., 1986) and slightly reduces the U2 snRNP’s ability to complement splicing in Xenopus oocytes (Hamm et al., 1989). In conclusion, we imagine a bipartite U1/U2 interaction consisting of a two base-pair interaction, the position of which is conserved, and a more loosely defined interaction, which may function to stabilize the dinucleotide interaction.

Combining the three snRNA-snRNA interactions proposed above and the known snRNA-pre-mRNA interactions, we arrive at the model shown in Figure 7. The model is a static model focusing on interactions that could promote the first step of splicing, i.e. the cleavage at the 5’ splice-site and the concomitant formation of a 2’-5’ phosphodiester bond at the intron branch-site. The model does not incorporate any interactions at the 3’ splice-site and consequently does not shed light on the problem of bringing the 5’ and 3’ exons together for exon ligation during the second step of splicing. Furthermore, U4/U6 snRNA interactions are excluded from the model because it is unclear to what extent these interactions are present during the first transesterification step (Lamond et al., 1988; Blencowe et al., 1989). In our model, the 5’ splice-site/U1 interaction is brought into close proximity to the intron branch-site/U2 interaction by the U1/U2-I interaction. The most conserved part of U6, which has been speculated to act as catalyst in the splicing reaction (Brow & Guthrie, 1989), is positioned in close contact to the 5’ splice-site junction as well as to the branch point site by virtue of the U2/U6-II interaction. The resulting structure is further stabilized by the U2/U6-I “long-range” interaction. We note that the topology of the overall structure depicted in Figure 7 resembles


Figure 7. Graphical representation of the combined U2/U6-I, U2/U6-II and U1/U2-I interactions. The sequences of U1, U2 and U6 are those of rat (Reddy, 1988) because the nature and location of modified bases are known in this organism (modified nucleotides are indicated with a subscript according to Reddy (1988) and pseudouridine residues are indicated by the letter Ψ). In respect of these modified bases (in particular pseudouridine residues), we note that a significant proportion are located within the known or proposed snRNA-snRNA or snRNA-pre-mRNA interaction domains. The pre-mRNA substrate is indicated as broken lines (5’ and 3’ exons) and a continuous line (intron). The mammalian consensus sequences for the 5’ splice site, branch-site sequence and 3’ splice site sequence (Zieve & Sauterer, 1990) are used in the presentation. The interaction between the intron BPS and the U2 BPS is presented as a bulged helix structure as proposed for the corresponding interaction in S. cerevisiae (Parker et al., 1987). The proposed U2/U6-I and -II and U1/U2-I interactions are boxed. The arrow indicates the 2 nucleotides that are covalently joined by a 2’-5’ phosphodiester bond in the 1st trans-esterification step. The part of U6 snRNA that base-pairs to U4 snRNA (Brow & Guthrie, 1988) is underlined by a broken line.
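Base-pairing proposals of the kind summarized in Figure 7 can be checked mechanically. As a minimal sketch (not the authors' procedure), the Python snippet below pairs two equal-length RNA segments in antiparallel orientation and reports which positions form Watson-Crick pairs, optionally also accepting G-U wobble pairs. The function name `antiparallel_pairs` and the `allow_wobble` switch are hypothetical; the two example motifs are the invariant U2 (5’ UCUGU 3’) and U6 (5’ ACAGA 3’) sequences quoted in the Discussion of the alternative U2/U6-II consensus structure.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}  # assumption: G-U pairs may optionally be counted


def antiparallel_pairs(strand_a, strand_b, allow_wobble=True):
    """Pair strand_a against strand_b (both written 5'->3') in antiparallel
    orientation; return one (base_a, base_b, paired) tuple per position."""
    if len(strand_a) != len(strand_b):
        raise ValueError("strands must have equal length")
    allowed = WATSON_CRICK | WOBBLE if allow_wobble else WATSON_CRICK
    # Reversing strand_b aligns its 3' end with the 5' end of strand_a.
    return [(a, b, (a, b) in allowed) for a, b in zip(strand_a, reversed(strand_b))]


u2_motif = "UCUGU"  # invariant 5' part of the U2 domain, as quoted in the text
u6_motif = "ACAGA"  # invariant 3' part of the U6 domain, as quoted in the text

pairs = antiparallel_pairs(u2_motif, u6_motif, allow_wobble=False)
for base_a, base_b, ok in pairs:
    print(f"{base_a}-{base_b}", "paired" if ok else "unpaired")
print("fully paired:", all(ok for _, _, ok in pairs))
```

With these inputs every position forms a Watson-Crick pair, which is consistent with the statement that the two invariant motifs can be base-paired to each other.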

that of a hammerhead ribozyme (Forster & Symons, 1987). We have not provided any experimental support for the model. However, we hope that our model can serve as a framework for further experimental and phylogenetic studies.

We are grateful to Dr Nina Pedersen for help and discussions throughout this work, to Dr Astrid K. N. Iversen for supplying snRNA components during the initial phase of this work and to Dr Otto Dahl for providing oligonucleotides. The work was supported by the Danish Natural Science and Medical Science Research Councils. This project was part of a program under the Research Center for Medical Biotechnology, University of Copenhagen. The nucleotide sequences reported will appear in the EMBL, GenBank and DDBJ Nucleotide Sequence Databases under the accession numbers: X58841 TTU5.1; X58842 TTU2.1; X58843 TTU6.1; X58844 TTU4.1; X58845 TTU1.1; X58846 TTU2.2.

References

Altura, R., Rymond, B., Séraphin, B. & Rosbash, M. (1989). Sequence requirements for branch formation in a group II self-splicing intron. Nucl. Acids Res. 17, 335-353.
Ares, M., Jr & Igel, A. H. (1990). Lethal and temperature-sensitive mutations and their suppressors identify an essential structural element in U2 small nuclear RNA. Genes Develop. 4, 2132-2145.
Barabino, S. M. L., Blencowe, B. J., Ryder, U., Sproat, B. S. & Lamond, A. I. (1990). Targeted snRNP depletion reveals an additional role for mammalian U1 snRNP in spliceosome assembly. Cell, 63, 293-302.


Bindereif, A., Wolff, T. & Green, M. R. (1990). Discrete domains of human U6 snRNA required for the assembly of U4/U6 snRNP and splicing complexes. EMBO J. 9, 251-255.
Blencowe, B. J., Sproat, B. S., Ryder, U., Barabino, S. & Lamond, A. I. (1989). Antisense probing of the human U4/U6 snRNP with biotinylated 2’-OMe RNA oligonucleotides. Cell, 59, 531-539.
Brow, D. A. & Guthrie, C. (1988). Spliceosomal RNA U6 is remarkably conserved from yeast to mammals. Nature (London), 334, 213-218.
Brow, D. A. & Guthrie, C. (1989). Splicing a spliceosomal RNA. Nature (London), 337, 14-15.
Bruzik, J. P. & Steitz, J. A. (1990). Spliced leader RNA sequences can substitute for the essential 5’ end of U1 RNA during splicing in a mammalian in vitro system. Cell, 62, 889-899.
Cech, T. R. (1986). The generality of self-splicing RNA: relationship to nuclear mRNA splicing. Cell, 44, 207-210.
Cimino, G. D., Gamper, H. B., Isaacs, S. T. & Hearst, J. E. (1985). Psoralens as photoactive probes of nucleic acid structure and function: organic chemistry, photochemistry and biochemistry. Annu. Rev. Biochem. 54, 1151-1193.
Company, M., Arenas, J. & Abelson, J. (1990). Requirement of the RNA helicase-like protein PRP22 for release of messenger RNA from spliceosomes. Nature (London), 349, 487-493.
Csank, C., Taylor, F. M. & Martindale, D. W. (1990). Nuclear pre-mRNA introns: analysis and comparison of intron sequences from Tetrahymena thermophila and other eukaryotes. Nucl. Acids Res. 18, 5133-5141.
Dahlberg, J. E. & Lund, E. (1988). In Structure and Function of Major and Minor Small Nuclear Ribonucleoprotein Particles (Birnstiel, M. L., ed.), pp. 38-70. Springer-Verlag, Berlin.
Datta, B. & Weiner, A. M. (1991). Genetic evidence for base pairing between U2 and U6 snRNA in mammalian mRNA splicing. Nature (London), 352, 821-824.
Doudna, J. A. & Szostak, J. W. (1989). RNA-catalysed synthesis of complementary strand RNA. Nature (London), 339, 519-522.
England, T. E., Gumport, R. I. & Uhlenbeck, O. C. (1977). Dinucleoside pyrophosphates are substrates for T4-induced RNA ligase. Proc. Nat. Acad. Sci., U.S.A. 74, 4839-4842.
Fabrizio, P. & Abelson, J. (1990). Two domains of yeast U6 small nuclear RNA required for both steps of nuclear precursor messenger RNA splicing. Science, 250, 404-409.
Fabrizio, P., McPheeters, D. S. & Abelson, J. (1989). In vitro assembly of yeast U6 snRNP: a functional assay. Genes Develop. 3, 2137-2150.
Forster, A. C. & Symons, R. H. (1987). Self-cleavage of plus and minus RNAs of a virusoid and a structural model for the active sites. Cell, 49, 211-220.
Guthrie, C. & Patterson, B. (1988). Spliceosomal snRNAs. Annu. Rev. Genet. 22, 387-419.
Hamm, J., van Santen, V. L., Spritz, R. A. & Mattaj, I. W. (1988). Loop I of U1 small nuclear RNA is the only essential RNA sequence for binding of specific U1 small nuclear ribonucleoprotein particle proteins. Mol. Cell. Biol. 8, 4787-4791.
Hamm, J., Dathan, N. A. & Mattaj, I. W. (1989). Functional analysis of mutant Xenopus U2 snRNAs. Cell, 59, 159-169.


Hanley, B. A. & Schuler, M. A. (1991). DNA cloning of U1, U2, U4 and U5 snRNA families expressed in pea nuclei. Nucl. Acids Res. 19, 1861-1869.
Hartshorne, T. & Agabian, N. (1990). A new U2 RNA secondary structure provided by phylogenetic analysis of trypanosomatid U2 RNAs. Genes Develop. 4, 2121-2131.
Hausner, T. P., Giglio, L. M. & Weiner, A. M. (1990). Evidence for base-pairing between mammalian U2 and U6 small nuclear ribonucleoprotein particles. Genes Develop. 4, 2146-2156.
Horst, G. van der, Christian, A. & Inoue, T. (1991). Reconstitution of a group I intron self-splicing reaction with an activator RNA. Proc. Nat. Acad. Sci., U.S.A. 88, 184-188.
Jacob, M. & Gallinaro, H. (1989). The 5’ splice site: phylogenetic evolution and variable geometry of association with U1 RNA. Nucl. Acids Res. 17, 2159-2178.
Jacquier, A. (1990). Self-splicing group II and nuclear pre-mRNA introns: how similar are they? Trends Biochem. Sci. 15, 351-354.
Krol, A. & Ebel, J.-P. (1983). U1, U2 and U5 small RNAs are found in plants cells. Complete nucleotide sequence of the U5 RNA family from pea nuclei. Nucl. Acids Res. 11, 8583-8594.
Lamond, A. I., Konarska, M. M., Grabowski, P. J. & Sharp, P. A. (1988). Spliceosome assembly involves the binding and release of U4 small nuclear ribonucleoprotein. Proc. Nat. Acad. Sci., U.S.A. 85, 411-415.
Liao, X., Kretzner, L., Séraphin, B. & Rosbash, M. (1990). Universally conserved and yeast-specific U1 snRNA sequences are important but not essential for U1 snRNP function. Genes Develop. 4, 1766-1774.
Lührmann, R. (1988). In Structure and Function of Major and Minor Small Nuclear Ribonucleoprotein Particles (Birnstiel, M. L., ed.), pp. 71-79. Springer-Verlag, Berlin.
Madhani, H. D., Bordonné, R. & Guthrie, C. (1990). Multiple roles for U6 snRNA in the splicing pathway. Genes Develop. 4, 2264-2277.

Mattaj, I. W. (1988). UsnRNP assembly and transport. In Structure and Function of Major and Minor Small Nuclear Ribonucleoprotein Particles (Birnstiel, M. L., ed.), pp. 100-114. Springer-Verlag, Berlin.
Mattaj, I. W. & de Robertis, E. M. (1985). Nuclear segregation of U2 snRNA requires binding of specific snRNP proteins. Cell, 40, 111-118.
Mattaj, I. W., Habets, W. J. & van Venrooij, W. J. (1986). Monospecific antibodies reveal details of U2 snRNP structure and interaction between U1 and U2 snRNPs. EMBO J. 5, 997-1002.
McPheeters, D. S., Fabrizio, P. & Abelson, J. (1989). In vitro reconstitution of functional yeast U2 snRNPs. Genes Develop. 3, 2124-2136.
Myslinski, E., Wilhelm, F.-X. & Branlant, C. (1989). A structural analysis of P. polycephalum U1 RNA at the RNA and gene levels. Are there differentially expressed U1 RNA genes in P. polycephalum? Nucl. Acids Res. 17, 1019-1034.
Parker, R., Siliciano, P. G. & Guthrie, C. (1987). Recognition of the TACTAAC box during mRNA splicing in yeast involves base pairing to the U2-like snRNA. Cell, 49, 229-239.
Patton, J. R., Habets, W., van Venrooij, W. J. & Pederson, T. (1989). U1 small nuclear ribonucleoprotein particle-specific proteins interact with the first and second stem-loops of U1 RNA, with the A protein binding directly to the RNA independently of the 70 K and Sm proteins. Mol. Cell. Biol. 9, 3360-3368.
Peattie, D. A. (1979). Direct chemical method for sequencing RNA. Proc. Nat. Acad. Sci., U.S.A. 76, 1760-1764.
Pedersen, N., Hellung-Larsen, P. & Engberg, J. (1985). Small nuclear RNAs in the ciliate Tetrahymena. Nucl. Acids Res. 13, 4203-4223.

Reddy, R. (1988). Compilation of small RNA sequences. Nucl. Acids Res. 16, r71-r85.
Reddy, R. & Busch, H. (1988). In Structure and Function of Major and Minor Small Nuclear Ribonucleoprotein Particles (Birnstiel, M. L., ed.), pp. 1-37. Springer-Verlag, Berlin.
Reddy, R. & Gupta, S. (1990). Compilation of small RNA sequences. Nucl. Acids Res. 18, 2231-2235.
Ryder, U., Sproat, B. S. & Lamond, A. I. (1990). Sequence-specific affinity selection of mammalian splicing complexes. Nucl. Acids Res. 18, 7373-7379.
Rymond, B. C. & Rosbash, M. (1988). A chemical modification/interference study of yeast pre-mRNA spliceosome assembly and splicing. Genes Develop. 2, 428-439.

Scherly, D., Boelens, W., van Venrooij, W. J., Dathan, N. A., Hamm, J. & Mattaj, I. W. (1989). Identification of the RNA binding segment of human U1 A protein and definition of its binding site on U1 snRNA. EMBO J. 8, 4163-4170.
Scherly, D., Boelens, W., Dathan, N. A., van Venrooij, W. J. & Mattaj, I. W. (1990). Major determinants of the specificity of interaction between small nuclear ribonucleoproteins U1A and U2B and their cognate RNAs. Nature (London), 345, 502-506.
Schwer, B. & Guthrie, C. (1991). PRP16 is an RNA-dependent ATPase that interacts transiently with the spliceosome. Nature (London), 349, 494-499.
Séraphin, B., Kretzner, L. & Rosbash, M. (1988). A U1 snRNA pre-mRNA base-pairing interaction is required early in yeast spliceosome assembly but does not uniquely define the 5’ cleavage site. EMBO J. 7, 2533-2538.
Siliciano, P. G. & Guthrie, C. (1988). 5’ splice site selection in yeast: genetic alterations in base-pairing with U1 reveal additional requirements. Genes Develop. 2, 1258-1267.
Vankan, P., McGuigan, C. & Mattaj, I. W. (1990). Domains of U5 and U6 for snRNP assembly and splicing complementation in Xenopus oocytes. EMBO J. 9, 3397-3404.
Woolford, J. L., Jr (1989). Nuclear pre-mRNA splicing in yeast. Yeast, 5, 439-457.
Wu, J. & Manley, J. L. (1989). Mammalian pre-mRNA branch site selection by U2 snRNP involves base pairing. Genes Develop. 3, 1553-1561.
Wu,
