This article was downloaded by: [Rutgers University] On: 08 April 2015, At: 15:24 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

Journal of Biomolecular Structure and Dynamics Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/tbsd20

Structural Analysis of a Group II Intron by Chemical Modifications and Minimal Energy Calculations

Jan H.J.M. Kwakman a, Danielle A.M. Konings a, Paulien Hogeweg a, Herman J. Pel a & Leslie A. Grivell a

a Section for Molecular Biology, Department of Molecular Cell Biology, University of Amsterdam, Kruislaan 318, 1098 SM Amsterdam, The Netherlands

Published online: 21 May 2012.

To cite this article: Jan H.J.M. Kwakman , Danielle A.M. Konings , Paulien Hogeweg , Herman J. Pel & Leslie A. Grivell (1990) Structural Analysis of a Group II Intron by Chemical Modifications and Minimal Energy Calculations, Journal of Biomolecular Structure and Dynamics, 8:2, 413-430, DOI: 10.1080/07391102.1990.10507813 To link to this article: http://dx.doi.org/10.1080/07391102.1990.10507813


Journal of Biomolecular Structure & Dynamics, ISSN 0739-1102, Volume 8, Issue Number 2 (1990), © Adenine Press (1990).

Structural Analysis of a Group II Intron by Chemical Modifications and Minimal Energy Calculations


Jan H.J.M. Kwakman, Danielle A.M. Konings*, Paulien Hogeweg†, Herman J. Pel, and Leslie A. Grivell

Section for Molecular Biology, Department of Molecular Cell Biology, University of Amsterdam, Kruislaan 318, 1098 SM Amsterdam, The Netherlands

Abstract

Folding of the yeast mitochondrial group II intron aI5c has been analysed by chemical modification of the in vitro synthesised RNA with dimethylsulfate and diethylpyrocarbonate. Computer calculations of the intron secondary structure through minimization of free energy were also performed in order to study thermodynamic properties of the intron and to relate these to data obtained from chemical modification. Comparison of the two sets of data with the current phylogenetic model structure of the intron aI5 reveals close agreement, thus lending strong support for the existence of a typical group II intron core structure comprising six neighbouring stem-loop domains. Local discrepancies between the experimental data and the model structures have been analyzed by reference to thermodynamic properties of the structure. This shows that use of the latest refined set of free energy values improves the structure calculation significantly.

*Section for Biocomputing, EMBL, Meyerhofstrasse 1, D6900 Heidelberg, West-Germany. Present address: Laboratory of Mathematical Biology, NCI/FCRF, Frederick, MD 21701, USA.
†Bioinformatics Group, University of Utrecht, Padualaan 8, Utrecht, The Netherlands.

Introduction

The genomes of mitochondria and chloroplasts of numerous eukaryotes contain split genes, whose introns can be divided into two separate classes (groups I and II) according to the presence of conserved short sequence stretches and higher order structure elements (1-3). Members of the same class thus share distinctive core conformations, both of which bring the intron borders into relatively close spatial proximity. This structural division in two groups is supported by the mechanism by which some introns of the two classes carry out self-splicing in vitro (4-7). To understand the splicing mechanism and in particular the role of RNA catalysis in this process, more detailed information about the structure of these introns is essential. Extensive biochemical and mutational studies have been performed for group I introns (8). These introns share many phylogenetically conserved structural elements, thus forming a good framework for such studies, which furthermore have been facilitated by the relatively small size of the group I intron present in the large rRNA gene of Tetrahymena. This 414 nt RNA readily lends itself to secondary structural analysis via application of thermodynamic rules and free energy minimization (2, 9, 10). In these theoretical studies biochemical data were used as a reference to evaluate the alternative structures calculated.

In contrast to the many conserved structure elements of group I introns, existing models of group II introns reveal only two phylogenetically well conserved secondary structure elements (domains V and VI, see below) that are located in the 3' part of the intron (1). Nevertheless, a common skeleton core structure comprised of six separate domains of variable lengths (domains I-VI) has been proposed for the group II introns, primarily on the basis of a pairwise comparison (introns aI1/aI2: (1); introns aI5/bI1: (11)). For the best studied group II introns (aI1, aI2, aI5 and bI1) some of the peripheral structures of domains I and III have also been reconstructed by phylogenetic comparison between the most closely related introns (aI5 versus bI1 and aI2 versus aI1). For group II, the introns aI1, aI5 and bI1 have been shown to possess self-splicing activity in vitro. In addition, the introns aI1 and aI2 contain long open reading frames, whose encoded proteins may play a role in their in vivo splicing.

Little structure-function information for the group II introns is available because of the limited number of in vivo mutants known within this group. In vitro mutagenesis, on the other hand, can only sensibly be performed if more information is available about the RNA structure in solution. Computer calculations of the secondary structures of the group II introns, for example by minimal energy folding, are relatively difficult because the introns are rather long and show an over-representation of A and U residues (e.g. for aI1, aI2, aI5 and bI1: A+U > 75%), which raises the number of alternative interactions.
This paper describes the structural analysis of almost 90% of the self-splicing group II intron aI5 by chemical modification with dimethylsulfate (DMS) and diethylpyrocarbonate (DEPC) (8), and of the entire intron by computer calculations through minimization of free energy. We find that our biochemical modification data closely match the phylogenetically proposed secondary structure model deduced for the largest part of the intron aI5, as well as the minimal energy foldings of most parts of the intron. The analysis by minimal energy folding allows us to explain those modification patterns that display local deviations from the phylogenetic model structure, and may pinpoint features of RNA folding rules that are still imperfectly understood. The analysis also shows that secondary structure calculation can be improved significantly by application of the most recent set of free energy values (12).

Materials and Methods

Template

Chemical modifications were carried out on a precursor RNA transcribed from the mutant pT7+T856, which contains an additional uridine residue inserted at position 856 of the intron. This residue lies opposite to and basepairs with the branchpoint adenosine in domain VI. Splicing of this precursor RNA occurs only at high ionic strength (13). The mutant template was constructed by cloning the EcoRI-HindIII fragment of pS/aI5+T856 (13) in the vector pEP40 × EcoRI × HindIII, which is a derivative of pGEM4 containing the T7 and SP6 promoters (Promega). The orientation was such that the coding strand was transcribed from the T7 RNA polymerase promoter.


Transcription

Transcription reactions were performed in a volume of 20 µl containing 1 µg EcoRI-linearized DNA, 5 mM DTT, 2 mM spermidine, 5 mM MgCl2, 40 mM Tris pH 8.0, 0.5 mM GTP, CTP, ATP, 10 µCi α-32P-UTP (5000 Ci/mmol), 1 u RNAsin and 1 u T7 RNA polymerase. Incubations were at 37°C for 30 min. The transcription mix was phenol-extracted, purified over a 1 ml Sephadex-G50 column and alcohol-precipitated.

DMS Modification

Precursor RNA (2.5 pmole) was purified from polyacrylamide and modified essentially as described by Peattie and Gilbert (14). The RNA precipitate was dissolved in 380 µl 50 mM sodium cacodylate pH 7.5 with 12 µg yeast tRNA as carrier. 90 µl was kept apart as an unmodified control. 1 µl DMS (BDH) was added to the rest, which was divided into three parts: one third was modified for 1 minute at 90°C with the addition of 1 mM EDTA (denatured RNA), and the other two for 30 minutes at 37°C with the addition of 5 mM MgCl2 or 100 mM MgCl2 (native RNA). The reaction was stopped by alcohol-precipitation. Modified nucleotides were detected by primer extension analysis followed by separation on a 6% polyacrylamide gel containing 7 M urea.

DEPC Modification

Precursor RNA was purified from polyacrylamide and modified as described (14, 15) in the presence of 12 µg carrier yeast tRNA. The RNA was dissolved in 250 µl 50 mM sodium cacodylate pH 7.5. 50 µl was kept apart as untreated RNA. 2 µl of fresh DEPC (Sigma) was added to the rest, which was divided in four parts. Modification was performed for 180 seconds at 90°C in the presence of 1 mM EDTA (denatured) or for 30 minutes at 37°C in the presence of 5 mM, 50 mM or 100 mM MgCl2 (native). The reactions were stopped by alcohol-precipitation. The modified RNA molecules were cleaved by aniline-induced chain scission as described (15) prior to primer extension, followed by separation on a 6% polyacrylamide gel containing 7 M urea.

Primer Extension

A 32P-ATP labelled oligonucleotide was hybridized with the RNA by coprecipitation. The dried pellet was dissolved in 5 µl 50 mM Tris-Cl pH 8.3, 100 mM KCl, 10 mM MgCl2, 10 mM DTT, 0.5 mM dNTPs and 1 u of AMV reverse transcriptase. Incubations were at 42°C for 45 minutes. The reactions were stopped by addition of 3 µl sample mix (0.03% bromphenol blue, 0.03% xylene cyanol, 25 mM EDTA in formamide). The following oligonucleotides were used as primers in reverse transcriptions:


(5'GAAAATGTCCCACCACGTAGTAAG3', pos. 1-24 exon A5),
(5'CACTTGTAATTAGAC3', pos. 73-87 intron aI5),
(5'ATTCACTTTCGTTGAGCAT3', pos. 198-216 intron aI5),
(5'GCTCATTTCCAATAC3', pos. 324-338 intron aI5),
(5'CTTTTACCCGGGTTTATTC3', pos. 413-423 and 580-589 of intron aI5),
(5'CAATCTTTTACCCTCTAA3', pos. 575-593 intron aI5),
(5'GCAGATTTTCATC3', pos. 647-659 intron aI5),
(5'ATAAACCTCCTATCT3', pos. 737-751 intron aI5),
(5'CAAGTTTTCCCTCCGGTAAG3', pos. 848-866 intron aI5) and
(5'GTAGTATCCTGC3', pos. 39-50 exon A6).
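As a bookkeeping aid for the primer list above, the following minimal sketch compares the length of each oligonucleotide with the length of its annotated target span. The sequences and positions are copied from the list; the function names and the table layout are illustrative assumptions. A nonzero difference does not by itself indicate an error: it may reflect a residue that is excluded from the intron numbering (such as the inserted +U856), a primer spanning a deletion junction, or an inaccuracy in the printed positions.

```python
# Primers as listed in the text: (sequence 5'->3', annotated spans, target region).
# The tuple layout and all function names are illustrative, not from the paper.
PRIMERS = [
    ("GAAAATGTCCCACCACGTAGTAAG", [(1, 24)], "exon A5"),
    ("CACTTGTAATTAGAC", [(73, 87)], "intron aI5"),
    ("ATTCACTTTCGTTGAGCAT", [(198, 216)], "intron aI5"),
    ("GCTCATTTCCAATAC", [(324, 338)], "intron aI5"),
    ("CTTTTACCCGGGTTTATTC", [(413, 423), (580, 589)], "intron aI5"),
    ("CAATCTTTTACCCTCTAA", [(575, 593)], "intron aI5"),
    ("GCAGATTTTCATC", [(647, 659)], "intron aI5"),
    ("ATAAACCTCCTATCT", [(737, 751)], "intron aI5"),
    ("CAAGTTTTCCCTCCGGTAAG", [(848, 866)], "intron aI5"),
    ("GTAGTATCCTGC", [(39, 50)], "exon A6"),
]


def span_length(spans):
    """Total number of positions covered by inclusive (start, end) spans."""
    return sum(end - start + 1 for start, end in spans)


def primer_report(primers=PRIMERS):
    """Return one row per primer with its length and annotated span length."""
    rows = []
    for seq, spans, region in primers:
        annotated = span_length(spans)
        rows.append({
            "sequence": seq,
            "region": region,
            "length": len(seq),
            "annotated": annotated,
            "delta": len(seq) - annotated,  # primer length minus span length
        })
    return rows


if __name__ == "__main__":
    for row in primer_report():
        print(f"{row['region']:<11} len={row['length']:>2} "
              f"span={row['annotated']:>2} delta={row['delta']:+d} {row['sequence']}")
```

For the primer annotated as pos. 848-866, the sketch reports a primer one nucleotide longer than its span, which would be consistent with the uridine inserted at position 856 not being counted in the intron numbering.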


Minimal energy foldings

The secondary structure analysis of intron RNA was performed using the minimal energy folding algorithm recently described by Zuker (32). Sub-optimal foldings generated by the programme were also studied to check for relevant alternative configurations, e.g. those matching the modification data in cases where the optimal folding did not. We have previously stressed the importance of knowledge of various possible alternative structures (17). Folding of an RNA by minimizing free energy could in principle be constrained with experimental data on the structure. Experimental data such as those obtained by the modification experiments reported here contain structural information in terms of pairing or non-pairing of isolated residues, but do not contain sufficient information to structure the RNA in two dimensions by defining a specific set of secondary structure interactions. However, in this study we did not use this feature of constrained folding in a strict sense (see section 5). The analysis was performed using the improved set of free energy values given by Jaeger et al. (12).
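The folding programme and the free energy parameters used in this study are not reproduced here. As a much simpler illustration of the dynamic-programming idea behind such calculations, the sketch below maximizes the number of nested base pairs (a Nussinov-style recursion) instead of minimizing free energy. It ignores stacking, loop penalties, dangling ends and tetraloop bonuses, so its output should not be read as a predicted structure for the intron. All names, the allowed pair set (Watson-Crick plus G-U) and the minimum hairpin loop size of three are assumptions made for the example.

```python
# Toy base-pair maximization (Nussinov-style). This is NOT the Zuker
# minimal-energy algorithm used in the paper; it only illustrates the recursion.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(a, b):
    return (a, b) in PAIRS


def nussinov(seq, min_loop=3):
    """Return a sorted list of 0-based (i, j) pairs maximizing nested pairs."""
    n = len(seq)
    best = [[0] * n for _ in range(n)]
    # Fill the table for increasing subsequence lengths.
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case: j is unpaired
            for k in range(i, j - min_loop):  # case: j pairs with k
                if can_pair(seq[k], seq[j]):
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    # Trace back one optimal solution.
    pairs, stack = [], [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if can_pair(seq[k], seq[j]):
                left = best[i][k - 1] if k > i else 0
                if best[i][j] == left + 1 + best[k + 1][j - 1]:
                    pairs.append((k, j))
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return sorted(pairs)


def dot_bracket(seq, pairs):
    """Render pairs in dot-bracket notation."""
    out = ["."] * len(seq)
    for i, j in pairs:
        out[i], out[j] = "(", ")"
    return "".join(out)


if __name__ == "__main__":
    rna = "GGGAAAACCC"  # hypothetical hairpin, not an intron sequence
    print(dot_bracket(rna, nussinov(rna)))
```

A recursion of this kind returns only one optimal solution; inspecting sub-optimal foldings, as done in this study, requires enumerating structures within an energy window of the optimum.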

Results and Discussion

(1) Modification of Intron-Containing RNA by DMS and DEPC

Chemical modifications were carried out on an intron-containing RNA derived from the mutant pT7+T856. Splicing of this mutant RNA occurs only at high ionic strength (13), as the result of the insertion of a uridine residue at position 856 of the intron, a residue which lies opposite to and basepairs with the branchpoint adenosine in domain VI. The mutant is thus a ready source of precursor RNA that is stable under conditions normally used for splicing. Precursor RNAs were purified from in vitro transcription mixes by polyacrylamide gel electrophoresis and modified with either DMS or DEPC as described in the Methods section. In the case of DEPC, modified RNA molecules were cleaved by aniline-induced chain scission as described by Conway and Wickens (15). Modified nucleotides were then detected by primer extension analysis, and protection against chemical modification was scored by comparing the intensities of the stops obtained with denatured and native RNA (see legend to Figure 1). Results obtained are shown in Figures 1 and 2, for a set of primers from which almost 90% of the intron could be screened. Overviews of the DMS and DEPC modification data are presented in Figures 3 and 4, respectively, superimposed on the available model structure of aI5 (see section 3 below).


Figure 1: Dimethylsulfate modification of intron aI5 RNA. Dimethylsulfate modifications of aI5 were performed as described in Methods. Primer extension stops of reverse transcriptase caused by methylation at N1 of adenosine or N3 of cytosine residues are shown in lanes 1 (denatured), lanes 3 (native, 5 mM MgCl2) and lanes 4 (native, 100 mM MgCl2). Natural stops of reverse transcriptase are shown in lanes 2 (unmodified control). The branchpoint A880 is indicated by an asterisk (*). The uridine insertion between nucleotides G855 and G856, which was present in the RNA structure analysed, is depicted as +U856 and has not been included in the numbering of the intron. Bands present in lanes 1 originate from stops of reverse transcriptase at DMS-methylated A's and C's and at the aspecific stops shown in lanes 2. Subtraction of lanes 2 from lanes 1 leaves combined A and C tracks from which the intron sequence can readily be derived. The nature of the RNA structure in which the nucleotides participate was scored by comparing lanes 1 with lanes 3 and 4, the total amount of radioactivity in each lane being taken into consideration. Nucleotides which act as natural stops for reverse transcriptase were not scored for modification behaviour. A decrease in intensity by a factor of 5 or more in the presence of magnesium was scored as presence in a stable double-stranded structure. A smaller decrease was scored as a breathing double-stranded structure. Equal intensity was scored as a single-stranded structure, and an increase in intensity was scored as a bulge or a small internal or terminal loop. Abbreviations: I-VI and A-D (helices of aI5 (Figure 3)), A5 (5' exon), A6 (3' exon), IBS1/IBS2 (intron binding sites 1/2), EBS1/EBS2 (exon binding sites 1/2).
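The scoring rules in the legend to Figure 1 can be written down as a small decision function. The sketch below is one possible reading, under stated assumptions: band intensities are taken to be already normalized for the total radioactivity in each lane, and "equal intensity" is interpreted as agreement within a tolerance of 20%. The tolerance, the function name and the parameter names are hypothetical and do not appear in the legend, which gives only the factor-of-5 threshold.

```python
def score_nucleotide(denatured, native, natural_stop=False,
                     stable_factor=5.0, tolerance=0.2):
    """Classify one nucleotide from normalized band intensities.

    denatured / native: stop intensity in the denatured lane and in a native
    (magnesium-containing) lane, assumed normalized per lane.
    stable_factor: fold decrease scored as a stable helix (5 in the legend).
    tolerance: hypothetical margin within which intensities count as equal.
    """
    if natural_stop:
        # Natural stops of reverse transcriptase were not scored.
        return "not scored"
    if denatured <= 0:
        raise ValueError("denatured intensity must be positive")
    if denatured >= stable_factor * native:
        return "stable double-stranded"
    ratio = native / denatured
    if ratio < 1.0 - tolerance:
        return "breathing double-stranded"
    if ratio <= 1.0 + tolerance:
        return "single-stranded"
    return "bulge or small internal/terminal loop"


if __name__ == "__main__":
    # Hypothetical intensities in arbitrary units.
    for den, nat in [(100, 10), (100, 50), (100, 100), (100, 200)]:
        print(den, nat, score_nucleotide(den, nat))
```

Making the tolerance an explicit parameter keeps the one threshold that the legend leaves unstated visible, so that a different reading of "equal" can be tested without changing the rest of the rule.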


Figure 2: Diethylpyrocarbonate modification of intron aI5 RNA. Diethylpyrocarbonate modifications were performed as described in Methods. Primer extension stops of reverse transcriptase caused by carbethoxylation at N7 of adenosine and guanosine residues, followed by aniline-induced chain scission, are shown in lanes 1 (denatured), lanes 3 (native, 5 mM MgCl2), lanes 4 (native, 50 mM MgCl2) and lanes 5 (native, 100 mM MgCl2). Natural stops of reverse transcriptase are shown in lanes 2 (unmodified control). For abbreviations and interpretation of the gels see the legend to Figure 1, although it should be noted that subtraction of lanes 2 from lanes 1 leaves combined A and G tracks. The uridine insertion between G855 and G856 is depicted as +U856 and has not been included in the numbering of the intron. bp: branchpoint.

Probing of RNA structure in solution by DMS modification followed by reverse transcription directly screens for A and C residues involved in Watson-Crick base pairing (27). In addition, reverse transcriptase sometimes stops at G residues, which are preferentially methylated by DMS at N7 (18). Because carbethoxylation of guanosines at N7 by DEPC without aniline-induced strand scission does not stop AMV reverse transcriptase in our experiments, it is still unclear whether these DMS-induced G-stops are caused by specific strand breakage or by some other methylation like 2mG, which has been shown to stop reverse transcriptase in E. coli 16S rRNA (19). The modification of guanosine N7 by DMS and DEPC and of adenosine N7 by DEPC is directly affected only by base-stacking. Thus, this modification of A or G residues does not exclude hydrogen-bonding of these nucleotides (27). Since N7 modification is not understood in terms of structural features (8, 14, 20), we have focussed primarily on the modifications of adenosine and cytosine by DMS for the interpretation of intron structure.

One of the difficulties of the chemical probing method is the tendency of reverse transcriptase to stop or pause at sites of strong structural interactions and, less specifically, at uridine residues in the template. The strong inherent stops of reverse transcriptase are indicated in Figure 3 (see below). Most of the natural stops in fact occur at the dinucleotide CA, in agreement with the previously observed preference of reverse transcriptase to stop at this and some other dinucleotides (i.e. preference CA > CU > UA; 21, 22). These stops are probably due to hydrolysis of the labile pyrimidine-A bonds (33), but we cannot entirely rule out that these dinucleotide-specific stops also occur partly because of higher order structural features, e.g. in the case of the stops in the exon binding site 2 (EBS2). Nevertheless, there are stops which seem unambiguously specific for secondary structure, like those located at the 3' flank of domains II, IV and V.
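To make the dinucleotide preference concrete, the following minimal sketch lists the positions of CA, CU and UA dinucleotides in an RNA sequence, i.e. the candidate sites at which structure-independent stops might be expected. The function name and the short example sequence are hypothetical and are not taken from the intron; positions are 1-based and refer to the 5' nucleotide of each dinucleotide.

```python
# Dinucleotides in the order of stop preference quoted in the text (CA > CU > UA).
STOP_DINUCLEOTIDES = ("CA", "CU", "UA")


def dinucleotide_sites(rna, motifs=STOP_DINUCLEOTIDES):
    """Map each motif to the 1-based positions where it starts in the sequence."""
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    return {
        motif: [i + 1 for i in range(len(rna) - 1) if rna[i:i + 2] == motif]
        for motif in motifs
    }


if __name__ == "__main__":
    example = "GCAUACUA"  # hypothetical sequence for illustration only
    for motif, positions in dinucleotide_sites(example).items():
        print(motif, positions)
```

Such a list only flags candidate sites; whether a given stop is dinucleotide-specific or structure-specific still has to be judged from the unmodified control lanes.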

(2) Magnesium Effects and Natural Stops

Modification in the presence of increased magnesium concentrations did not affect the accessibility of the nucleotides to DMS, but had consequences for modification by DEPC. Enhanced DEPC modification at increasing magnesium concentrations was detected in the internal and terminal loops of helix IIc, in the terminal loop sequence of helix D2a (A252) and at the branchpoint in wild type RNA (not shown). The reverse effect of magnesium, namely reduction of reactivity with DEPC, is a more common feature, which is seen both in the perfect hairpins such as D2b and IVb and in the conserved terminal loop sequence of domain V. These results are consistent with crystallographic studies of tRNA, which have shown that magnesium ions tend to position at N7 of purine bases present in a stable helix (23). The results also confirm the capacity of small looped-out sequences to bind divalent metal ions.

(3) Phylogenetic Secondary Structure Model

The phylogenetic secondary structure model of intron aI5 is shown in Figure 3. Domains I, III, V and VI of the model have been established by the phylogenetic method. Initially, Michel et al. (1) pointed out one sequence which is highly conserved in the introns aI1, aI2, aI5 and bI1 and which can form two helical structures located in the 80-100 nucleotides immediately preceding the 3' exon border. Additional significant sequence similarity of these four introns is only present in pairwise combinations: aI1 versus aI2 and aI5 versus bI1. Therefore, to establish a phylogenetically based structural model for aI5, its sequence could only be compared with that of bI1. Schmelzer et al. (11) have proposed a common core for the introns aI5 and bI1, which, like the configuration proposed earlier for introns aI1 and aI2, consists of six separate domains. The distinct sequence blocks identified are placed in corresponding positions in the core structure. New sequences of introns belonging to group II determined since then have added weight to this overall group II core configuration (reviewed by Michel et al. (24)). Jacquier and Michel (25) reconstructed a common structural model for the peripheral parts of domains I and III of introns aI5 and bI1 by first identifying conserved sequence blocks in these regions. On the basis of phylogenetic and genetic evidence, these authors further proposed three conserved tertiary pairing interactions (IBS1-EBS1, IBS2-EBS2 and α-α') between corresponding positions of the two introns. Their final model, not including peripheral structures for domains II and IV, is that shown in Figure 3. The sequences of the remaining domains, II and IV, are too variable to establish a common structural model on a phylogenetic basis (see below, section 5).

DMS MODIFICATIONS

Figure 3a: Secondary structure model for intron aI5: dimethylsulfate modifications. The modification data obtained with DMS (see Figure 1) are summarized by superimposition on the secondary structure model of intron aI5. The structural and modification data of domains II and IV are shown in Figure 4. The regions that could be screened with the available set of oligonucleotides are included between open triangles (>> strong hit, > normal hit, > moderate hit). Protected A's and C's are not marked. Strong stops of reverse transcriptase on unmodified RNA are shown as asterisks (*). The rare stops of reverse transcriptase at methylated G's are not shown, apart from the clear stop at G885, which was scored as single-stranded in the presence of magnesium. The structures of domains I, III, V and VI were established by phylogenetic comparison with intron bI1 (25). The nomenclature of the helices is that previously presented by Jacquier and Michel (25) and is specified further by each type of single-stranded interruption in helical branches (i.e. bulges and internal loops). This specification follows a numbering system (1, 2, etc.), which starts at the tops (loops) of the sub-structures.

DEPC MODIFICATIONS

Figure 3b: Diethylpyrocarbonate modification. The modification data obtained with DEPC (see Figure 2) are summarized by superimposition on the secondary structure model of intron aI5. The regions that could be screened with the available set of oligonucleotides are included between open triangles (>> strong hit, > normal hit, > moderate hit). For minimal energy calculations of the model see Methods and the legend to Figure 3a.

[Figure 4 panels: domain II aI5 (4a), domain II bI1 (4b), domain IV aI5 (4c)]

Figure 4: Secondary structure model of domains II and IV. The sub-optimal foldings of domain II (II-s, 4a) and IV (IV-s, 4c) and the differences with the optimal foldings (II-o and IV-o) are presented (see section 5). DMS modification data were extracted from Figure 1 and are depicted as described in the legend of Figure 3a. The calculated free energies of domains II and IV are described in Table I. The minimal energy folding of domain II of the bI1 intron (4b) is shown for comparison (calculated free energy: -18.6 Kcal).


(4) Comparison of Modification Data and the Model


5' exon/Domain I

The DMS modifications of domain I agree well, but not perfectly, with the phylogenetic model (see section 3, Figure 3). Except for those nucleotides present in the internal loop (A11-G14, A409-A411), stem I is a stable helix completely protected against DMS modification. Interestingly, at least the 5' side of this stem showed no protection against DEPC modification (the 3' side has not been scanned by DEPC). This discrepancy between DMS and DEPC modification was also observed in some other helices of the intron, like stems I-C13 and I-D33, and may relate to weak stacking which allows DEPC modification (14). The accessibility of the A-residues in the lower part of stem A (A2) to DMS might indicate that they are involved in weak base-pairing and can breathe (see section 5 below). For the following stem (B) structural data are lacking. The data on helices C1 and C2 fit very well with the complex pattern of internal bulges and loops in the model, except that stem C2-1 seems to be rather loose, being more modified than expected. The DMS hit of the A and C residue in stem C (closing the stems C1 and C2) cannot easily be explained, since this helix does not appear to be weak. The last major part of domain I, which comprises stems D, D', D" and D"', represents a somewhat unstable (breathing) double-stranded region, which is moderately DMS sensitive. Of the top three helical structures (D2a, D2b and D3), helix D2b is protected from modification by both DMS and DEPC. This indicates a relatively high stability for this long non-interrupted helix, for which a high degree of hydrophobic stacking of the nucleotides is predicted (see section 5 below). The proposed tertiary interaction element α-α' (25) did not show any protection against modification under these conditions.
The established base-pairing intron- and exon binding sites EBS1-IBS1 and EBS2-IBS2 (see Figure 3), on the other hand, could not be analysed because these elements proved to be insuperable barriers for reverse transcriptase, creating strong stops even on unmodified RNA (see Figures 1, 3 and section 1).

Domain III

In contrast to the top helix III-1, which is not accessible to DMS, helices III-2 and III-3 become modified. This suggests a high degree of breathing in these substructures. The internal loop A598-A599, of which A599 gives the strongest signal, is also modified by DMS. The exposure of the A599 base ring in solution could be of importance for the group II splicing reaction, since deletion of the equivalent base in intron bI1 is known as mutation M1301, which results in splicing deficiency in vivo and an inability to form intron lariats in vitro (26). A similar, less pronounced, enhancement of DMS modification was also detected with nucleotide A587, which has been noted to be complementary to the 3' terminal intron nucleotide (the γ-γ' pair; 25) and was shown to be involved in the second transesterification reaction. This modification could be due to the fact that this interaction exists in a conformation of the intron only after 5' hydrolysis has occurred; we may be unable to detect it because the mutant intron we study is already blocked at the stage of the first transesterification.


Domain V

The DMS modifications match the phylogenetic model structure of this most conserved domain of group II introns perfectly. The nucleotides show an almost identical protection pattern against DEPC modification, which suggests a high stability including stacking of the helix or shielding by other (neighbouring) helices. The fact that nucleotides A832 and G829 of the four-membered hairpin loop are partially protected against DEPC modification may be explained by continued stacking on the helix, which is supported by the high stability of GAAA tetraloops (12).


Domain VI

Once again, the DMS modifications in this domain correspond exactly to the model, except for A859 and A862 which, although predicted to be in an internal loop, are protected. The branchpoint A880 was not hit because a mutant precursor RNA, containing a uridine inserted opposite the branchpoint, has been used (13). Apart from the branchpoint, which is stacked into the helix of stem VI, the secondary structure of this mutant RNA is unlikely to differ significantly from that of the wild type, while its low reactivity under normal splicing conditions allows modification to occur in the absence of splicing. In order to verify whether any major differences in modification pattern could be detected with wild type aI5, this RNA was also screened with DEPC in the region of the branchpoint (data not shown). We found that all double-stranded regions of the wild type domain VI are protected against DEPC modification and that the modifications of RNA in this domain showed an overall identity to the mutant structure, with the exception that the branchpoint is now hit by DEPC. Interestingly, the adenosine residues A859-A862, present in the 5' part of the internal loop, are only moderately hit by DEPC. Together with the DMS protection of A859 and A862, this finding suggests that the nucleotides of this internal loop are continuously stacked on the helix.

3' exon

The region of the 3' exon tested is partially protected against DMS modification, probably due to some base-pairing with non-intronic sequences of the precursor RNA. This putative structure will not be discussed. Schmelzer and Schweyen (26) have shown for intron bI1 that self-splicing is still possible even when only two nucleotides of the 3' exon are present, so that this additional structure is unlikely to provide an essential contribution to self-splicing activity.
(5) Modelling of Domains II and IV by Minimizing Free Energy: a Comparison with Modification Data

The usefulness of minimal energy calculations has clearly been improved by the use of newly determined refined free energy values (28, see Methods). This improvement had previously been demonstrated by several studies (17, 30 and Turner, D.H., personal communication), which showed that calculated structures of small nuclear U-RNAs, for example, become more similar or identical to known phylogenetic models of the respective molecules. Potential structures for domains II and IV were previously described by Kwakman et al. (31) purely on the basis of unconstrained minimal energy folding of the sequences. In contrast to our previous studies, we now make use of the folding programme recently described by Zuker. This programme can incorporate the full set of the most recently adopted free energy values (12), allowing calculation of the contributions of dangling ends, terminal mismatches, extraordinary stabilities of certain tetra-loops and adjusted destabilities of all loop types. Moreover, the programme has the ability to generate alternative sub-optimal foldings (see Methods). For the secondary structure modelling of the two domains we compare significantly different sub-optimal and optimal secondary structures with our modification data. We chose not to constrain the folding with these data, first because some modification is not as strict as the folding of the RNA may suggest. For example, base-pairs that close a hairpin loop frequently become modified due to their apparent accessibility (see Figure 3). Second, modification data were not complete for all domains (e.g. I-B and IV), and such partial, unbalanced constraints may misdirect structure calculation by minimizing free energy. In the following we discuss the comparison of alternative sub-optimal foldings and the subsequent selection of possible model structures.

Domain II

Figure 4a shows a sub-optimal structure (II-s) for domain II and its difference from the optimal structure (II-o) (i.e. that with the lowest free energy). The sub-optimal structure is the most stable structure giving the best agreement with the modification data. Apart from the structural difference presented (c1 of II-s versus II-o; difference is 0.7 Kcal), the optimal structure does not include the two base-pairs closing helix II-b (difference is 1.4 Kcal). This helix (IIb-3) includes two A residues (A536 and A571) which are not modified. The obvious breathing of top b1 will be discussed in the following section (6). The structures presented show that domain II comprises a conserved helix of 10 base-pairs, interrupted by two bulging nucleotides. The overall structure contains four helical elements (IIa-d) arranged in a cloverleaf pattern. Note that the structure presented is nearly the same as that previously suggested by us (31, see above). The presence of the top helices IIb-d of this domain, however, is not essential for group II self-splicing, as has been shown by deletion analysis (31). Accordingly, at least the main, peripheral part of this domain does not contribute to the essential catalytic core of the intron. It can be noted that for domain II of the related intron bI1 minimal energy folding generates a structure including two top helices (see Figure 4b). One of these helices seems to correspond to the helical structure IIc of the intron aI5, since in both introns these helices are closed by two CG pairs, which are rather rare in the introns.

Domain IV

Figure 4c shows the sub-optimal structure (IV-s) for domain IV which agrees best with the modification data; the difference in IV-d from the optimal structure (IV-o) is also presented for comparison (difference is 0.8 Kcal). The modification data that do not match the model, however, concern residues A801-A804. These A residues are clearly protected against modification, but are predicted to be single-stranded. A possible explanation for this discrepancy is that the residues are involved


in a secondary structure interaction with nucleotides shown in hairpin IV-c1. This would, however, result in a much higher overall free energy content (7 Kcal). Since domain IV could not be probed completely, not enough information is available to resolve this point. The domain IV structure model previously presented by us (31), although it has A801-A804 base-paired, is less preferable since it fails to match the modification data obtained for the 5' part of the domain (i.e. the many distributed A's and C684). The closing stem of the domain, which was proposed earlier by phylogenetic comparison (25), corresponds to stem IV-a shown in Figure 4c. Comparison with domain IV of the intron bI1 is not possible, since in this related intron the domain is composed of a single stable stem of 11 base-pairs with a three-membered hairpin loop (25). At least the spacing of domain IV relative to its flanking domains is comparable for the aI5 and bI1 introns. We conclude that for both domain II and domain IV the modification data are closely matched by minimizing free energy. The structural ambiguities shown occur only locally and involve only small free energy differences.
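The selection rule used above (take the most stable folding among those that agree best with the modification data) can be illustrated with a minimal Python sketch. It is not the procedure or software used in this study: the dot-bracket notation, the function names, the scoring rule (modified residues count as matches when unpaired, protected residues when paired) and the toy structures and energies are all assumptions made for illustration. The sketch also ignores the caveat noted earlier that base-pairs closing a hairpin loop are frequently modified.

```python
def paired_positions(dot_bracket):
    """Return the set of 1-based positions that are base-paired."""
    stack, paired = [], set()
    for pos, symbol in enumerate(dot_bracket, start=1):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced closing bracket at position %d" % pos)
            paired.update((stack.pop(), pos))
        elif symbol != ".":
            raise ValueError("unexpected symbol %r" % symbol)
    if stack:
        raise ValueError("unbalanced opening bracket")
    return paired


def agreement(dot_bracket, modified, protected):
    """Fraction of probed residues whose reactivity matches the structure.

    A modified residue counts as a match when it is unpaired,
    a protected residue when it is paired.
    """
    paired = paired_positions(dot_bracket)
    matches = sum(1 for p in modified if p not in paired)
    matches += sum(1 for p in protected if p in paired)
    probed = len(modified) + len(protected)
    return matches / probed if probed else 1.0


def select_model(candidates, modified, protected):
    """Pick the candidate with the best agreement; ties go to the lowest energy."""
    return min(
        candidates,
        key=lambda name: (-agreement(candidates[name][0], modified, protected),
                          candidates[name][1]),
    )


# Toy data (hypothetical, not taken from the paper): name -> (structure, Kcal).
candidates = {
    "optimal": ("(((...))).", -3.0),
    "sub-optimal": ("((...))...", -2.3),
}
modified = {3, 4, 5, 9}   # residues hit by the probe
protected = {1, 2}        # residues clearly not modified
best = select_model(candidates, modified, protected)
print(best, agreement(candidates[best][0], modified, protected))
```

In this toy case the folding with the lowest energy pairs two residues that the (invented) data show to be modified, so the slightly less stable folding is selected, which mirrors the choice of II-s and IV-s over II-o and IV-o.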

(6) Evaluation of the Phylogenetic Model by Free Energy Calculations

The observed correspondence of minimal energy foldings with modification data and/or with phylogenetic model structures of other RNAs (see above) led us to study the phylogenetically structured domains of the aI5 intron (domains I, III, V and VI) by minimal energy folding as well, in order both to understand the folding of the intron in terms of thermodynamic rules and to rationalize those modification patterns that deviate from the model structure (e.g. the "breathing" of relatively unstable structures). In addition, we hoped to learn more about RNA folding rules. Our analysis reveals that the minimal energy structures are basically identical to the phylogenetic model structure. A comparison with the minimal energy structures generated by the old Zuker programme (16), which can only partly incorporate the improved free energies (28), reveals that application of the complete, latest available set of free energy values (12) improves secondary structure calculation significantly. Folding of the entire aI5 intron RNA now shows the different domains separately, i.e. they do not interact via long-range base-pairings, as was predicted by the older programme. The same holds for all helical branches of the different domains as suggested by phylogenetic analysis. In general, this implies that local hairpin structures are now generated more frequently owing to the decreased destabilizing effect of hairpins with small top loops. This relates to the observation made with the old Zuker programme that, if the destabilizing effect of multi-stem loops is decreased (i.e. from 4 to 2 Kcal), the phylogenetic model is closely generated by minimizing free energy. Obviously, this decrease compensates for the too high destabilizing contribution of small hairpin loops as described by Freier et al. (28).
In contrast, with the new programme the minimal energy folding starts to deviate from the phylogenetic structures as the destabilizing effect of multi-stem loops is decreased. A tuning of the relative values of the destabilizing contributions of hairpin and multi-stem loops therefore seems to be necessary to obtain the phylogenetic consensus pattern. On the other hand, the aI5 foldings generated with the new programme (32, see above) are insensitive to a change in the extraordinary stability of some defined tetra-loops


(e.g. from -2 to 0 Kcal, see ref. 12). It should be noted that the extent of improvement of folding obtained with the new programme, as reported here for the aI5 RNA, does not apply to all types of RNA: some RNAs, like U snRNAs (17), already matched the available phylogenetic structures closely when the incomplete set of improved free energy values (28) was used. The relatively large change and subsequent improvement in the aI5 intron structure may be partly due to its high frequency of A and U residues (>75%). The over-representation of these two (complementary) bases increases the number of possible and reasonable interaction patterns. Indeed, many alternative optimal and sub-optimal foldings exist for this RNA, all of which include several, but not all identical, long non-interrupted helices. Table I shows the free energies of the constituent non-interrupted helices of the six separate domains. The energies represent those that build up from the hairpin loops and that follow the helices until the closing stem of the domain (see Figures 3 and 4). Where the phylogenetic structures (Figure 3) differ from the unconstrained minimal energy foldings, this is indicated (asterisks) and the respective energy differences are presented. These deviations between the new minimal energy foldings and the phylogenetic structures have only local consequences and show rather negligible energy differences (Table I). Most of these deviations can be explained in terms of modification patterns, like the modification of closing stem I-C. Thermodynamic structure calculation shows that in this part of the RNA two alternative interactions may exist which are closely related in space and do not involve changes in the other phylogenetically proposed helices. One of these interactions indeed shows the residues A-187 and C-188 unpaired, and therefore the two structures are most probably in equilibrium with each other.
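To give a feeling for what such small energy differences mean for an equilibrium between two alternative structures, the following Python sketch computes the fraction of molecules expected in the less stable of two states from a simple two-state Boltzmann relation. This calculation is not part of the original analysis. It assumes that the energy differences are per mole, that only two states compete, and that the temperature is 37 °C; the function name and the default temperature are illustrative choices. The example values 0.7, 1.4 and 2.7 Kcal are differences quoted in this paper.

```python
import math

R_KCAL = 1.987e-3  # gas constant in Kcal/(mol*K)


def minor_fraction(delta_g_kcal, temp_c=37.0):
    """Equilibrium fraction of the less stable of two competing structures.

    delta_g_kcal is the (non-negative) free energy difference between the
    two structures; a two-state model is assumed.
    """
    if delta_g_kcal < 0:
        raise ValueError("delta_g_kcal must be non-negative")
    rt = R_KCAL * (temp_c + 273.15)
    ratio = math.exp(-delta_g_kcal / rt)  # [less stable] / [more stable]
    return ratio / (1.0 + ratio)


for delta_g in (0.7, 1.4, 2.7):
    print("%.1f Kcal -> %.3f" % (delta_g, minor_fraction(delta_g)))
```

Under these assumptions a difference of well under 1 Kcal leaves a substantial share of the molecules in the alternative structure, whereas a difference approaching 3 Kcal leaves only a small share.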
The same phenomenon of two alternating structures applies to a class of sub-structures which seem to "breathe", i.e. short-lived structures in which the nucleotides switch between an unpaired and a base-paired state. This breathing may explain the single-strandedness of I-C21 and the extra base-pairing in the loop region between I-D"-D"' and in the multi-stem loop of domain III as revealed in the minimal energy folding. Furthermore, breathing may explain why the set of most unstable top structures (see Figures 3 and 4 and Table I) is not reflected by our modification data (I-C21, II-c1, II-b1, III-2, III-3, IV-c): their free energies are all around zero. The "existence" of these unstable sub-structures still seems likely since their closing stems fit the modification pattern. This observation relates to the fact that the non-interrupted helical structures protected against DMS as well as DEPC modification, in particular I-D2b, V-1/-2 and VI-2/-3, are indeed the most stable ones among those for which both types of modification data were obtained. On the other hand, based on the available free energy calculations, there is no reason to predict the modification of the lower stem I-A2. The fact that this stem is not directly stabilized by an adjacent helix and that, furthermore, it ends in a rather complex multi-stem loop, with many constraints in space, may make it particularly unstable. The largest energy difference between the minimal energy folding and the phylogenetic model (2.7 Kcal) coincides with the location of the EBS1 and α sites, which are proposed to form tertiary interactions. The optimal folding shows a different local base-pairing scheme that does not agree with the modification data. It is, however, not known if and how the proposed tertiary interactions constrain the


Table I. Cumulative free energies (Kcal) of the helical structures of the six different domains of the aI5 intron. The nomenclature of the helices corresponds to that shown in Figures 3 and 4. The values presented are the free energies (Kcal) building up cumulatively from the tops of the sub-structures, following the structure and its branching pattern as shown in Figures 3 and 4 for each domain. This means that stabilizing contributions of successive helical fragments and destabilizing contributions of interrupting single-stranded regions between these fragments are included in these free energy values. The free energy content for each domain does not include the contribution of dangling ends. The separation of the energy values into different columns follows the structure of the foldings, i.e. it separates the helices which close hairpin loops (hairpin tops). For domain VI the energy values presented refer to the wild type structure, including the bulging branchpoint. Locations where the minimal energy foldings deviate from the model structures presented (Figures 3 and 4) are indicated (*). The deviations are given in Kcal in the lower table together with the nature of the structural difference (breathing or alternative base-pairing). The structures and their free energies are calculated using the algorithm of Zuker (32), with the free energy values for secondary structure formation given by Jaeger et al. (12).

Model Structures helix I A1 A2 B C11 C12 C13 C21* C22 C23 C24 C* D2a D2b D31* D32 D33 D"'* D" D' D 1 2 II-s d c1*

hairpin top

cumulative

-1.5 -3.5 -0.4 -0.8 4.9 7.5 -0.3

c2 c3 b1 b2 b3* a1 a2

hairpin top

cumulative -4.0 5.9

-0.8 1.2 +1.0 8.2 -15.6

III

+1.1 -0.8 -3.4 -12.1 -3.4 -9.9 -1.2 +3.6 -3.2 -15.1 -16.6 -18.6 -23.6 -45.5 -47.1 -4.6 -0.8

helix

1 2 3* 4 5 IV-s d1* d2 d3 c b a

-4.6 -0.8 -0.5 5.7 8.2 -2.2 +0.9 8.5 +0.2 -6.5 -15.6

V 1 2 VI 1 2 3*

*Breathing and alternative base pairing structures:

-6.2 -15.9

-1.7 2.2 5.0


Table I (continued)

Deviations (Kcal), due to breathing or alternative base-pairing:

Domain   Helix     Deviation
I        C21       0.2
I        C         0.6
I        D31       2.7
I        D"'-D"    0.9
II       b3        1.4
II       c1        0.7
III      3-4       0.7
IV       d1        0.8
VI       3         0.2

secondary structure (in particular since we have no evidence that these tertiary interactions are formed under the conditions analysed here). It is interesting that for the aI5 intron it is the optimal folding which picks up a pattern that largely coincides with the phylogenetically conserved structure and which agrees with the modification data. Moreover, the alternative sub-optimal foldings that differ from the conserved pattern agree considerably less well with the available modification pattern and/or do not show a corresponding structure for the bI1 intron in those regions which are sufficiently conserved to enable phylogenetic comparison.
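The cumulative bookkeeping described in the legend of Table I, where stabilizing helical fragments and destabilizing single-stranded interruptions are summed from a hairpin top down to the closing stem, can be sketched in a few lines of Python. The element labels and energy values below are hypothetical and are not taken from Table I; the function name is likewise an illustrative assumption.

```python
from itertools import accumulate


def cumulative_energies(contributions):
    """Running free energy total from the hairpin top to the closing stem.

    contributions is an ordered list of (label, energy in Kcal) pairs,
    starting at the hairpin top; negative values stabilize, positive
    values (loops, bulges) destabilize.
    """
    labels = [label for label, _ in contributions]
    totals = accumulate(energy for _, energy in contributions)
    return [(label, round(total, 1)) for label, total in zip(labels, totals)]


# Hypothetical branch: a top helix, an interrupting loop, a closing stem.
branch = [
    ("top helix", -4.0),
    ("internal loop", +1.5),
    ("closing stem", -6.0),
]
for label, total in cumulative_energies(branch):
    print("%-14s %6.1f" % (label, total))
```

A positive or near-zero running total at a hairpin top, as reported for the most unstable top structures, would in this bookkeeping indicate a sub-structure that gains stability only from the helices closing it.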

Acknowledgements

We thank Dr. L. Conway and Dr. M. Wickens for making their protocol for DEPC modification and aniline-induced strand scission available to us prior to publication. The help of R. Vermeij and H. v.d. Meyden in photographic work is gratefully acknowledged. This work was supported in part by grants from the Netherlands Foundation for Chemical Research (SON) with financial aid from the Netherlands Organization for the Advancement of Research (NWO).

References and Footnotes

1. Michel, F., Jacquier, A. and Dujon, B., Biochimie 64, 867 (1982).
2. Cech, T.R., Tanner, N.K., Tinoco, I., Jr., Weir, B.R., Zuker, M. and Perlman, P.S., Proc. Natl. Acad. Sci. USA 80, 3903 (1983).
3. Waring, R.B. and Davies, R.W., Gene 28, 277 (1984).
4. Van der Veen, R., Arnberg, A.C., van der Horst, G., Bonen, L., Tabak, H.F., Grivell, L.A., Cell 44, 225 (1986).
5. Peebles, C.L., Perlman, P.S., Mecklenburg, K.L., Petrillo, M.L., Tabor, J.H., Jarrell, K.A., Cheng, H.L., Cell 44, 213 (1986).
6. Cech, T.R. and Bass, B.L., Ann. Rev. Biochem. 55, 599 (1986).
7. Tabak, H.F., Horst, G. van der, Winter, A.J., Smit, J., Veen, R. van der, Kwakman, J.H.J.M., Grivell, L.A. and Arnberg, A.C., Cold Spring Harbor Symp. on Quant. Biol. LII, 213 (1987).
8. Inoue, T. and Cech, T.R., Proc. Natl. Acad. Sci. USA 82, 648 (1985).
9. Martinez, H.M., Nucleic Acids Res. 12, 32 (1984).
10. Williams, A.L. and Tinoco, I., Nucleic Acids Res. 14, 299 (1986).
11. Schmelzer, C., Schmidt, C., May, K., Schweyen, R.J., EMBO J. 2, 2047 (1983).
12. Jaeger, J.A., Turner, D.H. and Zuker, M., Proc. Natl. Acad. Sci. USA 86, 7706 (1989).
13. Van der Veen, R., Kwakman, J.H.J.M. and Grivell, L.A., EMBO J. 6, 3827 (1987).
14. Peattie, D.A. and Gilbert, W., Proc. Natl. Acad. Sci. USA 77, 4679 (1980).
15. Conway, L. and Wickens, M., EMBO J. 6, 4177 (1987).
16. Zuker, M. and Stiegler, P., Nucleic Acids Res. 9, 133 (1981).
17. Konings, D.A.M. and Hogeweg, P., J. Mol. Biol. 207, 597 (1989).
18. Lawley, P.D. and Brookes, P., Biochem. J. 89, 127 (1963).
19. Youvan, D.C. and Hearst, J.E., Proc. Natl. Acad. Sci. USA 76, 3751 (1979).
20. Swerdlow, H. and Guthrie, C., J. Biol. Chem. 259, 5197 (1984).
21. Shelness, G.S. and Williams, D.L., J. Biol. Chem. 260, 8637 (1985).
22. Liarakos, C.D., Maddox, R.P., Hilscher, K.A., Bischop, J.R., McGuire, D.K. and Kopper, R.A., Nucleic Acids Res. 16, 11249 (1988).
23. Robertus, J.D., Ladner, J.E., Finch, J.T., Rhodes, D., Brown, R.S., Clark, B.F.C. and Klug, A., Nature 250, 546 (1974).
24. Michel, F., Umesono, K. and Ozeki, H., Gene 82, 5 (1989).
25. Jacquier, A. and Michel, F., Cell 50, 17 (1987).
26. Schmelzer, C. and Schweyen, R.J., Cell 46, 557 (1986).
27. Ehresmann, C., Baudin, F., Mougel, M., Romby, P., Ebel, J-P. and Ehresmann, B., Nucleic Acids Res. 15, 9109 (1987).
28. Freier, S.M., Kierzek, R., Jaeger, J.A., Sugimoto, N., Caruthers, M.H., Neilson, T. and Turner, D.H., Proc. Natl. Acad. Sci. USA 83, 9373 (1986).
30. Hughes, J.M.X., Konings, D.A.M. and Cesareni, G., EMBO J. 6, 2145 (1987).
31. Kwakman, J.H.J.M., Konings, D.A.M., Pel, H.J. and Grivell, L.A., Nucleic Acids Res. 17, 4205 (1989).
32. Zuker, M., Science 244, 48 (1989).
33. Brownlee, G.G., In: Determination of sequences in RNA (eds. T.S. Work and E. Work). American Elsevier Publishing Co., Inc., New York, p. 107 (1972).

Date Received: March 2, 1990

Communicated by the Editor C. W. Hilbers
