Vol.8, no.4. 1992 Pages 389-399

CABIOS

Base-pair probability profiles of RNA secondary structures Michael Schmitz and Gerhard Steger Abstract

Introduction The secondary structure of a single-stranded RNA molecule is important for the biological function of the RNA. Well-known examples are ribosomal RNAs (Rau6 et al., 1988), stability of mRNA (Chen et al, 1991), self-splicing RNAs like the intervening sequence from rRNA of Tetrahymena (Been et al., 1987) or the satellite of tobaccoringspotvirus (Bruening, 1990), and viroid RNA, e.g. potato spindle tuber viroid (Henco et al., 1979; Riesner and Steger, 1990). In most of these examples not only is the most stable secondary structure important for function but also suboptimal structures are of relevance. This is not surprising because many suboptimal structures are present in solution with concentrations nearly as high as the optimal structure, according to the appropriate equilibrium constants and free energies. The dynamic programming method (Bellman, 1958; Bellman lnstitulfiirPhysikalischcBiologic, Geb. 26.12.U1, Htinrich-Hcine Univtnitai, UniveniiOlssirasse 1, D 4000 DUsseldorf 1, Germany

© Oxford University Presi

389

Downloaded from http://bioinformatics.oxfordjournals.org/ at University of Exeter on July 22, 2015

Dynamic programming algorithms are able to predict optimal and suboptimal secondary structures of RNA. These suboptimal or alternative secondary structures are important for the biological function of RNA. The distribution of secondary structures present in solution is governed by the thermodynamic equilibrium between the different structures. An algorithm is presented which approximates the total partition function by a Boltzmann-weighted summation of optimal and suboptimal secondary structures at several temperatures. A clear representation of the equilibrium distribution of secondary structures is derived from a two-dimensional bonding matrix with basepairing probability as the third dimension. The temperature dependence of the equilibrium distribution gives the denaturation behavior of the nucleic acid, which may be compared to experimental optical denaturation curves after correction for the hypochromicities of the different base-pairs. Similarly, temperature-induced mobility changes detected in temperaturegradient gel electrophoresis of nucleic acids may be interpreted on the basis of the temperature dependence of the equilibrium distribution. Results are illustrated for natural circular and synthetic linear potato spindle tuber viroid RNA respectively, and are compared to experimental data.

and Kalaba, 1960) is used in several programs to predict secondary structures of RNA. Of these, the Zuker-Stiegler program (Zuker and Stiegler, 1981; Zuker, 1989) is widely used and is based on an algorithm of Nussinov et al. (1978) and Waterman and Smith (1978). This program reads the RNA sequence file, computes the base-pairing energy matrix, and then predicts the structure of minimal free energy AG° by backtracking of the matrix. At least three different programs, based on the same or similar algorithm, allow the prediction of not only the structure of lowest free energy, but also the next-optimal structures (Steger et al., 1984; Williams and Tinoco, 1986; Jaeger et al., 1990). This is done by searching for the next-lowest energies in the base-pairing energy matrix and appropriate backtracking through the matrix to predict the suboptimal structures. At least two programs (Steger et al., 1984; Jaeger et al., 1990) allow the calculation to be done not only for linear sequences, but also for circular sequences. In many cases, suboptimal secondary structures are quite similar to the optimal one, differing only by minor shifts of helices. However, there are also structures that are quite different from the optimal one. To search by hand through dozens of structures in order to get impressions of the possibilities of an RNA structure for rearrangements is quite laborious. In this paper we show an answer to this problem, i.e. representation of the optimal and suboptimal secondary structures in a dot-plot matrix with base-pairing probability as the third dimension. A second problem is that of correlation between calculated secondary structures and experimental results in order to test the predictions or, conversely, to interpret the experimental results. A convenient experimental method is the optical detection of thermal denaturation, which allows one to determine the enthalpy (AH°) from the steepness of the transition curve and the entropy (A5°) via the midpoint temperature (T^ of the transition curve. An optical transition curve may be calculated from optimal secondary structures (Steger et ah, 1984). Firstly, each hypochromicity value of the calculated curve has to be derived from the secondary structure optimal at that temperature point, which needs a lot of computer time. Secondly, the accuracy of a transition curve calculated only from optimal secondary structures is limited: two structures are equally concentrated at the Tm according to definition of the Tm, whereas in this model only one structure is used. This leads to a discontinuous behavior of the calculated transition

M.Schmltz and G.Steger

curve. In order to solve the described problem, in this paper we propose a new algorithm that allows one to approximate the equilibrium distribution of secondary structures of a singlestranded RNA. The temperature dependence of the distribution allows, after correction for the hypochromicities of the different base-pairs (Coutts, 1971), predictions of optical denaturation curves. Results of this algorithm and the representation of the concentration distribution of secondary structures will be shown for two viroid-related RNA sequences and compared with experimental results.

The programs described in this paper are written using VAX FORTRAN v. 4.0. The graphics are implemented via VAX GKS v. 3.0, and run on a VAXstation II with operating system MicroVMS v. 4.7 (DEC Digital Equipment Corporation, Maynard, MA). The main program runs interactively, producing output on the monochrome video monitor VR 260 (DEC), or on an IBM-compatible personal computer with polySTAR/240 v. 1.10 (Polygon Inc., Saint Louis, MO), or on an Atari ST computer with UniTerm v. 2.0e009 (Simon Poole, Wettingen, Switzerland) for terminal emulation, VT200 or Tektronix 4014 respectively. Screen output may be copied to a Hewlett-Packard 7550A graphics plotter. Synthesis and analysis of the transcripts with PSTVd sequence and of the mutants are described in detail elsewhere (Loss et at., 1991).

Calculation of the ensemble properties Optical denaturation profiles, base-pairing probability profiles and thermodynamic information are calculated at a given temperature, by Boltzmann-weighting and averaging the individual structure properties over the total structure ensemble obtained by energy-minimizing calculations and selection criteria, as described above. For each structure /, its free energy is calculated from enthalpy and entropy values taken from the data set:

&G°(J) = AH° - T-AS? and its Boltzmann statistical weight, p,{T), is calculated from the free energy:

Algorithm The algorithm described below generates a temperaturedependent equilibrium distribution of secondary structures of a single-stranded RNA. Temperature-dependent properties of the RNA, like hypochromicify, specific heat or base-pairing probabilities, are derived from this distribution. This allows comparison of the theoretical results with experimental data from optical melting curves, or to give an impression of the equilibrium distribution of structures of the RNA at chosen temperatures. As described below, all calculations are performed on the basis of a structure ensemble consisting of optimal and suboptimal secondary structures calculated at different temperatures. Generation of the 'structure ensemble' For a discrete set of temperatures, secondary structures of a RNA, N nucleotides long, are calculated using the energyminimizing method of Zuker and Stiegler (1981), modified by Steger et al. (1984). Each structure, i, is represented by a basepairing list or structure vector, S,, with:

( 390

n if nucleotides m and n are paired 0

el*

p,(T) = exp[-AG°(T)/R-T] Average properties of the ensemble are then calculated using pHX) as a weighting factor, describing the contribution of structure i to the whole equilibrium distribution. When calculating the average value < V> of a quantity v,, the sum of all weighting factors yields an approximated partition function of the ensemble:

Given a set of relative hypochromicities, A/4x:y, for basepair disruption (Coutts, 1971), the total hypochromicity, A/ft, for disrupting all existing base-pairs of structure i can be calculated as the sum of the base-pair-specific hypochromicities multiplied with the number of each type of base-pairs involved in the structure:

The hypochromicity of a structure ensemble at a given temperature, , is calculated by averaging all structure hypochromicities and correcting for the lowest

Downloaded from http://bioinformatics.oxfordjournals.org/ at University of Exeter on July 22, 2015

System and methods

Each set of optimal and suboptimal structures has to be large enough in order to represent the equilibrium distribution at the temperature under consideration. The temperature distances of these sets have to be small enough to ensure extrapolation from one set of structures to the next, i.e. the sets have to be overlapping in terms of the participating structures. These requirements are explained in detail in the Discussion. From these sets of structures, a unique structure ensemble is created by purging all multiple occurrences of each structure, to avoid consideration of degeneracy. Once the structure set is purged from doublets, the enthalpies and the structure vectors of all remaining structures are saved as the basic data set for further calculations.

Base-pair probability profiles of RNA secondary structures

calculated temperature, where the hypochromicity is supposed to be zero. =

a two-dimensional plot in combination with the base-pairing probability as the third dimension (see Figure ID). For such a structure representation, a matrix of base-pairing probabilities, T, is calculated from the structure vectors:

The offset, A ^ , may be calculated as the average hypochromicity at the lowest temperature:

if nucleotides m and n are base-paired in structure i else

n, dm - i, /v> >e. the dot-plot matrix of structure i n -

Representation of the equilibrium distribution of secondary structures Individual secondary structures may easily be visualized by programs like SQUIGGLES (Osterburg and Sommer, 1981; Devereux et ai, 1984), but this is no longer possible for structure ensembles as described above. An alternative is a dot-plot representation (see Figure 1C; Tinoco et ai, 1971), which is very obscure and not instructive, because it shows all possible base-pairs. A clear representation is derived from such

1, N

T(7) = Thus, T forms a Boltzmann-weighted superposition of dot-plotlike base-pairing matrices of all structures. The visualization of such a temperature-dependent, average base-pairing probability representation exhibits all structural features of the involved structures together with information about the respective contributions of the structures. This probability matrix, which is symmetric to the diagonal, can be represented in a three-dimensional plot [see Figure ID, sequence versus sequence versus probability p(7)]. Peaks B

1 i A U A U A C C G G G G 'A A I I I I I I I II A UAUAU ,. CCCC.

A

A

a-c o-c a-c

20 i

,-G-C - 30 A A

*CCCC,

, 40

II

A

III

I

Cc

c-o C-G

c-a c-a A A

IV

Fig. 1. Representation of secondary structures of a circular single-stranded RNA, 48 nucleotides long, with different methods. Roman numbers I - V represent helices identical in each representation. Optimal (A) and first suboptimal (B) secondary structure at 35"C. ( O Dot-plot representation (sequence 5' — 3', i, versus sequence 3' — 5', j). Each dot marks a possible base-pair, thick dots in the upper left half of the plot show the helical elements of the secondary structures shown in (A) and (B). The plot is symmetrical to the diagonal. (D) Three-dimensional base-pairing plot (sequence 5' — 3', i, versus sequence 3' — 5',j. versus base-pairing probability, p) at 35°C. The structure ensemble contained 27 different structures calculated at 35°C. Only such peaks, i.e. helices, and surrounding lines are displayed for which the probability of the respective helix is above 0.01.

391

Downloaded from http://bioinformatics.oxfordjournals.org/ at University of Exeter on July 22, 2015

Other experimental quantities, e.g. the specific heat, < C p ( 7 ) > , may be calculated accordingly.

M.Schmitz and G.Steger

Implementation The programs necessary to calculate and to represent secondary structures of a linear (or circular) RNA sequence, as well as to calculate the denaturation and the distribution profile, are connected via a user-friendly command procedure. The sequence files have to be in a special format (a program for conversion to and from most other formats is available): the first line of the sequence file must consist of the sequence length as a number of four digits and may contain a comment. The subsequent lines define the base sequence. Each line consists of up to 70 characters of A, U, G, C, T, Y or X. The latter three symbols cover special cases: T is not able to basepair with G, and Y is not able to base-pair at all. Three contiguous X mark the open end of the sequence; a sequence without X is treated as a circular molecule. A bimolecular complex is defined as a sequence with two contiguous X stretches at respective sites, but the calculated AC values are still valid for a first order reaction and have to be corrected, correspondingly. During generation of the structure ensemble (see Algorithm) structures calculated at different temperatures are sampled in a file. Structures occurring more than once are purged to avoid taking into account degenerated energy levels. To recognize identical structures, a rough estimate is done based on the structure enthalpy and entropy. Since enthalpy is simply calculated from the type and number of stacking base-pairs found in the structure, enthalpy values are always exact, to the precision of the used set of energy parameters. Entropy is then calculated from the free energy resulting from the energyminimizing algorithm at the known temperature and the enthalpy computed from stacking: 392

AS,° = (A//,° - &G?)IT Therefore, the program only requires a list of stacking enthalpies, which should allow the ready implementation of different energy rules. Due to round-off errors, the free energy is less exact than the enthalpy, and perfect matching entropy values for identical structures calculated at different temperatures cannot be expected. For entropy values matching within a range of 10%, the structures themselves are compared by calculating the difference of their base-pairing lists: SS = |S/ - Sj\ = 0 for identical structures which is too time consuming to be done for the whole structure ensemble. For the practical reason of numerical stability, the statistical weights, p/(7), are calculated relative to the optimal structure at each temperature, so that for this structurep iopt = 1, and for all other structures p, S 1. Thus, the partition function, seen as a function of temperature, represents a measure of the number of structures significantly participating in the ensemble, and its maxima represent the temperatures where several structures are contributing to the ensemble with high similar statistical weights, e.g. when other structures become optimal with varying temperature.

Example The following two applications of the described algorithm are taken from the analysis of natural potato spindle tuber viroid (PSTVd) and of transcripts with PSTVd sequence containing single-site mutations. Viroids are single-stranded, circular RNAs with a size of — 300 nucleotides and are able to infect certain plants. Because of their lack of any protein-coding capability, one has to assume that viroid replication and pathogenesis depend completely on the enzyme systems of the host. Thus, all of their biological functions have to be determined by their secondary structure, their ability to undergo structural rearrangements, and their capability to interact with host cell factors. In Figure 2(A), the secondary structure of lowest free energy, calculated at 25°C, is shown for circular PSTVd. This structure is in accordance with all known experimental data: enzymatic digestion patterns (Gross et al., 1978), hydrodynamic properties, number of helices from dye binding (Riesner et al., 1978) and type of loops from oligonucletoide binding studies (Wild etal., 1980). From experimental optical denaturation curves and analysis of the transitions by fast and slow temperature jump techniques (Henco et al, 1979), it was concluded that the native rod-shaped secondary structure of PSTVd denatures in a single, highly cooperative main transition into a branched secondary structure mainly containing the three stable hairpins, designated I, II and

Downloaded from http://bioinformatics.oxfordjournals.org/ at University of Exeter on July 22, 2015

appearing in the graph show regions that have a high probability of base-pairing, and gaps between these peaks show the position of loop regions. Laterally adjacent peaks, representing basepairing schemes excluding each other according to sterical selection rules, indicate that different conformations are almost equally possible within that region, e.g. peaks II and III are incompatible with peaks IV and V in Figure 1(D). Regions reaching the maximum peak level, however, show that there is no alternative base-pairing scheme for this region occurring in the ensemble, e.g. peak I in Figure 1(D). Temperature-dependent rearrangement of structural elements can be elucidated by plotting several adjacent temperature calculations sequentially (see example in Figure 4); moving peaks indicate helical regions being either formed or disrupted as temperature changes. Structural rearrangements as well as partial denaturation of structural elements can easily be distinguished from constant parts of the structure. The influence of mutations on the secondary structure of a sequence may easily be detected by comparison of the probability plot of the wildtype sequence with that of the mutated sequence at a single temperature (see example in Figure 5).

Base-pair probability profiles of RNA secondary structures

A

T l A A C W^ AA CU

11

1 1 1

t UftAOVC C

1 1 1 1

4 4 UU

AC A C C U

v£^££

1 1 1 1 1 1 11

QA 4 C A A " " AA 4 A

11 1 1

»;.

r A A I I I

I

I I I I

I I

1111

1111

£ UC 4 ft ^B A

1 1

* v y y c j u

1 1 1 1

11111 1 1 1 1

I I

l i l t

1i 1 r 1 1 t

i*n

I I I I I

1TI

11

... _nv_

in

A

I I I I I

11 1 1 1

i.



».

tf

A AA Afl4 C 4 A

1 1 1 1 1 1

I I I I I I I

I I I

I I

t i l l

I I

I I I I I

IT

14*

I I I I

I I I I

I I I I I

lit

B 49 G

C

U

ACACCU CUCC GAGCAG I I I I I I I I I I I I I I I I CGGQQCUUCGUU I 1 312 G, 331

87 -

102

236

82 ) C C - * 0L,1B - C-G - 319 C-G C-G C-G G-C

28 I

U G U 133 - G-C G-C G-C U-G G-C G-C

C-G U-A

%:% -

I

A

127 - C-G - 168 A

«-

227 -• C-G - 328

III Fig. 2 . Secondary structure and structural elements of PSTVd RNA. (A) Secondary structure of lowest free energy at 25°C. Nuclcotide numbering is according to Gross et al. (1978). Roman numbers I, II and in, and nucleotide stretches in bold face both depict self-complementary regions, which are not base-paired in the native structure. (B) Enlarged cut of the secondary structure of lowest free energy at 25°C showing the position of the mutations C 3 | g — G and Gj2 6 — A respectively. (C) Hairpins I, II and ID formed by the self-complementary regions shown in (A). The number of loop nudeotides is given. In addition, for hairpin n the position of the mutations C 3 I 8 — G and G 32 6 — A are shown. These hairpins are part of the secondary structure of lowest free energy at 80°C.

in (see Figure 2A and C), which are not part of the native structure. These hairpins denature into the open circular structure without any base-pairs at still higher temperatures. This denaturation mechanism will be verified below for linear transcripts with PSTVd sequence. Figure 3(A) shows an experimental optical denaturation curve of PSTVd, which may be compared with the denaturation curve calculated by the described algorithm (Figure 3B). The good agreement of Figures 3(A) and (B) clearly demonstrates that our algorithm is able to predict the experimental behavior with a fair degree of accuracy. In PSTVd, eight single-site mutations were generated in the segments which form hairpin n , in order to verify the denaturation mechanism described above and to get information about a possible biological role of hairpin II (Loss et al., 1991). The denaturation behavior of the transcript with wildtype sequence and of the transcripts with mutations C3|g — G

and G326 — A, respectively, will be discussed below. In Figure 4, the three-dimensional probability plots and secondary structures of lowest free energy of the transcript with wild-type PSTVd sequence are shown at several temperatures. The native secondary structure at 20°C is, with the exception of the .linearization site, identical to that of the circular PSTVd (cf. Figure 2A). The probability plot shows that most of the molecules have a rod-shaped secondary structure. There is only a minor contribution to the equilibrium ensemble from molecules with a branched secondary structure. These branched structures, however, become dominant at 45°C. In contrast to the best secondary structure at 45 °C it may be seen from the probability plot that one helix of the rod-shaped part (see arrow in plot) is denatured to - 4 0 % . This helix is known as the 'pre-melting region' and is thought to be involved in symptom expression (Schnolzer et al., 1985). At 79°C, the midpoint temperature of the main transition, the secondary structure of

393

Downloaded from http://bioinformatics.oxfordjournals.org/ at University of Exeter on July 22, 2015

30

M.Schmitz and G.Steger

B

ao.

ea.

70.

73.

BO.

as.

»o.

»s.

i oo.

T/°C Fig. 3 . Differentiated optical denaturation curve of circular PSTVd RNA at 260 nm. Numbers refer to the highly cooperative main transition (1), the transition in which mainly hairpins I and IE denature (2) and the high temperature transitions in which mainly hairpin n denatures (3). (A) Experimental curve reproduced from (Steger et aL, 1986). The solution contained 0.2 A^ RNA/ml in 4 M urea, 500 mM NaCI, 0.1 mM EDTA, 1 mM Na-cacodylate and had a pH of 6.8. The curve was extrapolated to 1 M ionic strength by a shift of I5°C (Steger et aL, 1986). The sample contained - 10% of linear PSTVd, which were co-purified during preparation of the circular molecules. These linear molecules are responsible for the shoulder in the main transition below 75°C, due to their lower denaturation temperature. ( B - E ) Calculated with the described algorithm. The open state without any base pairs was taken into account as an internal loop. The vertical scale for (E) is different from those of ( B ) - ( D ) . The basic structure ensemble contained (B) 166 different structures from 30 optimal and suboptimal secondary structures at each of seven temperatures (25, 50, 65. 70, 75, 80 and 9 0 ° O , or ( O the 11 secondary structures optimal in the temperature range from 25 to 90°C, or (D) the six secondary structures optimal at 25, 50, 65, 70, 75, 80 and 90°C, or (E) the 166 different structures of (B), but averaged at a temperature of 10 K.

394

Downloaded from http://bioinformatics.oxfordjournals.org/ at University of Exeter on July 22, 2015

T/°C

lowest free energy already contains hairpins II and III, i.e. the native structure is completely disrupted, whereas the probability plot shows the coexistence of the structure from 45 °C and several branched structures containing hairpins I, II and III. Hairpins I and III begin to denature directly after the main transition and at a temperature of 85 °C hairpin II is the dominant structural element. The mutation C318 — G changes an internal loop, formed by a single base-pair mismatch in the native secondary structure, into a G:U base-pair (see Figure 2B), whereas it elongates the stem of hairpin II by one base-pair (see Figure 2C). The mutation G326 — A exchanges a G:U base-pair in the native secondary structure with an A:U base-pair (see Figure 2B), whereas it disrupts a base-pair in the stem of hairpin II (see Figure 2C). In Figure 5, the effect of these mutations may be seen easily by comparing the probability plots of the wild-type transcript and the transcirpts containing the mutations at 79 and 85°C. The mutation G326 — A does not alter the stability of the native secondary structure, but it destabilizes completely hairpin II. Therefore, the transition equilibrium between the native structure and the structure containing the hairpins 1, II and III is shifted in favor of the native structure, i.e. the temperature of the main transition is shifted upwards. The mutation C318 — G stabilizes the native secondary structure by creating a long helix (open arrow in Figure 5, probability plot of G318 at 79°C) which is not present in the wild-type structure. Thus, the main transition is shifted to a higher temperature. In addition, hairpin II is also stabilized, but this does not compensate for the stabilization of the native structure, i.e. the stabilized hairpin II is not able to shift the main transition to lower temperatures. A convenient experimental method for the analysis of structural properties is temperature-gradient gel electrophoresis (TGGE) (Rosenbaum and Riesner, 1987; Riesner et al., 1991), which does not depend on highly purified RNA samples but gives information about Tm values and hydrodynamic shape of the secondary structures, and even allows analysis of coexisting structures. Therefore, the predicted effects of the mutations may be verified by experimental analysis of the three transcripts by TGGE, as shown in Figure 6. In both gels (Figure 6A and B), a mixture of wild-type transcript and one mutated transcript is applied in order to allow direct comparison of the two transcripts. Because the rod-shaped native structure of all transcripts is very mobile, the main transition results in a drastic retardation of gel electrophoretic mobility. The main transitions of mutants A326 and G318 are, as predicted, at higher temperature than that of the wild-type transcript. With mutant G318, the main transition leads to a higher retardation than that of the wild-type transcript, and the final mobility of the linear molecule without internal base-pairing is achieved only after performing a second transition (cf. arrow in Figure 6B), i.e. the denaturation of the stabilized hairpin II is seen as a transition well resolved from the main transition. Moreover, an additional

Base-pair probability profiles of RNA secondary structures

20 °C

Downloaded from http://bioinformatics.oxfordjournals.org/ at University of Exeter on July 22, 2015

45 °C

1O*

tit

m

79 °C

1

50

100

150 2t»

250 300

350

Nucleotide i

85 °C

50

100 150 200 250 JOO 350

Nucleotide I

Fig. 4. Denaturation of transcript with wild-type PSTVd sequence. The nucleotide numbering is from 5' to 3' end of the transcript, i.e. is not identical to that of the circular PSTVd shown in Figure 2. Hairpins I (positions 294-302/317-325), n (position 83-92/175-184) and III (position 342-348/18-24) are shown as stippled areas. Left side: three-dimensional base-pairing plots (nucleotide i versus nucleotide./ versus probability ptp at temperatures of 20, 45, 79 and 85°C respectively. The basic structure ensemble contained 226 different structures from 30 optimal and suboptimal secondary structures calculated at temperatures from 20 to 90°C in steps of IO°C. Right side: secondary structures of lowest free energy at temperatures of 20, 45, 79 and 85°C respectively.

transition curve is observed for the mutant G318 (318m in Figure 6B). It represents a multibranched conformer, which characteristically undergoes a transition into the non-base-paired

conformation within the temperature range of the hairpin II transition (arrow in Figure 6B). This multibranched secondary structure is basically the structure shown in the probability plot

395

M.SchmiU and G-Steger

79 °C

85

WT

100 ISO 200 250 JOO JSO

1

50 I N 150 200 250 JOO ISO

Nucleotide I

Nucleotide I

100 150 200 150 JOO ISO

50

100 ISO 200 250 JOO 350

Nucleotide I

Nucleotide I

A326

100 150 2O0 250 300 JSO

Nucleotide I

SO

100 ISO 200 250 JOO JSO

Nucleotide i

Fig. 5. Three-dimensional base-pairing plots (nucleotide i versus nucleotide j versus probability p^ at temperatures of 79°C (left side) and 85°C (right side) for three different transcripts with PSTVd sequence. The nucletotde numbering is from the 5' to the 3' end of the transcripts, i.e. is not identical to that of the circular PSTVd shown in Figure 1. Hairpins 1 (position 294-302/317-325), II (position 83-92/175-184) and III (position 342-348/18-24) are shown as stippled areas. The basic structure ensemble contained 226 (wild-type), 226 (C 3lg ) and 224 (G325), respectively, different structures from 30 optimal and suboptimal secondary structures calculated at temperatures from 20 to 90*C in steps of 10"C. Top: transcript with wild-type PSTVd sequence. Middle: transcript with mutation C 3 |j — O (position 174). Bottom: transcript with mutation G ^ — A (position 182).

of G318 at 85 °C (Figure 5), but with additional base-pairs at lower temperatures. Thus, this structure is metastable, but most probably the thermal energy present at room temperature was lower than the activation energy required for rearrangement of the structure^), present in the low-salt conditions of the experiments, to the equilibrium ensemble. A further discrepancy between theoretical results and TGGE relates also to the low ionic strength used in gel electrophoresis (— 3 mM) in contrast to the 1 M salt conditions used for calculations. Theory predicts a difference between the midpoint temperature, Tm, of the main transition and the 7^ of the hairpin II denaturation of ~ 15°C, whereas the difference in TGGE for mutant G318 is

396

only ~4°C, and for the wild-type transcript, no separate transition of hairpin II is even seen. This may be due to high repulsion of the ends in each molecule under low ionic strength conditions. Such an end effect is not corrected for in theory. In general, the agreement between theory and experimental data supports the concept that the denaturation mechanism of PSTVd may indeed be described as a switch between an extended rod-shaped native structure and a branched structure containing at least three stable hairpins at the temperature of the main transition. In the paper by Loss et al. (1991), it is concluded from further experiments that hairpin II is critical for infectivity of PSTVd and may play a role as a binding site

Downloaded from http://bioinformatics.oxfordjournals.org/ at University of Exeter on July 22, 2015

G318

Base-pair probability profiles of RNA secondary structures

52°C

+ _

25°C

52°C

Fig. 6. Analysis of structural transitions of transcripts with PSTVd sequence by temperature-gradient gel electrophorcsis (TGGE) in 5% polyacrylamide. Buffer conditions were 17.8 mM Tris, 17.8 mM boric acid, 0.4 mM EDTA. The direction of electrophoresis is from top to bottom. The lowest and highest temperatures of the linear temperature gradient are indicated at the bottom. (A) TGGE of a mixture of transcript with wild-type sequence and of transcript with mutation G j ^ — A. Marker slots (M) contain the same mixture. (B) TGGE of a mixture of transcript with wild-type sequence and of transcript with mutation C 3 u — G. The curve designated 318 m corresponds to a metastable, multibranched conformer. The arrow indicates the high temperature transition of the mutated transcript. Marker slots (M) contain natural circular PSTVd ( Q and natural linear PSTVd (L).

for host cell transcription factors. Further examples for the use of the presented algorithm with viroid RNA (Steger et aL, 1992) and mRNA (Rosenbaum et al., 1992) respectively, will be published. Discussion An algorithm is presented that approximates the total partition function by a Boltzmann-weighted summation of a structure ensemble. The structure ensemble consists of optimal and suboptimal secondary structures for a single-stranded RNA, calculated at a series of temperatures. Average properties (e.g. the hypochromicity) of the ensemble are generated using

397

Downloaded from http://bioinformatics.oxfordjournals.org/ at University of Exeter on July 22, 2015

25°C

the approximated partition function, as a weighting factor for the contributions of individual structures. These properties may be compared to corresponding experimental data, e.g. optical denaturation curves. A folding algorithm is used to construct the optimal and suboptimal secondary structures. In principle, any of the published folding algorithms may be used that are able to perform the calculations at different temperatures, and to produce output similar to that used by the program SQUIGGLES to draw secondary structures. An interfacing program purges those structures that occur more than once, determines enthalpy and entropy of the individual secondary structures, and writes the structure ensemble into a new file. The final interactive program calculates the average properties and displays them. In order to get reliable results that may be compared to experimental data, the following precautions have to be taken. The set of temperatures, which was used to prepare the structure ensemble, should comprise the entire range of temperature of interest. Extrapolation of structures, especially to the hightemperature range, is critical. As the free energy of each structure varies only linearly with temperature, interpolation of the equilibrium distribution between adjacent temperatures is possible, as far as the corresponding sets of structures overlap. In this case, variation of the free energy is only minor, since a near-optimal structure at one temperature is still found as a suboptimal structure at the adjacent temperature. For this reason, spacing between adjacent temperatures does not have to be too narrow in general, even though it principally depends on the individual properties of the structure sets. Finally, a basic requirement applies to the set of structures calculated at each temperature, to form a valid approximation of the real equilibrium ensemble: each set of structures should be complete, i.e. only structures contributing to the ensemble less than a given limit ( < 1 % in the examples) can be omitted from the calculation. This break-off criterion should ensure that only insignificant structures are omitted, thus yielding a rough estimate of error in the ensemble properties. For practical purposes, a compromise between a maximum of accuracy and a minimum in computer time has to be chosen. From application of our algorithm to viroid RNA, we have found that a spacing of 20°C in the interval between 20 and KX)CC, and a few additional structure sets, each containing 30 structures, at temperatures of major structural rearrangements, are sufficient to meet the above mentioned requirements. In Figure 3 the experimental and several calculated differentiated denaturation curves of circular PSTVd are compared. The experimental curve shows one major transition at 77 °C (Figure 3A, 1) and two minor transitions at higher temperatures (Figure 3A, 2 and 3). The sample for the experimental curve contained - 1 0 % of linear PSTVd, which were co-purified during preparation of the circular PSTVd. These linear molecules denature at a lower temperature than the circular molecules,

M.Schmitz and G.Steger

A recent publication (McCaskill, 1990) proposes a direct evaluation of the partition function for the construction of basepairing probability plots similar to those presented in this paper. In contrast to our algorithm, the partition function is no approximation but only average quantities are calculated directly; thus, no information about indivdiual secondary structures forming the equilibrium ensemble may be obtained. In order to simulate a transition curve, the calculation of the partition function, which is only valid for a single temperature, has to be repeated for many temperatures to get the desired resolution of the transition curve. For the example given above, calculations at 200 temperatures would be necessary. As shown above, our algorithm allows the interpolation between the temperatures at which the basic structure ensemble is calculated, and therefore needs calculation at only seven temperatures. Because the computer time for the calculation of a partition function should be comparable to that of structures at one temperature, our algorithm is clearly more efficient. The electrophoretic mobility as a function of temperature has become more and more interesting since experimental techniques such as temperature-gradient gel electrophoresis were

398

developed. Because precise models describing electrophoretic mobility of single-stranded RNA as a function of structural parameters are, in contrast to that of double-stranded nucleic acid, not available, a straightforward method for mobility calculation has to be developed. The probability plots, however, are an excellent tool for the interpretation of temperaturegradient gels—the plots show the secondary structures that coexist at a certain temperature, as well as the stuctural properties (loop size and degree of bifurcation) that are most important for the gel mobility of the structures. In summary, we presented in this paper an algorithm that calculates the equilibrium concentration of optimal and suboptimal secondary structures of RNA in solution and displays the result in a clear and graphic way, thus allowing quantitative thermodynamic interpretation of biophysical experiments. Acknowledgements We are indebted to Professor D.Riesner for his continuous support and stimulating discussions throughout the course of this work. We thank Professor P.Palukaitis and Dipl.-Phys. H. Wemtges for critical reading of the manuscript. The work was supported by grants from the Deutsche Forschungsgemeinschaft and the Fonds der Chcmischen Industrie.

References Been.M.D., Barfod.E.T., BurkeJ.M., PriceJ.V., Tanner.N.K., Zaug.AJ. and Cech.T.R. (1987) Structures involved in Telruhymena rRNA self-splicing and RNA enzyme activity. Cold Spring Harbor Symp. Quant. Biol., 52, 147-157. Bellman,R. (1958) On a routing problem. Quart. Appl. Math., 16, 87-90. Bellman.R. and Kalaba.R. (1960) Onfcthbest policies. J. Soc. Indust. Appl. Math., 8, 582-588. Bruening.G. (1990) Replication of satellite RNA of tobacco ringspot virus. Semin. Virol., 1, 127-134. Chen,L.-H., Emory.S.A., Bricker.A.L., Bouvel.P. and BclascoJ.G. (1991) Structure and function of a bacterial mRNA stabilizer: analysis of the 5' untranslated region of ompA mRNA. J. BacterioL, 173, 4578-4586. Coutts.S.M. (1971) Thermodynamics and kinetics of OC base pairing in the isolated extra arm of serine-specific tRNA from yeast. Biochim. Biophys. Acta, 232, 94-106. DevereuxJ., Haeberli.P. and Smithies.O. (1984) A comprehensive set of sequence analysis programs for the VAX. Nucleic Acids Res., 12, 387—395. Gross.H.J., Domdey.H., Lossow.Ch., Jank.P., Raba.M., Alberty.H. and Sanger,H.L. (1978) Nucleotide sequence and secondary structure of potato spindle tuber viroid. Nature, 273, 203-208. HcncoJC., SSnger.H.L. and Riesncr.D. (1979) Fine structure melting of viroids as studied by kinetic methods. Nucleic Adds Res., 6. 3041 -3059. JaegerJ.A., Tumer.D.H. and Zuker.M. (1990) Predicting optimal and suboptimal secondary structure for RNA. In Doolittle.R.F. (ed.). Methods Jn Enzymology. Academic Press, San Diego, Vol. 183, pp. 281-306. Loss.P., Schmitz.M., Steger.G. and Riesner.D. (1991) Formation of a thermodyramically metastable structure containing hairpin II is critical for infectivity of potato spindle tuber viroid. EMBO J., 10, 719-727. McCaskillJ.S.M. (1990) The equilibrium partition function and base pair binding probabilities for RNA secondary structure. Biopolymers, 29, 1105-1119. Nussinov.R., Pieczenik.G., GriggsJ.R. and Kleitman.DJ. (1978) Algorithm for toop matching. S1AM J. Appl Math., 35, 68-82. Osterburg.G. and Sommer.R. (1981) Squiggles. Comput. Programs Biomed., 13, 101-109. Raue\H.A., KlootwijkJ. and Musters.W. (1988) Evolutionary conservation of structure and function of high molecular weight ribosomal RNA. Prog. Biophys. Molec. Biol., 51, 77-129. Riesner.D. and Steger.G. (1990) Viroid and viroid-like RNAs. In Sacnger.W. (ed.), Landolt-BOmstein Group VU Biophysics, Vol. I: Nucleic Acids. SubwL

Downloaded from http://bioinformatics.oxfordjournals.org/ at University of Exeter on July 22, 2015

thus giving rise to an additional shoulder in the melting curve below 75CC. The calculated curve in Figure 3(B) is based on a structure ensemble that contained 166 different structures from 30 optimal and suboptimal secondary structures at each of seven temperatures (25, 50, 65, 70, 75, 80 and 90 c C). Comparison of Figure 3(A) and (B) shows a good agreement between the experimental and the calculated curve. Figure 3(D) is based on a structure ensemble that contained only the six optimal secondary structures at the same seven temperatures as in Figure 3(B). (The optimal secondary structures at 50 and 65 °C are identical.) This reduction of the structure ensemble, which leads to a drastic deviation between the experimental and the calculated curve, demonstrates the important influence of the suboptimal structures on the denaturation curve. Figure 3(C) is based on a structure ensemble containing 11 structures. These structures are selected from the structure ensemble used for Figure 3(B); each of them occurs as an optimal structure at a certain temperature in the range from 25 to 90°C. The consideration of the five structures additional to those from Figure 3(D) already leads to a considerable improvement. The curve shown in Figure 3(E) is based on the same structure ensemble as that of Figure 3(B), but the statistical weight of the structures was calculated at a temperature of 10 K. This simulates the effect of taking into account only the optimal structure at each temperature (Steger et al, 1984) without any averaging over the structure ensemble, and clearly shows the almost discontinuous behavior that results from switching between the different optimal structures—the half-width of the peaks depends only on the temperature increments used to evaluate the transition curve, i.e. 0.2°C in the example. The number of peaks reflects the number of secondary structures optima] over the temperature range.

Base-pair probability profiles of RNA secondary structures

Downloaded from http://bioinformatics.oxfordjournals.org/ at University of Exeter on July 22, 2015

d: Physical Data II. Theoretical Investigations. Springer-Verlag, Berlin, pp. 194-243. Ricsner.D., Henco.K., Rokohl.U., Klotz.K., Kleinschmidt.A.K., Domdey.H., Jank.P., Gross.HJ. and Sanger.H.L. (1978) Structure and structure formation of viroids. J. Mol. Bioi, 133, 85-115. Riesner,D., Henco.K. and Steger.G. (1991) Temperature-gradient gel electrophoresis: a method for the analysis of conformational transitions and mutations in nucleic acids and proteins. In Chrambach.A., Dunn.M.J. and Radola.B.J. (eds), Advances in Electrophoresis. VCH Vertagsgesellschaft, Weinheim, Vol. 4, pp. 169-250. Rosenbaum.V. and Riesner.D. (1987) Temperature-gradient gd electrophoresis: Tbermodynamic analysis of nucleic acids and proteins in purified form and in cellular extracts. Biophys. Chenu, 26, 235-246. Rosenbaum.V., Klahn,T., Lundberg.U., von Gabain.A. and Riesner.D. (1992) Co-existing structures of the stability regulating 5' region of ompA mRNA from Escherichia coli and Serrana marcescens (in press). Schnolzer.M., Haas.B., Ramm.K., Hofmann.H. and Sanger.H.L. (1985) Correlation between structure and pathogenicity of potato spindle tuber viroid. EMBOJ., 4, 2181-2190. Steger.G., Hofmann.H. FSrtschJ., Gross.H.J., RandlesJ.W., Sfinger.H.L. and Riesner.D. (1984) Conformational transitions in viroids and virusoids: comparison of results from energy minimization algorithm and from experimental data. J. Biomol. Struct. Dynam., 2, 543-571. Steger.G., Tabler.M., Bruggemarm.W., Colpan.M., Klotz.G., Sanger.H.L. and Riesner.D. (1986) Structure of viroid replkative intermediates: physicochemical studies on SP6 transcripts of cloned digomeric potato spindle tuber viroid. Nucleic Acids Res., 14, 9613-9630. Steger.G., Baumstark.T., Morcnen.M., Tabler.M., Tsagris.M., Sanger.H.L. and Riesner.D. (1992) Structural requirements for viroid processing by RNase T l . J. Mol. Biol. (in press). Tinoco.I.. Uhlenbcck.O.C. and Levine.M.D. (1971) Estimation of secondary structure in ribonucleic acids. Nature. 230, 362-367. Waterman.M.S. and Smith.T.F. (1978) RNA secondary structure: a complete mathematical analysis. Math. Biosci., 42, 257—266. Wild.U., Ramm.K., Sanger.H.L. and Riesner.D. (1980) Loops in viroids: accessibility to tRNA anticodon binding. Eur. J. Biochem., 103,227-235. Williams.A.L. and Tinoco.I. (1986) A dynamic programming algorithm for finding alternative RNA secondary structures. Nucleic Adds Res.. 14, 299-315. Zuker.M. (1989) On finding all suboptimal foldings of an RNA molecule. Science. 244, 48-52. Zuker.M. and Stiegler.P. (1981) Optical folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Res., 9. 133- 148. Receh-ed on December 5, 1991; accepted on February 24. 1992

Circle No. 11 on Reader Enquiry Card

399

Base-pair probability profiles of RNA secondary structures.

Dynamic programming algorithms are able to predict optimal and suboptimal secondary structures of RNA. These suboptimal or alternative secondary struc...
823KB Sizes 0 Downloads 0 Views