BULLETIN OF MATHEMATICAL BIOLOGY, VOLUME 37, 1975

THEORY OF RNA-DNA HYBRIDIZATION--I. STOICHIOMETRY AT EQUILIBRIUM

JOHN M. REINER Departments of Biochemistry and Pathology, Albany Medical College of Union University, Albany, New York 12298

The equations relating hybridized RNA to free RNA, in the case of simple hybridization, or to ratio of labelled and unlabelled RNA in competitive hybridization, are derived. Analysis of the equations shows how hybridization data may be used to infer properties of the distribution of components in an RNA mixture, or the relation between two distributions in competitive hybridization. A critical examination of the assumptions underlying the equations indicates that some of them may be violated in certain cases, or have no current support, evidential or theoretical. The consequences of such qualifications for the interpretation of hybridization data are indicated.

1. Introduction. The hybridization of RNA with DNA has become, in the decade since its introduction (Schildkraut et al., 1961a, b), one of the most widespread and useful techniques of molecular biology. The theory of this flourishing practice, however, has received short shrift. Except for a few early discussions (McCarthy and Bolton, 1964; Hall et al., 1964), assumptions have tended to remain implicit, and conclusions have been reached by reasoning so qualitative as to seem almost intuitive. We propose to supply the rudiments of a theoretical framework, and to point out conceptual difficulties. The phenomenon divides itself rather naturally into three topics: stoichiometry of the hybridization equilibrium, kinetics, and physical chemistry of the reaction. The stoichiometry is mathematically


elementary; but we begin with it, since it has not been set forth explicitly elsewhere.

2. Assumptions. The applicability of the hybridization technique depends on the presumption that binding takes place through hydrogen bonds between complementary bases in RNA and the corresponding segment of the homologous DNA. Besides this reasonable supposition, without which the whole matter would be valueless, additional assumptions are required if a halfway usable method is to be developed. Many of these are arguable or in need of careful qualification. We state them here, and consider difficulties later.

   1. The partners are permitted to react sufficiently long for the equilibrium state to be reached or virtually reached.
   2. Preparation of DNA to serve as binding agent (e.g. denaturation, shearing) does not change the original distribution of kinds of binding region. In the simplest case, with which we shall be dealing for the most part, if $D_t$ moles of DNA are used, the number of moles of the binding region of type $i$ in the final preparation is $D_{it} = D_t$. If loss of DNA is incurred, it is uniform, so that $D_{it} = hD_t$, $0 < h < 1$.
   3. Hybridization is a one-to-one relation between the set of kinds of RNA and the set of kinds of corresponding DNA region. No RNA binds to more than one kind of DNA region; and no RNA binds in such a way as to overlap two regions.
   4. In the RNA labelled with radioactive tracer (e.g. $^{14}$C, $^{32}$P), customarily used in hybridization experiments, the radioactivity per mole (dpm/mole) is the same for all RNA species. Then total measured radioactivity (of bound or free RNA of all kinds) is simply proportional to total number of moles.

3. Simple Hybridization. We begin by considering the hybridization of a single RNA, of kind $i$. The equilibrium, with dissociation constant $K_i$, of $S_{it}$ moles of RNA and $D_t$ moles of DNA binding site, is given by:

$$C_i = (D_t - C_i)(S_{it} - C_i)/K_i. \qquad (1)$$
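Equation (1) is a quadratic in the amount of hybrid, so it can be solved in closed form. The following is a minimal sketch, assuming all three quantities are expressed in the same units; the function name and the numerical values are illustrative only and do not come from the paper. Rearranging (1) gives $C_i^2 - (D_t + S_{it} + K_i)C_i + D_t S_{it} = 0$, and the smaller root is the one lying between zero and the lesser of $D_t$ and $S_{it}$.

```python
import math

def hybrid_at_equilibrium(d_total, s_total, k_diss):
    """Return the physically meaningful root C_i of equation (1).

    Equation (1), C = (D_t - C)(S_it - C)/K_i, rearranges to
    C^2 - (D_t + S_it + K_i) C + D_t S_it = 0.
    """
    b = d_total + s_total + k_diss
    disc = b * b - 4.0 * d_total * s_total  # positive whenever K_i > 0
    # Smaller root, written as 2DS / (b + sqrt(disc)) to avoid
    # cancellation when the hybrid is a small fraction of the total.
    return 2.0 * d_total * s_total / (b + math.sqrt(disc))

# Hypothetical values, chosen only to exercise the function.
c = hybrid_at_equilibrium(d_total=1.0, s_total=2.0, k_diss=0.5)
print(c)
```

Choosing the smaller root keeps the free DNA and free RNA, $D_t - C_i$ and $S_{it} - C_i$, non-negative; the larger root would exceed both totals.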

In the presence of excess DNA ($C_i < S_{it}$ ...

$$f'_1 > f'_2 > \cdots. \qquad (16)$$

What we shall show is that $\sum' 1/h_i = \sum' f_i/f'_i$ is maximal if the $f'_i$ (whatever their actual values) are arranged according to (16), given that the $f_i$ satisfy (15). This means that any permutation of arrangement (16) must decrease the summation. Since every permutation of a set of elements is representable as the result of a succession of interchanges between pairs, we may confine ourselves to such interchanges. Suppose we interchange the denominators of the $i$-th term and the $(i+p)$-th term, and subtract the new value of the sum from the old value. The difference is:

$$(f_i - f_{i+p})(f'_{i+p} - f'_i)/(f'_i f'_{i+p}), \qquad (17)$$

which is positive by virtue of (15) and (16). Thus the maximum value of the initial slope of (11) is given by the antimorphic case.

4. The intercept of (11) on the ordinate axis (the value for $z = 0$) is 1 for homeomeric distributions, otherwise greater than 1. How much greater, of course, depends not only on how many terms are missing from the restricted summation $\sum' x_i$, but also on whether the missing terms are among the larger or the smaller ones. If the curve (or enough of it to find the intercept) is obtained at saturating values of $S_t$, every $x_i = 1$. Hence the intercept is $n/n'$, where $n'$ is the number of elements in common between the hot and cold distributions. Since $n$ was already obtained from the $C$ curve (previous section), $n'$ is thus known. The overlap $n'/n$ is a measure of homeomerism not completely equivalent to the parameters $g$ and $k$; for $1 - n'/n$ measures the relative number of components lost in a heteromeric situation, while $1 - g$ and $1 - k$ measure the relative mass (number of moles) lost as far as competition is concerned.
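The interchange argument leading to (17) can be checked numerically. The sketch below is illustrative only: the fractions are made-up values, and it assumes that (15) orders the hot fractions $f_i$ increasingly, which is the ordering that makes (17) positive when the cold fractions $f'_i$ decrease as in (16). It enumerates every arrangement of the cold fractions and confirms that the oppositely ordered (antimorphic) one maximizes $\sum' f_i/f'_i$.

```python
from itertools import permutations

def slope_sum(f_hot, f_cold):
    """Restricted sum of f_i / f'_i over the competing species."""
    return sum(fh / fc for fh, fc in zip(f_hot, f_cold))

def interchange_difference(f_i, f_ip, fc_i, fc_ip):
    """Expression (17): old sum minus new sum after swapping two denominators."""
    return (f_i - f_ip) * (fc_ip - fc_i) / (fc_i * fc_ip)

# Illustrative values only (not data from the paper).
f_hot = [0.1, 0.2, 0.3, 0.4]    # assumed increasing order for (15)
f_cold = [0.4, 0.3, 0.2, 0.1]   # decreasing order, as in (16)

# Brute-force search over all arrangements of the cold fractions.
best = max(permutations(f_cold), key=lambda p: slope_sum(f_hot, p))
print(best, slope_sum(f_hot, best))
```

With distinct values the maximizing arrangement is unique, so the brute-force search and the pairwise-interchange argument single out the same ordering.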

5. Discussion.

Assumption 1 is technical, and requires no special discussion, except to note that it is probably not satisfied in many experiments. Assumption 2 is technical only in part. Repeated DNA regions present a complication. If a region of type $i$ has repetition $m_i$, its concentration is $m_i D_t$, and the $i$-th term in (5) is multiplied by $m_i$, as is the corresponding $x_i$ in (11). The variety parameter $n$ now represents the number of DNA regions corresponding to the hybridizing RNA mixture, rather than the actual number of kinds of RNA in the


mixture. The same change affects the interpretation of $n'$. The overlap coefficient $n'/n$ now depends not only on degree of overlap but also on whether the overlap is contributed by RNA corresponding to unique or repeated DNA regions. Thus variety and overlap coefficients, which give information about the RNA mixtures when these all arise from unique DNA, are contaminated heavily with the properties of the DNA if they correspond to repeated regions even in part. Certain lines of reasoning, like that for the unique property of antimorphic distributions, are unaffected by repetition, provided we incorporate the factor $m_i$ into the $x_i$. The use of constants $g$ and $k$ as measures of heteromerism likewise remains valid. On the other hand, the maximal property of $C(K)$ for equal $f_i$ no longer holds. Instead, the $f_i$ making $C$ a maximum increase with the square root of the corresponding $m_i$. This is a reasonable conclusion: hybridization will be maximal when the fraction of any kind of RNA corresponds to the number of DNA places it must occupy.

It has been argued (Hall et al., 1964) that competition curves of $C^*/C$ where $S_t$ is at saturating levels (as in their Figure 5) tend to measure the fraction of DNA available for binding "non-competing" RNA rather than the actual abundance of such RNA. These authors propose instead measurements at the lowest feasible subsaturating levels of $S_t$, as in their Figure 7. Let us try to relate this argument to the equations. The terms in both $C$ and $C^*$ corresponding to "non-competed" RNA (RNA that is present in the hot but not in the cold mixture) are identical, and of the form of the terms in (5). At saturation each term equals unity. Call the sum of such terms $n_1$. The terms of $C$ that will be involved in competition are of the same form, and their sum at saturation is $n_2$, such that $n_1 + n_2 = n$.
A typical competitive term in $C^*$ is of the form (10), and may be written in terms of our abbreviations as $z x_i/(z + h_i x_i)$, which becomes $z/(z + h_i)$ at saturation, when $x_i = 1$. Such terms will vanish only if $z$ does, that is, if $S_t$ is negligible compared to $S'_t$ even when $S'_t$ is saturating. In that case $C^*/C = n_1/(n_1 + n_2)$, and the verbal argument is correct. Inspection of the data considered by Hall et al. shows that the condition is satisfied. It is important to keep in mind that $C^*/C$ will be an overestimate of the proportion of such DNA regions if $z$ is not negligible. (It should be obvious that, by definition, $n_1 = n - n'$, so that in the successful case $C^*/C = 1 - n'/n$, and so is simply related to our overlap coefficient.) It is also important to remember that we are talking about DNA regions that are represented in the hybridizing RNA; about the totality of the DNA regions we have no information. At subsaturating levels, $z = 0$ a fortiori; and it follows from (11) that $C^*/C = 1 - \sum' x_i/\sum x_i$. For small $S_t$, however, $x_i \to f_i S_t/K_i$. Using the approximation


$K_i = K$ for simplicity, we see that then $C^*/C = 1 - \sum' f_i S_t/\sum f_i S_t$. The second term is the fractional number of moles of RNA involved in competition; hence $C^*/C$ is indeed the fractional number of moles of RNA not involved in competition. The reader will have noticed that the saturation value of $C^*/C$, $1 - n'/n$, could have been used to measure the fractional number of kinds of non-competing RNA, whereas the value at subsaturating levels measures the fractional number of moles or fractional mass, provided repetition could be excluded. Unfortunately, repetition vitiates the subsaturation measure as well. To see this at once, one may imagine RNA consisting of two kinds in equal proportions ($f_1 = f_2 = 1/2$), only one kind being competitive. If the competitive species is also highly repeated (repetition $m$, where $m$ is a large number), the parameter proposed by Hall et al. will come out $1/(m + 1)$, or virtually nil, whereas the true value is 0.5.

Assumption 4 is highly questionable. Consider, for instance, RNA of widely different molecular weights, such as tRNA and rRNA labelled at the same specific activity of labelling agent. Or again, consider the possible differences due to in vivo labelling at widely different rates. In either case, what the experiment gives us is $R = \sum r_i C_i$ (where $r_i$ is radioactivity per mole); and what we want is $C = \sum C_i$. If a constant radioactivity per mole cannot be obtained, the results are contaminated by the unknown $r_i$ precisely as, in the preceding paragraphs, they were by the $m_i$.

Finally, assumption 3, as well as the special assumption that all $K_i = K$, must be left with large question-marks.
The extent to which binding specificity and strength are affected by molecular weight, by frequency of mismatches, and by what we might call micro-repetition (occurrence of relatively short repeated sequences throughout the DNA, an occurrence which would be very likely if very long sequences made up of only four bases were randomly ordered) are questions whose answer we expect to approach in future communications. Clearly, a more intimate analysis of the hybridization process and perhaps also of the internal order relations in the polynucleotide chain will be required.

Leaving these problems to the future, let us comment briefly on a couple of previous attempts at analyzing the hybridization of mixtures. The first of these is a little difficult to compare directly with our results (Bishop, Robertson, Burns and Melli, 1969), because it proposes a linear double-reciprocal plot of hybridization (basically $1/C$ against $1/S_t$) as valid for all times during the hybridization process rather than at equilibrium. The paper is interesting, however, despite this, because the authors are alert to the difficulties posed by the distribution of kinds of RNA, and attempt to estimate the nature of these


difficulties by calculating results for various assumed distributions (with assumed numerical values of constants, of course). The kinetic model cannot be considered seriously, despite the published curves (Bishop et al., 1969), as more than a rough approximation. The reader will readily convince himself that the linear double-reciprocal relation cannot hold even in the simplest model of hybridization of a pure RNA species. Indeed, such a demonstration appears in a recent paper by Spiegelman et al. (Spiegelman, Haber and Halvorson, 1973); and these authors also indicate that the simple kinetic model is probably inadequate to represent the time course of hybridization.

In dealing with mixtures, however, Bishop et al. (1969) rightly took the hybridization to be a sum of independently generated terms, such as (4) in the equilibrium case. In the absence of information about the distributions, they calculated the sum for uniform distributions, Gaussian distributions, and "discontinuous" distributions, over a fair range of RNA concentrations, both for simple hybridization and for competition. Unfortunately, they then introduced the calculated hybridization and fractional competition values into their empirical linear double-reciprocal equations (equations (2) and (6) of Bishop et al., 1969), in order to estimate graphically two parameters which they interpreted as saturation hybridization in the simple case and competed hybridization in the competition case. In the simple case they were interested in evaluating the degree to which the observed maximum hybridization (plateau level) for a finite set of experimental points agreed with the true saturation value (hybridization at virtually infinite RNA concentration). In the competition case they looked at the estimated "homology" (fraction of hybridization by hot RNA involved in competition with cold RNA) compared with the homology actually simulated (1.0 in all cases).
As we have seen, the saturation value in the simple case (limit of $\sum x_i$) is the variety parameter $n$. Bishop's saturation value is always less than 1, due to the fact that he measures DNA concentration in mass rather than in molar units. If the mass of DNA were $M$ and it contained $N$ equal cistrons of equivalent weight $W$, then $D_t = M/NW$. Assuming the mass of an RNA sequence to be approximately $W$ also, the saturation value in Bishop's units would be $n/N$, a fraction giving what Bishop terms "complementarity", the fraction of DNA cistrons to which there correspond complementary sequences in the RNA mixture. However, we must confess to seeing the comparison as trivial: scarcely anyone would attempt to estimate the asymptote of a supposed hyperbola by eye when better methods are available, so Bishop et al. might better have dismissed other workers' use of $r_{\max}$ summarily. In the competition case, Bishop et al.'s equation (6) is a form of our (11). As


we saw earlier, the intercept of (11) is $\sum x_i/\sum' x_i$, whose significance is the same as that of $1/p_c$ in the notation of Bishop et al.; and indeed it must equal 1, as we observed, in the homeomeric case ("perfect homology" of Bishop et al.). The trouble with all this is that, as we have seen, neither $1/C$, with $C$ given by our (5), nor the reciprocal of relative competition, as given by (11), is linear in $1/S_t$. The deviations from expected values found by Bishop et al. as an outcome of their procedure, which they interpret as due to non-uniform or anisomorphic distributions, may well be due to testing against an assumed linear relation. This is probably aggravated by the fact that (2) and (6) of Bishop et al. are in the Dixon-Massey rather than the Lineweaver-Burk form of the reciprocal plot, so that the parameters must be estimated from slopes rather than intercepts; for if a graph of modest curvature is confounded with a straight line, its slope is even less definite than its intercept. The fact that uniform distributions are less troublesome than others is readily intelligible. For, assuming $f_i = 1/n$ and $K_i = K$, all terms of (5) are the same; and so a linear plot is theoretically valid in this isolated case. If the idea of circumventing the difficulties inherent in the behaviour of mixtures by simulation of possibly typical distributions is to be useful, it must be carried out in terms of the correct theoretical formulation, or of approximations to such a formulation.

The problem of where to approximate arises in the next paper we shall discuss, and illustrates the point very well. The paper referred to (Hansen et al., 1970a) suffers, like that of Bishop et al., from an overemphasis on linearity. The paper was criticized (Bolch, 1970) in an exasperated but unfortunately not too clear note, the rejoinder to which (Hansen et al., 1970b) was altogether unresponsive. What Hansen et al.
did was to form equations like our (3) for $C_i$ and for the corresponding $C^*_i$. They defined $F_i = C^*_i/C_i$ (unfortunately omitting subscripts), and showed that $1/(1 - F_i)$ is linear in $z$. They then averaged, summing over the $n'$ competing species and dividing by $n'$. The average is of course still linear in $z$; and the authors employ this equation for the analysis of competitive hybridization. The difficulty (leaving aside a minor error, the omission of a factor $f_i$) is that the left side of the equation is not expressible in terms of measurable quantities, so that the exercise is worse than academic. The quantities given experimentally are $C = \sum C_i$ and $C^* = \sum C^*_i$ (equations (5) and (10)). One may define an analogue to $F_i$, namely $G = C^*/C$, which is thus a direct function of the measurable quantities. Then $1/(1 - G) = C/(C - C^*)$, given by our (11). On the other hand, $\sum' 1/(1 - F_i) = \sum' [C_i/(C_i - C^*_i)]$ is a quantity which could be evaluated only if the individual $C_i$ and $C^*_i$ could be measured; and this is not the case in a typical hybridization experiment. Confounding the sum of ratios


with the ratio of sums is the fatal error: addition does not permute with nonlinear operations such as squaring and dividing. Since Hansen et al. exhibit some fairly linear curves, one may wonder whether this proves that their linear quasi-equation is a good and sufficient approximation to (11). It is not; and the reason is very simple. The situation resembles the familiar one in which rounding during a complicated series of calculations may produce a substantial error, whereas rounding at the very end of the calculation does not.

It is easy to see that the linear equation is incorrect. One may expand the right side of (11) in terms of $z$, and keep only the linear term. The slope of this approximate line is $\sum x_i \sum' (f_i/f'_i)/(\sum' x_i)^2$, which is the same as the initial slope of (11), as obviously it must be. The slope of Hansen et al.'s line (correcting the omission of a factor noted previously) is $\sum' (f_i/f'_i x_i)/n'$ in our notation; it is therefore incorrect. The intercept of our approximation is $\sum x_i/\sum' x_i$, which is the same as the intercept of (11), which again must be the case. The intercept of Hansen et al.'s equation is 1, which is incorrect. However, the authors disavow this value in the heteromeric case, and argue that it must then exceed 1. This conclusion is correct, but nothing in their equation justifies it.

Whether the experimental data justify a linear approximation is arguable. It is not difficult to show, by extending the expansion of (11) to the quadratic term, that an average experimental error of not more than 5% would demand a change of 40% between the initial and final slopes in order that such a change be measurable with reasonable confidence. The significance of an apparent linear fit in the published curves is therefore doubtful at best. The important point, however, is that a theoretically useful approximation should be derived from an equation that is correct, or as nearly correct as possible, in the first place.
The result of expanding (11) and keeping only the linear term is unquestionably a valid approximation to the actual non-linear relation, provided $z$ is sufficiently small; and it is possible to decide from the expansion what "sufficiently small" amounts to. If this condition is satisfied in the actual experiment, then the linear approximation will correctly represent the observed results unless the entire theory is erroneous. The linear quasi-equation of Hansen et al., on the other hand, is vitiated from an early stage by the error already remarked: in mathematical terms, the attempt to permute the intrinsically nonpermutable, or in physical terms the expression of the result in terms of non-measurable quantities, followed by a confusion of these with the measurable quantities. Finally, since we have shown that it is possible to extract a good deal of information from the nonlinear equation (11), the merit of a linear form seems to be more a matter of comforting simplicity than of real power.
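The contrast between the ratio of sums and the average of ratios can be made concrete with a short numerical sketch. It rests on assumptions that should be stated plainly: the ratio $C/(C - C^*)$ is assembled here from the term forms quoted in the discussion (a term $x_i$ for each species in $C$, and $z x_i/(z + h_i x_i)$ for each competed species in $C^*$), with $h_i$ standing for $f'_i/f_i$; the function names and the numerical values are illustrative and are not taken from any of the papers discussed.

```python
def exact_ratio(z, x, h, competing):
    """C/(C - C*), assembled from the term forms quoted in the text."""
    c = sum(x)
    c_star = sum(
        z * xi / (z + hi * xi) if comp else xi  # non-competed terms are unchanged
        for xi, hi, comp in zip(x, h, competing)
    )
    return c / (c - c_star)

def linear_approximation(z, x, h, competing):
    """First-order expansion in z: intercept and initial slope as stated above."""
    sum_all = sum(x)
    sum_comp = sum(xi for xi, comp in zip(x, competing) if comp)
    sum_inv_h = sum(1.0 / hi for hi, comp in zip(h, competing) if comp)
    return sum_all / sum_comp + z * sum_all * sum_inv_h / sum_comp ** 2

def averaged_line(z, x, h, competing):
    """Average over competing species of 1/(1 - F_i) = 1 + z/(h_i x_i)."""
    terms = [1.0 / (hi * xi) for xi, hi, comp in zip(x, h, competing) if comp]
    return 1.0 + z * sum(terms) / len(terms)

# Hypothetical three-species mixture; the third species is not competed.
x = [0.9, 0.5, 0.2]
h = [2.0, 1.0, 0.5]
competing = [True, True, False]

for z in (0.0, 0.01, 0.1, 1.0):
    print(z, exact_ratio(z, x, h, competing),
          linear_approximation(z, x, h, competing),
          averaged_line(z, x, h, competing))
```

In this sketch the first-order expansion agrees with the exact ratio at $z = 0$ and stays close to it for small $z$, whereas the averaged line starts from an intercept of 1 whatever the degree of overlap.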


While the nonlinear hybridization characteristics of mixtures may be an obstacle for some individuals, other problems we have raised are less readily dealt with. The possible range of variation of the $K_i$ and the effects of partial binding as a result of micro-repetition will at least be susceptible to analysis as we develop the physical theory of the hybridization process; and the questions with which we leave them now will be settled. The problems raised by repetition and variable labelling, on the other hand, promise to be genuine stumbling-blocks since they cast doubt on the interpretation (or even impede the calculation) of so many parameters. It is true that some of these problems may be alleviated with the help of more laborious experimental procedures, such as the DNA prefractionation techniques (e.g. Brown and Weber, 1968a, b). But in a sense these are beside the point: relative ease and versatility are unquestionably among the attractions of hybridization. Moreover, recent developments (summarized by Graham et al., 1973) raise some questions about the efficiency with which it will be possible to separate repetitive from nonrepetitive DNA, by any simple procedure, for use in hybridization experiments. When it comes to variability of labelling, whether due to variation in cistron sizes or to variation in metabolic rates, the remedy is still more obscure. Again, prefractionation of RNA by available techniques will surely enable us to get some information; but the palmy days of hybridization as a fast and neat method are clearly at an end.

Such pessimistic conclusions should occasion no surprise: every beautifully simple phenomenon, when it is subjected to more detailed investigation, becomes much more complex, at least for a time. What one might hope for, if one is so inclined, is that the theoretical analysis of the physicochemical mechanisms involved in hybridization may suggest presently unglimpsed possibilities for discrimination.
Supported by U.S.P.H.S. Grant HL-14177.

LITERATURE

Bishop, J. O. 1969. "The Effect of Genetic Complexity on the Time Course of RNA-DNA Hybridization." Biochem. J., 113, 805-811.
Bishop, J. O., F. W. Robertson, J. A. Burns and M. Melli. 1969. "Methods for the Analysis of DNA-RNA Hybridization Data." Biochem. J., 115, 361-370.
Bolch, H. D. 1970. "RNA Hybridization Competition Between Species." Science, 170, 872.
Brown, D. D. and C. S. Weber. 1968a. "Gene Linkage by RNA-DNA Hybridization. 1." J. Mol. Biol., 34, 661-680.
Brown, D. D. and C. S. Weber. 1968b. "Gene Linkage by RNA-DNA Hybridization. 2." Ibid., 34, 681-697.
Graham, D. E., B. R. Neufeld and R. J. Britten. 1973. "General Assessment of Advances in the Knowledge of DNA Sequence Arrangement." Annual Report of the Director 1972-73, p. 224. Department of Terrestrial Magnetism, Carnegie Institution.


Hall, B. D., A. P. Nygaard and M. H. Green. 1964. "Control of T2-Specific RNA Synthesis." J. Mol. Biol., 9, 143-153.
Hansen, J. N., G. Spiegelman and H. O. Halvorson. 1970a. "Bacterial Spore Outgrowth: Its Regulation." Science, 168, 1291-1298.
Hansen, J. N., G. Spiegelman and H. O. Halvorson. 1970b. "RNA Hybridization Competition Between Species." Ibid., 170, 872.
McCarthy, B. J. and E. T. Bolton. 1964. "Interaction of Complementary RNA and DNA." J. Mol. Biol., 8, 184-200.
Reiner, J. M. 1969. Behavior of Enzyme Systems, pp. 165, 188. New York: Van Nostrand-Reinhold.
Schildkraut, C. L., J. Marmur and P. Doty. 1961a. "The Formation of Hybrid DNA Molecules and Their Use in the Studies of DNA Homologies." J. Mol. Biol., 3, 595-617.
Schildkraut, C. L., J. Marmur, J. Fresco and P. Doty. 1961b. "Formation and Properties of Polyribonucleotide-Polydeoxyribonucleotide Helical Complexes." J. Biol. Chem., 230, PC2.
Spiegelman, G. B., J. E. Haber and H. O. Halvorson. 1973. "Kinetics of RNA-DNA Membrane Filter Hybridization." Biochemistry, 12, 1234-1242.

RECEIVED 10-6-74
