Summary

A diploid human genome contains approximately six billion nucleotides. This enormous amount of genetic information can be replicated with great accuracy in only a few hours. However, because DNA strands are oriented antiparallel while DNA polymerization only occurs in the 5'→3' direction, semi-conservative replication of double-stranded DNA is an asymmetric process, i.e., there is a leading and a lagging strand. This provides a considerable opportunity for non-random error rates, because the architecture of the two strands as well as the DNA polymerases that replicate them may be different. In addition, the proteins that start or finish chains may well be different from those that perform the bulk of chain elongation. Furthermore, while replication fidelity depends on the absolute and relative concentrations of the four deoxyribonucleotide precursors, these are not equal in vivo, not constant throughout the cell cycle, and not necessarily equivalent in all cell types. Finally, the fidelity of DNA synthesis is sequence-dependent and the eukaryotic nuclear genome is a heterogeneous substrate. It contains repetitive and nonrepetitive sequences and can actually be considered as two subgenomes that differ in nucleotide composition and gene content and that replicate at different times. The effects that each of these asymmetries may have on error rates during replication of the eukaryotic genome are discussed.

Introduction

The spontaneous mutation rate in microbial(1) and

Table 1. Eukaryotic DNA polymerases

                                α                      β
Catalytic subunit (kDa)         165                    40
Associated primase              yes                    no
3'→5' exonucleaseᵃ              noᵇ                    no
Processivity                    moderate               low
Average error rate for:
  Base substitutions            1 × 10⁻[illegible]     5 × 10⁻⁴
  One-base frameshifts          5 × 10⁻[illegible]     5 × 10⁻[illegible]

mammalian(2) cells has been estimated to be less than 1 mutation per billion base pairs per cell division. How is this incredible accuracy achieved? It results from several DNA metabolic processes occurring in a cell, each of which contributes to the final mutation rate. Discrimination against errors first occurs as the multiprotein DNA replication apparatus decides whether to polymerize a correct or an incorrect nucleotide at the growing replication fork. If a mistake is made, it may be excised by an associated 3'→5' exonuclease prior to further chain elongation. Those errors that escape proofreading and are fixed in double-stranded DNA can be corrected by post-replication heteroduplex repair, which selectively removes errors in the newly synthesized strand. In the past few years, major advances have been made in DNA replication and in describing the fidelity of DNA synthesis in model polymerization reactions(4). In this review, I consider the impact of these advances on our understanding of the fidelity of semi-conservative DNA replication in eukaryotes, focusing on several asymmetries that might influence error rates during this critical process.

Replication Models

During the past 30 years, five template-dependent deoxyribonucleotide-polymerizing enzymes have been described in eukaryotic cells, designated DNA polymerases α, β, γ, δ and ε(5). For about 20 of these years, DNA polymerase α, the first eukaryotic DNA polymerase discovered, was considered to be the replicative enzyme. However, it is now very clear that more than one DNA polymerase is required for replication of the nuclear genome. The four leading candidates are listed in Table 1. (Not listed is DNA polymerase γ, the replicative enzyme for the mitochondrial genome.) Their possible roles can be considered by dividing replicative synthesis into four phases (Fig. 1). The first phase is DNA synthesis at origins of replication. This is likely to be catalyzed by DNA polymerase α.
This polymerase contains an associated DNA primase activity (Table 1) for synthesizing the RNA primers required to initiate DNA polymerization. The bulk of chain elongation then occurs on the leading and lagging strands at the replication fork. Since the

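Table 1 (continued)

                                δ                      ε
Catalytic subunit (kDa)         125                    255
Associated primase              no                     no
3'→5' exonucleaseᵃ              yes                    yes
Processivity                    PCNA-dependent         high
Average error rate for:
  Base substitutions            2 × 10⁻⁵               ≤5 × 10⁻⁶
  One-base frameshifts          2 × 10⁻⁵               ≤1 × 10⁻[illegible]

ᵃThis refers to proofreading activity associated with the catalytic subunit. ᵇExceptions have been reported. For a detailed review of the properties of and relationships among these DNA polymerases, see reference 5.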

[Figure 1: schematic of a replication fork (image not reproduced). Recoverable labels: DNA; RNA primer; direction of fork movement; α, δ or ε; lagging strand; Okazaki fragment; RNA primers; replacement synthesis (α, β, δ, ε).]

Fig. 1. Four phases of replicative DNA synthesis. See text for description. Phase 1 is initiation of replication on both strands at the origin. Phases 2 and 3 are replication on the leading and lagging strands. Only one of the two forks characteristic of bidirectional replication is shown, here depicted in the traditional Y-shape. Phase 4, shown at the bottom, is polymerization associated with replacement of RNA primers with DNA on the lagging strand.

strands are oriented antiparallel and DNA polymerases only polymerize in the 5'→3' direction, the leading strand can be replicated continuously starting from the RNA primer at the origin, while the lagging strand is synthesized in a series of ~250-nucleotide Okazaki fragments, each initiated by a ~10-nucleotide RNA primer. Thus, the requirements for leading and lagging strand replication proteins are different, i.e., there is an enzymological asymmetry at the replication fork. The exact roles of the eukaryotic DNA polymerases on the leading and lagging strands remain an exciting unresolved question. The leading polymerase candidates for these two phases of replication are DNA polymerases α, δ and ε (Table 1). Recent studies in yeast demonstrate that all three DNA polymerases do in fact participate in replication(6). Several models have been posited to account for their roles(7). All models logically envision the involvement of DNA polymerase α in lagging strand replication because of its

associated primase activity. The models differ primarily in the potential roles of DNA polymerases δ and ε. The simplest model suggests that DNA polymerase δ, in association with accessory proteins, may replicate the leading strand, while only DNA polymerase α-primase is needed on the lagging strand. Other studies suggest that a second polymerase may participate in lagging strand synthesis, perhaps performing most of the synthesis after pol α has incorporated about 25 nucleotides. One such model proposes that DNA polymerase ε is the leading strand polymerase, while DNA polymerases α and δ are responsible for lagging strand replication(6). Another proposes that DNA polymerase δ is the leading strand polymerase, while DNA polymerases α and ε are responsible for lagging strand replication(8). It is also possible that only one enzyme, either DNA polymerase δ or ε, may be responsible for the bulk of polymerization on both strands. This is suggested by the fact that prokaryotic replication systems use the same DNA polymerase catalytic subunit to replicate both strands(9) and by the number of striking enzymological similarities between prokaryotic and eukaryotic replication systems.

Replication models for leading and lagging strand synthesis also posit an 'architectural asymmetry' at the fork(3). Leading strand replication can occur concomitantly with unwinding of the parental helix, hence little template need be exposed as single-stranded DNA prior to replication. However, synthesis using the complementary strand template occurs only after exposing an extensive single-stranded region. This provides an asymmetric potential on the two strands for mutagenesis involving template-primer misalignments (reference 13 and text below). While many replication models suggest that a few hundred nucleotides (equivalent to an Okazaki fragment) are present as unreplicated single-stranded DNA, the thousands of nucleotides of single-stranded DNA found in some cells have also been suggested to result from replication(11,12).

The fourth phase of replicative DNA synthesis occurs concomitantly with removal of RNA primers from Okazaki fragments. The amount of polymerization involved may be substantial. For example, replacement of a 10-nucleotide RNA primer on a 250-nucleotide Okazaki fragment would represent 4% of lagging strand replication. The amount could be greater if some DNA is also excised and replaced in the process. The DNA polymerase catalyzing replacement synthesis is not known. Candidates include DNA polymerases α, β, δ and ε.
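The 4% figure follows directly from the primer and fragment lengths quoted above. A minimal Python sketch of that arithmetic is given here; the function name and the extra-excision parameter are illustrative assumptions and are not taken from the cited literature.

```python
def replacement_fraction(primer_len=10, fragment_len=250, extra_dna_excised=0):
    """Fraction of lagging-strand synthesis performed as replacement synthesis.

    primer_len and fragment_len default to the approximate values quoted
    in the text. extra_dna_excised is a hypothetical number of DNA
    nucleotides removed and resynthesized together with the primer.
    """
    if fragment_len <= 0:
        raise ValueError("fragment_len must be positive")
    return (primer_len + extra_dna_excised) / fragment_len


# Primer replacement only: 10 of every 250 nucleotides.
print(f"{replacement_fraction():.1%}")
# Assumed case in which 15 nucleotides of DNA are also excised and replaced.
print(f"{replacement_fraction(extra_dna_excised=15):.1%}")
```

With the default values the function returns 0.04. The second call only illustrates how quickly the share of replacement synthesis grows if some DNA is removed along with the primer; the value of 15 nucleotides is arbitrary.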

Error Pathways and Inequalities in DNA Polymerase Fidelity

What is known about the fidelity of DNA synthesis in vitro with each of the candidate replicative polymerases? Error rates per nucleotide polymerized are given in Table 1 for each enzyme. In highly purified form, the catalytic subunits of DNA polymerases α and β are now generally believed to lack associated exonucleolytic proofreading activity(’,’’) (although there is still some uncertainty on this point). Consistent with the absence of this error-correcting function, these are the least accurate of the eukaryotic DNA polymerases. They generate base substitution and frameshift errors during synthesis in vitro at average rates between 5 × 10⁻[illegible] and 5 × 10⁻⁴ (Table 1). In contrast, DNA polymerases δ and ε are considerably more accurate for both base substitution and frameshift errors. The 3'→5' exonuclease associated with DNA polymerase ε has been shown to contribute partly to its high fidelity for both types of errors(15). Although this is likely to be true for DNA polymerase δ as well, the evidence is as yet less compelling. Note that despite having an associated 3'→5' exonuclease activity, DNA polymerase δ is considerably less accurate than DNA polymerase ε. Also, while both DNA polymerases α and β lack exonuclease, DNA polymerase β is significantly less accurate than DNA polymerase α. These inequalities in polymerase error rates suggest that the error rates may not be equal for all four phases of replication shown in Figure 1. The obvious caveat is that the fidelity of these enzymes may be modulated by replication accessory factors such as single-stranded DNA binding proteins and proteins that enhance processivity and/or polymerase-DNA and polymerase-dNTP binding affinities.

The fidelity of DNA synthesis depends not only on the DNA polymerase, but also varies with the type of error and its position in a DNA sequence(16). One way to begin to appreciate the non-randomness of error rates is to consider four pathways to make mistakes(17). The pathway most studied over the past two decades involves misincorporation of a nucleotide followed by extension from the mispair to fix the base substitution mutation.
Alternatively, if the misincorporated nucleotide is complementary to a nearby template nucleotide, the template-primer may rearrange to form a misaligned substrate with one or more correct terminal base pairs. Continued polymerization from this intermediate would yield a frameshift error. For errors initiated by template-primer slippage rather than misinsertion, at least two possible outcomes can also be envisioned. Continued polymerization from the misaligned template-primer could fix the misalignment and yield nucleotide addition or deletion errors. Alternatively, incorporation of a single correct nucleotide onto the misaligned template-primer followed by realignment generates a terminal mispair. If extended by further polymerization, this would yield a base-substitution error. This latter process has been referred to as dislocation mutagenesis.

For base substitutions, there are twelve possible mispairs when considering only the template nucleotide and the incoming dNTP. The number of possibilities is actually much greater because, even for mispairs involving the same two nucleotides, more than one

possible structure can exist. For example, structural studies of G·A mispairs (reviewed in ref. 16) demonstrate that both nucleotides can be in the anti configuration, or that one can be in the syn configuration. Also, it is now well established that neighboring nucleotides influence polymerase misinsertion error rates(‘). Current structural information suggests that DNA polymerases may contact as many as 8-10 nucleotides in the single-stranded template and about as many base pairs in the double-stranded primer stem. If these contacts affect discrimination in the active site, there are potentially a large number of structures for discrimination. Obviously, neighboring nucleotide effects may be different for replicative DNA polymerases having different primary amino acid sequences and possibly different structures.

For misinsertion to result in a base substitution error, the mispair must be fixed by correct incorporation of the next nucleotide. Here too, polymerase and sequence context differences have been described(’). For those DNA polymerases having an associated 3'→5' exonuclease activity, the balance between extension and excision is critical for determining fidelity(4). Depending on the mispair and its position and the polymerase and its associated exonuclease, the contribution of proofreading to fidelity may be expected to differ significantly during replication.

The probability that errors will be made via pathways that involve template-primer misalignment likewise depends on the local DNA sequence(18). The classical example(19) is for frameshifts that occur more frequently in reiterated nucleotide sequences than elsewhere. For such errors arising during DNA polymerization, the logic is that as the length of the run increases, the number of potential misaligned intermediates and the number of correct base pairs that could potentially stabilize the misaligned intermediates both increase.
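The run-length argument can be illustrated by locating homopolymeric runs in a template sequence. The Python sketch below is only illustrative: the function name, the minimum run length, and the example sequence are hypothetical, and the assumption that a run of n identical nucleotides offers n positions for an unpaired base is a deliberate simplification made here, not a model taken from the cited studies.

```python
from itertools import groupby


def homopolymer_runs(template, min_len=2):
    """Return (start, nucleotide, length) for each run of identical nucleotides."""
    runs = []
    pos = 0
    for base, group in groupby(template.upper()):
        length = sum(1 for _ in group)
        if length >= min_len:
            runs.append((pos, base, length))
        pos += length
    return runs


# Hypothetical template sequence, chosen only for illustration.
template = "GATTTTTCAGGGACCA"
for start, base, length in homopolymer_runs(template):
    # Simplifying assumption: the unpaired nucleotide could occupy any of
    # the `length` positions in the run, so longer runs offer more
    # candidate misaligned intermediates.
    print(f"run of {length} x {base} starting at index {start}: "
          f"{length} candidate positions for an unpaired base")
```

Under this simplification the five-nucleotide T run in the example offers more candidate intermediates than the three-nucleotide G run or the two-nucleotide C run, which is the qualitative trend the text describes.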
Furthermore, the distance between the misaligned nucleotide and the 3'-OH primer terminus increases with run length, which could reduce interference by this extra base in phosphodiester bond formation within the active site of the enzyme. Similar logic concerning stability and distance can be applied to misalignments produced when complementarity exists between a misinserted nucleotide and a template nucleotide at some nearby position.

Note that slippage and rearrangement pathways are not confined to immediately neighboring nucleotides or to homopolymeric runs. Polymerases also generate deletions between directly repeated sequences many nucleotides apart(18). Deletions during replication may also involve palindromic structures (13, and see references in 18), proteins other than polymerases (reference 19 and references therein), and structures generated during replication, for instance, during termination(20).

The probability that errors will be made via misalignment pathways is also not equal for all DNA polymerases. Among the four enzymes shown in Table

1, there is evidence for base-substitution errors via dislocation only for DNA polymerase β (and for the replicative polymerase of the type 1 human immunodeficiency virus(21)). Similarly, although all four polymerases produce one-nucleotide frameshift errors, they do so at different rates. Thus, DNA polymerase [...] is much less accurate for frameshift errors in homopolymeric runs than is DNA polymerase ε. The 3'→5' exonuclease associated with the latter enzyme contributes to its high frameshift fidelity(15).

Eukaryotic DNA Replication Fidelity

In both prokaryotes and eukaryotes, DNA replication involves the concerted action of a large number of gene products(3). What is known about the fidelity of semiconservative replication by a replisome? Considering the complexity of replication and the number of possible ways to do anything incorrectly, the answer is not much, yet. Replication error rates in model prokaryotic replication systems (reviewed in 16) are ~10⁻⁷. This observation and the observed low rate of spontaneous mutation in eukaryotes imply that semiconservative replication of double-stranded DNA in eukaryotes is accurate. So far, limited measurements of replication fidelity in vitro using mammalian cell extracts confirm this expectation. During SV40 origin-dependent DNA replication, currently the best available model for chromosomal replication in primate cells, error rates for several types of base substitutions vary between [...]

References

[...] 1851S7.
8 Burgers, P. M. J. (1991). Saccharomyces cerevisiae replication factor C. II. Formation and activity of complexes with the proliferating cell nuclear antigen and with DNA polymerases δ and ε. J. Biol. Chem. 266, 22698-22706.
9 Alberts, B. M. (1990). Mechanisms and enzymology of DNA replication. In: Structure and Function of Nucleic Acids and Proteins (eds F. Y.-H. Wu and C.-W. Wu), pp. 223-237. Raven Press, Ltd., New York.
10 Hurwitz, J., Dean, F. B., Kwong, A. D. and Lee, S.-H. (1990). The in vitro replication of DNA containing the SV40 origin. J. Biol. Chem. 265, 18043-18046.
11 Bjursell, G., Gussander, E. and Lindahl, T. (1979). Long regions of single-stranded DNA in human cells. Nature 280, 420-423.
12 Gaudette, M. F. and Benbow, R. M. (1986). Replication forks are underrepresented in chromosomal DNA of Xenopus laevis embryos. Proc. Natl Acad. Sci. USA 83, 5953-5957.
13 Trinh, T. Q. and Sinden, R. R. (1991). Preferential DNA secondary structure mutagenesis in the lagging strand of replication in E. coli. Nature 352, 544-547.
14 Morrison, A., Bell, J. B., Kunkel, T. A. and Sugino, A. (1991). Eukaryotic DNA polymerase amino acid sequence required for 3'→5' exonuclease activity. Proc. Natl Acad. Sci. USA 88, 9473-9477.
15 Thomas, D. C., Roberts, J. D., Sabatino, R. D., Myers, T. W., Downey, K. M., So, A. G., Bambara, R. A. and Kunkel, T. A. (1991). Fidelity of mammalian DNA replication and replicative DNA polymerases. Biochemistry 30, 11751-11759.
16 Kunkel, T. A. and Bebenek, K. (1988). Recent studies of the fidelity of DNA synthesis. Biochim. Biophys. Acta 951, 1-15.
17 Kunkel, T. A. and Soni, A. (1988). Mutagenesis by transient misalignment. J. Biol. Chem. 263, 14784-14789.
18 Kunkel, T. A. (1990). Misalignment-mediated DNA synthesis errors. Biochemistry 29, 8003-8011.
19 Streisinger, G., Okada, Y., Emrich, J., Newton, J., Tsugita, A., Terzaghi, E. and Inouye, M. (1966). Frameshift mutations and the genetic code. Cold Spring Harbor Symp. Quant. Biol. 31, 77-84.
20 Bierne, H., Ehrlich, S. D. and Michel, B. (1991). The replication termination signal terB of the Escherichia coli chromosome is a deletion hot spot. EMBO J. 10, 2699-2705.
21 Bebenek, K., Abbotts, J., Roberts, J. D., Wilson, S. H. and Kunkel, T. A. (1989). Specificity and mechanism of error-prone replication by HIV-1 reverse transcriptase. J. Biol. Chem. 264, 16948-16956.
22 Roberts, J. D. and Kunkel, T. A. (1988). Fidelity of a human cell DNA replication complex. Proc. Natl Acad. Sci. USA 85, 7064-7068.
23 Roberts, J. D., Thomas, D. C. and Kunkel, T. A. (1991). Exonucleolytic proofreading of leading and lagging strand DNA replication errors. Proc. Natl Acad. Sci. USA 88, 3465-3469.

24 Hauser, J., Levine, A. S. and Dixon, K. (1988). Fidelity of DNA synthesis in a mammalian in vitro replication system. Mol. Cell. Biol. 8, 3267-3271.
25 Simon, M., Giot, L. and Faye, G. (1991). The 3' to 5' exonuclease activity located in the DNA polymerase δ subunit of Saccharomyces cerevisiae is required for accurate replication. EMBO J. 10, 2165-2170.
26 Wu, C.-I. and Maeda, N. (1987). Inequality in mutation rates of the two strands of DNA. Nature 327, 169-170.
27 Filipski, J. (1990). Evolution of DNA sequence. Contributions of mutational bias and selection to the origins of chromosomal compartments. In: Advances in Mutagenesis Research Vol. II (ed. G. Obe), pp. 1-54. Springer-Verlag.
28 Bebenek, K., Roberts, J. D. and Kunkel, T. A. (1992). The effects of dNTP pool imbalances on the frameshift fidelity of replicative DNA polymerases. J. Biol. Chem. 267, 3585-3596.
29 Meuth, M. (1989). The molecular basis of mutations induced by deoxyribonucleoside triphosphate pool imbalances in mammalian cells. Exptl. Cell Res. 181, 305-316.
30 Kunz, B. A. and Kohalmi, S. E. (1991). Modulation of mutagenesis by deoxyribonucleotide levels. Annu. Rev. Genet. 25, 339-359.
31 Mathews, C. K. and Ji, J. (1991). DNA precursor asymmetries, replication fidelity and variable genome evolution. BioEssays 14, 000-000.
32 Leeds, J. M., Slabaugh, M. B. and Mathews, C. K. (1985). DNA precursor pools and ribonucleotide reductase activity: Distribution between nucleus and cytoplasm of mammalian cells. Mol. Cell. Biol. 5, 3443-3450.
33 Wolfe, K. H., Sharp, P. M. and Li, W.-H. (1989). Mutation rates differ among regions of the mammalian genome. Nature 337, 283-285.
34 Wolfe, K. H. (1991). Mammalian DNA replication: mutation biases and the mutation rate. J. Theor. Biol. 149, 441-451.
35 Holmquist, G. (1988). DNA sequences in G-bands and R-bands. In: Chromosomes and Chromatin (ed. K. W. Adolph), pp. 76-121. CRC Press, Boca Raton, FL.
36 Holmquist, G. P. (1990). Mutational bias, molecular ecology, and chromosome evolution. Adv. Mutation Res. 2, 95-126.
37 Bessman, M. J. and Reha-Krantz, L. J. (1977). Studies on the biochemical basis of spontaneous mutation. Effect of temperature on mutation frequency. J. Mol. Biol. 116, 115-123.
38 Jones, M., Wagner, R. and Radman, M. (1987). Repair of a mismatch is influenced by the base composition of the surrounding nucleotide sequence. Genetics 115, 605-610.
39 Modrich, P. (1989). Methyl-directed DNA mismatch correction. J. Biol. Chem. 264, 6597-6600.
40 Bohr, V. A., Smith Phillips, D. H. and Hanawalt, P. C. (1987). Heterogeneous DNA damage and repair in the mammalian genome. Cancer Res. 47, 6426-6436.
41 Cook, P. R. (1991). The nucleoskeleton and the topology of replication. Cell 66, 627-635.
42 Cooper, D. N. and Krawczak, M. (1990). The mutational spectrum of single base-pair substitutions causing human genetic disease: patterns and predictions. Hum. Genet. 85, 55-74.
43 Krawczak, M. and Cooper, D. N. (1991). Gene deletions causing human genetic disease: mechanisms of mutagenesis and the role of the local sequence environment. Hum. Genet. 86, 425-441.
44 Steele, E. J. (1991). Somatic Hypermutation in V-Regions. CRC Press, Inc., Boca Raton, FL.

Thomas A. Kunkel is at the Laboratory of Molecular Genetics, National Institute of Environmental Health Sciences, Research Triangle Park, NC 27709, USA.

Biological asymmetries and the fidelity of eukaryotic DNA replication.
