[ 12]

CRYSTALLOGRAPHY OF DNA CONTAINING ERRORS


[12] Crystallographic Studies of DNA Containing Mismatches, Modified and Unpaired Bases

By WILLIAM N. HUNTER

Introduction

The replication of DNA must occur with a high degree of precision in order for genetic information to be faithfully transmitted from one generation to the next. It was recognized by Watson and Crick¹,² that a complementary base-pairing scheme (A with T, G with C) provided a mechanism whereby this may be accomplished. The Watson-Crick base pairs are shown in Fig. 1. The incorporation of non-Watson-Crick base pairs, such as G pairing with T, into duplex DNA does occur during replication, and this may induce errors in the processing of the genetic code. Such pairings are referred to as mismatches or mispairs. In addition, the integrity of genomic DNA is constantly pressured by chemical and physical forces present in our environment. Carcinogenic chemicals, ultraviolet light, and ionizing radiation are but some of the environmental factors which directly damage DNA and which may account for biosynthetic errors during life processes. These errors may be strand breaks, missing or extra unpaired bases, chemically modified bases, or mismatches.

Single-crystal X-ray diffraction methods have been applied to a number of synthetic oligonucleotide fragments which on duplex formation contain examples of these structural "errors." Table I³⁻²³ lists the sequences so far studied and the points of structural interest. These studies have given information which aids our understanding of DNA structure and have led to proposals about the nucleic acid-protein interactions which are important for maintaining the fidelity of replication.²⁴

FIG. 1. Watson-Crick pairing, G with C and A with T. The atomic arrangement is shown at top, and below is a stick representation to point out the disposition of the functional groups. A stands for a hydrogen bond acceptor, D for a donor. Note the approximately symmetrical relationship of the bases, be they purine or pyrimidine, with respect to a vector linking the C-1' atoms. The stick drawing has the major groove above, the minor groove below the horizontal line.
[Figure not reproduced. Labels in the stick representation: G·C, major groove A(O6), D(N4), A(N7); minor groove A(N3), D(N2), A(O2). T·A, major groove A(O4), D(N6), A(N7); minor groove A(O2), A(N3).]

¹ J. D. Watson and F. H. C. Crick, Nature (London) 171, 737 (1953).
² J. D. Watson and F. H. C. Crick, Nature (London) 171, 964 (1953).
³ O. Kennard, J. Biomol. Struct. Dyn. 3, 205 (1985).
⁴ T. Brown, G. Kneale, W. N. Hunter, and O. Kennard, Nucleic Acids Res. 14, 1801 (1986).
⁵ M. Coll, D. Saal, C. A. Frederick, J. Aymami, A. Rich, and A. H.-J. Wang, Nucleic Acids Res. 17, 911 (1989).
⁶ M. Coll, A. H.-J. Wang, G. A. van der Marel, J. H. van Boom, and A. Rich, J. Biomol. Struct. Dyn. 4, 157 (1986).
⁷ L. Van Meervelt, M. H. Moore, L. P. Kong Thoo, D. M. Brown, and O. Kennard, J. Mol. Biol. 216, 773 (1990).
⁸ P. S. Ho, C. A. Frederick, G. Quigley, G. A. van der Marel, J. H. van Boom, A. H.-J. Wang, and A. Rich, EMBO J. 4, 3617 (1985).
⁹ S. C. Ginell, S. Kuzmich, R. A. Jones, and H. M. Berman, Biochemistry 46, 10461 (1990).
¹⁰ G. Kneale, T. Brown, O. Kennard, and D. Rabinovich, J. Mol. Biol. 186, 805 (1985).
¹¹ T. Brown, O. Kennard, G. Kneale, and D. Rabinovich, Nature (London) 315, 604 (1985).
¹² Z. Shakked, T. E. Haran, M. Eisenstein, F. Frolow, and D. Rabinovich, J. Mol. Biol. 200, 151 (1988).
¹³ W. B. T. Cruse, J. Aymami, O. Kennard, T. Brown, A. G. C. Jack, and G. A. Leonard, Nucleic Acids Res. 17, 55 (1989).
¹⁴ G. G. Privé, U. Heinemann, S. Chandrasegaran, L. S. Kan, M. L. Kopka, and R. E. Dickerson, Science 238, 498 (1987).
¹⁵ W. N. Hunter, T. Brown, G. Kneale, N. N. Anand, D. Rabinovich, and O. Kennard, J. Biol. Chem. 262, 9962 (1987).
¹⁶ T. Brown, W. N. Hunter, G. Kneale, and O. Kennard, Proc. Natl. Acad. Sci. U.S.A. 83, 2402 (1986).
¹⁷ W. N. Hunter, T. Brown, N. N. Anand, and O. Kennard, Nature (London) 320, 552 (1986).
¹⁸ P. W. R. Corfield, W. N. Hunter, T. Brown, P. Robinson, and O. Kennard, Nucleic Acids Res. 15, 7935 (1987).
¹⁹ T. Brown, G. A. Leonard, E. D. Booth, and J. Chambers, J. Mol. Biol. 207, 455 (1989).
²⁰ G. A. Leonard, J. Thompson, W. P. Watson, and T. Brown, Proc. Natl. Acad. Sci. U.S.A. 87, 9573 (1990).
²¹ L. Joshua-Tor, D. Rabinovich, H. Hope, F. Frolow, E. Appella, and J. L. Sussman, Nature (London) 334, 82 (1988).
²² M. Miller, R. W. Harrison, A. Wlodawer, E. Appella, and J. L. Sussman, Nature (London) 334, 85 (1988).
²³ R. Chattopadhyaya, S. Ikuta, K. Grzeskowiak, and R. E. Dickerson, Nature (London) 334, 175 (1988).
²⁴ P. Modrich, Annu. Rev. Biochem. 56, 435 (1987).

METHODS IN ENZYMOLOGY, VOL. 211
NONSTANDARD DNA STRUCTURES AND THEIR ANALYSIS
Copyright © 1992 by Academic Press, Inc. All rights of reproduction in any form reserved.


TABLE I
STRUCTURES WITH NON-WATSON-CRICK BASE PAIRS, MODIFIED BASES, OR EXTRAHELICAL BASESᵃ

Sequence                Comment    Global form    Ref.
d(TGCGCG)               G·T        Z              3
d(BrUGCGCG)             BrU·G      Z              4
d(CGCGFUG)              FU·G       Z              5
d(CGTDCG)               T·D        Z              6
d(CDCGTG)               T·D        Z              6
d(CGCGXG)               G·X        Z              7
d(CGCGTG)               G·T        Z              8
d(CGCOCG)               G·O        Z              9
d(GGGGTCCC)             G·T        A              10
d(GGGGCTCC)             G·T        A              11
d(GGGTGCCC)             G·T        A              12
d(GGIGCTCC)             I·T        A              13
d(CCAAGATTGG)           G·A        B              14
d(CGCGAATTTGCG)         G·T        B              15
d(CGCGAATTAGCG)         G·A        B              16
d(CGCAAATTCGCG)         C·A        B              17
d(CGCIAATTAGCG)         I·A        B              18
d(CGCAAATTGGCG)         G·A        B              19
d(CGCOAATTTGCG)         O·T        B              20
d(CGCAGAATTCGCG)        Extra A    B              21
d(CGCGAAATTTACGCG)      Extra A    B              22
d(CGCGCGTTTTCGCGCG)     Hairpin    Z              23

ᵃ BrU, 5-Bromouridine; FU, 5-fluorouridine; D, 2-aminoadenine; X, N4-methoxycytosine; and O, O6-methylguanine.
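The mispair noted for a self-complementary sequence in Table I can be recovered by pairing the strand with an antiparallel copy of itself and flagging every position that is not A with T or G with C. The following is a minimal sketch of that bookkeeping, not a procedure from the chapter. The function name `find_mispairs` is hypothetical, and the sketch assumes an even-length strand of standard bases with no looped-out residues, so it does not apply to the extra-A or hairpin entries or to the modified bases.

```python
# Watson-Crick partners for the four standard DNA bases.
WATSON_CRICK = {"A": "T", "T": "A", "G": "C", "C": "G"}


def find_mispairs(sequence):
    """Return (position, base, partner) for each non-Watson-Crick pair.

    Assumption: the strand pairs with an identical copy of itself running
    antiparallel, so base i (1-based) faces base len(sequence) - i + 1.
    """
    n = len(sequence)
    mispairs = []
    for i, base in enumerate(sequence):
        partner = sequence[n - 1 - i]  # base on the opposite strand
        if WATSON_CRICK.get(base) != partner:
            mispairs.append((i + 1, base, partner))
    return mispairs


# Dodecamer from Table I listed with a G·T comment.
gt_dodecamer = find_mispairs("CGCGAATTTGCG")
# Dodecamer from Table I listed with a G·A comment.
ga_dodecamer = find_mispairs("CGCGAATTAGCG")
```

Because the duplex is formed from two identical strands, each mispair is reported twice, once from each strand's point of view.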

Crystallographic Analyses

The single-crystal diffraction methods applied to structure analyses of mismatches and DNA fragments with extra bases are essentially the same as those applied to any biological macromolecule and will only be summarized. Comprehensive details can be found in Refs. 25-28. The single-crystal X-ray analysis can be divided into four stages.

²⁵ T. L. Blundell and L. N. Johnson, "Protein Crystallography." Academic Press, New York, 1976.
²⁶ A. McPherson, "Preparation and Analysis of Protein Crystals." Wiley, San Francisco, California, 1982.
²⁷ H. Wyckoff, C. H. W. Hirs, and S. N. Timasheff, eds., this series, Vol. 114.
²⁸ H. Wyckoff, C. H. W. Hirs, and S. N. Timasheff, eds., this series, Vol. 115.


Crystallization

The oligonucleotide fragment is synthesized and purified. It must then be coerced into an ordered precipitation. This is achieved by driving a solution into a supersaturated state where crystal nucleation may occur and then maintaining the conditions around the nucleus so that crystal growth can take place. The most common methods for crystal growth are to diffuse a precipitating agent into an oligonucleotide solution or to diffuse out water. There are a number of considerations. To bring together negatively charged nucleic acids into an ordered crystal lattice requires the presence of cations. The nature and concentration of the cation can have a profound effect on crystallization. Other variables are the concentration and type of precipitant, temperature, and pH. In most of the cases listed in Table I the crystallization conditions for one of the Watson-Crick parent sequences had already been determined, and the "error"-containing fragments were crystallized in a similar manner. Several samples did, however, give completely new crystal forms for which the crystallization conditions had to be determined.

Data Collection

Oligonucleotide crystals, like those of proteins, are extensively hydrated. To preserve the crystalline order they must be kept in a humid environment. For the purpose of data collection this is achieved by putting the sample in a capillary which is then sealed with wax. A drop of crystallization solution is often placed in the capillary as well. Most samples listed in Table I had the diffraction data measured on automated four-circle diffractometers, but the availability of area detectors is speeding up this process. Selected crystallographic details are given in Table II. The resolution quoted in Table II is an indication of how ordered the crystal lattice is: the smaller the value quoted (that is, the higher the resolution), the more accurate the final structure.
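The unit cell dimensions and space groups quoted in Table II are enough to compute the cell volume, which is a convenient first check on how much duplex and solvent a lattice can hold. The sketch below uses the general triclinic volume formula. The function name `unit_cell_volume` is hypothetical, the 90° default angles are the usual conventions for orthorhombic and monoclinic cells, and γ = 120° for the hexagonal P6₁ cells is the standard setting rather than a value stated in the table.

```python
import math


def unit_cell_volume(a, b, c, alpha=90.0, beta=90.0, gamma=90.0):
    """Volume (cubic angstroms) of a unit cell from edges (Å) and angles (degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    # General (triclinic) expression; reduces to a*b*c when all angles are 90°.
    return a * b * c * math.sqrt(1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg)


# Cell edges taken from Table II; angles other than beta are assumed by convention.
orthorhombic = unit_cell_volume(25.7, 41.9, 65.2)             # a P2₁2₁2₁ dodecamer cell
monoclinic = unit_cell_volume(32.5, 26.2, 34.3, beta=119.0)   # the C2 decamer cell
hexagonal = unit_cell_volume(45.2, 45.2, 42.9, gamma=120.0)   # a P6₁ octamer cell
```

For the orthorhombic case the result is simply the product of the three edges; the monoclinic volume is reduced by a factor of sin β.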

Structure Solution

The structure solution part of the analysis aims to place a model structure at the correct position in the unit cell and with the correct orientation. This may be achieved by using multiple isomorphous replacement methods to generate the electron density map and then constructing a molecular model to fit this density. Alternatively an oligonucleotide model structure can be constructed from fiber diffraction results, or a previously determined crystal structure can be used in molecular replacement methods to position and orient this known structure in the unknown unit cell. With few exceptions most of the "error" structures crystallize isomorphously with one of the parent compounds for which the structure had previously been determined. There are notable exceptions: the decamer d(CCAAGATTGG), the looped out A structure d(CGCAGAATTCGCG), and the hairpin d(CGCGCGTTTTCGCGCG) were solved with molecular replacement. In the case of d(CGCGAAATTTACGCG) novel maximum entropy methods were applied. Once a starting model is positioned correctly in the unit cell the next stage, refinement, proceeds.

TABLE II
CRYSTALLOGRAPHIC DETAILS OF SELECTED STRUCTURES CONTAINING NON-WATSON-CRICK BASE PAIRS AND EXTRA BASESᵃ

                                                                  Unit cell dimensions (Å)
Sequence                Mispair    DNA form    Space group        a       b       c        Resolution (Å)    R-factor (%)
d(GGGGCTCC)             G·T        A           P6₁                45.2    45.2    42.9     2.2               14
d(GGGGTCCC)             G·T        A           P6₁                44.7    44.7    42.4     2.1               14
d(GGGTGCCC)             G·T        A           P6₁                45.6    45.6    41.0     2.5               15
d(GGIGCTCC)             I·T        A           P6₁                45.1    45.1    45.5     1.7               14
d(CCAAGATTGG)           G·A        B           C2, β = 119°       32.5    26.2    34.3     1.3               17
d(CGCGAATTAGCG)         G·A        B           P2₁2₁2₁            25.7    41.9    65.2     2.5               17
d(CGCAAATTGGCG)         G·A        B           P2₁2₁2₁            25.2    41.2    65.0     2.3               16
d(CGCIAATTAGCG)         I·A        B           P2₁2₁2₁            25.8    41.9    65.1     2.5               19
d(CGCAAATTCGCG)         C·A        B           P2₁2₁2₁            25.4    41.4    65.2     2.5               19
d(CGCGAATTTGCG)         G·T        B           P2₁2₁2₁            25.5    41.2    65.6     2.5               18
d(CGCOAATTTGCG)         O·T        B           P2₁2₁2₁            25.4    40.7    65.9     2.0               18
d(TGCGCG)               G·T        Z           P2₁2₁2₁            17.9    30.7    45.1     1.5               18
d(BrUGCGCG)             BrU·G      Z           P2₁2₁2₁            17.9    30.9    49.9     2.2               16
d(CGCGTG)               G·T        Z           P2₁2₁2₁            17.5    31.6    45.6     1.0               20
d(CGCGFUG)              FU·G       Z           P2₁2₁2₁            17.8    31.3    45.4     1.5               20
d(CGCAGAATTCGCG)        Extra A    B           C2, β = 99°        78.5    42.8    25.2     2.8               15
d(CGCGAAATTTACGCG)      Extra A    B           I222               37.0    53.7    101.6    3.0               24
d(CGCGCGTTTTCGCGCG)     Hairpin    Z           C2, β = 95°        57.2    21.6    36.4     2.1               23

ᵃ O, O6-Methylguanosine; BrU, 5-bromouridine; FU, 5-fluorouridine; and X, N4-methoxycytosine.

Refinement

The aim in refinement is to reduce differences between calculated structure factors (based on an atomic model) and the experimental observations, that is, the diffraction data. This is achieved by a combination of large- and small-scale alterations of atomic positions. Gross alterations of the model to optimize the fit to generated electron density maps are often required. Access to computer graphics greatly speeds up this process. The smaller manipulations of the model are generally carried out by a series of least-squares optimization calculations. Because the nucleic acid crystals do not often diffract to very high resolution, it is necessary to utilize the accurate structural information generated from small molecule structure determinations of the nucleic acid components. In this way the least-squares calculations are closely linked to known geometrical parameters. This compensates for the limited diffraction data that are available. The two most commonly used programs in the least-squares calculations are CORELS²⁹ and NUCLSQ.³⁰

Oligonucleotide crystals are highly hydrated, and water or cations must be considered to be part of the molecular model. A proper refinement using diffraction data beyond about 2.8 Å will include the location of solvent positions. Although cations are present, it is often not possible to positively identify them. A safe way to treat the cations is to include them as oxygen atoms of water molecules. The careful identification of solvent positions and treatment of the model is necessary to avoid errors of interpretation and to pinpoint solvent positions correctly. As discussed later, the identification of solvent positions in the G·T and C·A mispairs was important because solvent molecules are closely associated with the actual mispairs.

Refinement ceases when no further improvement of the model fit to the electron density maps can be accomplished, when all well-ordered solvent positions are located, and when no further improvement in the crystallographic residual can be obtained. This residual or reliability index is $R = \sum |F_o - F_c| / \sum F_o$, where $F_o$ is the observed and $F_c$ the calculated structure factor. Most of the structures of relevance to this chapter refine to R values of less than 20%, which is perfectly acceptable for molecules of this size.

During the refinement of macromolecules it is common practice to omit from the electron density map calculations parts of the structure either for which no clear structural interpretation of the electron density can be made or which the researcher is prepared to leave out until the rest of the model is accurately refined. An interpretation of the density can then be made toward the end of the refinement. This method was applied by the author in a number of cases and has subsequently been used for most of the structures listed in Table I. The atoms of the mismatched bases are omitted from all calculations while other parts of the molecule are dealt with and solvent positions located. As the refinement proceeds the quality of the electron density maps improves, and an unambiguous interpretation of the electron density can be made. Figure 2 provides an example of an omit electron density map for a G·T mispair. Care is then taken to check that after the structural interpretation the model behaves normally in further refinement.

²⁹ J. L. Sussman, S. R. Holbrook, G. M. Church, and S. H. Kim, Acta Crystallogr. Sect. A: Cryst. Phys. Diffr. Theor. Gen. Crystallogr. A33, 800 (1977).
³⁰ E. Westhof, P. Dumas, and D. Moras, J. Mol. Biol. 184, 119 (1985).
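The crystallographic residual described in the Refinement section, the summed absolute difference between observed and calculated structure factor amplitudes divided by the summed observed amplitudes, is simple to compute once the two lists are matched reflection by reflection. The sketch below is illustrative only: the function name `r_factor` is hypothetical, the amplitudes are made-up numbers rather than data from any structure in this chapter, and refinement programs such as CORELS or NUCLSQ also apply scaling and weighting that are not shown here.

```python
def r_factor(f_obs, f_calc):
    """Residual R = sum(|Fo - Fc|) / sum(Fo) over matched reflections."""
    if len(f_obs) != len(f_calc):
        raise ValueError("observed and calculated lists must have equal length")
    denominator = sum(f_obs)
    if denominator == 0:
        raise ValueError("sum of observed amplitudes is zero")
    numerator = sum(abs(fo - fc) for fo, fc in zip(f_obs, f_calc))
    return numerator / denominator


# Made-up amplitudes for four reflections (illustrative values only).
f_obs = [100.0, 80.0, 60.0, 40.0]
f_calc = [90.0, 85.0, 66.0, 35.0]
r = r_factor(f_obs, f_calc)
r_percent = 100.0 * r  # Table II quotes the residual as a percentage
```

A perfect model would give R = 0; the values below 20% quoted for most structures in this chapter correspond to R < 0.20 on this scale.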

FIG. 2. Stereoview of an omit electron density map at 2.5 Å resolution for the G·T mispair in the structure d(CGCGAATTTGCG). The continuous lines represent increases in the electron density level. Atomic positions are depicted as spheres. The location of the bases is unambiguous.

The accuracy of the final structures produced depends to a large extent on how ordered the crystal is and to what resolution it diffracts (see above). In most cases cited in Table I, the errors in atomic position will be between 0.1 and 0.5 Å.

Results and Some Implications for Recognition and Repair

The X-ray analyses of DNA fragments with potential mispairs do not provide information about the occurrence of rare tautomeric forms at the time of replication, but they do define the structure of the duplex which incorporates a mispair or modified base. They show how functional groups interact to form hydrogen bonds and give some idea about the van der Waals and hydrophobic interactions that stabilize a variety of molecular conformations. The purine-pyrimidine pairings G·T and A·C form what are termed "wobble" pairs. The pyrimidine moves out into the major groove and the purine to the minor groove. Solvent positions form hydration bridges between the bases in the major groove for both of the pairings. The G·T mispair has an additional solvent bridge in the minor groove. Figure 3 illustrates two G·T pairings observed in different sequences showing the well-defined hydration network. Figure 4 shows structural formulas for G·T, C·A, and one type of G·A pairing.

FIG. 3. G·T mismatches with the distinctive hydration pattern observed in two crystal structures, (a) d(CGCGTG) and (b) d(GGGGTCCC). Distances are given in angstroms.

The purine-purine pairing of G with A has been observed in three different conformations (see Fig. 5). This structural variability may explain why this mismatch is the most likely to escape detection by polymerase III.³¹ Efforts to crystallize the A·A, G·G, C·C, C·T, and T·T mispairs have so far not met with success. The T·T pairing has been observed not in a duplex but formed by intermolecular interactions in the sequence d(CGCGCGTTTTCGCGCG). A symmetry related molecule is positioned so that two thymines are close enough to pair. The enol tautomer form may be present for one of the bases.

Two structures which have looped out adenines have been described in the literature. Crystal packing forces have been implicated in determining the positions of these extrahelical bases. Nevertheless these structures show how unpaired bases can be associated with a stable duplex.

³¹ A. R. Fersht, J. W. Knill-Jones, and W. C. Tsui, J. Mol. Biol. 156, 37 (1982).
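The survey of mispairs above can be cross-checked by simple counting: four standard bases give ten unordered pairings, two of which are the Watson-Crick pairs, leaving eight possible mismatches. The sketch below enumerates them and separates those the text reports within a duplex crystal structure (G·T, C·A, and G·A) from the rest. The variable names and the `label` helper are hypothetical, and the `observed` set is only a transcription of the statements in this section.

```python
from itertools import combinations_with_replacement

BASES = "ACGT"
WATSON_CRICK = {frozenset("AT"), frozenset("CG")}

# All unordered pairings of the four standard bases, including self-pairs.
all_pairs = [frozenset(p) for p in combinations_with_replacement(BASES, 2)]
mispairs = [p for p in all_pairs if p not in WATSON_CRICK]

# Mispairs the text reports as observed within a duplex (transcribed, not derived).
observed = {frozenset("GT"), frozenset("AC"), frozenset("AG")}
not_observed = [p for p in mispairs if p not in observed]


def label(pair):
    """Format a pairing such as {'A', 'C'} as 'A.C' and {'A'} as 'A.A'."""
    bases = sorted(pair)
    if len(bases) == 1:  # a self-pair collapses to one element in a frozenset
        bases = bases * 2
    return ".".join(bases)


remaining = sorted(label(p) for p in not_observed)
```

The five pairings left in `remaining` are the ones the text lists as not yet crystallized in a duplex, with T·T seen only as an intermolecular contact.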
