Proc. Natl. Acad. Sci. USA

Vol. 88, pp. 1237-1241, February 1991 Biophysics

Protein solution structure determination using distances from two-dimensional nuclear Overhauser effect experiments: Effect of approximations on the accuracy of derived structures

PAUL D. THOMAS, VLADIMIR J. BASUS, AND THOMAS L. JAMES*

Department of Pharmaceutical Chemistry, University of California, San Francisco, CA 94143

Communicated by Mildred Cohn, October 22, 1990

ABSTRACT Solution structures for many proteins have been determined to date utilizing interproton distance constraints estimated from two-dimensional nuclear Overhauser effect (2D NOE) spectra. Although the simple isolated spin pair approximation (ISPA) generally used can result in systematic errors in distances, the large number of constraints enables protein structure to be defined with reasonably high resolution. Effects of these systematic errors on the resulting protein structure are examined. Iterative relaxation matrix calculations, which account for dipolar interactions between all protons in a molecule, can accurately determine internuclear distances with little or no a priori knowledge of the molecular structure. The value of this additional complexity is also addressed. To assess these distance determination methods, hypothetical "experimental" data, including random noise and peak overlap, are calculated for an arbitrary "true" protein structure. Three methods of obtaining distance constraints from 2D NOE peak intensities are examined: one entails a conservative use of ISPA, one assumes the ISPA to be fairly accurate, and one utilizes an iterative relaxation matrix method called MARDIGRAS (matrix analysis of relaxation for discerning the geometry of an aqueous structure), developed in this laboratory. A distance geometry algorithm was used to generate a family of structures for each distance set. The quality of the average structure from each family was good. The root-mean-square deviation of that average structure from the true structure was improved by about 2-5% using the more restrictive rather than the more conservative ISPA approach. Use of MARDIGRAS in a conservative fashion (i.e., with a poor initial model) improved the root-mean-square deviation by 8-15%. With a better initial model, MARDIGRAS obtained even more accurate distances. MARDIGRAS also permits analysis of 2D NOE data at longer mixing times, yielding additional distances.
Use of more restrictive ISPA distances did, however, result in a few systematically incorrect structural features in local regions of the protein, producing distortions of 2-3 Å. Comparison between experimental data and spectra calculated for the structures correlates with root-mean-square deviation, offering a method of structure evaluation. An R factor for evaluating fit between experimental and calculated 2D NOE intensities is proposed.

Interproton distances obtained from homonuclear proton two-dimensional nuclear Overhauser effect (2D NOE) experiments are used to determine three-dimensional protein structure in solution (1-4). Various protocols are used for structure determination, but the initial step often utilizes distance geometry (DG) to generate a family of structures consistent with NOE distance constraints (5, 6). Some methods entail theoretical energy calculations, energy minimization, or restrained molecular dynamics, with pseudoenergy terms maintaining NOE-derived distances (7, 8). Families of structures for several proteins have been produced that exhibit little internal variance and few distance violations. But it does not necessarily follow that these families accurately represent the actual molecular structure. Approximations used to derive distance constraints from NOE intensities have been questioned (9-13). The effect of these approximations on the inferred protein structure, however, has not been reported. Here we primarily address two questions: (i) Will semiquantitative NOE distances suffice to define protein tertiary structure, and how precisely can we specify distances before we sacrifice accuracy? (ii) Are techniques without the approximations worth the computer time required to generate more accurate distances? To assess the effect of different methods of obtaining distance constraints on the derived structures, we must know the "true" protein structure precisely. We therefore used hypothetical 2D NOE spectra, generated by our program CORMA (14), for a defined structure. From the simulated data, we employed three methods to generate three sets of distance constraints. Two of the methods rely on the commonly used two-spin or isolated spin pair approximation (ISPA), at the restrictive and conservative extremes of assigning distance bounds. The third method utilizes our program MARDIGRAS (matrix analysis of relaxation for discerning the geometry of an aqueous structure) (15). MARDIGRAS iteratively refines the complete proton relaxation matrix until it is both internally consistent and consistent with any input experimental 2D NOE intensities. It then calculates distances from interproton relaxation rates for proton pairs corresponding to experimental cross-peaks. Each distance set was used separately for DG calculations, and the family of structures generated by DG for each distance set was then compared with the true structure.
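The iterative refinement described above can be illustrated with a minimal NumPy sketch. It is not the MARDIGRAS program: it assumes an isotropic rigid rotor with spectral density $J(\omega) = \tau_c/(1+\omega^2\tau_c^2)$, equal proton populations, no leakage to external relaxation, and noise-free data, and it uses a simple "hybrid matrix" update in which the experimental mixing coefficients replace the model's values for observed pairs. All function names, the 600-MHz proton frequency in the demo, and the fixed iteration count are hypothetical.

```python
import numpy as np

# SI constants; Q = (mu0/4pi)^2 gamma^4 hbar^2 / 10 (isotropic rigid-rotor assumption).
MU0_4PI, GAMMA_H, HBAR = 1e-7, 2.6752e8, 1.0546e-34
Q = 0.1 * MU0_4PI**2 * GAMMA_H**4 * HBAR**2


def spectral_terms(tau_c, omega):
    """Return J(0), J(omega), J(2 omega) with J(w) = tau_c / (1 + (w tau_c)^2)."""
    def J(w):
        return tau_c / (1.0 + (w * tau_c) ** 2)
    return J(0.0), J(omega), J(2.0 * omega)


def relaxation_matrix(coords, tau_c, omega):
    """Full proton relaxation matrix (s^-1) from coordinates in angstroms."""
    j0, j1, j2 = spectral_terms(tau_c, omega)
    n = len(coords)
    R = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            k = Q / (np.linalg.norm(coords[i] - coords[j]) * 1e-10) ** 6
            R[i, j] = R[j, i] = k * (6 * j2 - j0)   # cross-relaxation rate sigma_ij
            R[i, i] += k * (j0 + 3 * j1 + 6 * j2)   # auto-relaxation contributions
            R[j, j] += k * (j0 + 3 * j1 + 6 * j2)
    return R


def sym_apply(M, f):
    """Apply a scalar function to a symmetric matrix through its eigendecomposition."""
    w, V = np.linalg.eigh((M + M.T) / 2)
    return (V * f(w)) @ V.T


def mardigras_sketch(a_exp, observed, model_coords, tau_m, tau_c, omega, n_iter=20):
    """Iterate R -> A = exp(-R tau_m) -> hybrid A -> R; return {(i, j): distance in angstroms}."""
    j0, j1, j2 = spectral_terms(tau_c, omega)
    rho_per_sigma = (j0 + 3 * j1 + 6 * j2) / (6 * j2 - j0)
    R = relaxation_matrix(model_coords, tau_c, omega)
    for _ in range(n_iter):
        hybrid = sym_apply(R, lambda w: np.exp(-w * tau_m))
        for i, j in observed:                    # insert experimental cross-peaks
            hybrid[i, j] = hybrid[j, i] = a_exp[i, j]
        R_h = -sym_apply(hybrid, lambda w: np.log(np.clip(w, 1e-12, None))) / tau_m
        sigma = R - np.diag(np.diag(R))          # keep model rates for unobserved pairs
        for i, j in observed:
            sigma[i, j] = sigma[j, i] = R_h[i, j]
        # Rebuild the diagonal from the rates so R stays internally consistent.
        R = sigma + np.diag(sigma.sum(axis=1) * rho_per_sigma)
    distances = {}
    for i, j in observed:
        if R[i, j] * (6 * j2 - j0) > 0:          # skip rates with an unphysical sign
            distances[(i, j)] = (Q * (6 * j2 - j0) / R[i, j]) ** (1 / 6) * 1e10
    return distances


if __name__ == "__main__":
    omega = 2 * np.pi * 600e6                    # assumed 600-MHz proton frequency
    true = np.array([[0.0, 0.0, 0.0], [1.8, 0.0, 0.0], [1.8, 2.5, 0.0]])
    a_exp = sym_apply(relaxation_matrix(true, 2e-9, omega), lambda w: np.exp(-w * 0.1))
    model = true + np.array([[0.1, -0.1, 0.0], [0.0, 0.1, 0.1], [-0.1, 0.0, 0.1]])
    print(mardigras_sketch(a_exp, [(0, 1), (1, 2), (0, 2)], model, 0.1, 2e-9, omega))
```

At a fixed point of this loop, the model's mixing coefficients reproduce the experimental ones for every observed pair, which is the self-consistency condition described in the text; rebuilding the diagonal from the cross-relaxation rates is this sketch's way of keeping the matrix internally consistent.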
Systematic errors in DG structures are unlikely to be corrected by energy-based computations, particularly if they are due to distance errors. Energy-based methods would also add a further complication: the balance between "real" intramolecular forces and NOE pseudopotentials. We therefore decided that analysis of DG structures provided the most meaningful test of distance assignment methods.

Determination of Interproton Distances from 2D NOE Intensities

Typically, distances are estimated from 2D NOE cross-peak intensities using the two-spin approximation or ISPA, $r_{ij} = r_{\mathrm{ref}}\,(a_{\mathrm{ref}}/a_{ij})^{1/6}$, where $r_{ij}$ is the interproton distance to be estimated and $a_{ij}$ is the corresponding 2D NOE cross-peak intensity; $r_{\mathrm{ref}}$ and $a_{\mathrm{ref}}$ are a known interproton distance and its cross-peak intensity, respectively. The assumptions of ISPA are (i) the mixing time τm is sufficiently short that each cross-peak intensity originates only from relaxation between two spins and (ii) internal motions are negligible. In general, assumption of an isotropic motional model leads to relatively small errors (i.e., <10%) (9). The chief cause of error for ISPA lies in neglecting multispin relaxation effects, commonly referred to as "spin diffusion" (9-12). Estimates of the inherent error associated with ISPA, reflected by the upper and lower bounds assigned to the distances, vary widely throughout the literature. Some studies use estimated distances only qualitatively, but others assume that the extreme (i.e., sixth-power) dependence of intensity on distance allows distances to be specified more precisely. Borgias et al. (12) showed that, for mixing times generally accepted as sufficiently short (i.e., 50-100 ms), ISPA can result in systematic errors of 45-80% in distances over 3.5 Å, the range most important in defining molecular structure.

Abbreviations: 2D NOE, two-dimensional nuclear Overhauser effect; ISPA, isolated spin pair approximation; DG, distance geometry; rmsd, root-mean-square deviation(s); BPTI, bovine pancreatic trypsin inhibitor.
*To whom reprint requests should be addressed.
The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.
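The ISPA relation amounts to a one-line calibration against a reference pair. A minimal sketch with a hypothetical function name and made-up intensities; it also shows why the sixth-root dependence tempts users to assign tight bounds: a twofold intensity error changes the estimated distance by only $2^{1/6} \approx 1.12$.

```python
def ispa_distance(a_ij, a_ref, r_ref):
    """Isolated spin pair approximation: r_ij = r_ref * (a_ref / a_ij) ** (1/6).

    a_ij, a_ref: cross-peak intensities in the same (arbitrary) units.
    r_ref: known reference distance in angstroms.
    """
    if a_ij <= 0 or a_ref <= 0:
        raise ValueError("ISPA needs positive cross-peak intensities")
    return r_ref * (a_ref / a_ij) ** (1.0 / 6.0)


if __name__ == "__main__":
    # Hypothetical intensities, calibrated on a 2.49-angstrom reference distance.
    r = ispa_distance(a_ij=0.05, a_ref=3.2, r_ref=2.49)   # intensity ratio of 64
    print(f"{r:.2f}")                                      # 2.49 * 64 ** (1/6) = 4.98
    # Doubling the cross-peak intensity shortens the estimate by a factor 2 ** (-1/6).
    print(f"{ispa_distance(0.1, 3.2, 2.49) / r:.3f}")
```

The insensitivity cuts both ways: it makes ISPA robust to noise, but it also hides the systematic spin-diffusion errors discussed above, which no amount of intensity precision removes.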

Several techniques have been proposed that obviate ISPA's inadequacy. MINSY entails saturating selected spins during the mixing period, preventing spin diffusion from occurring by means of these spins (16). Most methods, however, make better distance approximations by at least partial consideration of multispin effects (17-20). Complete relaxation matrix approaches take into account all dipole-dipole interactions, explicitly accounting for spin diffusion (9, 20-23). In addition, internal motions can be included in the calculation of distances, although these will increase the uncertainty in distances to protons involved in the motion. Multiple conformational states can be modeled as well, with a weighted average of the relaxation matrices describing each state. Iterative methods for fitting experimental and theoretical 2D NOE intensities using relaxation matrix approaches yield unbiased distances (11, 15, 24). In particular, an efficient program, MARDIGRAS, has been developed to calculate distances accurately (15) without relying at each cycle on more computationally expensive techniques such as DG or restrained molecular dynamics. It has been used for nucleic acid structure determination (25) and here is compared to ISPA for determining protein distance constraints.

Methodology

Generation of Hypothetical 2D NOE Spectral Intensities. 2D NOE mixing coefficients (proportional to 2D NOE intensities) were calculated using our program CORMA. The arbitrary structural model for these calculations was the 5pti crystal structure variant (26) of bovine pancreatic trypsin inhibitor (BPTI), a 58-amino acid protein. Protons were positioned with a locally written program to idealize their geometry with respect to heavy atom coordinates. Spectra were calculated for mixing times of 100 and 200 ms for isotropic overall correlation times of 2 ns and 5 ns.
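A complete relaxation matrix calculation of this kind can be sketched in a few lines: build the relaxation matrix R from proton coordinates and evaluate the mixing coefficients $A(\tau_m) = \exp(-R\,\tau_m)$. This is a minimal sketch, not the CORMA program; it assumes an isotropic rigid rotor, a 600-MHz proton frequency, no leakage and no internal motion, and its function names are hypothetical. The demo applies ISPA to a hypothetical three-proton chain to show how spin diffusion through the middle proton makes the long distance appear shorter.

```python
import numpy as np

MU0_4PI, GAMMA_H, HBAR = 1e-7, 2.6752e8, 1.0546e-34    # SI units
Q = 0.1 * MU0_4PI**2 * GAMMA_H**4 * HBAR**2             # dipolar prefactor, m^6 s^-2


def relaxation_matrix(coords, tau_c, omega):
    """Relaxation matrix (s^-1) for protons at `coords` (angstroms), isotropic rigid rotor."""
    def J(w):
        return tau_c / (1.0 + (w * tau_c) ** 2)
    j0, j1, j2 = J(0.0), J(omega), J(2.0 * omega)
    n = len(coords)
    R = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            k = Q / (np.linalg.norm(coords[i] - coords[j]) * 1e-10) ** 6
            R[i, j] = R[j, i] = k * (6 * j2 - j0)      # cross-relaxation
            R[i, i] += k * (j0 + 3 * j1 + 6 * j2)      # auto-relaxation
            R[j, j] += k * (j0 + 3 * j1 + 6 * j2)
    return R


def mixing_coefficients(R, tau_m):
    """A(tau_m) = exp(-R tau_m), evaluated through the eigendecomposition of symmetric R."""
    w, V = np.linalg.eigh(R)
    return (V * np.exp(-w * tau_m)) @ V.T


if __name__ == "__main__":
    # Hypothetical linear chain H0-H1-H2: r01 = 1.8, r12 = 2.5, r02 = 4.3 angstroms.
    chain = np.array([[0.0, 0.0, 0.0], [1.8, 0.0, 0.0], [4.3, 0.0, 0.0]])
    A = mixing_coefficients(relaxation_matrix(chain, 2e-9, 2 * np.pi * 600e6), 0.1)
    r02_ispa = 1.8 * (A[0, 1] / A[0, 2]) ** (1 / 6)    # ISPA with H0-H1 as reference
    print(f"true r02 = 4.30 A, ISPA estimate = {r02_ispa:.2f} A")
```

Because all cross-relaxation rates share the same sign in the slow-motion regime, magnetization relayed through H1 adds to the H0-H2 cross-peak, so ISPA underestimates $r_{02}$ even at a mixing time of 100 ms.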
τc for BPTI at 60°C is 2 ns; we ran the 5-ns simulation to determine the effects of greater spin diffusion with slower motions. Internal molecular motions were modeled as follows. Unresolved methylene and methyl peak intensities were calculated with simplified rapid two- and three-state jump models: effective distances to pseudoatoms were calculated by $\langle r^{-3}\rangle$ averaging over individual proton positions without using an anisotropic spectral density function. Unresolved aromatic ring proton peak intensities were calculated with slower $\langle r^{-6}\rangle$ averaging. Random noise was added to each spectrum within the range of ±0.25% of the diagonal peak intensity at mixing time 0. This is quite conservative; signal-to-noise ratios in local experimental spectra are generally much higher. The cutoff for accepted cross-peak intensities was 0.3%. To mimic real data sets with information loss due to peak overlap, a realistic subset of each calculated spectrum was chosen to correspond to assigned, resolved peaks in an experimental spectrum for BPTI. All final hypothetical data sets included the same 812 2D NOE cross-peak intensities. With a longer mixing time (200 ms) or a longer correlation time (5 ns), more than 812 cross-peaks would be observable, but we limited the data sets so that structures obtained for all data sets could be compared. With the relatively large number of structural constraints, derived structures in this study may be less sensitive to sporadic distance errors than in many actual cases.

Determination of Distances from 2D NOE Intensities. As ISPA requires short mixing times, all ISPA distances were estimated from the 100-ms data. We selected from the current literature two ISPA-based approaches to assigning distances. The first is "conservative ISPA": this approach yields broad distance ranges allowing for significant error. The second, "restrictive ISPA," utilizes ISPA to calculate shorter distances and otherwise assigns relatively narrow error boundaries. A third distance set was determined by means of MARDIGRAS, the least computationally intensive of the iterative complete relaxation matrix approaches. NOE-derived distances were not supplemented with any additional constraints, other than the holonomic constraints necessary for distance geometry calculations. Not all NOE intensities yield useful distance information (e.g., cross-peaks between geminal protons and between aromatic ring protons), so the final distance sets contain fewer constraints than there were 2D NOE intensities. Of the 812 "observable" cross-peak intensities, only 708 yield useful distance constraints.

Conservative Distance Bounds Using ISPA. Distance constraints were assigned to categories based on the fixed-distance Phe and Tyr Hδ1-Hε1 and Hδ2-Hε2 cross-peak intensities. These intensities build up more slowly than those for geminal proton pairs (often used for ISPA) and are therefore a more reasonable approximation of the initial rate condition.
In the experimental spectrum used for reference, only one of these peaks was resolvable (Hδ1-Hε1 of Tyr-35). This cross-peak alone was used as reference (distance = 2.49 Å). Table 1 lists distance assignments. For distances to unresolvable methyl and methylene groups, the standard real-atom approach was used, assigning the distance to the central carbon atom and adding 1.5 or 1.0 Å, respectively, to the constraints. For aromatic ring pseudoatoms, distances were assigned to the geometric-mean carbon atom (Cγ for δ-protons and Cζ for ε-protons) after adding a 2.0-Å correction factor. For lower bounds, the minimum distance was the sum of the van der Waals radii.

Restrictive Distance Bounds Using ISPA. For this case, lower bounds were explicitly assigned. Shorter distances were calculated directly using ISPA (±0.3 Å). The average value of all Hα1-Hα2 geminal-proton intensities was used for calibration (distance = 1.77 Å). Smaller cross-peak intensities were placed in one of two distance categories; distance assignments are in Table 1. Upper bound pseudoatom corrections were the same as for the conservative ISPA distances; lower bounds were decreased by 0.5 Å for methyls and methylenes and by 1.0 Å for unresolved aromatic ring protons.

Table 1. Assignment of distance constraints from 2D NOE intensities

Distance set (τc/ns)     Distance range, Å   Intensity range, %   No.
Conservative ISPA (2)    1.8-2.8             >2.0                 350
                         1.8-3.5             0.8-2.0              262
                         1.8-5.0             0.3-2.0              200
Conservative ISPA (5)    1.8-2.8             >4.5                 351
                         1.8-3.6             1.5-4.5              318
                         1.8-5.2             0.3-1.5              143
Restrictive ISPA (2)     ISPA ± 0.3          >1.2                 520
                         3.0-3.7             0.9-1.2               57
                         3.6-5.0             0.3-0.9              235
Restrictive ISPA (5)     ISPA ± 0.3          >2.0                 604
                         3.1-3.8             1.5-2.0               65
                         3.6-5.0             0.3-1.5              143
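The restrictive ISPA (2-ns) rules of Table 1, combined with the pseudoatom corrections described above, reduce to a small lookup. A minimal sketch; the treatment of intensities that fall exactly on a category boundary, the function name, and the correction table layout are assumptions, and the ISPA distance itself is expected to be calibrated beforehand (in the text, against the 1.77-Å geminal Hα1-Hα2 pairs).

```python
# Pseudoatom corrections from the text: (lower-bound decrease, upper-bound increase), angstroms.
PSEUDOATOM_CORRECTION = {
    None: (0.0, 0.0),          # resolved protons
    "methyl": (0.5, 1.5),
    "methylene": (0.5, 1.0),
    "aromatic": (1.0, 2.0),
}


def restrictive_bounds(intensity_pct, r_ispa=None, group=None):
    """(lower, upper) bounds in angstroms for the restrictive ISPA (2-ns) scheme of Table 1.

    intensity_pct: cross-peak intensity as % of the diagonal peak at mixing time 0.
    r_ispa: ISPA distance, required only for the strongest category.
    Returns None below the 0.3% acceptance cutoff.
    Assumption: the top category is strictly > 1.2%; the others include their lower edge.
    """
    if intensity_pct > 1.2:
        if r_ispa is None:
            raise ValueError("strong cross-peaks need an ISPA distance")
        lower, upper = r_ispa - 0.3, r_ispa + 0.3
    elif intensity_pct >= 0.9:
        lower, upper = 3.0, 3.7
    elif intensity_pct >= 0.3:
        lower, upper = 3.6, 5.0
    else:
        return None
    decrease, increase = PSEUDOATOM_CORRECTION[group]
    return lower - decrease, upper + increase


if __name__ == "__main__":
    print(restrictive_bounds(2.0, r_ispa=2.5))         # strong peak: ISPA +/- 0.3
    print(restrictive_bounds(0.5, group="methyl"))     # weak peak to a methyl pseudoatom
```

The conservative and 5-ns schemes differ only in their thresholds and ranges, so the same structure applies with the other rows of Table 1.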

Distance Bounds Using MARDIGRAS. MARDIGRAS requires that experimental 2D NOE intensities be supplemented by intensities calculated for some arbitrary model structure. To minimize bias, the model used for all distance calculations, unless otherwise noted, was the 5pti proton coordinate set randomized by a root-mean-square (rms) shift of 3.0 Å. This rather poor model gave MARDIGRAS distances with an rms deviation (rmsd) between upper and lower bounds of 1.68 Å for 2-ns data and 1.54 Å for 5-ns data. When the 4pti crystal structure (0.39 Å rmsd vs. 5pti over backbone atoms of residues 1-56) was used as starting model, the results were much better: rmsd of 1.53 Å for 2-ns data and 1.26 Å for 5-ns data. As an additional test, MARDIGRAS was run using an extended-chain structure as the initial model. Overall, results were comparable to those for the randomized model. Although MARDIGRAS is capable of generating a largely correct set of distances independent of starting model, distances are improved with a better initial model. Distances generated using the 4pti model have higher precision than those for either the extended-chain or the randomized-coordinate model. Each MARDIGRAS calculation took 35-50 min on a Sun Sparcstation 1. MARDIGRAS yields more accurate distances from the 2-ns data than from the 5-ns data. More extensive cross-relaxation occurs during τm at the longer correlation time, so intensities tend toward a common value. Although MARDIGRAS accounts for spin diffusion, it cannot compensate for loss of information resulting from severe spin diffusion. Despite this limitation, MARDIGRAS still yields accurate distances at longer mixing times than can be analyzed using ISPA, potentially providing more distance constraints. Increasing the number of constraints will improve structural resolution. We ran MARDIGRAS using 100- and 200-ms data.
Distances calculated for these different 2D NOE intensity sets were in general agreement, but the 100-ms distances were slightly more accurate. However, an additional 50 distances were obtained from the 200-ms data for cross-peaks that are below the noise level at 100 ms. We did not use these distances in the structure determination phase of this study, as we wanted to test the effect of distance accuracy rather than number of constraints. The complete relaxation network approach used here provides accurate distances; generally, distances of 3-4 Å can be specified to ±0.2 Å, and distances of 5 Å can be specified to ±0.7 Å. The iterative portion of MARDIGRAS does not rely on any model for internal motions; convergence is required only of relaxation rates. Distances calculated from these rates, however, depend on the motional model. For this study, we assumed only isotropic overall motion. For unresolvable equivalent spins, MARDIGRAS gives distances to a pseudoatom; only the uncertainty in these distances is currently estimated using a motional model. For methyl and methylene groups, the correction factor is ±0.3 Å, reflecting a worst-case deviation of the effective proton pseudoatom position from the central carbon atom, assuming $\langle r^{-3}\rangle$ averaging. For distances to aromatic ring pseudoatoms, the correction factor is the same as for ISPA distances.

Generation of Structures Using Distance Geometry. We used the program VEMBED written by John Thomason at the University of California, San Francisco; it is a vectorized version of EMBED (27). For each set of NOE distances, two DG runs were made, each producing 20 structures. A control set of "perfect" distances was also run through VEMBED to distinguish errors due to distance constraints from those due to the VEMBED algorithm and to our subsampling of the hypothetical NOE data. For each intensity in the 100-ms data, the actual distance (±0.02 Å) in the 5pti structure was used. In addition, distance ranges were modified for unresolvable equivalent spins (±0.3 Å for methyl and methylene carbon distances and +2.0/-1.0 Å for aromatic ring carbons) to reproduce the precision of the experimentally derived distance sets. The best 8-10 structures for each set of distances were selected, based on the final value of the VEMBED target function.
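The control-set bounds and the selection of the best structures can be sketched as follows. The tolerances come from the text; the squared-violation score is a hypothetical stand-in used only to illustrate ranking a family, not the VEMBED target function, whose form is not given here. All names are hypothetical.

```python
import math

# Control-set tolerances from the text, as (below, above) the true distance in angstroms.
CONTROL_TOLERANCE = {
    None: (0.02, 0.02),        # resolved protons
    "methyl": (0.3, 0.3),      # distance to the methyl carbon
    "methylene": (0.3, 0.3),   # distance to the methylene carbon
    "aromatic": (1.0, 2.0),    # distance to the aromatic ring carbon: +2.0/-1.0
}


def control_bounds(true_distance, group=None):
    """(lower, upper) bounds for the "perfect" control distance set."""
    below, above = CONTROL_TOLERANCE[group]
    return true_distance - below, true_distance + above


def violation_score(coords, constraints):
    """Sum of squared bound violations (angstrom^2); a stand-in, not VEMBED's function.

    constraints: iterable of (i, j, lower, upper) on point indices into coords.
    """
    score = 0.0
    for i, j, lower, upper in constraints:
        d = math.dist(coords[i], coords[j])
        if d < lower:
            score += (lower - d) ** 2
        elif d > upper:
            score += (d - upper) ** 2
    return score


def best_structures(structures, constraints, keep=10):
    """Keep the `keep` structures with the smallest violation score."""
    return sorted(structures, key=lambda s: violation_score(s, constraints))[:keep]
```

Any monotone penalty on bound violations would rank a family in a similar spirit; the squared form is chosen here only because it is smooth and simple.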

Results of Distance Geometry Calculations

Results are summarized in Tables 2-4. All rmsd are calculated over all residues except the N-terminal residue and the three C-terminal residues. These residues were very poorly defined even for the control set of perfect distances, simply because of a lack of constraints. Table 2 lists rmsd values for best-fit superposition of each structure relative to (i) the "average structure" for that particular distance set and (ii) the true structure. It is common to list the rmsd from the average structure as a measure of similarity between the individual structures in a family. Table 3 compares these average structures to the starting structure. Since the NOE experiment yields time-averaged data, it is often assumed that this average structure is representative of the time-averaged structure. This approach may be justified: comparison of Table 3 with Table 2 shows that the average structure is consistently closer to the true structure than any single member of a family. One method of judging the accuracy of a proposed molecular structure is to calculate the 2D NOE spectrum for that model and compare it to the experimental data (28). This has gained broad acceptance recently, and several refinement techniques based on such calculations have been proposed (11, 15, 19, 24, 29). Table 4 was compiled using CORMA to calculate the theoretical spectrum for each structure and comparing the fit between these spectra and the original "data."

Table 2. rms difference between upper and lower distance bounds and rmsd of each DG structure in a set from the averaged coordinates of structures in the set and from the true starting structure

                         rmsd       rmsd of DG structures from average       rmsd of DG structures from true
                         between    structure,* Å                            structure,* Å
Distance set (τc/ns)     bounds, Å  Backbone     Side chain   All atoms      Backbone     Side chain   All atoms
Control                  1.09       0.69 ± 0.09  1.34 ± 0.15  1.13 ± 0.12    1.10 ± 0.08  2.16 ± 0.17  1.81 ± 0.12
Conservative ISPA (2)    2.77       0.76 ± 0.10  1.44 ± 0.12  1.23 ± 0.11    1.27 ± 0.11  2.43 ± 0.12  2.07 ± 0.09
Restrictive ISPA (2)     2.19       0.72 ± 0.09  1.43 ± 0.13  1.22 ± 0.11    1.19 ± 0.09  2.38 ± 0.15  2.00 ± 0.12
MARDIGRAS (2)            1.68       0.66 ± 0.06  1.34 ± 0.10  1.13 ± 0.07    1.08 ± 0.10  2.26 ± 0.16  1.89 ± 0.13
Conservative ISPA (5)    2.75       0.76 ± 0.08  1.46 ± 0.14  1.23 ± 0.10    1.36 ± 0.06  2.50 ± 0.13  2.13 ± 0.10
Restrictive ISPA (5)     2.00       0.64 ± 0.08  1.33 ± 0.07  1.11 ± 0.06    1.22 ± 0.06  2.35 ± 0.15  1.98 ± 0.10
MARDIGRAS (5)            1.54       0.73 ± 0.09  1.40 ± 0.13  1.19 ± 0.10    1.13 ± 0.11  2.24 ± 0.12  1.89 ± 0.10
*Values represent the mean ± standard deviation.

Table 3. rmsd between the averaged coordinates of each set of DG structures and the true structure

Structure set (τc/ns)    Backbone atoms, Å   Side chain atoms, Å   All atoms, Å
Control                  0.86                1.70                  1.42
Conservative ISPA (2)    1.02                1.95                  1.66
Restrictive ISPA (2)     0.97                1.98                  1.65
MARDIGRAS (2)            0.86                1.81                  1.52
Conservative ISPA (5)    1.13                2.03                  1.74
Restrictive ISPA (5)     1.03                1.94                  1.64
MARDIGRAS (5)            0.86                1.75                  1.47

Several different figures of merit were calculated to express the overall fit. The first two measures are analogous to crystallographic residual indices, or R factors:

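$$R_1 = \frac{\sum_i \left| a_o^i - a_c^i \right|}{\sum_i \left| a_o^i \right|}$$

$$R_2 = \left[ \frac{\sum_i \left( a_o^i - a_c^i \right)^2}{\sum_i \left( a_o^i \right)^2} \right]^{1/2}$$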

where subscripts denote calculated (c) and observed (o) intensities and the sums run over cross-peaks i. These R factors are well established in crystallography, but for NMR other functions may be more descriptive. We favor sixth-root residual indices:

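$$R_1^x = \frac{\sum_i \left| (a_o^i)^{1/6} - (a_c^i)^{1/6} \right|}{\sum_i (a_o^i)^{1/6}}$$

$$R_2^x = \left[ \frac{\sum_i \left[ (a_o^i)^{1/6} - (a_c^i)^{1/6} \right]^2}{\sum_i \left[ (a_o^i)^{1/6} \right]^2} \right]^{1/2}$$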

These equations attempt to relate intensities, assuming an approximate $r^{-6}$ dependence, to the coordinate space of the model. Because of this extreme distance dependence, errors in the shortest, often least structurally interesting distances tend to dominate $R_1$ and $R_2$. Sixth-root scaling allows longer-range interactions (i.e., up to ≈5 Å) to be considered as well, though they are still not weighted as heavily as the larger cross-peaks (appropriate given their lower signal-to-noise ratio). Although the different residual indices generally offered no consensus in ranking structures within a given set of structures, the gross results of the comparisons are essentially independent of the form of the figure of merit. Differences in overall fits to the data qualitatively agree with the rmsd calculations, confirming the usefulness of comparing calculated spectra to experimental data. Of the different residual indices, $R^x$ ranked structures within a given set most consistently with the rmsd rankings, though the correlation was not significant for some sets. MARDIGRAS structures fit the data consistently better than either ISPA set.
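The four residual indices defined above can be computed directly from paired observed and calculated intensities. A minimal pure-Python sketch; the function name and the clipping of negative intensities to zero before taking sixth roots are assumptions (Python would otherwise return a complex number for a negative base).

```python
import math


def residual_indices(a_obs, a_calc):
    """Return (R1, R2, R1x, R2x) for paired observed and calculated 2D NOE intensities.

    Negative intensities are clipped to zero before taking sixth roots
    (an assumption of this sketch).
    """
    pairs = list(zip(a_obs, a_calc))
    r1 = sum(abs(o - c) for o, c in pairs) / sum(abs(o) for o, _ in pairs)
    r2 = math.sqrt(sum((o - c) ** 2 for o, c in pairs) / sum(o ** 2 for o, _ in pairs))
    roots = [(max(o, 0.0) ** (1 / 6), max(c, 0.0) ** (1 / 6)) for o, c in pairs]
    r1x = sum(abs(so - sc) for so, sc in roots) / sum(so for so, _ in roots)
    r2x = math.sqrt(sum((so - sc) ** 2 for so, sc in roots)
                    / sum(so ** 2 for so, _ in roots))
    return r1, r2, r1x, r2x
```

Because the sixth root compresses the dynamic range (a 100-fold intensity range becomes about a 2.15-fold range), weak cross-peaks from longer distances carry more weight in $R_1^x$ and $R_2^x$ than in $R_1$ and $R_2$.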

The most significant trend in all of these tables is the increase in structural accuracy with increasingly restrictive constraints. As the distance ranges become smaller, the structures become more conformationally restricted. As long as the smaller distance ranges are in reasonable agreement with the true distances, a randomly chosen distance from a smaller range is more likely to be accurate than one chosen from a larger range. Conservative ISPA distances do not specify a lower bound and are therefore more likely to result in an underestimated random distance. Distance constraints in other parts of the molecule may correct these initial distances, but sometimes they may not. Considering rmsd, restrictive ISPA structures are on average about 2-5% closer to the true structure than those generated from conservative ISPA distances. MARDIGRAS structures are another 5-10% closer to the true structure than restrictive ISPA structures. Significantly, the control set of perfect distances yielded structures with no improvement in backbone rmsd and only 5% improvement in side chain rmsd relative to the MARDIGRAS structures. Improvement in structure quality is strikingly revealed by the values of the residual indices (Table 4): these R factors for different distance sets differ by two or more standard deviations, whereas the rmsd between atom coordinates may differ by less than one standard deviation. One might expect an overall decrease in the accuracy of derived structures with increasing τc, particularly with ISPA, due to increased spin diffusion. For all three methods, more incorrect distance constraints were calculated for the 5-ns data than for the 2-ns data. For the MARDIGRAS and restrictive ISPA methods, however, structures derived for the different correlation times are comparable, in both overall rmsd and R factors (Tables 2-4). This may be a result of tighter bounds in the 5-ns distance sets (Table 2).
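The rmsd values compared above rest on best-fit superposition. A minimal NumPy sketch of Kabsch superposition, best-fit rmsd, and a one-pass average structure (every member superposed onto the first) follows; the averaging protocol is an assumption, since the text does not specify how the average structure was built, and the function names are hypothetical.

```python
import numpy as np


def superpose(mobile, target):
    """Return `mobile` (N x 3) rigidly rotated and translated onto `target` (Kabsch)."""
    P = np.asarray(mobile, dtype=float)
    T = np.asarray(target, dtype=float)
    p0, t0 = P.mean(axis=0), T.mean(axis=0)
    H = (P - p0).T @ (T - t0)                                # 3 x 3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0      # forbid reflections
    rot = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return (P - p0) @ rot.T + t0


def rmsd(a, b):
    """Best-fit rmsd between two conformations with atoms listed in the same order."""
    fitted = superpose(a, b)
    return float(np.sqrt(((fitted - np.asarray(b, dtype=float)) ** 2).sum(axis=1).mean()))


def average_structure(family):
    """Average coordinates after superposing every member onto the first one."""
    ref = np.asarray(family[0], dtype=float)
    return np.mean([superpose(member, ref) for member in family], axis=0)
```

An averaged structure of this kind need not have ideal covalent geometry; it is useful as a reference for the rmsd comparisons in Tables 2 and 3 rather than as a model in its own right.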
The tighter 5-ns bounds for the restrictive ISPA distances are explained by Table 1: more cross-peak volumes in the 5-ns data set are large enough to estimate directly. For MARDIGRAS, uncertainty in NOE-derived distances depends in part on the signal-to-noise ratio; the generally greater signal intensity at 5 ns decreases the uncertainty. Tables 2-4 describe structures on a global level, but determination of local structure will be important in certain regions, such as ligand binding sites. We used computer graphics to compare each DG structure set with the true structure. Because distances in each structure could be compared to those in the true structure, we used a list of consistently incorrect distances (in error by >0.5 Å in all structures in a set) to direct a search for systematic errors in the determined structures. Fig. 1 Left shows one region of local structure in which distance errors accumulate to produce systematic errors. Structures generated from the 5-ns data using restrictive ISPA distances and MARDIGRAS distances are compared. The Phe-45 ring is systematically displaced about 2.0 Å, due to incorrectly short distances to Arg-20 and Tyr-21. Systematically short distances from Phe-45 to Cys-51 result in an average displacement of the Cys oxygen atom by 2.3 Å. Whereas these errors involve a single incorrect torsion angle, Glu-49 contains several additive erroneous torsion angles. Two systematically incorrect intraresidue distances, along with another short distance to Asp-50, resulted in the Glu-49 carboxylate being displaced by >3.0 Å. MARDIGRAS structures show none of these systematic errors (Fig. 1 Right). Some systematic errors, notably backbone displacements over short (two- to four-residue) segments, were found in conservative ISPA structures for both correlation times, but these may be a result of limited DG sampling of distance space, as there were no significant errors in proton distances in these regions.

FIG. 1. Comparison between best-fit DG structures generated from restrictive ISPA distances (Left) and MARDIGRAS distances (Right). Distances were derived from 2D NOE spectra calculated for a mixing time of 100 ms and a correlation time of 5 ns. The true structure is shown in dashed lines. Only BPTI residues 45-51 are shown, and all side chains except for Phe-45, Glu-49, and Cys-51 have been removed for clarity. Structures are superimposed for best fit over backbone atoms of residues 44-52.

Table 4. Comparisons between calculated 2D NOE intensities for each structure and "experimental" data

Structure set (τc/ns)*     R1              R2              R1^x          R2^x
Control (2)                0.137 ± 0.008   0.079 ± 0.005   0.38 ± 0.07   0.31 ± 0.03
Conservative ISPA (2)      0.199 ± 0.010   0.125 ± 0.007   0.51 ± 0.02   0.46 ± 0.02
Restrictive ISPA (2)       0.183 ± 0.017   0.109 ± 0.009   0.44 ± 0.04   0.39 ± 0.03
MARDIGRAS (2)              0.157 ± 0.014   0.092 ± 0.007   0.38 ± 0.03   0.34 ± 0.02
Control (5)                0.094 ± 0.008   0.062 ± 0.005   0.37 ± 0.03   0.30 ± 0.02
Conservative ISPA (5)      0.131 ± 0.007   0.094 ± 0.003   0.52 ± 0.04   0.45 ± 0.02
Restrictive ISPA (5)       0.117 ± 0.007   0.083 ± 0.005   0.48 ± 0.03   0.40 ± 0.03
MARDIGRAS (5)              0.107 ± 0.008   0.074 ± 0.003   0.34 ± 0.01   0.40 ± 0.03
Values represent the mean ± standard deviation. See text for definitions of the various residual indices.
*Numbers in parentheses represent the correlation time used for the calculation of 2D NOE intensities.

Discussion

Results here suggest that more restrictive distance constraints yield more accurate structures than conservative application of the ISPA approach, using as judgment criteria both rmsd of heavy atom coordinates and spectral R factors. However, incorrect restrictive constraints (e.g., some from the restrictive ISPA approach) can lead to systematically incorrect local structural features. Despite the few instances of systematic error, even highly qualitative distances could define backbone folding correctly as well as the general position of most side chains. That such qualitative distances suffice is an assumption inherent in most protein solution structure studies to date; our results confirm its validity. But the small investment of computer time to obtain more accurate distances by means of MARDIGRAS results in structures that agree more closely both with the actual structure and with the experimental data. MARDIGRAS distances can be used with either restrained molecular dynamics or DG to yield structures. MARDIGRAS also offers the potential of determining additional distances unavailable using ISPA: because it explicitly accounts for spin diffusion, it can derive accurate distances from data obtained at longer mixing times. Stereospecific assignment of prochiral centers was intentionally neglected here; correct stereospecific assignments were assumed. In reality, this is often not the case. The accuracy of MARDIGRAS distances, however, in tandem with a "floating chirality" algorithm allowing inversion of prochiral centers in the structure generation phase, can establish stereospecific assignments and thus higher resolution structures. We note that this study does not address in depth the effects of internal molecular motions. These effects, though generally less important than spin diffusion for macromolecular distance determination, should be considered for deriving distances from 2D NOE intensities. MARDIGRAS will soon be modified to include more sophisticated internal motional models such as those employed in our original complete relaxation matrix studies (9).

We thank Drs. Shauna Farr-Jones and Irwin D. Kuntz for helpful discussions and comments on this manuscript, Mr. Robert Cerpa for suggesting use of a sixth-root residual index, and Mr. John Thomason for help with the initial DG calculations. This work was supported by National Institutes of Health Grants GM 39247 and RR 01695 and by a gift from Ajinomoto Co., Inc. The Sun Sparcstation used for the computations was purchased using National Science Foundation Grant DMB-8406826. We gratefully acknowledge use of the Computer Graphics Laboratory (supported by National Institutes of Health Grant RR 01081) and use of the Cray-YMP supercomputer, which was supported by a grant from the Pittsburgh Supercomputing Center through the National Institutes of Health Division of Research Resources, Cooperative Agreement U41RR04154, and a grant from the National Science Foundation, Cooperative Agreement ASC-8500650.

1. Wüthrich, K. (1986) NMR of Proteins and Nucleic Acids (Wiley, New York).
2. Oppenheimer, N. J. & James, T. L., eds. (1989) Methods Enzymol. 176.
3. Oppenheimer, N. J. & James, T. L., eds. (1989) Methods Enzymol. 177.
4. Bax, A. (1989) Annu. Rev. Biochem. 58, 223-256.
5. Nilges, M., Gronenborn, A. M., Brünger, A. T. & Clore, G. M. (1988) Protein Eng. 2, 27-38.
6. Billeter, M., Kline, A. D., Braun, W., Huber, R. & Wüthrich, K. (1989) J. Mol. Biol. 206, 677-687.
7. Zuiderweg, E. R. P., Scheek, R. M., Boelens, R., van Gunsteren, W. F. & Kaptein, R. (1985) Biochimie 67, 707-715.
8. Holak, T. A., Prestegard, J. H. & Forman, J. D. (1987) Biochemistry 26, 4652-4660.
9. Keepers, J. W. & James, T. L. (1984) J. Magn. Reson. 57, 404-426.
10. Lefèvre, J.-F., Lane, A. N. & Jardetzky, O. (1987) Biochemistry 26, 5076-5090.
11. Borgias, B. A. & James, T. L. (1988) J. Magn. Reson. 79, 493-512.
12. Borgias, B. A., Gochin, M., Kerwood, D. J. & James, T. L. (1990) Prog. Nucl. Magn. Reson. Spectrosc. 22, 83-100.
13. Landy, S. B. & Rao, B. D. N. (1989) J. Magn. Reson. 83, 19-43.
14. Borgias, B. A., Thomas, P. D. & James, T. L. (1989) CORMA, Complete Relaxation Matrix Analysis (Univ. of California, San Francisco), Version 2.2.
15. Borgias, B. A. & James, T. L. (1990) J. Magn. Reson. 87, 475-487.
16. Massefski, W., Jr., & Redfield, A. G. (1988) J. Magn. Reson. 78, 150-155.
17. Eaton, H. L. & Andersen, N. H. (1987) J. Magn. Reson. 74, 212-225.
18. Hyberts, S. G. & Wagner, G. H. (1989) J. Magn. Reson. 81, 418-422.
19. Baleja, J. D. & Sykes, B. D. (1990) J. Magn. Reson. 87, 375-384.
20. Borgias, B. A. & James, T. L. (1989) Methods Enzymol. 176, 169-183.
21. Olejniczak, E. T., Gampe, R. T., Jr., & Fesik, S. W. (1986) J. Magn. Reson. 67, 28-41.
22. Macura, S., Farmer, B. T., II, & Brown, L. R. (1986) J. Magn. Reson. 70, 493-499.
23. Mirau, P. A. (1988) J. Magn. Reson. 80, 439-447.
24. Boelens, R., Koning, T. M. G. & Kaptein, R. (1988) J. Mol. Struct. 173, 299-311.
25. Gochin, M. & James, T. L. (1990) Biochemistry 29, 11172-11180.
26. Wlodawer, A., Walter, J., Huber, R. & Sjölin, L. (1984) J. Mol. Biol. 180, 307-329.
27. Havel, T. F., Kuntz, I. D. & Crippen, G. M. (1983) Bull. Math. Biol. 45, 665-720.
28. Suzuki, E.-I., Pattabiraman, N., Zon, G. & James, T. L. (1986) Biochemistry 25, 6854-6865.
29. Yip, P. & Case, D. A. (1989) J. Magn. Reson. 83, 643-648.
