J. Mol. Biol. (1992) 226, 1143-1159
Hydrogen Bonding in Globular Proteins Douglas F. Stickle Department of Biological Chemistry Pennsylvania State University College of Medicine Hershey, PA 17033, U.S.A.
Leonard G. Presta Genentech, Inc Department of Protein Engineering 460 Point San Bruno Blvd. South San Francisco, CA 94080, U.S.A.
Ken A. Dill Department of Pharmaceutical Chemistry University of California 3333 California Street, Room 102 San Francisco, CA 94118, U.S.A.
and George D. Rose† Department of Biochemistry and Biophysics University of North Carolina, CB 7260 FLOB Chapel Hill, NC 27599, U.S.A. (Received 7 February 1992, accepted 13 April 1992)
A global census of the hydrogen bonds in 42 X-ray-elucidated proteins was taken and the following demographic trends identified: (1) Most hydrogen bonds are local, i.e. between partners that are close in sequence, the primary exception being hydrogen-bonded ion pairs. (2) Most hydrogen bonds are between backbone atoms in the protein, an average of 68%. (3) All proteins studied have extensive hydrogen-bonded secondary structure, an average of 82%. (4) Almost all backbone hydrogen bonds are within single elements of secondary structure. An approximate rule of thirds applies: slightly more than one-third (37%) form i→i−3 hydrogen bonds, almost one-third (32%) form i→i−4 hydrogen bonds, and slightly less than one-third (26%) reside in paired strands of β-sheet. The remaining 5% are not wholly within an individual helix, turn or sheet. (5) Side-chain to backbone hydrogen bonds are clustered at helix-capping positions. (6) An extensive network of hydrogen bonds is present in helices. (7) To a close approximation, the total number of hydrogen bonds is a simple function of a protein's helix and sheet content. (8) A unique quantity, termed the reduced number of hydrogen bonds, is defined as the maximum number of hydrogen bonds possible when every donor:acceptor pair is constrained to be 1:1. This quantity scales linearly with chain length, with 0.71 reduced hydrogen bond per residue. Implications of these results for pathways of protein folding are discussed.
Keywords: hydrogen bonding; H-bonding; helix capping; protein folding; protein secondary structure
†Author to whom correspondence should be addressed at his current address: Dept. of Biochemistry & Molecular Biophysics, Washington University School of Medicine, Box 8231, 660 S. Euclid Ave., St Louis, MO 63110, U.S.A.
0022-2836/92/161143-17 $08.00/0 © 1992 Academic Press Limited
1. Introduction
The current view of protein folding and stability has been conditioned by ideas developed during the preceding several decades of research (Dill, 1990). The driving force for folding was initially thought to be intramolecular hydrogen bonding. This was Pauling's view, and it led him to the crucial role ascribed to hydrogen bonding in model structures for α-helix (Pauling et al., 1951) and β-sheet (Pauling & Corey, 1951). Such models, which were constrained by the known geometry of the peptide unit (Corey, 1948), sought to optimize both the number and the geometry of >N-H···O=C< hydrogen bonds. The existence of these predicted structures was soon confirmed in ongoing X-ray crystal studies (Perutz, 1951), bolstering the assumption that hydrogen bonds play the formative role in folding and stability.
This early view of hydrogen bonding was later supplanted by the current "text-book" view that the hydrophobic effect, not hydrogen bonding, serves as the driving force for protein folding (Kauzmann, 1959). However, protein folding cannot be simply a matter of burying apolar residues while exposing polar ones. Even residues with hydrophobic side-chains have a pronounced hydrogen-bonding capacity due to the presence of backbone >N-H and >C=O groups. Were such residues unable to realize hydrogen bonds within the molecular interior, then hydrogen bonding would favor denaturation, since presumably these same groups could hydrogen bond readily to water in the unfolded state.
One approach to understanding the patterns and principles of hydrogen bonding in proteins is to survey a representative collection of molecules and search out recurrent themes. Such an approach was used by Baker & Hubbard (1984) in an elegant and influential review that laid the groundwork for many later studies of proteins, including the present one.
Hydrogen bond geometry in proteins has been analyzed in individual molecules (Artymiuk & Blake, 1981) and by systematic surveys (Baker & Hubbard, 1984; Singh et al., 1987; Barlow & Thornton, 1988; Ippolito et al., 1990; Jeffrey & Saenger, 1991). Small-molecule crystal structures have also provided a wealth of information about hydrogen bond geometry (Ceccarelli et al., 1981; Jeffrey & Maluszynska, 1982; Taylor et al., 1983; Murray-Rust & Glusker, 1984; Taylor & Kennard, 1984; Vedani & Dunitz, 1985). Our emphasis in the present study is less on geometry and more on the numbers, types and patterns of hydrogen bonds, especially their relationship to protein secondary structure. Of particular interest is the concept of a network hydrogen bond, in which a given donor or acceptor participates in two or more hydrogen bonds.
Central to this study is a novel collection of computer algorithms that identify hydrogen bonds and hydrogen-bonded secondary structure in a two-step procedure. (1) Identification of hydrogen bonds, from X-ray co-ordinates, is based on multiple criteria (similar though not identical to those of Baker & Hubbard (1984)) extracted from both protein and small-molecule crystal structures. (2) Identification of secondary structure, from hydrogen bonds, is based solely on the automatic recognition of signature patterns in a plot of donors versus acceptors, similar in intent to work of Richards & Kundrot (1988).
In the following pages, the comprehensive set of hydrogen bonds from a data base of 42 high-resolution, X-ray-elucidated proteins is analyzed for patterns and trends. Following this analysis, it becomes apparent that most hydrogen bonds are between backbone atoms, within individual elements of secondary structure; the majority of these are local, i.e. between partners that are close in sequence. Hydrogen bonds between main-chain and side-chain atoms are also highly local, with conspicuous clusters at characteristic sequence distances. Such clusters are found to be indicative of frequent helix-capping interactions (Presta & Rose, 1988; Richardson & Richardson, 1988), in which polar side-chains from residues that flank helix termini form hydrogen bonds with the initial >NH donors or final >C=O acceptors in the helix backbone.
In proteins, approximately half of all polar groups are buried within the molecular interior (Chothia, 1976; Richards, 1977), where, shielded from solvent, they nevertheless find intramolecular hydrogen bond partners and are rarely left unsatisfied (Chothia, 1975; Richards, 1977). A lingering question asks how buried polar groups can realize hydrogen bonds so effectively within such a constrained microenvironment. The prevalence of local interactions can rationalize this long-standing conundrum.
In this view, the accretion of structure involves prior formation of a sequentially local hydrogen-bonded framework (Kim & Baldwin, 1982), with concomitant or subsequent condensation of these nascent elements into larger composites (Rose, 1979; Fetrow et al., 1988).
2. Materials and Methods
(a) Identification of hydrogen bonds in X-ray-elucidated protein structures
Hydrogen-bonding criteria were derived from small-molecule crystal studies and are similar to those used in an earlier study (Presta & Rose, 1988). For each protein, hydrogen bonds were identified in a 3-step process. First, polar groups were classified by atom type and hybridization into 1 of 5 categories: Nsp2, Nsp3, Osp2, Osp3, or Ssp3. All groups and their classes are listed in Table 1. Second, distances between all potential donor-acceptor pairs in the protein were evaluated. A pair was retained for further analysis whenever its interatomic distance was less than or equal to the sum of the hydrogen bond radii of its respective atoms. It should be noted that the hydrogen bond radii in Table 1 are about 10% larger than the corresponding van der Waals' radii. Expanded radii were used to account for the presence of hydrogen atoms
Table 1
Hydrogen bond donors and acceptors from amino acid residues

Group    Residues             Radius (Å)   Max. hydrogen bonds
>NH      peptide, Trp         1.90         (1)
-NH2     Asn, Gln             1.90         (2)
-OH      Ser, Thr             1.70         (2)
=O       peptide, Asn, Gln    1.60         (2)
-COO-    Asp, Glu             1.60         (2/O)
-S-      Met                  1.95         (2)
-SH      Cys                  2.10         (2)
…        …                    1.73         (1)

All main-chain and side-chain polar groups from the 20 natural amino acid residues. Groups are classified by atom type (N or O) and hybridization (sp2 or sp3). Each Table entry contains (1) either a donor or acceptor, (2) residues in which the group is found, (3) the assigned hydrogen bond radius, and (4) the maximum number of hydrogen bonds expected for the group, in parenthesis.
and to compensate for the fact that protein crystal structures lack the resolution of small-molecule crystal structures. The maximum distance between the electronegative heavy atoms in a hydrogen-bonded pair is usually taken to be 3.1 to 3.2 Å (Baker & Hubbard, 1984). The use of expanded radii retains hydrogen bonds with slightly longer interatomic distances but acceptable geometry in other respects. Finally, hydrogen bond geometry was evaluated for pairs satisfying the distance criterion. Two angle criteria were assessed, as summarized in Fig. 1:
(1) Angle at the acceptor atom, i.e. the scalar angle D-A-AA. For sp2 acceptors, the ideal value is 135°, and a range of 90 to 180° was accepted. For sp3 acceptors, the ideal value is 109.5°, and a range of 60 to 180° was accepted.
(2) Angle at the donor atom, i.e. the scalar angle A-D-DD. For sp2 donors, the ideal value is 120°, and a range of 90 to 180° was accepted. For sp3 donors, the ideal value is 109.5°, and a range of 90 to 180° was accepted.
Studies of both small-molecule and protein crystal structures have shown that hydrogen bonds involving sp2 donors and/or acceptors exhibit a preferred geometry with the donor/acceptor situated in the plane of its sp2 partner (Artymiuk & Blake, 1981; Ceccarelli et al., 1981; Taylor et al., 1983; Vedani & Dunitz, 1985; Singh et al., 1987). Hence, potential hydrogen bond pairs with at least 1 sp2 partner that satisfied criteria (1) and (2) were subjected to 2 additional tests for planarity.
(3) Acceptor in the plane of the donor complex. Planes α (D-DD-DD') and β (A-D-DD) were defined. The ideal value of the angle between normals to α and β is 0°; deviations up to ±60° were accepted.
(4) Donor in the plane of the acceptor complex. Planes γ (A-AA-AA2) and δ (D-A-AA) were defined. The ideal value of the angle between normals to γ and δ is 0°; deviations up to ±90° were accepted.
Criterion (3) imposes the constraint that the oxygen acceptor must lie no more than 60° outside the plane of the donor complex. The constraint imposed by criterion (4) allows greater latitude between the position of the nitrogen donor and its acceptor complex because the large
Figure 1. Hydrogen bonding criteria for X-ray-elucidated proteins. AA is an acceptor antecedent atom that is covalently bound to the acceptor; DD and DD' are donor antecedent atoms that are covalently bound to the donor. Co-ordinates for hydrogen atoms are not available at the resolution of typical protein X-ray crystal structures, and geometric criteria are evaluated using only heavy atoms. The distance between donor and acceptor heavy atoms must not exceed the sum of their hydrogen bond radii. Scalar angles at both the acceptor atom (i) and donor atom (ii) must satisfy criteria given in Materials and Methods. For hydrogen bonds with at least one sp2 partner, tests for planarity were imposed, as described.
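The distance test and the two scalar-angle tests summarized in Fig. 1 can be sketched in a few lines. This is a minimal illustration, not the authors' program: the function names, the argument layout and the test co-ordinates are hypothetical, the planarity tests (3) and (4) are omitted, and the radii 1.90 Å (peptide >NH) and 1.60 Å (peptide =O) are the Table 1 values.

```python
import numpy as np

def angle_deg(a, b, c):
    """Scalar angle a-b-c in degrees, with b at the vertex."""
    u = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    v = np.asarray(c, dtype=float) - np.asarray(b, dtype=float)
    cos_t = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(cos_t, -1.0, 1.0))))

def passes_distance_and_angles(d, dd, a, aa, r_donor, r_acceptor,
                               acceptor_is_sp2=True):
    """Heavy-atom tests only: distance, angle at acceptor, angle at donor.

    d, dd, a, aa are co-ordinates of donor, donor antecedent,
    acceptor and acceptor antecedent (hypothetical argument names).
    """
    d = np.asarray(d, dtype=float)
    a = np.asarray(a, dtype=float)
    # Distance criterion: no more than the sum of hydrogen bond radii.
    if float(np.linalg.norm(d - a)) > r_donor + r_acceptor:
        return False
    # Criterion (1): angle D-A-AA, 90-180 deg for sp2, 60-180 deg for sp3.
    acceptor_min = 90.0 if acceptor_is_sp2 else 60.0
    acceptor_ok = acceptor_min <= angle_deg(d, a, aa) <= 180.0
    # Criterion (2): angle A-D-DD, 90-180 deg for both sp2 and sp3 donors.
    donor_ok = 90.0 <= angle_deg(a, d, dd) <= 180.0
    return acceptor_ok and donor_ok
```

A backbone >N···O=C< pair at 2.9 Å with open angles passes, whereas the same pair at 4.0 Å, or with the donor antecedent folded toward the acceptor, fails.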
oxygen atom, with its broad "cone of acceptance", can still form an effective hydrogen bond when the hydrogen donor is considerably out-of-plane (Taylor et al., 1984). As an example, consider a backbone >N···O=C< hydrogen bond. The angle between the normals to planes defined by atoms N-Cα-C and O-N-Cα should be near 0°, but deviations from planarity of up to ±60° are accepted.
(b) Identification of hydrogen-bonded secondary structure
The relationship between the protein sequence and the complete set of hydrogen bonds can be visualized from a square plot of donors versus acceptors (Factor & Mehler, 1991), as illustrated in Fig. 2(a). Visual inspection of the donor versus acceptor plot indicates that most hydrogen bonds are within patterns of secondary structure. These hydrogen bonding patterns can be interpreted in the same manner as the corresponding patterns in a distance plot (Liljas & Rossmann, 1974), with helices and turns describing a trace that runs parallel to and near the diagonal, strands of antiparallel sheet running perpendicular to the diagonal, and strands of parallel sheet running parallel to but distant from the diagonal. Even those hydrogen bonds involving side-chains, which are displayed as open circles in Fig. 2(a), are almost all enmeshed within these 3 predominant secondary structure patterns.
The clear patterns seen in Fig. 2(a) have prompted us to devise simple algorithms to scan donor versus acceptor plots automatically and to identify secondary structure elements objectively. These algorithms start by filtering the donor versus acceptor plot to include only backbone:backbone hydrogen bonds. The filtered set, represented by the filled circles in Fig. 2(a), is then dissected into helix and hydrogen-bonded turns (Fig. 2(b)), antiparallel sheet (Fig. 2(c)), and parallel sheet (Fig. 2(d)). A brief description of the algorithms follows.
(i) Helices and turns
Helices and β-turns have characteristic patterns of >N-H···O=C< hydrogen bonds. In the α-helix, hydrogen bonds link residues i→i−4 (Pauling et al., 1951); in 3₁₀ helices and turns (Rose et al., 1985) they link residues i→i−3. To identify helices and turns, a donor versus acceptor plot of dimension N × N is filtered to include only the 2 diagonals, δ₋₃ and δ₋₄, that parallel the main diagonal at either (i,i−3) or (i,i−4), respectively. Next, each minor diagonal is mapped into a separate N × 1 binary array (Δ₋₃ and Δ₋₄, respectively) such that for every hydrogen bond in the diagonal, either 4 (for δ₋₃) or 5 (for δ₋₄) consecutive array elements are set to true (i.e. unity). For example, corresponding to an i→i−3 H-bond from δ₋₃, positions (i, i−1, i−2 and i−3) would be set to unity in Δ₋₃. Finally, the logical union of these arrays, Δ₋₃ ∪ Δ₋₄, is mapped onto a composite array and searched for start (011...) and stop (110...) sequences. Disjunct start and stop sequences bracket the ends of helices while coincident start and stop sequences correspond to isolated hydrogen-bonded turns.
(ii) β-sheet
On a donor versus acceptor plot, the interstrand pattern of >N-H···O=C< hydrogen bonds in β-sheet gives rise to a symmetric trace that is either perpendicular to the main diagonal in antiparallel strands or parallel to the main diagonal in parallel strands. To identify all such traces, the plot is first reflected across the main diagonal, resulting in a triangular matrix, and all hydrogen bonds previously identified as belonging to helices or turns are eliminated. Then, every diagonal (parallel strands) and anti-diagonal (antiparallel strands) file of the triangular matrix is scanned exhaustively for the presence of characteristic seeds, which consist of 2 consecutive hydrogen bonds. Once initiated by a seed, the extension of a segment allows for slight deviations from the ideal pattern in order to accommodate "bulges" (Richardson et al., 1978) of no more than 3 residues. In a manner analogous to helices, segments terminate when start (011...) or stop (110...) sequences are encountered, providing the structure is not within a bulge.
It should be emphasized that our algorithms count all qualifying atoms in each hydrogen-bonded ring as belonging to its given element of secondary structure. For example, a classical β-turn (Venkatachalam, 1968), with an i→i−3 hydrogen bond, forms a 10-membered ring spanning 4 residues. In our accounting, an isolated turn would be counted as having 1 donor, 1 acceptor, 1 hydrogen bond and 4 residues within identified secondary structure.
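The helix and turn procedure of section (i) can be sketched as follows. This is an illustrative reading of the description, not the authors' code: the function name and the 0-based (donor residue, acceptor residue) input are assumptions, and a segment is labeled a turn when it is spanned by exactly one qualifying hydrogen bond, which is one simple way of expressing "coincident start and stop sequences".

```python
def helix_turn_segments(n_residues, hbonds):
    """Return (first, last, kind) for each helix or isolated turn.

    hbonds: iterable of (donor_residue, acceptor_residue), 0-based,
            backbone:backbone hydrogen bonds only (assumed input format).
    """
    # Keep only i -> i-3 and i -> i-4 bonds (the two minor diagonals).
    local = [(d, a) for d, a in hbonds if d - a in (3, 4)]
    # Union of the two binary arrays: every residue spanned by a bond is set.
    covered = [False] * n_residues
    for d, a in local:
        for k in range(a, d + 1):      # 4 residues for i-3, 5 for i-4
            covered[k] = True
    # Scan the composite array for contiguous runs of True.
    segments, start = [], None
    for k in range(n_residues + 1):
        inside = k < n_residues and covered[k]
        if inside and start is None:
            start = k                  # start sequence (0 1 1 ...)
        elif not inside and start is not None:
            last = k - 1               # stop sequence (... 1 1 0)
            n_bonds = sum(1 for d, a in local if start <= a and d <= last)
            kind = "turn" if n_bonds == 1 else "helix"
            segments.append((start, last, kind))
            start = None
    return segments
```

With three consecutive i→i−4 bonds and one separate i→i−3 bond, the sketch reports one helix and one isolated turn of 4 residues, matching the accounting rule that a turn contributes 4 residues to identified secondary structure.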
3. Results
(a) Classification
Hydrogen bonds were identified in 42 X-ray-elucidated proteins from a recent version of the Brookhaven data base (Bernstein et al., 1977). All proteins are refined structures, with resolution ≤ 2.0 Å and R factors ≤ 20%. Proteins are listed in Table 2 together with a summary of the numbers and types of hydrogen bonds. The automatic classification of backbone hydrogen bonds into secondary structure types was also performed for each protein, with results shown in Table 2.
(b) Network hydrogen bonds
The 6734 residues in our data base were found to have 7299 hydrogen bonds, slightly more than one per residue. It should be noted, however, that many hydrogen bonds belong to networks in which each donor (acceptor) is bonded to multiple acceptors (donors). Thus, there are more hydrogen bonds than donor:acceptor pairs. We refer to hydrogen bonds that participate in such networks as network hydrogen bonds. Associated with any given cluster of hydrogen bonds is a unique number, ℛ, which corresponds to the maximum number of hydrogen bonds possible for that set of donors and acceptors if every donor:acceptor pair were constrained to be 1:1. In other words, ℛ(cluster) is the maximum number of non-network hydrogen bonds that could be formed for the set. We refer to ℛ(cluster) as the reduced number of hydrogen bonds for the cluster. It is well known that co-operative hydrogen bonds are stronger than non-co-operative hydrogen bonds between the same atoms (Jeffrey & Saenger, 1991). Thus, a hydrogen bond to a backbone >C=O will strengthen a second hydrogen bond to the backbone >NH of that same peptide bond (and conversely) by further polarizing the peptide unit. In proteins, the hydrogen-bonding contribution to conformational stability is even further enhanced
Figure 2. Identification of hydrogen bonds and hydrogen-bonded secondary structure for dihydrofolate reductase (4DFR). (a) Donor versus acceptor plot. The 159 residues give rise to a 159 × 159 square plot, with hydrogen bond donors on the abscissa and acceptors on the ordinate. The presence of a hydrogen bond is indicated by a circle at the corresponding co-ordinate position. All hydrogen bonds are shown. Backbone:backbone hydrogen bonds are indicated by filled circles; hydrogen bonds that involve at least 1 side-chain partner are indicated by open circles. Hydrogen bond identification is described in Materials and Methods. Most hydrogen bonds are seen to be between backbone atoms. Regular secondary structure results in characteristic patterns on the plot, as described in the text. It should be noted that a single point can represent multiple hydrogen bonds, e.g. both N and R from a given residue donate to the same acceptor residue. (b) Helices and turns are identified as any contiguous sequence of residues between and including i→i−3 or i→i−4 hydrogen bonds. Hydrogen bonds are represented by filled circles and contiguous sequences are shaded. (c) and (d) After subtraction of hydrogen bonds involved in helices and turns, the square donor versus acceptor plot is reflected across the main diagonal and collapsed to a triangular plot. Paired strands of antiparallel β-sheet describe a trace normal to the main diagonal (c); paired strands of parallel β-sheet describe a trace that parallels the main diagonal (d). These traces, shown as filled circles, are identified using canonical patterns, as described in the text. Each identified trace represents a composite that can be deconvoluted into 2 discrete segments of contiguous residues, indicated by the shaded regions emanating from that trace. (e) Results of helix and sheet identification algorithms, illustrated in (b) to (d) and described in the text, are summarized graphically. Each row corresponds to one or more categories of secondary structure; positive identification is indicated by shading. The 5 rows correspond to (1) helices and turns, indicated by diagonal stripes, (2) antiparallel β-sheet, indicated by vertical stripes, (3) parallel β-sheet, indicated by horizontal stripes, (4) either parallel or antiparallel β-sheet, indicated by cross-hatching, and (5) the logical union of all categories, indicated by a filled bar. It is apparent that most residues participate in regular hydrogen-bonded secondary structure. As noted in the text, our algorithms count all qualifying atoms in each hydrogen-bonded ring as belonging to its given element of secondary structure. For example, an isolated turn (with 1 donor, 1 acceptor and 1 hydrogen bond) is counted as having 4 residues within identified secondary structure. For this reason, our secondary structure assignments tend to be supersets of those given in the Protein Data Base (Bernstein et al., 1977).
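A donor versus acceptor plot of the kind shown in Figure 2(a) is simply a square boolean matrix. The sketch below builds one and counts the hydrogen bonds lying on a given minor diagonal. The function names and the convention of donors as rows and acceptors as columns are assumptions made for illustration; `numpy.diagonal` with a negative offset picks out the entries at (i, i−k).

```python
import numpy as np

def donor_acceptor_matrix(n_residues, hbonds):
    """Square boolean plot: entry [d, a] is True for a bond d -> a.

    hbonds: iterable of (donor_residue, acceptor_residue), 0-based
            (assumed input format).
    """
    plot = np.zeros((n_residues, n_residues), dtype=bool)
    for d, a in hbonds:
        plot[d, a] = True
    return plot

def bonds_on_minor_diagonal(plot, separation):
    """Number of bonds with donor index minus acceptor index == separation."""
    # offset=-k selects elements plot[i, i-k], the diagonal at (i, i-k).
    return int(np.diagonal(plot, offset=-separation).sum())
```

Counting along offsets 3 and 4 gives the i→i−3 and i→i−4 populations; note that a boolean plot, like the printed one, records at most one point per residue pair even when several atoms of the same two residues are bonded.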
Table 2
Hydrogen bonds in 42 X-ray-elucidated proteins

Columns 1 and 2: PDB identifier; protein, resolution (reference)
351C  Cytochrome C551 (oxidized), 1.6 Å (Almassy & Dickerson, 1978)
2ACT  Actinidin, 2.0 Å (Baker, 1980)
2APP  Penicillopepsin, 1.8 Å (James & Sielecki, 1983)
1AZA  Azurin (oxidized), 1.8 Å (Norris et al., 1986)
1BP2  Phospholipase A2, 1.7 Å (Dijkstra et al., 1981)
2CA2  Carbonic anhydrase, 1.9 Å (Eriksson et al., 1988)
2CDV  Cytochrome C3, 1.8 Å (Higuchi et al., 1984)
5CHA  Alpha chymotrypsin A, 1.67 Å (Blevins & Tulinsky, 1985)
5CPA  Carboxypeptidase A Alpha, 1.54 Å (Rees et al., 1983)
1CRN  Crambin, 1.5 Å (Hendrickson & Teeter, 1981)
2CTS  Citrate synthase, 2.0 Å (Remington et al., 1982)
5CYT  Cytochrome c (reduced), 1.5 Å (Takano & Dickerson, 1981)
4DFR  Dihydrofolate reductase, 1.7 Å (Bolin et al., 1982)
1ECD  Hemoglobin (erythrocruorin), 1.4 Å (Weber et al., 1978)
4FXN  Flavodoxin (semiquinone form), 1.8 Å (Smith et al., 1977)
1GCR  Gamma-II crystallin, 1.6 Å (Wistow et al., 1983)
1GD1  Glyceraldehyde-3-P dehydro., 1.8 Å (Skarzynski et al., 1987)
1GP1  Glutathione peroxidase, 2.0 Å (Epp et al., 1983)
3GRS  Glutathione reductase, 1.54 Å (Karplus & Schulz, 1987)
1HMQ  Hemerythrin (met), 2.0 Å (Stenkamp et al., 1985)
1HOE  Amylase inhibitor, 2.0 Å (Pflugrath et al., 1986)
1INS  Insulin, 1.5 Å (Baker et al., 1988)
2LHB  Hemoglobin V (cyano, met), 2.0 Å (Hendrickson et al., 1973)
1LZ1  Lysozyme, 1.5 Å (Blake et al., 1983)
3LZM  Lysozyme, 1.7 Å (Weaver & Matthews, 1987)
1MBO  Myoglobin (oxy), 1.6 Å (Phillips, 1980)
2OVO  Ovomucoid third domain, 1.5 Å (Papamokos et al., 1982)
9PAP  Papain, 1.65 Å (Kamphuis et al., 1984)
2PCY  Apo-plastocyanin, 1.8 Å (Guss & Freeman, 1983)
4PEP  Pepsin, 1.8 Å (Sielecki et al., 1990)
1PPT  Avian pancreatic polypeptide, 1.37 Å (Glover et al., 1983)
5PTI  Trypsin inhibitor, 1.0 Å (Wlodawer et al., 1987)
1RDG  Rubredoxin, 1.4 Å (Pierrot et al., 1976)
2RHE  Ig Bence-Jones protein, 1.6 Å (Furey et al., 1983)
3RNT  Lys 25-ribonuclease T1, 1.8 Å (Koepke et al., 1989)
7RSA  Ribonuclease A, 1.26 Å (Wlodawer et al., 1982)
1SN3  Scorpion neurotoxin, 1.8 Å (Almassy et al., 1983)
1SNC  Staphylococcus nuclease, 1.65 Å (Loll & Lattman, 1989)
3TLN  Thermolysin, 1.6 Å (Holmes & Matthews, 1982)
1TPP  Beta-trypsin, 1.4 Å (Bode & Schwager, 1975)
1UBQ  Ubiquitin, 1.8 Å (Vijay-Kumar et al., 1987)
2WRP  Trp repressor, 1.65 Å (Zhang et al., 1987)

Columns 3 to 14: residues (n), hydrogen bonds (N:O, R:O, N:R, R:R, Sum), and donors (D) and acceptors (A) by Residues, Sites and Groups
PDB      n   N:O   R:O   N:R   R:R   Sum  ResD  ResA SiteD SiteA  GrpD  GrpA
Col.   (3)   (4)   (5)   (6)   (7)   (8)   (9)  (10)  (11)  (12)  (13)  (14)
351C    82    71     9     7     6    93    55    51    67    59    57    49
2ACT   218   146    28    35    29   238   153   135   190   172   136   149
2APP   323   190    37    46    39   312   232   220   280   261   233   255
1AZA   129    72    16    17    20   125    89    83   112   101    83    85
1BP2   123   106    16    15    18   155    85    80   110   100    90    84
2CA2   256   137    45    38    46   266   173   176   235   222   187   181
2CDV   107    56    16    14    16   102    66    61    81    76    61    61
5CHA   237   126    26    25    14   191   140   135   176   159   141   136
5CPA   307   239    42    39    48   368   233   216   284   261   221   223
1CRN    46    39     8     3     2    52    28    28    35    30    32    28
2CTS   437   387    49    48    40   524   320   293   389   337   328   306
5CYT   103    75    18    10     3   106    66    66    80    72    71    62
4DFR   159   106    18     9    18   151   104   101   128   114   109    97
1ECD   136   161    14     9    15   199   108    99   132   115   106   104
4FXN   138   113    14     9    10   146    90    93   107   103    94    94
1GCR   174    91    20    24    23   158   120   112   152   135   113   113
1GD1   331   249    34    44    49   376   249   229   309   277   233   238
1GP1    35     9     1     2     0    12    11    11    12    11    11    11
3GRS   461   358    47    32    50   487   315   295   382   350   324   314
1HMQ   113   123     9     6    20   158    87    83   111   103    96    80
1HOE    74    35    10     9     8    62    46    46    56    51    46    48
1INS    51    39     4     6     2    51    37    31    41    35    30    32
2LHB   149   165    16    16     5   202   119   107   136   120   117   114
1LZ1   130   105    20    16    18   159    96    88   121   108    98    95
3LZM   164   179    17    20    26   242   128   119   163   144   137   126
1MBO   153   194    17    13    20   244   131   119   158   143   138   129
2OVO    56    28     9    12    11    60    35    33    46    43    37    37
9PAP   212   132    31    27    31   221   145   132   185   164   135   141
2PCY    99    50     9    12     9    80    63    62    75    69    59    60
4PEP   326   193    36    37    30   296   223   208   259   242   212   216
1PPT    36    33     1     0     1    35    21    20    22    20    21    20
5PTI    58    27     4     6     2    39    29    26    32    31    29    28
1RDG    52    24     6     9     2    41    32    27    38    34    28    28
2RHE   114    61    13    12    11    97    76    69    89    80    68    70
3RNT   104    65    15    12    12   104    65    62    82    76    69    61
7RSA   124    74    25    13    18   130    86    85   110   105    89    82
1SN3    65    34     5     6     3    48    35    32    40    39    30    32
1SNC   135   101    13    19    18   151   103    89   123   109    92   100
3TLN   316   265    31    39    61   396   249   225   303   281   252   249
1TPP   220   123    25    26    16   190   121   111   171   154   136   140
1UBQ    76    52     9    11     4    76    49    50    59    57    53    50
2WRP   105   137     9     7     3   156    82    75    93    82    86    79
Totals: n = 6734; N:O = 4970; R:O = 792; N:R = 760; R:R = 777; Sum = 7299; donor residues = 4695; acceptor residues = 4383; reduced hydrogen bonds = 4515.

Columns 15 to 22: secondary structure by type (n, number of contiguous segments; f, fraction of protein)
PDB    H/T n   H/T f   PBS n   PBS f   ABS n   ABS f Sheet f   All f
Col.    (15)    (16)    (17)    (18)    (19)    (20)    (21)    (22)
351C       3   0.683       0       0       0       0       0   0.683
2ACT      12   0.482       2   0.018      10   0.294   0.298   0.733
2APP      22   0.390      13   0.186      21   0.502   0.598   0.873
1AZA       6   0.434       4   0.202       8   0.372   0.496   0.876
1BP2       8   0.780       0       0       4   0.130   0.130   0.854
2CA2      15   0.418       7   0.129      13   0.359   0.422   0.797
2CDV       6   0.570       4   0.065       4   0.112   0.178   0.673
5CHA      11   0.325      12   0.017      14   0.414   0.418   0.688
5CPA      14   0.596      11   0.189       5   0.111   0.241   0.811
1CRN       3   0.630       0       0       2   0.109   0.109   0.739
2CTS      17   0.762       2   0.009       4   0.027   0.037   0.787
5CYT       7   0.718       0       0       2   0.068   0.068   0.767
4DFR       7   0.365       8   0.283       5   0.214   0.421   0.742
1ECD       3   0.978       0       0       0       0       0   0.978
4FXN       8   0.587       6   0.268       0       0   0.268   0.819
1GCR       6   0.178       0       0      12   0.552   0.552   0.718
1GD1      17   0.548      15   0.235      11   0.190   0.389   0.867
1GP1       3   0.457       0       0       2   0.171   0.171   0.543
3GRS      22   0.529      11   0.128      18   0.217   0.330   0.833
1HMQ       4   0.858       0       0       0       0       0   0.858
1HOE       3   0.203       0       0       7   0.649   0.649   0.797
1INS       3   0.667       0       0       2   0.118   0.118   0.725
2LHB       3   0.893       0       0       0       0       0   0.893
1LZ1       9   0.815       2   0.100       4   0.146   0.246   0.923
3LZM       7   0.860       0       0       4   0.098   0.098   0.927
1MBO       3   0.941       0       0       0       0       0   0.941
2OVO       3   0.411       0       0       3   0.250   0.250   0.625
9PAP      11   0.491       4   0.038       9   0.292   0.316   0.731
2PCY       7   0.384       4   0.202       7   0.485   0.586   0.838
4PEP      23   0.377      10   0.160      22   0.460   0.555   0.828
1PPT       1   0.694       0       0       0       0       0   0.694
5PTI       3   0.362       0       0       2   0.293   0.293   0.638
1RDG       5   0.731       0       0       4   0.365   0.365   0.923
2RHE       8   0.377       4   0.114       6   0.526   0.614   0.886
3RNT       5   0.452       0       0       7   0.346   0.346   0.750
7RSA       7   0.379       2   0.032       6   0.419   0.444   0.790
1SN3       5   0.477       0       0       3   0.338   0.338   0.754
1SNC       6   0.530       2   0.060       6   0.396   0.396   0.858
3TLN      14   0.595       5   0.098      12   0.218   0.282   0.839
1TPP      12   0.386       0       0      12   0.427   0.427   0.709
1UBQ       6   0.579       4   0.276       5   0.539   0.592   0.895
2WRP       3   0.905       0       0       0       0       0   0.905
All 42 proteins: helix/turn f = 0.556; PBS f = 0.087; ABS f = 0.245; sheet f = 0.308; helix and sheet f = 0.811.

Summary of numbers and types of hydrogen bonds in 42 high-resolution proteins from the Brookhaven (Bernstein et al., 1977) protein data base (PDB), analyzed as described in Materials and Methods. The Table contains 22 columns. Columns 1 to 3 identify each protein and include: (1) PDB identifier, (2) protein name and resolution of structure, and (3) number of residues in the protein. Columns 4 to 8 list the number of hydrogen bonds: (4) backbone donor to backbone acceptor (N:O), (5) side-chain donor to backbone acceptor (R:O), (6) backbone donor to side-chain acceptor (N:R), (7) side-chain donor to side-chain acceptor (R:R), and (8) total hydrogen bonds (Σ: columns 4 to 7). Columns 9 to 14 list the count of donors (D) and acceptors (A): (9) number of donor residues, (10) number of acceptor residues, (11) number of donor atoms, (12) number of acceptor atoms, (13) reduced number of donors, and (14) reduced number of acceptors. The lesser value of columns 13 and 14 is ℛ(protein). Columns 15 to 22 list secondary structure by type: (15) number of contiguous segments of helix and/or hydrogen-bonded turns, (16) fraction, f, of protein in helix and/or hydrogen-bonded turns, (17) number of contiguous segments of parallel β-sheet (PBS), (18) fraction of protein in parallel β-sheet, (19) number of contiguous segments of antiparallel β-sheet (ABS), (20) fraction of protein in antiparallel β-sheet, (21) fraction of protein in sheet, the logical-or of columns 18 and 20 (i.e. antiparallel β-sheet versus parallel β-sheet), and (22) fraction of protein in helix and sheet, the logical-or of columns 16 and 21 (i.e. helix versus sheet). Note: residues at the interface between 2 elements of secondary structure may be counted twice, once in each segment. For this reason, the logical-or of β-sheet, in column 21, and the logical-or of all secondary structure, in column 22, are typically less than the arithmetic sum of their components.
by the existence of co-operative networks. The difference between the total number of hydrogen bonds and the reduced number of hydrogen bonds is proportional to this co-operative enhancement.
ℛ(protein) is easily determined. First, the number of hydrogen bonds is tallied for the entire molecule by evaluating every polar group (from Table 1) in each residue. Then, for every donor with multiple acceptors each acceptor is tested for the presence of multiple donors. Whenever an acceptor in this set is found to have a unique donor, all other acceptors for that donor are eliminated, reducing the number of acceptor sites. Next, this paring operation is repeated for every acceptor with multiple donors. Finally, the winnowed lists of donors and acceptors are counted; the lesser number is ℛ(protein). For each molecule, ℛ(protein) is given in Table 2. In all, the 7299 hydrogen bonds in the data set of 42 proteins give rise to 4515 reduced hydrogen bonds, an average multiplicity of 1.6 hydrogen bonds for each reduced hydrogen bond.
From Table 2, most hydrogen bonds are between backbone atoms (68%), with the remainder distributed almost uniformly over other categories. The 42 proteins have slightly more than one hydrogen bond per residue (1.08) and slightly more than two-thirds of a reduced hydrogen bond per residue (0.67). Both the total number of hydrogen bonds and the reduced number of hydrogen bonds can be reliably estimated from the fraction of hydrogen-bonded secondary structure. Table 2 lists the measured fraction of helix and sheet in each protein. Using these known fractions, a three-parameter, least-squares, best fit to the equation:
number of hydrogen bonds = A f_h N + B f_s N + C(1 - f_h - f_s)N    (1)
was calculated, where f_h is the fraction of helix, f_s the fraction of sheet, and N the number of residues in the protein. The value of the equation is taken as either the total or reduced number of hydrogen bonds. The subroutine ZXSSQ from the IMSL Subroutine Library (1987) was used to determine coefficients A, B and C that minimize the residual over the set of proteins. From the fitted coefficients, both the total and reduced number of hydrogen bonds can be estimated as:
number of hydrogen bonds = 1.49 f_h N + 0.65 f_s N + 0.50(1 - f_h - f_s)N    (2)
number of reduced hydrogen bonds = 0.80 f_h N + 0.61 f_s N + 0.40(1 - f_h - f_s)N    (3)
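Equations (2) and (3) are simple enough to evaluate directly. The sketch below is illustrative only: the function names are hypothetical, the coefficients are those read from equations (2) and (3), and the example protein (150 residues, 40% helix, 30% sheet) is invented rather than taken from Table 2.

```python
def estimate_total_hbonds(f_helix, f_sheet, n_residues):
    """Equation (2): estimated total number of hydrogen bonds."""
    f_other = 1.0 - f_helix - f_sheet
    return (1.49 * f_helix + 0.65 * f_sheet + 0.50 * f_other) * n_residues

def estimate_reduced_hbonds(f_helix, f_sheet, n_residues):
    """Equation (3): estimated reduced number of hydrogen bonds."""
    f_other = 1.0 - f_helix - f_sheet
    return (0.80 * f_helix + 0.61 * f_sheet + 0.40 * f_other) * n_residues

# Invented example: 150 residues, 40% helix, 30% sheet.
total = estimate_total_hbonds(0.40, 0.30, 150)
reduced = estimate_reduced_hbonds(0.40, 0.30, 150)
```

For this invented composition the estimates are about 141 hydrogen bonds and about 93 reduced hydrogen bonds, a ratio near 1.5, in line with the average multiplicity of 1.6 reported for the data set.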
In Figure 3, the number of hydrogen bonds predicted by equation (2) or (3) is plotted against the observed number of hydrogen bonds. As seen in the Figure, predicted values closely approximate observed values; the linear regression lines through these data have slopes near unity and intercepts near zero. Comparison of the coefficients in equations (2) and (3) is instructive. Most network hydrogen bonds
are seen to be within helices and turns. The ratio of the coefficients of the helix fraction, f_h, in the two equations is almost 2:1, a consequence of the fact that segment geometry is usually consistent with both i→(i-4) and i→(i-3) hydrogen bonds. The co-operative nature of helical hydrogen bonds should enhance the stability of helices. The number of hydrogen bonds is expected to depend significantly upon the relative proportions of helix and sheet, since most network hydrogen bonds are in helices. However, the reduced number of hydrogen bonds, which is normalized to the number of donors/acceptors, should not have a similar dependency. In fact, a plot of the number of residues versus the reduced number of hydrogen bonds (from Table 2), shown in Figure 3(c), is reasonably linear, and can be described by the equation:
number of reduced hydrogen bonds = 0.714 (number of residues) - 6.8   (ρ = 0.99)   (4)
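The regression lines quoted here and in Figure 3 are ordinary least-squares lines. A minimal Python sketch of such a fit is given below, together with equation (4) written as a function. The function names are hypothetical, and treating the quoted statistic as a correlation coefficient is an assumption.

```python
import math


def linear_regression(xs, ys):
    """Return (slope, intercept, r) for the least-squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient
    return slope, intercept, r


def reduced_from_length(n_residues):
    """Equation (4): reduced hydrogen bonds estimated from chain length."""
    return 0.714 * n_residues - 6.8


# Hypothetical chain lengths, used only to exercise the two functions.
lengths = [50, 100, 200, 400]
print(linear_regression(lengths, [reduced_from_length(n) for n in lengths]))
```

Applied to real (chain length, reduced hydrogen bond) pairs, the same function would return the slope, intercept and correlation that summarize a plot such as Figure 3(c).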
(c) Patterns of hydrogen bonding in proteins
The global census of hydrogen bonds, reported in Table 2, can be used to reveal patterns of hydrogen bonding in proteins. Histograms of the distance in sequence between all donor : acceptor pairs are shown in Figure 4. Four general conclusions are supported by the figure: (1) Helices and turns (i.e. i→i-4 and i→i-3 hydrogen bonds) are the predominant type of backbone : backbone hydrogen bond. Figure 4(a) shows large peaks at these positions, against a featureless, low-level background. Of the 4970 backbone : backbone hydrogen bonds in the data base, 1838 (37%) make i→i-3 hydrogen bonds, 1582 (32%) make i→i-4 hydrogen bonds, and 1311 (26%) are in hydrogen-bonded strand-pairs in β-sheet, a total of 95% in turns, helix and sheet. A note of caution is in order, however. Since most network hydrogen bonds are within helices, the combined i→i-4 and i→i-3 peaks in Figure 4(a) represent a much smaller number of reduced hydrogen bonds. From equations (2) and (3), the expected number of reduced backbone : backbone hydrogen bonds is 3257, with 56% in helix/turn and 38% in β-sheet. (2) Most side-chain-donor to backbone-acceptor hydrogen bonds are local. Further, pronounced peaks are evident in Figure 4(b) for i→i-4 and i→i-3 interactions. Within these peaks, 90% of the i→i-4 and 82% of the i→i-3 interactions are in single-helical segments, consistent with extensive C-terminal helix capping (data not shown) (Presta & Rose, 1988; Richardson & Richardson, 1988). (3) Again, most backbone-donor to side-chain-acceptor hydrogen bonds are local. The main peaks in Figure 4(c) are at i→i-2, i→i-3 and i→i-1 and, to a lesser extent, at i→i+3. These positional preferences are even more sharply differentiated than those of side-chain-donors to backbone-acceptors. Within these four peaks, 78% of the i→i-3,
53% of the i→i-2, 43% of the i→i-1, and 93% of the i→i+3 interactions are in single-helical segments, consistent with extensive N-terminal helix capping (data not shown) (Presta & Rose, 1988; Richardson & Richardson, 1988). (4) While side-chain : side-chain hydrogen bonds are more uniformly distributed than those of other classes, a marked preference for local interactions is nonetheless evident in Figure 4(d).
The prevalence of local interactions in Figure 4(a) to (c) indicates that, in these cases, hydrogen bond formation is governed primarily by proximity in sequence. The distribution of side-chain : side-chain hydrogen bonds (Fig. 4(d)), though also biased toward local interactions, is visibly broader than the other three classes. The broader distribution of side-chain : side-chain hydrogen bonds suggests that non-local interactions may play a larger role in this case.
To investigate non-local interactions further, all hydrogen bonds between side-chains were classified by residue type, as shown in Table 3. A high incidence of hydrogen-bonded ion pairs is immediately apparent: more than two-thirds of all hydrogen bonds involving Lys, Arg and His are with acidic residues and, conversely, more than half of all hydrogen bonds involving Asp, Glu and His are with basic residues. Charged residues comprise the largest single category among both donors and acceptors. Of the total of 757 donors, 41% are either Lys or Arg, while 59% of the 769 acceptors are either Asp or Glu.
The absolute number of hydrogen bonds between side-chain groups is normalized in Table 3C. Corresponding to each donor : acceptor pair, (i,j), is a table entry, f_ij, which measures whether the pair is more common in the database than would be expected if assortment were random. The fraction f_ij is given by:
f_ij = (n_ij / N_pairs) / (n_i × n_j / N_total^2)   (5)
where n_ij is the number of hydrogen-bonded pairs between side-chain groups of residues i and j, n_i and n_j are the total number of residues i and j in the database (from Table 3B), N_pairs is the total number
Figure 3. Predicting the number of hydrogen bonds. Eqn (1) and the known fractions of hydrogen-bonded secondary structure from Table 2 were used to derive coefficients for predicting the number of hydrogen bonds. (a) The observed number of hydrogen bonds versus predicted number for 42 proteins, using eqn (2). The equation of the linear regression line through these data is: N_predicted = 0.96 × N_total hydrogen bonds + 8.6 (ρ = 0.99). (b) The observed number of reduced hydrogen bonds versus predicted number for 42 proteins, using eqn (3). The regression line is: N_predicted = 0.95 × N_reduced hydrogen bonds + 7.4 (ρ = 0.99). (c) The number of residues versus the reduced number of hydrogen bonds, from Table 2. The linear regression line through these data is given by eqn (4).
Figure 4. Sequential distance between donors and acceptors. Histograms show the number of hydrogen bonds in the database as a function of the sequential distance between donor and acceptor sites. An expanded scale, shown in insets, discloses the smaller details at larger distances. By convention, sequence distance is taken to be the (residue number of the donor) - (residue number of the acceptor); for classical α-helical >N-H···O=C< hydrogen bonds between i and i-4, this distance is +4. Peaks in the histograms reveal characteristic interactions in proteins, as discussed in the text. The plots are dominated by these large peaks. Plots are truncated at ±10, by which point all histograms have decayed to an undifferentiated, low-level background. Hydrogen bonds are separated into classes, each of which is displayed separately. (a) Backbone : backbone hydrogen bonds. (b) Side-chain donors to backbone acceptors. (c) Backbone donors to side-chain acceptors. (d) Side-chain : side-chain hydrogen bonds. (e) Same as (d), with charged residues (Lys, Arg, Asp and Glu) omitted.
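The sign convention in the legend, (residue number of the donor) minus (residue number of the acceptor), can be made concrete with a short tally. The Python sketch below is illustrative only: the function name, the input format and the default cutoff of 10 residues are assumptions.

```python
from collections import Counter


def sequence_distance_histogram(bonds, cutoff=10):
    """Count hydrogen bonds by (donor residue number) - (acceptor residue number).

    bonds: iterable of (donor_residue, acceptor_residue) integer pairs.
    Distances beyond +/- cutoff are dropped, as in a truncated plot.
    """
    counts = Counter(donor - acceptor for donor, acceptor in bonds)
    return {d: counts[d] for d in range(-cutoff, cutoff + 1) if counts[d]}


# Hypothetical bonds: two i -> i-4, one i -> i-3, one long-range, one i -> i+2.
example = [(5, 1), (6, 2), (6, 3), (30, 2), (2, 4)]
print(sequence_distance_histogram(example))
```

With this convention a backbone N-H at residue 5 donating to the C=O of residue 1 is counted at +4, and the long-range bond between residues 30 and 2 falls outside the plotted window.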
Table 3

A. Side-chain : side-chain hydrogen bonds

                                 Acceptors
Donors   Met   Ser   Thr   Cys   Gln   Asn   Tyr   Asp   Glu   His   Sum
Ser        1    11    14     1     9     7     7    33    22     6   111
Thr        1     7    10     2     3     6     3    26     8     5    71
Gln        0     6     4     0     2     1     6     9     6     1    35
Asn        1    10     2     1     7     8     5    25     9     2    70
Tyr        2     2     2     1     4     4     3    24    19     3    64
Trp        2     1     2     0     2     1     2     8     1     1    20
His        3     6     4     2     1     4     2    21    22     7    72
Lys        0     7     9     0     5     6    14    54    52     4   151
Arg        1    13     6     0     6     6    11    72    41     6   162
Sum       11    63    53     7    39    43    53   272   180    35

The number of side-chain : side-chain hydrogen bonds, for each type of residue pair. Donors are arrayed in rows, acceptors in columns. Partial sums for each residue type are shown. Among the 6734 residues in the data base of 42 proteins, there are 757 donors and 769 acceptors involved in a total of 756 hydrogen bonds between side-chains.

B. Residues by type

Ala   539        Leu   496
Arg   226        Lys   414
Asn   330        Met   121
Asp   400        Phe   236
Cys   171        Pro   276
Gln   247        Ser   530
Glu   339        Thr   416
Gly   600        Trp   106
His   156        Tyr   305
Ile   351        Val   475

A listing of residues by type. There are 6734 residues in the database of 42 proteins.

C. Normalized side-chain : side-chain hydrogen bonds

                                 Acceptors
Donors   Met    Ser    Thr    Cys    Gln    Asn    Tyr    Asp    Glu    His
Ser      0.16   0.43   0.70   0.12   0.74   0.44   0.48   1.70   1.33   0.80
Thr      0.22   0.34   0.63   0.30   0.32   0.48   0.26   1.70   0.62   0.84
Gln      0.00   0.50   0.42   0.00   0.36   0.14   0.86   1.00   0.78   0.28
Asn      0.28   0.62   0.16   0.20   0.94   0.80   0.54   2.06   0.88   0.42
Tyr      0.60   0.14   0.18   0.20   0.58   0.44   0.35   2.14   2.00   0.68
Trp      1.70   0.20   0.50   0.00   0.84   0.32   0.68   2.06   0.30   0.66
His      1.74   0.80   0.67   0.82   0.28   0.84   0.46   3.66   4.52   3.13
Lys      0.00   0.34   0.56   0.00   0.54   0.48   1.20   3.56   4.04   0.68
Arg      0.40   1.18   0.70   0.00   1.18   0.88   1.74   8.68   5.82   1.86

The database of 6734 residues from 42 proteins contains 2750 residues with side-chain donors and 3015 residues with side-chain acceptors. From these, there are 756 hydrogen-bonded pairs between side-chains. Plainly, most polar side-chain groups do not form hydrogen bonds with other polar side-chain groups. The question addressed by the Table is whether a given hydrogen-bonded pair, (i,j), is more common in the database than it would be if assortment were random. A value of unity represents random assortment; values of less than (greater than) unity indicate that the given pair occurs less frequently (more frequently) than would be expected by chance.
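The ratio of observed to expected pair frequency in equation (5) can be written as a small function. The Python sketch below follows the equation as stated in the text, with the square of the total residue count as the normalizer. The function name and the numbers in the example are hypothetical, and no claim is made that this choice of totals reproduces the tabulated entries. Other totals, such as the counts of residues with side-chain donors and acceptors quoted in the legend, can be substituted by editing the expected-frequency line.

```python
def assortment_ratio(n_ij, n_pairs, n_i, n_j, n_total):
    """Observed / expected frequency of donor i : acceptor j pairs, per equation (5).

    n_ij    : hydrogen-bonded pairs with residue type i as donor, j as acceptor
    n_pairs : all side-chain : side-chain hydrogen-bonded pairs
    n_i, n_j: occurrences of residue types i and j in the database
    n_total : total number of residues in the database
    """
    observed = n_ij / n_pairs
    expected = (n_i * n_j) / n_total ** 2
    return observed / expected


# Hypothetical counts, chosen so the arithmetic is easy to follow:
# observed 5/100 = 0.05, expected (50*20)/200**2 = 0.025, ratio = 2.
print(assortment_ratio(5, 100, 50, 20, 200))
```

A ratio above unity, as in this made-up case, marks a pair that occurs more often than random assortment would predict; a ratio below unity marks one that occurs less often.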
of hydrogen-bonded pairs between all side-chain groups, and N_total is the total number of residues in the database. A value of unity represents random assortment; that is, the frequency of occurrence of pair (i,j) among all pairs matches the frequency of occurrence of residues i and j among all residues. Values less than (greater than) unity indicate that the given pair is observed less frequently (more frequently) than expected by chance. It should be noted that the ordered pair (i,j) differs from the pair (j,i). In particular, (i,i) denotes the pair formed between residue i as donor and residue i as acceptor, and this case differs only trivially from that of a general (i,j) pair with i≠j. Upon normalization, it becomes apparent that most hydrogen bonds between side-chain groups
Figure 5. Assortment of hydrogen bonds with secondary structure. Histograms show the normalized fraction of hydrogen bonds, by type, that are wholly or partially contained within units of regular secondary structure. Filled patterns represent that fraction within a single element of regular secondary structure, i.e. both donor and acceptor contributed by residues from a single α-helix or β-sheet. Checkerboard patterns represent that fraction between elements of regular secondary structure, i.e. either donor or acceptor contributed by residues from one α-helix or β-sheet with remaining partner contributed by a second helix or sheet. Open patterns represent that fraction half-in an element of regular secondary structure, i.e. either donor or acceptor contributed by residues from an α-helix or β-sheet with remaining partner from a residue outside regular secondary structure. Cross-hatched patterns represent that fraction without regular secondary structure, i.e. neither donor nor acceptor contributed by residues belonging to an element of regular secondary structure. As shown, fractions were normalized within each of four categories: backbone to backbone (N:O), side-chain donor to backbone acceptor (R:O), backbone donor to side-chain acceptor (N:R), and side-chain donor to side-chain acceptor (R:R). Almost all backbone to backbone hydrogen bonds are seen to be within single elements of secondary structure.
occur less often than expected by chance. Still, hydrogen-bonded ion pairs are a conspicuous exception, as seen in Table 3C. When charged residues are subtracted from the ensemble of hydrogen bonds between side-chains, the remaining distribution, shown in Figure 4(e), resembles those seen in Figure 4(a) to (c), with local peaks at or near i→i±2 against a uniform low-level background. Summarizing Figure 4, with the exception of hydrogen-bonded ion pairs, all peaks are local.

(d) Hydrogen-bonded secondary structure
Hydrogen-bonded secondary structure assignments were made using algorithms described in Materials and Methods. It should be emphasized that segments of secondary structure identified by these algorithms can include non-hydrogen-bonded residues. For example, an isolated i→i-3 hydrogen
Figure 6. Hydrogen bond geometry: bond lengths. Hydrogen bond lengths were not allowed to exceed the summed van der Waals' radii of their respective electronegative heavy atoms (see Materials and Methods); no other distance-dependent criteria were imposed. For the 42 proteins in this study, the histogram of observed lengths is narrowly distributed around 2.9 to 3.0 Å, the optimal distance expected for >C=O···H-N< hydrogen bonds when N, H and O are colinear.
bond would be classified as a four-residue turn, consisting of residues (i, i-1, i-2 and i-3). A consequence of this classification strategy is that segments tend to be slightly longer than their counterparts identified using multiple criteria, especially in the case of helices. In comparison with the stringent criteria of Presta & Rose (1988), the present classification encompasses all actual helices together with the peptide chain turns at either terminus. Despite these methodological constraints, secondary structure assignments based solely upon hydrogen bonding are objective and automatic, and they are appropriate for studying trends in hydrogen bonding per se. This latter assertion is borne out by the distribution of segment lengths (not shown) among identified segments of helix and sheet, which resemble those derived using more sophisticated, multi-parameter methods (Kabsch & Sander, 1983).
All proteins in this study have extensive secondary structure (Table 2). The total fraction of helix, hydrogen-bonded turns and sheet ranges from 54% in glutathione peroxidase to 98% in erythrocruorin hemoglobin, with a mean of 82% for the 42 proteins. These secondary structure assignments can be used to analyze the extent to which a protein's hydrogen bonds are apportioned between regular and non-regular secondary structure. As shown in Figure 5, the ensemble of hydrogen bonds was partitioned into four categories: (1) wholly within an element of regular secondary structure; (2) between elements of regular secondary structure; (3) half-in an element of regular secondary structure; and (4) excluded entirely from elements of regular secondary structure.
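The four-way partition just described can be sketched in a few lines of Python. The function names, the category labels and the element identifiers such as "H1" are hypothetical. The sketch also assumes that each residue has already been assigned to at most one helix or sheet, with paired strands of one sheet sharing an identifier.

```python
from collections import Counter


def classify_bond(donor_element, acceptor_element):
    """Place one hydrogen bond into one of the four categories of Figure 5.

    Each argument is the identifier of the helix or sheet that holds the
    residue, or None when the residue lies outside regular secondary structure.
    """
    if donor_element is None and acceptor_element is None:
        return "excluded"   # category (4)
    if donor_element is None or acceptor_element is None:
        return "half-in"    # category (3)
    if donor_element == acceptor_element:
        return "within"     # category (1)
    return "between"        # category (2)


def partition(bonds):
    """Return the fraction of bonds in each category."""
    bonds = list(bonds)
    counts = Counter(classify_bond(d, a) for d, a in bonds)
    return {category: count / len(bonds) for category, count in counts.items()}


# Hypothetical bonds: two inside helix H1, one helix-to-sheet, one half-in.
print(partition([("H1", "H1"), ("H1", "H1"), ("H1", "S1"), (None, "S1")]))
```

Running `partition` separately on each hydrogen bond class (backbone to backbone, side-chain donor to backbone acceptor, and so on) would give fractions normalized within each class, which is how the histograms of Figure 5 are described.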
Figure 7. Hydrogen bond geometry: bond angles. The preferred geometry found in both small-molecule and protein crystal structures was used to establish screening criteria, as described in Materials and Methods. Histograms show the distribution of angles in the 42-protein data base. The scalar angle at the donor atom, A-D-DD, was divided into (a) sp2 and (b) sp3 donors. Note: extensive studies of carbohydrates (Ceccarelli et al., 1981) provide a model for -OH sp3 donors, but comparable models for the ε-NH3+ donor in Lys do not exist. This fact, together with the comparative disorder of Lys side-chains, has prompted us to relax slightly the limits imposed on sp3 donors from Lys residues, which are excluded from the distribution in (b). Similarly, the scalar angle at the acceptor atom, D-A-AA, was divided into (c) sp2 and (d) sp3 acceptors. The apparent bimodal distribution of sp2 acceptors, seen in (c), is due to the differing geometries of i→i-3 and i→i-4 hydrogen bonds, which are shown individually in (e) and (f), respectively.
As seen in Figure 5, more than 95% (namely, 4731 of the 4970) of the backbone : backbone hydrogen bonds are within single elements of secondary structure, i.e. an individual helix or paired strands of β-sheet. Hydrogen bonds involving side-chains are distributed almost uniformly across the first three categories, with few excluded entirely from regular secondary structure.

(e) Hydrogen bonding geometry
The variability observed in hydrogen bonding geometry is analyzed in this section. Geometry is necessarily constrained to lie within the limits imposed by the methods used to identify hydrogen bonds. Within these constraints, however, hydrogen bond lengths and angles may vary from a uniform distribution within their permitted ranges to a narrow clustering around their ideal values. Of course, better geometry will result in stronger hydrogen bonds.
The observed distribution of hydrogen bond lengths is shown in Figure 6. According to Baker & Hubbard (1984), the mean O···N distance for >C=O···H-N< hydrogen bonds is 2.99 Å (1 Å = 0.1 nm) in α-helices and 2.91 Å in β-sheet. Our lumped distances for all classes lie near this range, with a mean of 3.04 (±0.24) Å.
The distribution of donor angles (i.e. scalar angle between acceptor-donor-donor antecedent) is shown in Figure 7, subdivided into sp2 donors (Fig. 7(a)) and sp3 donors (Fig. 7(b)). In both cases, the distributions are narrow and have peaks that coincide with ideal trigonal (120°) or tetrahedral (109.5°) values, respectively. The distribution of acceptor angles (i.e. scalar angle between donor-acceptor-acceptor antecedent) is also shown in Figure 7, subdivided into sp2 acceptors (Fig. 7(c)) and sp3 acceptors (Fig. 7(d)). Both histograms are broadly distributed around ideal values. The apparent bimodal distribution of sp2 acceptors (Fig. 7(c)) is a consequence of the distinctly different i→i-3 and i→i-4 hydrogen bond geometries; both types are abundant in proteins. This bimodal distribution is broken down into individual components. In Figure 7(e) are shown acceptor angles involved in an N:O hydrogen bond between i and i-3, and in Figure 7(f) are shown corresponding angles between i and i-4. From the Figure, it is apparent that the geometry of i→i-3 hydrogen bonds is less favorable than that of i→i-4 hydrogen bonds.
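The three quantities discussed in this section (the donor-acceptor distance, the angle at the donor between acceptor, donor and donor antecedent, and the angle at the acceptor between donor, acceptor and acceptor antecedent) follow directly from atomic coordinates. The Python sketch below shows one way to compute them. The function names and the coordinates in the example are hypothetical, and the sketch applies none of the screening limits given in Materials and Methods.

```python
import math


def angle_deg(a, b, c):
    """Angle a-b-c in degrees, with atom b at the vertex."""
    v1 = [x - y for x, y in zip(a, b)]
    v2 = [x - y for x, y in zip(c, b)]
    cos_t = sum(x * y for x, y in zip(v1, v2)) / (math.hypot(*v1) * math.hypot(*v2))
    # Clamp against rounding error before taking the arc cosine.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def hydrogen_bond_geometry(donor, donor_antecedent, acceptor, acceptor_antecedent):
    """Return (length, donor angle A-D-DD, acceptor angle D-A-AA).

    All arguments are (x, y, z) coordinates of heavy atoms, in angstroms.
    """
    length = math.dist(donor, acceptor)
    donor_angle = angle_deg(acceptor, donor, donor_antecedent)
    acceptor_angle = angle_deg(donor, acceptor, acceptor_antecedent)
    return length, donor_angle, acceptor_angle


# Hypothetical coordinates, chosen so the answer is easy to check by eye.
print(hydrogen_bond_geometry((0.0, 0.0, 0.0), (0.0, 1.0, 0.0),
                             (3.0, 0.0, 0.0), (3.0, 0.0, 1.2)))
```

For these made-up coordinates the length is 3.0 Å and both angles are 90°; with real coordinates the same call would supply the values that populate histograms such as those in Figures 6 and 7.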
Discussion

(a) Hydrogen bonds and secondary structure
Ideas about the role of hydrogen bonding in proteins have changed over time. It was Pauling’s view that intramolecular hydrogen bonding provides the driving force for folding (Pauling et al., 1951), but, within the decade, this notion was reappraised, and the hydrogen bond came to be perceived as energetically neutral or even unfavorable (Klotz & Franzen,
1962; Susi et al., 1964). The issue continues to evolve, as underscored by recent work of Privalov & Gill (1988) who, once again, attribute a significant fraction of the folding energy to hydrogen bonding. If they are correct, then the interpretation of hydrogen bonding will have come full circle.
One approach to understanding hydrogen bonding in proteins is to survey molecules of known structure for diagnostic patterns. In the present work, all hydrogen bonds in 42 high-resolution, X-ray-elucidated proteins from the Brookhaven data base (Bernstein et al., 1977) were counted and classified systematically. Upon analysis, several patterns of interest emerge. Most hydrogen bonds are seen to be local (Fig. 4) (i.e. between donors and acceptors that are proximate in sequence) and predominantly within single elements of secondary structure (Fig. 5). On average, a large majority of the protein (82%) is involved in regular, hydrogen-bonded secondary structure, with slightly more than two-thirds (68%) of all hydrogen bonds situated between backbone polar groups (>N-H···O=C<).
Of particular interest are network hydrogen bonds, in which each donor (acceptor) is bonded to multiple acceptors (donors). Almost all central residues of helices are within networks because helix geometry is commensurate with both i→i-4 and i→i-3 hydrogen bonds simultaneously (see Fig. 7(e) and (f)). Such networks are evident in Figure 2(a) as two adjacent traces that parallel the main diagonal at positions of helices. Most network hydrogen bonds are localized within helices, as demonstrated by comparison of equations (2) and (3). The 42 proteins have 7299 hydrogen bonds, slightly more than one per residue (1.08). It is likely that the existence of multiply determined hydrogen bonds enhances the elasticity of an α-helix.
Returning to the topic of hydrogen bond energetics, an open question asks whether intramolecular hydrogen bonds can be favorable in 55 M water.
In proteins, this question is of crucial importance when evaluating helix capping, namely that process in which hydrogen bond partners for the first four >N-H groups and the last four >C=O groups of helices are supplied by polar side-chains of residues that flank the helix termini (Presta & Rose, 1988; Richardson & Richardson, 1988). If helix capping is ubiquitous, then corresponding side-chain to main-chain hydrogen bonds should be both favorable and abundant in proteins. An indirect answer to this question is furnished by the analysis illustrated in Figure 4(b) and (c). Within the entire spectrum of side-chain : main-chain hydrogen bonds, the only pronounced peaks are situated at helix capping loci, against an otherwise undifferentiated, low-level background. The Figure provides persuasive, albeit indirect, evidence for the stability of this category of intramolecular hydrogen bonding. Indeed, we suspect that a directed analysis of existing X-ray elucidated crystal structures may disclose the existence of further, hitherto overlooked, interactions of this type.
Although most hydrogen bonds are local, hydrogen-bonded ion pairs represent a notable exception, as illustrated in Figure 4(d) and (e). This abundant class of interactions comprises more than half of all side-chain : side-chain hydrogen bonds (Table 3). The frequent occurrence of hydrogen-bonded ion pairs suggests a correlative significance in protein folding and stability (Barlow & Thornton, 1983; Rashin & Honig, 1984). Most proteins are stable over a broad range of pH that extends, in the extreme, to the pK values of charged residues (Privalov, 1979). This observation has been construed to mean that charges per se are not the relevant factor. However, the predominance of charged pairs, conspicuous in Table 3C, is provocative.
Several useful rules-of-thumb emerge from our study. The reduced number of hydrogen bonds scales linearly with chain length (eqn (4)), with an average of 0.71 hydrogen bond per residue. Hydrogen bonds (both total and reduced) can be well estimated from secondary structure content (eqns (2) and (3)). In the case of network hydrogen bonds, most are found within helices.

(b) Implications for protein folding
In native proteins, approximately half of all polar groups are buried within the molecular interior, sequestered from bulk solvent (Chothia, 1975, 1976; Richards, 1977). In the unfolded state, such groups could form hydrogen bonds with solvent readily. Hence, if these same groups were not satisfied in the folded state, then hydrogen bonding would favor denaturation. Not surprisingly then, it has been found that buried polar groups in proteins are almost invariably hydrogen bonded (Chothia, 1975; Richards, 1977). The observation
that most buried polar groups are hydrogen bonded confirms a plausible expectation. Less obvious is the mechanism by which such hydrogen bonding is achieved. Specifically, how do polar groups realize hydrogen bonds so effectively while constrained within a molecular interior that is as tightly packed as an organic crystal (Richards, 1977)? The experimental finding that most hydrogen bonds are local (i.e. between partners close in sequence) can rationalize this conundrum. As shown in the preceding analysis, the majority of hydrogen bonds are between backbone groups and within elements of secondary structure. Extensive side-chain to main-chain hydrogen bonding is observed too; these interactions are also local, primarily within single helices. The prevalence of local interactions, including both main-chain and side-chain groups, is consistent with a process wherein most hydrogen bonds are formed between nearby partners at an early step in folding. Accretion of folded structure would then involve persistence of this hydrogen-bonded framework (Kim & Baldwin, 1982), together with concomitant or subsequent condensation of these nascent elements of secondary structure into larger modules (Zehfus & Rose, 1986). Iteration of this step-wise condensation process would lead to the observed hierarchic architecture of protein molecules (Crippen, 1978; Rose, 1979).

We thank Eaton Lattman and Bruno Zimm for their critical reading of this manuscript, and David Custer for technical assistance. This study was supported by National Institutes of Health grant GM 29458.
References
Almassy, R. J. & Dickerson, R. E. (1978). Pseudomonas cytochrome c551 at 2.0 Å resolution: enlargement of the cytochrome c family. Proc. Nat. Acad. Sci., U.S.A. 75, 2674-2678.
Almassy, R. J., Fontecilla-Camps, J. C., Suddath, F. L. & Bugg, C. E. (1983). Structure of variant-3 scorpion neurotoxin from Centruroides sculpturatus Ewing, refined at 1.8 Å resolution. J. Mol. Biol. 170, 497-527.
Artymiuk, P. J. & Blake, C. C. F. (1981). Refinement of human lysozyme at 1.5 Å resolution. Analysis of non-bonded and hydrogen-bond interactions. J. Mol. Biol. 152, 737-762.
Baker, E. N. (1980). Structure of actinidin, after refinement at 1.7 Å resolution. J. Mol. Biol. 141, 441-484.
Baker, E. N. & Hubbard, R. E. (1984). Hydrogen bonding in globular proteins. Progr. Biophys. Mol. Biol. 44, 97-179.
Baker, E. N., Blundell, T. L., Cutfield, J. F., Cutfield, S. M., Dodson, E. J., Dodson, G. G., Hodgkin, D. M. C., Hubbard, R. E., Isaacs, N. W., Reynolds, C. D., Sakabe, K., Sakabe, N. & Vijayan, M. (1988). The structure of 2Zn pig insulin crystals at 1.5 Å resolution. Phil. Trans. Roy. Soc. Lond. B, 319, 369-456.
Barlow, D. J. & Thornton, J. M. (1983). Ion-pairs in proteins. J. Mol. Biol. 168, 867-885.
Barlow, D. J. & Thornton, J. M. (1988). Helix geometry in proteins. J. Mol. Biol. 201, 601-619.
Bernstein, F. C., Koetzle, T. F., Williams, G. J. B., Meyer, E. F., Jr, Brice, M. D., Rodgers, J. R., Kennard, O., Shimanouchi, T. & Tasumi, M. (1977). The protein data bank: a computer-based archival file for macromolecular structures. J. Mol. Biol. 112, 535-542.
Blake, C. C., Pulford, W. C. & Artymiuk, P. J. (1983). X-ray studies of water in crystals of lysozyme. J. Mol. Biol. 167, 693-723.
Blevins, R. A. & Tulinsky, A. (1985). The refinement and the structure of the dimer of alpha-chymotrypsin at 1.67 Å resolution. J. Biol. Chem. 260, 4264-4275.
Bode, W. & Schwager, P. (1975). The refined crystal structure of bovine beta-trypsin at 1.8 Å resolution. II. Crystallographic refinement, calcium binding site, benzamidine binding site and active site at pH 7.0. J. Mol. Biol. 98, 693-717.
Bolin, J. T., Filman, D. J., Matthews, D. A., Hamlin, R. C. & Kraut, J. (1982). Crystal structures of Escherichia coli and Lactobacillus casei dihydrofolate reductase refined at 1.7 Å resolution. J. Biol. Chem. 257, 13,650-13,662.
Ceccarelli, C., Jeffrey, G. A. & Taylor, R. (1981). A survey of O-H···O hydrogen-bond geometries determined by neutron diffraction. J. Mol. Struct. 70, 255-271.
Chothia, C. (1975). Structural invariants in protein folding. Nature (London), 254, 304-308.
Chothia, C. (1976). The nature of the accessible and buried surfaces in proteins. J. Mol. Biol. 105, 1-14.
Corey, R. B. (1948). X-ray studies of amino acids and peptides. Advan. Protein Chem. 4, 385-406.
Crippen, G. M. (1978). The tree structural organization of proteins. J. Mol. Biol. 126, 315-332.
Dijkstra, B. W., Kalk, K. H., Hol, W. G. J. & Drenth, J. (1981). Structure of bovine pancreatic phospholipase A2 at 1.7 Å resolution. J. Mol. Biol. 147, 97-123.
Dill, K. A. (1990). Dominant forces in protein folding. Biochemistry, 29, 7133-7155.
Epp, O., Ladenstein, R. & Wendel, A. (1983). The refined structure of the selenoenzyme glutathione peroxidase at 0.2-nm resolution. Eur. J. Biochem. 133, 51-69.
Eriksson, A. E., Jones, T. A. & Liljas, A. (1988). Refined structure of human carbonic anhydrase II at 2.0 Å resolution. Proteins: Struct. Funct. Genet. 4, 274-282.
Factor, A. D. & Mehler, E. L. (1991). Graphical representation of hydrogen bonding patterns in proteins. Protein Eng. 4, 421-425.
Fetrow, J. S., Zehfus, M. H. & Rose, G. D. (1988). Protein folding: new twists. Bio/Technol. 6, 167-171.
Furey, W., Jr, Wang, B. C., Yoo, C. S. & Sax, M. (1983). Structure of a novel Bence-Jones protein (Rhe) fragment at 1.6 Å resolution. J. Mol. Biol. 167, 661-692.
Glover, I., Haneef, I., Pitts, J., Wood, S., Moss, D., Tickle, I. & Blundell, T. (1983). Conformational flexibility in a small globular hormone: X-ray analysis of avian pancreatic polypeptide at 0.98 Å resolution. Biopolymers, 22, 293-304.
Guss, J. M. & Freeman, H. C. (1983). Structure of oxidized poplar plastocyanin at 1.6 Å resolution. J. Mol. Biol. 169, 521-563.
Hendrickson, W. A. & Teeter, M. M. (1981). Structure of the hydrophobic protein crambin determined directly from the anomalous scattering of sulphur. Nature (London), 290, 107-113.
Hendrickson, W. A., Love, W. E. & Karle, J. (1973). Crystal structure analysis of sea lamprey hemoglobin at 2 angstrom resolution. J. Mol. Biol. 74, 331-361.
Higuchi, Y., Kusunoki, M., Matsuura, Y., Noritake, Y. & Kakudo, M. (1984). Refined structure of cytochrome c3 at 1.8 Å resolution. J. Mol. Biol. 172, 109-139.
Holmes, M. A. & Matthews, B. W. (1982). Structure of thermolysin refined at 1.6 Å resolution. J. Mol. Biol. 160, 623-639.
IMSL (1987). International Mathematical Subroutine Library, 7500 Bellaire Blvd., Houston, TX 77036.
Ippolito, J. A., Alexander, R. S. & Christianson, D. W. (1990). Hydrogen bond stereochemistry in protein structure and function. J. Mol. Biol. 215, 457-471.
James, M. N. G. & Sielecki, A. R. (1983). Structure and refinement of penicillopepsin at 1.8 Å resolution. J. Mol. Biol. 163, 299-361.
Jeffrey, G. A. & Maluszynska, H. (1982). A survey of hydrogen-bond geometries in the crystal structures of amino acids. Int. J. Biol. Macromol. 4, 173-185.
Jeffrey, G. A. & Saenger, W. (1991). Hydrogen Bonding in Biological Structures, Springer-Verlag, New York.
Kabsch, W. & Sander, C. (1983). Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers, 22, 2577-2637.
Kamphuis, I. G., Kalk, K. H., Swarte, M. B. & Drenth, J. (1984). Structure of papain refined at 1.65 Å resolution. J. Mol. Biol. 179, 233-256.
Karplus, P. A. & Schulz, G. E. (1987). Refined structure of glutathione reductase at 1.54 Å resolution. J. Mol. Biol. 195, 701-729.
Kauzmann, W. (1959). Some factors in the interpretation of protein denaturation. Advan. Protein Chem. 14, 1-64.
Kim, P. S. & Baldwin, R. L. (1982). Specific intermediates in the folding of small proteins and the mechanism of protein folding. Annu. Rev. Biochem. 51, 459-489.
Klotz, I. M. & Franzen, J. S. (1962). Hydrogen bonds between model peptide groups in solution. J. Amer. Chem. Soc. 84, 3461-3466.
Koepke, J., Maslowska, M., Heinemann, U. & Saenger, W. (1989). Three-dimensional structure of ribonuclease T1 complexed with guanylyl-2',5'-guanosine at 1.8 Å resolution. J. Mol. Biol. 206, 475-488.
Liljas, A. & Rossmann, M. G. (1974). X-ray studies of protein interactions. Annu. Rev. Biochem. 43, 475-507.
Loll, P. J. & Lattman, E. E. (1989). The crystal structure of the ternary complex of staphylococcal nuclease, Ca2+, and the inhibitor pdTp, refined at 1.65 Å. Proteins: Struct. Funct. Genet. 5, 183-201.
Murray-Rust, P. & Glusker, J. P. (1984). Directional hydrogen-bonding to sp2 and sp3 hybridized oxygen atoms and its relevance to ligand-macromolecule interactions. J. Amer. Chem. Soc. 106, 1018-1025.
Norris, G. E., Anderson, B. F. & Baker, E. N. (1986). Blue copper proteins. The copper site in azurin from Alcaligenes denitrificans. J. Amer. Chem. Soc. 108, 2784-2785.
Papamokos, E., Weber, E., Bode, W., Huber, R., Empie, M. W., Kato, I. & Laskowski, M. (1982). Crystallographic refinement of Japanese quail ovomucoid, a Kazal-type inhibitor, and model building studies of complexes with serine proteases. J. Mol. Biol. 158, 515-537.
Pauling, L. & Corey, R. B. (1951). The pleated sheet, a new layer configuration of polypeptide chains. Proc. Nat. Acad. Sci., U.S.A. 37, 251-256.
Pauling, L., Corey, R. B. & Branson, H. R. (1951). The structure of proteins: two hydrogen-bonded helical configurations of the polypeptide chain. Proc. Nat. Acad. Sci., U.S.A. 37, 205-211.
Perutz, M. F. (1951). New X-ray evidence on the configuration of polypeptide chains. Nature (London), 167, 1053-1054.
Pflugrath, J. W., Wiegand, G., Huber, R. & Vertesy, L. (1986). Crystal structure determination, refinement and the molecular model of the alpha-amylase inhibitor Hoe-467A. J. Mol. Biol. 189, 383-386.
Phillips, S. E. (1980). Structure and refinement of oxymyoglobin at 1.6 Å resolution. J. Mol. Biol. 142, 531-54.
Pierrot, M., Haser, R., Frey, M., Bruschi, M., le Gall, J., Sieker, L. C. & Jensen, L. H. (1976). Some comparisons between two crystallized anaerobic bacterial rubredoxins from Desulfovibrio gigas and D. vulgaris. J. Mol. Biol. 107, 179-182.
Presta, L. G. & Rose, G. D. (1988). Helix signals in proteins. Science, 240, 1632-1641.
Privalov, P. L. (1979). Stability of proteins. Advan. Protein Chem. 33, 167-241.
Privalov, P. L. & Gill, S. J. (1988). Stability of protein structure and hydrophobic interaction. Advan. Protein Chem. 39, 191-234.
Rashin, A. A. & Honig, B. (1984). On the environment of ionizable groups in globular proteins. J. Mol. Biol. 173, 515-521.
Rees, D. C., Lewis, M. & Lipscomb, W. N. (1983). Refined crystal structure of carboxypeptidase A at 1.54 Å resolution. J. Mol. Biol. 168, 367-387.
Remington, S., Wiegand, G. & Huber, R. (1982). Crystallographic refinement and atomic models of two different forms of citrate synthase at 2.7 and 1.7 Å resolution. J. Mol. Biol. 158, 111-152.
Richards, F. M. (1977). Areas, volumes, packing and protein structure. Annu. Rev. Biophys. Bioeng. 6, 151-176.
Richards, F. M. & Kundrot, C. E. (1988). Identification of structural motifs from protein co-ordinate data: secondary structure and first-level supersecondary structure. Proteins: Struct. Funct. Genet. 3, 71-84.
Richardson, J. S. & Richardson, D. C. (1988). Amino acid preferences for specific locations at the ends of α helices. Science, 240, 1648-1652.
Richardson, J. S., Getzoff, E. D. & Richardson, D. C. (1978). The β bulge: a common small unit of nonrepetitive protein structure. Proc. Nat. Acad. Sci., U.S.A. 75, 2574-2578.
Rose, G. D. (1979). Hierarchic organization of domains in globular proteins. J. Mol. Biol. 134, 447-470.
Rose, G. D., Gierasch, L. M. & Smith, J. A. (1985). Turns in peptides and proteins. Advan. Protein Chem. 37, 1-109.
Sielecki, A. R., Fedorov, A. A., Boodhoo, A., Andreeva, N. S. & James, M. N. (1990). Molecular and crystal structures of monoclinic porcine pepsin refined at 1.8 Å resolution. J. Mol. Biol. 214, 143-170.
Singh, J., Thornton, J. M., Snarey, M. & Campbell, S. F. (1987). The geometries of interacting arginine-carboxyls in proteins. FEBS Letters, 224, 161-171.
Skarzynski, T., Moody, P. C. E. & Wonacott, A. J. (1987). Structure of holo-glyceraldehyde-3-phosphate dehydrogenase from Bacillus stearothermophilus at 1.8 angstrom resolution. J. Mol. Biol. 193, 171-187.
Smith, W. W., Burnett, R. M., Darling, G. D. & Ludwig, M. L. (1977). Structure of the semiquinone form of flavodoxin from Clostridium MP. J. Mol. Biol. 117, 195-225.
Stenkamp, R. E., Sieker, L. C., Jensen, L. H., McCallum, J. D. & Sanders-Loehr, J. (1985). Active site structures of deoxyhemerythrin and oxyhemerythrin. Proc. Nat. Acad. Sci., U.S.A. 82, 713-716.
Susi, H., Timasheff, S. N. & Ard, J. S. (1964). Near infrared investigation of interamide hydrogen bonding in aqueous solution. J. Biol. Chem. 239, 3051-3054.
Takano, T. & Dickerson, R. E. (1981). Conformation change of cytochrome c. I. Ferrocytochrome c structure refined at 1.5 Å resolution. J. Mol. Biol. 153, 79-94.
Taylor, R. & Kennard, O. (1984). Hydrogen bond geometry in organic crystals. Acc. Chem. Res. 17, 320-326.
Taylor, R., Kennard, O. & Versichel, W. (1983). Geometry of the N-H...O=C hydrogen-bond. 1. Lone pair directionality. J. Amer. Chem. Soc. 105, 5761-5766.
Taylor, R., Kennard, O. & Versichel, W. (1984). Geometry of the N-H...O=C hydrogen-bond. 3. Hydrogen bond distances and angles. Acta Crystallogr. 40, 280-288.
Vedani, A. & Dunitz, J. D. (1985). Lone-pair directionality in hydrogen-bond potential functions for molecular mechanics calculations: the inhibition of human carbonic anhydrase II by sulfonamides. J. Amer. Chem. Soc. 107, 7653-7658.
Venkatachalam, C. M. (1968). Stereochemical criteria for polypeptides and proteins: conformation of a system of three linked peptide units. Biopolymers, 6, 1425-1436.
Vijay-Kumar, S., Bugg, C. E. & Cook, W. J. (1987). Structure of ubiquitin refined at 1.8 Å resolution. J. Mol. Biol. 194, 531-544.
Weaver, L. H. & Matthews, B. W. (1987). Structure of bacteriophage T4 lysozyme refined at 1.7 Å resolution. J. Mol. Biol. 193, 189-199.
Weber, E., Steigemann, W., Jones, T. A. & Huber, R. (1978). The structure of oxy-erythrocruorin at 1.4 Å resolution. J. Mol. Biol. 120, 327-336.
Wistow, G., Turnell, B., Summers, L., Slingsby, C., Moss, D., Miller, L., Lindley, P. & Blundell, T. (1983). X-ray analysis of the eye lens protein gamma-II crystallin at 1.9 Å resolution. J. Mol. Biol. 170, 175-202.
Wlodawer, A., Bott, R. & Sjolin, L. (1982). The refined crystal structure of ribonuclease A at 2.0 Å resolution. J. Biol. Chem. 257, 1325-1332.
Wlodawer, A., Deisenhofer, J. & Huber, R. (1987). Comparison of two highly refined structures of bovine pancreatic trypsin inhibitor. J. Mol. Biol. 193, 145-156.
Zehfus, M. H. & Rose, G. D. (1986). Compact units in proteins. Biochemistry, 25, 5759-5765.
Zhang, R. G., Joachimiak, A., Lawson, C. L., Schevitz, R. W., Otwinowski, Z. & Sigler, P. B. (1987). The crystal structure of trp aporepressor at 1.8 Å shows how binding tryptophan enhances DNA affinity. Nature (London), 327, 591-597.
Edited by P. E. Wright