J. Mol. Biol. (1992) 226, 1143-1159

Hydrogen Bonding in Globular Proteins

Douglas F. Stickle
Department of Biological Chemistry, Pennsylvania State University College of Medicine, Hershey, PA 17033, U.S.A.

Leonard G. Presta
Genentech, Inc., Department of Protein Engineering, 460 Point San Bruno Blvd., South San Francisco, CA 94080, U.S.A.

Ken A. Dill
Department of Pharmaceutical Chemistry, University of California, 3333 California Street, Room 102, San Francisco, CA 94118, U.S.A.

and George D. Rose†
Department of Biochemistry and Biophysics, University of North Carolina, CB 7260 FLOB, Chapel Hill, NC 27599, U.S.A.

(Received 7 February 1992; accepted 13 April 1992)

A global census of the hydrogen bonds in 42 X-ray-elucidated proteins was taken and the following demographic trends identified: (1) Most hydrogen bonds are local, i.e. between partners that are close in sequence, the primary exception being hydrogen-bonded ion pairs. (2) Most hydrogen bonds are between backbone atoms in the protein, an average of 68%. (3) All proteins studied have extensive hydrogen-bonded secondary structure, an average of 82%. (4) Almost all backbone hydrogen bonds are within single elements of secondary structure. An approximate rule of thirds applies: slightly more than one-third (37%) form i→i−3 hydrogen bonds, almost one-third (32%) form i→i−4 hydrogen bonds, and slightly less than one-third (26%) reside in paired strands of β-sheet. The remaining 5% are not wholly within an individual helix, turn or sheet. (5) Side-chain to backbone hydrogen bonds are clustered at helix-capping positions. (6) An extensive network of hydrogen bonds is present in helices. (7) To a close approximation, the total number of hydrogen bonds is a simple function of a protein’s helix and sheet content. (8) A unique quantity, termed the reduced number of hydrogen bonds, is defined as the maximum number of hydrogen bonds possible when every donor : acceptor pair is constrained to be 1:1. This quantity scales linearly with chain length, with 0.71 reduced hydrogen bond per residue. Implications of these results for pathways of protein folding are discussed.

Keywords: hydrogen bonding; H-bonding; protein folding; protein secondary structure; helix capping

†Author to whom correspondence should be addressed at his current address: Dept. of Biochemistry & Molecular Biophysics, Washington University School of Medicine, Box 8231, 660 S. Euclid Ave., St Louis, MO 63110, U.S.A.

0022-2836/92/161143-17 $08.00/0 © 1992 Academic Press Limited


1. Introduction

The current view of protein folding and stability has been conditioned by ideas developed during the preceding several decades of research (Dill, 1990). The driving force for folding was initially thought to be intramolecular hydrogen bonding. This was Pauling’s view, and it led him to the crucial role ascribed to hydrogen bonding in model structures for α-helix (Pauling et al., 1951) and β-sheet (Pauling & Corey, 1951). Such models, which were constrained by the known geometry of the peptide unit (Corey, 1948), sought to optimize both the number and the geometry of >N-H···O=C< hydrogen bonds. The existence of these predicted structures was soon confirmed in ongoing X-ray crystal studies (Perutz, 1951), bolstering the assumption that hydrogen bonds play the formative role in folding and stability.

This early view of hydrogen bonding was later supplanted by the current “text-book” view that the hydrophobic effect, not hydrogen bonding, serves as the driving force for protein folding (Kauzmann, 1959). However, protein folding cannot be simply a matter of burying apolar residues while exposing polar ones. Even residues with hydrophobic side-chains have a pronounced hydrogen-bonding capacity due to the presence of backbone >N-H and >C=O groups. Were such residues unable to realize hydrogen bonds within the molecular interior, then hydrogen bonding would favor denaturation, since presumably these same groups could hydrogen bond readily to water in the unfolded state.

One approach to understanding the patterns and principles of hydrogen bonding in proteins is to survey a representative collection of molecules and search out recurrent themes. Such an approach was used by Baker & Hubbard (1984) in an elegant and influential review that laid the groundwork for many later studies of proteins, including the present one.
Hydrogen bond geometry in proteins has been analyzed in individual molecules (Artymiuk & Blake, 1981) and by systematic surveys (Baker & Hubbard, 1984; Singh et al., 1987; Barlow & Thornton, 1988; Ippolito et al., 1990; Jeffrey & Saenger, 1991). Small-molecule crystal structures have also provided a wealth of information about hydrogen bond geometry (Ceccarelli et al., 1981; Jeffrey & Maluszynska, 1982; Taylor et al., 1983; Murray-Rust & Glusker, 1984; Taylor & Kennard, 1984; Vedani & Dunitz, 1985). Our emphasis in the present study is less on geometry and more on the numbers, types and patterns of hydrogen bonds, especially their relationship to protein secondary structure. Of particular interest is the concept of a network hydrogen bond, in which a given donor or acceptor participates in two or more hydrogen bonds.

Central to this study is a novel collection of computer algorithms that identify hydrogen bonds and hydrogen-bonded secondary structure in a two-step procedure. (1) Identification of hydrogen bonds, from X-ray co-ordinates, is based on multiple criteria (similar though not identical to those of Baker & Hubbard (1984)) extracted from both protein and small-molecule crystal structures. (2) Identification of secondary structure, from hydrogen bonds, is based solely on the automatic recognition of signature patterns in a plot of donors versus acceptors, similar in intent to work of Richards & Kundrot (1988).

In the following pages, the comprehensive set of hydrogen bonds from a data base of 42 high-resolution, X-ray-elucidated proteins is analyzed for patterns and trends. Following this analysis, it becomes apparent that most hydrogen bonds are between backbone atoms, within individual elements of secondary structure; the majority of these are local, i.e. between partners that are close in sequence. Hydrogen bonds between main-chain and side-chain atoms are also highly local, with conspicuous clusters at characteristic sequence distances. Such clusters are found to be indicative of frequent helix-capping interactions (Presta & Rose, 1988; Richardson & Richardson, 1988), in which polar side-chains from residues that flank helix termini form hydrogen bonds with the initial >NH donors or final >C=O acceptors in the helix backbone.

In proteins, approximately half of all polar groups are buried within the molecular interior (Chothia, 1976; Richards, 1977), where, shielded from solvent, they nevertheless find intramolecular hydrogen bond partners and are rarely left unsatisfied (Chothia, 1975; Richards, 1977). A lingering question asks how buried polar groups can realize hydrogen bonds so effectively within such a constrained microenvironment. The prevalence of local interactions can rationalize this long-standing conundrum. In this view, the accretion of structure involves prior formation of a sequentially local hydrogen-bonded framework (Kim & Baldwin, 1982), with concomitant or subsequent condensation of these nascent elements into larger composites (Rose, 1979; Fetrow et al., 1988).

2. Materials and Methods

(a) Identification of hydrogen bonds in X-ray-elucidated protein structures

Hydrogen-bonding criteria were derived from small-molecule crystal studies and are similar to those used in an earlier study (Presta & Rose, 1988). For each protein, hydrogen bonds were identified in a 3-step process. First, polar groups were classified by atom type and hybridization into 1 of 5 categories: Nsp2, Nsp3, Osp2, Osp3, or Ssp3. All groups and their classes are listed in Table 1. Second, distances between all potential donor-acceptor pairs in the protein were evaluated. A pair was retained for further analysis whenever its interatomic distance was less than or equal to the sum of the hydrogen bond radii of its respective atoms. It should be noted that the hydrogen bond radii in Table 1 are about 10% larger than the corresponding van der Waals’ radii. Expanded radii were used to account for the presence of hydrogen atoms


Table 1
Hydrogen bond donors and acceptors from amino acid residues

Nsp2: >NH (peptide, Trp), 1.90 Å (1); -NH2 (Asn, Gln), 1.90 Å (2)
Osp2: =O (peptide, Asn, Gln), 1.60 Å (2); -COO⁻ (Asp, Glu), 1.60 Å (2/O)
Osp3: -OH (Ser, Thr), 1.70 Å (2)
Ssp3: -S- (Met), 1.95 Å (2); -SH (Cys), 2.10 Å (2)

All main-chain and side-chain polar groups from the 20 natural amino acid residues. Groups are classified by atom type (N or O) and hybridization (sp2 or sp3). Each Table entry contains (1) either a donor or acceptor, (2) residues in which the group is found, (3) the assigned hydrogen bond radius, and (4) the maximum number of hydrogen bonds expected for the group, in parentheses.

and to compensate for the fact that protein crystal structures lack the resolution of small-molecule crystal structures. The maximum distance between the electronegative heavy atoms in a hydrogen-bonded pair is usually taken to be 3.1 to 3.2 Å (Baker & Hubbard, 1984). The use of expanded radii retains hydrogen bonds with slightly longer interatomic distances but acceptable geometry in other respects.

Finally, hydrogen bond geometry was evaluated for pairs satisfying the distance criterion. Two angle criteria were assessed, as summarized in Fig. 1: (1) Angle at the acceptor atom, i.e. the scalar angle D-A-AA. For sp2 acceptors, the ideal value is 135°, and a range of 90 to 180° was accepted. For sp3 acceptors, the ideal value is 109.5°, and a range of 60 to 180° was accepted. (2) Angle at the donor atom, i.e. the scalar angle A-D-DD. For sp2 donors, the ideal value is 120°, and a range of 90 to 180° was accepted. For sp3 donors, the ideal value is 109.5°, and a range of 90 to 180° was accepted.

Studies of both small-molecule and protein crystal structures have shown that hydrogen bonds involving sp2 donors and/or acceptors exhibit a preferred geometry with the donor/acceptor situated in the plane of its sp2 partner (Artymiuk & Blake, 1981; Ceccarelli et al., 1981; Taylor et al., 1983; Vedani & Dunitz, 1985; Singh et al., 1987). Hence, potential hydrogen bond pairs with at least 1 sp2 partner that satisfied criteria (1) and (2) were subjected to 2 additional tests for planarity. (3) Acceptor in the plane of the donor complex. Planes α (D-DD-DD′) and β (A-D-DD) were defined. The ideal value of the angle between normals to α and β is 0°; deviations up to ±60° were accepted. (4) Donor in the plane of the acceptor complex. Planes γ (A-AA-AA2) and δ (D-A-AA) were defined. The ideal value of the angle between normals to γ and δ is 0°; deviations up to ±90° were accepted. Criterion (3) imposes the constraint that the oxygen acceptor must lie no more than 60° outside the plane of the donor complex. The constraint imposed by criterion (4) allows greater latitude between the position of the nitrogen donor and its acceptor complex because the large

Figure 1. Hydrogen bonding criteria for X-ray-elucidated proteins. AA is an acceptor antecedent atom that is covalently bound to the acceptor; DD and DD′ are donor antecedent atoms that are covalently bound to the donor. Co-ordinates for hydrogen atoms are not available at the resolution of typical protein X-ray crystal structures, and geometric criteria are evaluated using only heavy atoms. The distance between donor and acceptor heavy atoms must not exceed the sum of their hydrogen bond radii. Scalar angles at both the acceptor atom (i) and donor atom (ii) must satisfy criteria given in Materials and Methods. For hydrogen bonds with at least one sp2 partner, tests for planarity were imposed, as described.
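The distance test and the two scalar-angle tests summarized in Figure 1 can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' program: the function names are hypothetical, the planarity tests (3) and (4) are omitted, and the example radii (1.90 Å for a peptide >NH, 1.60 Å for a peptide =O) are the values as read from Table 1.

```python
import math

def angle_deg(p, vertex, q):
    """Return the angle p-vertex-q in degrees (heavy-atom coordinates)."""
    v1 = [a - b for a, b in zip(p, vertex)]
    v2 = [a - b for a, b in zip(q, vertex)]
    dot = sum(a * b for a, b in zip(v1, v2))
    n1 = math.sqrt(sum(a * a for a in v1))
    n2 = math.sqrt(sum(a * a for a in v2))
    cos_t = max(-1.0, min(1.0, dot / (n1 * n2)))  # guard against rounding
    return math.degrees(math.acos(cos_t))

def passes_distance_and_angle_tests(d, dd, a, aa, r_donor, r_acceptor,
                                    acceptor_is_sp2):
    """Apply the distance criterion and angle criteria (1) and (2).

    d, dd, a, aa: coordinates of donor, donor antecedent, acceptor and
    acceptor antecedent. Planarity criteria (3) and (4) are NOT checked.
    """
    # Distance: D...A must not exceed the sum of the hydrogen bond radii.
    if math.dist(d, a) > r_donor + r_acceptor:
        return False
    # Criterion (1): angle D-A-AA; 90-180 deg for sp2, 60-180 deg for sp3.
    acceptor_min = 90.0 if acceptor_is_sp2 else 60.0
    if not (acceptor_min <= angle_deg(d, a, aa) <= 180.0):
        return False
    # Criterion (2): angle A-D-DD; 90-180 deg for both sp2 and sp3 donors.
    return 90.0 <= angle_deg(a, d, dd) <= 180.0

# Hypothetical coordinates (in Angstroms) for a backbone N...O pair.
donor, donor_ante = (0.0, 0.0, 0.0), (-0.7, 1.2, 0.0)
acceptor, acceptor_ante = (2.9, 0.0, 0.0), (3.5, 1.07, 0.0)
ok = passes_distance_and_angle_tests(donor, donor_ante, acceptor,
                                     acceptor_ante, 1.90, 1.60, True)
```

Pairs with at least one sp2 partner that pass this filter would still need the two plane-normal tests before being accepted.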


oxygen atom, with its broad “cone of acceptance”, can still form an effective hydrogen bond when the hydrogen donor is considerably out-of-plane (Taylor et al., 1984). As an example, consider a backbone >N···O=C< hydrogen bond. The angle between the normals to planes defined by atoms N(i)-Cα(i)-C(i−1) and O-N(i)-Cα(i) should be near 0°, but deviations from planarity of up to ±60° are accepted.

(b) Identification of hydrogen-bonded secondary structure

The relationship between the protein sequence and the complete set of hydrogen bonds can be visualized from a square plot of donors versus acceptors (Factor & Mehler, 1991), as illustrated in Fig. 2(a). Visual inspection of the donor versus acceptor plot indicates that most hydrogen bonds are within patterns of secondary structure. These hydrogen bonding patterns can be interpreted in the same manner as the corresponding patterns in a distance plot (Liljas & Rossmann, 1974), with helices and turns describing a trace that runs parallel to and near the diagonal, strands of antiparallel sheet running perpendicular to the diagonal, and strands of parallel sheet running parallel to but distant from the diagonal. Even those hydrogen bonds involving side-chains, which are displayed as open circles in Fig. 2(a), are almost all enmeshed within these 3 predominant secondary structure patterns.

The clear patterns seen in Fig. 2(a) have prompted us to devise simple algorithms to scan donor versus acceptor plots automatically and to identify secondary structure elements objectively. These algorithms start by filtering the donor versus acceptor plot to include only backbone : backbone hydrogen bonds. The filtered set, represented by the filled circles in Fig. 2(a), is then dissected into helix and hydrogen-bonded turns (Fig. 2(b)), antiparallel sheet (Fig. 2(c)), and parallel sheet (Fig. 2(d)). A brief description of the algorithms follows.

(i) Helices and turns

Helices and β-turns have characteristic patterns of >N-H···O=C< hydrogen bonds. In the α-helix, hydrogen bonds link residues i→i−4 (Pauling et al., 1951); in 3₁₀ helices and turns (Rose et al., 1985) they link residues i→i−3. To identify helices and turns, a donor versus acceptor plot of dimension N × N is filtered to include only the 2 diagonals, δ₋₃ and δ₋₄, that parallel the main diagonal at either (i,i−3) or (i,i−4), respectively. Next, each minor diagonal is mapped into a separate N × 1 binary array (Δ₋₃ and Δ₋₄, respectively) such that for every hydrogen bond in the diagonal, either 4 (for δ₋₃) or 5 (for δ₋₄) consecutive array elements are set to true (i.e. unity). For example, corresponding to an i→i−3 H-bond from δ₋₃, positions (i, i−1, i−2 and i−3) would be set to unity in Δ₋₃. Finally, the logical union of these arrays, Δ₋₃ ∪ Δ₋₄, is mapped onto a composite array and searched for start (011...) and stop (110...) sequences. Disjunct start and stop sequences bracket the ends of helices while coincident start and stop sequences correspond to isolated hydrogen bonded turns.

(ii) β-sheet

On a donor versus acceptor plot, the interstrand pattern of >N-H···O=C< hydrogen bonds in β-sheet gives rise to a symmetric trace that is either perpendicular to the main diagonal in antiparallel strands or parallel to the main diagonal in parallel strands. To identify all such traces, the plot is first reflected across the main diagonal, resulting in a triangular matrix, and all hydrogen bonds previously identified as belonging to helices or turns are eliminated. Then, every diagonal (parallel strands) and anti-diagonal (antiparallel strands) file of the triangular matrix is scanned exhaustively for the presence of characteristic seeds, which consist of 2 consecutive hydrogen bonds. Once initiated by a seed, the extension of a segment allows for slight deviations from the ideal pattern in order to accommodate “bulges” (Richardson et al., 1978) of no more than 3 residues. In a manner analogous to helices, segments terminate when start (011...) or stop (110...) sequences are encountered, providing the structure is not within a bulge.

It should be emphasized that our algorithms count all qualifying atoms in each hydrogen-bonded ring as belonging to its given element of secondary structure. For example, a classical β-turn (Venkatachalam, 1968), with an i→i−3 hydrogen bond, forms a 10-membered ring spanning 4 residues. In our accounting, an isolated turn would be counted as having 1 donor, 1 acceptor, 1 hydrogen bond and 4 residues within identified secondary structure.
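The helix and turn assignment described in section (b)(i) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: residues are numbered from 1, the input is a list of backbone : backbone hydrogen bonds as (donor residue, acceptor residue) pairs, the function name is hypothetical, and contiguous runs of set elements are reported directly rather than by matching explicit start and stop patterns.

```python
def helix_turn_segments(hbonds, n_residues):
    """Return (first, last) residue ranges covered by i->i-3 / i->i-4 bonds.

    hbonds: iterable of (donor_residue, acceptor_residue), 1-based.
    """
    delta3 = [False] * (n_residues + 1)  # index 0 is unused
    delta4 = [False] * (n_residues + 1)
    for donor, acceptor in hbonds:
        separation = donor - acceptor
        if separation == 3:
            target = delta3          # marks 4 consecutive residues
        elif separation == 4:
            target = delta4          # marks 5 consecutive residues
        else:
            continue                 # not on either minor diagonal
        for k in range(acceptor, donor + 1):
            target[k] = True
    # Logical union of the two binary arrays.
    union = [a or b for a, b in zip(delta3, delta4)]
    segments = []
    start = None
    for k in range(1, n_residues + 1):
        if union[k] and start is None:
            start = k
        if start is not None and (k == n_residues or not union[k + 1]):
            segments.append((start, k))
            start = None
    return segments

# An isolated turn: one i->i-3 bond from residue 10 to residue 7.
isolated_turn = helix_turn_segments([(10, 7)], 12)
# A short helix: overlapping i->i-4 bonds plus one i->i-3 bond.
short_helix = helix_turn_segments([(5, 1), (6, 2), (7, 3), (8, 4), (6, 3)], 12)
```

In this sketch the isolated turn yields the single range (7, 10), i.e. 4 residues from 1 hydrogen bond, which matches the accounting convention stated above. Segments that abut without a gap are merged here, which may differ from the original start/stop pattern search.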

3. Results

(a) Classification

Hydrogen bonds were identified in 42 X-ray-elucidated proteins from a recent version of the Brookhaven data base (Bernstein et al., 1977). All proteins are refined structures, with resolution ≤ 2.0 Å and R factors ≤ 20%. Proteins are listed in Table 2 together with a summary of the numbers and types of hydrogen bonds. The automatic classification of backbone hydrogen bonds into secondary structure types was also performed for each protein, with results shown in Table 2.

(b) Network hydrogen bonds

The 6734 residues in our data base were found to have 7299 hydrogen bonds, slightly more than one per residue. It should be noted, however, that many hydrogen bonds belong to networks in which each donor (acceptor) is bonded to multiple acceptors (donors). Thus, there are more hydrogen bonds than donor : acceptor pairs. We refer to hydrogen bonds that participate in such networks as network hydrogen bonds.

Associated with any given cluster of hydrogen bonds is a unique number, ℜ, which corresponds to the maximum number of hydrogen bonds possible for that set of donors and acceptors if every donor : acceptor pair were constrained to be 1:1. In other words, ℜ(cluster) is the maximum number of non-network hydrogen bonds that could be formed for the set. We refer to ℜ(cluster) as the reduced number of hydrogen bonds for the cluster.

It is well known that co-operative hydrogen bonds are stronger than non-co-operative hydrogen bonds between the same atoms (Jeffrey & Saenger, 1991). Thus, a hydrogen bond to a backbone >C=O will strengthen a second hydrogen bond to the backbone >NH of that same peptide bond (and conversely) by further polarizing the peptide unit. In proteins, the hydrogen-bonding contribution to conformational stability is even further enhanced

Figure 2. Identification of hydrogen bonds and hydrogen-bonded secondary structure for dihydrofolate reductase (4DFR). (a) Donor versus acceptor plot. The 159 residues give rise to a 159 × 159 square plot, with hydrogen bond donors on the abscissa and acceptors on the ordinate. The presence of a hydrogen bond is indicated by a circle at the corresponding co-ordinate position. All hydrogen bonds are shown. Backbone : backbone hydrogen bonds are indicated by filled circles; hydrogen bonds that involve at least 1 side-chain partner are indicated by open circles. Hydrogen bond identification is described in Materials and Methods. Most hydrogen bonds are seen to be between backbone atoms. Regular secondary structure results in characteristic patterns on the plot, as described in the text. It should be noted that a single point can represent multiple hydrogen bonds, e.g. both N and R from a given residue donate to the same acceptor residue. (b) Helices and turns are identified as any contiguous sequence of residues between and including i→i−3 or i→i−4 hydrogen bonds. Hydrogen bonds are represented by filled circles and contiguous sequences are shaded. (c) and (d) After subtraction of hydrogen bonds involved in helices and turns, the square donor versus acceptor plot is reflected across the main diagonal and collapsed to a triangular plot. Paired strands of antiparallel β-sheet describe a trace normal to the main diagonal (c); paired strands of parallel β-sheet describe a trace that parallels the main diagonal (d). These traces, shown as filled circles, are identified using canonical patterns, as described in the text. Each identified trace represents a composite that can be deconvoluted into 2 discrete segments of contiguous residues, indicated by the shaded regions emanating from that trace. (e) Results of helix and sheet identification algorithms, illustrated in (b) to (d) and described in the text, are summarized graphically. Each row corresponds to one or more categories of secondary structure; positive identification is indicated by shading. The 5 rows correspond to (1) helices and turns, indicated by diagonal stripes, (2) antiparallel β-sheet, indicated by vertical stripes, (3) parallel β-sheet, indicated by horizontal stripes, (4) either parallel or antiparallel β-sheet, indicated by cross-hatching, and (5) the logical union of all categories, indicated by a filled bar. It is apparent that most residues participate in regular hydrogen-bonded secondary structure. As noted in the text, our algorithms count all qualifying atoms in each hydrogen-bonded ring as belonging to its given element of secondary structure. For example, an isolated turn (with 1 donor, 1 acceptor and 1 hydrogen bond) is counted as having 4 residues within identified secondary structure. For this reason, our secondary structure assignments tend to be supersets of those given in the Protein Data Base (Bernstein et al., 1977).

Table 2
Hydrogen bonds in 42 X-ray-elucidated proteins

PDB | Protein, resolution (reference) | n | N:O | R:O | N:R | R:R | Sum | Residues D | Residues A | Sites D | Sites A | Groups D | Groups A | Helix/Turn n | Helix/Turn f | PBS n | PBS f | ABS n | ABS f | Sheet f | Structure f
351C | Cytochrome C551 (oxidized), 1.6 Å (Almassy & Dickerson, 1978) | 82 | 71 | 9 | 7 | 6 | 93 | 55 | 51 | 67 | 59 | 57 | 49 | 3 | 0.683 | 0 | 0 | 0 | 0 | 0 | 0.683
2ACT | Actinidin, 2.0 Å (Baker, 1980) | 218 | 146 | 28 | 35 | 29 | 238 | 153 | 135 | 190 | 172 | 136 | 149 | 12 | 0.482 | 2 | 0.018 | 10 | 0.294 | 0.298 | 0.733
2APP | Penicillopepsin, 1.8 Å (James & Sielecki, 1983) | 323 | 190 | 37 | 46 | 39 | 312 | 232 | 220 | 280 | 261 | 233 | 255 | 22 | 0.390 | 13 | 0.186 | 21 | 0.502 | 0.598 | 0.873
1AZA | Azurin (oxidized), 1.8 Å (Norris et al., 1986) | 129 | 72 | 16 | 17 | 20 | 125 | 89 | 83 | 112 | 101 | 83 | 85 | 6 | 0.434 | 4 | 0.202 | 8 | 0.372 | 0.496 | 0.876
1BP2 | Phospholipase A2, 1.7 Å (Dijkstra et al., 1981) | 123 | 106 | 16 | 15 | 18 | 155 | 85 | 80 | 110 | 100 | 90 | 84 | 8 | 0.780 | 0 | 0 | 4 | 0.130 | 0.130 | 0.854
2CA2 | Carbonic anhydrase, 1.9 Å (Eriksson et al., 1988) | 256 | 137 | 45 | 38 | 46 | 266 | 173 | 176 | 235 | 222 | 187 | 181 | 15 | 0.418 | 7 | 0.129 | 13 | 0.359 | 0.422 | 0.797
2CDV | Cytochrome C3, 1.8 Å (Higuchi et al., 1984) | 107 | 56 | 16 | 14 | 16 | 102 | 66 | 61 | 81 | 76 | 61 | 61 | 6 | 0.570 | 4 | 0.065 | 4 | 0.112 | 0.178 | 0.673
5CHA | Alpha chymotrypsin A, 1.67 Å (Blevins & Tulinsky, 1985) | 237 | 126 | 26 | 25 | 14 | 191 | 140 | 135 | 176 | 159 | 141 | 136 | 11 | 0.325 | 12 | 0.017 | 14 | 0.414 | 0.418 | 0.688
5CPA | Carboxypeptidase A Alpha, 1.54 Å (Rees et al., 1983) | 307 | 239 | 42 | 39 | 48 | 368 | 233 | 216 | 284 | 261 | 221 | 223 | 14 | 0.596 | 11 | 0.189 | 5 | 0.111 | 0.241 | 0.811
1CRN | Crambin, 1.5 Å (Hendrickson & Teeter, 1981) | 46 | 39 | 8 | 3 | 2 | 52 | 28 | 28 | 35 | 30 | 32 | 28 | 3 | 0.630 | 0 | 0 | 2 | 0.109 | 0.109 | 0.739
2CTS | Citrate synthase, 2.0 Å (Remington et al., 1982) | 437 | 387 | 49 | 48 | 40 | 524 | 320 | 293 | 389 | 337 | 328 | 306 | 17 | 0.762 | 2 | 0.009 | 4 | 0.027 | 0.037 | 0.787
5CYT | Cytochrome c (reduced), 1.5 Å (Takano & Dickerson, 1981) | 103 | 75 | 18 | 10 | 3 | 106 | 66 | 66 | 80 | 72 | 71 | 62 | 7 | 0.718 | 0 | 0 | 2 | 0.068 | 0.068 | 0.767
4DFR | Dihydrofolate reductase, 1.7 Å (Bolin et al., 1982) | 159 | 106 | 18 | 9 | 18 | 151 | 104 | 101 | 128 | 114 | 109 | 97 | 7 | 0.365 | 8 | 0.283 | 5 | 0.214 | 0.421 | 0.742
1ECD | Hemoglobin (erythrocruorin), 1.4 Å (Weber et al., 1978) | 136 | 161 | 14 | 9 | 15 | 199 | 108 | 99 | 132 | 115 | 106 | 104 | 3 | 0.978 | 0 | 0 | 0 | 0 | 0 | 0.978
4FXN | Flavodoxin (semiquinone form), 1.8 Å (Smith et al., 1977) | 138 | 113 | 14 | 9 | 10 | 146 | 90 | 93 | 107 | 103 | 94 | 94 | 8 | 0.587 | 6 | 0.268 | 0 | 0 | 0.268 | 0.819
1GCR | Gamma-II crystallin, 1.6 Å (Wistow et al., 1983) | 174 | 91 | 20 | 24 | 23 | 158 | 120 | 112 | 152 | 135 | 113 | 113 | 6 | 0.178 | 0 | 0 | 12 | 0.552 | 0.552 | 0.718
1GD1 | Glyceraldehyde-3-P dehydro., 1.8 Å (Skarzynski et al., 1987) | 331 | 249 | 34 | 44 | 49 | 376 | 249 | 229 | 309 | 277 | 233 | 238 | 17 | 0.548 | 15 | 0.235 | 11 | 0.190 | 0.389 | 0.867
1GP1 | Glutathione peroxidase, 2.0 Å (Epp et al., 1983) | 35 | 9 | 1 | 2 | 0 | 12 | 11 | 11 | 12 | 11 | 11 | 11 | 3 | 0.457 | 0 | 0 | 2 | 0.171 | 0.171 | 0.543
3GRS | Glutathione reductase, 1.54 Å (Karplus & Schulz, 1987) | 461 | 358 | 47 | 32 | 50 | 487 | 315 | 295 | 382 | 350 | 324 | 314 | 22 | 0.529 | 11 | 0.128 | 18 | 0.217 | 0.330 | 0.833
1HMQ | Hemerythrin (met), 2.0 Å (Stenkamp et al., 1985) | 113 | 123 | 9 | 6 | 20 | 158 | 87 | 83 | 111 | 103 | 96 | 80 | 4 | 0.858 | 0 | 0 | 0 | 0 | 0 | 0.858
1HOE | Amylase inhibitor, 2.0 Å (Pflugrath et al., 1986) | 74 | 35 | 10 | 9 | 8 | 62 | 46 | 46 | 56 | 51 | 46 | 48 | 3 | 0.203 | 0 | 0 | 7 | 0.649 | 0.649 | 0.797
1INS | Insulin, 1.5 Å (Baker et al., 1988) | 51 | 39 | 4 | 6 | 2 | 51 | 37 | 31 | 41 | 35 | 30 | 32 | 3 | 0.667 | 0 | 0 | 2 | 0.118 | 0.118 | 0.725
2LHB | Hemoglobin V (cyano, met), 2.0 Å (Hendrickson et al., 1973) | 149 | 165 | 16 | 16 | 5 | 202 | 119 | 107 | 136 | 120 | 117 | 114 | 3 | 0.893 | 0 | 0 | 0 | 0 | 0 | 0.893
1LZ1 | Lysozyme, 1.5 Å (Blake et al., 1983) | 130 | 105 | 20 | 16 | 18 | 159 | 96 | 88 | 121 | 108 | 98 | 95 | 9 | 0.815 | 2 | 0.100 | 4 | 0.146 | 0.246 | 0.923
3LZM | Lysozyme, 1.7 Å (Weaver & Matthews, 1987) | 164 | 179 | 17 | 20 | 26 | 242 | 128 | 119 | 163 | 144 | 137 | 126 | 7 | 0.860 | 0 | 0 | 4 | 0.098 | 0.098 | 0.927
1MBO | Myoglobin (oxy), 1.6 Å (Phillips, 1980) | 153 | 194 | 17 | 13 | 20 | 244 | 131 | 119 | 158 | 143 | 138 | 129 | 3 | 0.941 | 0 | 0 | 0 | 0 | 0 | 0.941
2OVO | Ovomucoid third domain, 1.5 Å (Papamokos et al., 1982) | 56 | 28 | 9 | 12 | 11 | 60 | 35 | 33 | 46 | 43 | 37 | 37 | 3 | 0.411 | 0 | 0 | 3 | 0.250 | 0.250 | 0.625
9PAP | Papain, 1.65 Å (Kamphuis et al., 1984) | 212 | 132 | 31 | 27 | 31 | 221 | 145 | 132 | 185 | 164 | 135 | 141 | 11 | 0.491 | 4 | 0.038 | 9 | 0.292 | 0.316 | 0.731
2PCY | Apo-plastocyanin, 1.8 Å (Guss & Freeman, 1983) | 99 | 50 | 9 | 12 | 9 | 80 | 63 | 62 | 75 | 69 | 59 | 60 | 7 | 0.384 | 4 | 0.202 | 7 | 0.485 | 0.586 | 0.838
4PEP | Pepsin, 1.8 Å (Sielecki et al., 1990) | 326 | 193 | 36 | 37 | 30 | 296 | 223 | 208 | 259 | 242 | 212 | 216 | 23 | 0.377 | 10 | 0.160 | 22 | 0.460 | 0.555 | 0.828
1PPT | Avian pancreatic polypeptide, 1.37 Å (Glover et al., 1983) | 36 | 33 | 1 | 0 | 1 | 35 | 21 | 20 | 22 | 20 | 21 | 20 | 1 | 0.694 | 0 | 0 | 0 | 0 | 0 | 0.694
5PTI | Trypsin inhibitor, 1.0 Å (Wlodawer et al., 1987) | 58 | 27 | 4 | 6 | 2 | 39 | 29 | 26 | 32 | 31 | 29 | 28 | 3 | 0.362 | 0 | 0 | 2 | 0.293 | 0.293 | 0.638
1RDG | Rubredoxin, 1.4 Å (Pierrot et al., 1976) | 52 | 24 | 6 | 9 | 2 | 41 | 32 | 27 | 38 | 34 | 28 | 28 | 5 | 0.731 | 0 | 0 | 4 | 0.365 | 0.365 | 0.923
2RHE | Ig Bence-Jones protein, 1.6 Å (Furey et al., 1983) | 114 | 61 | 13 | 12 | 11 | 97 | 76 | 69 | 89 | 80 | 68 | 70 | 8 | 0.377 | 4 | 0.114 | 6 | 0.526 | 0.614 | 0.886
3RNT | Lys25-ribonuclease T1, 1.8 Å (Koepke et al., 1989) | 104 | 65 | 15 | 12 | 12 | 104 | 65 | 62 | 82 | 76 | 69 | 61 | 5 | 0.452 | 0 | 0 | 7 | 0.346 | 0.346 | 0.750
7RSA | Ribonuclease A, 1.26 Å (Wlodawer et al., 1982) | 124 | 74 | 25 | 13 | 18 | 130 | 86 | 85 | 110 | 105 | 89 | 82 | 7 | 0.379 | 2 | 0.032 | 6 | 0.419 | 0.444 | 0.790
1SN3 | Scorpion neurotoxin, 1.8 Å (Almassy et al., 1983) | 65 | 34 | 5 | 6 | 3 | 48 | 35 | 32 | 40 | 39 | 30 | 32 | 5 | 0.477 | 0 | 0 | 3 | 0.338 | 0.338 | 0.754
1SNC | Staphylococcus nuclease, 1.65 Å (Loll & Lattman, 1989) | 135 | 101 | 13 | 19 | 18 | 151 | 103 | 89 | 123 | 109 | 92 | 100 | 6 | 0.530 | 2 | 0.060 | 6 | 0.396 | 0.396 | 0.858
3TLN | Thermolysin, 1.6 Å (Holmes & Matthews, 1982) | 316 | 265 | 31 | 39 | 61 | 396 | 249 | 225 | 303 | 281 | 252 | 249 | 14 | 0.595 | 5 | 0.098 | 12 | 0.218 | 0.282 | 0.839
1TPP | Beta-trypsin, 1.4 Å (Bode & Schwager, 1975) | 220 | 123 | 25 | 26 | 16 | 190 | 121 | 111 | 171 | 154 | 136 | 140 | 12 | 0.386 | 0 | 0 | 12 | 0.427 | 0.427 | 0.709
1UBQ | Ubiquitin, 1.8 Å (Vijay-Kumar et al., 1987) | 76 | 52 | 9 | 11 | 4 | 76 | 49 | 50 | 59 | 57 | 53 | 50 | 6 | 0.579 | 4 | 0.276 | 5 | 0.539 | 0.592 | 0.895
2WRP | Trp repressor, 1.65 Å (Zhang et al., 1987) | 105 | 137 | 9 | 7 | 3 | 156 | 82 | 75 | 93 | 82 | 86 | 79 | 3 | 0.905 | 0 | 0 | 0 | 0 | 0 | 0.905

Totals for the 42 proteins: residues 6734; hydrogen bonds N:O 4970, R:O 792, N:R 760, R:R 777, sum 7299; donor residues 4695; acceptor residues 4383; reduced hydrogen bonds 4515. Fractions over all residues: helix/turn 0.556; parallel β-sheet 0.087; antiparallel β-sheet 0.245; sheet 0.308; helix and sheet 0.811.

Summary of numbers and types of hydrogen bonds in 42 high-resolution proteins from the Brookhaven (Bernstein et al., 1977) protein data base (PDB), analyzed as described in Materials and Methods. The Table contains 22 columns. Columns 1 to 3 identify each protein and include: (1) PDB identifier, (2) protein name and resolution of structure, and (3) number of residues in the protein. Columns 4 to 8 list the number of hydrogen bonds: (4) backbone donor to backbone acceptor (N:O), (5) side-chain donor to backbone acceptor (R:O), (6) backbone donor to side-chain acceptor (N:R), (7) side-chain donor to side-chain acceptor (R:R), and (8) total hydrogen bonds (Σ: columns 4 to 7). Columns 9 to 14 list the count of donors (D) and acceptors (A): (9) number of donor residues, (10) number of acceptor residues, (11) number of donor atoms, (12) number of acceptor atoms, (13) reduced number of donors, and (14) reduced number of acceptors. The lesser value of columns 13 and 14, shown in boldface, is ℜ(protein). Columns 15 to 22 list secondary structure by type: (15) number of contiguous segments of helix and/or hydrogen-bonded turns, (16) fraction, f, of protein in helix and/or hydrogen-bonded turns, (17) number of contiguous segments of parallel β-sheet (PBS), (18) fraction of protein in parallel β-sheet, (19) number of contiguous segments of antiparallel β-sheet (ABS), (20) fraction of protein in antiparallel β-sheet, (21) fraction of protein in sheet, the logical-or of columns 18 and 20 (i.e. antiparallel β-sheet versus parallel β-sheet), and (22) fraction of protein in helix and sheet, the logical-or of columns 16 and 21 (i.e. helix versus sheet). Note: residues at the interface between 2 elements of secondary structure may be counted twice, once in each segment. For this reason, the logical-or of β-sheet, in column 21, and the logical-or of all secondary structure, in column 22, are typically less than the arithmetic sum of their components.


by the existence of co-operative networks. The difference between the total number of hydrogen bonds and the reduced number of hydrogen bonds is proportional to this co-operative enhancement.

ℜ(protein) is easily determined. First, the number of hydrogen bonds is tallied for the entire molecule by evaluating every polar group (from Table 1) in each residue. Then, for every donor with multiple acceptors, each acceptor is tested for the presence of multiple donors. Whenever an acceptor in this set is found to have a unique donor, all other acceptors for that donor are eliminated, reducing the number of acceptor sites. Next, this paring operation is repeated for every acceptor with multiple donors. Finally, the winnowed lists of donors and acceptors are counted; the lesser number is ℜ(protein). For each molecule, ℜ(protein) is given in Table 2. In all, the 7299 hydrogen bonds in the data set of 42 proteins give rise to 4515 reduced hydrogen bonds, an average multiplicity of 1.6 hydrogen bonds for each reduced hydrogen bond.

From Table 2, most hydrogen bonds are between backbone atoms (68%), with the remainder distributed almost uniformly over other categories. The 42 proteins have slightly more than one hydrogen bond per residue (1.08) and slightly more than two-thirds of a reduced hydrogen bond per residue (0.67).

Both the total number of hydrogen bonds and the reduced number of hydrogen bonds can be reliably estimated from the fraction of hydrogen-bonded secondary structure. Table 2 lists the measured fraction of helix and sheet in each protein. Using these known fractions, a three-parameter, least-squares, best fit to the equation:

$$\text{number of hydrogen bonds} = A f_h N + B f_s N + C (1 - f_h - f_s) N \qquad (1)$$

was calculated, where $f_h$ is the fraction of helix, $f_s$ the fraction of sheet, and $N$ the number of residues in the protein. The value of the equation is taken as either the total or reduced number of hydrogen bonds.
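As an aside on the reduced number of hydrogen bonds: its definition (the maximum number of hydrogen bonds possible when every donor : acceptor pair is constrained to be 1:1) corresponds, in graph terms, to a maximum matching between donors and acceptors. The sketch below computes such a matching with a standard augmenting-path search. It illustrates the definition only; it is not the paring procedure described above, and the two may give different counts for some networks. The function name and the atom labels are hypothetical.

```python
def reduced_hbond_count(hbonds):
    """Size of the largest set of hydrogen bonds in which each donor and
    each acceptor is used at most once.

    hbonds: iterable of (donor_id, acceptor_id) pairs.
    """
    acceptors_of = {}
    for donor, acceptor in hbonds:
        acceptors_of.setdefault(donor, []).append(acceptor)

    donor_of = {}  # acceptor -> donor currently paired with it

    def try_assign(donor, seen):
        # Try to pair this donor, re-routing earlier pairs if necessary.
        # Recursion depth is bounded by the length of one network path.
        for acceptor in acceptors_of[donor]:
            if acceptor in seen:
                continue
            seen.add(acceptor)
            if acceptor not in donor_of or try_assign(donor_of[acceptor], seen):
                donor_of[acceptor] = donor
                return True
        return False

    return sum(1 for donor in acceptors_of if try_assign(donor, set()))

# A small network: N5 donates to O1 and O2, and N6 also donates to O2.
network = [("N5", "O1"), ("N5", "O2"), ("N6", "O2")]
total = len(network)
reduced = reduced_hbond_count(network)
```

Here three hydrogen bonds reduce to two 1:1 pairs, so the difference between the total and reduced counts is one network hydrogen bond.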
The subroutine ZXSSQ from the IMSL Subroutine Library (1987) was used to determine coefficients A, B and C that minimize the residual over the set of proteins. From the fitted coefficients, both the total and reduced number of hydrogen bonds can be estimated as:

$$\text{number of hydrogen bonds} = 1.49 f_h N + 0.65 f_s N + 0.50(1 - f_h - f_s)N \tag{2}$$

$$\text{number of reduced hydrogen bonds} = 0.80 f_h N + 0.61 f_s N + 0.40(1 - f_h - f_s)N \tag{3}$$
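Because eqn (1) is linear in A, B and C, the fit can be illustrated with ordinary linear least squares. The sketch below is a stand-in for the IMSL routine named above, not a reproduction of it: it solves the normal equations in plain Python, and the tuples of (helix fraction, sheet fraction, residue count, bond count) are synthetic values generated from the coefficients of eqn (2) rather than entries from Table 2. The function name is hypothetical.

```python
def fit_coefficients(proteins):
    """Least-squares fit of count = A*fh*N + B*fs*N + C*(1 - fh - fs)*N.

    proteins: iterable of (fh, fs, n_residues, observed_count).
    Returns [A, B, C] by solving the normal equations (X^T X) b = X^T y.
    """
    rows = [((fh * n, fs * n, (1.0 - fh - fs) * n), y) for fh, fs, n, y in proteins]
    # Build the 3x4 augmented normal-equation matrix.
    m = [[sum(x[i] * x[j] for x, _ in rows) for j in range(3)]
         + [sum(x[i] * y for x, y in rows)] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]


# Synthetic "proteins" that obey eqn (2) exactly (illustrative values only).
synthetic = []
for fh, fs, n in [(0.6, 0.1, 150), (0.2, 0.5, 230), (0.4, 0.2, 99),
                  (0.1, 0.4, 310), (0.3, 0.3, 180)]:
    y = 1.49 * fh * n + 0.65 * fs * n + 0.50 * (1.0 - fh - fs) * n
    synthetic.append((fh, fs, n, y))

A, B, C = fit_coefficients(synthetic)
```

With noise-free synthetic input the fit should return the generating coefficients to within rounding error; real data would of course leave a non-zero residual.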

In Figure 3, the number of hydrogen bonds predicted by equation (2) or (3) is plotted against the observed number of hydrogen bonds. As seen in the Figure, predicted values closely approximate observed values; the linear regression lines through these data have slopes near unity and intercepts near zero.

Comparison of the coefficients in equations (2) and (3) is instructive. Most network hydrogen bonds are seen to be within helices and turns. The ratio of the coefficients of $f_h$ in the two equations is almost 2:1, a consequence of the fact that segment geometry is usually consistent with both i→(i−4) and i→(i−3) hydrogen bonds. The co-operative nature of helical hydrogen bonds should enhance the stability of helices.

The total number of hydrogen bonds is expected to depend significantly upon the relative proportions of helix and sheet, since most network hydrogen bonds are in helices. However, the reduced number of hydrogen bonds, which is normalized to the number of donors/acceptors, should not have a similar dependency. In fact, a plot of the number of residues versus the reduced number of hydrogen bonds (from Table 2), shown in Figure 3(c), is reasonably linear, and can be described by the equation:

$$\text{number of reduced hydrogen bonds} = 0.714\,(\text{number of residues}) - 6.8 \quad (p = 0.99) \tag{4}$$
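As a worked illustration of eqns (2) to (4), the short Python sketch below evaluates all three for a hypothetical 100-residue protein with 40% helix and 20% sheet. The function name and the example composition are assumptions made for illustration only.

```python
def predict_hbonds(fh, fs, n_residues):
    """Return (total, reduced, reduced_from_length) estimates.

    fh, fs: fractions of helix and sheet; n_residues: chain length.
    """
    coil = 1.0 - fh - fs
    total = (1.49 * fh + 0.65 * fs + 0.50 * coil) * n_residues    # eqn (2)
    reduced = (0.80 * fh + 0.61 * fs + 0.40 * coil) * n_residues  # eqn (3)
    reduced_from_length = 0.714 * n_residues - 6.8                # eqn (4)
    return total, reduced, reduced_from_length


# Hypothetical protein: 100 residues, 40% helix, 20% sheet.
total, reduced, reduced_from_length = predict_hbonds(0.40, 0.20, 100)
# approximately 92.6, 60.2 and 64.6, respectively
```

The two estimates of the reduced number differ by a few bonds here, which is in keeping with eqn (4) being a composition-independent approximation.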

(c) Patterns of hydrogen bonding in proteins

The global census of hydrogen bonds, reported in Table 2, can be used to reveal patterns of hydrogen bonding in proteins. Histograms of the distance in sequence between all donor:acceptor pairs are shown in Figure 4. Four general conclusions are supported by the Figure:

(1) Helices and turns (i.e. i→i−4 and i→i−3 hydrogen bonds) are the predominant type of backbone:backbone hydrogen bond. Figure 4(a) shows large peaks at these positions, against a featureless, low-level background. Of the 4970 backbone:backbone hydrogen bonds in the database, 1838 (37%) make i→i−3 hydrogen bonds, 1582 (32%) make i→i−4 hydrogen bonds, and 1311 (26%) are in hydrogen-bonded strand pairs in β-sheet, a total of 95% in turns, helix and sheet. A note of caution is in order, however. Since most network hydrogen bonds are within helices, the combined i→i−4 and i→i−3 peaks in Figure 4(a) represent a much smaller number of reduced hydrogen bonds. From equations (2) and (3), the expected number of reduced backbone:backbone hydrogen bonds is 3257, with 56% in helix/turn and 38% in β-sheet.

(2) Most side-chain-donor to backbone-acceptor hydrogen bonds are local. Further, pronounced peaks are evident in Figure 4(b) for i→i−4 and i→i−3 interactions. Within these peaks, 90% of the i→i−4 and 82% of the i→i−3 interactions are in single helical segments, consistent with extensive C-terminal helix capping (data not shown) (Presta & Rose, 1988; Richardson & Richardson, 1988).

(3) Again, most backbone-donor to side-chain-acceptor hydrogen bonds are local. The main peaks in Figure 4(c) are at i→i−2, i→i−3 and i→i−1 and, to a lesser extent, at i→i+3. These positional preferences are even more sharply differentiated than those of side-chain donors to backbone acceptors. Within these four peaks, 78% of the i→i−3, 53% of the i→i−2, 43% of the i→i−1, and 93% of the i→i+3 interactions are in single helical segments, consistent with extensive N-terminal helix capping (data not shown) (Presta & Rose, 1988; Richardson & Richardson, 1988).

(4) While side-chain:side-chain hydrogen bonds are more uniformly distributed than those of other classes, a marked preference for local interactions is nonetheless evident in Figure 4(d).

The prevalence of local interactions in Figure 4(a) to (c) indicates that, in these cases, hydrogen bond formation is governed primarily by proximity in sequence. The distribution of side-chain:side-chain hydrogen bonds (Fig. 4(d)), though also biased toward local interactions, is visibly broader than the other three classes. This broader distribution suggests that non-local interactions may play a larger role in this case.

To investigate non-local interactions further, all hydrogen bonds between side-chains were classified by residue type, as shown in Table 3. A high incidence of hydrogen-bonded ion pairs is immediately apparent: more than two-thirds of all hydrogen bonds involving Lys, Arg and His are with acidic residues and, conversely, more than half of all hydrogen bonds involving Asp, Glu and His are with basic residues. Charged residues comprise the largest single category among both donors and acceptors. Of the total of 757 donors, 41% are either Lys or Arg, while 59% of the 769 acceptors are either Asp or Glu.

The absolute number of hydrogen bonds between side-chain groups is normalized in Table 3C. Corresponding to each donor:acceptor pair, (i,j), is a table entry, $f_{ij}$, which measures whether the pair is more common in the database than would be expected if assortment were random. The fraction $f_{ij}$ is given by:

$$f_{ij} = \frac{n_{ij}/N_{\text{pairs}}}{(n_i \times n_j)/N_{\text{total}}^{2}} \tag{5}$$

where $n_{ij}$ is the number of hydrogen-bonded pairs between side-chain groups of residues i and j, $n_i$ and $n_j$ are the total number of residues i and j in the database (from Table 3B), $N_{\text{pairs}}$ is the total number

Figure 3. Predicting the number of hydrogen bonds. Eqn (1) and the known fractions of hydrogen-bonded secondary structure from Table 2 were used to derive coefficients for predicting the number of hydrogen bonds. (a) The observed number of hydrogen bonds versus predicted number for 42 proteins, using eqn (2). The equation of the linear regression line through these data is: N(predicted) = 0.96 × N(total hydrogen bonds) + 8.6 (p = 0.99). (b) The observed number of reduced hydrogen bonds versus predicted number for 42 proteins, using eqn (3). The regression line is: N(predicted) = 0.95 × N(reduced hydrogen bonds) + 7.4 (p = 0.99). (c) The number of residues versus the reduced number of hydrogen bonds, from Table 2. The linear regression line through these data is given by eqn (4).

Figure 4. Sequential distance between donors and acceptors. Histograms show the number of hydrogen bonds in the database as a function of the sequential distance between donor and acceptor sites. An expanded scale, shown in insets, discloses the smaller details at larger distances. By convention, sequence distance is taken to be the (residue number of the donor) − (residue number of the acceptor); for classical α-helical >N-H···O=C< hydrogen bonds between i and i−4, this distance is +4. Peaks in the histograms reveal characteristic interactions in proteins, as discussed in the text. The plots are dominated by these large peaks. Plots are truncated at ±10, by which point all histograms have decayed to an undifferentiated, low-level background. Hydrogen bonds are separated into classes, each of which is displayed separately. (a) Backbone:backbone hydrogen bonds. (b) Side-chain donors to backbone acceptors. (c) Backbone donors to side-chain acceptors. (d) Side-chain:side-chain hydrogen bonds. (e) Same as (d), with charged residues (Lys, Arg, Asp and Glu) omitted.
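The sign convention in the legend can be made concrete with a few lines of Python. This is only a sketch under stated assumptions: bonds are supplied as (donor residue number, acceptor residue number) pairs within one chain, and the function name and sample bonds are hypothetical.

```python
from collections import Counter


def sequence_distance_histogram(bonds, cutoff=10):
    """Count bonds by (donor residue number - acceptor residue number),
    keeping only distances with absolute value <= cutoff."""
    distances = (donor - acceptor for donor, acceptor in bonds)
    return Counter(d for d in distances if abs(d) <= cutoff)


# Three i -> i-4 bonds, one i -> i-3 bond and one non-local bond.
bonds = [(5, 1), (6, 2), (6, 3), (7, 3), (40, 12)]
hist = sequence_distance_histogram(bonds)
```

Here the three helical bonds fall in the +4 bin, the turn-like bond in the +3 bin, and the non-local bond (distance +28) is dropped by the truncation.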

Table 3

A. Side-chain:side-chain hydrogen bonds

                              Acceptors
Donors   Met   Ser   Thr   Cys   Gln   Asn   Tyr   Asp   Glu   His   Sum
Ser        1    11    14     1     9     7     7    33    22     6   111
Thr        1     7    10     2     3     6     3    26     8     5    71
Gln        0     6     4     0     2     1     6     9     6     1    35
Asn        1    10     2     1     7     8     5    25     9     2    70
Tyr        2     2     2     1     4     4     3    24    19     3    64
Trp        2     1     2     0     2     1     2     8     1     1    20
His        3     6     4     2     1     4     2    21    22     7    72
Lys        0     7     9     0     5     6    14    54    52     4   151
Arg        1    13     6     0     6     6    11    72    41     6   162
Sum       11    63    53     7    39    43    53   272   180    35

The number of side-chain:side-chain hydrogen bonds, for each type of residue pair. Donors are arrayed in rows, acceptors in columns. Partial sums for each residue type are shown. Among the 6734 residues in the database of 42 proteins, there are 757 donors and 769 acceptors involved in a total of 756 hydrogen bonds between side-chains.

B. Residues by type

Ala   539      Leu   496
Arg   226      Lys   414
Asn   330      Met   121
Asp   400      Phe   236
Cys   171      Pro   276
Gln   247      Ser   530
Glu   339      Thr   416
Gly   600      Trp   106
His   156      Tyr   305
Ile   351      Val   475

A listing of residues by type. There are 6734 residues in the database of 42 proteins.

C. Normalized side-chain:side-chain hydrogen bonds

                              Acceptors
Donors    Met    Ser    Thr    Cys    Gln    Asn    Tyr    Asp    Glu    His
Ser      0.16   0.43   0.70   0.12   0.74   0.44   0.48   1.70   1.33   0.80
Thr      0.22   0.34   0.63   0.30   0.32   0.48   0.26   1.70   0.62   0.84
Gln      0.00   0.50   0.42   0.00   0.36   0.14   0.86   1.00   0.78   0.28
Asn      0.28   0.62   0.16   0.20   0.94   0.80   0.54   2.06   0.88   0.42
Tyr      0.60   0.14   0.18   0.20   0.58   0.44   0.35   2.14   2.00   0.68
Trp      1.70   0.20   0.50   0.00   0.84   0.32   0.68   2.06   0.30   0.66
His      1.74   0.80   0.67   0.82   0.28   0.84   0.46   3.66   4.52   3.13
Lys      0.00   0.34   0.56   0.00   0.54   0.48   1.20   3.56   4.04   0.68
Arg      0.40   1.18   0.70   0.00   1.18   0.88   1.74   8.68   5.82   1.86

The database of 6734 residues from 42 proteins contains 2750 residues with side-chain donors and 3015 residues with side-chain acceptors. From these, there are 756 hydrogen-bonded pairs between side-chains. Plainly, most polar side-chain groups do not form hydrogen bonds with other polar side-chain groups. The question addressed by the Table is whether a given hydrogen-bonded pair, (i,j), is more common in the database than it would be if assortment were random. A value of unity represents random assortment; values of less than (greater than) unity indicate that the given pair occurs less frequently (more frequently) than would be expected by chance.

of hydrogen-bonded pairs between all side-chain groups, and $N_{\text{total}}$ is the total number of residues in the database. A value of unity represents random assortment; that is, the frequency of occurrence of pair (i,j) among all pairs matches the frequency of occurrence of residues i and j among all residues. Values less than (greater than) unity indicate that the given pair is observed less frequently (more frequently) than expected by chance. It should be noted that the ordered pair (i,j) differs from the pair (j,i). In particular, (i,i) denotes the pair formed between residue i as donor and residue i as acceptor, and this case differs only trivially from that of a general (i,j) pair with i≠j.

Upon normalization, it becomes apparent that most hydrogen bonds between side-chain groups

Figure 5. Assortment of hydrogen bonds with secondary structure. Histograms show the normalized fraction of hydrogen bonds, by type, that are wholly or partially contained within units of regular secondary structure. Filled patterns represent that fraction within a single element of regular secondary structure, i.e. both donor and acceptor contributed by residues from a single α-helix or β-sheet. Checkerboard patterns represent that fraction between elements of regular secondary structure, i.e. either donor or acceptor contributed by residues from one α-helix or β-sheet with remaining partner contributed by a second helix or sheet. Open patterns represent that fraction half-in an element of regular secondary structure, i.e. either donor or acceptor contributed by residues from an α-helix or β-sheet with remaining partner from a residue outside regular secondary structure. Cross-hatched patterns represent that fraction without regular secondary structure, i.e. neither donor nor acceptor contributed by residues belonging to an element of regular secondary structure. As shown, fractions were normalized within each of four categories: backbone to backbone (N:O), side-chain donor to backbone acceptor (R:O), backbone donor to side-chain acceptor (N:R), and side-chain donor to side-chain acceptor (R:R). Almost all backbone to backbone hydrogen bonds are seen to be within single elements of secondary structure.

occur less often than expected by chance. Still, hydrogen-bonded ion pairs are a conspicuous exception, as seen in Table 3C. When charged residues are subtracted from the ensemble of hydrogen bonds between side-chains, the remaining distribution, shown in Figure 4(e), resembles those seen in Figure 4(a) to (c), with local peaks at or near i→i+2 against a uniform low-level background. Summarizing Figure 4, with the exception of hydrogen-bonded ion pairs, all peaks are local.

(d) Hydrogen-bonded secondary structure

Hydrogen-bonded secondary structure assignments were made using algorithms described in Materials and Methods. It should be emphasized that segments of secondary structure identified by these algorithms can include non-hydrogen-bonded residues. For example, an isolated i→i−3 hydrogen

Figure 6. Hydrogen bond geometry: bond lengths. Hydrogen bond lengths were not allowed to exceed the summed van der Waals' radii of their respective electronegative heavy atoms (see Materials and Methods); no other distance-dependent criteria were imposed. For the 42 proteins in this study, the histogram of observed lengths is narrowly distributed around 2.9 to 3.0 Å, the optimal distance expected for >C=O···H-N< hydrogen bonds when N, H and O are collinear.

bond would be classified as a four-residue turn, consisting of residues (i, i−1, i−2 and i−3). A consequence of this classification strategy is that segments tend to be slightly longer than their counterparts identified using multiple criteria, especially in the case of helices. In comparison with the stringent criteria of Presta & Rose (1988), the present classification encompasses all actual helices together with the peptide chain turns at either terminus. Despite these methodological constraints, secondary structure assignments based solely upon hydrogen bonding are objective and automatic, and they are appropriate for studying trends in hydrogen bonding per se. This latter assertion is borne out by the distribution of segment lengths (not shown) among identified segments of helix and sheet, which resembles those derived using more sophisticated, multi-parameter methods (Kabsch & Sander, 1983).

All proteins in this study have extensive secondary structure (Table 2). The total fraction of helix, hydrogen-bonded turns and sheet ranges from 54% in glutathione peroxidase to 98% in erythrocruorin hemoglobin, with a mean of 82% for the 42 proteins. These secondary structure assignments can be used to analyze the extent to which a protein's hydrogen bonds are apportioned between regular and non-regular secondary structure. As shown in Figure 5, the ensemble of hydrogen bonds was partitioned into four categories: (1) wholly within an element of regular secondary structure; (2) between elements of regular secondary structure; (3) half-in an element of regular secondary structure; and (4) excluded entirely from elements of regular secondary structure.
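The four-way partition can be sketched as a simple classifier. In the Python below, each residue is assumed to carry a hypothetical segment identifier naming the helix or sheet that contains it, or None when it lies outside regular secondary structure; these identifiers, the function names and the sample bonds are illustrative and are not taken from the assignment algorithms used in this study.

```python
from collections import Counter


def classify_bond(donor_segment, acceptor_segment):
    """Assign one hydrogen bond to one of the four categories above."""
    if donor_segment is None and acceptor_segment is None:
        return "excluded"   # category (4)
    if donor_segment is None or acceptor_segment is None:
        return "half-in"    # category (3)
    if donor_segment == acceptor_segment:
        return "within"     # category (1)
    return "between"        # category (2)


def category_fractions(bonds):
    """Normalized fraction of bonds in each category that occurs."""
    counts = Counter(classify_bond(d, a) for d, a in bonds)
    total = sum(counts.values())
    return {name: counts[name] / total for name in counts}


# Hypothetical bonds given as (donor segment, acceptor segment).
bonds = [("H1", "H1"), ("H1", "H1"), ("H1", "S1"), ("S1", None), (None, None)]
fractions = category_fractions(bonds)
```

Normalizing within a bond class, as in Figure 5, then amounts to calling the second function once per class of donor:acceptor pair.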

Figure 7. Hydrogen bond geometry: bond angles. The preferred geometry found in both small-molecule and protein crystal structures was used to establish screening criteria, as described in Materials and Methods. Histograms show the distribution of angles in the 42-protein database. The scalar angle at the donor atom, A-D-DD, was divided into (a) sp2 and (b) sp3 donors. Note: extensive studies of carbohydrates (Ceccarelli et al., 1981) provide a model for -OH sp3 donors, but comparable models for the ε-NH3+ donor in Lys do not exist. This fact, together with the comparative disorder of Lys side-chains, has prompted us to relax slightly the limits imposed on sp3 donors from Lys residues, which are excluded from the distribution in (b). Similarly, the scalar angle at the acceptor atom, D-A-AA, was divided into (c) sp2 and (d) sp3 acceptors. The apparent bimodal distribution of sp2 acceptors, seen in (c), is due to the differing geometries of i→i−3 and i→i−4 hydrogen bonds, which are shown individually in (e) and (f), respectively.

As seen in the Figure, more than 95% (namely, 4731 of the 4970) of the backbone:backbone hydrogen bonds are within single elements of secondary structure, i.e. an individual helix or paired strands of β-sheet. Hydrogen bonds involving side-chains are distributed almost uniformly across the first three categories, with few excluded entirely from regular secondary structure.

(e) Hydrogen bonding geometry

The variability observed in hydrogen bonding geometry is analyzed in this section. Geometry is necessarily constrained to lie within the limits imposed by the methods used to identify hydrogen bonds. Within these constraints, however, hydrogen bond lengths and angles may vary from a uniform distribution within their permitted ranges to a narrow clustering around their ideal values. Of course, better geometry will result in stronger hydrogen bonds.

The observed distribution of hydrogen bond lengths is shown in Figure 6. According to Baker & Hubbard (1984), the mean O···N distance for >C=O···H-N< hydrogen bonds is 2.99 Å (1 Å = 0.1 nm) in α-helices and 2.91 Å in β-sheet. Our lumped distances for all classes lie near this range, with a mean of 3.04 (±0.24) Å.

The distribution of donor angles (i.e. scalar angle between acceptor-donor-donor antecedent) is shown in Figure 7, subdivided into sp2 donors (Fig. 7(a)) and sp3 donors (Fig. 7(b)). In both cases, the distributions are narrow and have peaks that coincide with ideal trigonal (120°) or tetrahedral (109.5°) values, respectively. The distribution of acceptor angles (i.e. scalar angle between donor-acceptor-acceptor antecedent) is also shown in Figure 7, subdivided into sp2 acceptors (Fig. 7(c)) and sp3 acceptors (Fig. 7(d)). Both histograms are broadly distributed around ideal values. The apparent bimodal distribution of sp2 acceptors (Fig. 7(c)) is a consequence of the distinctly different i→i−3 and i→i−4 hydrogen bond geometries; both types are abundant in proteins. This bimodal distribution is broken down into individual components. In Figure 7(e) are shown acceptor angles involved in an N:O hydrogen bond between i and i−3, and in Figure 7(f) are shown corresponding angles between i and i−4. From the Figure, it is apparent that the geometry of i→i−3 hydrogen bonds is less favorable than that of i→i−4 hydrogen bonds.
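For readers who wish to reproduce such measurements, the sketch below shows one way to compute a heavy-atom distance and a scalar angle from Cartesian co-ordinates. It is a generic geometric calculation, not the screening code used in this study; the function name is hypothetical and the co-ordinates are invented so that the acceptor lies 3.0 Å from the donor at an ideal trigonal angle of 120°. It assumes Python 3.8 or later for `math.dist` and multi-argument `math.hypot`.

```python
import math


def angle_deg(a, b, c):
    """Scalar angle at vertex b formed by points a-b-c, in degrees."""
    ba = [x - y for x, y in zip(a, b)]
    bc = [x - y for x, y in zip(c, b)]
    dot = sum(x * y for x, y in zip(ba, bc))
    cos_theta = dot / (math.hypot(*ba) * math.hypot(*bc))
    # Clamp to guard against rounding just outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))


# Invented co-ordinates (in Å): donor D, donor antecedent DD, acceptor A.
donor = (0.0, 0.0, 0.0)
donor_antecedent = (1.0, 0.0, 0.0)
acceptor = (3.0 * math.cos(math.radians(120.0)),
            3.0 * math.sin(math.radians(120.0)),
            0.0)

length = math.dist(donor, acceptor)                        # D...A distance
donor_angle = angle_deg(acceptor, donor, donor_antecedent)  # A-D-DD angle
```

The same function gives the acceptor angle D-A-AA when called with the acceptor as the middle argument.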

Discussion

(a) Hydrogen bonds and secondary structure

Ideas about the role of hydrogen bonding in proteins have changed over time. It was Pauling's view that intramolecular hydrogen bonding provides the driving force for folding (Pauling et al., 1951), but, within the decade, this notion was reappraised, and the hydrogen bond came to be perceived as energetically neutral or even unfavorable (Klotz & Franzen, 1962; Susi et al., 1964). The issue continues to evolve, as underscored by recent work of Privalov & Gill (1988) who, once again, attribute a significant fraction of the folding energy to hydrogen bonding. If they are correct, then the interpretation of hydrogen bonding will have come full circle.

One approach to understanding hydrogen bonding in proteins is to survey molecules of known structure for diagnostic patterns. In the present work, all hydrogen bonds in 42 high-resolution, X-ray-elucidated proteins from the Brookhaven database (Bernstein et al., 1977) were counted and classified systematically. Upon analysis, several patterns of interest emerge. Most hydrogen bonds are seen to be local (Fig. 4) (i.e. between donors and acceptors that are proximate in sequence) and predominantly within single elements of secondary structure (Fig. 5). On average, a large majority of the protein (82%) is involved in regular, hydrogen-bonded secondary structure, with slightly more than two-thirds (68%) of all hydrogen bonds situated between backbone polar groups (>N-H···O=C<).

Of particular interest are network hydrogen bonds, in which each donor (acceptor) is bonded to multiple acceptors (donors). Almost all central residues of helices are within networks because helix geometry is commensurate with both i→i−4 and i→i−3 hydrogen bonds simultaneously (see Fig. 7(e) and (f)). Such networks are evident in Figure 2(a) as two adjacent traces that parallel the main diagonal at positions of helices. Most network hydrogen bonds are localized within helices, as demonstrated by comparison of equations (2) and (3). The 42 proteins have 7299 hydrogen bonds, slightly more than one per residue (1.08). It is likely that the existence of multiply determined hydrogen bonds enhances the elasticity of an α-helix.

Returning to the topic of hydrogen bond energetics, an open question asks whether intramolecular hydrogen bonds can be favorable in 55 M water. In proteins, this question is of crucial importance when evaluating helix capping; namely, that process in which hydrogen bond partners for the first four >N-H groups and the last four >C=O groups of helices are supplied by polar side-chains of residues that flank the helix termini (Presta & Rose, 1988; Richardson & Richardson, 1988). If helix capping is ubiquitous, then corresponding side-chain to main-chain hydrogen bonds should be both favorable and abundant in proteins. An indirect answer to this question is furnished by the analysis illustrated in Figure 4(b) and (c). Within the entire spectrum of side-chain:main-chain hydrogen bonds, the only pronounced peaks are situated at helix capping loci, against an otherwise undifferentiated, low-level background. The Figure provides persuasive, albeit indirect, evidence for the stability of this category of intramolecular hydrogen bonding. Indeed, we suspect that a directed analysis of existing X-ray-elucidated crystal structures may disclose the existence of further, hitherto overlooked, interactions of this type.

Although most hydrogen bonds are local, hydrogen-bonded ion pairs represent a notable exception, as illustrated in Figure 4(d) and (e). This abundant class of interactions comprises more than half of all side-chain:side-chain hydrogen bonds (Table 3). The frequent occurrence of hydrogen-bonded ion pairs suggests a correlative significance in protein folding and stability (Barlow & Thornton, 1983; Rashin & Honig, 1984). Most proteins are stable over a broad range of pH that extends, in the extreme, to the pK values of charged residues (Privalov, 1979). This observation has been construed to mean that charges per se are not the relevant factor. However, the predominance of charged pairs, conspicuous in Table 3C, is provocative.

Several useful rules-of-thumb emerge from our study. The reduced number of hydrogen bonds scales linearly with chain length (eqn (4)), with an average of 0.71 hydrogen bond per residue. Hydrogen bonds (both total and reduced) can be well estimated from secondary structure content (eqns (2) and (3)). In the case of network hydrogen bonds, most are found within helices.

(b) Implications for protein folding

In native proteins, approximately half of all polar groups are buried within the molecular interior, sequestered from bulk solvent (Chothia, 1975, 1976; Richards, 1977). In the unfolded state, such groups could form hydrogen bonds with solvent readily. Hence, if these same groups were not satisfied in the folded state, then hydrogen bonding would favor denaturation. Not surprisingly then, it has been found that buried polar groups in proteins are almost invariably hydrogen bonded (Chothia, 1975; Richards, 1977).

The observation that most buried polar groups are hydrogen bonded confirms a plausible expectation. Less obvious is the mechanism by which such hydrogen bonding is achieved. Specifically, how do polar groups realize hydrogen bonds so effectively while constrained within a molecular interior that is as tightly packed as an organic crystal (Richards, 1977)?

The experimental finding that most hydrogen bonds are local (i.e. between partners close in sequence) can rationalize this conundrum. As shown in the preceding analysis, the majority of hydrogen bonds are between backbone groups and within elements of secondary structure. Extensive side-chain to main-chain hydrogen bonding is observed too; these interactions are also local, primarily within single helices. The prevalence of local interactions, including both main-chain and side-chain groups, is consistent with a process wherein most hydrogen bonds are formed between nearby partners at an early step in folding. Accretion of folded structure would then involve persistence of this hydrogen-bonded framework (Kim & Baldwin, 1982), together with concomitant or subsequent condensation of these nascent elements of secondary structure into larger modules (Zehfus & Rose, 1986). Iteration of this step-wise condensation process would lead to the observed hierarchic architecture of protein molecules (Crippen, 1978; Rose, 1979).

We thank Eaton Lattman and Bruno Zimm for their critical reading of this manuscript, and David Custer for technical assistance. This study was supported by National Institutes of Health grant GM 29458.

References

Almassy, R. J. & Dickerson, R. E. (1978). Pseudomonas cytochrome c551 at 2.0 Å resolution: enlargement of the cytochrome c family. Proc. Nat. Acad. Sci., U.S.A. 75, 2674-2678.
Almassy, R. J., Fontecilla-Camps, J. C., Suddath, F. L. & Bugg, C. E. (1983). Structure of variant-3 scorpion neurotoxin from Centruroides sculpturatus Ewing, refined at 1.8 Å resolution. J. Mol. Biol. 170, 497-527.
Artymiuk, P. J. & Blake, C. C. F. (1981). Refinement of human lysozyme at 1.5 Å resolution. Analysis of non-bonded and hydrogen-bond interactions. J. Mol. Biol. 152, 737-762.
Baker, E. N. (1980). Structure of actinidin, after refinement at 1.7 Å resolution. J. Mol. Biol. 141, 441-484.
Baker, E. N. & Hubbard, R. E. (1984). Hydrogen bonding in globular proteins. Progr. Biophys. Mol. Biol. 44, 97-179.
Baker, E. N., Blundell, T. L., Cutfield, J. F., Cutfield, S. M., Dodson, E. J., Dodson, G. G., Hodgkin, D. M. C., Hubbard, R. E., Isaacs, N. W., Reynolds, C. D., Sakabe, K., Sakabe, N. & Vijayan, M. (1988). The structure of 2Zn pig insulin crystals at 1.5 Å resolution. Phil. Trans. Roy. Soc. Lond. B, 319, 369-456.
Barlow, D. J. & Thornton, J. M. (1983). Ion-pairs in proteins. J. Mol. Biol. 168, 867-885.
Barlow, D. J. & Thornton, J. M. (1988). Helix geometry in proteins. J. Mol. Biol. 201, 601-619.
Bernstein, F. C., Koetzle, T. G., Williams, G. J. B., Meyer, E. F., Jr, Brice, M. D., Rogers, J. R., Kennard, O., Shimanouchi, T. & Tasumi, M. (1977). The protein data bank: a computer-based archival file for macromolecular structures. J. Mol. Biol. 112, 535-542.
Blake, C. C., Pulford, W. C. & Artymiuk, P. J. (1983). X-ray studies of water in crystals of lysozyme. J. Mol. Biol. 167, 693-723.
Blevins, R. A. & Tulinsky, A. (1985). The refinement and the structure of the dimer of alpha-chymotrypsin at 1.67 Å resolution. J. Biol. Chem. 260, 4264-4275.
Bode, W. & Schwager, P. (1975). The refined crystal structure of bovine beta-trypsin at 1.8 Å resolution. II. Crystallographic refinement, calcium binding site, benzamidine binding site and active site at pH 7.0. J. Mol. Biol. 98, 693-717.
Bolin, J. T., Filman, D. J., Matthews, D. A., Hamlin, R. C. & Kraut, J. (1982). Crystal structures of Escherichia coli and Lactobacillus casei dihydrofolate reductase refined at 1.7 Å resolution. J. Biol. Chem. 257, 13,650-13,662.
Ceccarelli, C., Jeffrey, G. A. & Taylor, R. (1981). A survey of O-H···O hydrogen-bond geometries determined by neutron diffraction. J. Mol. Struct. 70, 255-271.
Chothia, C. (1975). Structural invariants in protein folding. Nature (London), 254, 304-308.
Chothia, C. (1976). The nature of the accessible and buried surfaces in proteins. J. Mol. Biol. 105, 1-14.
Corey, R. B. (1948). X-ray studies of amino acids and peptides. Advan. Prot. Chem. 4, 385-406.
Crippen, G. M. (1978). The tree structural organization of proteins. J. Mol. Biol. 126, 315-332.
Dijkstra, B. W., Kalk, K. H., Hol, W. G. J. & Drenth, J. (1981). Structure of bovine pancreatic phospholipase A2 at 1.7 Å resolution. J. Mol. Biol. 147, 97-123.
Dill, K. A. (1990). Dominant forces in protein folding. Biochemistry, 29, 7133-7155.
Epp, O., Ladenstein, R. & Wendel, A. (1983). The refined structure of the selenoenzyme glutathione peroxidase at 0.2-nm resolution. Eur. J. Biochem. 133, 51-69.
Eriksson, A. E., Jones, T. A. & Liljas, A. (1988). Refined structure of human carbonic anhydrase II at 2.0 Å resolution. Proteins: Struct. Funct. Genet. 4, 274-282.
Factor, A. D. & Mehler, E. L. (1991). Graphical representation of hydrogen bonding patterns in proteins. Protein Eng. 4, 421-425.
Fetrow, J. S., Zehfus, M. H. & Rose, G. D. (1988). Protein folding: new twists. Bio/Technol. 6, 167-171.
Furey, W., Jr, Wang, B. C., Yoo, C. S. & Sax, M. (1983). Structure of a novel Bence-Jones protein (Rhe) fragment at 1.6 Å resolution. J. Mol. Biol. 167, 661-692.
Glover, I., Haneef, I., Pitts, J., Wood, S., Moss, D., Tickle, I. & Blundell, T. (1983). Conformational flexibility in a small globular hormone: X-ray analysis of avian pancreatic polypeptide at 0.98 Å resolution. Biopolymers, 22, 293-304.
Guss, J. M. & Freeman, H. C. (1983). Structure of oxidized poplar plastocyanin at 1.6 Å resolution. J. Mol. Biol. 169, 521-563.
Hendrickson, W. A. & Teeter, M. M. (1981). Structure of the hydrophobic protein crambin determined directly from the anomalous scattering of sulphur. Nature (London), 290, 107-113.
Hendrickson, W. A., Love, W. E. & Karle, J. (1973). Crystal structure analysis of sea lamprey hemoglobin at 2 angstrom resolution. J. Mol. Biol. 74, 331-361.
Higuchi, Y., Kusunoki, M., Matsuura, Y., Noritake, Y. & Kakudo, M. (1984). Refined structure of cytochrome c3 at 1.8 Å resolution. J. Mol. Biol. 172, 109-139.
Holmes, M. A. & Matthews, B. W. (1982). Structure of thermolysin refined at 1.6 Å resolution. J. Mol. Biol. 160, 623-639.
IMSL (1987). International Mathematical Subroutine Library, 7500 Bellaire Blvd., Houston, TX 77036.
Ippolito, J. A., Alexander, R. S. & Christianson, D. W. (1990). Hydrogen bond stereochemistry in protein structure and function. J. Mol. Biol. 215, 457-471.
James, M. N. G. & Sielecki, A. R. (1983). Structure and refinement of penicillopepsin at 1.8 Å resolution. J. Mol. Biol. 163, 299-361.
Jeffrey, G. A. & Maluszynska, H. (1982). A survey of hydrogen-bond geometries in the crystal structures of amino acids. Int. J. Biol. Macromol. 4, 173-185.
Jeffrey, G. A. & Saenger, W. (1991). Hydrogen Bonding in Biological Structures, Springer-Verlag, New York.
Kabsch, W. & Sander, C. (1983). Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers, 22, 2577-2637.
Kamphuis, I. G., Kalk, K. H., Swarte, M. B. & Drenth, J. (1984). Structure of papain refined at 1.65 Å resolution. J. Mol. Biol. 179, 233-256.
Karplus, P. A. & Schulz, G. E. (1987). Refined structure of glutathione reductase at 1.54 Å resolution. J. Mol. Biol. 195, 701-729.
Kauzmann, W. (1959). Some factors in the interpretation of protein denaturation. Advan. Prot. Chem. 14, 1-64.
Kim, P. S. & Baldwin, R. L. (1982). Specific intermediates in the folding of small proteins and the mechanism of protein folding. Annu. Rev. Biochem. 51, 459-489.
Klotz, I. M. & Franzen, J. S. (1962). Hydrogen bonds between model peptide groups in solution. J. Amer. Chem. Soc. 84, 3461-3466.
Koepke, J., Maslowska, M., Heinemann, U. & Saenger, W. (1989). Three-dimensional structure of ribonuclease T1 complexed with guanylyl-2',5'-guanosine at 1.8 Å resolution. J. Mol. Biol. 206, 475-488.
Liljas, A. & Rossmann, M. G. (1974). X-ray studies of protein interactions. Annu. Rev. Biochem. 43, 475-507.
Loll, P. J. & Lattman, E. E. (1989). The crystal structure of the ternary complex of staphylococcal nuclease, Ca2+, and the inhibitor pdTp, refined at 1.65 Å. Proteins: Struct. Funct. Genet. 5, 183-201.
Murray-Rust, P. & Glusker, J. P. (1984). Directional hydrogen-bonding to sp2 and sp3 hybridized oxygen atoms and its relevance to ligand-macromolecule interactions. J. Amer. Chem. Soc. 106, 1018-1025.
Norris, G. E., Anderson, B. F. & Baker, E. N. (1986). Blue copper proteins. The copper site in azurin from Alcaligenes denitrificans. J. Amer. Chem. Soc. 108, 2784-2785.
Papamokos, E., Weber, E., Bode, W., Huber, R., Empie, M. W., Kato, I. & Laskowski, M. (1982). Crystallographic refinement of Japanese quail ovomucoid, a Kazal-type inhibitor, and model building studies of complexes with serine proteases. J. Mol. Biol. 158, 515-537.
Pauling, L. & Corey, R. B. (1951). The pleated sheet, a new layer configuration of polypeptide chains. Proc. Nat. Acad. Sci., U.S.A. 37, 251-256.
Pauling, L., Corey, R. B. & Branson, H. R. (1951). The structure of proteins: two hydrogen-bonded helical configurations of the polypeptide chain. Proc. Nat. Acad. Sci., U.S.A. 37, 205-211.
Perutz, M. F. (1951). New X-ray evidence on the configuration of polypeptide chains. Nature (London), 167, 1053-1054.
Pflugrath, J. W., Wiegand, G., Huber, R. & Vertesy, L. (1986). Crystal structure determination, refinement and the molecular model of the alpha-amylase inhibitor Hoe-467A. J. Mol. Biol. 189, 383-386.
Phillips, S. E. (1980). Structure and refinement of oxymyoglobin at 1.6 Å resolution. J. Mol. Biol. 142, 531-54.
Pierrot, M., Haser, R., Frey, M., Bruschi, M., le Gall, J., Sieker, L. C. & Jensen, L. H. (1976). Some comparisons between two crystallized anaerobic bacterial rubredoxins from Desulfovibrio gigas and D. vulgaris. J. Mol. Biol. 107, 179-182.
Presta, L. G. & Rose, G. D. (1988). Helix signals in proteins. Science, 240, 1632-1641.
Privalov, P. L. (1979). Stability of proteins. Advan. Protein Chem. 33, 167-241.
Privalov, P. L. & Gill, S. J. (1988). Stability of protein structure and hydrophobic interaction. Advan. Protein Chem. 39, 191-234.
Rashin, A. A. & Honig, B. (1984). On the environment of ionizable groups in globular proteins. J. Mol. Biol. 173, 515-521.
Rees, D. C., Lewis, M. & Lipscomb, W. N. (1983). Refined crystal structure of carboxypeptidase A at 1.54 Å resolution. J. Mol. Biol. 168, 367-387.
Remington, S., Wiegand, G. & Huber, R. (1982). Crystallographic refinement and atomic models of two different forms of citrate synthase at 2.7 and 1.7 Å resolution. J. Mol. Biol. 158, 111-152.
Richards, F. M. (1977). Areas, volumes, packing and protein structure. Annu. Rev. Biophys. Bioeng. 6, 151-176.
Richards, F. M. & Kundrot, C. E. (1988). Identification of structural motifs from protein co-ordinate data: secondary structure and first-level supersecondary structure. Proteins: Struct. Funct. Genet. 3, 71-84.
Richardson, J. S. & Richardson, D. C. (1988). Amino acid preferences for specific locations at the ends of α helices. Science, 240, 1648-1652.
Richardson, J. S., Getzoff, E. D. & Richardson, D. C. (1978). The β bulge: a common small unit of nonrepetitive protein structure. Proc. Nat. Acad. Sci., U.S.A. 75, 2574-2578.
Rose, G. D. (1979). Hierarchic organization of domains in globular proteins. J. Mol. Biol. 134, 447-470.
Rose, G. D., Gierasch, L. M. & Smith, J. A. (1985). Turns in peptides and proteins. Advan. Protein Chem. 37, 1-109.
Sielecki, A. R., Fedorov, A. A., Boodhoo, A., Andreeva, N. S. & James, M. N. (1990). Molecular and crystal structures of monoclinic porcine pepsin refined at 1.8 Å resolution. J. Mol. Biol. 214, 143-170.
Singh, J., Thornton, J. M., Snarey, M. & Campbell, S. F. (1987). The geometries of interacting arginine-carboxyls in proteins. FEBS Letters, 224, 161-171.
Skarzynski, T., Moody, P. C. E. & Wonacott, A. J. (1987). Structure of holo-glyceraldehyde-3-phosphate dehydrogenase from Bacillus stearothermophilus at 1.8 angstrom resolution. J. Mol. Biol. 193, 171-187.
Smith, W. W., Burnett, R. M., Darling, G. D. & Ludwig, M. L. (1977). Structure of the semiquinone form of flavodoxin from Clostridium MP. J. Mol. Biol. 117, 195-225.
Stenkamp, R. E., Sieker, L. C., Jensen, L. H., McCallum, J. D. & Sanders-Loehr, J. (1985). Active site structures of deoxyhemerythrin and oxyhemerythrin. Proc. Nat. Acad. Sci., U.S.A. 82, 713-716.
Susi, H., Timasheff, S. N. & Ard, J. S. (1964). Near infrared investigation of interamide hydrogen bonding in aqueous solution. J. Biol. Chem. 239, 3051-3054.
Takano, T. & Dickerson, R. E. (1981).
Conformation change of cytochromOe c. I. Ferrocytochrome c structure refined at 1.5 A resolution. J. Mol. Biol. 153, 79-94. Taylor. R. & Kennard, 0. (1984). Hydrogen bond geometry in organic crystals. Act. Chem. Res. 17, 320-326.


Taylor, R., Kennard, O. & Versichel, W. (1983). Geometry of the N-H...O=C hydrogen-bond. 1. Lone pair directionality. J. Amer. Chem. Soc. 105, 5761-5766.
Taylor, R., Kennard, O. & Versichel, W. (1984). Geometry of the N-H...O=C hydrogen-bond. 3. Hydrogen bond distances and angles. Acta Crystallogr. 40, 280-288.
Vedani, A. & Dunitz, J. D. (1985). Lone-pair directionality in hydrogen-bond potential functions for molecular mechanics calculations: the inhibition of human carbonic anhydrase II by sulfonamides. J. Amer. Chem. Soc. 107, 7653-7658.
Venkatachalam, C. M. (1968). Stereochemical criteria for polypeptides and proteins: conformation of a system of three linked peptide units. Biopolymers, 6, 1425-1436.
Vijay-Kumar, S., Bugg, C. E. & Cook, W. J. (1987). Structure of ubiquitin refined at 1.8 Å resolution. J. Mol. Biol. 194, 531-544.
Weaver, L. H. & Matthews, B. W. (1987). Structure of bacteriophage T4 lysozyme refined at 1.7 Å resolution. J. Mol. Biol. 193, 189-199.
Weber, E., Steigemann, W., Jones, T. A. & Huber, R. (1978). The structure of oxy-erythrocruorin at 1.4 Å resolution. J. Mol. Biol. 120, 327-336.
Wistow, G., Turnell, B., Summers, L., Slingsby, C., Moss, D., Miller, L., Lindley, P. & Blundell, T. (1983). X-ray analysis of the eye lens protein gamma-II crystallin at 1.9 Å resolution. J. Mol. Biol. 170, 175-202.
Wlodawer, A., Bott, R. & Sjolin, L. (1982). The refined crystal structure of ribonuclease A at 2.0 Å resolution. J. Biol. Chem. 257, 1325-1332.
Wlodawer, A., Deisenhofer, J. & Huber, R. (1987). Comparison of two highly refined structures of bovine pancreatic trypsin inhibitor. J. Mol. Biol. 193, 145-156.
Zehfus, M. H. & Rose, G. D. (1986). Compact units in proteins. Biochemistry, 25, 5759-5765.
Zhang, R. G., Joachimiak, A., Lawson, C. L., Schevitz, R. W., Otwinowski, Z. & Sigler, P. B. (1987). The crystal structure of trp aporepressor at 1.8 Å shows how binding tryptophan enhances DNA affinity. Nature (London), 327, 591-597.

Edited by P. E. Wright
