J. theor. Biol. (1979) 80, 485-504

The Spatial Distribution of Physical, Chemical, Energetic and Conformational Properties of Amino Acid Residues in Globular Proteins

M. PRABHAKARAN AND P. K. PONNUSWAMY

Department of Physics, Autonomous Postgraduate Centre, University of Madras, Tiruchirapalli 620 020, Tamilnadu, India

(Received 6 October 1978, and in revised form 15 March 1979)

An attempt is made to obtain information as to the spatial distribution of amino acid residues in globular proteins in terms of their chemical, physical, energetic and conformational properties. The crystallographic data on twenty-one protein molecules form the basis for the study. The properties of the residues, namely, hydrophobicity, polarity, acidity, molecular weight, bulkness, chromatographic index, refractivity, short/medium/long-range energetics, and powers to adopt α-helical, extended and bend structures are analysed by dividing the protein globule into six concentric shells containing equal numbers of residues. The results show that the decisive factor in determining the spatial position of a residue in a protein molecule is of composite nature, involving a compromise between the various properties of the residue. The observed deviations from the general hydrophobic interior and hydrophilic exterior in globular proteins are clearly brought out. A valuable clue is obtained as to the directional properties of α-helical and extended structure segments in proteins. The relative buried, exposed and intermediate characters of the residues in protein globules are obtained by a very realistic approach.

0022-5193/79/200485+20 $02.00/0 © 1979 Academic Press Inc. (London) Ltd.

1. Introduction

The availability of accurate X-ray crystallographic data on a sufficiently large number of globular protein molecules has prompted many research workers to carry out detailed statistical studies aimed at relating the primary structures of these macromolecules to their secondary and tertiary structural features. Extensive studies were reported on the possibility of assigning the probable conformational states of amino acid residues along the polypeptide chain, thus predicting the secondary structural elements of the protein molecule from the knowledge of its amino acid sequence (Sternberg & Thornton, 1978 and references therein). Efforts were made to identify the common folding patterns in various kinds of protein molecules and their significance in the evolution of species (Rao & Rossmann, 1973; Rossmann & Liljas, 1974; Nishikawa & Ooi, 1974). Recent reports have been concerned with the possibility of assigning the various amino acid residues to exposed and buried categories, as well as with residue-residue and atom-atom interaction correlations (Manavalan & Ponnuswamy, 1977; Wertz & Scheraga, 1978; Warme & Morgan, 1978; Ponnuswamy, 1978). In one such study (Manavalan & Ponnuswamy, 1977), we analysed the preferred environments associated with the different kinds of amino acid residues and established the existing residue-residue cooperativities in the globular proteins.

The information derived from these concerted efforts, although providing valuable clues to the globular protein structure, remains incomplete for the understanding of the problem of protein folding. It is now clear that although the needed information is interwoven within the amino acid sequence of the polypeptide chain, mere naming of the residues along the chain may not be sufficient to extract it in the required way; it is essential to couple with the amino acid sequence some parallel, integrated clue derivable from the amino acid content constrained by the connectivity of the residues. Probably, this kind of indirect complementary information rests in the different characteristic properties of amino acid residues, which cooperate in concert to dictate the folding of the linear chain into the specific shape and maintain it under equilibrium. These properties, for example, may differ between the simple experimental environments in which they were measured for the individual residues and the protein environment.
The importance of protein environment was clearly demonstrated by our recent investigation (Manavalan & Ponnuswamy, 1978) on the hydrophobic character of amino acid residues in globular proteins, wherein we have shown that the hydrophobicity of a residue as observed in the protein environment is significantly different from that obtained for the residue in thermodynamic transfer measurements. It was also demonstrated in that study that this protein “bulk hydrophobicity” of a residue correlates far better with its buried/exposed character in the protein globule than does its individual (organic environment) hydrophobicity. This significant result on the hydrophobic character of the residues prompted us to undertake further studies on the utility of other such properties of the amino acid residues in deriving the latent information that rests in the amino acid composition and connectivity of the polypeptide chain, and that could be properly used in attempts to predict the tertiary structure of globular proteins. In spite of the considerable effort directed toward protein folding through studies of the nature already mentioned and of empirical
energy calculations (Pullman & Pullman, 1974; Ponnuswamy, Warme & Scheraga, 1973; Némethy & Scheraga, 1977), very little has been done to investigate the spatial distribution of the physical, chemical and other kinds of properties of amino acid residues in globular proteins. Here we undertake a more general examination of the folded globular protein in terms of the constraints imposed by the various properties of the constituent amino acid residues. The approach used is again of the statistical type, with the use of X-ray crystallographic data on twenty-one globular protein molecules. As we demonstrate in what follows, incorporating information on the concerted behaviour of the various properties of the residues, together with the knowledge of the secondary structure derivable from the amino acid sequence, can have a profound impact on theoretical attempts to predict the tertiary folding in globular protein molecules. The results of this extensive analysis show that there exist specific cooperativities among the amino acid residues due to the integrated display of their various physical, chemical, energetic and conformational properties, which lead them to preserve a specific preferred environment and spatial position for the individual residues, thus imposing constraints on the folding of the linear chain into a compact three-dimensional shape.

The twenty-one globular proteins whose X-ray data form the basis for the present study are: myoglobin, ribonuclease-S, cytochrome-C, lysozyme, staphylococcal nuclease, carboxypeptidase-A, subtilisin BPN', α-chymotrypsin, carp myogen, cytochrome-b5, apolactate dehydrogenase, bovine trypsin inhibitor, concanavalin-A, flavodoxin, dismutase, tosyl elastase and cytochrome-C,. The references for the crystallographic studies of these proteins are available in the article by Levitt & Greer (1977).
In what follows we first describe the method and the properties of the amino acid residues used in the study, and then present the results in the subsequent sections.

2. Methods

The protein molecule was represented by the co-ordinates of the α-carbon atoms of the residues. A sphere of minimum volume including all the α-carbon atoms in the protein was constructed and its centroid found by a simple procedure. Six concentric shells containing equal numbers of α-carbon atoms (residues) were constructed around the centroid. The distribution of the various amino acid residues (and their properties) in each of the shells and in each of the twenty-one proteins was determined. Let $f_{ij}^{k}$ denote the frequency of occurrence of the ith residue (or its specific property) in the jth shell of the kth protein, wherein i varies from 1 to 20 (residues), j varies from 1 to 6 (shells), and k varies from 1 to 21 (proteins). The overall total occurrence of each of the 20 types of residues (properties) in each shell has been worked out by summing up the occurrence in the respective shells for the respective residues in all the proteins; this quantity is represented by

$$F_{ij} = \sum_{k=1}^{21} f_{ij}^{k}. \tag{1}$$

The average occurrence of a residue in each of the shells is given by

$$\bar{F}_{i} = \frac{1}{6}\sum_{j=1}^{6} F_{ij}. \tag{2}$$

A preference index is defined for each of the residues in a particular shell by

$$p_{ij} = \frac{F_{ij}}{\bar{F}_{i}}. \tag{3}$$

If the index $p_{ij}$ exceeds one it shows the preference for the jth shell by the ith residue; if it is less than one it shows that the particular residue does not prefer to be in that particular shell; a value of $p_{ij} = 1$ indicates an indeterminate attachment between the residue and the shell. By taking the overall frequency of occurrence ($\bar{F}_{i}$) as the chance frequency, χ² values were calculated for each residue-shell system by the formula

$$\chi^{2}_{ij} = \frac{(F_{ij} - \bar{F}_{i})^{2}}{\bar{F}_{i}}. \tag{4}$$
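The shell construction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' program: the arithmetic mean of the α-carbon co-ordinates stands in for the centre of the minimum-volume sphere (the text does not specify its "simple procedure"), and the function name `assign_shells` is hypothetical.

```python
import math


def assign_shells(ca_coords, n_shells=6):
    """Return a shell number (1 = innermost) for each alpha-carbon position."""
    n = len(ca_coords)
    # Assumption: the mean of the coordinates stands in for the centre of
    # the minimum-volume enclosing sphere used in the paper.
    centre = tuple(sum(p[axis] for p in ca_coords) / n for axis in range(3))
    # Rank residues by distance from the centre, nearest first.
    order = sorted(range(n), key=lambda i: math.dist(ca_coords[i], centre))
    shells = [0] * n
    for rank, idx in enumerate(order):
        # Integer arithmetic yields shells whose sizes differ by at most one.
        shells[idx] = rank * n_shells // n + 1
    return shells


# Toy chain: twelve points on a line, so each shell receives two residues.
toy = [(float(x), 0.0, 0.0) for x in range(12)]
print(assign_shells(toy))
```

Ranking by distance and cutting the ranked list into equal parts guarantees equal residue counts per shell, at the cost of shells of unequal thickness, which matches the equal-number criterion stated in the text.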

The distribution of the residues or their representative properties in each of the six concentric shells was worked out, and the preference index of each residue for each shell and the corresponding χ² value were determined for analysis.

3. The Properties of Amino Acid Residues

The properties of amino acid residues whose distribution behaviour was investigated in the protein molecule are of four kinds:

(A) CHEMICAL

The hydrophobic index (h), which is determined from thermodynamic transfer experiments for the free amino acids; the surrounding hydrophobic index (Hp), which is an empirical parameter determined for the amino acid residue in the protein environment; the polarity index (P), measuring the electric force due to the sidechain acting on its immediate surroundings; and the acid-base parameters pK' and pHi, which characterize the ionization properties of free amino acids.

(B) PHYSICAL

Molecular weight (Mw), bulkness (Bl), chromatographic index (RF) and refractivity index (μ), all corresponding to the free amino acids.

(C) ENERGETICS

Average energy contributions to the stability of a residue in the protein molecule due to short-range (intra-residue, Es), medium-range (neighbours in the vicinity, Em) and long-range (rest of the residues in the protein, El) nonbonded interactions, which characterize step-wise the specificity of the residue conformation.

(D) CONFORMATIONAL

The intrinsic property of the residue to adopt α-helical (α), extended (β), and bend state (B) conformations, which were obtained from the analysis of protein crystal data.

The hydrophobic index, first quantitatively assessed for each of the nonpolar amino acids from thermodynamic measurements by Tanford (1962) and later extrapolated to the rest of the amino acids by Jones (1975), represents the intrinsic hydrophobic character of the sidechain of the free amino acid with reference to the simple organic surrounding medium used in the original experiments by Tanford. The surrounding hydrophobic index, or bulk hydrophobicity, of an amino acid residue, on the other hand, is a parameter determined by Manavalan & Ponnuswamy (1978), taking into consideration that the protein environment in which the residue is embedded is very different from the medium used by Tanford. This parameter, determined by constructing an optimum sphere of 8 Å radius around a residue in protein crystals and summing up the Tanford-Jones hydrophobic indices assigned to all the residues within this sphere, was demonstrated by the authors to correlate with the extent of buriedness of the residue in protein crystals far better than the corresponding Tanford-Jones hydrophobic index of the residue. The polarity index of an amino acid represents, as stated already, the measure of the electric force due to the sidechain acting on its immediate surroundings, and was derived on the assumption that the electric force arises partly from the dipole and partly from the charge on the ionizable group of the sidechain (Zimmerman et al., 1968).
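The surrounding-hydrophobicity construction described above (summing Tanford-Jones indices inside an 8 Å sphere) can be sketched as follows. This is a hedged illustration only: residues are represented by α-carbon positions, the central residue is excluded from its own sum, and the function name `surrounding_hydrophobicity` is hypothetical; none of these choices is confirmed by the text. The five index values are those listed for the corresponding residues in Table 1.

```python
import math

# Tanford-Jones hydrophobic indices (kcal/mol) for a few residues, Table 1.
HYDROPHOBIC_INDEX = {"Ala": 0.87, "Gly": 0.1, "Ile": 3.15, "Ser": 0.07, "Val": 1.87}


def surrounding_hydrophobicity(names, ca_coords, index, radius=8.0):
    """Sum the indices of all residues within `radius` angstroms of one residue."""
    centre = ca_coords[index]
    total = 0.0
    for j, (name, xyz) in enumerate(zip(names, ca_coords)):
        # Assumption: the central residue does not contribute to its own sum.
        if j != index and math.dist(centre, xyz) <= radius:
            total += HYDROPHOBIC_INDEX[name]
    return total


# Toy extended chain with 3.8 angstrom spacing between successive residues.
names = ["Ala", "Val", "Ile", "Ser"]
coords = [(3.8 * i, 0.0, 0.0) for i in range(4)]
print(surrounding_hydrophobicity(names, coords, 0))
```

In the toy chain only Val and Ile fall within 8 Å of the first residue, so its environment sum is the Val and Ile indices added together; in a folded protein the sphere also captures residues distant along the sequence, which is what distinguishes this quantity from the single-residue index.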


The acid-base character of the amino acid could be described by a number of parameters with reference to the ionizable groups in the molecule. We took two representative parameters, namely, pK' and pHi, which, respectively, represent the pK value of the COOH group and the isoionic point of the free amino acid. The choice of these two parameters rests on the fact that while pHi includes the ionizable character of either the sidechain or the amino group plus the carboxyl group of the molecule, pK' represents that of the carboxyl group alone; together they carry the needed information. The molecular weight of the amino acid is the mass contribution of the atoms constituting it. The bulkness of the amino acid is defined as the ratio of the sidechain volume to its length, which provides a measure of the average cross section of the sidechain and is thus relevant to packing considerations. The chromatographic index of an amino acid specifies its characteristic migration rate in a solvent-absorbent system, which is a measure of the composite nature of the interaction of solute, solvent and absorbent. Different kinds of RF indices are defined in the literature, of which we have taken the average ranked value for each molecule over seven solvent systems. It was shown by Woese (1965) that there exists a remarkable correlation between the RF values of the amino acids and the nucleotide codons associated with each of these residues. The refractivity is a measure of the polarizability of the amino acid; this is also a reciprocal measure of its electrical stability under an external field. (Refer to Zimmerman et al. (1968) for information on these various parameters.) The energy of two interacting residues is conveniently classified into three categories depending upon the number of residues existing between the two residues along the sequence: short-range, medium-range and long-range (Oobatake & Ooi, 1977).
The short-range interaction is defined as the energy between the main chain atoms of the residue and its own sidechain atoms, the medium-range interaction as that between two residues located within ten residues of each other along the sequence, and the long-range interaction as that between residues separated further. Oobatake and Ooi calculated the nonbonded energy per atom of the residue in terms of these three classifications for 16 proteins using the atomic co-ordinates obtained by X-ray crystallography, and determined the average contribution in each category for each of the amino acid residues. They have shown that the sum of short-range and medium-range interactions amounts to 72.5% of the total energy, suggesting that the interactions in these ranges are important for the folding of the polypeptide chain. They used the average energy of amino acid residues to estimate the local energy of a segment in protein molecules. The conformational properties represent the intrinsic character of the residue to adopt α-helical, extended and bend state conformations in protein molecules. The helix adopting power of a residue is again divided into three
types, namely, power to be at (i) the N-terminus, (ii) the C-terminus and (iii) the middle position of an α-helical section. The parameters representing these conformational properties of the residues were determined by Chou & Fasman (1974) from an analysis of a number of crystalline globular proteins. The physical, chemical, energetic and conformational parameters of the twenty amino acid residues are collected in Table 1.

4. Results and Discussion

(A) DISTRIBUTION PATTERN OF THE PROPERTIES OF AMINO ACID RESIDUES

The characteristic distribution patterns of the different properties of the residues as obtained by equation (1) are displayed in Fig. 1, wherein each of the properties of the constituent residues of the protein molecule is plotted against the successive concentric shells. The corresponding preference indices and χ² values are given in Table 2. The spread of the various properties of the residue composition in the different concentric shells around the centroid yields interesting information on the folding features of globular proteins.

(B) CHEMICAL PROPERTIES

The variation of the hydrophobic character of the globule in the successive shells as reflected by the Tanford-Jones indices (plot (a) in Fig. 1) substantiates the hypothesis that the interior of the protein is highly hydrophobic and the exterior highly hydrophilic in character. Even though the individual residues along the sequence differ in their own hydrophobic characters, the protein interior acquires, in general, residues of higher hydrophobicities, so that the overall hydrophobic character of the shell decreases as we go away from the centroid. The χ² values corresponding to the first and the sixth shells (Table 2) indicate that the hydrophobic characters of these shells are significantly above the confidence level of the statistical test. The hydrophobic character of the different shells as reflected in plot (b), and also in the p and χ² values in Table 2 corresponding to the “bulk hydrophobicity” (Hp) of the amino acid residues, indicates a smooth variation in the protein globule, making this parameter a better representation of the exposed/buried behaviour of the various residues constituting the respective spatial shells. The integrated polar character of the concentric shells as revealed by plot (c) shows, as expected, an interior with residues of nonpolar nature and an exterior with residues of polar nature. The interesting part of this plot as

FIG. 1. Plots showing the variation of different properties of amino acid residues in the concentric shells around the centroids of globular protein molecules (abscissa: shell number, 1 to 6). (a) amino acid hydrophobicity; (b) protein environment or bulk hydrophobicity; (c) polarity; (d) pH at isoelectric point; (e) pK value of COOH group; (f) molecular weight; (g) bulkness; (h) chromatography index; (i) refractivity index; (j) short and medium-range non-bonded energy; (k) long-range non-bonded energy; (l) protein environment or total non-bonded energy; (m) power to be at the N-terminus of α-helix; (n) power to be at the middle of α-helix; (o) power to be at the C-terminus of α-helix; (p) extended structure adopting power; (q) bend adopting power.
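One plausible way to obtain a shell profile of the kind plotted in Fig. 1 is a count-weighted mean of a residue property in each shell. The sketch below is an assumption about the procedure, not a statement of the authors' method (the source does not say whether means or sums were plotted), and the function name `shell_profile` is hypothetical. For brevity it uses only three residues, with the hydrophobic indices listed in Table 1 and the shell counts F listed in Table 3.

```python
def shell_profile(property_value, shell_counts):
    """Count-weighted mean of a residue property in each shell."""
    n_shells = len(next(iter(shell_counts.values())))
    profile = []
    for j in range(n_shells):
        # Total number of residues of all listed types found in shell j.
        total = sum(counts[j] for counts in shell_counts.values())
        # Property summed over those residues.
        weighted = sum(property_value[res] * counts[j]
                       for res, counts in shell_counts.items())
        profile.append(weighted / total)
    return profile


# Illustrative subset: hydrophobic index (Table 1) and shell counts (Table 3).
hydrophobic_index = {"Val": 1.87, "Ser": 0.07, "Lys": 1.64}
counts = {
    "Val": [66, 64, 47, 40, 27, 18],
    "Ser": [40, 46, 52, 46, 47, 89],
    "Lys": [11, 33, 41, 57, 60, 48],
}
profile = shell_profile(hydrophobic_index, counts)
print([round(v, 3) for v in profile])
```

Even for this three-residue subset the mean index of the innermost shell exceeds that of the outermost shell, in line with the trend described for plot (a); a full profile would use all twenty residues.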

TABLE 1
Physical, chemical, energetic and conformational properties of amino acids/residues

Res    Ht     Hp      P       pHi     pK'    Mw    Bl      RF     μ
Ala    0.87   12.97    0.00    6.00   2.34    89   11.50    9.9   14.34
Asp    0.66   10.85   49.70    2.77   2.01   133   11.68    2.8   12.00
Cys    1.52   14.63    1.48    5.05   1.65   121   13.46    2.8   35.17
Glu    0.61   11.89   49.90    3.22   2.19   147   13.57    3.2   17.26
Phe    2.87   14.00    0.35    5.48   1.89   165   19.80   18.8   29.40
Gly    0.1    12.43    0.00    5.97   2.34    75    3.40    5.6    0.00
His    0.8    12.16   51.60    7.59   1.82   155   13.67    8.2   21.81
Ile    3.15   15.67    0.15    6.02   1.36   131   21.40   17.1   19.06
Lys    1.64   11.36   49.50    9.74   2.18   146   15.71    3.5   21.29
Leu    2.17   14.90    0.13    5.98   2.36   131   21.40   17.6   18.78
Met    1.67   14.39    1.43    5.74   2.28   149   16.25   14.7   21.64
Asn    0.09   11.42    3.38    5.41   2.02   132   12.82    5.4   13.28
Pro    2.77   11.37    1.58    6.30   1.99   115   17.43   14.8   10.93
Gln    0.00   11.76    3.53    5.65   2.17   146   14.45    9.0   17.56
Arg    0.85   11.72   52.00   10.76   1.81   174   14.28    4.6   26.66
Ser    0.07   11.23    1.67    5.68   2.21   105    9.47    6.9    6.35
Thr    0.07   11.69    1.66    5.66   2.10   119   15.77    9.5   11.01
Val    1.87   15.71    0.13    5.96   2.32   117   21.57   14.3   13.92
Trp    3.77   13.93    2.10    5.89   2.38   204   21.61   17.0   42.53
Tyr    2.67   13.42    1.61    5.66   2.20   181   18.03   15.0   31.55

Res    Esm    El     Et     αn     αm     αc     β      B
Ala    1.40   0.49   1.90   1.44   1.59   1.22   0.97   0.57
Asp    1.16   0.35   1.52   2.13   0.53   0.56   0.80   1.26
Cys    1.37   0.67   2.04   0.76   0.33   1.53   1.30   1.17
Glu    1.16   0.37   1.54   2.01   1.45   1.28   0.26   0.44
Phe    1.14   0.72   1.86   1.01   1.14   1.13   1.28   0.71
Gly    1.36   0.53   1.90   0.62   0.53   0.40   0.81   1.68
His    1.22   0.54   1.76   0.56   0.89   2.23   1.71   0.69
Ile    1.19   0.76   1.95   0.68   1.22   0.77   1.60   0.58
Lys    1.07   0.30   1.37   0.59   1.13   1.65   0.74   1.01
Leu    1.32   0.65   1.97   0.58   1.91   1.05   1.22   1.53
Met    1.30   0.65   1.96   0.73   1.25   1.47   1.17   1.67
Asn    1.18   0.38   1.56   0.93   0.53   0.93   0.65   1.68
Pro    1.24   0.46   1.70   2.19   0.00   0.00   0.62   1.54
Gln    1.12   0.40   1.52   1.20   0.98   1.63   1.23   0.56
Arg    0.92   0.55   1.48   0.39   0.67   1.59   0.90   1.00
Ser    1.30   0.45   1.75   0.81   0.70   0.87   0.72   1.56
Thr    1.25   0.52   1.77   1.25   0.75   0.46   1.20   1.01
Val    1.25   0.73   1.98   0.63   1.42   0.46   1.65   0.30
Trp    1.03   0.83   1.87   1.40   1.33   1.20   1.19   1.11
Tyr    1.03   0.65   1.69   0.72   0.58   0.52   1.29   1.25

Ht: thermodynamic transfer hydrophobicity; Hp: protein environment or bulk hydrophobicity; P: polarity; Mw: molecular weight; Bl: bulkness; RF: chromatography index; μ: refractivity index; pK': equilibrium constant with reference to the ionization property of COOH group; pHi: isoelectric point; Esm: short and medium range non-bonded energy; El: long-range non-bonded energy; Et: protein environment total non-bonded energy; αn: power to be at the N-terminus of α-helix; αm: power to be at the middle of α-helix; αc: power to be at the C-terminus of α-helix; β: beta sheet adopting power; B: bend adopting power. (Ht and Hp in kcal mol⁻¹; pHi and pK' in pH units; P in Debye; Bl in Å²; Esm, El and Et in kcal mol⁻¹ atom⁻¹; the rest are dimensionless quantities.)


TABLE 2 (contd.)

Property         Shell I   II      III     IV      V       VI      Σχ²
αn        p      1.05      1.06    1.02    1.01    0.90    0.95
          χ²     1.88      2.13    0.30    0.07    5.56    1.65    11.59
αc        p      0.93      0.95    1.00    1.00    1.05    1.07
          χ²     3.28      1.56    0.50    0.50    2.04    3.13    11.01
αm        p      0.99      1.02    1.00    1.02    1.00    0.96
          χ²     0.08      0.41    0.00    0.31    0.01    0.98     1.79
β         p      1.08      1.05    1.01    0.99    0.95    0.91
          χ²     4.41      1.34    0.06    0.48    1.18    4.80    12.27
B         p      0.91      0.93    0.98    1.01    1.03    1.12
          χ²     4.18      2.95    0.18    0.06    0.71    5.83    13.91


well as the corresponding p and χ² values (Table 2), is that it predicts no uniform increase in the polar character of the shells as we go away from the centroid. The fourth and the fifth shells exhibit higher polarity than the outermost sixth shell, which suggests that these shells accommodate mainly the charged residues while the outermost shell accommodates mainly residues of polar-neutral nature. The surrounding environment around the centroid as reflected by the pHi indices of the shell-constituting residues [plot (d)] indicates that while the inner and outer shells accommodate, respectively, residues of high acidic and normal acidic characters, the intermediate third shell acquires mainly highly basic residues. The spread of pHi values in the adjacent zones as revealed by the plot gives a clue to the electrostatic stability that could be possible between residues occupying alternate shells. The χ² values testify to the statistical significance of this finding. In contrast to this, all the six shells exhibit an approximately uniform gradient from the centroid as far as the distribution of the pK' of the constituent residues is concerned [plot (e)]. No inference could be made from this parameter, as the different shells have p indices equal to unity with χ² ≈ 0.0. It should be pointed out that the pK' values of the twenty amino acid residues do not vary much (Table 1).

(C) PHYSICAL PROPERTIES

The physical contents of the shells are clearly brought out in plot (f), where we see the variation of the constituting residues in terms of their molecular weights. It is interesting to note that the outermost shell is occupied by residues of lower molecular weights, while the shell next to the outermost contains residues of higher molecular weights. The second, third and fourth shells contain, more or less uniformly, residues of the intermediate type. A property closely related, but not proportional, to the molecular weight of a residue is its bulkness. The variation of the bulknesses of the residues constituting the various shells is shown in plot (g) of Fig. 1. Interestingly, this plot reveals that the bulkier residues with shorter sidechains prefer the two innermost shells; the third, fourth and fifth shells accept more or less all kinds of residues, with an edge to average-sized ones; and the outermost shell definitely prefers residues with lengthy sidechains. It is to be noted that the characters of the first five shells as reflected by the molecular weights and the bulknesses of the constituent residues are of opposite nature [plots (f) and (g)].

The distribution of the residues in terms of their RF values [plot (h)] shows a clear decreasing trend from the interior to the exterior of the protein with a
higher slope and significantly high χ² values for the entries (Table 2). There is a noticeable drop in the composite RF value when we go from shell III to shell IV, which suggests that this parameter could be more effectively used as a complementary factor to hydrophobicity to divide the protein globule into internal and external regions. The protein matrix as revealed by the refractivity of the constituent residues [plot (i)] indicates that the residues of lower polarizability mostly occupy the outermost shell, whereas the rest of the shells contain residues of more or less similar polarizabilities. The drop in the refractivity character of the protein mass from the fifth to the sixth shell is very significant.

(D) NON-BONDED ENERGY PROPERTY

A perusal of the average non-bonded interaction energy properties of the residues given in Table 1 indicates that the hydrophobic residues Cys, Val, Leu, Met, Ile, Ala and Trp and the smallest residue Gly have higher stability characters through long-range interactions, the hydrophilic members Asn, Glu, Gln, Asp and Arg have comparatively lower stability characters through long-range interactions, and the rest of the residues have intermediate stability characters. Thus, the polar and non-polar members exhibit recognizably different levels of non-bonded interaction properties, and this is well brought out in plots (j), (k) and (l) of Fig. 1. The short and medium range interaction properties of the residues put together [plot (j)] reveal that residues having a high value of this property predominantly prefer the fourth shell. The fourth shell, on the other hand, has the lowest affinity for residues of high long-range interaction property [plot (k)]. The composite picture including all the residue interactions [plot (l)] shows the inner core to be, as expected, densely packed, facilitating the van der Waals attraction between the non-polar residues [plots (a) and (b)]. The predominant occurrence of highly polar and charged residues in the fifth shell is the cause of the observed diminished level of non-bonded interaction energy.

(E) CONFORMATIONAL PROPERTIES

A more interesting result emerges from the analysis of the protein globule in terms of the conformational behaviour of the constituent residues [plots (m), (n) and (o)]. It is fascinating to note that the residues which have a high propensity to be associated with the N-terminus of α-helical segments appear in the first four inner shells, the interior shells having higher
probabilities. In contrast to this, the residues associated with the C-termini of α-helical sections are highly populated in the outer (V and VI) shells, and those residues preferring the middle portions of helical segments appear predominantly in the middle (II to V) shells. These interesting observations suggest that in globular proteins the helical arms start from the interior and extend to the exterior. The information obtained from plots (p) and (q) with regard to the preferred locations of β-strands and bends is again valuable. The residues associated with the β-strands are embedded in the inner parts of the protein matrix while the bends are at the surface.

5. Distribution Pattern of Residues

Having analysed the distribution patterns of different properties of the amino acid residues in the protein matrix, we here present the results of the analysis made on the distribution patterns of the residues themselves. It is expected that the overall distribution behaviour of the residues results from the integrated action of their various properties. The F, p and χ² parameters for the twenty amino acid residues obtained using the data on the same twenty-one proteins are given in Table 3. A general survey of this table immediately reveals that the overall distribution of the residues, in general, follows the hydrophobic interior and hydrophilic exterior pattern of a protein molecule. However, a closer study of Table 3 brings out many subtle features of the individual residues with regard to their spatial positions in globular protein molecules. The useful information extractable from Table 3 is displayed in Fig. 2. In this figure the various residues are placed in the six concentric shells that they prefer with statistical significance. The innermost first shell is noted to prefer all the purely non-polar residues, residues with sulphur in their sidechains, plus histidine. The second shell contains all the residues preferred by the first shell along with two additions, Trp and Arg. The most interesting feature of these two inner shells is that they accommodate histidine. A special feature of the second shell is that Trp strongly prefers this shell over shell V, the only other alternative site available for it. Shell III exhibits its specific character by preferring the aromatic residues Phe and Tyr. Pro just starts appearing in this shell. Shell IV accommodates both polar and non-polar residues equally but with some restrictions: Lys, Arg and Tyr are bulkier; Met and Cys contain sulphur. Shells V and VI contain almost all the polar residues and Pro, the difference being that the outermost shell VI distinctly prefers the residues Ser, Gly, Ala and Thr.

TABLE 3
Distribution parameters for amino acid residues in successive concentric shells around the centroids of protein molecules

Residue        Shell I   II      III     IV      V       VI
Gly     F      52        47      54      53      50      75
        p      0.94      0.83    0.96    0.94    0.88    1.33
        χ²     0.22      1.59    0.11    2.16    0.74    6.05
Ala     F      55        50      54      58      45      75
        p      0.99      0.90    0.97    1.04    0.81    1.30
        χ²     0.13      0.61    0.05    0.08    2.10    5.28
His     F      25        20      14      13      8       9
        p      1.69      1.35    0.95    0.88    0.54    0.61
        χ²     7.03      1.82    0.04    0.20    3.12    2.27
Asp     F      26        28      37      34      43      43
        p      0.74      0.80    1.05    0.97    1.22    1.22
        χ²     2.38      1.45    0.09    0.03    1.74    1.75
Ile     F      49        44      31      24      29      16
        p      1.51      1.32    0.97    0.75    0.90    0.50
        χ²     7.75      1.89    0.04    0.20    3.12    2.27
Cys     F      22        15      11      17      18      6
        p      1.49      1.02    0.74    1.14    1.20    0.40
        χ²     3.46      0.00    0.97    0.32    0.69    5.20
Lys     F      11        33      41      57      60      48
        p      0.26      0.70    0.99    1.37    1.45    1.15
        χ²     22.40     1.77    0.00    5.69    8.07    0.98
Glu     F      28        27      27      28      39      28
        p      0.76      0.95    0.95    0.99    1.34    0.99
        χ²     1.48      0.78    0.08    0.01    3.86    0.01
Leu     F      53        46      45      42      38      22
        p      1.30      1.12    1.09    1.03    0.93    0.54
        χ²     3.50      0.61    0.39    0.02    0.22    8.80
Phe     F      28        22      29      11      12      8
        p      1.53      1.20    1.58    0.60    0.66    0.44
        χ²     5.14      0.75    6.25    2.91    2.16    5.79
Ser     F      40        46      52      46      47      89
        p      0.75      0.87    0.97    0.87    0.88    1.67
        χ²     3.30      1.00    0.03    1.00    0.74    23.90
Met     F      13        11      8       12      6       3
        p      1.47      1.25    0.91    1.35    0.68    0.34
        χ²     1.96      0.53    0.07    1.13    0.90    3.84
Thr     F      35        33      37      38      39      46
        p      0.92      0.87    0.97    1.00    1.02    1.20
        χ²     2.60      0.70    0.03    0.00    0.01    1.59
Asn     F      26        30      25      32      40      42
        p      0.80      0.93    0.77    0.99    1.23    1.30
        χ²     1.30      0.71    1.73    0.00    1.73    2.77
Val     F      66        64      47      40      27      18
        p      1.51      1.47    1.08    0.91    0.62    0.41
        χ²     11.38     9.42    0.25    0.30    6.38    15.10
Pro     F      14        16      23      24      26      24
        p      0.66      0.75    1.08    1.13    1.23    1.14
        χ²     2.44      1.27    0.15    0.36    1.08    0.36
Trp     F      11        16      7       8       13      6
        p      1.08      1.57    0.49    0.79    1.28    0.79
        χ²     0.69      3.35    0.98    0.46    0.79    1.70
Gln     F      12        19      20      21      32      28
        p      0.55      0.87    0.90    0.96    1.47    1.27
        χ²     4.54      0.41    0.18    0.05    4.54    1.63
Tyr     F      20        15      29      30      26      20
        p      0.85      0.65    1.25    1.29    1.12    0.85
        χ²     0.46      2.95    1.40    1.92    0.31    0.46
Arg     F      11        21      16      22      15      13
        p      0.68      1.29    0.98    1.35    0.91    0.81
        χ²     1.75      2.29    0.00    1.96    0.10    0.68

FIG. 2. Preferred residues in successive concentric shells around the centroids of globular protein molecules. The most preferred residue in each shell is shown in bold print and the rest of the residues are placed clockwise from this residue in order of preference. Alternate shells are shaded for clarity.

Referring to Fig. 2, specific reasons can be suggested for the preference of a few residues for particular shells. Our first observation concerns histidine: although it is classifiable among the polar residues according to its polarity index, it behaves like a non-polar residue, occurring predominantly in the innermost shell. This is probably due to the highly hydrophobic character of the non-polar part of its sidechain and to the ability of the polar part of the sidechain to link covalently to porphyrins. Exactly the opposite behaviour is noted for proline: although the side group of Pro is made of non-polar atoms, its ring imposes severe stereochemical limitations, so that it is associated only with secondary structural elements that can occur only on the exterior of the protein globule, and it thus behaves as though it were a polar residue. The predominant preference of charged residues for shell V is another interesting observation. Although the residues that occur in this shell should prefer the outermost shell according to their polarity indices, they are all associated with higher molecular weight, bulkiness and pHi values, which are not factors favouring the outermost shell. This result shows that the decisive factor in determining the spatial position of a residue in a protein molecule is of a composite nature, involving all the parameters associated with the residue.

The preference of Lys for the three outermost shells is again the consequence of a compromise between its polarity, molecular weight, bulkiness and conformational parameters α and β. The behaviour of the two aromatic residues Tyr and Trp is also interesting. These two residues are highly populated in the intermediate shells. Tyrosine is characterized by a high molecular weight, moderate polarity and a high Rf index, all of which make this residue prefer the intermediate shells. The hydrophobicity and polarity of Trp are such that they allow it to occupy shells on either side of Tyr. The distribution of Cys also draws our attention; it is found to be preferred in shells I, IV and V. It is likely that S-S bridges form within and between these shells.

In conclusion, we would like to emphasize the following three points:

(1) If the sample size is large enough, even very small deviations from the expected frequency may appear significant. Table 3 includes the individual χ² values for the distribution of the residues in the various shells. The sum of the χ² values in this table is 280, and there are 95 [= (20 - 1) × (6 - 1)] degrees of freedom. Since the critical χ² value at the 95% confidence level for 95 degrees of freedom is about 118, the distribution can be taken to be far from random. The individual entries in Table 3 having values > 1 are thus statistically significant at the 95% confidence level.

(2) In the present study we have treated the globular protein molecules in terms of concentric spherical shells, thus neglecting the anisotropic shape of the molecules. Owing to this approximation, surface residues could sometimes be wrongly assigned to an inner spherical shell. This error, however, is minimized as far as possible by treating the protein molecule as a sphere whose centroid is calculated with reference to the minimum volume containing all the residues of the protein, rather than as a sphere whose centroid is determined from the centroids of the residues themselves. In this way the unoccupied volume is reduced as far as possible. A better way to treat the general asymmetric shape of a globular protein would be an ellipsoidal representation of the molecule. Calculations were carried out on the sample proteins by representing each molecule as an ellipsoid of minimum volume enclosing all the residues, and the results were analysed in terms of the respective ellipsoidal zones. It was found that the general statistical inferences obtained from such a treatment remained more or less the same as those obtained with the spherical shell method. In the absence of any better method to treat the characteristic anisotropic shapes of protein molecules, we have presented the results of the spherical shell method, keeping in mind, however, this obvious limitation.

(3) The various properties considered in the present study could be classified into those pertaining to the free state of the amino acids and those pertaining to the residue state in globular proteins. The bulk hydrophobic character and the energetic and conformational properties were derived from the crystalline state of proteins and thus pertain to the residue state of the amino acids, while the rest of the properties pertain to the free state. The free-state properties were derived under different conditions, and they will be altered when the amino acids combine to form the peptide chain and also by the influence of neighbouring sidechains in the globular protein. For example, on peptide formation the molecular weight of a residue decreases and the pHi increases. It is to be remembered that many of these free-state properties acquire their characteristic values mainly from the nature of the sidechain of each amino acid. Moreover, it is extremely difficult to interpret the total behaviour of a residue by taking one kind of property and leaving out the rest. The main aim of the present study is to correlate the various kinds of properties of the sidechain of a residue with its distribution behaviour in the protein matrix.

This work was supported by a grant from the Department of Science and Technology, Government of India, to PKP.
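
The significance argument in point (1) can be checked numerically. The sketch below, in Python, estimates the critical χ² value for 95 degrees of freedom with the Wilson-Hilferty approximation and compares it with the total of 280 quoted in that point. The approximation is chosen here only to keep the example free of external libraries; it is not a method named in the paper, and the names chi2_critical_wilson_hilferty and Z_95 are illustrative.

```python
import math

# Standard normal quantile for an upper-tail probability of 0.05.
Z_95 = 1.6448536269514722


def chi2_critical_wilson_hilferty(df, z=Z_95):
    """Approximate upper critical value of chi-square with df degrees of freedom."""
    a = 2.0 / (9.0 * df)
    return df * (1.0 - a + z * math.sqrt(a)) ** 3


n_residue_types, n_shells = 20, 6
df = (n_residue_types - 1) * (n_shells - 1)  # 95 degrees of freedom, as in the text
critical = chi2_critical_wilson_hilferty(df)
total_chi2 = 280  # sum of the chi-square entries quoted in the text

print(f"df = {df}, critical value = {critical:.1f}")
print("distribution departs from random:", total_chi2 > critical)
```

The approximation gives a critical value close to the "about 118" quoted in point (1), well below the observed total of 280.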

