J. theor. Biol. (1975) 50, 161-166

The Terminal Amino Acids of Protein Sequences and Protein Maturation

JOHN A. BLACK, PETER STENZEL AND RICHARD N. HARKINS

Department of Biochemistry and Division of Medical Genetics, University of Oregon Medical School, Portland, Oregon 97201, U.S.A. (Received 3 April 1974, and in revised form 14 May 1974)

A comparison is made of the N- and C-terminal amino acids from 96 published protein sequences, 26 from prokaryotes, 70 from eukaryotes. The observed frequencies of the N-terminal amino acids methionine, alanine and serine in prokaryotes, and alanine and serine in eukaryotes are significantly higher than expected for a random arrangement of amino acids. At the C-terminal end, the observed frequencies of lysine, asparagine and glutamine in prokaryotes and phenylalanine, asparagine and glutamine in eukaryotes exceed random expectation. These results could be explained by specific proteolytic cleavage during protein synthesis.

There is evidence that proteolytic action at either end of a polypeptide sequence may be a necessary aspect of protein synthesis, and a study of the terminal amino acids may therefore provide evidence of the process and indicate the specificity of the enzymes involved. In prokaryotes, protein synthesis is initiated by N-formyl methionine (Clark & Marcker, 1966; Adams & Capecchi, 1966; Webster, Engelhardt & Zinder, 1966; Capecchi, 1966; Bachmayer & Kreil, 1968), the nascent peptide chain is enzymatically deformylated (Adams, 1968; Takeda & Webster, 1968; Livingstone & Leder, 1969) and the N-terminal amino acid(s) removed by an aminopeptidase. Enzymes which may function in the final proteolytic step have been described associated with the ribosomes (Matheson & Dick, 1970) and in the soluble fraction (Brown & Krall, 1971) of Escherichia coli. In eukaryotic protein synthesis, initiation is with methionine (Smith & Marcker, 1970; Brown & Smith, 1970; Housman, Jacobs-Lorena, Rajbhandary & Lodish, 1970; Jackson & Hunter, 1970; Wilson & Dintzis, 1970) and no deformylation is required before an aminopeptidase can act. There is evidence (Taber, Wertheimer & Golrick, 1973) that some polycistronic messenger RNAs may be translated into a single polypeptide chain which is split by proteolytic enzymes to give separate functional proteins.
Proteolytic cleavage is known to be required in the production of some viral proteins (Baltimore, 1971; Katz & Moss, 1970; Pfefferkorn & Boyle, 1972). TPCK (1-tosylamido-2-phenyl ethyl chloromethyl ketone), a specific inhibitor of chymotrypsin and related proteolytic enzymes with a specificity for phenylalanine side chains, prevents maturation of viral proteins in infected eukaryotic cells (Pfefferkorn & Boyle, 1972). TPCK also inhibits maturation of 5% of the polypeptide molecules in uninfected HeLa cells (Taber et al., 1973). In both cases abnormally long polypeptides could be detected in the inhibited cells.

The possibility of non-random arrangement of amino acids within protein sequences was first considered by Morowitz and co-workers. They found an increased frequency of methionine (Morowitz & Spaulding, 1958) and a decreased frequency of leucine and isoleucine (Morowitz & Barra, 1959) compared to random expectation at the N-termini of soluble E. coli proteins. Subsequently a number of investigators have examined the N-terminal amino acids of soluble proteins in prokaryotes (Horikoshi & Doi, 1968; Waller, 1963; Samuel, Murray & Rabinowitz, 1973; Brown, 1970; Sarimo & Pine, 1969) and eukaryotes (Brown, 1970; Sarimo & Pine, 1969; Rho & De Busk, 1971) and report an apparent non-random distribution of N-terminal amino acids with methionine, alanine and serine most common in prokaryotes; alanine and serine in eukaryotes. The validity of these studies, however, depends on the relative numbers of the various proteins present in the sample. In addition, the data were obtained by N-terminal methods which are difficult to quantitate reliably and do not detect blocked N-terminal amino acids.
From published data on protein sequences (Dayhoff, 1972, 1973), we have compiled the observed number of N-terminal and C-terminal amino acids, and compared these numbers with the values expected on the basis of the amino acid compositions of the individual proteins (Table 1), assuming that any amino acid in a given protein has a random chance of being at the N- or C-terminal position. Where two or more proteins are clearly homologous and therefore related in evolution, one was selected for analysis on a random basis to remove bias from the sample as far as possible. In the same way, where a protein sequence has been determined in a number of species, e.g. cytochrome c, one sequence was selected, normally the first to be determined. This elimination resulted in a data pool of 96 sequences containing a total of 13,725 amino acids, 26 sequences from prokaryotic organisms, 70 from eukaryotes.

In prokaryotes, three amino acids, methionine, alanine and serine, represent 85% of observed N-terminal amino acids whereas the random expectation for these amino acids is 20%. If the sums of the observed and expected values for these three amino acids are compared with the sums of the values for all other amino acids, as shown in Table 2, and the chi-squared test is applied (Davies, 1958), the results are significantly different from random expectation at the 0.1% level of significance. In eukaryotes, alanine and serine account for 37% of the observed N-terminal amino acids and chi-squared analysis indicates non-random distribution at the 0.1% level of significance.

TABLE 1

The observed and expected N- and C-terminal amino acids in 26 prokaryotic and 70 eukaryotic protein sequences

Amino    Prokaryote    Eukaryote    Eukaryote
acid     Expected      Expected     C-terminal
Ala      3.1           4.9          5
Val      2.1           4.1          4
Leu      1.6           4.7          4
Ile      1.3           3.3          2
Trp      0.3           1.0          1
Phe      0.8           2.5          6
Met      0.5           1.2          2
Cys      0.6           2.9          3
Gln      0.8           2.8          9
Glu      1.7           3.7          3
Arg      0.7           4.6          5
Pro      1.0           3.6          1
Gly      2.0           5.2          3
Ser      1.6           5.2          4
Thr      1.6           4.3          6
Asn      1.3           3.0          6
Asp      1.8           3.7          0
Tyr      0.8           2.7          2
Lys      1.9           4.1          3
His      0.4           1.7          1

Remaining observed counts (non-blank entries only, listed in column order from Ala to His):
Prokaryote N-terminal: 12, 1, 5, 2, 5, 1
Prokaryote C-terminal: 4, 1, 1, 2, 2, 1, 2, 2, 3, 1, 7, 0
Eukaryote N-terminal: 15, 4, 2, 6, 3, 2, 5, 1, 5, 1, 4, 11, 1, 2, 2, 1, 4, 1

The expected frequencies were calculated from the amino acid composition of each individual sequence on the basis that any amino acid has a random chance of being at the N- or C-terminus. The expected frequencies were calculated to two significant decimal places. The values shown in the table were rounded off to one decimal place for clarity of presentation.

The common concept is that synthesis of a functional protein on the polysome is completed by a terminating codon, in which case there is no reason to expect that the distribution of the next-to-last codon should be non-random. From the results in Table 1, lysine, asparagine and glutamine represent 46% of the C-terminal amino acids observed in prokaryotes and phenylalanine, asparagine and glutamine 30% of the C-terminal amino acids of eukaryotes. When the data are grouped as shown in Table 2 and the chi-squared test applied, both results differ from random expectation at the 0.1% level of significance.

TABLE 2

The observed and expected frequencies for the terminal amino acids which apparently deviate from random behavior

                              Observed    Expected    Σχ²
Prokaryote
  N-terminal   Ala Met Ser       22          5.2
               All others         4         20.7      67.8
  C-terminal   Lys Asn Gln       12          4.0
               All others        14         21.9      18.9
Eukaryote
  N-terminal   Ala Ser           26         10.1
               All others        44         59.8      29.2
  C-terminal   Phe Asn Gln       21          8.3
               All others        49         61.6      22.0

The individual values were those shown in Table 1. Chi-squared was calculated for each row of the 2 x 2 tables and the values added to give the sum of chi-squared which is shown. For a 2 x 2 table, there is one degree of freedom and at the 0.1% level of significance the expected value for the sum of chi-squared is 10.8. All comparisons exceed this value and therefore represent non-random observations at the 0.1% confidence level.
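The sums of chi-squared in Table 2 can be checked directly from the printed observed and expected counts. The Python sketch below is a minimal illustration of that arithmetic, not the authors' own procedure as implemented; the function name `chi_squared_sum` and the dictionary layout are assumptions introduced here, and the critical value of 10.8 is the one quoted in the table note.

```python
# Observed and expected counts from Table 2, each given as
# (selected amino acids, all others).
TABLE_2 = {
    "prokaryote N-terminal": ((22, 4), (5.2, 20.7)),
    "eukaryote N-terminal": ((26, 44), (10.1, 59.8)),
    "prokaryote C-terminal": ((12, 14), (4.0, 21.9)),
    "eukaryote C-terminal": ((21, 49), (8.3, 61.6)),
}

# Critical value at the 0.1% level with one degree of freedom,
# as quoted in the note to Table 2.
CRITICAL_VALUE = 10.8


def chi_squared_sum(observed, expected):
    """Sum (O - E)^2 / E over the categories of one comparison."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical container name; one chi-squared sum per comparison.
RESULTS = {
    name: chi_squared_sum(obs, exp) for name, (obs, exp) in TABLE_2.items()
}

if __name__ == "__main__":
    for name, value in RESULTS.items():
        verdict = "non-random" if value > CRITICAL_VALUE else "not significant"
        print(f"{name}: sum of chi-squared = {value:.2f} ({verdict})")
```

Because the inputs are the rounded expected values printed in the table, a recomputed sum can differ from the published figure in the last digit; the prokaryote C-terminal comparison, for instance, comes out slightly below the printed 18.9.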

The results in Table 1 are in excellent agreement with the values determined by N-terminal analyses of unfractionated soluble proteins from prokaryotes (Horikoshi & Doi, 1968; Waller, 1963; Samuel et al., 1973; Brown, 1970; Sarimo & Pine, 1969) and eukaryotes (Brown, 1970; Sarimo & Pine, 1969; Rho & De Busk, 1971). We are not aware of any comparable studies of C-terminal amino acids and it is perhaps doubtful if reliable values could be obtained by the available C-terminal methods. The high incidence of phenylalanine at the C-terminal end of eukaryotic proteins is of particular interest in view of the reported effects of TPCK on eukaryotic cells (Pfefferkorn & Boyle, 1972; Taber et al., 1973).

It is concluded that available protein sequences show evidence of proteolytic maturation events at the N- and C-terminal ends. The evidence is consistent with the events which have been proposed to take place at the N-terminus. Analysis of the C-terminal data supports circumstantial experimental evidence of proteolytic involvement at this end of the polypeptide. The predominance of lysine, asparagine and glutamine at the C-termini of prokaryotes and phenylalanine, asparagine and glutamine at the C-termini of eukaryotes should direct attention to intracellular proteases which have a specificity for these amino acids and are involved in the post-synthetic maturation of a number of functional proteins.
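The random-expectation baseline used in this analysis (every residue of a protein is equally likely to occupy a terminal position) is simple to reproduce for any other set of sequences. The Python sketch below is offered only as an illustration under that assumption: the function names and the three toy sequences are hypothetical and are not taken from the Dayhoff atlas, and sequences are assumed to be non-empty strings in one-letter code.

```python
from collections import Counter


def expected_terminal_counts(sequences):
    """Expected number of times each residue occupies one terminus.

    For each protein, the chance that a given amino acid is terminal is
    taken to be its fraction of that protein's composition; the fractions
    are summed over all proteins in the sample.
    """
    expected = Counter()
    for seq in sequences:
        for residue, count in Counter(seq).items():
            expected[residue] += count / len(seq)
    return dict(expected)


def observed_terminal_counts(sequences):
    """Tally the residues actually found at the N- and C-termini."""
    n_terminal = Counter(seq[0] for seq in sequences)
    c_terminal = Counter(seq[-1] for seq in sequences)
    return dict(n_terminal), dict(c_terminal)


# Hypothetical toy sequences, for illustration only.
TOY_SEQUENCES = ["MKAS", "AAGK", "SLLN"]

if __name__ == "__main__":
    expected = expected_terminal_counts(TOY_SEQUENCES)
    n_obs, c_obs = observed_terminal_counts(TOY_SEQUENCES)
    for residue in sorted(expected):
        print(
            residue,
            f"expected={expected[residue]:.2f}",
            f"N-obs={n_obs.get(residue, 0)}",
            f"C-obs={c_obs.get(residue, 0)}",
        )
```

The expected values sum to the number of sequences, which gives a quick consistency check on any tabulation built this way.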

REFERENCES

ADAMS, J. M. (1968). J. molec. Biol. 33, 571.
ADAMS, J. M. & CAPECCHI, M. R. (1966). Proc. natn. Acad. Sci. U.S.A. 55, 147.
BACHMAYER, H. & KREIL, G. (1968). Biochim. biophys. Acta 169, 95.
BALTIMORE, D. (1971). Perspect. Virol. 7, 1.
BROWN, J. L. (1970). Biochim. biophys. Acta 221, 480.
BROWN, J. L. & KRALL, J. F. (1971). Biochem. biophys. Res. Commun. 42, 390.
BROWN, J. C. & SMITH, A. E. (1970). Nature, Lond. 226, 610.
CAPECCHI, M. R. (1966). Proc. natn. Acad. Sci. U.S.A. 55, 1514.
CLARK, B. F. C. & MARCKER, K. A. (1966). J. molec. Biol. 17, 394.
DAVIES, O. L. (1958). Statistical Methods in Research and Production. New York: Hafner Publishing Co.
DAYHOFF, M. O. (1972). Atlas of Protein Sequence and Structure, vol. 5. Washington, D.C.: National Biomedical Research Foundation.
DAYHOFF, M. O. (1973). Atlas of Protein Sequence and Structure, vol. 5, suppl. 1. Washington, D.C.: National Biomedical Research Foundation.
HORIKOSHI, K. & DOI, R. H. (1968). J. biol. Chem. 243, 2381.
HOUSMAN, D., JACOBS-LORENA, M., RAJBHANDARY, U. L. & LODISH, H. F. (1970). Nature, Lond. 227, 913.
JACKSON, R. & HUNTER, T. (1970). Nature, Lond. 227, 672.
KATZ, D. & MOSS, B. (1970). Proc. natn. Acad. Sci. U.S.A. 66, 677.
LIVINGSTONE, D. M. & LEDER, P. (1969). Biochemistry, N.Y. 8, 435.
MATHESON, A. T. & DICK, A. J. (1970). FEBS Lett. 7, 235.
MOROWITZ, H. J. & BARRA, R. V. (1959). Biochim. biophys. Acta 33, 505.
MOROWITZ, H. J. & SPAULDING, M. (1958). Biochim. biophys. Acta 29, 514.
PFEFFERKORN, E. & BOYLE, M. (1972). J. Virol. 9, 187.
RHO, H. M. & DE BUSK, A. G. (1971). J. Bact. 107, 840.
SAMUEL, C. E., MURRAY, C. L. & RABINOWITZ, J. C. (1973). J. Bact. 116, 41.
SARIMO, S. S. & PINE, M. J. (1969). J. Bact. 98, 368.
SMITH, A. E. & MARCKER, K. A. (1970). Nature, Lond. 226, 607.
TABER, R., WERTHEIMER, R. & GOLRICK, J. (1973). J. molec. Biol. 80, 367.
TAKEDA, M. & WEBSTER, R. E. (1968). Proc. natn. Acad. Sci. U.S.A. 60, 1487.
WALLER, J. P. (1963). J. molec. Biol. 7, 483.
WEBSTER, R. E., ENGELHARDT, D. L. & ZINDER, N. D. (1966). Proc. natn. Acad. Sci. U.S.A. 55, 155.
WILSON, D. B. & DINTZIS, H. M. (1970). Proc. natn. Acad. Sci. U.S.A. 66, 1282.
