Proc. Natl. Acad. Sci. USA Vol. 75, No. 4, pp. 1731-1735, April 1978
Biochemistry
Amino acid sequence of the β chain of human fibrinogen: Homology with the γ chain (protein structure/evolution)
KENNETH W. K. WATT, TAKASHI TAKAGI*, AND RUSSELL F. DOOLITTLE Department of Chemistry, University of California at San Diego, La Jolla, California 92093
Communicated by Nathan O. Kaplan, January 30, 1978
ABSTRACT The β chain of human fibrinogen is composed of 452 ± 5 amino acid residues, 14 of which are methionines. Consistent with these findings we have isolated and characterized 15 fragments after cyanogen bromide digestion of carboxymethylated β chains. The arrangement of several of these peptides was deduced on the basis of overlapping peptides isolated from the fragments D and E produced by the plasmic digestion of fibrinogen and/or from a tryptic digest of citraconylated β chains. Most of the other cyanogen bromide fragments can be aligned by homology with the α and/or γ chains from human fibrinogen, although the positioning of a few of the smallest peptides is still ambiguous. The homology of the β chain with the γ chain is especially strong in certain regions of the domain that includes fragment D.
Vertebrate fibrinogen molecules are composed of three pairs of nonidentical chains (α₂β₂γ₂). For human fibrinogen, the α, β, and γ chains have molecular weights of 65,000, 55,000, and 47,000, respectively (1). The corresponding number of amino acids in each is approximately 625, 450, and 410. The β and γ chains also contain carbohydrate clusters which contribute to their overall molecular weights.

During the past 5-10 years major amino acid sequencing efforts on human fibrinogen and its fragments have been under way in three different laboratories: Blombäck's in Stockholm, Henschen's in Munich, and ours in La Jolla. All three groups are studying the structures of all three chains, and a large number of publications dealing with various portions of the sequence have already appeared. Many of these reports have been mutually confirmatory, if somewhat duplicative and overlapping. Only a few minor discrepancies have been claimed so far. In this regard, a virtually complete sequence for the γ chain has recently been reported by the Henschen group (2). As a result, we have turned our attention fully to the completion of the α and β chains.

In this article we now report most of the amino acid sequence of the β chain, portions of which have been reported on by all three groups in the past (3-10). To this end, we have isolated 15 cyanogen bromide fragments, the constituent amino acids of which total 452 ± 5 residues. Moreover, we have characterized the β-chain portions of fragments D and E, and as a result have obtained some key overlap peptides. We have also obtained some overlaps from tryptic digestion of citraconylated β chains. Many of the other cyanogen bromide peptides can be aligned by homology with the α and/or γ chains. Only the positioning of a few small peptides remains ambiguous. The homology with the γ chain is particularly striking in those parts of the chains found in fragments D, and several inferences can be made about the
conservation of those portions of the chains whose structure and functions are tightly linked.

MATERIALS AND METHODS

Most of the materials and procedures used in this study have been fully described in previous publications from this laboratory (11, 12). In essence, fibrinogen was prepared from blood bank plasma, reduced with dithiothreitol in 6 M guanidine and carboxymethylated with iodo[¹⁴C]acetic acid; the chains were separated by chromatography on carboxymethylcellulose. Amino acid analyses were conducted after 24, 48, and 72 hr of acid hydrolysis; tryptophan was determined colorimetrically (13). Fragments D and E were isolated by DEAE-cellulose chromatography of an exhaustive plasmic digest of human fibrinogen (14). Cyanogen bromide digestions were carried out by dissolving the protein or peptide (5-7 mg/ml) in 70% formic acid followed by the addition of cyanogen bromide at a concentration of 10 mg/ml. Digestions were for 16-24 hr at room temperature, after which the mixtures were diluted with cold distilled water and lyophilized. Peptides were isolated by gel filtration on Sephadex G-50 followed by paper electrophoresis at pH 6.5 and/or pH 2.0. Amino acid analyses were performed on either a Spinco model 121 or a Spinco 119 Amino Acid Analyzer. Sequences were determined by a variety of methods, but especially by a thioacetylation procedure (15) either manually or automatically on a solid phase sequencer (16). The dansyl chloride/phenylisothiocyanate procedure (17) was also used in many instances
and particularly for the confirmation and/or identification of serine and threonine residues. Enzymes used to degrade peptides included trypsin, chymotrypsin, thermolysin, and the staphylococcal protease V-8 (18).

* Present address: Biological Institute, Tohoku University, Sendai, Japan.

RESULTS

Amino Acid Composition. The amino acid composition of β chains isolated from human fibrinogen is tabulated in Table 1. Our results are compared with previous reports, as well as with the theoretical composition obtained by summing the residues found by determining the sequence of the 15 cyanogen bromide fragments. In general, the agreement is quite good.

Table 1. Amino acid composition of β chain of human fibrinogen (residues/450 residues total)

                  Ref. 1*       Ref. 19*   Ref. 20*   This report   CNBr sequences
Aspartic acid†    58.0          60.9       59.4       59.0          58
Threonine         22.2          22.7       24.4       20.6          21
Serine            30.4          27.2       35.1       30.9          30
Glutamic acid†    55.6          51.9       50.8       56.3          56
Proline           20.4          20.1       23.8       21.3          23
Glycine           41.5          41.0       40.8       42.7          40
Alanine           25.2          26.3       22.5       25.0          23
Cysteine          13.6          14.0       9.3        9.8           11
Valine            21.8          21.2       23.1       24.0          23
Methionine        13.6          14.0       14.0       13.8          14
Isoleucine        17.4          19.4       15.9       14.4          16
Leucine           26.4          27.0       27.3       26.6          28
Tyrosine          18.6          19.5       19.1       18.9          21
Phenylalanine     13.0          14.5       9.4        9.7           10
Histidine         9.8           9.6        7.5        7.2           6½‡
Lysine            32.6          32.2       32.8       30.7          32
Arginine          17.9          18.7       24.0       26.8          27½‡
Tryptophan        NM§           9.7        10.6       12.2          12
Total residues    438.0 + Trp   449.9      449.8      449.9         452

* Recalculated from the original data on the basis of 450 residues.
† Aspartic acid and glutamic acid values include asparagine and glutamine, respectively.
‡ Residue 363 is heterogeneous; roughly equal amounts of arginine and histidine are present.
§ NM, not measured.

Cyanogen Bromide Peptides. The cyanogen bromide digest was fractionated by gel filtration on Sephadex G-50 (Fig. 1); seven pools were routinely prepared. Each run was made with approximately 60 mg of purified carboxymethylated chain (about 1 μmol). Altogether more than 500 mg of β chains were treated in this fashion. Pools III-VII were purified further by paper electrophoresis at pH 6.5 and/or pH 2.0. Pools I-III were purified further by rechromatography on Sephadex G-50. Altogether, 15 fragments were isolated by these procedures, a summary of which appears in Table 2. Complete amino acid sequences were established for 13 of the 15 cyanogen bromide peptides (Fig. 2). In one other case,
fragment CNI turned out to be the 118-residue amino-terminal fragment as first reported by Blombäck et al. (3, 4). We confirmed the structure of most of this fragment on the basis of the composition of peptides produced by trypsin and the V-8 protease. In one instance, however, our results differ from those in early reports in that we find a different composition and tentative sequence for residues 108-114. As for the remaining cyanogen bromide fragment, CNVB, it is a sparingly soluble peptide (n = 50 ± 5 residues) which is very difficult to isolate in quantity. Moreover, sequencer runs (9) have tended to inexplicably fall off after 15-20 steps, although we were able to identify radioactive carboxymethyl-cysteine at residue 21.

FIG. 1. Gel filtration (Sephadex G-50) of cyanogen bromide digest of carboxymethylated β chains from human fibrinogen. Column was 2.5 × 170 cm; solvent was 10% acetic acid; flow rate was 60 ml/hr; fraction size was 5 ml. Solid line, absorbance at 280 nm; broken line, radioactivity due to carboxymethylation.

  1  QGVNDNEEGFFSARGHRPLDKKREEAPSLRPAPPPISGGGYRARPAKAAA
 51  TQKKVERKAPDAGGCLHADPDLGVLCPTGCQLQEALLQQERPIRNSVDEL
101  NNNVEAV(S,QTSSSG)QFYMYLLKDLWQKRQKQVKDNENVVNEYSSELEKHQ
151  LYIDETVNSNIP(T)LRILRSILENLRSKIQKLESDVSAQM/EYCRTPCTVSC
201  DIPVVSGKECEEIIRKGGETSEMYLIQPDSSVKPYRVYCDM/NTENGGWTV
251  IQNRQDGSVDFGRKWDPYKQGFGNVATNTDGKBYCGLPGZYWLGBKISQL
301  TRM/GPTELLIQM/EDWKGNKVKAHYGGF(T)VQNQANKYQISVNKYRGTAGNA
351  LMNGASQLMGENRTM/SRKM/RKM/FFSTYDRDNDG(W)L(T,S)DPRKQCSKEDGGY
401  B(W)R(C)YDAADPD(G,H)RYYBG(T,Z,W)M/AKHGTNNGVW(W)MNWKGSWYSMKI
451  RPFFPQQ

Fragments labeled beneath the sequence: CNI, CNIVA, CNVIB2, CNIII, CNVIA, CNVIB1, CNVB, CNVIE, CNVIF, CNVA, CNVIC, CNVII, CNVID.

FIG. 2. Proposed amino acid sequence of the β chain from human fibrinogen based on 15 cyanogen bromide fragments. Slashes indicate places in arrangement where overlap peptides are not yet available. Some of the arrangement has been achieved by comparison with the homologous γ chain (Fig. 3). The single letter code used is: A, Ala; B, Asx; C, Cys; D, Asp; E, Glu; F, Phe; G, Gly; H, His; I, Ile; K, Lys; L, Leu; M, Met; N, Asn; P, Pro; Q, Gln; R, Arg; S, Ser; T, Thr; V, Val; W, Trp; Y, Tyr; Z, Glx (21). *, Carbohydrate attachment site.
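The relationship between the proposed sequence and the fragment sizes can be checked with a short in-silico cleavage. The following is only a minimal sketch, not part of the original analysis: it assumes the sequence is transcribed from Fig. 2 with brackets and slashes removed, that asparagine occupies position 242 (as indicated by Fig. 3 and the amino-terminal sequence Asn-Thr-Glu in Table 2), and that cyanogen bromide cleaves after every methionine. The names `ROWS`, `BETA_CHAIN`, and `cnbr_fragments` are hypothetical.

```python
# Sketch: in-silico cyanogen bromide (CNBr) cleavage of the proposed beta chain.
# Assumption: sequence transcribed from Fig. 2, uncertainty brackets and slashes
# removed; Asn assumed at position 242 (from Fig. 3 and Table 2).
ROWS = [
    "QGVNDNEEGFFSARGHRPLDKKREEAPSLRPAPPPISGGGYRARPAKAAA",  # 1-50
    "TQKKVERKAPDAGGCLHADPDLGVLCPTGCQLQEALLQQERPIRNSVDEL",  # 51-100
    "NNNVEAVSQTSSSGQFYMYLLKDLWQKRQKQVKDNENVVNEYSSELEKHQ",  # 101-150
    "LYIDETVNSNIPTLRILRSILENLRSKIQKLESDVSAQMEYCRTPCTVSC",  # 151-200
    "DIPVVSGKECEEIIRKGGETSEMYLIQPDSSVKPYRVYCDMNTENGGWTV",  # 201-250
    "IQNRQDGSVDFGRKWDPYKQGFGNVATNTDGKBYCGLPGZYWLGBKISQL",  # 251-300
    "TRMGPTELLIQMEDWKGNKVKAHYGGFTVQNQANKYQISVNKYRGTAGNA",  # 301-350
    "LMNGASQLMGENRTMSRKMRKMFFSTYDRDNDGWLTSDPRKQCSKEDGGY",  # 351-400
    "BWRCYDAADPDGHRYYBGTZWMAKHGTNNGVWWMNWKGSWYSMKI",       # row labeled 401
    "RPFFPQQ",                                             # row labeled 451
]
BETA_CHAIN = "".join(ROWS)


def cnbr_fragments(sequence):
    """Split a one-letter sequence after every methionine (M)."""
    fragments = []
    start = 0
    for index, residue in enumerate(sequence):
        if residue == "M":
            fragments.append(sequence[start:index + 1])
            start = index + 1
    if start < len(sequence):
        # Carboxy-terminal fragment, which carries no methionine.
        fragments.append(sequence[start:])
    return fragments


FRAGMENTS = cnbr_fragments(BETA_CHAIN)
FRAGMENT_LENGTHS = [len(fragment) for fragment in FRAGMENTS]

# Print each fragment's length and amino-terminal tripeptide for comparison
# with Table 2.
for fragment in FRAGMENTS:
    print(len(fragment), fragment[:3])
```

Under these assumptions the 14 methionines give 15 fragments whose lengths sum to 452 residues, in line with the totals in Table 2; the 50-residue value for the sparingly soluble fragment is the nominal figure, not a refinement of the ± 5 uncertainty.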
β  QGVNDNEEGFFSARGHRPLDKKREEAPSLRPAPPPISGGGYRARPAKAAATQKKVERKAPDAGGCLHADPDLGVLCPTGCQLQEALLQQERPIRNSVDEL
γ  YVATRDNCCILDERFGSYCPTTCGIADFLSTYQTKVDKDLQSL

β  NNNVEAVsqtsssgQFYMYLLKDLWQKRQKQVKDNENVVNEYSSELEKHQLYIDETVNSNIPtLRILRSILENLRSKIQKLESDVSAQMEYCRTPCTVSC
γ  EDI LHQVE*NKTSEVKQLIKAIQLTYNPDESSKPNMI DMATLKSRKMLEEIMKYEAS ILTHDSS IRYLQE IQNSNNQKI VNL KE KVAQLEAQCQEPC KDTV

β  DIPVVSGKECEEIIRKGGETSEMYLIQPDSSVKPYRVYCDMNTENGGWTVIQNRQDGSVDFGRKWDPYKQGFGNVATNTDGKBYCGLPGZYWLGBKISQL
γ  QIHDITGKDCQDIANKGAKQSGLYFIKPLKANQQFLVYCEIDGSGNGWTVFKKRLDGSVDFKKNWIQYKEGFGHLSPTGTTEFWLGNEKIHLISTQSA

β  TRMGPTELLIQMEDWKGNKVKAHYGGFtVQNQANKYQISVNKYRGTAGNALMNGASQLMGENRTMSRKMRKMFFSTYDRDNDGWLTSDPRKQ-CSKEDGGY
γ  IPYALRVELEDWNGRTSTADYAMFKVGPEADKYRLTYAYFAGGDAGDAFDGFDFGDDPSDKFFTSHNGMQFSTWDNDNDKF EGNCAEQDGSW

β  BwRcYDAADPDghRYYBGtzwMAKHGTNN VWWMNWKGSWYSM KIRPFFPQQ
γ  WMNKCHAGH INGVYYQGG tysKASTPNGYDNGIiwatKTRWYSMKKTTMKIIPFNRLTIGEGQQHHLGGAKQAGBV

FIG. 3. Comparison of amino acid sequences of β and γ chains from human fibrinogen. Vertical lines indicate identical residues. Lower-case letters indicate regions in which amino acid composition is known but sequence is not yet established. Cysteine residues are underscored (γ chains) and overscored (β chains). See legend to Fig. 2 for single letter code. *, Carbohydrate attachment site.
Table 2. Fifteen cyanogen bromide fragments isolated from β chain of human fibrinogen

Sephadex G-50 pool*   Amino-terminal sequence   No. of residues   No. of cysteines
I                     PCA-Gly-Val-†             118               3
II                    Tyr-Leu-Leu-              71                0
III                   Asn-Thr-Glu-              62                1
IVA                   Glu-Tyr-Cys-              34                4
IVB                   Glu-Asp-Trp-              40                0
VA                    Phe-Phe-Ser-              50 ± 5            2
VB                    Gly-Glu-Asn-‡             6                 0
VIA                   Gly-Pro-Thr-              9                 0
VIB1                  Asn-Gly-Ala-              7                 0
VIB2                  Tyr-Leu-Ile-              18                1
VIC                   Ala-Lys-His-              12                0
VID                   Lys-Ile-Arg-§             9                 0
VIE                   Ser-Arg-Lys-              4                 0
VIF                   Arg-Lys-Met               3                 0
VII                   Asn-Trp-Lys-              9                 0
Total                                           452 ± 5           11

* Roman numeral indicates pool designation from Sephadex G-50 column (Fig. 1).
† Amino-terminal peptide of β chain. PCA, pyrrolidone carboxylic acid.
‡ Carbohydrate-containing peptide.
§ Carboxy-terminal peptide of β chain.
The assignment of the remaining tryptic peptides in this fragment is still equivocal and the yield of one is marginal enough that we are still uncertain that it is a genuine component.

Ordering the Cyanogen Bromide Fragments. The order of the cyanogen bromide fragments was established on the following basis. First, the amino-terminal fragment (CNI) was obvious as a result of previous studies conducted by Blombäck's group (3). Its amino terminus is blocked (pyroglutamic acid), its amino acid composition is (virtually) consistent with a previous report (4), and the fibrinopeptide B, as well as other predicted peptides, are released by tryptic digestion. The second cyanogen bromide peptide in the arrangement is CNII, which begins with the sequence Tyr-Leu-Leu-Lys (Table 2), a fragment first reported by us as the carboxy terminus of fragment E and necessarily adjacent to CNI (5). These two fragments together (CNI and CNII) comprise the first 189 residues of the β chain. The next fragment, CNIVA, was ordered on the basis of homology with both the α- and γ-chain sequences, in which cases the two key cysteine residues are in exact register in all three chains. An overlap between this peptide and peptide CNVIB2 was obtained from a tryptic digest of citraconylated β chains. The observed sequence is highly homologous with the corresponding γ-chain section (Fig. 3). The next two cyanogen bromide fragments, CNIII and CNIVB, were positioned by homology. Peptides CNVIB1 and CNVB are known to follow CNIVB sequentially, since tryptic digestion of β chains releases a carbohydrate peptide which overlaps all three cyanogen bromide fragments. This same fragment has been reported by the Munich group (8). The next five CNBr fragments (CNVIA, VA, VIC, VII, and VID) were also positioned by homology with the γ chain, including the carboxy-terminal peptide which we have previously reported (6). In this alignment only the two small peptides VIE and VIF remain ambiguous. We have arbitrarily placed them in a seven-residue gap remaining near the carboxy terminus when the best alignment was achieved by homology with the γ chain, but they could equally well go elsewhere in regions in which we have no overlaps (Fig. 2).
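The bookkeeping behind this arrangement can be sketched as simple arithmetic on the fragment sizes. The example below is illustrative only: it assumes the sizes and cysteine counts of Table 2, takes the 50 ± 5 residue fragment as 50, follows the fragment order drawn in Fig. 2 (including the arbitrary placement of CNVIE and CNVIF), and uses hypothetical names such as `PROPOSED_ORDER` and `residue_ranges`.

```python
# Sketch: residue ranges implied by placing the Table 2 fragments in the
# order proposed in Fig. 2. Tuples are (fragment, residues, cysteines).
# Assumptions: the 50 +/- 5 residue fragment is counted as 50, and the
# positions of CNVIE and CNVIF are the arbitrary ones used in Fig. 2.
PROPOSED_ORDER = [
    ("CNI", 118, 3),
    ("CNII", 71, 0),
    ("CNIVA", 34, 4),
    ("CNVIB2", 18, 1),
    ("CNIII", 62, 1),
    ("CNVIA", 9, 0),
    ("CNIVB", 40, 0),
    ("CNVIB1", 7, 0),
    ("CNVB", 6, 0),
    ("CNVIE", 4, 0),
    ("CNVIF", 3, 0),
    ("CNVA", 50, 2),
    ("CNVIC", 12, 0),
    ("CNVII", 9, 0),
    ("CNVID", 9, 0),
]


def residue_ranges(fragments):
    """Return {fragment name: (first residue, last residue)}, 1-based."""
    ranges = {}
    start = 1
    for name, n_residues, _n_cys in fragments:
        ranges[name] = (start, start + n_residues - 1)
        start += n_residues
    return ranges


RANGES = residue_ranges(PROPOSED_ORDER)
TOTAL_RESIDUES = sum(n for _name, n, _cys in PROPOSED_ORDER)
TOTAL_CYSTEINES = sum(cys for _name, _n, cys in PROPOSED_ORDER)

for name, (first, last) in RANGES.items():
    print(f"{name:7s} {first:3d}-{last:3d}")
print("total residues:", TOTAL_RESIDUES, "total cysteines:", TOTAL_CYSTEINES)
```

With these inputs CNI and CNII together end at residue 189, as stated in the text, and the totals reproduce the 452 residues and 11 cysteines of Table 2. Moving CNVIE and CNVIF elsewhere would shift the ranges of the fragments between the old and new positions but leave the totals unchanged.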
DISCUSSION

The most significant feature to emerge so far from the primary structure studies on fibrinogen, in our view, is the homology observed among the three nonidentical chains (22). Clearly all three chains have descended from a common ancestor (23). That being the case, we can expect residual three-dimensional similarities and structure-function equivalences for all three chains in many instances. We have already invoked this principle of evolutionary equivalence in describing a three-stranded coiled-coil connection between the dimeric central domain and two terminal domains (24), a natural consequence of comparing all three chains in an appropriate alignment over a stretch of approximately 120 residues. In this article the most informative comparison is between the β and γ chains, not only because the entire sequence of the γ chain is available, but also because the terminal domain is composed almost exclusively of the carboxy-terminal two-thirds of these two chains to the virtual exclusion of the α chain. Such a comparison over their entire lengths provides explanations for a variety of functional observations. For example, a best alignment (based on putting cysteine residues in register) indicates that a substantial terminal deletion (57 residues when human chains are compared) has occurred in the γ chain, explaining why no fibrinopeptide is released by thrombin action from the amino-terminal segment of this chain (23). Similarly, the carboxy terminus of the β chain has had a terminal deletion, and this provides a natural explanation of why β chains are not crosslinked in a dimeric fashion the way their homologous γ-chain counterparts are (6).
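As a small illustration of what putting cysteine residues in register means in practice, the sketch below counts identical positions over one short stretch of the two chains. It is an assumption-laden example rather than the alignment used for Fig. 3: the two 37-residue segments are read from the β and γ rows of Fig. 3, both start at the shared Val-Tyr-Cys, and they are compared without gaps. The names `BETA_SEGMENT`, `GAMMA_SEGMENT`, and `count_identities` are hypothetical.

```python
# Sketch: identity count over one ungapped stretch of the beta and gamma chains.
# Assumption: both segments are read from Fig. 3, start at the shared
# Val-Tyr-Cys, and are compared position by position with no gaps.
BETA_SEGMENT = "VYCDMNTENGGWTVIQNRQDGSVDFGRKWDPYKQGFG"
GAMMA_SEGMENT = "VYCEIDGSGNGWTVFKKRLDGSVDFKKNWIQYKEGFG"


def count_identities(seq_a, seq_b):
    """Return (identical positions, fraction identical) for equal-length strings."""
    if len(seq_a) != len(seq_b):
        raise ValueError("segments must have the same length")
    identical = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return identical, identical / len(seq_a)


IDENTICAL, FRACTION = count_identities(BETA_SEGMENT, GAMMA_SEGMENT)
print(f"{IDENTICAL} of {len(BETA_SEGMENT)} positions identical ({FRACTION:.0%})")
```

Over this stretch a little more than half of the positions are identical, including the run Asp-Gly-Ser-Val-Asp-Phe. A full comparison would need the gaps of the original figure, which this sketch does not attempt to reproduce.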
What we want to emphasize here, however, are the similarities between β and γ chains, not their differences. The terminal domains correspond closely to the plasmin-generated fragment D and are made up predominantly of long stretches (about 250 residues each) of the β and γ chains. The two chains are apparently connected in this region by two disulfide bonds (25). It is our contention that there must be a pseudosymmetry for the two halves of this domain, the dyad axis of which extends through the disulfides themselves. If this notion is correct, then the β- and γ-chain portions would also be expected to play equivalent functional roles. Since these terminal domains each contain complementary receptor sites for the peptidyl knobs exposed by the action of thrombin (A. P. Laudano and R. F. Doolittle, unpublished data), it will be of great interest to establish the relative contributions of the two homologous chains.

In summary, we have determined the sequence of and/or characterized 15 cyanogen bromide peptides from the β chain of human fibrinogen. An arrangement of these peptides has been proposed based on some overlapping enzymatically produced peptides as well as by homology with the γ chain. The portions of the β and γ chains that make up the terminal domain (fragment D) have the strongest homology, an observation we feel has great functional significance.

Note Added in Proof. Since the submission of this article, we have completed most of the unfinished portions of the sequence. Also, a report containing a virtually complete sequence has appeared (26). Although only a few minor differences between the results of the two laboratories remain, the following significant differences from the preliminary results in this article ought to be noted. First, the total number of residues in the β chain of human fibrinogen is 462. Fragments CNII and CNVB contain 72 and 53 residues, respectively. Moreover, a methionyl-threonine bond that is not cleaved by cyanogen bromide apparently occurs at position 366-367. As a result a 16th peptide (six residues) was missed.
We thank Barbara Cottrell, Marcia Riley, Danielle Goldbaum, and Dennis Trovato for exceptional assistance in this project, including the preparation of the fibrinogen, isolation of chains and fragments, operation and maintenance of the amino acid analyzers and sequencer, as well as much other technical assistance. We genuinely appreciate that this study was completely dependent on their highly supportive efforts. This work was supported by U.S. Public Health Service Grants HL-18,576 and GM-17,702.

1. McKee, P. A., Rogers, L. A., Marler, E. & Hill, R. L. (1966) Arch. Biochem. Biophys. 116, 271-279.
2. Lottspeich, F. & Henschen, A. (1977) Hoppe-Seyler's Z. Physiol. Chem. 358, 935-938.
3. Blombäck, B. & Blombäck, M. (1972) Ann. N.Y. Acad. Sci. 202, 77-97.
4. Collen, D., Kudryk, B., Hessel, B. & Blombäck, B. (1975) J. Biol. Chem. 250, 5808-5817.
5. Takagi, T. & Doolittle, R. F. (1975) Thromb. Res. 7, 813-818.
6. Takagi, T. & Doolittle, R. F. (1975) Biochim. Biophys. Acta 386, 617-622.
7. Blombäck, B., Hessel, B. & Hogg, D. (1976) Thromb. Res. 8, 639-658.
8. Töpfer-Petersen, E., Lottspeich, F. & Henschen, A. (1976) Hoppe-Seyler's Z. Physiol. Chem. 357, 1509-1513.
9. Henschen, A. & Lottspeich, F. (1976) Hoppe-Seyler's Z. Physiol. Chem. 357, 1801-1803.
10. Watt, K., Goldbaum, D., Cottrell, B. A. & Doolittle, R. F. (1977) Thromb. Diath. Haemorrh. 38, 26 (abstr.).
11. Doolittle, R. F., Cassman, K. G., Cottrell, B. A., Friezner, S. J., Hucko, J. T. & Takagi, T. (1977) Biochemistry 16, 1703-1709.
12. Doolittle, R. F., Cassman, K. G., Cottrell, B. A., Friezner, S. J. & Takagi, T. (1977) Biochemistry 16, 1710-1715.
13. Spies, J. R. & Chambers, D. C. (1949) Anal. Chem. 21, 1249-1256.
14. Nussenzweig, V., Seligman, M., Pelmont, J. & Grabar, P. (1961) Ann. Inst. Pasteur Paris 100, 377-389.
15. Mross, G. A. & Doolittle, R. F. (1977) Mol. Biol. Biochem. Biophys. 25, 1-20.
16. Doolittle, L. R., Mross, G. A., Fothergill, L. A. & Doolittle, R. F. (1977) Anal. Biochem. 78, 491-505.
17. Gray, W. (1972) in Methods in Enzymology, eds. Hirs, C. H. W. & Timasheff, S. N. (Academic, New York), Vol. 25, pp. 333-344.
18. Houmard, J. & Drapeau, G. R. (1972) Proc. Natl. Acad. Sci. USA 69, 3506-3509.
19. Cartwright, T. & Kekwick, R. G. O. (1971) Biochim. Biophys. Acta 236, 550-562.
20. Henschen, A. & Edman, P. (1972) Biochim. Biophys. Acta 263, 351-367.
21. Dayhoff, M. O. (1972) Atlas of Protein Sequence and Structure (National Biomedical Research Foundation, Silver Spring, MD).
22. Doolittle, R. F. (1973) Adv. Protein Chem. 27, 1-109.
23. Doolittle, R. F. (1976) Fed. Proc. Fed. Am. Soc. Exp. Biol. 35, 2145-2149.
24. Doolittle, R. F., Goldbaum, D. M. & Doolittle, L. R. (1978) J. Mol. Biol., in press.
25. Doolittle, R. F., Takagi, T., Watt, K., Bouma, H., Cottrell, B. A., Cassman, K. G., Goldbaum, D. M., Doolittle, L. R. & Friezner, S. J. (1977) Proceedings of the FEBS Meeting 11th, in press.
26. Henschen, A. & Lottspeich, F. (1977) Hoppe-Seyler's Z. Physiol. Chem. 358, 1643-1646.