[18]

OVERVIEW AND STRATEGY

351

[18] Peptides and Proteins: Overview and Strategy

By KLAUS BIEMANN

Introduction

During the decade of the 1980s, mass spectrometry experienced a quantum leap that propelled it into the midst of protein chemistry. Prior to that period, the methodology was limited to small molecules, generally of molecular weights below 500. Although in the 1970s field desorption (FD) ionization, developed by Beckey,1 made it possible to ionize nonvolatile compounds up to a molecular weight of 1500 to 2000, it required considerable experimental skill to obtain reliable results, particularly with large, polar peptides. Another novel method introduced during this time, plasma desorption mass spectrometry (PDMS), pioneered by Macfarlane,2 permitted the mass spectrometric determination of molecular masses ranging to many kilodaltons. Because that method used the fission products of californium-252 to ionize the molecules of interest, commercial instruments were not available for many years, and this hampered the widespread use of this very useful technique. However, it demonstrated for the first time that very large and polar molecules can be ionized and the resulting heavy ions detected in a mass spectrometer.

The situation changed dramatically when M. Barber et al.3 developed a very simple approach to the ionization of large and polar molecules and used peptides as examples.1 Irradiation of a solution of the compound of interest in a liquid matrix of low vapor pressure, such as glycerol, with fast [a few kiloelectron volt (keV) kinetic energy] particles (argon atoms) produced protonated molecules, (M + H)+, that were ejected into the mass analyzing system of a mass spectrometer and were recorded after mass separation. The process, termed fast atom bombardment (FAB) ionization, was very easy to implement on existing mass spectrometers, particularly those that were equipped for FD, over which it had the advantage of much greater ease of operation.
This fact, combined with the demonstration that it was applicable to peptides of molecular masses in the range of a few thousand daltons (insulin soon became the yardstick of performance), immediately opened up the field of peptide identification.

1 A. G. Harrison and R. J. Cotter, this volume [1].
2 R. D. Macfarlane, this volume [11].
3 M. Barber, R. S. Bordoli, R. D. Sedgwick, and A. N. Tyler, J. Chem. Soc. Chem. Commun. p. 325 (1981).

METHODS IN ENZYMOLOGY, VOL. 193

Copyright © 1990 by Academic Press, Inc. All rights of reproduction in any form reserved.


It should be noted that it was not the neutral particles but the liquid matrix that led to success, whereas similar attempts involving direct bombardment with keV ions (i.e., Ar+) of even much smaller molecules deposited on a metal surface (secondary ion mass spectrometry, SIMS) had failed. It was later found that bombardment of a glycerol solution of a peptide with Cs+ ions works equally well; the methodology is often also referred to as liquid secondary ion mass spectrometry (LSIMS).4

At the time FAB ionization was discovered, the mass range of the larger, commercially available magnetic mass spectrometers barely reached 2000 u at full accelerating voltage. This provided the impetus to design instruments of higher mass range without sacrificing ion transmission and thus sensitivity, which resulted in the commercial availability of mass spectrometers having a mass range of up to 15,000 u at 8 or 10 kV. This approaches the practical limit of keV Cs+ ions to generate a sufficient number of (M + H)+ ions to be detectable. Beyond that size, other ionization methods (see below) are much more efficient.

While FAB-MS (or LSIMS) quite efficiently generates protonated molecules, these are chemically very stable and have little excess energy. As a result, not much fragmentation takes place and the signal (i.e., the "mass spectrum") consists mainly of the isotope cluster of (M + H)+. Fragment ion abundances are generally ten times lower and the spectrum is, therefore, dominated by the signal for the protonated molecule. Thus, such spectra contain little structural information beyond the molecular weight. In spite of this fact, the literature abounds with FAB mass spectra obtained with large samples (high concentrations in glycerol of commercially or otherwise available peptides), which are then "interpreted" in terms of their known sequence.
Scattered among them are, indeed, a few peptides of partly unknown structure but, in these cases, the problem was most often one of distinguishing between a choice of candidate structures. The problem with normal FAB-MS spectra obtained with realistic sample sizes is not only the low abundance of the fragment ions but also the fact that the matrix itself, alone and in combination with the sample, creates a spectrum that contains a peak at every mass. Thus, it is tempting to select those peaks that fit a known or preconceived structure while ignoring those that do not.

However, the lack of fragmentation has great advantages, as it permits the determination of the molecular weights of all the components present in a mixture, as long as they differ sufficiently in mass to be resolved. Alternatively, a FAB mass spectrum of a compound assumed to be homogeneous may reveal that it is a mixture.

As a consequence of this lack of fragmentation and the accompanying lack of structural information, other methods were developed to induce cleavage of bonds. Foremost among these is tandem mass spectrometry,5-7 where the species, generally (M + H)+, selected in the first mass spectrometer (MS-1) is fragmented upon collision with an inert atom (collision-induced dissociation, CID, or collisionally activated dissociation, CAD8) or by some other energy-transfer process, and the fragment ions are recorded in the second mass spectrometer (MS-2). This principle had been used previously, mainly for studies of ion structure, elucidation of fragmentation mechanisms, or the qualitative and quantitative analysis of relatively small molecules for which mass range and resolution were not a problem.9 For the study of the CID spectra of peptides of the size produced by proteolysis of proteins, larger instruments were required: either a pair of double-focusing mass spectrometers,5 a triple quadrupole mass spectrometer, or a combination of magnetic and quadrupole analyzers.6

As has been mentioned earlier, the determination of molecular weights beyond 10,000-15,000 becomes increasingly difficult by FAB ionization in combination with a magnetic deflection mass spectrometer, and PDMS may be limited to molecular weights of ~45,000.2 Two novel methods for the mass spectrometric determination of molecular weights, apparently particularly well suited for proteins, were developed very recently. Hillenkamp has shown that a protein molecule embedded in a UV-absorbing matrix, such as nicotinic acid, can be ionized by irradiation with a neodymium/yttrium-aluminum-garnet (Nd-YAG) laser pulse at 266 nm.10 Mass measurements are carried out with a time-of-flight (TOF) mass spectrometer.11 To date, protein molecular weights exceeding 300,000 have been measured with an accuracy of mass assignment of ±0.1-0.5%.

4 W. Aberth, K. M. Straub, and A. L. Burlingame, Anal. Chem. 54, 2029 (1982).
Another novel ionization method, electrospray (ES), is based on the ability to generate multiply protonated, and therefore multiply charged, molecules by generating a spray of very small droplets of a solution of the protein emerging from the tip of a needle held a few kV above the entrance aperture of a quadrupole mass spectrometer.12 The success of this method is due to the presence of well-distributed basic sites (amino acids) along the protein chain, as first demonstrated by Fenn et al.13 The general distribution of basic amino acids in most proteins is such that a multiplicity of peaks, one for each (M + Hn)n+ ion, centers around m/z ~1000. Therefore, a quadrupole mass spectrometer with a mass range of m/z ~1500 suffices to record the data, and the molecular weight can be redundantly determined from each pair of adjacent peaks.12-14 Averaging the results leads to a mass accuracy of ±0.05-0.01%, even if the m/z value of each peak is determined only to within one mass unit. To date, the ES method has been applied to proteins of molecular weights as high as 130,000, but above 75,000 the multiply charged peaks come so close together that their reliable mass measurement is increasingly difficult. A more detailed discussion of these considerations is presented elsewhere in this volume.14

At the present time, it appears that ES is the method of choice for molecular weights up to 75,000, where matrix-assisted laser desorption (LD) takes over. All these measurements, and generally also those above m/z 5000 from magnetic deflection mass spectrometers, provide the average molecular weight,15 because both ES and LD lack the resolution required to separate the isotopic clusters of the protonated molecules. A double-focusing magnetic mass spectrometer is capable of unit resolution up to at least 10,000 molecular weight, but the effort required to achieve this is rarely worthwhile, because in that mass range there is little difference in the information content of monoisotopic and polyisotopic mass measurements.

5 M. L. Gross, this volume [6].
6 R. A. Yost and R. K. Boyd, this volume [7].
7 K. Biemann, this volume [25].
8 R. N. Hayes and M. L. Gross, this volume [10].
9 For a review, see "Tandem Mass Spectrometry" (F. W. McLafferty, ed.). Wiley, New York, 1983.
10 F. Hillenkamp and M. Karas, this volume [12].
11 K. R. Jennings and G. G. Dolnikowski, this volume [2].
12 C. G. Edmonds and R. D. Smith, this volume [22].
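The redundant determination of the molecular weight from adjacent electrospray peaks, mentioned above, can be illustrated with a minimal sketch. It assumes that two neighboring peaks differ by exactly one proton, so that the peak at higher m/z carries n charges and the one at lower m/z carries n + 1. The function names, the value used for the proton mass, and the synthetic peak list in the usage note are illustrative assumptions and are not taken from the chapter.

```python
PROTON = 1.00728  # mass of a proton in u (assumed value for this sketch)


def mass_from_adjacent_peaks(mz_high, mz_low):
    """Estimate charge and neutral mass from two adjacent peaks.

    mz_high is the m/z of the ion with n protons, mz_low that of the
    ion with n + 1 protons. From
        mz_high = (M + n*H) / n  and  mz_low = (M + (n+1)*H) / (n+1)
    it follows that n = (mz_low - H) / (mz_high - mz_low).
    """
    n = round((mz_low - PROTON) / (mz_high - mz_low))
    mass = n * (mz_high - PROTON)
    return n, mass


def mean_mass(peaks):
    """Average the mass estimates from every pair of adjacent peaks."""
    ordered = sorted(peaks, reverse=True)  # highest m/z = lowest charge first
    estimates = [
        mass_from_adjacent_peaks(high, low)[1]
        for high, low in zip(ordered, ordered[1:])
    ]
    return sum(estimates) / len(estimates)
```

As a usage example, a hypothetical protein of mass 16,950 u with 12 to 19 protons gives peaks at `(16950 + n * PROTON) / n`; passing these values to `mean_mass`, even after rounding each to the nearest integer, returns an estimate close to the true mass, which is the averaging effect described in the text.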

Utility of Molecular Weight Determinations

There are many problems in peptide and protein structure that can be solved simply by determining molecular weights to better than one mass unit. One of the first applications of FAB-MS to protein structure was the verification and, if necessary, correction of the DNA sequence of a gene coding for the corresponding protein.16 It merely requires translating the open reading frame of the gene sequence into the corresponding protein sequence and calculating the molecular weights of the tryptic peptides expected from the locations of lysine and arginine. Matching these values with the actual molecular weights, determined by FAB-MS, of the components of the tryptic digest of the protein reveals whether or not the DNA sequence is correct. Those values that do not match indicate either an error in the DNA sequence or a posttranslational modification of the gene product.

Sequence errors are usually misidentifications of a base that lead to a codon representing a different amino acid, or a deletion or insertion of a base causing a frame shift. In the former case, there will be an unmatched peptide, the molecular weight of which differs from the predicted one by the mass difference of two amino acids. If a frame shift has occurred, there will be one or, usually, more peptides that do not match the open reading frame but fit those expected for one of the other two. It is then easy to determine where a base has been missed, and where another one has inadvertently been inserted elsewhere, to return to the open reading frame.

Such a verification process is even more efficient if the mass spectrometric information is acquired in parallel with or ahead of the DNA sequencing. Each preliminary set of DNA data can then be checked immediately after it has been obtained. If correct, no further effort needs to be devoted to this section of the DNA; if not, the region where the error occurred is pinpointed and easily corrected by reinspection of the sequencing gel. This parallel approach was used to advantage during the determination of the sequence of the gene coding for glycyl-tRNA synthetase (glycine-tRNA ligase) from Escherichia coli.17

13 J. B. Fenn, M. Mann, C. K. Meng, S. F. Wong, and C. M. Whitehouse, Science 246, 64 (1989).
14 I. Jardine, this volume [24].
15 K. Biemann, this volume [13].
16 K. Biemann, Int. J. Mass Spectrom. Ion Phys. 45, 183 (1982).

In contrast to these relatively early days of DNA sequencing, the methodology is now so fast and redundant that an error in the final result is rare. However, mass spectrometry is often still required to settle some remaining questions concerning the gene or its transcription. An example that illustrates a number of points made in the introduction involves the structure of the gene coding for the cytoplasmic valyl-tRNA synthetase (valine-tRNA ligase) (Val-RS) from yeast.
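Before turning to that example, the matching of predicted and measured tryptic peptide masses described in this section can be sketched in a few lines. This is a simplified illustration, not the procedure used in the author's laboratory: it assumes cleavage after every lysine and arginine (ignoring the slow cleavage before proline), uses standard monoisotopic residue masses, and applies an arbitrary tolerance of 0.3 u. The function names and the tolerance are hypothetical.

```python
# Monoisotopic residue masses in u (standard values, rounded to 5 decimals).
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide for the free termini
PROTON = 1.00728   # added to obtain the (M + H)+ value


def tryptic_peptides(protein):
    """Split a one-letter protein sequence after every Lys (K) or Arg (R)."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        if residue in "KR":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):  # C-terminal peptide without K or R
        peptides.append(protein[start:])
    return peptides


def mh_plus(peptide):
    """Monoisotopic m/z of the singly protonated peptide, (M + H)+."""
    return sum(RESIDUE[r] for r in peptide) + WATER + PROTON


def unmatched(observed, protein, tol=0.3):
    """Return the observed (M + H)+ values that fit no predicted peptide."""
    predicted = [mh_plus(p) for p in tryptic_peptides(protein)]
    return [m for m in observed if all(abs(m - p) > tol for p in predicted)]
```

Values returned by `unmatched` would be the candidates for either a sequencing error or a posttranslational modification; to test for a frame shift, the same comparison could be repeated with the translations of the other two reading frames.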
After the DNA sequence had been determined, the question of the site of initiation of translation remained because there were two ATG codons for Met near the putative N terminus, separated by only 45 codons. The first stop codon was 1103 triplets downstream of the first ATG and the protein could therefore be either 1058 or 1104 amino acids long, a difference indistinguishable by molecular weight determinations based on chromatography or electrophoresis. The obvious solution, a number of Edman steps, failed, indicating that the N-terminal amino acid is acylated.

FIG. 1. HPLC trace of the tryptic digest of 2 to 3 nmol of cytoplasmic valyl-tRNA synthetase (valine-tRNA ligase) from yeast. The diagonal line indicates the solvent gradient from water to acetonitrile (both containing 0.1% trifluoroacetic acid) at 1%/min. [From K. Biemann, in "Protein Sequencing: A Practical Approach" (J. B. C. Findlay and M. J. Geisow, eds.), p. 99. IRL Press, Oxford, 1989.]

A tryptic digest of Val-RS was partially separated by high-performance liquid chromatography (HPLC) (shown in Fig. 1). No particular effort was made to completely separate all components, and 27 fractions were collected. A FAB mass spectrum was recorded for each of the fractions and the molecular weights of a total of 80 peptides were determined from the resulting data. Of these, 79 matched predicted tryptic peptides (accounting for most of those expected) for the protein sequence beginning at the second Met (Met-47), but none fit the region between Met-1 and Met-47. This indicated that (1) the DNA sequence is essentially correct because the peptides identified by molecular weight covered about 80% of the protein sequence, and (2) translation started at the second ATG codon. However, none of the molecular weights corresponded to the peptide Met-47-Lys-59. Among the 11 abundant peaks in HPLC fraction 10 (Fig. 2), there was one at m/z 1351.7 that did not fit any of the predicted tryptic peptides; it was in the range of, but less than, that expected for the protonated Met-47-Lys-59 segment. Both its mass and CID spectrum7 indicated that translation began at Met-47, which had been posttranslationally removed, followed by N-acetylation of the new N-terminal amino acid, serine.18,19 It is of interest to note that the mitochondrial Val-RS in yeast was shown to be derived from the same gene except that translation began

17 T. A. Webster, B. W. Gibson, T. Keng, K. Biemann, and P. Schimmel, J. Biol. Chem. 258, 10637 (1983).
18 B. Chatton, P. Walter, J. P. Ebel, F. Lacroute, and F. Fasiolo, J. Biol. Chem. 263, 52 (1988).
19 K. Biemann and H. A. Scoble, Science 237, 992 (1987).


[FIG. 2: peak labeled (M + H)+ at m/z 1351.7]
3000) often result in CID spectra that do not reveal the entire sequence but still show a portion (usually the middle),19 and thus may contain sufficient information to construct a useful oligonucleotide probe.

Most of these discussions were related to problems of protein chemistry in the "precloning" stage. This field is usually plagued by a scarcity of starting material that has to be tediously isolated from its natural source, although a certain level of impurities can be tolerated. The situation is almost completely reversed in the "postcloning" stage, where the protein produced is generally abundant but the questions asked are also very different. The two most important ones are: How homogeneous is it? Were all posttranslational modifications carried out properly? Some of the chapters that follow touch on these questions to varying degrees and describe how at least parts of these complex questions can be answered. The last chapter29 of this section is specifically devoted to this subject, using as an example a protein produced on a large scale by recombinant technology for clinical trials. Quality control and product integrity are of extreme importance for pharmaceuticals, particularly for those that are used over extended periods of time. For these reasons, it can be expected that mass spectrometry will attain an increasingly important role in this area and may become a methodology required by the agencies formulating the prerequisites of approval and the regulations of such biologicals for human use.

Acknowledgment

The portion of the methodologies and results described that was developed and carried out in the author's laboratory was supported by grants from the National Institutes of Health (RR00317 and GM05472). The author is also indebted to all his present and former associates, particularly B. W. Gibson, S. A. Martin, I. A. Papayannopoulos, and H. A. Scoble.

28 I. A. Papayannopoulos and K. Biemann, Biomed. Environ. Mass Spectrom., in press.

29 H. A. Scoble and S. A. Martin, this volume [28].
