Proteomics 2014, 14, 2031–2041


DOI 10.1002/pmic.201300479

RESEARCH ARTICLE

Why less is more when generating tryptic peptides in bottom-up proteomics

Siri Hildonen, Trine Grønhaug Halvorsen and Leon Reubsaet

Department of Pharmaceutical Chemistry, School of Pharmacy, University of Oslo, Oslo, Norway

Proteolytic digestion is a time-consuming and critical step in bottom-up proteomic analysis. The most widely used protease, trypsin, has high specificity and generates peptides that are considered to be ideally suited for bottom-up LC-MS technology. By exploiting key factors affecting enzymatic activity we obtained a simple, straightforward, and rapid in-solution digest protocol that performed better than the conventional overnight digestion method in terms of amino acid coverage of proteins, number of peptides generated, and peptide ion abundances. Prolonged digestion time, such as overnight digestion, leads to a decline in protein amino acid coverage and loss of tryptic peptides. This was found to be caused by complete digestion by trypsin, leading to an increased number of small peptides that are not LC-MS detectable. Slow-rate nontryptic digestion of peptides is a contributing factor to the loss of peptide ion intensities during extended digestion time. Our work demonstrates that for both qualitative and quantitative bottom-up proteomic studies it is beneficial to prevent trypsin digestion from going to completion by reducing treatment time from the conventional several hours to a few minutes.

Received: October 29, 2013 Revised: June 11, 2014 Accepted: July 10, 2014

Keywords: Accelerated digestion / LC-MS/MS / Proteolysis / Technology / Tryptic digestion

Additional supporting information may be found in the online version of this article at the publisher’s web-site

1 Introduction

Proteomics is the study of proteins expressed by the genetic material of an organism. Proteomic technologies are well established in drug discovery, diagnostics, and molecular medicine because expressed proteins link genes and diseases. Many pharmaceutical drugs target proteins or are proteins themselves [1–3]. At present the core technology in proteomics is LC-MS/MS. MS-based bottom-up proteomics is the most widely used approach for protein identification and quantitation. In this approach, peptides derived from proteins are separated on an LC system and detected by a mass spectrometer. The MS data are then further processed computationally or manually. The key to successful bottom-up proteomic analyses is to convert the

Correspondence: Dr. Leon Reubsaet, Department of Pharmaceutical Chemistry, School of Pharmacy, University of Oslo, Oslo, Norway
E-mail: [email protected]
Abbreviation: XIC, extracted ion chromatogram

© 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

proteins to peptides with chemical features and masses that are optimal for both LC separation and MS fragmentation and detection [4–6]. Generation of peptides is routinely performed by enzymatic cleavage of proteins. Trypsin is the most frequently used proteolytic enzyme due to its high specificity and efficiency compared to other commercially available endoproteases. The enzyme cleaves at the carboxyl-terminal side of lysine (K) and arginine (R) residues, with the exception of KP and RP sites. The generated peptides have masses and charges that are well suited for LC-MS/MS analysis [7, 8]. Conventional trypsin digestion protocols are well-established procedures in which trypsin is added to a pretreated protein sample at a recommended enzyme-to-protein ratio (w/w) ranging from 1:100 to 1:10. Incubation is normally carried out overnight at 37°C to allow for complete digestion of proteins in the mixture [9]. There have been at least two challenges in using the conventional trypsin digestion protocol in proteomics. Firstly, the conventional bottom-up approach is, for some proteins, associated with rather low amino acid sequence coverage after searching MS data against protein databases. The

www.proteomics-journal.com


S. Hildonen et al.

obtained amino acid coverage of proteins is typically in the range 40–90% depending on the protein, the sample complexity, and/or the applied method. High confidence identification, where high amino acid coverage is a contributing factor, is crucial for unambiguous proteomic identification of proteins in complex samples. To increase the coverage of proteins, more labor-intensive protocols have been suggested, such as technical replicates and combined digests using proteases with different specificities [10].

The second challenge is that the traditional trypsin digestion protocols are time-consuming and not suitable for high-throughput or automated sample preparation and analyses. Many techniques have been applied to accelerate the digestion process, such as the use of ultrasound, microwave, and infrared energy as well as applying high pressure during digestion (recently reviewed by Switzar et al. [11]). These alternative protein digestion protocols take only minutes, in contrast to the several hours applied by conventional methods. However, these techniques require special devices and have not been implemented into standard trypsin digestion procedures. The reason might be that the results obtained with these devices have proved difficult for other research groups to reproduce [12].

To circumvent the digestion process altogether, MS technology is being developed to allow high-throughput proteome analysis of intact proteins with high resolution, mass accuracy, and sensitivity [13, 14]. Proteome analysis of intact proteins is referred to as top-down proteomics. In addition to minimal sample preparation, there are other advantages of analyzing proteins rather than their peptides, such as reduced sample complexity, better conservation of PTMs [15], and the ability to detect and distinguish between different protein isoforms [16, 17].

Middle-down proteomics is emerging as a technique while the top-down proteomic approach is still being developed. In this approach, the peptides generated are in the size range of 2000–20 000 Da compared to less than 2000 Da for fully tryptic peptides generated by the bottom-up proteomic method [18]. The larger peptides are generated by proteolytic enzymes that cut proteins less frequently than trypsin [19]. In addition to better conserving PTMs [20], these midrange-sized peptides are found to be LC fractionated with improved resolution. The improved LC resolution is probably a major contributing factor to the reported high protein amino acid sequence coverage obtained by middle-down proteomics [21].

Swaney et al. pointed out in a paper from 2010 that peptides of six amino acids or less are hardly detected by LC-MS [10]. The limitation in detection is probably due to poor chromatographic column retention and/or to short peptides giving rise to poor fragmentation spectra. Swaney et al. suggested the use of multiple proteases to increase the amino acid coverage of proteins in LC-MS-based proteomic analyses [10]. This strategy demands more sample and more elaborate sample preparation, MS analysis, and MS data processing, which altogether make it less suited to high-throughput work.


One way to reduce the enzymatic generation of short peptides, and at the same time increase the throughput of the digestion, is to apply an accelerated digestion protocol that leads to incomplete digests of proteins. This paper investigates whether it is possible to obtain a bottom-up proteomic approach that is both high-throughput (accelerated trypsin digestion) and leads to high confidence protein identifications (high protein amino acid coverage) solely by ensuring that the key parameters that govern enzymatic activity are at trypsin’s optimum. The criteria determining the trypsin digestion performance of the applied protocol were based on the obtained protein amino acid coverage, counting and mapping of the peptides generated, and the ion intensities of selected tryptic peptides from nine commercially available proteins.

2 Materials and methods

2.1 Materials

The proteins listed in Table 1, L-1-tosylamide-2-phenylethyl chloromethyl ketone (TPCK) treated trypsin, Trizma Base, Trizma HCl, DTT, iodoacetic acid (IAA), formic acid (FA), and TFA were purchased from Sigma-Aldrich (St. Louis, MO, USA). All other chemicals used were of analytical grade.

2.2 Protein in-solution digestion

The stock solutions of the protein mixture were prepared by dissolving 1 mg of each protein to a concentration of 1 mg/mL in 50 mM Tris buffer pH 8 at 37°C. The solubilized proteins were reduced with 10 mM DTT for 15 min at 37°C, 800 rpm, and alkylated with 30 mM IAA at room temperature in the dark. Fifty microliters of each of the reduced and alkylated protein stock solutions were aliquoted into Eppendorf tubes. Six micrograms of total protein were digested by trypsin at an enzyme-to-total-protein ratio of 1:40 (w/w) for either 5 min or 18 h in 50 mM Tris buffer pH 8 at 37°C, 800 rpm, in a total volume of 100 µL. Buffers and samples were heated to 37°C prior to addition of trypsin to the samples. The enzymatic activity was terminated by adding 100% FA to a final concentration of 5%.
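The quantities in this protocol follow from simple arithmetic, which can be sketched as below. This is only an illustration: the function names are hypothetical, and it assumes that the 5% FA is a v/v fraction of the final total volume and that volumes are additive, neither of which is stated in the text.

```python
def trypsin_mass_ug(protein_ug: float, ratio_denominator: float) -> float:
    """Trypsin mass (µg) for a 1:ratio_denominator (w/w) enzyme-to-protein ratio."""
    if ratio_denominator <= 0:
        raise ValueError("ratio_denominator must be positive")
    return protein_ug / ratio_denominator


def quench_volume_ul(sample_ul: float, final_fraction: float,
                     stock_fraction: float = 1.0) -> float:
    """Volume (µL) of acid stock needed to reach final_fraction (v/v) after addition.

    Solves stock_fraction * v = final_fraction * (sample_ul + v) for v.
    Assumption: final_fraction refers to the total volume after the acid is added.
    """
    if not 0 <= final_fraction < stock_fraction:
        raise ValueError("final_fraction must be non-negative and below stock_fraction")
    return final_fraction * sample_ul / (stock_fraction - final_fraction)


if __name__ == "__main__":
    # Values from the protocol: 6 µg protein, 1:40 ratio, 100 µL digest, 5% FA.
    print(f"trypsin: {trypsin_mass_ug(6.0, 40.0):.2f} µg")
    print(f"100% FA to add: {quench_volume_ul(100.0, 0.05):.2f} µL")
```

Under these assumptions, 6 µg of protein at 1:40 corresponds to 0.15 µg of trypsin, and about 5.3 µL of 100% FA brings a 100 µL digest to 5% FA.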

2.3 Tryptic digest salt clean-up

The tryptic digests were cleaned up by SPE. Two-hundred-microliter pipette tips were packed with six membrane stacks of C8 and six membrane stacks of C8 Empore Extraction Disks obtained from Phenomenex B.V. (Utrecht, Netherlands). A centrifugal force of 3800 × g was used to force buffer through the membrane stacks. The in-house SPE tips were activated with 100 µL of 100% acetonitrile (MeCN) and equilibrated with 100 µL of 0.1% FA. The tryptic digests were loaded onto


Table 1. Protein table

Protein name            Organism  Mw (kDa)  pI   Number of fully tryptic peptides  Median fully tryptic peptide mass (Da)
Cytochrome C            Horse     11.7      9.5  21                                361
Myoglobin               Horse     17.1      7.8  21                                650
β-Lactoglobulin         Bovine    19.9      5.0  19                                837
Carbonic anhydrase      Bovine    29.0      6.9  25                                979
Ovalbumin               Chicken   42.9      5.3  33                                1190
Catalase                Bovine    59.9      7.3  58                                833
Serum albumin           Bovine    69.3      6.2  79                                725
Transferrin             Human     77.1      7.1  80                                780
Glycogen phosphorylase  Rabbit    97.3      7.2  109                               696

Proteins included in the work listed with molecular weights, pI, maximum number of fully tryptic peptides and median fully tryptic peptide molecular mass.

the tips; bound peptides were washed with 100 µL of 0.1% FA and eluted with 100 µL of 80% MeCN. The eluate was evaporated to dryness and the peptides were resolubilized in 40 µL of 0.01% TFA prior to the LC-MS/MS analysis.

2.4 Nano LC-MS/MS analysis

Twenty microliters of the proteolytic peptide mixtures were injected into the Chromeleon Xpress controlled Dionex HPLC system (Thermo Fisher, Bremen, Germany) and trapped on a C18 5 mm × 300 µm id Acclaim PepMap 100 (5 µm) enrichment column (Dionex). The loading mobile phase, 20 mM FA and MeCN (98/2, v/v), was delivered at 10 µL/min for 4 min. The analytes were transferred to a 150 × 0.075 mm id Acclaim PepMap 100 column (pore size 100 Å, particle diameter 3 µm; Dionex) at 300 nL/min. The mobile phases consisted of A: 20 mM formic acid and MeCN (95/5, v/v) and B: 20 mM formic acid and MeCN (5/95, v/v). A linear gradient was run from 0% to 50% B in 60 min. Subsequently, the elution strength was increased to 100% B. Total analysis time per run was 87 min.

The LC setup was connected to an Xcalibur 2.0.7 controlled LTQ Discovery Orbitrap MS equipped with a nano-ESI ion source (Thermo Fisher, Bremen, Germany). The nanospray ionization source was operated in the positive ionization mode (360 µm od × 20 µm id distal coated fused silica emitter, 10 µm id tip (New Objective, Woburn, MA, USA)). The spray voltage was set at 2.2 kV. The heated capillary was kept at 150°C. The capillary voltage was set at 45 V, and the tube lens was offset at 100 V. The mass spectrometer was operated in data-dependent positive ion mode. Survey MS scans were performed in the orbitrap analyser at a resolution of 30 000 over a mass range of m/z 300–2000 with charge state disabled. Up to the six most intense ions per scan were fragmented by CID at 35% relative collision energy, with an activation time of 30 ms, and analyzed in the linear ion trap. The wide band activation option was enabled and dynamic exclusion with a time window of 15 s was used to minimize the extent of repeat sequencing of the peptides.
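As a rough illustration of what the gradient means in terms of solvent composition, the sketch below converts the programmed %B into the nominal acetonitrile content of the mobile phase, using the stated compositions of A (5% MeCN) and B (95% MeCN). The function names are hypothetical, and the calculation ignores system delay volume and mixing, so it describes the programmed rather than the delivered composition.

```python
def percent_b(t_min: float, start_b: float = 0.0, end_b: float = 50.0,
              duration_min: float = 60.0) -> float:
    """Programmed %B at time t_min within a linear gradient segment."""
    if not 0.0 <= t_min <= duration_min:
        raise ValueError("t_min lies outside the gradient segment")
    return start_b + (end_b - start_b) * t_min / duration_min


def percent_mecn(b_percent: float, mecn_in_a: float = 5.0,
                 mecn_in_b: float = 95.0) -> float:
    """Nominal % MeCN (v/v) of a mixture containing b_percent of mobile phase B."""
    frac_b = b_percent / 100.0
    return mecn_in_a * (1.0 - frac_b) + mecn_in_b * frac_b


if __name__ == "__main__":
    for t in (0, 30, 60):
        b = percent_b(t)
        print(f"t = {t:2d} min: {b:4.1f}% B, {percent_mecn(b):4.1f}% MeCN")
```

With these settings the programmed gradient spans nominally 5% to 50% MeCN over the 60 min segment, passing 27.5% MeCN at 30 min.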

2.5 Data interpretation

The MS raw files were processed with Proteome Discoverer 1.3 (Thermo Fisher, Bremen, Germany), using the Sequest algorithm, searching against a 3- or 9-protein database generated from sequences obtained from UniProt (October, 2012). Enzyme specificity was set to trypsin. The initial parent and fragment ion maximum mass deviations were set to 10 ppm and 0.8 Da, respectively. The search included cysteine carboxymethylation and methionine oxidation as variable modifications. Up to three missed cleavages were allowed and peptides had to be fully tryptic. The Thermo Scientific™ Xcalibur™ software version 2.1 (Thermo Fisher, Bremen, Germany) was used to manually extract ion chromatograms (XICs) of selected tryptic peptides.
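The search settings above (trypsin specificity, fully tryptic peptides, up to three missed cleavages) can be pictured with a small in silico digest. The sketch below is not the Sequest implementation; it is a minimal illustration that assumes the simple rule given in the Introduction (cleavage C-terminal to K or R, except before P). The function names and the toy sequence are hypothetical.

```python
from typing import List


def cleavage_fragments(sequence: str) -> List[str]:
    """Split a sequence after every K or R that is not followed by P."""
    fragments, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])  # C-terminal peptide without K/R
    return fragments


def tryptic_peptides(sequence: str, max_missed: int = 0) -> List[str]:
    """Fully tryptic peptides containing at most max_missed uncleaved sites."""
    fragments = cleavage_fragments(sequence)
    peptides = []
    for i in range(len(fragments)):
        last = min(i + max_missed, len(fragments) - 1)
        for j in range(i, last + 1):
            # Joining j - i + 1 adjacent fragments leaves j - i missed cleavages.
            peptides.append("".join(fragments[i:j + 1]))
    return peptides


if __name__ == "__main__":
    toy = "AKRPGKLMR"  # hypothetical sequence, not one of the nine proteins
    print(tryptic_peptides(toy, max_missed=0))
    print(tryptic_peptides(toy, max_missed=3))
```

Allowing missed cleavages enlarges the list of candidate peptides by adding longer peptides that span uncleaved sites, which is why the same raw data can give different coverage under the two search settings.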

3 Results and discussion

For the digestion we chose to use a nonmodified trypsin because it is more economical (1000-fold cheaper than the proteomic grade trypsin) and because the main goal of this work was to investigate the relative difference between an accelerated protocol and a prolonged protocol under ideal conditions (digest of a small set of abundant proteins loaded onto the LC-MS in high amounts without possible interfering matrix components). Our goal was not to compare the different trypsin qualities that are available. Trypsin digestion is complex, and several papers addressing the specificity of different commercially available trypsin products have recently been published [22–24]. The proteins selected for digestion were soluble proteins ranging in molecular weight from approximately 10 to 100 kDa (Table 1). When it was necessary to simplify the figures and when all proteins in the same experiment showed the same trends, BSA was chosen as a model for the other proteins in the mixture. Digestion efficiency was measured as percent average protein sequence coverage of each protein in the mixture after database search. If not


stated otherwise in the text the experiments were duplicate experiments MS-analyzed twice.

3.1 Parameters affecting trypsin digestion

A precondition for accelerated and efficient trypsin digestion is not only that the enzyme works at its optimum but also that the enzyme has easy access to the substrate cleavage sites. To ensure that trypsin was at its maximum activity, two key parameters of enzymatic activity, temperature and pH, were adjusted to the reported optimum levels for bovine trypsin of 37°C and pH 8. These numbers were taken from the BRENDA (The Comprehensive Enzyme Information System) database (http://www.brenda-enzymes.org/). To further optimize the conditions for enzymatic treatment we used a Tris buffer instead of the volatile ammonium bicarbonate buffer to ensure a stable pH during the digestion time and to reduce peptide artifacts from buffer components [25].

3.2 Effect of pretrypsination heating on trypsin digestion efficiency

Protein unfolding makes more of the enzymatic cleavage sites available at the start of the digestion. Denaturation of substrate proteins includes applying heat to disrupt the secondary structure and reduction of disulfide bridges by DTT. Reduction protocols with temperatures ranging from 37°C to 95°C are common. To investigate the effect of the temperature applied to proteins during the reduction step we reduced and alkylated cysteine residues at various temperatures (Fig. 1). Figure 1 demonstrates that temperatures of 50°C or above do not lead to higher protein sequence coverage of a 9-protein mixture compared to a temperature of 37°C. At 60°C there is a decline in coverage for a few of the proteins (cytochrome C, catalase, and phosphorylase). The negative effect is even more pronounced at 95°C, where the amino acid coverage seems dependent on the molecular weight of the proteins: proteins with a molecular weight above 40 kDa showed a marked decline in amino acid coverage. The decline in amino acid coverage was more than 50% for proteins of molecular weights above 60 kDa. Most proteins in the human proteome will be affected by such a high reduction temperature since the median molecular mass of human proteins is 40 kDa (average 50 kDa) [26]. We suspect that the reduced protein amino acid coverage is due to reduced access of trypsin to its cleavage sites caused by formation of insoluble protein aggregates. The hypothesis is supported by the observation that precipitates were visible in our 9-protein mixture solutions at a temperature of 95°C. We suggest that a treatment temperature of 95°C should be avoided for protein samples. For further experiments we chose 37°C as the temperature for the reduction of proteins, as we reasoned that a physiological treatment temperature will reduce aggregation of heat-labile proteins when applying the protocol to complex samples.


3.3 Effect of time and enzyme-to-protein ratio on trypsin digestion efficiency

Digestion efficiency was measured as average percent protein amino acid sequence coverage of each protein in the mixture, allowing either zero or up to three missed cleavages in the database search setup. For the “time-point experiments” the ratio of enzyme to protein was 1:40 and the time points were 5 min, 30 min, 1, 2, 4, or 18 h. For the “enzyme-to-protein ratio experiments” a trypsin treatment time of 1 h was applied and the trypsin-to-protein ratio (w/w) was 1:40, 1:20, 1:10, or 2:1. The data revealed that with longer trypsin treatment time (Fig. 2A) or with increasing enzyme-to-protein ratio (Fig. 2C) there is an overall increase in amino acid coverage when the search program is only allowed to include fully tryptic peptides (0 missed cleavages). This can be explained by the fact that, over time or with more enzyme relative to substrate, the enzyme gains access to more of the cleavage sites and consequently the number of completely digested peptides increases. Unexpectedly, when allowing up to three missed cleavages in the database search (Fig. 2B and D) the opposite trend is observed: there is a reduction in protein amino acid sequence coverage both after prolonged digestion time (Fig. 2B) and after increasing the trypsin-to-substrate ratio (Fig. 2D). When digestion time is increased from 5 min to 18 h and tryptic peptides with up to three missed cleavages are considered, the highest molecular mass protein, BSA (69 kDa), has 18% lower amino acid coverage and myoglobin (17 kDa) has 15% lower amino acid coverage, while the amino acid coverage of cytochrome C (12 kDa) was considered unaltered. The respective differences in amino acid coverage of the three proteins after an increase of the trypsin-to-protein ratio from 1:40 to 2:1 (trypsin treatment time was 1 h), allowing up to three missed cleavages, were –15%, –11%, and unaltered.

A 5 min digest and/or a low enzyme-to-substrate ratio will not give trypsin time to gain access to all trypsin cleavage sites. When allowing the search program to look for uncleaved trypsin sites (up to three) we include and identify more peptides from the incomplete digest (5 min digest) data than if only fully tryptic peptides are included in the search parameters. There will be fewer peptides with uncleaved sites in a more complete (18 h) digest, and hence the increase in amino acid coverage seen for the 5 min digest when allowing up to three missed cleavages is not observed. The overall highest average amino acid sequence coverage of all three proteins in the experiments, regardless of time point or enzyme-to-protein ratio, was obtained with the 5 min trypsin digestion time (with an enzyme-to-protein ratio of 1:40) allowing up to three missed cleavages in the database search. The coverage was 91% for the highest molecular mass protein BSA, 89% for myoglobin, and 76% for cytochrome C.
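The coverage figures discussed in this section are percentages of residues that fall inside at least one identified peptide. A minimal sketch of that calculation is given below; it is an assumption about how such a number can be computed, not a description of how the search software computes it, and the function name and toy sequences are hypothetical.

```python
from typing import Iterable


def sequence_coverage(protein: str, peptides: Iterable[str]) -> float:
    """Percent of protein residues covered by at least one identified peptide."""
    if not protein:
        raise ValueError("protein sequence must not be empty")
    covered = [False] * len(protein)
    for peptide in peptides:
        if not peptide:
            continue
        start = protein.find(peptide)
        while start != -1:  # mark every occurrence of the peptide
            for k in range(start, start + len(peptide)):
                covered[k] = True
            start = protein.find(peptide, start + 1)
    return 100.0 * sum(covered) / len(protein)


if __name__ == "__main__":
    toy_protein = "AKRPGKLMR"  # hypothetical sequence
    # Only two short peptides identified.
    print(f"{sequence_coverage(toy_protein, ['AK', 'LMR']):.1f}%")
    # A missed-cleavage peptide spanning the first two fragments is identified as well.
    print(f"{sequence_coverage(toy_protein, ['AKRPGK', 'LMR']):.1f}%")
```

Overlapping peptides do not count twice in this scheme, so coverage rises only when a peptide reaches residues that no other identified peptide covers, for instance a longer missed-cleavage peptide that contains a short, undetected fragment.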


Figure 1. Effect of pretrypsination heating on trypsin digestion efficiency. The effect of different cysteine reduction temperatures on protein amino acid sequence coverage. The heating was applied prior to enzymatic digestion at 37°C for 15 min. The boxed number at the bottom of each column is the molecular mass (kDa) of the protein.

3.4 Number of peptides generated over time and with increasing enzyme-to-protein ratio

That both a lengthy trypsin digestion time and an increase in enzyme concentration relative to substrate lead to reduced protein amino acid sequence coverage is consistent with our hypothesis that the decline in coverage in both experiments might be a result of complete digestion of the proteins in the mixture. Complete in silico trypsin digestion of the yeast proteome would lead to generation of a substantial proportion (56%) of peptides with a length of less than six amino acids (average molecular mass of less than 660 Da) that are not easily detected by current LC-MS technology [10]. An overview of the median fully tryptic peptide size of the nine proteins included in this work is found in Table 1. For three of the proteins, cytochrome C, myoglobin, and glycogen phosphorylase, the median fully tryptic peptide size is below or close to 660 Da. This means that the masses of about half of the tryptic peptides from these proteins will be smaller than the stated limit of easily detected peptides if digestion goes to completion. Figure 3 displays the number of experimentally generated tryptic peptides (0–3 missed cleavages) of BSA after 5 min and 18 h digestion (ratio of enzyme to substrate 1:40) (Fig. 3A), and after 1 h of digestion with enzyme-to-substrate ratios (w/w) of 1:40 and 2:1 (Fig. 3B). To visually display the peptide size distribution of a trypsin digestion gone to completion, the in silico numbers of fully tryptic peptides of BSA are shown in contrast to the experimentally obtained peptides. The highest numbers of detected peptides are in the 5 min digest above a molecular mass of 2500 Da. Compared to the 5 min digestion, the 18 h digestion did not lead to an increase in total peptides detected, and the peptide sizes are distributed fairly evenly between 700 and 2000 Da. A majority of the fully digested theoretical peptides of BSA have molecular sizes below 700 Da, which as discussed previously is in the nondetectable peptide size range for LC-MS analysis. Indeed, the database search gave only one experimental peptide hit for a peptide of molecular size less than 700 Da regardless of the digestion time. When increasing the concentration of enzyme relative to the substrate, the same size distribution pattern of peptides as for the time-point experiments is observed. However, since this experiment included a 1 h digestion we did not get as many peptides above 2500 Da as with a 5 min digestion. No peptides of less than 700 Da were detected in these experiments and there was no overall gain in peptide number when increasing the trypsin-to-protein ratio.


Figure 2. Trypsin digestion efficiency at different time points and at different trypsin-to-protein ratios. The obtained percentage protein amino acid coverage of BSA, myoglobin, and cytochrome C at different time points at an enzyme-to-protein ratio of 1:40 (Fig. 2A and B) and with different enzyme-to-substrate ratios with a trypsin treatment time of 1 h (Fig. 2C and D). The percentage amino acid sequence coverage was based on either fully tryptic peptides (Fig. 2A and C) or tryptic peptides with up to three missed cleavage sites (Fig. 2B and D).

In conclusion, there is no overall gain in detected peptides as the digestion time is prolonged or the amount of enzyme relative to substrate is increased, in other words when digestion is getting closer to completion. A substantial number of the detected BSA peptides were within the size range that is defined as middle-down peptide size (>2000 Da). Fully tryptic theoretical peptides of BSA in the size range 0–699 Da cover 21.6% of the amino acid sequence. Based on the observations in Fig. 3 it is reasonable to conclude that the 18% decline in coverage observed in Fig. 2B, when prolonging the digestion time from 5 min to 18 h, is a result of digestion going to completion, which in turn leads to generation of peptides that are less than 700 Da in molecular mass and hence not easily detected by LC-MS.
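The share of a sequence that ends up in peptides below 700 Da can be estimated from a list of fully tryptic peptides, as sketched below. The sketch assumes monoisotopic masses of unmodified residues (the text does not state which mass type was used, and cysteine carboxymethylation is ignored), and the function names and example peptide list are hypothetical.

```python
from typing import Iterable

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free N- and C-termini


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass (Da) of an unmodified peptide."""
    return sum(RESIDUE_MASS[residue] for residue in peptide) + WATER


def percent_residues_in_small_peptides(peptides: Iterable[str],
                                       cutoff_da: float = 700.0) -> float:
    """Percent of all residues that sit in peptides lighter than cutoff_da.

    Assumes the peptides form a complete, non-overlapping digest of one protein.
    """
    peptides = list(peptides)
    total = sum(len(p) for p in peptides)
    if total == 0:
        raise ValueError("no residues in the peptide list")
    small = sum(len(p) for p in peptides if peptide_mass(p) < cutoff_da)
    return 100.0 * small / total


if __name__ == "__main__":
    digest = ["AK", "LVNELTEFAK"]  # illustrative list, not a full BSA digest
    for p in digest:
        print(f"{p}: {peptide_mass(p):.2f} Da")
    print(f"{percent_residues_in_small_peptides(digest):.1f}% of residues below 700 Da")
```

Applied to the complete list of fully tryptic peptides of a protein, the same function would give the kind of figure quoted above for BSA, subject to the mass-type assumption.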

3.5 Mapping of peptides generated after an accelerated and a conventional trypsin digestion

To further investigate the generation of peptides with up to three missed cleavages after 5 min and 18 h of trypsin digestion, peptide maps were constructed for all nine proteins in the mixture. These maps were based on triplicate samples that were LC-MS/MS analyzed twice in random order. Only tryptic peptides that were considered high confidence by the search program’s default settings (XCorr values of 1.5, 2, 2.5, and 3 for peptides of charge +1, +2, +3, and +4, respectively) and detected in at least one MS analysis were displayed on the map. The peptides displayed on the maps are not shared by any of the other proteins in the mixture (confirmed by BLAST searching the sequences). Figure 4 shows the peptide map of BSA. The peptide map of BSA (other protein maps showed a similar peptide pattern except for cytochrome C, see Supplementary Data) reveals that after 5 min of trypsin digestion more peptides were detected, covering more of the amino acid sequence, than after an 18 h treatment time. The peptide map of BSA has areas where sequence coverage is lost (see Fig. 4; underlined, italicized amino acids) when incubating for 18 h compared to a 5 min digestion. Sequence loss was also observed for the other proteins except for cytochrome C. These areas of sequence loss are covered by short tryptic peptides when the sequence is completely


Figure 3. Number of generated peptides after trypsin digestion of BSA. The figure displays the number of peptides generated by trypsin digestion of BSA at two different time points (Fig. 3A), and at two different trypsin-to-protein ratios (Fig. 3B), when allowing up to three missed cleavages in the database search. The percent total coverage of BSA at the specified time point or trypsin-to-protein ratio is displayed within the parentheses. The number of theoretical peptides was obtained by counting only fully tryptic peptides (peptides with 0 missed cleavages within the sequence) from an in silico trypsin digestion of BSA. The results were based on duplicate experiments MS analysed twice.

digested. Our peptide maps also confirm the results of Swaney et al., who found the majority (97%) of identified peptides in a yeast proteome digest to be at least seven amino acids long [10]. As Fig. 4 shows, no peptides of six amino acids or less were detected in serum albumin. β-Lactoglobulin and glycogen phosphorylase were the only proteins in our 9-protein mixture for which peptides of six amino acids or less were detected with high confidence (Supplementary Data). It has been reported that the use of buffers with a high percentage of organic solvent leads to higher protein amino acid sequence coverage and peptide yields due to better denaturation of substrate proteins, but that trypsin activity is unaffected [27]. Based on our work (data not shown), and a recent paper showing that trypsin is in fact less active in buffers with as little as 30% acetonitrile [28], the reported improvement of trypsin digestion in organic buffers might instead be caused by lower trypsin efficiency, which leads to higher amino acid coverage of proteins by preventing the generation of short nondetectable peptides.

The observation that cytochrome C did not show the same peptide loss after prolonged digestion as the other proteins in the protein mixture was initially thought to be caused by it being a low molecular weight protein that is easy to digest. After 5 min of digestion, sequence coverage of the protein might already have been lost due to generation of too short peptides (see Supplementary Data for the peptide map of cytochrome C). However, performing a 1 min digest (data not shown) did not lead to an increase in amino acid coverage of the protein; rather, it led to a decline. On the contrary, the protein seems resistant to trypsination, as the ion intensities of a few selected cytochrome C peptides increase sixfold when trypsin digestion time is extended from 5 min to 18 h. None of the selected peptides from the other eight proteins shows such a gain in intensity (see Fig. 5A and Supplementary Figures).

In conclusion, the peptide maps of several proteins of different molecular weights in a protein mixture strengthen the assumption that complete digestion of certain proteins leads to loss of sequence coverage because nondetectable peptides are generated. It is important to emphasize that this effect is not affected by the trypsin quality, as we are only considering parameters that assume a highly specific trypsin (only tryptic peptides are considered). In addition, as can be observed from the peptide maps generated, any part of the protein sequence is covered by more distinct peptides after a short digestion time than after a prolonged digestion time. More peptides covering any part of the sequence will lead to more confident protein search hits. Finally, for five of the nine proteins there seems to be loss of coverage regardless of the trypsin treatment time (see peptide maps of carbonic anhydrase, ovalbumin, catalase, transferrin, and glycogen phosphorylase, Supplementary Data). The uncovered areas are within parts of the sequence with few tryptic cleavage sites, and the peptides generated from these areas might be too long for LC-MS detection.

3.6 Ion intensities of selected BSA peptides after an accelerated and a conventional trypsin digestion

The mapping of tryptic peptides generated after 5 min and 18 h of trypsin digestion in Fig. 4 showed that the longer digestion led to a reduced number of identified peptides assigned to the amino acid sequence and a decline in sequence coverage. A 5 min digestion will therefore be advantageous in discovery proteomics, as higher sequence coverage leads to higher-confidence protein hits in the database search. However, in quantitative proteomic studies the relative or absolute abundance of peptides must be reproducibly determined. We assumed that longer incubations (closer to complete proteolysis) would lead to higher peptide ion intensities of fully tryptic peptides as well as more reproducible peptide ion intensities than shorter incubation times. Figure 5 and Supplementary Data display ion intensities obtained from manually extracted XICs of eight BSA fully tryptic peptides (Fig. 5A), six BSA peptides with additional cleavage sites within their sequence (Fig. 5B) and different numbers of fully tryptic peptides from eight of the other proteins in the


Figure 4. Mapping of BSA peptides generated after an accelerated and a conventional trypsin digestion. Peptide map constructed of high confidence BSA peptides (in green) with up to three missed cleavages generated after 5 min and 18 h of trypsin digestion. The amino acid sequence of BSA is in gray and trypsin cleavage sites are in bold black. The numbers in italics at the end of each line of sequence are the amino acid numbers (in the total sequence of BSA) of the first and last amino acid of that line. Peptides generated after a 5 min digest are displayed above the sequence and peptides generated after 18 h of digestion are displayed below the sequence. Underlined italicized amino acids are residues not covered by peptides after 18 h of digestion. Peptides in olive green are peptides selected for determination of ion abundances (see Section 3.6 and Fig. 5). The results were based on triplicate experiments MS analysed twice.

protein mixture (see Supplementary Data). The peptides were selected from triplicate samples that had been MS-analyzed twice in random order, and the ion intensities of each peptide were averaged. If possible, peptides from the N-terminal, C-terminal, and middle regions of the protein sequence were chosen to monitor the ion intensities (their exact positions in the amino acid sequence of BSA can be deduced from Fig. 4). These peptides were doubly or triply charged, contained no methionine residues (which can occur in both oxidized and nonoxidized forms; the exception was cytochrome c, where a methionine-containing peptide was included), and were identified after both a 5-minute and an 18-hour digestion. The allowed mass deviation in the construction of the XICs of selected peptides was ± 0.01 Da. Cysteine-containing peptides were included since they are expected to be completely reduced and alkylated during the sample pretreatment. The relative standard deviation of the experimental parallels for each peptide is shown in the figure. As expected, the incompletely digested BSA peptides showed a decline in ion intensities (five out of six peptides) when the digestion time was increased from 5 min to 18 h (Fig. 5B). Unexpectedly, four of the fully tryptic BSA peptides also showed a drop in ion intensities when the digestion time was increased approximately 200-fold (Fig. 5A). Five of the eight other proteins in the mixture showed tendencies similar to those observed with BSA, with an increase in ion intensities for some of the selected peptides and a decrease for others; three of the proteins showed only an increase in ion intensities upon extending trypsin digestion time from 5 min to

18 h (see Supplementary Data). Overall, the ion intensities of both fully tryptic peptides and miscleaved peptides were more reproducible (measured as % RSD) after 5 min of trypsin treatment than after an 18 h digest for the majority of the peptides (nine out of 14 peptides) (Fig. 5A and B). It is generally believed that miscleaved peptides will skew results for absolute quantitation. However, work by Jaquinod et al. showed that even miscleaved peptides can be used for quantitative proteomic analysis [29]. Our results confirm that the reproducibility for incompletely digested peptides is close to that obtained for fully tryptic peptides when considering a 5-minute tryptic digest. The fairly good reproducibility of miscleaved peptides obtained with the accelerated trypsin protocol further confirms that trypsin prefers certain cleavage sites over others and that cleavage is not a random process [30]. The reproducibility of a 5-minute digestion can be further improved by an automated system, as it is more challenging to manually control the exact duration of a digestion lasting only a few minutes. It has also been shown that variation in instrument performance can contribute as much as 17% median CV in LC-MS peptide ion abundances [31]. Prolonged trypsin digestion is known to induce slow-rate chemical modifications of peptides [32, 33] or low-rate unspecific enzyme activity [34], resulting in peptides that are not recognized in the database search if these artifacts are not accounted for in the search parameters. As mentioned earlier, we chose to use Tris buffer as digestion buffer, as reduced chemical modifications have been reported with


Figure 5. Effect of prolonged trypsin digestion on ion intensities of selected BSA peptides. (A) displays the ion intensities of eight selected fully tryptic BSA peptides based on XICs of monoisotopic doubly charged masses after 5 min and 18 h of digestion. (B) displays the ion intensities of six selected BSA peptides with additional cleavage sites based on XICs of monoisotopic triply charged masses after 5 min and 18 h of digestion. The tables display the percentage RSD of the selected peptides at the two time points, based on triplicate experiments, each LC-MS analyzed twice. A lowercase letter code for cysteine in the tables denotes that the residue is modified.

this buffer compared to ammonium bicarbonate, such as deamidation [25]. We manually inspected a few of the mass spectra for deamidated peptides without detecting the corresponding masses. We chose not to look for other modifications, as the main objective of this work was not to map modifications. Manual inspection of the mass spectra for low-rate chymotrypsin activity suggests that this might account for the loss of ion intensities for some of the selected peptides. Chymotrypsin has a cleavage preference for hydrophobic residues, such as tyrosine, phenylalanine, and tryptophan. Chymotryptic peptides were extracted from the mass spectra of samples with lengthy digestion time and with increased trypsin-to-protein ratio (data not shown). This chymotryptic activity in a TPCK-treated trypsin product might be explained by the fact that an autolysis product of trypsin, pseudotrypsin, has

chymotrypsin-like activity that is resistant to TPCK treatment [34]. In a recent study of trypsin products of different quality, all of the tested products showed chymotrypsin activity [22]. Care has to be taken in quantitative proteomic studies when selecting peptides to monitor, as prolonged digestion might lead to a decline in peptide ion abundances as well as reduced reproducibility due to slow-rate nontryptic cleavage and/or chemical modifications of peptides. The former will be affected by the trypsin quality, as it has been shown that higher-grade trypsin has less nontryptic activity [22]. In summary, and critical for quantitative studies, ion intensities were found to be substantially less reproducible for nine of 14 selected BSA peptides after increasing the digestion time from 5 min to 18 h. This was in addition to the profound decline that was seen in ion intensities of four out of eight fully tryptic peptides at increased digestion time.
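The quantities compared in this section (averaged XIC ion intensity, % RSD, and the change in intensity between the two digestion times) can be illustrated with a minimal Python sketch. It is not the procedure used in this work. The function names, the conversion of a neutral monoisotopic mass to m/z via the proton mass, the reading of the ± 0.01 Da deviation as a window around the expected m/z, the use of the sample standard deviation, and all intensity values are assumptions made purely for illustration.

```python
from statistics import mean, stdev

PROTON_MASS = 1.007276  # Da; mass of a proton, used to convert neutral mass to m/z


def mz_window(monoisotopic_mass, charge, tolerance_da=0.01):
    """Return the (low, high) m/z window for extracting an ion chromatogram.

    Assumption: the +/- 0.01 Da deviation is applied around the expected m/z.
    """
    mz = (monoisotopic_mass + charge * PROTON_MASS) / charge
    return mz - tolerance_da, mz + tolerance_da


def percent_rsd(intensities):
    """Relative standard deviation in percent: 100 * sample SD / mean."""
    return 100.0 * stdev(intensities) / mean(intensities)


def compare_time_points(short_digest, long_digest):
    """Summarize one peptide measured at two digestion times."""
    return {
        "fold_change": mean(long_digest) / mean(short_digest),
        "rsd_short": percent_rsd(short_digest),
        "rsd_long": percent_rsd(long_digest),
    }


# Hypothetical ion intensities for one peptide: three digests, each injected
# twice, giving six values per time point. These are NOT data from this study.
five_min = [1.00e6, 1.10e6, 0.90e6, 1.05e6, 0.95e6, 1.00e6]
eighteen_h = [0.50e6, 0.80e6, 0.40e6, 0.70e6, 0.30e6, 0.60e6]

summary = compare_time_points(five_min, eighteen_h)
low, high = mz_window(1000.0, charge=2)  # hypothetical 1000 Da peptide, 2+
print(summary)
print(low, high)
```

In this made-up example a fold change below 1 together with a higher % RSD at the later time point would correspond to the pattern described above for several of the selected BSA peptides.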


4 Concluding remarks

Many different strategies and methods have been developed to optimize trypsin digestion in terms of speed and efficiency. The optimal digestion time for a given sample has to be determined experimentally, as it will be affected both qualitatively and quantitatively by the specific trypsin, the applied method, the sample complexity, and the sample composition. However, it is a fair assumption, based on the work in this paper, that a shorter trypsin treatment time of proteins in proteomic studies leads to more reproducible LC-MS information. This was found to be caused by generation of a majority of larger tryptic peptides at the expense of fully tryptic peptides. A shorter treatment time will also reduce the effect of the slow-rate nontryptic and buffer-related processes observed in lengthy digestion protocols.

The authors have declared no conflict of interest.

5 References

[1] Frank, R., Hargreaves, R., Clinical biomarkers in drug discovery and development. Nat. Rev. 2003, 2, 566–580.

[2] He, Q. Y., Chiu, J. F., Proteomics in biomarker discovery and drug development. J. Cell. Biochem. 2003, 89, 868–886.

[3] Petricoin, E., Wulfkuhle, J., Espina, V., Liotta, L. A., Clinical proteomics: revolutionizing disease detection and patient tailoring therapy. J. Proteome Res. 2004, 3, 209–217.

[4] Mann, M., Hendrickson, R. C., Pandey, A., Analysis of proteins and proteomes by mass spectrometry. Ann. Rev. Biochem. 2001, 70, 437–473.

[5] Yates, J. R., Ruse, C. I., Nakorchevsky, A., Proteomics by mass spectrometry: approaches, advances, and applications. Ann. Rev. Biomed. Eng. 2009, 11, 49–79.

[6] Zhang, Y., Fonslow, B. R., Shan, B., Baek, M. C., Yates, J. R., 3rd, Protein Analysis by shotgun/bottom-up proteomics. Chem. Rev. 2013.

[7] Lee, T. D., Shively, J. E., Enzymatic and chemical digestion of proteins for mass spectrometry. Methods Enzymol. 1990, 193, 361–374.

[8] Olsen, J. V., Ong, S. E., Mann, M., Trypsin cleaves exclusively C-terminal to arginine and lysine residues. Mol. Cell. Proteomics 2004, 3, 608–614.

[9] Vandermarliere, E., Mueller, M., Martens, L., Getting intimate with trypsin, the leading protease in proteomics. Mass Spectrom. Rev. 2013, 32, 453–465.

[10] Swaney, D. L., Wenger, C. D., Coon, J. J., Value of using multiple proteases for large-scale mass spectrometry-based proteomics. J. Proteome Res. 2010, 9, 1323–1329.

[11] Switzar, L., Giera, M., Niessen, W. M., Protein digestion: an overview of the available techniques and recent developments. J. Proteome Res. 2013, 12, 1067–1077.

[12] Hustoft, H. K., Reubsaet, L., Greibrokk, T., Lundanes, E., Malerod, H., Critical assessment of accelerating trypsination methods. J. Pharm. Biomed. Anal. 2011, 56, 1069–1078.

[13] Chait, B. T., Chemistry. Mass spectrometry: bottom-up or top-down? Science 2006, 314, 65–66.

[14] Calligaris, D., Villard, C., Lafitte, D., Advances in top-down proteomics for disease biomarker discovery. J. Proteomics 2011, 74, 920–934.

[15] Wu, S. L., Huhmer, A. F., Hao, Z., Karger, B. L., On-line LC-MS approach combining collision-induced dissociation (CID), electron-transfer dissociation (ETD), and CID of an isolated charge-reduced species for the trace-level characterization of proteins with post-translational modifications. J. Proteome Res. 2007, 6, 4230–4244.

[16] Tran, J. C., Zamdborg, L., Ahlf, D. R., Lee, J. E. et al., Mapping intact protein isoforms in discovery mode using top-down proteomics. Nature 2011, 480, 254–258.

[17] Peng, Y., Chen, X., Zhang, H., Xu, Q. et al., Top-down targeted proteomics for deep sequencing of tropomyosin isoforms. J. Proteome Res. 2013, 12, 187–198.

[18] Meyer, B., Papasotiriou, D. G., Karas, M., 100% protein sequence coverage: a modern form of surrealism in proteomics. Amino Acids 2011, 41, 291–310.

[19] Wu, C., Tran, J. C., Zamdborg, L., Durbin, K. R. et al., A protease for ‘middle-down’ proteomics. Nat. Methods 2012, 9, 822–824.

[20] Boyne, M. T., Garcia, B. A., Li, M., Zamdborg, L. et al., Tandem mass spectrometry with ultrahigh mass accuracy clarifies peptide identification by database retrieval. J. Proteome Res. 2009, 8, 374–379.

[21] Wu, S. L., Kim, J., Hancock, W. S., Karger, B., Extended range proteomic analysis (ERPA): a new and sensitive LC-MS platform for high sequence coverage of complex proteins with extensive post-translational modifications-comprehensive analysis of beta-casein and epidermal growth factor receptor (EGFR). J. Proteome Res. 2005, 4, 1155–1170.

[22] Burkhart, J. M., Schumbrutzki, C., Wortelkamp, S., Sickmann, A., Zahedi, R. P., Systematic and quantitative comparison of digest efficiency and specificity reveals the impact of trypsin quality on MS-based proteomics. J. Proteomics 2012, 75, 1454–1462.

[23] Bunkenborg, J., Espadas, G., Molina, H., Cutting edge proteomics: benchmarking of six commercial trypsins. J. Proteome Res. 2013, 12, 3631–3641.

[24] Walmsley, S., Rudnick, P. A., Liang, Y., Dong, Q. et al., Comprehensive analysis of protein digestion using six trypsins reveals the origin of trypsin as a significant source of variability in proteomics. J. Proteome Res. 2013.

[25] Ren, D., Pipes, G. D., Liu, D., Shih, L. Y. et al., An improved trypsin digestion method minimizes digestion-induced modifications on proteins. Anal. Biochem. 2009, 392, 12–21.

[26] Brocchieri, L., Karlin, S., Protein length in eukaryotic and prokaryotic proteomes. Nucleic Acids Res. 2005, 33, 3390–3400.

[27] Chen, E. I., Cociorva, D., Norris, J. L., Yates, J. R., 3rd, Optimization of mass spectrometry-compatible surfactants for shotgun proteomics. J. Proteome Res. 2007, 6, 2529–2538.

[28] Crowell, A. M., Stewart, E. J., Take, Z. S., Doucette, A. A., Critical assessment of the spectroscopic activity assay for monitoring trypsin activity in organic-aqueous solvent. Anal. Biochem. 2013, 435, 131–136.

[29] Jaquinod, M., Trauchessec, M., Huillet, C., Louwagie, M. et al., Mass spectrometry-based absolute protein quantification: PSAQ strategy makes use of “noncanonical” proteotypic peptides. Proteomics 2012, 12, 1217–1221.

[30] Siepen, J. A., Keevil, E. J., Knight, D., Hubbard, S. J., Prediction of missed cleavage sites in tryptic peptides aids protein identification in proteomics. J. Proteome Res. 2007, 6, 399–408.

[31] Piehowski, P. D., Petyuk, V., Orton, D. J., Xie, F. et al., Sources of technical variability in quantitative LC-MS proteomics: human brain tissue sample analysis. J. Proteome Res. 2013.

[32] Geoghegan, K. F., Hoth, L. R., Tan, D. H., Borzilleri, K. A. et al., Cyclization of N-terminal S-carbamoylmethylcysteine causing loss of 17 Da from peptides and extra peaks in peptide maps. J. Proteome Res. 2002, 1, 181–187.

[33] Krokhin, O. V., Antonovici, M., Ens, W., Wilkins, J. A., Standing, K. G., Deamidation of -Asn-Gly- sequences during sample preparation for proteomics: consequences for MALDI and HPLC-MALDI analysis. Anal. Chem. 2006, 78, 6645–6650.

[34] Keil-Dlouha, V. V., Zylber, N., Imhoff, J., Tong, N., Keil, B., Proteolytic activity of pseudotrypsin. FEBS Lett. 1971, 16, 291–295.
