
DOI 10.1002/pmic.201400268

Proteomics 2014, 14, 2410–2416

RESEARCH ARTICLE

Identification of phosphopeptides with unknown cleavage specificity by a de novo sequencing assisted database search strategy

Mingming Dong1,2, Mingliang Ye1∗, Kai Cheng1,2, Jing Dong1, Jun Zhu1,2, Hongqiang Qin1,2, Yangyang Bian1,2 and Hanfa Zou1

1 Key Laboratory of Separation Sciences for Analytical Chemistry, National Chromatographic Research and Analysis Center, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, China
2 University of Chinese Academy of Sciences, Beijing, China

In theory, proteases with broad cleavage specificity could be used to digest protein samples and thereby improve the coverage of phosphoproteomic analysis. In practice, however, this approach is seldom employed, because identifying phosphopeptides without enzyme specificity by a conventional database search strategy is extremely difficult owing to the huge search space. In this study, we investigated the performance of a de novo sequencing assisted database search strategy for the identification of such phosphopeptides. First, we compared the conventional database search strategy and the de novo sequencing assisted database search strategy for the identification of peptides and phosphopeptides without setting enzyme specificity. The identification sensitivity dropped significantly for the conventional strategy, whereas it decreased only slightly for the new approach. Then, the new search strategy was applied to identify phosphopeptides generated by proteinase K digestion, which resulted in the identification of 717 phosphopeptides. Finally, the strategy was used to identify serum endogenous phosphopeptides, which are generated in vivo by different kinds of proteases and kinases, and 68 unique serum endogenous phosphopeptides were successfully identified.

Received: June 11, 2014 Revised: June 11, 2014 Accepted: August 13, 2014

Keywords: De novo sequencing / Endogenous phosphopeptide / Phosphoproteomics / Phosphopeptide identification / Technology

Additional supporting information may be found in the online version of this article at the publisher’s web-site

Correspondence: Professor Hanfa Zou, Laboratory of Separation Sciences for Analytical Chemistry, National Chromatographic Research and Analysis Center, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, P. R. China
E-mail: [email protected]
Fax: 86-411-84379620

∗ Additional corresponding author: Professor Mingliang Ye, E-mail: [email protected]

Abbreviations: FDR, false discovery rate; LTQ Orbitrap, linear ion trap Orbitrap; Ti-IMAC, immobilized titanium ion affinity chromatography

Colour Online: See the article online to view Figs. 1–5 in colour.

© 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
www.proteomics-journal.com

1 Introduction

Large-scale phosphoproteome analysis can nowadays identify more than 10 000 phosphorylation sites per sample [1]. This is mainly attributed to the excellent performance of the database search strategy for the identification of phosphopeptides generated by trypsin. In addition to trypsin, other proteases with high cleavage specificity, such as Glu-C, are also used to digest proteome samples for phosphoproteome analysis [2]. However, some phosphopeptides generated by proteases with high specificity may be too big or too small to be detected by MS, so the corresponding phosphorylation sites cannot be identified [3]. Proteases with broad cleavage specificity have long been applied to comprehensively map the phosphorylation sites in simple protein mixtures [3, 4]. However, their application to complex phosphoproteome samples is rare, because the identification of phosphopeptides with no cleavage specificity is extremely challenging [5]. The identification of peptides by a conventional database search strategy requires matching the experimental precursor mass to the masses of candidate peptides generated from a protein sequence database. The number of candidate peptides increases significantly when no enzyme is specified and variable phosphorylation modifications are set. This explosion in search space makes the database search extremely insensitive or even impractical.

Peptidomics is an emerging research field that studies endogenous peptides in biological systems [6, 7], and up to 1000 unmodified endogenous peptides can be identified from serum or tissue by high-throughput proteomics approaches [8–10]. In addition to unmodified endogenous peptides, endogenous peptides with PTMs, such as endogenous phosphopeptides, are also present in peptidome samples, and they are derived from the combined activity of different enzymes. Their identification therefore faces the same problem as that of phosphopeptides generated by proteases with broad cleavage specificity. An easy way to tackle the large search space is to reduce the database size by removing redundant protein sequences. Recently, a focused database search strategy using an in-house collected human serum propeptidome database was applied to the identification of endogenous phosphopeptides [11]. However, building a low-redundancy database for endogenous phosphopeptides is often difficult. Thus, it is desirable to find a way to identify these phosphopeptides directly using a traditional proteome database.

There are two types of peptide identification strategies. One is database dependent, i.e. the database search approach. The other is database independent, i.e. de novo sequencing. De novo sequencing derives the peptide sequence directly from the MS/MS spectrum, so it does not suffer from the large search space. However, de novo sequencing is not as sensitive as the database search approach. Recently, these two strategies have been combined to improve the peptide identification sensitivity [12, 13]. Because the sequence tags identified by de novo sequencing can be used to select the candidate peptides, the number of peptide hypotheses to be scored is significantly reduced in this hybrid database search strategy. It was demonstrated that this de novo-assisted database search approach offered significant advances in database searches with a large search space [13]. In this study, this new strategy was applied to the identification of phosphopeptides with unknown cleavage specificity. It was found that such phosphopeptides could be identified efficiently and with high sensitivity.
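To make the search-space argument concrete, the following minimal sketch counts candidate peptides for a toy protein under tryptic and under no-enzyme digestion, and then counts the phospho-isoforms that arise when phosphorylation on S/T/Y is allowed as a variable modification. The toy sequence, the length limits, the cap on phosphosites, and all function names are illustrative assumptions rather than parameters of this study, and the trypsin rule is deliberately simplified (cleavage after K or R, no proline exception, no missed cleavages).

```python
from math import comb

def tryptic_peptides(protein, min_len=5, max_len=30):
    """Fully tryptic peptides under a simplified rule: cleave after K or R."""
    peptides, start = set(), 0
    for i, aa in enumerate(protein):
        if aa in "KR" or i == len(protein) - 1:
            pep = protein[start:i + 1]
            if min_len <= len(pep) <= max_len:
                peptides.add(pep)
            start = i + 1
    return peptides

def nonspecific_peptides(protein, min_len=5, max_len=30):
    """Every substring in the length range, i.e. enzyme set to 'none'."""
    n = len(protein)
    return {protein[i:j]
            for i in range(n)
            for j in range(i + min_len, min(i + max_len, n) + 1)}

def phospho_isoforms(peptide, max_sites=3):
    """Ways to place 0..max_sites phosphate groups on the S/T/Y residues."""
    k = sum(peptide.count(r) for r in "STY")
    return sum(comb(k, m) for m in range(min(k, max_sites) + 1))

def search_space(peptides, max_sites=3):
    """Total number of peptide hypotheses including phospho-isoforms."""
    return sum(phospho_isoforms(p, max_sites) for p in peptides)

if __name__ == "__main__":
    toy = "MSTPRLLSAKGSPEYTRDLVSPK"  # hypothetical sequence, not from the study
    for name, peps in (("trypsin", tryptic_peptides(toy)),
                       ("none", nonspecific_peptides(toy))):
        print(name, len(peps), search_space(peps))
```

Even for this 23-residue toy sequence the no-enzyme setting yields far more candidates than the tryptic setting, and each candidate is multiplied again by its phospho-isoforms; for a whole proteome database the same effect is what makes the conventional search slow and insensitive.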

2 Materials and methods

2.1 Materials and chemical reagents

All water used in this experiment was prepared using a Milli-Q system (Millipore, Bedford, MA). DTT, iodoacetamide, ammonium bicarbonate (NH4HCO3), TFA, sodium orthovanadate (Na3VO4), sodium fluoride (NaF), trypsin, and proteinase K were all obtained from Sigma Aldrich (St. Louis, MO, USA). EDTA, EGTA, and PMSF were purchased from Amresco (Solon, OH, USA). Formic acid was bought from Fluka (Buches, Germany), and acetonitrile (ACN, HPLC grade) was purchased from Merck (Darmstadt, Germany). Fused-silica capillaries with 200 µm id and 75 µm id were purchased from Polymicro Technologies (Phoenix, AZ, USA). Daisogel ODS-AQ (5 µm, 12 nm pore) was purchased from DAISO Chemical Co., Ltd. (Osaka, Japan).

2.2 Sample preparation

The human hepatoma BEL-7402 cells were grown in RPMI1640 supplemented with 10% bovine serum and 100 U/mL of streptomycin and penicillin. The cells were harvested at about 80% density. The protein extraction procedures were the same as in our previous work [14]. Details about protein extraction and protein digestion can be found in the supporting information. The human serum from healthy volunteers (n = 16) used in this study was obtained from the Second Affiliated Hospital of Dalian Medical University (Dalian, China), and its use complied with the guidelines of the Ethics Committee of the Hospital. The human serum was stored at −80 °C until further use. Highly abundant proteins in human serum were removed by organic solvent precipitation. Briefly, 500 µL of human serum was diluted with 30% ACN/25 mM NH4HCO3 to a volume of 2 mL, then 5 mL of precooled acetonitrile was added and the resulting mixture was kept at −20 °C for 2 h. The sample was centrifuged at 10 000 g for 1 min, and the supernatant was collected for the subsequent phosphopeptide enrichment with 10 mg of Ti-IMAC (immobilized titanium ion affinity chromatography) beads.

2.3 Phosphopeptide enrichment

The enrichment of phosphopeptides was performed as described previously [15]. Briefly, peptide mixtures were first incubated with Ti-IMAC beads at a ratio of 1:10 w/w in loading buffer (80% ACN and 6% TFA). After centrifugation, the supernatant was removed. The Ti-IMAC beads with adsorbed phosphopeptides were then washed in turn with two washing buffers (washing buffer 1: 50% ACN, 6% TFA containing 200 mM NaCl; washing buffer 2: 30% ACN containing 0.1% TFA) to remove nonspecifically adsorbed peptides. The bound phosphopeptides were then eluted with 10% NH3·H2O. After centrifugation at 20 000 g for 5 min, the supernatant was collected and lyophilized to dryness.

2.4 Nano LC-MS/MS analysis

Two different mass spectrometers, a TripleTOF 5600 system (AB Sciex, Canada) and a linear ion trap (LTQ) Orbitrap Velos (Thermo, San Jose, CA), were used for data acquisition. Details about the nano LC-MS/MS analysis and general MS acquisition parameters are provided in the supporting information.

2.5 Data analysis

The raw files collected from the TripleTOF 5600 were processed by ProteinPilot Software v. 4.0 (AB SCIEX, Foster City, CA) using the Paragon Algorithm. The database employed was IPI human 3.80. Three different search modes were used when searching with ProteinPilot: enzyme set to trypsin in thorough mode, enzyme set to none in thorough mode (Taglet/none), and enzyme set to trypsin in rapid mode (Fraglet/trypsin). Phosphopeptide emphasis was chosen when processing enriched phosphopeptides. Only identified peptides with a false discovery rate (FDR) < 1% were considered for further analysis.

For comparison, these raw files were also converted to .mgf format and searched by Mascot (version 2.3). Cysteine carbamidomethylation was set as the fixed modification and oxidation of methionine was set as the variable modification. Phosphorylation (S/T/Y) was also set as a variable modification for phosphopeptides. The mass tolerances for the precursor ion and the fragment ion were set to 20 ppm and 0.1 Da, respectively. When trypsin specificity was adopted, up to two missed cleavage sites were allowed. Filtered peptides with FDR < 1% were used for further analysis. Ascore values were calculated for every phosphorylation site identified in our experiments [16].

The raw files collected from the LTQ Orbitrap Velos were converted to .mgf file format using Proteome Discoverer 1.2 (Thermo Scientific), and then searched by ProteinPilot with the instrument set as “Orbi/FT MS (1–3 ppm), LTQ MS/MS.” Mascot and ProteinPilot were installed on two computers with the same configuration, each with two Intel Xeon X5650 processors (2.67 GHz) and 16.0 GB of RAM.
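As an illustration of what the two Mascot tolerance settings mean numerically, the short sketch below converts a relative tolerance in ppm into an absolute mass window and checks a match against either a ppm or a Da tolerance. The 20 ppm and 0.1 Da defaults are the values quoted above; the function names and the example masses are hypothetical, and real search engines apply these checks internally in their own way.

```python
def ppm_window(mass, tol_ppm=20.0):
    """Absolute (low, high) mass window for a relative tolerance in ppm."""
    delta = mass * tol_ppm / 1e6
    return mass - delta, mass + delta

def within_ppm(observed, theoretical, tol_ppm=20.0):
    """Precursor-style check: relative deviation in parts per million."""
    return abs(observed - theoretical) / theoretical * 1e6 <= tol_ppm

def within_da(observed, theoretical, tol_da=0.1):
    """Fragment-style check: absolute deviation in daltons."""
    return abs(observed - theoretical) <= tol_da

if __name__ == "__main__":
    low, high = ppm_window(1512.69)  # example mass only
    print(f"20 ppm window around 1512.69: {low:.4f} to {high:.4f}")
```

A relative tolerance scales with the mass, so the same 20 ppm setting admits a wider absolute window for heavy precursors than for light ones, whereas the 0.1 Da fragment tolerance is the same at every m/z.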

3 Results and discussion

3.1 Investigation of the performance of the de novo sequencing assisted database search strategy

The Paragon Algorithm in ProteinPilot is a new hybrid database search engine that has two core components [13]. The first component, referred to as Fraglet, is similar to the conventional database search strategy: it selects peptide hypotheses solely by matching the experimental precursor mass to the mass of the peptide hypothesis within a certain tolerance. The second component, referred to as Taglet, is the de novo sequencing assisted database search mode: it selects peptide hypotheses on the basis of sequence tags derived from the de novo sequencing of MS/MS spectra. These two search modes are invoked by different parameter settings during the database search. Taglet can be run independently when enzyme specificity is set to none, or run as the initial search in conjunction with the Fraglet search. In a conventional database search engine such as Mascot, the enzyme can be set to trypsin or to none during the search. We investigated the performance of the conventional database search engine Mascot and of the Paragon Algorithm under different search parameter settings.

We first investigated the performance of Mascot and the Paragon Algorithm for the identification of peptides without modifications. Tryptic peptides from BEL-7402 cell lysate were subjected to RPLC-MS/MS analysis on the TripleTOF 5600. The sample was analyzed three times and the acquired raw files were named Bel-01, Bel-02, and Bel-03. Peptide identifications were obtained by searching these raw files with the different search algorithms against the same protein sequence database. The FDR was controlled below 1% for confident identifications. Taking the Bel-03 dataset as an example (Fig. 1A), the Mascot searches with trypsin specificity and with no enzyme specificity led to the identification of 4596 and 3393 unique peptides, respectively (details are shown in Supporting Information Table 1). The number of identified peptides decreased by 26% when no enzyme specificity was set. This was not surprising, since the search space increased. For Paragon, the Fraglet search with trypsin specificity and the Taglet search with no enzyme specificity led to the identification of 3998 and 3783 peptides, respectively (details are shown in Supporting Information Table 1). The Taglet search with no enzyme specificity thus led to only a 5% decrease in peptide identifications. As a de novo sequencing assisted database search mode, Taglet uses the de novo identified sequence tags to select candidate peptides, which significantly reduces the search space and thus improves the identification sensitivity. The situation was the same for the other two datasets: the search with no enzyme specificity by Mascot led to a decrease of over 20%, while that by Paragon led to a decrease of only about 5% (Supporting Information Fig. 1). These results indicated that the de novo-assisted approach was able to maintain high identification sensitivity when enzyme specificity was set to none.

Figure 1. Number of peptides identified by different search modes. (A) Tryptic peptide dataset acquired by TripleTOF 5600 (Bel-03). (B) Tryptic phosphopeptide dataset acquired by TripleTOF 5600 (Bel-Phos-01). (C) Tryptic phosphopeptide dataset acquired by LTQ Orbitrap Velos (Bel-Phos-01-Velos). Mascot/trypsin: Mascot search with trypsin specificity; Mascot/none: Mascot search with no enzyme specificity; Fraglet/trypsin: Fraglet search with trypsin specificity; Taglet/none: Taglet search with no enzyme specificity.

The objective of this study is to find an effective search strategy to identify phosphopeptides with unknown enzyme cleavage specificity. Phosphopeptides enriched from the tryptic digest of BEL-7402 cell lysate were used to evaluate the performance of the different search modes.
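The difference between the two ways of selecting peptide hypotheses can be illustrated with a toy example. The sketch below is not the Paragon Algorithm, which according to [13] uses sequence temperature values and feature probabilities; it only contrasts a mass-only filter, in the spirit of Fraglet, with a sequence-tag filter, in the spirit of Taglet. The candidate list, the tag, and the function names are assumptions made for illustration; the residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses (Da) for the residues used in this toy example.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "T": 101.04768, "L": 113.08406, "K": 128.09496, "E": 129.04259,
    "R": 156.10111, "Y": 163.06333,
}
WATER = 18.01056

def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in seq) + WATER

def candidates_by_mass(peptides, precursor_mass, tol_ppm=20.0):
    """Mass-only selection: keep every peptide within the precursor tolerance."""
    return [p for p in peptides
            if abs(peptide_mass(p) - precursor_mass) / precursor_mass * 1e6 <= tol_ppm]

def candidates_by_tag(peptides, tags):
    """Tag-based selection: keep peptides containing a de novo sequence tag."""
    return [p for p in peptides if any(tag in p for tag in tags)]

if __name__ == "__main__":
    # Hypothetical database-derived candidates; the first three are isobaric.
    database = ["LLSAK", "SLLAK", "ALLSK", "GSPEYTR"]
    precursor = peptide_mass("LLSAK")
    print("mass filter:", candidates_by_mass(database, precursor))
    print("tag filter: ", candidates_by_tag(database, ["LSA"]))
```

In this toy case the mass filter cannot separate the three isobaric candidates, whereas a single three-residue tag narrows the list to one hypothesis. When the enzyme is set to none, the number of candidates passing a mass-only filter grows with the search space, while a tag constraint remains selective.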
Three raw data files (Bel-Phos-01, Bel-Phos-02, and Bel-Phos-03) were acquired by TripleTOF 5600. For the Mascot search, in addition to carbamidomethyl (C) as a fixed modification and oxidation (M) as a variable modification, three more variable modifications on S/T/Y residues were set, which significantly increases the search space. Identification of phosphopeptides by Mascot with trypsin specificity is not a challenge: searching the three raw files resulted in the identification of 1596, 1653, and 1554 unique phosphopeptides, respectively (details are shown in Supporting Information Table 2). However, when the same three raw data files were searched by Mascot with the enzyme set to none, a serious problem was encountered. The search became very slow and often crashed. Although we tried many times, only the search of Bel-Phos-01 was completed, and it took approximately 97 h. For this raw file, the search by Mascot with trypsin specificity resulted in the identification of 1596 unique phosphopeptides, while the search with no enzyme specificity led to the identification of only 673 unique phosphopeptides (Fig. 1B). The number decreased drastically (58%), indicating that this approach lacked sensitivity when the search space increased.

For Paragon, Fraglet with trypsin specificity and Taglet with no enzyme specificity were used to search these raw data files, and all the searches were completed in less than 4 h. The numbers of identified unique phosphopeptides are given in Fig. 1B. As expected, the number of identified phosphopeptides did not decrease seriously. For example, the numbers of unique phosphopeptides identified for Bel-Phos-01 by the Fraglet search with trypsin specificity and by the Taglet search with no enzyme specificity were 1507 and 1235, respectively. Compared with the 58% decrease in the Mascot search, the number decreased by only 18%.

Since the Taglet search strategy can identify peptides without setting enzyme specificity at relatively high sensitivity, one concern is whether the identified peptides are confident. Tryptic peptides can be identified by Mascot or Fraglet with trypsin specificity at high sensitivity, so we examined whether the fully tryptic peptides identified by Taglet could also be identified by these two modes. For the nonphosphopeptide datasets, i.e. Bel-01, Bel-02, and Bel-03, it was found that 96.6%, 97.3%, and 97.0% of the fully tryptic peptides identified by Taglet could also be identified by the above two search modes (Supporting Information Fig. 2 and Fig. 2). For the phosphopeptide datasets, i.e. Bel-Phos-01, Bel-Phos-02, and Bel-Phos-03, it was found that 77.1%, 85.5%, and 84.0% of the fully tryptic phosphopeptides could also be identified by the above two search modes. That the majority of these fully tryptic peptides could also be identified by the above two search modes indicates high confidence in the search results obtained by Taglet. These results clearly indicated that the de novo sequencing assisted database search strategy was able to efficiently and sensitively identify phosphopeptides with unknown cleavage specificity.

Figure 2. (A) Overlap of fully tryptic peptides identified from dataset Bel-03 by different search modes. (B) Overlap of fully tryptic phosphopeptides identified from dataset Bel-Phos-02 by different search modes. Mascot/trypsin: Mascot search with trypsin specificity; Fraglet/trypsin: Fraglet search with trypsin specificity; Taglet/none: Taglet search with no enzyme specificity.

The mass accuracy of fragment ions is important for de novo sequencing. The raw data files used above were acquired by TripleTOF 5600, which is a Q-TOF mass spectrometer. Therefore, both precursor ions and fragment ions were recorded with high mass accuracy. For the datasets acquired by LTQ Orbitrap Velos in this work, only the precursor ions were detected by the Orbitrap and recorded with high mass accuracy, while the fragment ions were detected by the LTQ with much lower mass accuracy. It was therefore of interest whether the de novo sequencing assisted database search also works well for such datasets collected from the LTQ Orbitrap Velos. For this purpose, the same tryptic phosphopeptides were analyzed by LTQ Orbitrap Velos, and three raw data files (Bel-Phos-01-Velos, Bel-Phos-02-Velos, and Bel-Phos-03-Velos) were obtained. The obtained raw data files were then searched by Fraglet with trypsin specificity and by Taglet without enzyme specificity (details about the identified peptides are shown in Supporting Information Table 3). It was found that the number of identified unique phosphopeptides decreased significantly (Fig. 1C and Supporting Information Fig. 3), from 486 to 79, an 84% decrease. This means that the de novo sequencing assisted database search is better suited to MS datasets in which both precursor ions and fragment ions are recorded with high mass accuracy.
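The percentage decreases quoted in this section follow directly from the identification counts. As a quick check, the sketch below recomputes them as relative decreases, (reference − observed) / reference; the counts are those reported in the text, while the dictionary labels and the function name are ours.

```python
def relative_decrease(reference, observed):
    """Percentage drop from a reference count to an observed count."""
    return (reference - observed) / reference * 100.0

# (count with trypsin specificity, count with no enzyme specificity)
COUNTS = {
    "Mascot, Bel-03": (4596, 3393),
    "Paragon, Bel-03": (3998, 3783),
    "Mascot, Bel-Phos-01": (1596, 673),
    "Paragon, Bel-Phos-01": (1507, 1235),
    "Paragon, LTQ Orbitrap Velos (Fig. 1C)": (486, 79),
}

if __name__ == "__main__":
    for label, (with_enzyme, no_enzyme) in COUNTS.items():
        drop = relative_decrease(with_enzyme, no_enzyme)
        print(f"{label}: {drop:.1f}% decrease")
```

Rounded to whole percentages, these values reproduce the 26%, 5%, 58%, 18%, and 84% decreases stated above.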

3.2 Identification of proteinase K generated phosphopeptides

Digestion by proteinase K, a robust nonspecific protease, is useful for improving the sequence coverage in protein identification. Here, we evaluated whether the de novo-assisted database search strategy can identify phosphopeptides derived from proteinase K digestion. Briefly, the BEL-7402 cell lysate was digested by proteinase K, followed by phosphopeptide enrichment with Ti-IMAC. The enriched phosphopeptides were then analyzed in triplicate by LC-MS/MS using the TripleTOF 5600. The acquired raw data files were combined and searched by Taglet. In total, 717 unique phosphopeptides with 747 unique phosphorylation sites were identified (the identification results are shown in Supporting Information Table 4). For comparison, three raw files of tryptic phosphopeptides from the same sample were searched by Paragon in hybrid mode, i.e. Fraglet with trypsin specificity plus Taglet with no enzyme specificity, and 2855 unique phosphopeptides with 3499 phosphorylation sites were identified in all (details are shown in Supporting Information Table 4). Figure 3 shows the overlap of the phosphosites identified by the above two methods. Although many more phosphosites could be identified from the tryptic digest, 50% of the phosphosites identified from the proteinase K dataset could not be identified from the trypsin dataset. This was not surprising, since some tryptic phosphopeptides could be too small or too big to be identified. For example, the two phosphosites (T566 and S570) on Microtubule-associated protein 6 (IPI00964662.1) were identified by a phosphopeptide (VVRTpPPKSpPSSAK, MW 1512.69) generated by proteinase K digestion. However, the corresponding two tryptic peptides with these phosphosites (TpPPK, MW 539.5; SpPATPK, MW 697.7) were too small to be identified by MS.

Figure 3. Overlap of phosphorylation sites identified from trypsin generated phosphopeptides and proteinase K generated phosphopeptides.

We then compared the cleavage specificity of the two proteases, i.e. trypsin and proteinase K, based on the identified phosphopeptides. Each phosphopeptide is generated from its parent protein by two cleavage events. The amino acid residues bracketing the cleavage sites can be inferred from the protein sequence based on the two termini of the identified phosphopeptide. Figure 4 shows the distribution of the amino acid residues bracketing the cleavage sites when different proteases were used for protein digestion. In contrast to trypsin, which cleaves peptide chains mainly at the carboxyl side of lysine or arginine, proteinase K cuts bonds after almost all types of amino acid residues. Thus, it is impossible to specify the enzyme specificity during the database search. Due to the huge search space, the identification of such phosphopeptides by Mascot was inefficient, while the search of the three raw files by Taglet with no enzyme specificity was successfully completed in 9 h.
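The mapping from the termini of an identified peptide to the residues bracketing its two cleavage sites can be sketched as follows. This is an illustrative reconstruction rather than the script used in this study: the toy protein, the function names, and the convention of marking a protein terminus with "-" are assumptions, and peptides are assumed to be given as plain sequences with any phosphorylation marks already removed.

```python
from collections import Counter

def cleavage_residues(protein, peptide):
    """Return the (-1, +1) residue pairs for the N- and C-terminal cleavages.

    -1 is the residue before the cleaved bond and +1 the residue after it.
    A protein terminus, where no cleavage occurred, is marked with "-".
    """
    start = protein.find(peptide)
    if start == -1:
        raise ValueError(f"{peptide} not found in protein")
    end = start + len(peptide)
    n_term = (protein[start - 1] if start > 0 else "-", peptide[0])
    c_term = (peptide[-1], protein[end] if end < len(protein) else "-")
    return n_term, c_term

def tally_minus1(protein, peptides):
    """Count the residue at position -1 over all genuine cleavage events."""
    counts = Counter()
    for pep in peptides:
        for minus1, plus1 in cleavage_residues(protein, pep):
            if "-" not in (minus1, plus1):  # skip protein termini
                counts[minus1] += 1
    return counts

if __name__ == "__main__":
    toy = "MSTPRLLSAKGSPEYTRDLVSPK"  # hypothetical sequence
    print(cleavage_residues(toy, "LLSAK"))
    print(tally_minus1(toy, ["LLSAK", "DLVSPK"]))
```

For a tryptic digest the tally at position −1 is dominated by K and R, while for a nonspecific protease such as proteinase K the counts spread over many residue types, which is the pattern summarized in Fig. 4.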
This new search strategy can therefore also be applied to identify phosphopeptides generated by other low-specificity proteases, such as pepsin and thermolysin, to realize more comprehensive phosphorylation site mapping in phosphoproteome studies.

Figure 4. Distribution of the amino acid residues bracketing the cleavage sites. (A) Trypsin digestion; (B) proteinase K digestion; (C) endogenous serum phosphopeptides. –1 and 1 represent the residue before and after the cleaved peptide bond, respectively.

3.3 Identification of endogenous phosphopeptides from serum

Many of the endogenous peptides contained in human serum are believed to be fragments of larger proteins that have been at least partially degraded by various enzymes. For the identification of endogenous peptides from serum, removal of highly abundant proteins is crucial. In this work, we used a simple acetonitrile precipitation method for the removal of highly abundant proteins in human serum. After centrifugation at 10 000 g for 1 min, the supernatant was collected and subjected to phosphopeptide enrichment with 10 mg of Ti-IMAC beads. Phosphopeptides enriched from 50 µL of initial human serum were analyzed by TripleTOF 5600 three times. The raw data files were combined and searched by Taglet with no enzyme specificity. After controlling the peptide FDR < 1% using the integrated tools in ProteinPilot, 82 unique endogenous phosphopeptides with 63 phosphorylation sites were identified at high confidence. In the literature, the number of unique endogenous phosphopeptides identified from serum by one-dimensional LC–MS/MS analysis did not exceed 20 [11]. The much larger number of endogenous phosphopeptides identified in this study may be attributed to two main factors: (i) the de novo-assisted database search strategy allowed sensitive identification without significantly increasing the number of peptide hypotheses; (ii) both the precursor ions and fragment ions were recorded with high mass accuracy, which facilitated the de novo-assisted database search process.

The generation of endogenous serum phosphopeptides was due to two PTM processes, i.e. proteolysis and phosphorylation. Proteolytic processing is a major form of PTM that occurs when a protease cleaves bonds in a target protein to modify its activity. Many cellular processes are triggered by proteolytic processing. For example, coagulation or clotting is a proteolysis-driven process in which a cascade of proteases is involved and which can lead to the formation of solid fibrin/platelet plugs. It was reported that the cleavage pattern of serum endogenous peptides reflected the protease activity in serum, and may serve for the early diagnosis and classification of cancers [9, 17]. To explore the cleavage pattern of the endogenous phosphopeptides identified in this study, the two residues surrounding the cleavage sites of these phosphopeptides were also investigated and are shown in Fig. 4. Similar to those of proteinase K generated phosphopeptides, the cleavage sites did not show high specificity, indicating that these phosphopeptides were generated by multiple proteases with different specificities.

The other PTM process was phosphorylation, which is catalyzed by kinases. The amino acid residues surrounding the phosphosites on the identified endogenous serum phosphopeptides were also investigated (Fig. 5). The position n + 1 showed an obvious selectivity for proline, indicating that Pro-directed kinases contributed to the phosphorylation of these sites. The high frequency of acidic residues around the phosphosites indicated that acidophilic kinases also contributed to the phosphorylation of these sites. Different kinases may have various activities under different physiological states. Just as the cleavage pattern reflects the protease activity, the phosphorylation pattern of the endogenous phosphopeptides reflects the kinase activity. The identification of endogenous serum phosphopeptides may therefore provide important information for biomarker discovery.

Figure 5. Distribution of the amino acid residues surrounding the phosphosites identified from endogenous serum phosphopeptides.
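The positional analysis of residues surrounding the phosphosites can be sketched in a few lines. The example below is an assumption-laden illustration rather than the procedure used to produce Fig. 5: the toy protein, the 0-based site indices, the window of three residues on each side, and the function names are all hypothetical.

```python
from collections import Counter

def flanking_residues(protein, site_index, window=3):
    """Map each offset (n-3 ... n+3, excluding 0) to the residue at that position."""
    flanks = {}
    for offset in range(-window, window + 1):
        pos = site_index + offset
        if offset != 0 and 0 <= pos < len(protein):
            flanks[offset] = protein[pos]
    return flanks

def position_frequencies(sites, window=3):
    """Count residues per offset over many (protein, site_index) pairs."""
    freq = {off: Counter() for off in range(-window, window + 1) if off != 0}
    for protein, site_index in sites:
        for offset, aa in flanking_residues(protein, site_index, window).items():
            freq[offset][aa] += 1
    return freq

if __name__ == "__main__":
    toy = "MSTPRLLSAKGSPEYTRDLVSPK"        # hypothetical sequence
    sites = [(toy, 2), (toy, 11), (toy, 20)]  # T and S residues followed by P
    freq = position_frequencies(sites)
    print("n+1:", freq[1].most_common())
```

A strong excess of proline at offset +1, as constructed in this toy input, is the signature attributed above to Pro-directed kinases; an excess of D or E at nearby offsets would correspondingly point to acidophilic kinases.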

4 Concluding remarks

In this study, we systematically compared the conventional database search strategy with the de novo-assisted database search strategy for the identification of phosphopeptides with unknown cleavage specificity. It was found that the latter strategy was able to identify such phosphopeptides at higher speed and with higher sensitivity. This new search strategy was successfully applied to the identification of phosphopeptides generated by proteinase K digestion and of endogenous phosphopeptides enriched from human serum. This new database search strategy could therefore be a promising tool in phosphoproteomics analysis.

This work was supported by the China State Key Basic Research Program Grant (2013CB911202, 2012CB910101, 2012CB910604), the Creative Research Group Project of NSFC (21321064), the National Natural Science Foundation of China (21275142, 21235006, 81161120540, 81361128015), National Key Special Program on Infection diseases (2012ZX10002009–011), Analytical Method Innovation Program of MOST (2012IM030900). The authors greatly appreciate the help of Dr. Wenhai Jin and Lihai Guo from AB Sciex with the TripleTOF 5600.

The authors have declared no conflict of interest.

5 References

[1] Engholm-Keller, K., Larsen, M. R., Technologies and challenges in large-scale phosphoproteomics. Proteomics 2013, 13, 910–931.
[2] Bian, Y., Ye, M., Song, C., Cheng, K. et al., Improve the coverage for the analysis of phosphoproteome of HeLa cells by a tandem digestion approach. J. Proteome Res. 2012, 11, 2828–2837.
[3] Han, G. H., Ye, M. L., Jiang, X. N., Chen, R. et al., Comprehensive and reliable phosphorylation site mapping of individual phosphoproteins by combination of multiple stage mass spectrometric analysis with a target-decoy database search. Anal. Chem. 2009, 81, 5794–5805.
[4] Schlosser, A., Vanselow, J. T., Kramer, A., Mapping of phosphorylation sites by a multi-protease approach with specific phosphopeptide enrichment and nanoLC-MS/MS analysis. Anal. Chem. 2005, 77, 5243–5250.
[5] Wang, B., Malik, R., Nigg, E. A., Korner, R., Evaluation of the low-specificity protease elastase for large-scale phosphoproteome analysis. Anal. Chem. 2008, 80, 9526–9533.
[6] Diamandis, E. P., Peptidomics for cancer diagnosis: present and future. J. Proteome Res. 2006, 5, 2079–2082.
[7] Soloviev, M., Finch, P., Peptidomics: bridging the gap between proteome and metabolome. Proteomics 2006, 6, 744–747.
[8] Hu, L. H., Li, X., Jiang, X. N., Zhou, H. J. et al., Comprehensive peptidome analysis of mouse livers by size exclusion chromatography prefractionation and NanoLC-MS/MS identification. J. Proteome Res. 2007, 6, 801–808.
[9] Villanueva, J., Shaffer, D. R., Philip, J., Chaparro, C. A. et al., Differential exoprotease activities confer tumor-specific serum peptidome patterns. J. Clin. Invest. 2006, 116, 271–284.
[10] Qin, H., Gao, P., Wang, F., Zhao, L. et al., Highly efficient extraction of serum peptides by ordered mesoporous carbon. Angew. Chem. Int. Ed. 2011, 50, 12218–12221.
[11] Zhu, J., Wang, F. J., Cheng, K., Song, C. X. et al., Analysis of human serum phosphopeptidome by a focused database searching strategy. J. Proteomics 2013, 78, 389–397.
[12] Zhang, J., Xin, L., Shan, B. Z., Chen, W. W. et al., PEAKS DB: de novo sequencing assisted database search for sensitive and accurate peptide identification. Mol. Cell. Proteomics 2012, 11, M111.010587.
[13] Shilov, I. V., Seymour, S. L., Patel, A. A., Loboda, A. et al., The paragon algorithm, a next generation search engine that uses sequence temperature values and feature probabilities to identify peptides from tandem mass spectra. Mol. Cell. Proteomics 2007, 6, 1638–1655.
[14] Han, G. H., Ye, M. L., Zhou, H. J., Jiang, X. N. et al., Large-scale phosphoproteome analysis of human liver tissue by enrichment and fractionation of phosphopeptides with strong anion exchange chromatography. Proteomics 2008, 8, 1346–1361.
[15] Zhou, H. J., Ye, M. L., Dong, J., Han, G. H. et al., Specific phosphopeptide enrichment with immobilized titanium ion affinity chromatography adsorbent for phosphoproteome analysis. J. Proteome Res. 2008, 7, 3957–3967.
[16] Beausoleil, S. A., Villen, J., Gerber, S. A., Rush, J. et al., A probability-based approach for high-throughput protein phosphorylation analysis and site localization. Nat. Biotechnol. 2006, 24, 1285–1292.
[17] Petricoin, E. F., Belluco, C., Araujo, R. P., Liotta, L. A., The blood peptidome: a higher dimension of information content for cancer biomarker discovery. Nat. Rev. Cancer 2006, 6, 961–967.
