Proteomics Clin. Appl. 2015, 9, 307–321


DOI 10.1002/prca.201400117

REVIEW

Using data-independent, high-resolution mass spectrometry in protein biomarker research: Perspectives and clinical applications

Tatjana Sajic1, Yansheng Liu1 and Ruedi Aebersold1,2

1 Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
2 Faculty of Science, University of Zurich, Zurich, Switzerland

In medicine, there is an urgent need for protein biomarkers in a range of applications that includes diagnostics, disease stratification, and therapeutic decisions. One of the main technologies to address this need is MS, used for protein biomarker discovery and, increasingly, also for protein biomarker validation. Currently, data-dependent analysis (also referred to as shotgun proteomics) and targeted MS, exemplified by SRM, are the most frequently used mass spectrometric methods. Recently developed data-independent acquisition techniques combine the strength of shotgun and targeted proteomics, while avoiding some of the limitations of the respective methods. They provide high-throughput, accurate quantification, and reproducible measurements within a single experimental setup. Here, we describe and review data-independent acquisition strategies and their recent use in clinically oriented studies. In addition, we also provide a detailed guide for the implementation of SWATH-MS (where SWATH is sequential window acquisition of all theoretical mass spectra)—one of the data-independent strategies that have gained wide application of late.

Received: August 25, 2014 Revised: November 13, 2014 Accepted: December 10, 2014

Keywords: Biomarker / Data-independent acquisition (DIA) / Mass spectrometry / Proteomics / SWATH mass spectrometry

Correspondence: Dr. Tatjana Sajic, IMSB, ETH Zurich, Auguste-Piccard-Hof 1, 8093 Zurich, Switzerland
E-mail: [email protected], [email protected]

Abbreviations: AP-SWATH MS, affinity purification SWATH-MS; DDA, data-dependent acquisition; DIA, data-independent acquisition; PAcIFIC, precursor acquisition independent from ion count; PRM, parallel reaction monitoring; QTOF, quadrupole TOF; SWATH, sequential window acquisition of all theoretical mass spectra

© 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
www.clinical.proteomics-journal.com

1 Introduction

Systems biology uses a cross-disciplinary approach that integrates molecular biology with computational and conceptual sciences (mathematics, physics, and chemistry) to study, and eventually to simulate, biological processes at a systemic level [1]. This holistic approach aims at describing and modeling the functional interactions between molecules within the system or organism in question [2]. In almost all biochemical processes in the (human) body, proteins are the most functionally relevant class of molecules, responding rapidly and adapting dynamically to the changing conditions of the cell [3]. Studying protein molecules in complex biological systems has therefore been gaining momentum over the last few decades.

A biomarker is defined as a measurable molecule in body fluids or tissues that is indicative of a normal or abnormal process (or of a condition or disease) [http://www.cancer.gov/]. Among the many types of possible biomarkers (proteins, transcripts, metabolites, etc.), protein markers can be particularly informative and valuable because of their functional relevance and accessibility by minimally invasive methods. Examples of such biomarkers include the protein markers found in plasma or urine. Even though some of these proteins are indeed used to evaluate either normal or diseased processes [4, 5], no single protein can completely capture the diseased or healthy state of a cell or tissue. This is partly due to the interplay of complex cellular signaling networks and the heterogeneous nature of many diseases such as cancer [6]. Therefore, according to recent reports and expert opinion [4,7,8], it is advisable to gradually replace the concept of “single marker detection” (marker-specific assays such as ELISA tests, still in use in current clinical practice) with the concept of a multiplex “biomarker panel,” which consists of several diverse biomarkers selected to best capture the state of a disease. Conceivably, multiple, simultaneous measurements of biomarkers can improve diagnostic selectivity and sensitivity.

MS-based proteomics [9], a versatile tool to monitor large sets of proteins, holds the potential to transform modern medicine into a more predictive, prognostic, and personalized mode of operation [10, 11]. Current proteomic technologies can be used to generate detailed, quantitative molecular profiles of patient samples derived from tissue or biofluids. These sample-specific proteomes (personal maps) [12, 13] can be cross-compared between patients and controls to infer the underlying molecular processes perturbed in a disease, and thus to detect potential diagnostic or prognostic biomarkers under a specific experimental design. Changes in protein expression could also guide therapeutic decisions. For instance, the majority of therapeutic targets are proteins such as G-protein-coupled receptors, protein kinases, or chromatin modifiers [14]. Measuring the abundance of these target proteins and/or of those functionally related to them could potentially reveal how the body responds to a specific treatment, helping to assess the efficacy of the therapy and to define therapy sensitivity or resistance as part of pharmacodynamics, a tool in therapeutic drug monitoring. Should proteomics be established as a mainstream technology in clinical research [15, 16], one could feasibly expect patients to benefit in many respects [4,17].

Among available proteomics technologies, shotgun proteomics has been extensively used to compile large numbers of candidate protein biomarkers associated with specific pathologies [4,18,19]. Of the long candidate lists reported from shotgun proteomic studies, however, many candidates have not yet been validated [20–22], partially due to the lack of a suitable validation technology capable of accurately and reproducibly quantifying the candidate markers in a large number of samples [23]. Recent years have seen significant technological progress toward that goal [24].

In this review, we discuss MS-based proteomic strategies used in biomarker discovery and validation. Specifically, we present an overview of the emerging data-independent acquisition (DIA) MS strategy and its recent application in clinically oriented studies. For researchers interested in implementing this approach, the article also describes the basic framework of one of the DIA methods, SWATH-MS (where SWATH is sequential window acquisition of all theoretical mass spectra).

2 MS-based proteomic platforms in clinical applications

In MS-based proteomics, there are two broad approaches for studying proteins: the less used, less mature “top-down” proteomics, and the almost universally used “bottom-up” proteomics. “Top-down” proteomics attempts to analyze intact proteins and, while not further discussed in this review, has recently begun to spark interest in clinical applications [25]. “Bottom-up” proteomics generally analyzes peptides derived from proteins via enzymatic digestion. Individual peptides are then identified and quantified by LC-MS/MS, and the identity and quantity of the proteins are inferred from the peptide-level data [26]. What the numerous strategies developed in the area of “bottom-up” proteomics have in common is that they identify and quantify peptides in a 2D space defined by the chromatographic elution time determined by the LC system and the precursor ion m/z determined by the mass spectrometer (Fig. 1A). Where they differ is in how they select precursors for fragmentation and how the fragment ion signals are recorded. Based on these parameters, “bottom-up” proteomic strategies are grouped into data-dependent, targeted, and data-independent methods, which are used for different purposes. In clinical use, these three method groups can be applied at different stages of the biomarker discovery and development process. The following subsections describe these methods and their use, with a focus on the data-independent method.
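The enzymatic digestion step that underlies all "bottom-up" strategies can be mimicked in silico. The following is a minimal sketch, assuming trypsin as the protease and the commonly used rule of cleaving C-terminal to K or R unless the next residue is P. Neither the protease rule nor the function name `tryptic_digest` nor the toy sequence comes from the article.

```python
import re


def tryptic_digest(sequence, missed_cleavages=0, min_length=1):
    """Return peptides from an in silico digest of one protein sequence.

    Assumed cleavage rule (not taken from the article): cut C-terminal to
    K or R unless the next residue is P.
    """
    # Zero-width split after K/R when the following residue is not P.
    pieces = [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]
    peptides = []
    for start in range(len(pieces)):
        # Join up to `missed_cleavages` neighbouring pieces.
        last = min(start + missed_cleavages, len(pieces) - 1)
        for end in range(start, last + 1):
            peptide = "".join(pieces[start:end + 1])
            if len(peptide) >= min_length:
                peptides.append(peptide)
    return peptides


# Hypothetical toy sequence, chosen only to exercise the K/R and "RP" rules.
print(tryptic_digest("AAKRPCCRDDK"))
print(tryptic_digest("AAKRPCCRDDK", missed_cleavages=1))
```

The peptide lists produced this way are what the downstream acquisition modes effectively sample from; real search engines add modifications, length limits, and protein-level bookkeeping that are omitted here.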

2.1 Data-dependent acquisition (DDA) mode

2.1.1 Principles of DDA data collection

Shotgun, or so-called discovery, proteomics relies on the DDA mode for data collection and remains to date the most widely used MS-based platform [27, 28]. In DDA mode, the most abundant peptide ions (precursor ions) detected in a survey scan (MS1 scan) are selected, accelerated toward a collision cell, and fragmented there by CID. The resulting fragment ions (product ions) are then acquired in the form of fragment ion spectra (MS2 or MS/MS spectra) (Fig. 1B). Data analysis in DDA mode relies on peptide identification obtained by reconstructing a peptide’s amino acid sequence from the fragment ions recorded in these MS2 spectra. Specifically, the MS2 spectra are automatically “matched” to simulated MS/MS spectra generated by “in silico” digestion of protein sequences in a protein database that is either supplied directly or derived from genome sequences. Selection of a peptide precursor ion for fragmentation is usually based on the real-time intensity of that ion, and therefore depends on the ionization response factor and the abundance of the peptide in the sample. Up to a point, these limitations can be overcome by acquisition parameters such as dynamic exclusion, which prevents repeated reselection of the same precursor ion and thus maximizes the number of peptides identified in an LC-MS/MS run. Also, to increase the chance of detecting low-abundance peptides and to reduce the complexity of clinical samples, multidimensional LC peptide separations are usually performed prior to DDA [29]. The shotgun approach is commonly used in discovery-driven proteomic experiments for identification of the maximal number of proteins in a complex biological sample, identification of their PTMs, or for cataloging whole proteomes such as the human proteome [30, 31].
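The intensity-based precursor selection and dynamic exclusion described above can be sketched as follows. This is an illustration only, not vendor acquisition logic: the function name `select_precursors`, the top-N value, the 30 s exclusion time, and the m/z tolerance are all assumptions chosen for the example.

```python
def select_precursors(survey_scan, scan_time, excluded, top_n=3,
                      exclusion_s=30.0, mz_tol=0.01):
    """Pick the top_n most intense precursors that are not excluded.

    survey_scan: list of (mz, intensity) tuples from one MS1 scan.
    excluded:    dict mapping m/z -> time at which its exclusion expires;
                 it is updated in place (hypothetical bookkeeping structure).
    """
    # Drop exclusion entries that have expired by the time of this scan.
    for mz in [m for m, t_end in excluded.items() if t_end <= scan_time]:
        del excluded[mz]

    selected = []
    # Walk through the peaks from most to least intense.
    for mz, _intensity in sorted(survey_scan, key=lambda p: p[1], reverse=True):
        if len(selected) == top_n:
            break
        if any(abs(mz - ex) <= mz_tol for ex in excluded):
            continue  # recently fragmented, skip to give other ions a chance
        selected.append(mz)
        excluded[mz] = scan_time + exclusion_s
    return selected


# Invented survey scan: the same four ions are seen in consecutive cycles.
scan = [(500.25, 1e6), (620.31, 5e5), (710.40, 2e5), (455.20, 1e5)]
exclusion_list = {}
first_cycle = select_precursors(scan, 0.0, exclusion_list, top_n=2)
second_cycle = select_precursors(scan, 2.0, exclusion_list, top_n=2)
print(first_cycle, second_cycle)
```

Without the exclusion list, both cycles would fragment the same two dominant ions; with it, the second cycle reaches the less intense precursors, which is the effect the text attributes to dynamic exclusion.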


Figure 1. Schematic overview of three acquisition modes: (A) MS1 map of precursor ions; (B) DDA mode: intensity-based peptide selection in the MS1 scan, where each MS2 scan is a snapshot of one single peptide in RT; (C) SRM mode: user-driven (targeted) peptide selection in a narrow window (0.7 Da) and continuous recording in RT; (D) DIA-SWATH mode: the precursor isolation window is set to a width of 25 Da (32 sequential windows cover the mass range 400–1200 Da), and each MS2 scan is a multiplexed recording of the fragment ions originating from all precursor ions within a predefined m/z range and eluting at the same RT.

2.1.2 The role of DDA in the biomarker research context

In complex clinical samples, the shotgun approach is particularly suitable for fast protein screening when there is no prior knowledge of protein presence, function, interaction, or structure. Therefore, shotgun proteomics has been applied mainly in the discovery phase of biomarker development, to compare multiple samples of disease/control cohorts, or more recently, to characterize the proteogenomic blueprint of certain diseases [19]. Discovered biomarker candidates would then be further validated under optimized conditions in larger sample cohorts until final acceptance in clinical use.

2.1.3 Performance

In recent years, new-generation MS instruments, with their high resolving power, high acquisition speed, high sensitivity, and high mass accuracy, have permitted the routine identification of a few thousand proteins in a complex sample within several hours. A typical instrument setup involves hybrid mass analyzers such as linear ion trap/quadrupole-Orbitrap (or Q-Exactive) or quadrupole TOF (QTOF) [32, 33]. As far back as 10 years ago, Liu et al. reported that shotgun proteomics is biased against low-abundance peptides since the selection of precursors is based on peptide precursor intensity [34]. Interestingly, in 2011, a detailed analysis from the Mann group reported that while more than 100 000 peptide features were recorded in a single LC-MS/MS analysis of a complex proteome by a state-of-the-art linear ion trap/quadrupole-Orbitrap mass spectrometer, only about 16% were selected for generating MS/MS spectra as a result of the DDA strategy [35], further illustrating the limitations of heuristic peptide selection processes in the shotgun approach.

2.2 Targeted (SRM) mode

2.2.1 Principle of SRM

In contrast to shotgun proteomics, where no prior knowledge of proteins is required, the targeted approach is based on a hypothesis-driven selection of the proteins of research interest, while nontargeted peptides are not analyzed. SRM, or MRM [27, 36], is the fundamental acquisition mode in the targeted proteomics approach and the gold standard proteomic quantification method for predefined sets of proteins. In SRM, a series of transitions (m/z values of the precursor/product ion pairs) has to be defined, in combination with the retention time, for the targeted proteins before an SRM experiment. For a targeted protein, specific peptides and peptide fragments in the predefined list are selected based on their predetermined m/z values by the first and third quadrupole analyzers (Q1 and Q3) in a triple quadrupole instrument (QqQ), with the intensity of each transition recorded over chromatographic time (Fig. 1C). This process is repeated in short sequence for additional fragment ions derived from the targeted peptide, and for transition signals from a set of targeted peptides. Once the ion chromatograms for the selected transitions have been acquired, automated SRM data-processing software tools, such as mProphet [37], can be applied to compute the probability that the targeted peptide has been detected in the sample and to determine its quantity, thus supporting the large-scale application of the method [36].
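To make the notion of a transition concrete, the sketch below computes Q1 (precursor) and Q3 (y-type fragment) m/z values for one peptide. It assumes unmodified residues, standard monoisotopic residue masses, a doubly charged precursor, and singly charged y ions. The peptide "SAMPLER" and the helper names are hypothetical and serve only as an illustration; real assays also fix retention times and collision energies.

```python
# Monoisotopic residue masses in Da (unmodified amino acids).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def precursor_mz(peptide, charge=2):
    """m/z of the intact, protonated peptide (the Q1 value)."""
    mass = sum(MONO[aa] for aa in peptide) + WATER
    return (mass + charge * PROTON) / charge


def y_ion_mz(peptide, n, charge=1):
    """m/z of the y ion containing the n C-terminal residues (a Q3 value)."""
    mass = sum(MONO[aa] for aa in peptide[-n:]) + WATER
    return (mass + charge * PROTON) / charge


def transitions(peptide, precursor_charge=2, n_fragments=3):
    """Return (Q1, Q3, label) tuples for the longest y ions of a peptide."""
    q1 = precursor_mz(peptide, precursor_charge)
    lengths = range(len(peptide) - 1, 0, -1)[:n_fragments]
    return [(round(q1, 4), round(y_ion_mz(peptide, n), 4), f"y{n}")
            for n in lengths]


# Hypothetical target peptide, used only to show the shape of the output.
for q1, q3, label in transitions("SAMPLER"):
    print(f"{label}: Q1 {q1:.4f} -> Q3 {q3:.4f}")
```

Choosing the longest y ions is just one simple heuristic; in practice, transitions are usually picked from empirical fragment intensities.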

2.2.2 The role of SRM in the biomarker research context

Compared with shotgun MS, SRM exhibits favorable performance in quantitative reproducibility, accuracy, and even precision, providing a sensitive measurement for tens of protein targets in a short time scale. Therefore, in the area of biomarker research, SRM has been used in the verification or validation phase of candidate biomarkers reported in the discovery phase. Further, various studies have successfully employed the targeted approach for both absolute and relative quantitation of protein levels in clinical samples [38, 39].

2.2.3 Performance

The mass spectrometer operating in SRM mode measures targeted peptides at the right mass and elution time; if the peptides are detectable, missing values are minimized. Furthermore, SRM can reach a high dynamic range of ca. five orders of magnitude for protein measurements in samples with a complex background, and it does so with high selectivity. Mass-selection filtering through narrow (0.7 Da) windows removes noise at both the peptide and the fragment level, improving S/N and supporting highly accurate quantification [27, 40]. Additionally, SRM-based targeted proteomics has shown excellent interlaboratory reproducibility, with a CV below 20% [41]. While SRM is capable of performing measurements in a large cohort of samples (high sample throughput), the lack of high analyte throughput constitutes a limitation [42]. This is because the analyte throughput is limited by the number of transitions that can be monitored in a single run without compromising performance, which usually corresponds to no more than about 100 peptides.

The latest developments in targeted methods, such as the high-resolution, accurate-mass method known as parallel reaction monitoring (PRM), execute the targeted approach on Orbitrap-type instruments and thus provide alternative ways to quantify targeted proteins and peptides. Compared with the quadrupole analyzer used in SRM, the Orbitrap analyzer provides higher selectivity owing to its higher resolution [43]. Although PRM requires less time for assay development than SRM while simultaneously allowing a higher number of transitions to be monitored, the number of precursor ions that can be targeted in a PRM experiment is still limited.
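The reproducibility criterion quoted in the performance discussion above (a CV below 20%) is straightforward to evaluate on replicate measurements. A minimal sketch, assuming the CV is defined as the sample standard deviation divided by the mean; the peak areas are invented for illustration and do not come from any study cited here.

```python
from statistics import mean, stdev


def cv_percent(values):
    """Coefficient of variation in percent: 100 * sample SD / mean."""
    return 100.0 * stdev(values) / mean(values)


# Invented peak areas of one transition measured in three replicates.
replicate_areas = [90.0, 100.0, 110.0]
cv = cv_percent(replicate_areas)
print(f"CV = {cv:.1f}%, below 20%: {cv < 20.0}")
```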

2.3 DIA: An innovative MS-based proteomic workflow

2.3.1 Principle of DIA

Despite its accurate and reproducible measurement of predefined sets of analytes across large sample numbers, targeted MS analysis still depends on SRM assays that are normally derived from the DDA-based discovery of the biomarkers. As explained above, the stochastic nature of DDA tends to compromise the reproducible analysis of the same protein between samples [44]. The recent implementation of MS methods that acquire all analytes detectable in a sample in DIA mode promises to achieve the goal of reproducible and accurate measurement of several hundreds or thousands of proteins across multiple samples. In DIA high-resolution MS [24, 45], the MS instrument generates MS2 spectra from all precursor ions falling into a predefined m/z range (Fig. 1D). Hence, each recorded MS2 spectrum is not a snapshot of a single peptide, but a multiplexed recording of the fragment ions derived from all peptides eluting in real time within the predefined m/z range of the precursor window (Fig. 1D). If the precursor selection windows are seamlessly adjacent and cover the whole m/z precursor range of the peptides in a sample, every precursor detected in a sample is fragmented. This continuous sequencing of all detectable peptide features entering the mass spectrometer leads to data collection with fewer missing values than would be the case using the conventional DDA mode, and to significantly higher levels of multiplexing than is afforded by SRM.
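The idea of seamlessly adjacent precursor windows can be sketched in a few lines. The defaults below use the values given in the Figure 1 caption (25 Da windows covering 400–1200); the function names and the half-open interval convention are assumptions made for this illustration.

```python
def make_windows(start=400.0, stop=1200.0, width=25.0):
    """Return adjacent (low, high) precursor isolation windows."""
    n = int(round((stop - start) / width))
    return [(start + i * width, start + (i + 1) * width) for i in range(n)]


def window_index(precursor_mz, windows):
    """Index of the window that isolates this precursor, or None.

    Windows are treated as half-open intervals [low, high) so that every
    m/z value in the covered range belongs to exactly one window.
    """
    for i, (low, high) in enumerate(windows):
        if low <= precursor_mz < high:
            return i
    return None


windows = make_windows()
print(len(windows), windows[0], windows[-1])
print(window_index(402.21, windows), window_index(1300.0, windows))
```

Because each precursor in the covered range falls into some window in every cycle, fragmentation does not depend on precursor intensity, in contrast to the DDA selection logic.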

2.3.2 The role of DIA in biomarker research compared with DDA and SRM

Owing to unbiased data collection, DIA combines the advantages of DDA and SRM. Because high numbers of proteins can be monitored, DIA could be used, depending on the purpose of the biomarker research, for discovery experiments or for targeted proteomic investigations. Significantly, DIA data sets generated as permanent digital maps could easily be reexamined for the validation of newly emerging biomarkers and for iterative biomarker discovery [13].

As a developing MS technique, DIA cannot, at least currently, substitute for discovery or targeted proteomics. Its sensitivity cannot compete with that of SRM for small numbers of protein targets in single experiments [46]. For the discovery phase of the biomarker development pipeline (especially when the biomarkers are PTM forms, novel alternative splicing isoforms, etc.), shotgun proteomics based on DDA is a more mature technique than DIA, and the most widely used and accepted by the proteomic community. The main challenges of DIA MS itself lie in the data processing and interpretation of multiplexed tandem spectra originating from multiple precursors. In contrast to DDA, where the fragment ion spectrum is generally well associated with its precursor, in DIA the scans of precursors and their fragment ions are not recorded coordinately, which further complicates data interpretation.

Recently, considerable improvements in MS instrumentation, specifically in resolving power, mass accuracy, and speed of spectra acquisition, have greatly contributed to simplifying DIA data interpretation, bringing into focus new ideas for DIA mode applications and development. These include increased precursor selectivity, achieved either by spectra demultiplexing or by narrower isolation windows [47], and/or posterior data extraction from the acquired data with high fragment ion accuracy (e.g., 10 ppm), which provides fragment ion specificity comparable to standard SRM setups (0.7 Da isolation widths for Q1/Q3) [48]. (Please note that 10 ppm refers not only to mass accuracy but also to the required resolution, as the simulation in the first SWATH-MS paper was done under this assumption [48].)
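To put the 10 ppm extraction accuracy and the 0.7 Da quadrupole isolation width mentioned above on a common scale, a ppm tolerance can be converted to Da at a given m/z. This is a minimal sketch: the example fragment m/z of 800 is arbitrary, and treating 10 ppm as a symmetric plus/minus tolerance is an assumption of this illustration.

```python
def ppm_to_da(mz, ppm):
    """Absolute mass tolerance in Da corresponding to `ppm` at a given m/z."""
    return mz * ppm / 1e6


def within_ppm(observed_mz, theoretical_mz, ppm):
    """True if the observed m/z lies within +/- ppm of the theoretical m/z."""
    return abs(observed_mz - theoretical_mz) <= ppm_to_da(theoretical_mz, ppm)


half_width = ppm_to_da(800.0, 10.0)  # Da on either side of the fragment m/z
print(f"10 ppm at m/z 800 = +/- {half_width:.3f} Da")
print(within_ppm(800.004, 800.0, 10.0), within_ppm(800.02, 800.0, 10.0))
```

At m/z 800 the extraction window is therefore on the order of hundredths of a Da, far narrower than a 0.7 Da quadrupole window. This is the sense in which high-resolution fragment extraction can compensate for the wide precursor isolation used in DIA.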

2.3.3 An overview of existing DIA strategies

To date, several DIA methods and associated data-processing strategies have been reported in the literature [48–54]. Their common denominator is unbiased MS/MS data acquisition; they differ in their data collection schemes, their data analysis strategies, and the type of instrument on which they are implemented. DIA approaches can be classified according to the predefined m/z range: (i) DIA on the full m/z range, where all ions enter the mass spectrometer for fragmentation; and (ii) DIA on selected m/z ranges, where the full precursor-ion mass range is divided into smaller m/z isolation windows [45]. Here, we discuss some of the DIA approaches developed on new-generation mass spectrometers.

2.3.3.1 Full m/z range DIA

The MSE method was developed in 2005 by the Waters Corporation on a QTOF mass spectrometer [55]; alternating low- and high-energy CID leads to the acquisition of low- and high-energy spectra, which contain precursor and fragment ions, respectively [56]. The MSE accounting software is based on ion-counting algorithms. Tentative peptides are identified and scored according to how well they fit certain physicochemical parameters (e.g., accurate mass, retention time, ion intensity, charge state of precursor and product ions); the spectra of the highest-scoring peptides are then removed by a depletion algorithm so that low-abundance proteins can be identified. A recent report demonstrated that fragmentation efficiency could be enhanced by ion-mobility drift time-specific collision energy, extending the analytical power of MSE [57]. MSE has found direct clinical application in studies of the pathophysiology of schizophrenia, through proteomic analysis of postmortem brain [58], of the human pituitary proteome [59], and of serum [60]. Owing to its robustness and reproducibility, the method afforded a relatively simple comparative analysis to detect proteome changes. MSE has also been applied successfully in clinical studies of bacterial proteomes and their resistance to specific drugs [61, 62], and in a reproductive physiology study that analyzed the secretome of the human embryo [63].

“All-ion fragmentation” is a variation of the MSE method developed for the benchtop Orbitrap equipped with a C-trap and a higher energy CID cell [64]. The geometry of the instrument permits alternating high-resolution, accurate-mass measurements of both precursor and fragment ions within one duty cycle. All precursor ions entering the MS instrument at a certain time during the analysis are accumulated in the C-trap and then injected into the Orbitrap for a high-resolution, accurate-mass scan. For fragment ion recording, the precursors first accumulated in the C-trap pass into the higher energy CID cell for fragmentation, and the fragments generated are returned to the C-trap and injected into the Orbitrap mass analyzer. However, unlike hybrid mass spectrometers equipped with a linear ion trap, this stand-alone Orbitrap analyzer does not allow precursor-ion selection, resulting in a certain loss of linkage between precursors and their fragment ions [65]. The data analysis can be performed by conventional database searching in the MaxQuant software, whereby the precursor-fragment linkage is first reconstructed using the notion that a precursor and its associated fragment ion signals have the same chromatographic elution profile.
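The notion that a precursor and its fragments share the same chromatographic elution profile can be illustrated with a simple correlation test. This sketch is not the MaxQuant implementation: the Pearson correlation, the 0.9 threshold, the function names, and the intensity traces are all assumptions chosen for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long intensity traces."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat trace carries no elution-profile information
    return cov / (sd_x * sd_y)


def link_fragments(precursor_trace, fragment_traces, min_r=0.9):
    """Names of fragment traces that co-elute with the precursor trace."""
    return [name for name, trace in fragment_traces.items()
            if pearson(precursor_trace, trace) >= min_r]


# Invented intensities sampled at the same seven retention-time points.
precursor = [0, 2, 8, 10, 8, 2, 0]
fragments = {
    "fragment_a": [0, 1, 4, 5, 4, 1, 0],  # same peak shape, half the intensity
    "fragment_b": [5, 0, 1, 0, 2, 6, 0],  # unrelated signal
}
print(link_fragments(precursor, fragments))
```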

2.3.3.2 Selected range m/z DIA

An alternative instrumental setup for DIA is based on the isolation of consecutive, relatively small m/z isolation windows that collectively cover the complete precursor-ion mass range. The initial concept of this approach was introduced in 2004 [50]: sequential isolation and fragmentation of precursor windows of 10 m/z was performed in an ion-trap mass spectrometer until the desired mass range had been covered. Since then, other DIA methods have been established.

The precursor acquisition independent from ion count (PAcIFIC) mode relies on the acquisition of fragment ion spectra derived from all m/z channels (channel width of 2.5 m/z) across a precursor-ion mass range of 400–1400 m/z, which generally covers most tryptic peptides of a proteome. In each window, precursors are selected for fragmentation irrespective of whether a precursor ion is detected [52]. To cover the desired mass range at such small precursor selection window increments, samples need to be injected multiple times. Thus, PAcIFIC requires a long total acquisition time per sample, one that also depends on the scan speed of the mass spectrometer. Improvements in instrument scan rate could reduce the measurement time required for one sample from 4.3 to 2.5 days [51]. The software developed for PAcIFIC creates multiple data files in which information is filtered by the assignment of a unique precursor monoisotopic m/z to one or more multiplexed MS2 spectra. PAcIFIC increases the dynamic range of protein quantification and helps to thoroughly catalog a proteome [66]. It has thus already proved well suited to the discovery phase of biomarker projects and has been applied to plasma profiling in abdominal aortic aneurysms [67], urinary proteomes [68, 69], and drug resistance studies [70].

FT–all reaction monitoring features a DIA implementation in which all ions from a broad m/z window (100 m/z) are accumulated and subjected to various fragmentation modes, such as CID or infrared multiphoton dissociation. The generated fragments are then analyzed on an ultrahigh-resolution/high mass accuracy mass analyzer [71]. The data analysis of FT–all reaction monitoring is based on matching the multiplexed spectra against theoretical or empirical peptide fragmentation patterns and on the associated scoring system for these peptide matches. Even if the database is large, low false discovery rates (FDRs) can be achieved owing to the ultrahigh mass accuracy of the recorded spectra afforded by the FT analyzer.
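The reason PAcIFIC needs many injections per sample follows from simple arithmetic on the channel layout described above. The channel width and mass range are taken from the text; the figure of 15 m/z covered per injection is taken from Table 1. The function name and its return layout are assumptions of this sketch.

```python
import math


def pacific_plan(mz_start=400.0, mz_stop=1400.0, channel_width=2.5,
                 mz_per_injection=15.0):
    """Return (number of channels, channels per injection, injections)."""
    n_channels = round((mz_stop - mz_start) / channel_width)
    channels_per_injection = round(mz_per_injection / channel_width)
    n_injections = math.ceil(n_channels / channels_per_injection)
    return n_channels, channels_per_injection, n_injections


n_channels, per_injection, n_injections = pacific_plan()
print(n_channels, per_injection, n_injections)
```

With these inputs the plan comprises 400 channels acquired six at a time, which requires 67 injections per sample, consistent with the "× 67" entry in the duty cycle column of Table 1.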

In the last 2 years, significant instrument improvements were achieved on the hybrid quadrupole-Orbitrap mass analyzer (Q Exactive) and other Orbitrap-based platforms (Thermo Scientific) for DIA implementation. These improvements include advanced quadrupole technology (http://planetorbitrap.com/), which optimizes precursor selection and transmission and ensures consistent ion transmission during larger m/z window acquisitions, and a faster scanning rate [65], which accelerates DIA productivity. Accordingly, some novel DIA methods (presented below) are being established on Orbitrap-based mass spectrometers.

Multiplexed MS/MS for improved DIA was demonstrated on the Q-Exactive [47]: five separate 4 m/z windows were acquired together per spectrum (a total of 20 m/z per scan). The resultant spectra were subsequently demultiplexed with the Skyline software tool [72] into five separate 4 m/z windows, so that 100 windows of 4 m/z cover a mass range of 400 m/z (500–900 m/z), increasing sampling frequency and selectivity. Multiplexed MS/MS for improved DIA provides a good balance between selectivity, speed, and sensitivity.

pSMART [73] features another isolation strategy, in which the m/z range of 400–1000 is divided into 120 acquisition windows of 5 Da in each cycle, while each cycle is divided into several segments (4 s each). In every segment, the MS1 scan is acquired over the whole m/z range, while the consecutive MS2 scans are acquired only for the segmented m/z range. In a recent paper [74], the authors introduced asymmetric precursor isolation windows for the pSMART method, using 5 Da windows from 400 to 800 Da, 10 Da windows from 800 to 1000 Da, and 20 Da windows from 1000 to 1200 Da.

wiSIM [75] (normally performed on the Orbitrap Fusion) uses three high-resolution SIM scans with wide isolation windows (200 Da) to cover all precursor ions of 400–1000 m/z. In parallel with each SIM scan, 17 sequential ion-trap MS/MS scans with 12 Da isolation windows are acquired to cover the associated 200 Da SIM mass range. The resulting cycle time is less than 4 s. In both pSMART and wiSIM, peptide quantification relies on the high-resolution, high-accuracy MS1 scan, while identification is based on selective MS2 scans resulting from small isolation windows. Corresponding data analysis pipelines still need to be developed for these new DIA strategies.

Significantly, the DIA strategy termed MSALL was also developed for the QTOF mass spectrometer (TripleTOF 5600™ series, AB Sciex), where multiplexed spectra containing the fragments of all precursors within the full m/z range are acquired. SWATH-MS, a variation of the original MSALL DIA, uses sequential isolation windows (usually of 25 m/z, or smaller), acquiring the corresponding multiplexed MS2 spectra consecutively until the entire mass range (400–1200 m/z) is covered. SWATH offers more selectivity than MSALL owing to its relatively small Q1 windows, which simplifies the interpretation of MS/MS spectra. The details of the SWATH-MS workflow and its clinical applications are discussed in the next section.
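The window layouts quoted above for multiplexed MS/MS, pSMART, and wiSIM can be checked with a small helper that counts the windows of a (possibly asymmetric) scheme. All numbers are taken from the text; the function `count_windows` and its segment format are assumptions made for this sketch.

```python
def count_windows(segments):
    """Total number of isolation windows for a list of segments.

    Each segment is (start, stop, width) in integer m/z units; a segment
    whose range is not a whole multiple of its width is rejected.
    """
    total = 0
    for start, stop, width in segments:
        n, remainder = divmod(stop - start, width)
        if remainder:
            raise ValueError(f"{start}-{stop} is not divisible by {width}")
        total += n
    return total


# Multiplexed MS/MS: 4 m/z windows over 500-900, five windows per scan.
multiplexed_windows = count_windows([(500, 900, 4)])
multiplexed_scans_per_cycle = multiplexed_windows // 5

# pSMART: symmetric scheme, and the asymmetric scheme from the later paper.
psmart_symmetric = count_windows([(400, 1000, 5)])
psmart_asymmetric = count_windows(
    [(400, 800, 5), (800, 1000, 10), (1000, 1200, 20)])

# wiSIM: three 200 Da SIM scans, each paired with 17 MS/MS scans of 12 Da.
wisim_msms_scans = 3 * 17
wisim_msms_coverage = 17 * 12  # Da covered by the MS/MS scans per SIM scan

print(multiplexed_windows, multiplexed_scans_per_cycle)
print(psmart_symmetric, psmart_asymmetric)
print(wisim_msms_scans, wisim_msms_coverage)
```

The asymmetric pSMART layout needs fewer windows than the symmetric one while extending the covered range to 1200, and the 17 ion-trap scans of 12 Da slightly over-cover each 200 Da SIM range.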
Table 1 presents an outline of the emerging dataindependent strategies with an overview of their MS setup characteristics and following data processing, together with literature examples of their clinical application studies.

3

SWATH-MS based workflow for biomarker discovery

3.1 An overview of SWATH-MS in clinical-related studies Based on the instrumental principle mentioned above, in SWATH-MS, the data are acquired by the mass spectrometer recursively cycling through precursor-ion selection windows (swathes) that collectively cover the whole precursor-ion mass range (Fig. 1D). The data from the sequential swathes are then compiled in a computer into a single file termed SWATH map [13]. Each precursor-ion selection window contains three-dimensional information for a given peptide, that is, retention time, fragment ion m/z, and intensity. The SWATH map therefore derives a permanent, complete digital record of fragment ion spectra of all peptides detectable www.clinical.proteomics-journal.com

 C 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

Table 1. A summary of emerging DIA modes

| Method | MS analyzer type and vendors | Reference | MS1 isolation strategy | Duty cycle to cover m/z range | Examples of supporting software | Currently published clinical-related studies |
| --- | --- | --- | --- | --- | --- | --- |
| Full m/z range DIA | | | | | | |
| MSE | Q-TOF (Waters) | [55] | Full m/z range (300–2000 m/z) | 2 s | MSE accounting software | [58–61] |
| All-ion fragmentation | Orbitrap (Thermo Scientific) | [64] | Full m/z range (300–1600 m/z) | 2 s | MaxQuant | – |
| Shotgun CID (CID in-source) | Q-TOF (Applied Biosystems) | [49] | Full m/z range (100–2000 m/z) | 2 s | Manually creating data files for SEQUEST | – |
| Selected range m/z DIA | | | | | | |
| Original DIA | Ion-Trap (Thermo Scientific) | [50] | 10 m/z (400–1400 m/z) | 35 s | RelEx | – |
| Extended data-independent acquisition (XDIA) | Ion-Trap-Orbitrap (Thermo Scientific) | [54] | 20 m/z, 400–1000 m/z per inj. | (Not disclosed) | XDIA processor | – |
| Precursor acquisition independent from ion count (PAcIFIC) | Ion-Trap-Orbitrap (Thermo Scientific) | [52] | 2.5 m/z, covering 15 m/z per inj. (400–1400 m/z) | 3 s × 67 | Software PAcIFIC (purposely developed) | [67–70] |
| FT–all reaction monitoring (FT-ARM) | Ion-Trap-FT, Ion-Trap-Orbitrap (Thermo Scientific) | [71] | 100 m/z, 500 m/z per inj. (500–1500 m/z) | 5.45 s × 2 | FT-ARM Programme (C++) | – |
| Multiplexed MS/MS | Quadrupole-Orbitrap/Q-Exactive (Thermo Scientific) | [47] | 5 × 4 m/z, 500–900 m/z per inj. | 3.5 s | Skyline (demultiplexing) | – |
| pSMART | Quadrupole-Orbitrap/Q-Exactive (Thermo Scientific) | [73], [74]a | 5 m/z, 400–1000 m/z per inj.; 120 (by four segments)a | 20 s | Pinpoint | – |
| wiSIM | Quadrupole-Orbitrap-Ion-Trap/Orbitrap Fusion (Thermo Scientific) | [74] | 12 m/z, 400–1000 m/z per inj.; 51 (by three segments)a | 3.6 s | Pinpoint | – |
| MSALL with sequential windowed acquisition of all theoretical mass spectra (SWATH) | Q-TOF/TripleTOF 5600, 6600 (AB Sciex) | [48] | 25 m/z or variable m/z (based on precursor density), 400–1200 m/z per inj. | 3.2 s | OpenSWATH; Skyline, Spectronaut, PeakView | [76–78] |

a) pSMART data acquisition method with asymmetric precursor isolation windows.


Figure 2. Schematic representation of the integrated SWATH-MS workflow in clinical application, which includes SWATH data acquisition, library assay generation by shotgun MS, and data processing (targeted data extraction; identification, quantification, and statistical analysis).

from a (clinical) sample [13]. In essence, SWATH-MS offers SRM-like reproducibility and quantification accuracy at a shotgun-like analyte throughput, all favorable features for clinical applications such as biomarker discovery. The SWATH data sets of clinical samples can be reanalyzed in silico indefinitely, since all analyte ions entering the instrument are acquired and recorded in the form of a digital map of the sample [13, 48]. The great benefit is thus that each clinical sample needs to be acquired only once. Because a SWATH map generated from a clinical sample is intended to reflect the individual’s health status, SWATH is increasingly applied in systems studies [79] and in clinical-related studies, which are usually designed as comparative studies. SWATH-MS can be implemented in a concise workflow. Below, we discuss the elements of the integrated SWATH-MS workflow applied for biomarker discovery, which includes experimental design, sample preparation, data acquisition on the TripleTOF systems, library assay generation, and high-throughput “data extraction” across large cohort groups (Fig. 2).
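As a minimal sketch of what in silico re-interrogation of a recorded SWATH map can look like, the following Python snippet extracts a fragment ion chromatogram from one swath. The data layout (a list of retention time, m/z array, intensity array tuples), the function name `extract_xic`, and the 50 ppm tolerance are illustrative assumptions, not part of any SWATH software.

```python
import numpy as np

def extract_xic(scans, target_mz, tol_ppm=50.0):
    """Return (rt, intensity) arrays for one fragment ion in one swath.

    scans: list of (rt_seconds, mz_array, intensity_array) tuples
           (hypothetical in-memory layout of a single swath).
    """
    tol = target_mz * tol_ppm * 1e-6  # absolute m/z tolerance
    rts, xic = [], []
    for rt, mz, inten in scans:
        mz = np.asarray(mz, dtype=float)
        inten = np.asarray(inten, dtype=float)
        mask = np.abs(mz - target_mz) <= tol
        rts.append(rt)
        xic.append(inten[mask].sum())  # 0.0 if no peak falls in the tolerance
    return np.array(rts), np.array(xic)

# Toy swath with three fragment ion scans (made-up values)
scans = [
    (10.0, [500.25, 620.30], [100.0, 5.0]),
    (13.3, [500.26, 620.31], [400.0, 7.0]),
    (16.6, [500.25, 700.40], [150.0, 9.0]),
]
rt, xic = extract_xic(scans, target_mz=500.25)
```

Because the map is stored once, the same function could be called again later with the fragment m/z values of a newly hypothesized marker, without re-measuring the sample.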


3.2 Techniques of sample processing prior to SWATH-MS

3.2.1 General considerations

With the high reproducibility provided by SWATH-MS, a comprehensive proteomic survey of a large sample cohort becomes possible, which makes handling large numbers of clinical samples in a parallel, high-throughput manner extremely important in experimental design. This can reasonably be achieved by using different levels of quality control (protein- and peptide-level quantitative standards, retention time calibrant peptides, repeated injection of standard samples, etc.), as well as by using automated robotic systems or multiwell plates for extraction of the proteomes of interest from patient tissue or body fluids. For example, a pooled mix of non-naturally occurring synthetic peptides is frequently added to all samples prior to MS injection, serving as an alignment tool to correct the retention time variation between LC-MS runs [80]. Some sample preparation techniques of potential interest in biomarker studies using tissue or body fluids are briefly discussed in the following subsections.
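The retention time correction with spiked-in calibrant peptides can be illustrated with a simple linear fit. This is a sketch under the assumption that a straight line adequately maps observed to reference retention times; the function names and all numerical values are hypothetical and do not reproduce the procedure of [80].

```python
import numpy as np

def fit_rt_calibration(reference_rt, observed_rt):
    """Fit reference = slope * observed + intercept from calibrant peptides."""
    slope, intercept = np.polyfit(observed_rt, reference_rt, deg=1)
    return slope, intercept

def align_rt(observed_rt, slope, intercept):
    """Map observed retention times of a run onto the reference scale."""
    return slope * np.asarray(observed_rt, dtype=float) + intercept

# Hypothetical calibrant peptides: reference (normalized) vs. observed (min)
reference = np.array([0.0, 25.0, 50.0, 75.0, 100.0])
observed = np.array([12.0, 22.0, 32.0, 42.0, 52.0])

slope, intercept = fit_rt_calibration(reference, observed)
aligned = align_rt([27.0], slope, intercept)  # an analyte eluting at 27 min
```

Fitting one such calibration per LC-MS run places all runs of a cohort on a common retention time scale; non-linear gradients may call for a more flexible model.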

3.2.2 Plasma samples

Blood plasma is among the most desirable sample types for biomarker studies since it can be obtained by a simple, minimally invasive procedure, but it is an extremely complex matrix whose dynamic range of protein concentrations exceeds ten orders of magnitude [18]. To reduce plasma background complexity, samples are frequently depleted of high-abundance proteins before MS measurement. A recent study demonstrated several steps in plasma processing, such as depletion, fractionation, and isolation of specific protein sets [81]. Combined with sensitive MS measurement, plasma fractionation was reported to achieve reproducible quantification of proteins at concentrations in the 50–100 pg/mL range [38]. Among these methods, we and others have used SPE based on hydrazide chemistry (as described in previous studies [38, 82, 83]) to enrich N-linked glycoproteins and increase the detection sensitivity for plasma or tissue proteins. In theory, plasma should contain all the proteins released or secreted from tissue. Frequently, these are glycoproteins [84], an information-rich subproteome of blood plasma that nowadays makes up the majority of validated biomarkers applied in clinical diagnostics [85]. The glycoprotein enrichment step has a CV generally below 20%, comparable to the analogous values from repeat LC-MS analysis of identical samples [86, 87]. In large-scale experiments, glycoprotein extraction is performed on multichannel well plates, and reference glycoproteins are spiked into the plasma samples to check for intraexperimental variation. SWATH-MS combined with N-glycoproteome enrichment is a promising, integrative proteomic platform for biomarker discovery and verification in plasma [76]. Specifically, by using dilution series of isotopically labeled heavy peptides representing biomarker candidates, we determined by targeted data analysis the LOQ of SWATH-MS to be approximately 0.0456 fmol at the peptide level, which translates to a concentration of 5–10 ng protein/mL in plasma. Besides glycopeptide enrichment, phosphopeptide enrichment has also been tested in combination with SWATH-MS [88].
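How an LOQ might be read off a dilution series can be sketched as follows. The acceptance criteria used here (CV of at most 20% and mean deviation from the spiked amount of at most 20%) are a common convention adopted as an assumption; they are not stated in the text, and the replicate values are invented and unrelated to the data behind the 0.0456 fmol figure.

```python
import numpy as np

def estimate_loq(levels, max_cv=0.20, max_bias=0.20):
    """Return the lowest spiked amount meeting precision and accuracy limits.

    levels: dict mapping spiked amount (fmol) -> replicate back-calculated
            amounts (fmol). Returns None if no level passes.
    """
    for amount in sorted(levels):
        measured = np.asarray(levels[amount], dtype=float)
        cv = measured.std(ddof=1) / measured.mean()        # precision
        bias = abs(measured.mean() - amount) / amount      # accuracy
        if cv <= max_cv and bias <= max_bias:
            return amount
    return None

# Hypothetical heavy-peptide dilution series, three replicates per level
dilution = {
    0.01: [0.004, 0.019, 0.012],
    0.05: [0.046, 0.052, 0.049],
    0.25: [0.24, 0.26, 0.25],
}
loq = estimate_loq(dilution)
```

In this toy series the lowest level is too variable, so the second level is returned. A stricter variant would additionally require every higher level to pass.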

3.2.3 Tissue samples

Tissue samples are usually very rich in lipids, which can greatly reduce the robustness of an analysis and create variability between tissue sample measurements. Tissue samples therefore require prolonged sample-processing times before injection into LC-MS/MS. Several solutions are available to achieve rapid enzymatic digestion. Pressure cycling technology, for example, applies cycles of ambient and ultrahigh hydrostatic pressure to induce cell lysis and enable separation of the solubilized proteins from the lipid bilayer. Coupled with SWATH-MS measurement, this should prove a quick, elegant way of processing tissue samples in clinical studies [89]. Like blood plasma, tissue samples can be enriched for glycoproteins to specifically study the membrane-associated or secreted subproteomes, which are significantly enriched in glycoproteins. Following N-glycoproteome enrichment, samples are analyzed by SWATH-MS in the manner described above. It was glycoproteins enriched from tissue samples and processed by SWATH-MS that led to the discovery of protein biomarkers indicating the aggressiveness of prostate cancer [77].

3.2.4 Affinity purified protein samples

Affinity purification in combination with SWATH MS (AP-SWATH MS) was recently applied [90, 91]. The 14-3-3β scaffold protein interactome was obtained from samples by single-step affinity purification on streptavidin beads. The consistent, reproducible quantification of a large set of proteins (e.g., >1500) across different samples or time points makes it possible to obtain reliable information about specific dynamic changes in protein interaction networks after stimulation of the insulin-PI3K-AKT pathway. The widely studied 14-3-3β interactome is one of the most extensively networked of all protein complexes in the cell. The ability of AP-SWATH MS to reproducibly quantify dynamic changes in protein interaction complexes is particularly important for network biology because it supports the detection of interaction network reorganization and rewiring in response to perturbations [92].

3.3 Instrumental setup for SWATH data acquisition

In a prototype SWATH-MS experiment on a 5600 TripleTOF mass spectrometer, a set of 32 sequential windows is used (with 1 Da overlap between adjacent windows), each with an isolation width of 25 Da, to cover the mass range 400–1200 Da. The precursor isolation window setup is: [400–426], [425–451], . . . [1175–1201]. An accumulation time of 100 ms is used for each fragment ion scan and for the survey scan acquired at the beginning of each cycle, resulting in a total cycle time of 3.3 s. Peptide separation prior to SWATH acquisition is performed on a standard nanoLC system at a flow rate of 300 nL/min with a linear gradient of 2–35% organic phase (98% ACN and 0.1% formic acid). Recently, a variant of the SWATH method was developed that uses variable m/z windows for ion isolation [93]. The window sizes are predefined according to the precursor density observed in the survey scans from either shotgun or SWATH acquisition of the same sample. Narrower m/z windows are applied to precursor-dense regions, wider windows to precursor-sparse segments.
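The window scheme and cycle time quoted above can be reproduced with a few lines of Python. The fixed-window part follows the numbers in the text; the variable-window function is only one possible way to make windows narrower in precursor-dense regions (equal precursor counts per window) and is an assumption, not the method of [93]. All function names are hypothetical.

```python
import numpy as np

def fixed_swath_windows(start=400.0, step=25.0, overlap=1.0, n_windows=32):
    """Sequential windows of `step` Da, each extended by `overlap` Da."""
    return [(start + i * step, start + (i + 1) * step + overlap)
            for i in range(n_windows)]

def cycle_time_s(n_windows, accumulation_ms=100.0, n_survey_scans=1):
    """Cycle time = (fragment ion scans + survey scan) x accumulation time."""
    return (n_windows + n_survey_scans) * accumulation_ms / 1000.0

def variable_windows(precursor_mz, n_windows, lo=400.0, hi=1200.0):
    """Assumed scheme: window edges at quantiles of the precursor m/z values,
    so each window holds roughly the same number of precursors."""
    mz = np.asarray(precursor_mz, dtype=float)
    mz = mz[(mz >= lo) & (mz <= hi)]
    edges = np.quantile(mz, np.linspace(0.0, 1.0, n_windows + 1))
    edges[0], edges[-1] = lo, hi
    return list(zip(edges[:-1], edges[1:]))

windows = fixed_swath_windows()
cycle = cycle_time_s(len(windows))  # (32 + 1) x 100 ms

# Toy precursor list: dense below 600 m/z, sparse above (made-up values)
precursors = np.concatenate([np.linspace(400, 599, 90),
                             np.linspace(600, 1200, 10)])
var_windows = variable_windows(precursors, n_windows=4)
```

With the default arguments the first and last fixed windows are 400–426 and 1175–1201, and the cycle time evaluates to 3.3 s, matching the prototype setup.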


3.4 Library assay generation for SWATH-MS

An important step in successful SWATH data analysis is the generation of appropriate spectral libraries containing the MS coordinates used for targeted data extraction of the proteins of interest. The mass spectrometric reference maps of the proteome or subproteome of interest contain collections of fragment ion spectra of peptides (spectral or assay libraries) corresponding to predicted protein sequences based on the genome [17, 94]. The spectral libraries are usually generated by deep shotgun sequencing of (i) natural peptides extracted from suitable clinical samples to be used for SWATH measurements; or (ii) synthetic peptides, to obtain high-quality spectra for proteins that are difficult to detect in the natural sample [13]. For example, to analyze N-glycoproteins by SWATH-MS, the assay libraries were generated by shotgun sequencing of natural enriched glycopeptides from suitable samples and of synthetic glycopeptides [77]. To increase proteome coverage and obtain high-quality spectra, a different fractionation or enrichment strategy can sometimes be applied, or the samples can be injected several times in shotgun mode. It is, however, important to note that the instrument type used to generate spectral libraries in DDA mode should preferably be the same as that used for the SWATH experiment; a high similarity between the reference spectra and the spectra generated in SWATH acquisition enhances the specificity of the data analysis procedure [95]. Moreover, consensus spectra, which are favorable [95], can now be constructed automatically by merging all valid peptide spectrum matches of the same peptide, filtered at 1% FDR, using algorithms such as SpectraST [96], to finally obtain high-quality reference maps. Once generated in the libraries, the assays, if of high quality, are easily transferable.
One repository of the peptide library assays, or preliminary reference map of >10 000 human proteins, was recently completed, and is publicly accessible for SWATH-MS analyses in human samples [97].
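The idea of merging replicate peptide spectrum matches into one consensus spectrum can be sketched as below. This is a deliberately crude illustration (fixed-width m/z binning, a majority vote, and averaging of base-peak-normalized intensities); it is not the SpectraST algorithm, and the function name, bin width, and threshold are assumptions.

```python
from collections import defaultdict

def consensus_spectrum(replicates, bin_width=0.05, min_fraction=0.6):
    """Merge replicate spectra of one peptide into a consensus spectrum.

    replicates: list of spectra, each a list of (mz, intensity) pairs.
    A peak is kept if it occurs in at least `min_fraction` of the replicates.
    """
    bins = defaultdict(list)
    for spectrum in replicates:
        base = max(inten for _, inten in spectrum)  # base-peak normalization
        seen = {}
        for mz, inten in spectrum:
            key = round(mz / bin_width)
            # keep only the most intense peak per bin within one spectrum
            if key not in seen or inten / base > seen[key][1]:
                seen[key] = (mz, inten / base)
        for key, peak in seen.items():
            bins[key].append(peak)
    consensus = []
    for peaks in bins.values():
        if len(peaks) / len(replicates) >= min_fraction:
            mean_mz = sum(p[0] for p in peaks) / len(peaks)
            mean_int = sum(p[1] for p in peaks) / len(peaks)
            consensus.append((mean_mz, mean_int))
    return sorted(consensus)

# Three hypothetical replicate spectra; 333.30 is noise seen only once
replicates = [
    [(175.10, 50.0), (333.30, 2.0), (500.30, 100.0)],
    [(175.11, 60.0), (500.31, 120.0)],
    [(175.10, 40.0), (500.30, 80.0)],
]
consensus = consensus_spectrum(replicates)
```

Peaks that recur across replicates survive with averaged relative intensities, while sporadic noise peaks are dropped, which is the property that makes consensus spectra attractive as library entries.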

3.5 Targeted data extraction in SWATH maps

For confident targeted data extraction and quantification of SWATH data, we use an automated open-source software, OpenSWATH [98], which is based on OpenMS [99] (tutorial available from http://www.openswath.org) and implemented on iPortal [100] with a user-friendly interface. First, SWATH data .wiff files are converted to mzXML format (e.g., by ProteoWizard [101] with an optimized peak picking/centroiding step) as the input format for OpenSWATH. Then, the targeted data analysis is executed by automatic peak group extraction from the SWATH maps combined with a decoy-based scoring system implemented in pyProphet, which is optimized for SWATH data analysis and is a reimplementation of the mProphet scoring algorithm developed for SRM analysis [37]. OpenSWATH identifies the peak groups from the SWATH maps at a user-defined FDR and aligns them between SWATH maps from different samples based on the clustering behavior of the retention times in each run [98]. The quantitative information for unique peptides is reported concurrently by OpenSWATH. Besides OpenSWATH, SWATH data can be processed with the commercial software package provided by the vendor (PeakView, AB Sciex); with Skyline [72], which visualizes the extracted ion peaks and thus makes manual analysis possible; or with Spectronaut [102], which provides automated, rapid data analysis. OpenSWATH and Skyline are free for academic use; Spectronaut requires a commercial set of peptide standards to be spiked into the sample for retention time alignment.
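The principle behind a decoy-based FDR cutoff can be sketched in a few lines. The snippet below uses the simple estimate FDR = decoys / targets above a score threshold and converts it to q-values; it is a simplified illustration, not the pyProphet or mProphet implementation, and the scores are invented.

```python
import numpy as np

def decoy_qvalues(scores, is_decoy):
    """Estimate q-values for peak groups from target and decoy scores."""
    scores = np.asarray(scores, dtype=float)
    is_decoy = np.asarray(is_decoy, dtype=bool)
    order = np.argsort(-scores)                  # best score first
    decoys = np.cumsum(is_decoy[order])
    targets = np.cumsum(~is_decoy[order])
    fdr = decoys / np.maximum(targets, 1)
    # q-value: smallest FDR at which this peak group is still accepted
    q_sorted = np.minimum.accumulate(fdr[::-1])[::-1]
    q = np.empty_like(q_sorted)
    q[order] = q_sorted
    return q

# Hypothetical discriminant scores for five peak groups
scores = [5.0, 4.0, 3.0, 2.0, 1.0]
is_decoy = [False, False, True, False, True]
q = decoy_qvalues(scores, is_decoy)

# Targets passing a user-defined 1% FDR cutoff
accepted = [i for i in range(len(scores))
            if not is_decoy[i] and q[i] <= 0.01]
```

In this toy example only the two top-scoring target peak groups pass the 1% cutoff; the third target inherits a q-value of one third because a decoy outscores it.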

3.6 Statistical data analysis and visualization

OpenSWATH uses the sum of the integrated chromatographic fragment ion peak areas of SWATH-MS2 data to quantify peptides. The peptide peak areas so obtained are then processed to infer protein abundance. The absolute label-free quantification package [103], available for R, can be used for further SWATH data summarization and automated absolute label-free protein abundance estimation. MSstats [http://www.msstats.org/], developed by the Vitek group, is an open-source, R-based package for statistical relative quantification of peptides and proteins in MS-based proteomic experiments. MSstats statistical analysis, based on a family of linear mixed-effects models [104], enables three types of analysis: (i) data processing and visualization (i.e., data normalization and transformation, and quality control); (ii) testing for differential protein abundance between conditions and estimation of protein abundance in individual biological samples or conditions on a relative scale; and (iii) model-based calculation of the sample size for a future experiment. Most of the statistical functions provided by MSstats are now applicable to DIA or SWATH data, depending on the experimental design.
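The two summarization steps described above can be sketched as follows. Peptide intensity is taken as the sum of fragment ion peak areas, as stated in the text; summing the most intense peptides per protein (top three here) is an assumption chosen for illustration and is neither the rule of the R package cited as [103] nor the mixed-effects modeling of MSstats. Names and values are hypothetical.

```python
import math
from collections import defaultdict

def peptide_intensities(fragment_areas):
    """Sum fragment ion peak areas per peptide.

    fragment_areas: dict (protein, peptide) -> list of fragment peak areas.
    """
    return {key: sum(areas) for key, areas in fragment_areas.items()}

def protein_intensities(peptides, top_n=3):
    """Assumed rule: sum the top_n most intense peptides of each protein."""
    by_protein = defaultdict(list)
    for (protein, _peptide), value in peptides.items():
        by_protein[protein].append(value)
    return {protein: sum(sorted(values, reverse=True)[:top_n])
            for protein, values in by_protein.items()}

# Hypothetical fragment peak areas for one protein in two samples
run_a = {("P1", "PEPA"): [100.0, 200.0, 300.0], ("P1", "PEPB"): [50.0, 50.0]}
run_b = {("P1", "PEPA"): [200.0, 400.0, 600.0], ("P1", "PEPB"): [100.0, 100.0]}

prot_a = protein_intensities(peptide_intensities(run_a))
prot_b = protein_intensities(peptide_intensities(run_b))
log2_fc = math.log2(prot_b["P1"] / prot_a["P1"])
```

A simple ratio like this ignores replicate structure and missing values, which is where model-based tools such as MSstats come in.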

3.7 Performance of SWATH-MS applied in clinical fields

Once the proteome of a clinical sample has been recorded by SWATH-MS, targeted data extraction of thousands of features can be applied to detect and quantify specific sets of proteins expected to be associated with a specific clinical question. The main advantage of SWATH-MS over SRM measurement is that a much larger number of proteins can be reproducibly quantified in a single experiment. In a single human cell line such as HeLa or U2OS, >2000–3000 proteins (>10 000–15 000 peptides) could be detected in one SWATH injection with the protein FDR cutoff set to 1% [97]. In a complex biological background such as a yeast tryptic digest, SRM measurements were found to be tenfold more sensitive than SWATH. In yeast samples, the linearity of SWATH measurement was found to span almost four orders of magnitude, the LOD to be in the amol range, and the precision to be 13.7–14.9% [48]. SWATH measurements of N-linked glycoproteins in human plasma showed quantitative reproducibility similar to that of SRM experiments, with a threefold lower sensitivity than SRM [76]. Similar quantitative performance of SWATH and SRM measurements was observed for phosphopeptides isolated from human plasma [88]. SWATH technology has already been applied in comparative biomarker studies of prostate cancer aggressiveness [77], biomarker discovery related to lung carcinoma [78], and prospective plasma studies [76, 88]. SWATH-MS was recently combined with multiple protease digestion to detect ErbB2 protein, a breast cancer biomarker [105]. SWATH-MS was also applied in the study of the 14-3-3β scaffold protein interactome, which is related to cancer biology [90].

4

Summary and outlook

MS-based measurements of small molecules were introduced to clinical laboratories years ago [106]. Examples include their application in therapeutic drug monitoring, forensic toxicological screening, and doping control [107, 108]. Unlike the clinical assessment of small metabolites, MS-based proteomic analysis for clinical assessment has progressed slowly despite an urgent need for therapeutic applications [14, 109] and protein biomarker-based diagnostics [85, 110]. The MS-based biomarker pipeline generally consists of a discovery, a verification, and a validation phase [7]. Currently, the lack of a uniform pipeline connecting marker discovery with well-established methods of validation is seen as a gap that hinders the efficient integration of MS-based proteomics into biomarker-related clinical applications [23, 42]. The targeted approach, based on SRM, presents itself as an ideal tool for validation because of its reproducibility, accuracy, and high sample throughput in protein measurement across multiple laboratories [41]. However, owing to its limited analyte throughput, not all potential biomarker candidates emerging from the discovery phase can be measured in the targeted verification phase, and assay optimization for selected markers is time consuming. The DIA modes, with their unbiased MS/MS data acquisition, normally result in increased sensitivity, dynamic range, and proteome coverage in complex protein samples compared with DDA. At present, the various DIA methods differ in data collection, data analysis, and the type of instrument on which they are implemented. In particular, the reproducible DIA technologies in the latest MS instruments can benefit clinical biomarker applications by enabling reliable protein measurements across complex clinical samples that may undergo differential changes and biological responses depending on the environment, tissue type, or disease state [111].
Although DIA assays have already been generated on a global proteomic scale [97], we suggest that spectral libraries specific to certain clinical questions or biomarker sets be generated if the protein assay is not available, if the biomarkers are present in PTM or mutated forms [19], or if the study focus is related to clinical peptidomics, such as neuropeptidomics or MHC peptidomics. Further DIA approaches with robust proteomic pipelines and integrated data analysis strategies are expected to mature soon, to a level that naturally fills the gap between the biomarker discovery and validation phases. Forthcoming DIA strategies will find wide application in new discovery paradigms for molecular drug targets and biomarker studies, and in novel research areas such as functional proteomics and computational epigenetics [112]. These technologies, combined with the features discussed in this review, will hopefully allow biomarker discovery to take up the challenges that have been facing it for decades.

We thank Ching Chiek Koh (ETH Zurich) for proofreading the manuscript, and Christina Ludwig (ETH Zurich) for sharing her ideas during the generation of Fig. 1. Many insights presented herein are derived from internal discussions in the Aebersold group. This work is supported by the Swiss National Science Foundation (31003A_130530).

The authors have declared no conflict of interest.

5

References

[1] Kitano, H., Computational systems biology. Nature 2002, 420, 206–210.
[2] Kitano, H., Systems biology: a brief overview. Science 2002, 295, 1662–1664.
[3] Lamond, A. I., Uhlen, M., Horning, S., Makarov, A. et al., Advancing cell biology through proteomics in space and time (PROSPECTS). Mol. Cell. Proteomics 2012, 11, O112 017731.
[4] Polanski, M., Anderson, N. L., A list of candidate cancer biomarkers for targeted proteomics. Biomark. Insights 2007, 1, 1–48.
[5] Haug, U., Rothenbacher, D., Wente, M. N., Seiler, C. M. et al., Tumour M2-PK as a stool marker for colorectal cancer: comparative analysis in a large sample of unselected older adults vs colorectal cancer patients. Br. J. Cancer 2007, 96, 1329–1334.
[6] Shackleton, M., Quintana, E., Fearon, E. R., Morrison, S. J., Heterogeneity in cancer: cancer stem cells versus clonal evolution. Cell 2009, 138, 822–829.
[7] Parker, C. E., Borchers, C. H., Mass spectrometry based biomarker discovery, verification, and validation—quality assurance and control of protein biomarker assays. Mol. Oncol. 2014, 8, 840–858.
[8] Kulasingam, V., Pavlou, M. P., Diamandis, E. P., Integrating high-throughput technologies in the quest for effective biomarkers for ovarian cancer. Nat. Rev. Cancer 2010, 10, 371–378.
[9] Nilsson, T., Mann, M., Aebersold, R., Yates, J. R., 3rd et al., Mass spectrometry in high-throughput proteomics: ready for the big time. Nat. Methods 2010, 7, 681–685.
[10] Weston, A. D., Hood, L., Systems biology, proteomics, and the future of health care: toward predictive, preventative, and personalized medicine. J. Proteome Res. 2004, 3, 179–196.
[11] Hamburg, M. A., Collins, F. S., The path to personalized medicine. N. Engl. J. Med. 2010, 363, 301–304.
[12] Ahrens, C. H., Brunner, E., Qeli, E., Basler, K., Aebersold, R., Generating and navigating proteome maps using mass spectrometry. Nat. Rev. Mol. Cell Biol. 2010, 11, 789–801.
[13] Liu, Y., Huttenhain, R., Collins, B., Aebersold, R., Mass spectrometric protein maps for biomarker discovery and clinical research. Expert Rev. Mol. Diagn. 2013, 13, 811–825.
[14] Rask-Andersen, M., Almen, M. S., Schioth, H. B., Trends in the exploitation of novel drug targets. Nat. Rev. Drug Discov. 2011, 10, 579–590.
[15] Parker, C. E., Pearson, T. W., Anderson, N. L., Borchers, C. H., Mass-spectrometry-based clinical proteomics—a review and prospective. Analyst 2010, 135, 1830–1838.
[16] Sawyers, C. L., The cancer biomarker problem. Nature 2008, 452, 548–552.
[17] Huttenhain, R., Soste, M., Selevsek, N., Rost, H. et al., Reproducible quantification of cancer-associated proteins in body fluids using targeted proteomics. Sci. Transl. Med. 2012, 4, 142ra194.
[18] Anderson, N. L., The clinical plasma proteome: a survey of clinical assays for proteins in plasma and serum. Clin. Chem. 2010, 56, 177–185.
[19] Zhang, B., Wang, J., Wang, X., Zhu, J. et al., Proteogenomic characterization of human colon and rectal cancer. Nature 2014, 513, 382–387.
[20] Poste, G., Bring on the biomarkers. Nature 2011, 469, 156–157.
[21] Kim, K., Kim, Y., Preparing multiple-reaction monitoring for quantitative clinical proteomics. Expert Rev. Proteomics 2009, 6, 225–229.
[22] Palmblad, M., Tiss, A., Cramer, R., Mass spectrometry in clinical proteomics—from the present to the future. Proteomics Clin. Appl. 2009, 3, 6–17.
[23] Paulovich, A. G., Whiteaker, J. R., Hoofnagle, A. N., Wang, P., The interface between biomarker discovery and clinical validation: the tar pit of the protein biomarker pipeline. Proteomics Clin. Appl. 2008, 2, 1386–1402.
[24] Law, K. P., Lim, Y. P., Recent advances in mass spectrometry: data independent analysis and hyper reaction monitoring. Expert Rev. Proteomics 2013, 10, 551–566.
[25] Kellie, J. F., Tran, J. C., Lee, J. E., Ahlf, D. R. et al., The emerging process of top down mass spectrometry for protein analysis: biomarkers, protein-therapeutics, and achieving high throughput. Mol. Biosyst. 2010, 6, 1532–1539.
[26] Aebersold, R., Mann, M., Mass spectrometry-based proteomics. Nature 2003, 422, 198–207.
[27] Domon, B., Aebersold, R., Options and considerations when selecting a quantitative proteomics strategy. Nat. Biotechnol. 2010, 28, 710–721.
[28] McDonald, W. H., Yates, J. R., 3rd, Shotgun proteomics and biomarker discovery. Dis. Markers 2002, 18, 99–105.
[29] Delahunty, C. M., Yates, J. R., 3rd, MudPIT: multidimensional protein identification technology. Biotechniques 2007, 43, 563, 565, 567 passim.
[30] Kim, M. S., Pinto, S. M., Getnet, D., Nirujogi, R. S. et al., A draft map of the human proteome. Nature 2014, 509, 575–581.
[31] Wilhelm, M., Schlegl, J., Hahne, H., Moghaddas Gholami, A. et al., Mass-spectrometry-based draft of the human proteome. Nature 2014, 509, 582–587.
[32] Andrews, G. L., Dean, R. A., Hawkridge, A. M., Muddiman, D. C., Improving proteome coverage on a LTQ-Orbitrap using design of experiments. J. Am. Soc. Mass Spectrom. 2011, 22, 773–783.
[33] Andrews, G. L., Simons, B. L., Young, J. B., Hawkridge, A. M., Muddiman, D. C., Performance characteristics of a new hybrid quadrupole time-of-flight tandem mass spectrometer (TripleTOF 5600). Anal. Chem. 2011, 83, 5442–5446.
[34] Liu, H., Sadygov, R. G., Yates, J. R., 3rd, A model for random sampling and estimation of relative protein abundance in shotgun proteomics. Anal. Chem. 2004, 76, 4193–4201.
[35] Michalski, A., Cox, J., Mann, M., More than 100,000 detectable peptide species elute in single shotgun proteomics runs but the majority is inaccessible to data-dependent LC-MS/MS. J. Proteome Res. 2011, 10, 1785–1793.
[36] Picotti, P., Aebersold, R., Selected reaction monitoring-based proteomics: workflows, potential, pitfalls and future directions. Nat. Methods 2012, 9, 555–566.
[37] Reiter, L., Rinner, O., Picotti, P., Huttenhain, R. et al., mProphet: automated data processing and statistical validation for large-scale SRM experiments. Nat. Methods 2011, 8, 430–435.
[38] Cima, I., Schiess, R., Wild, P., Kaelin, M. et al., Cancer genetics-guided discovery of serum biomarker signatures for diagnosis and prognosis of prostate cancer. Proc. Natl. Acad. Sci. USA 2011, 108, 3342–3347.
[39] Surinova, S., Huttenhain, R., Chang, C. Y., Espona, L. et al., Automated selected reaction monitoring data analysis workflow for large-scale targeted proteomic studies. Nat. Protoc. 2013, 8, 1602–1619.
[40] Lange, V., Picotti, P., Domon, B., Aebersold, R., Selected reaction monitoring for quantitative proteomics: a tutorial. Mol. Syst. Biol. 2008, 4, 222.
[41] Addona, T. A., Abbatiello, S. E., Schilling, B., Skates, S. J. et al., Multi-site assessment of the precision and reproducibility of multiple reaction monitoring-based measurements of proteins in plasma. Nat. Biotechnol. 2009, 27, 633–641.
[42] Surinova, S., Schiess, R., Huttenhain, R., Cerciello, F. et al., On the development of plasma protein biomarkers. J. Proteome Res. 2011, 10, 5–16.


[43] Peterson, A. C., Russell, J. D., Bailey, D. J., Westphall, M. S., Coon, J. J., Parallel reaction monitoring for high resolution and high mass accuracy quantitative, targeted proteomics. Mol. Cell. Proteomics 2012, 11, 1475–1488.

[58] Wang, L., Lockstone, H. E., Guest, P. C., Levin, Y. et al., Expression profiling of fibroblasts identifies cell cycle abnormalities in schizophrenia. J. Proteome Res. 2010, 9, 521– 527.

[44] Elias, J. E., Haas, W., Faherty, B. K., Gygi, S. P., Comparative evaluation of mass spectrometry platforms used in largescale proteomics investigations. Nat. Methods 2005, 2, 667– 675.

[59] Krishnamurthy, D., Levin, Y., Harris, L. W., Umrania, Y. et al., Analysis of the human pituitary proteome by data independent label-free liquid chromatography tandem mass spectrometry. Proteomics 2011, 11, 495–500.

[45] Chapman, J. D., Goodlett, D. R., Masselon, C. D., Multiplexed and data-independent tandem mass spectrometry for global proteome profiling. Mass Spectrom. Rev. 2014, 33, 452–470.

[60] Levin, Y., Wang, L., Schwarz, E., Koethe, D. et al., Global proteomic profiling reveals altered proteomic signature in schizophrenia serum. Mol. Psychiatry 2010, 15, 1088–1100.

[46] Soste, M., Hrabakova, R., Wanka, S., Melnik, A. et al., A sentinel protein assay for simultaneously quantifying cellular processes. Nat. Methods 2014, 11, 1045–1048. [47] Egertson, J. D., Kuehn, A., Merrihew, G. E., Bateman, N. W. et al., Multiplexed MS/MS for improved data-independent acquisition. Nat. Methods 2013, 10, 744–746. [48] Gillet, L. C., Navarro, P., Tate, S., Rost, H. et al., Targeted data extraction of the MS/MS spectra generated by dataindependent acquisition: a new concept for consistent and accurate proteome analysis. Mol. Cell. Proteomics 2012, 11, O111 016717. [49] Purvine, S., Eppel, J. T., Yi, E. C., Goodlett, D. R., Shotgun collision-induced dissociation of peptides using a time of flight mass analyzer. Proteomics 2003, 3, 847–850. [50] Venable, J. D., Dong, M. Q., Wohlschlegel, J., Dillin, A., Yates, J. R., Automated approach for quantitative analysis of complex peptide mixtures from tandem mass spectra. Nat. Methods 2004, 1, 39–45. [51] Panchaud, A., Jung, S., Shaffer, S. A., Aitchison, J. D., Goodlett, D. R., Faster, quantitative, and accurate precursor acquisition independent from ion count. Anal. Chem. 2011, 83, 2250–2257. [52] Panchaud, A., Scherl, A., Shaffer, S. A., von Haller, P. D. et al., Precursor acquisition independent from ion count: how to dive deeper into the proteomics ocean. Anal. Chem. 2009, 81, 6481–6488. [53] Plumb, R. S., Johnson, K. A., Rainville, P., Smith, B. W. et al., UPLC/MS(E); a new approach for generating molecular fragment information for biomarker structure elucidation. Rapid Commun. Mass Spectrom. 2006, 20, 1989–1994. [54] Carvalho, P. C., Han, X., Xu, T., Cociorva, D. et al., XDIA: improving on the label-free data-independent analysis. Bioinformatics 2010, 26, 847–848.

[61] Hughes, M. A., Silva, J. C., Geromanos, S. J., Townsend, C. A., Quantitative proteomic analysis of drug-induced changes in mycobacteria. J. Proteome Res. 2006, 5, 54–63. [62] Silva, J. C., Denny, R., Dorschel, C., Gorenstein, M. V. et al., Simultaneous qualitative and quantitative analysis of the Escherichia coli proteome: a sweet tale. Mol. Cell. Proteomics 2006, 5, 589–607. [63] Cortezzi, S. S., Garcia, J. S., Ferreira, C. R., Braga, D. P. et al., Secretome of the preimplantation human embryo by bottom-up label-free proteomics. Anal. Bioanal. Chem. 2011, 401, 1331–1339. [64] Geiger, T., Cox, J., Mann, M., Proteomics on an Orbitrap benchtop mass spectrometer using all-ion fragmentation. Mol. Cell. Proteomics 2010, 9, 2252–2261. [65] Michalski, A., Damoc, E., Hauschild, J. P., Lange, O. et al., Mass spectrometry-based proteomics using Q Exactive, a high-performance benchtop quadrupole Orbitrap mass spectrometer. Mol. Cell. Proteomics 2011, 10, M111 011015. [66] Jung, S., Smith, J. J., von Haller, P. D., Dilworth, D. J. et al., Global analysis of condition-specific subcellular protein distribution and abundance. Mol. Cell. Proteomics 2013, 12, 1421–1435. [67] Acosta-Martin, A. E., Panchaud, A., Chwastyniak, M., Dupont, A. et al., Quantitative mass spectrometry analysis using PAcIFIC for the identification of plasma diagnostic biomarkers for abdominal aortic aneurysm. PLoS One 2011, 6, e28698. [68] Voss, J., Goo, Y. A., Cain, K., Woods, N. et al., Searching for the noninvasive biomarker holy grail: are urine proteomics the answer? Biol. Res. Nursing 2011, 13, 235–242. [69] Goo, Y. A., Cain, K., Jarrett, M., Smith, L. et al., Urinary proteome analysis of irritable bowel syndrome (IBS) symptom subgroups. J. Proteome Res. 2012, 11, 5650–5662.

[55] Silva, J. C., Denny, R., Dorschel, C. A., Gorenstein, M. et al., Quantitative proteomic analysis by accurate mass retention time pairs. Anal. Chem. 2005, 77, 2187–2200.

[70] Hengel, S. M., Murray, E., Langdon, S., Hayward, L. et al., Data-independent proteomic screen identifies novel tamoxifen agonist that mediates drug resistance. J. Proteome Res. 2011, 10, 4567–4578.

[56] Levin, Y., Hradetzky, E., Bahn, S., Quantification of proteins using data-independent analysis (MSE) in simple and complex samples: a systematic evaluation. Proteomics 2011, 11, 3273–3287.

[71] Weisbrod, C. R., Eng, J. K., Hoopmann, M. R., Baker, T., Bruce, J. E., Accurate peptide fragment mass analysis: multiplexed peptide identification and quantification. J. Proteome Res. 2012, 11, 1621–1632.

[57] Distler, U., Kuharev, J., Navarro, P., Levin, Y. et al., Drift time-specific collision energies enable deep-coverage data-independent acquisition proteomics. Nat. Methods 2014, 11, 167–170.

[72] MacLean, B., Tomazela, D. M., Shulman, N., Chambers, M. et al., Skyline: an open source document editor for creating and analyzing targeted proteomics experiments. Bioinformatics 2010, 26, 966–968.


[73] Vogelsang, M. S., Prakash, A., Sarracino, D., Vadali, G. et al., Characterizing qualitative and quantitative global changes in the aging heart using pSMART, a novel acquisition method (http://www.thermoscientific.com/).

[87] Stahl-Zeng, J., Lange, V., Ossola, R., Eckhardt, K. et al., High sensitivity detection of plasma proteins by multiple reaction monitoring of N-glycosites. Mol. Cell. Proteomics 2007, 6, 1809–1817.

[74] Prakash, A., Peterman, S., Ahmad, S., Sarracino, D. et al., Hybrid data acquisition and processing strategies with increased throughput and selectivity: pSMART analysis for global qualitative and quantitative analysis. J. Proteome Res. 2014, 13, 5415–5430.

[88] Zawadzka, A. M., Schilling, B., Held, J. M., Sahu, A. K. et al., Variation and quantification among a target set of phosphopeptides in human plasma by multiple reaction monitoring and SWATH-MS2 data-independent acquisition. Electrophoresis 2014, 35, 3487–3497.

[75] Kiyonami, R., Vogelsang, M. S., Zabrouskov, V., Huhmer, A. et al., Large scale targeted protein quantification using WiSIM-DIA workflow on an Orbitrap Fusion Tribrid mass spectrometer. Thermo Scientific Application Note 600 2014. http://www.thermoscientific.com/en/product/orbitrapfusion-tribrid-mass-spectrometer.html.

[89] Guo, T., Aebersold, R., 62nd ASMS Conference on Mass Spectrometry and Allied Topics, ETH Zurich, Zurich, Switzerland 2014.

[76] Liu, Y., Huttenhain, R., Surinova, S., Gillet, L. C. et al., Quantitative measurements of N-linked glycoproteins in human plasma by SWATH-MS. Proteomics 2013, 13, 1247–1256.

[77] Liu, Y., Chen, J., Sethi, A., Li, Q. K. et al., Glycoproteomic analysis of prostate cancer tissues by SWATH mass spectrometry discovers N-acylethanolamine acid amidase and protein tyrosine kinase 7 as signatures for tumor aggressiveness. Mol. Cell. Proteomics 2014, 13, 1753–1768.

[78] Zhang, F., Lin, H., Gu, A., Li, J. et al., SWATH- and iTRAQ-based quantitative proteomic analyses reveal an overexpression and biological relevance of CD109 in advanced NSCLC. J. Proteomics 2014, 102, 125–136.

[79] Findlay, G. M., Smith, M. J., Lanner, F., Hsiung, M. S. et al., Interaction domains of Sos1/Grb2 are finely tuned for cooperative control of embryonic stem cell fate. Cell 2013, 152, 1008–1020.

[80] Escher, C., Reiter, L., MacLean, B., Ossola, R. et al., Using iRT, a normalized retention time for more targeted measurement of peptides. Proteomics 2012, 12, 1111–1121.

[81] Lee, H. J., Lee, E. Y., Kwon, M. S., Paik, Y. K., Biomarker discovery from the plasma proteome using multidimensional fractionation proteomics. Curr. Opin. Chem. Biol. 2006, 10, 42–49.

[82] Zhang, H., Li, X. J., Martin, D. B., Aebersold, R., Identification and quantification of N-linked glycoproteins using hydrazide chemistry, stable isotope labeling and mass spectrometry. Nat. Biotechnol. 2003, 21, 660–666.

[83] Kalin, M., Cima, I., Schiess, R., Fankhauser, N. et al., Novel prognostic markers in the serum of patients with castration-resistant prostate cancer derived from quantitative analysis of the pten conditional knockout mouse proteome. Eur. Urol. 2011, 60, 1235–1243.

[84] Zhang, H., Liu, A. Y., Loriaux, P., Wollscheid, B. et al., Mass spectrometric detection of tissue proteins in plasma. Mol. Cell. Proteomics 2007, 6, 64–71.

[85] Schiess, R., Wollscheid, B., Aebersold, R., Targeted proteomic strategy for clinical biomarker discovery. Mol. Oncol. 2009, 3, 33–44.

[86] Zhang, H., Yi, E. C., Li, X. J., Mallick, P. et al., High throughput quantitative analysis of serum proteins using glycopeptide capture and liquid chromatography mass spectrometry. Mol. Cell. Proteomics 2005, 4, 144–155.


[90] Collins, B. C., Gillet, L. C., Rosenberger, G., Röst, H. L. et al., Quantifying protein interaction dynamics by SWATH mass spectrometry: application to the 14-3-3 system. Nat. Methods 2013, 10, 1246–1253.

[91] Lambert, J. P., Ivosev, G., Couzens, A. L., Larsen, B. et al., Mapping differential interactomes by affinity purification coupled with data-independent mass spectrometry acquisition. Nat. Methods 2013, 10, 1239–1245.

[92] Goh, W. W., Wong, L., Sng, J. C., Contemporary network proteomics and its requirements. Biology 2013, 3, 22–38.

[93] Christie L., Hunter, B. C., Gillet, L., Aebersold, R., ASMS 2014 Scientific Presentations, AB SCIEX, Redwood City, CA; ETH Zurich, Zurich, Switzerland 2014.

[94] Picotti, P., Clement-Ziza, M., Lam, H., Campbell, D. S. et al., A complete mass-spectrometric map of the yeast proteome applied to quantitative trait analysis. Nature 2013, 494, 266–270.

[95] Toprak, U. H., Gillet, L. C., Maiolica, A., Navarro, P. et al., Conserved peptide fragmentation as a benchmarking tool for mass spectrometers and a discriminating feature for targeted proteomics. Mol. Cell. Proteomics 2014.

[96] Lam, H., Deutsch, E. W., Eddes, J. S., Eng, J. K. et al., Building consensus spectral libraries for peptide identification in proteomics. Nat. Methods 2008, 5, 873–875.

[97] Rosenberger, G., Koh, C. C., Guo, T., Röst, H. L. et al., A repository of assays to quantify 10,000 human proteins by SWATH-MS. Scientific Data 2014, 1, 140031.

[98] Röst, H. L., Rosenberger, G., Navarro, P., Gillet, L. et al., OpenSWATH enables automated, targeted analysis of data-independent acquisition MS data. Nat. Biotechnol. 2014, 32, 219–223.

[99] Bertsch, A., Gröpl, C., Reinert, K., Kohlbacher, O., OpenMS and TOPP: open source software for LC-MS data analysis. Methods Mol. Biol. 2011, 696, 353–367.

[100] Kunszt, P., Blum, L., Hullár, B., Schmid, E. et al., iPortal: The Swiss grid proteomics portal: Requirements and new features based on experience and usability considerations. Concurrency Computat.: Pract. Exper. 2014. DOI: 10.1002/cpe.3294.

[101] Kessner, D., Chambers, M., Burke, R., Agus, D., Mallick, P., ProteoWizard: open source software for rapid proteomics tools development. Bioinformatics 2008, 24, 2534–2536.

[102] Bernhardt, O. M., Selevsek, N., Gillet, L. C., Rinner, O. et al., Spectronaut: a fast and efficient algorithm for MRM-like processing of data independent acquisition (SWATH-MS) data.

[103] Rosenberger, G., Ludwig, C., Röst, H. L., Aebersold, R., Malmström, L., aLFQ: an R-package for estimating absolute protein quantities from label-free LC-MS/MS proteomics data. Bioinformatics 2014, 30, 2511–2513.

[104] Choi, M., Chang, C. Y., Clough, T., Broudy, D. et al., MSstats: an R package for statistical analysis of quantitative mass spectrometry-based proteomic experiments. Bioinformatics 2014.

[105] Held, J. M., Schilling, B., D’Souza, A. K., Srinivasan, T. et al., Label-free quantitation and mapping of the ErbB2 tumor receptor by multiple protease digestion with data-dependent (MS1) and data-independent (MS2) acquisitions. Int. J. Proteomics 2013, 2013, 791985.

[106] Chace, D. H., Mass spectrometry in the clinical laboratory. Chem. Rev. 2001, 101, 445–477.


[107] Roux, A., Lison, D., Junot, C., Heilier, J. F., Applications of liquid chromatography coupled to mass spectrometry-based metabolomics in clinical chemistry and toxicology: a review. Clin. Biochem. 2011, 44, 119–135.

[108] Strathmann, F. G., Hoofnagle, A. N., Current and future applications of mass spectrometry to the clinical laboratory. Am. J. Clin. Pathol. 2011, 136, 609–616.

[109] Alkhalfioui, F., Magnin, T., Wagner, R., From purified GPCRs to drug discovery: the promise of protein-based methodologies. Curr. Opin. Pharmacol. 2009, 9, 629–635.

[110] Simpson, R. J., Bernhard, O. K., Greening, D. W., Moritz, R. L., Proteomics-driven cancer biomarker discovery: looking to the future. Curr. Opin. Chem. Biol. 2008, 12, 72–77.

[111] Goh, W. W., Wong, L., Networks in proteomics analysis of cancer. Curr. Opin. Biotechnol. 2013, 24, 1122–1128.

[112] Goh, W. W., Wong, L., Computational proteomics: designing a comprehensive analytical strategy. Drug Discov. Today 2014, 19, 266–274.
