Journal of Chromatography A, 1373 (2014) 61–72


Journal of Chromatography A journal homepage: www.elsevier.com/locate/chroma

Inversion of peak elution order prevents uniform time alignment of complex liquid-chromatography coupled to mass spectrometry datasets

Vikram Mitra a,d,e, Age Smilde b, Huub Hoefsloot b, Frank Suits c, Rainer Bischoff a,d,e, Péter Horvatovich a,d,e,∗

a Analytical Biochemistry, Department of Pharmacy, University of Groningen, A. Deusinglaan 1, 9713 AV Groningen, The Netherlands
b Swammerdam Institute for Life Sciences, University of Amsterdam, Science Park 904, 1098 XH Amsterdam, The Netherlands
c IBM T.J. Watson Research Centre, 1101 Kitchawan Road, Yorktown Heights, 10598 NY, USA
d Netherlands Bioinformatics Centre, Geert Grooteplein 28, 6525 GA Nijmegen, The Netherlands
e Netherlands Proteomics Centre, Padualaan 8, 3584 CH Utrecht, The Netherlands

Article info

Article history:
Received 31 August 2014
Received in revised form 24 October 2014
Accepted 27 October 2014
Available online 13 November 2014

Keywords:
Retention time alignment
Inversion of peak elution order
Shotgun proteomics
Liquid chromatography
Data pre-processing
Orthogonal separation

Abstract

Retention time alignment is one of the most challenging steps in processing LC–MS datasets of complex proteomics samples acquired within a differential profiling study. A large number of time alignment methods have been developed for accurate pre-processing of such datasets. These methods generally assume that common compounds elute in the same order but they do not test whether this assumption holds. If this assumption is not valid, alignments based on a monotonic retention time function will lose accuracy for peaks that depart from the expected order of the retention time correspondence function. To address this issue, we propose a quality control method that assesses if a pair of complex LC–MS datasets can be aligned with the same alignment performance based on statistical tests before correcting retention time shifts. The algorithm first confirms the presence of an adequate number of common peaks (>∼100 accurately matched peak pairs), then determines if the probability for a conserved elution order of those common peaks is sufficiently high (>0.01) and finally performs retention time alignment of two LC–MS chromatograms. This procedure was applied to LC–MS and LC–MS/MS datasets from two different inter-laboratory proteomics studies showing that a large number of common peaks in chromatograms acquired by different laboratories change elution order with considerable retention time differences. © 2014 Elsevier B.V. All rights reserved.

1. Introduction

Liquid chromatography coupled online to mass-spectrometry (LC–MS) is a powerful method for the quantification and identification of peptides in complex proteomics samples. Identification of proteins and peptides that discriminate among predefined groups of samples, for example as part of biomarker discovery studies, is an important activity in proteomics. The comparative and comprehensive profiling of complex body fluids, such as cerebrospinal fluid or serum, requires LC–MS platforms with high resolution and sample throughput. These two requirements often conflict with each other, since comprehensive, high-resolution analytical methods require considerable analysis time per sample. One way to increase sample throughput without compromising the overall comprehensiveness of the analysis is to share analysis tasks between multiple instruments operating in the same or different laboratories. To benefit from the collected information it is, however, necessary to integrate the data, which requires sophisticated data pre-processing workflows and thorough quality control [1]. Similar challenges exist when samples are analyzed in the same laboratory but in different analysis batches on the same instrument or using multiple instruments, and the obtained datasets need to be analyzed and evaluated as an ensemble. This situation often occurs in biomarker discovery, when discovery results are validated with independent sets of samples analyzed several weeks or months after the analysis of the set of samples used for the initial discovery.

∗ Corresponding author at: Analytical Biochemistry, Department of Pharmacy, University of Groningen, A. Deusinglaan 1, 9713 AV Groningen, The Netherlands. Tel.: +31 50 363 3341; fax: +31 50 363 7582. E-mail addresses: [email protected], [email protected] (P. Horvatovich). http://dx.doi.org/10.1016/j.chroma.2014.10.101 0021-9673/© 2014 Elsevier B.V. All rights reserved.


V. Mitra et al. / J. Chromatogr. A 1373 (2014) 61–72

Data integration of chromatograms obtained from diverse sources, such as inter-laboratory studies, studies performed in a single laboratory but in multiple analysis batches, or studies based on a single batch in which the overall analysis time is on the order of weeks, is challenging due to changes in fixed data acquisition parameters related to chromatographic conditions such as eluent composition, alteration of the stationary phase and gradient program. Additionally, many other factors may affect the quality and consistency of LC–MS data, such as ion pairing reagent, type and pore size of the chromatographic material [2], ionization and ion transmission efficiency and the type and model of mass spectrometer. Diversity in experimental workflows, even when identical samples are analyzed, may produce highly variable chromatograms leading to differences in peak selection in data-dependent MS/MS acquisition [3]. Recently, quality control metrics to monitor some of these factors [4] have been introduced using a standard yeast sample [5]. Correction of non-linear retention time shifts between chromatograms is one of the most challenging data pre-processing steps in the integration of LC–MS data from different sources. Time alignment algorithms use retention times of common compounds to derive the retention time correspondence function between two chromatograms assuming a non-linear monotonic transformation [6–10]. Most of these methods correct retention time differences between pairs of chromatograms [6–11], while relatively few optimize the global time alignment among multiple chromatograms in one step [12]. Some time alignment methods construct a global peak list or a consensus map after combining multiple LC–MS datasets or peak lists, generally using information obtained from pair-wise retention time alignments [13–16].
Other time alignment approaches use raw data and full or partial spectral information for retention time alignment, such as single-stage MS spectra in LC–MS [8–10,14,16–18] or UV spectra in LC–DAD [19]. Time alignment procedures in the literature are often confounded with a peak matching step. Time alignment is an optional step in a data pre-processing workflow that corrects only for retention time shift of features or raw signals. In contrast, the peak matching step is a compulsory step since it matches detected features in multiple LC–MS(/MS) chromatograms. The accuracy of the peak matching procedure is, however, influenced by the ability of the peak matching algorithm to determine the exact location of a peak in different chromatograms, and this ability can be enhanced by more accurate retention time alignment. Early versions of these methods relied on total ion chromatograms (TICs) to correct retention time shifts [11], while the newer generation of algorithms developed for the alignment of complex proteomics and metabolomics datasets makes use of information provided by resolution in the m/z dimension to ensure that only common compounds are selected to calculate the alignment function [8–10,14,16–18]. Another aspect that distinguishes time alignment algorithms is how well they can accommodate large retention time differences. Some algorithms based on correlation optimized warping [9,11] or dynamic time warping [8,15] have a restricted search space, while others, typically using regression analysis, do not include such constraints [16,17]. Most algorithms have been developed and tested with datasets from ‘within-batch samples’ that were obtained in a single laboratory with well-controlled acquisition parameters. These datasets have low compound signal variation due to nearly identical sample composition and analysis conditions.
These algorithms assume that separation between chromatograms is parallel so that a monotonic non-linear retention time correspondence function can be calculated, or in other words that there is no change in the elution order of common peaks. A prerequisite for the proper application of such algorithms to complex LC–MS datasets is that the assumption of monotonicity of the retention time correspondence function is valid [6,18,20]. It is rare for compounds to change elution order in datasets acquired within the same well-controlled LC–MS acquisition batch [6,21,22], but this is commonly observed in chromatograms acquired under diverse experimental conditions, including data from different chromatographic columns, data acquired during a single long analysis batch, or data acquired in different analysis batches [23,24]. Peak elution order changes are commonly observed when using C18 reverse phase materials from different manufacturers or when elution conditions, such as flow rate, gradient program, ion pairing agent or pH, change [22,24–30]. Indeed, the linear-solvent-strength theory (LSS) developed by Snyder et al. [22,24] provides a theoretical explanation for elution order changes of peaks, or slight orthogonality of separations, observed in different chromatograms. Peak elution order inversion is due to compounds having different slopes (referred to as parameter S) in the linear model of the retention factor, in which the change in the volume fraction of organic solvent is the independent variable. Krokhin et al. built a model to predict slope S for tryptic peptides using parameters that are derived from the peptide sequence, such as molecular weight, charge state and molecular hydrophobicity index [26,27]. However, current time alignment methods do not check whether the conditions needed to align two chromatograms (i.e. the presence of a sufficient number of common compounds that follow a non-linear monotonic retention time trend and therefore have the same elution order) are met before proceeding with the alignment. This bears the risk that chromatograms may be incorrectly aligned and that the subsequent comparisons are invalid. In the current literature there is only the generalized fuzzy Hough transform method that calculates correspondences between acquisition-time-sorted sets of NMR spectra acquired within the same batch in order to identify peak order inversion [31].
This algorithm was recently implemented to analyse LC–MS data [32]. However, this method requires a set of analyses in one batch in which the retention time changes of peaks of common compounds that invert elution order can be traced in small consecutive steps. This requires traced compounds to be present in the majority of chromatograms, and the method therefore cannot be applied in a general way to compounds present only in a small fraction of the chromatograms, or to compounds that show gaps in the trace-path of subsequently analyzed chromatograms. For LC–MS acquisition, this condition requires the datasets to be obtained in a single, long analysis batch in which analytical conditions shift only in small steps. The method cannot be used on datasets obtained in different analysis batches where analytical conditions are substantially different. In this paper, we demonstrate that time alignment methods assuming a monotonic retention time correspondence function are not able to correct non-linear retention time shifts with uniform accuracy for all peaks and chromatographic positions between LC–MS chromatograms that have a considerable change in peak elution order of common compounds. Additionally, we describe a quality control method for LC–MS and LC–MS/MS data to assess if two chromatograms have parallel separation and can therefore be aligned with uniform accuracy, and if the chromatograms have enough common compounds to drive the alignment procedure. We have assessed the accuracy of the method with simulated data and with complex proteomics LC–MS(/MS) datasets that were obtained in different laboratories. The method uses LC–MS or LC–MS/MS peak lists; it first assesses if there is a sufficient number of common compounds and then estimates the probability of a similar elution order of the common peaks using orthogonal residuals with respect to the monotonic non-linear retention time correspondence function.
When a sufficient number of accurately matched peak pairs (>∼100) has been identified and the probability for similar elution of common peaks is high (>0.01), the two peak lists are aligned with a non-linear monotonic function. The specific value for the required number of peak pairs will depend on the parameters of the experiment and cannot in general be specified either as a fixed value or as a fraction of the number of peak pairs, but for the purposes of this paper and for this experiment we are using 100 as the required number.

2. Materials and methods

2.1. Rat CSF and mouse serum datasets

The cerebrospinal fluid sample was obtained from an ongoing biomarker discovery study in a rat model of experimental autoimmune encephalomyelitis (EAE). Mouse serum datasets acquired according to an experimental design were obtained from the National Cancer Institute’s Mouse Proteomic Technologies Initiative [29]. The study design, sampling method, sample preparation and LC–MS/MS analysis in two laboratories are described in the supporting information (Sections 1 and 2) and the rat cerebrospinal fluid (CSF) dataset was obtained according to Rosenling et al. [33].

3. Theory

3.1. Simulated datasets

The outcome of peak matching is a set of matched peak pairs (MPPs) characterized by two retention times corresponding to the matched peaks in the two chromatograms. To assess the accuracy of the peak matching procedure we used Monte-Carlo simulations to separate accurately matched peak pairs (AMPPs) and inaccurately matched peak pairs (IMPPs). AMPPs were simulated as two retention time coordinates sampled from a non-linear monotonic trend with a constant fluctuation. The peak density along the monotonic retention time trend is obtained from the peak density of one LC–MS chromatogram. The retention time coordinates of randomly matched peak pairs (RMPPs) were sampled separately from the peak density of two different LC–MS chromatograms. The Monte Carlo simulation therefore had six parameters: (1–3) retention time values of MPPs based on three LC–MS chromatograms (two of which are used to create RMPPs and one to calculate the distribution of AMPPs along the monotonic retention time trend), (4) the total number of peak pairs, (5) the ratio of AMPPs with respect to all MPPs and (6) the fluctuation of AMPPs along the main monotonic non-linear retention time correspondence trend.
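The six simulation parameters listed above can be made concrete with a short Python sketch. It is only an illustration under stated assumptions, not the authors' implementation: the function name `simulate_mpps`, the Gaussian form of the fluctuation, the example trend and the toy retention time lists are all hypothetical, and for simplicity the fluctuation is added along the second retention time axis only.

```python
import math
import random


def simulate_mpps(rt_density, rt_a, rt_b, trend, n_pairs, ampp_ratio, sigma, seed=0):
    """Simulate matched peak pairs as (rt1, rt2, is_accurate) tuples.

    rt_density : retention times of the chromatogram that supplies the peak
                 density along the trend (parameter 1)
    rt_a, rt_b : retention times of the two chromatograms from which randomly
                 matched pairs are drawn (parameters 2 and 3)
    trend      : monotonic function mapping rt1 to rt2 (hypothetical)
    n_pairs    : total number of peak pairs (parameter 4)
    ampp_ratio : fraction of accurately matched pairs (parameter 5)
    sigma      : fluctuation around the trend (parameter 6); Gaussian noise
                 is an assumption made for this sketch
    """
    rng = random.Random(seed)
    n_ampp = round(n_pairs * ampp_ratio)
    pairs = []
    # Accurately matched pairs follow the monotonic trend plus a fluctuation.
    for _ in range(n_ampp):
        rt1 = rng.choice(rt_density)
        pairs.append((rt1, trend(rt1) + rng.gauss(0.0, sigma), True))
    # Randomly matched pairs: both coordinates are drawn independently.
    for _ in range(n_pairs - n_ampp):
        pairs.append((rng.choice(rt_a), rng.choice(rt_b), False))
    rng.shuffle(pairs)
    return pairs


def example_trend(rt):
    """Hypothetical non-linear, strictly increasing correspondence function."""
    return 5.0 + 0.9 * rt + 3.0 * math.log1p(rt)


# Toy retention time lists in minutes (made-up values, not measured data).
rt_density = [float(t) for t in range(10, 110)]
rt_a = [t + 0.5 for t in rt_density]
rt_b = [example_trend(t) for t in rt_density]

pairs = simulate_mpps(rt_density, rt_a, rt_b, example_trend,
                      n_pairs=500, ampp_ratio=0.6, sigma=0.5)
```

Because every simulated pair carries its ground-truth label, the output of any AMPP selection step can be compared directly against the known labels.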
By using three different LC–MS chromatograms, we have varied parameters 4–6 (see supporting information Section 3 “Monte-Carlo simulated datasets” for details).

3.2. Quality control and time alignment algorithms

3.2.1. Theoretical background of quality control metrics

There are two conditions that must be met for the successful retention time alignment of LC–MS chromatograms. The first is that the two analyzed samples should have a sufficient number of common compounds which can drive the time alignment procedure, and the second is that the elution order of the common compounds should be the same in both chromatograms, i.e. separation in the two chromatograms should be parallel. Only when these conditions are fulfilled is it possible to determine one-to-one correspondences between the common compounds unambiguously and to derive a non-linear monotonic function that corrects the retention time differences between the two chromatograms in a general approach. Our quality control method uses peak lists as input and begins with a peak matching procedure to determine common peaks. Peak matching is based either on agreement of the sequences of identified peptides (peptide sequence-based peak matching) or on pairing peaks that are close in mass and intensity rank (intensity-rank-based peak matching). Due to the error rate of both approaches, the matching procedure generates AMPPs and IMPPs, although we consider that due to the rather conservative MS/MS identification parameters (Section 6 in supporting information) the percentage of inaccurately identified and matched peak pairs is below 1%. The quality control procedure assesses if a sufficient number of common peaks (AMPPs) can be found and if these common peaks follow the same elution order. We subsequently used the AMPPs to derive the non-linear monotonic retention time trend by means of locally weighted scatterplot smoothing (LOWESS) regression. The number of AMPPs constitutes the first time alignment quality metric: an insufficient number (fewer than 100) indicates that either there are too few common compounds in the samples or that the peak pairing method was not able to find them. The second time alignment quality metric is based on two F-tests comparing the orthogonal residual variance obtained (1) between the two chromatograms of interest, and (2) between one chromatogram of interest and a chromatogram that was acquired in the same analysis batch and has the same elution order. The F-test provides a p-value for the null hypothesis that the two orthogonal variances are equal. In addition, the largest retention time difference of AMPPs that change elution order (maxD) is determined; it represents the largest error made after alignment when locating, along the retention time dimension, the position in one chromatogram that corresponds to a position in the other.

3.2.2. Probability of parallel separation in two chromatograms

Selection of AMPPs allows the determination of a monotonic non-linear retention time correspondence trend between two LC–MS chromatograms of interest and the magnitude of fluctuation of AMPPs around this trend, which is expressed as orthogonal residual variance.
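The orthogonal residual variance mentioned above can be illustrated with a short Python sketch. It assumes, purely for illustration, that the fitted monotonic trend is available as a polyline of (rt1, rt2) points (for example a LOWESS fit evaluated on a grid) and that the orthogonal residual of a peak pair is its shortest Euclidean distance to that polyline, signed by the side of the trend on which the pair lies. The function names and the sign convention are hypothetical and are not taken from the paper.

```python
import math
from statistics import variance


def _signed_distance(point, seg_start, seg_end):
    """Signed shortest distance from a point to one segment of the trend."""
    (px, py), (ax, ay), (bx, by) = point, seg_start, seg_end
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    # Position of the orthogonal projection, clamped to the segment ends.
    t = 0.0
    if seg_len2 > 0.0:
        t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    cx, cy = ax + t * dx, ay + t * dy
    dist = math.hypot(px - cx, py - cy)
    # Positive above/left of the trend direction, negative below/right.
    cross = dx * (py - ay) - dy * (px - ax)
    return dist if cross >= 0.0 else -dist


def orthogonal_residuals(pairs, trend):
    """Orthogonal residual of every (rt1, rt2) pair to a polyline trend.

    trend is a list of (rt1, rt2) points sorted by rt1 with at least two
    entries; the nearest segment determines the residual of each pair.
    """
    segments = list(zip(trend[:-1], trend[1:]))
    return [min((_signed_distance(p, a, b) for a, b in segments), key=abs)
            for p in pairs]


def orthogonal_residual_variance(pairs, trend):
    """Sample variance of the orthogonal residuals (needs two or more pairs)."""
    return variance(orthogonal_residuals(pairs, trend))


# Toy example: a straight trend and three peak pairs (made-up values).
toy_trend = [(0.0, 0.0), (10.0, 10.0)]
toy_pairs = [(5.0, 7.0), (5.0, 3.0), (3.0, 3.0)]
toy_variance = orthogonal_residual_variance(toy_pairs, toy_trend)
```

Measuring the distance orthogonally rather than vertically treats both chromatograms symmetrically, which is presumably why neither retention time axis is singled out as the dependent variable.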
This orthogonal variance is compared to the orthogonal variance of AMPPs from two LC–MS chromatograms with no peak elution order inversion of common compounds, e.g. chromatograms that were obtained within the same analysis batch and acquired under the same well-controlled chromatographic conditions. One chromatogram should be the same between these two pairs of chromatograms to minimize dissimilarities between chromatogram pairs and their effect on the orthogonal variance. Changes in peak elution order of common compounds are indicated by an increase in the fluctuation around the main monotonic retention time trend. AMPPs also show some fluctuation even when peak elution order of common compounds is preserved, which could be viewed as a form of background fluctuation. This background fluctuation exists because of the error in determining the peak location by peak detection algorithms or due to the imprecise location of an MS/MS event within a single-stage peak (Fig. S1). For that reason we determined the probability that peak elution order of common compounds is conserved using an F-test comparing the orthogonal residual variance (fluctuation) of AMPPs around the main monotonic retention time trend in two pairs of chromatograms. There are two chromatograms of interest, leading to two pairs of orthogonal variances and thus to two F-tests. The F-tests are performed with the null hypothesis that the two orthogonal variances are equal, which is rejected at a user-defined Type-I error threshold (0.01 in this case). In other words, the null hypothesis is that separation in the two chromatograms of interest is parallel (the variance represented by the orthogonal residuals is due to the error in determining peak positions). The alternative hypothesis is that separation in the two chromatograms has some orthogonality.

3.3. Description of the quality control algorithm

Fig. 1 shows the main parts of the quality control method to test peak elution order inversion and if chromatograms have


Fig. 1. Flow chart showing the four most important steps of the quality control method for assessment of uniform time alignment of two LC–MS(/MS) chromatograms. The first step consists of matching peaks between two chromatograms. This matching procedure can be performed based on agreement of identified peptide sequences and PTMs using LC–MS/MS data, or using an intensity-rank-based peak matching procedure on two single-stage LC–MS peak lists. The second step consists of the selection of AMPPs. In the third step, the non-linear main monotonic retention time correspondence function is determined using LOWESS regression. Peak elution order similarity between the two LC–MS(/MS) chromatograms is then tested using two F-tests on the orthogonal residual variance. When the probability of the null hypothesis of the F-test is above the significance level (>0.01), correction of the retention times of the sample peak lists can be performed with uniform high accuracy. The procedure results in an aligned peak list of the sample chromatogram (peak list 2 in red) to the reference chromatogram (peak list 1 in black). The algorithm stops: (1) a low number of MPPs (0.9). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
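The decision logic summarised in the flow chart can be sketched in Python using only the standard library. This is a hedged illustration rather than the published implementation: the function names are hypothetical, a one-sided variance-ratio F-test with the inter-batch variance in the numerator is an assumption, and the upper tail of the F distribution is computed with the textbook continued-fraction evaluation of the regularized incomplete beta function.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (Lentz's method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    if abs(d) < tiny:
        d = tiny
    d = 1.0 / d
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for coeff in (even, odd):
            d = 1.0 + coeff * d
            if abs(d) < tiny:
                d = tiny
            c = 1.0 + coeff / c
            if abs(c) < tiny:
                c = tiny
            d = 1.0 / d
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_upper_tail(f_stat, df_num, df_den):
    """P(F >= f_stat) for an F distribution with (df_num, df_den) degrees of freedom."""
    x = df_den / (df_den + df_num * f_stat)
    return _betainc(df_den / 2.0, df_num / 2.0, x)


def can_align(n_ampps, p_values, min_ampps=100, alpha=0.01):
    """Quality gate: enough AMPPs and neither F-test rejects parallel separation."""
    return n_ampps >= min_ampps and all(p > alpha for p in p_values)
```

As a usage example, `can_align(675, [f_upper_tail(1.1, 200, 200), f_upper_tail(1.05, 200, 180)])` evaluates the gate for two hypothetical variance ratios close to one, whereas a large ratio such as `f_upper_tail(94.82, 455, 859)` yields a p-value far below 0.01 and blocks the alignment.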

4.3. Comparing LC–MS(/MS) data from inter-laboratory studies

4.3.1. Rat cerebrospinal fluid LC–MS/MS data

The new quality control method was developed to assess, with suitable confidence, if it is possible to establish one-to-one correspondences between peaks in complex LC–MS datasets acquired in different analysis batches. In other words, the quality control method assesses if two LC–MS(/MS) chromatograms can be aligned with the same accuracy or not. We have tested the performance of our method using two datasets acquired during inter-laboratory studies. The first dataset that we evaluated comprised seven complex proteomics samples of trypsin-digested rat CSF. All samples were measured in two laboratories using different C18 LC columns, gradient programs and mass analysers (Orbitrap and qTOF) operated in data-dependent MS/MS acquisition mode. A database search identified approximately 900 unique peptides in the samples analyzed on the qTOF and approximately 2300 peptides using the Orbitrap. The difference in the number of identified peptides between the two laboratories is due to the higher sensitivity, resolution and mass accuracy of the Orbitrap instrument. To establish the interrelationship, we focused on a single sample (sample 6) and found 675 common peptides based on assigned peptide sequences and PTMs and a precursor mass difference of less than 0.01 Da. Peptide identification based on MS/MS spectra thus enabled us to construct a ‘ground truth’ and to assess the performance of the quality control method for alignment of inter-laboratory LC–MS/MS datasets.
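The matching criterion used to find the common peptides (same sequence, same PTMs, precursor mass difference below 0.01 Da) can be written as a minimal Python sketch. The record layout (dictionaries with the keys `sequence`, `ptms`, `mass` and `rt`), the function name and the toy values are assumptions made for this illustration; they do not describe the authors' software or measured data.

```python
from collections import defaultdict


def match_by_peptide(ids_a, ids_b, mass_tol=0.01):
    """Pair identifications that share sequence and PTMs and agree in mass.

    Each identification is a dict with the hypothetical keys 'sequence',
    'ptms' (a hashable tuple), 'mass' (precursor mass in Da) and 'rt'.
    Returns a list of (rt_a, rt_b) tuples, at most one per entry of ids_a.
    """
    by_peptide = defaultdict(list)
    for ident in ids_b:
        by_peptide[(ident["sequence"], ident["ptms"])].append(ident)

    pairs = []
    for ident in ids_a:
        key = (ident["sequence"], ident["ptms"])
        candidates = [c for c in by_peptide.get(key, [])
                      if abs(c["mass"] - ident["mass"]) < mass_tol]
        if candidates:
            # Keep the candidate whose precursor mass agrees best.
            best = min(candidates, key=lambda c: abs(c["mass"] - ident["mass"]))
            pairs.append((ident["rt"], best["rt"]))
    return pairs


# Toy identifications (made-up masses and retention times).
lab1 = [
    {"sequence": "LVNELTEFAK", "ptms": (), "mass": 1162.623, "rt": 42.1},
    {"sequence": "AEFVEVTK", "ptms": (), "mass": 921.480, "rt": 30.5},
    {"sequence": "MPCTEDYLSLILNR", "ptms": (("Oxidation", 1),), "mass": 1740.820, "rt": 80.2},
]
lab2 = [
    {"sequence": "LVNELTEFAK", "ptms": (), "mass": 1162.626, "rt": 55.3},
    {"sequence": "AEFVEVTK", "ptms": (), "mass": 921.520, "rt": 39.8},
    {"sequence": "MPCTEDYLSLILNR", "ptms": (), "mass": 1724.825, "rt": 97.0},
]

matched = match_by_peptide(lab1, lab2)
```

In this toy input only the first peptide is paired: the second fails the mass tolerance and the third differs in its PTMs. The sketch does not enforce a strict one-to-one assignment when a peptide was identified several times in the same run, which a real workflow would need to handle.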

We first matched peaks in the two datasets based on the identified peptide sequences including some of the possible PTMs (Fig. 3a–c; see PTM list in Section 6 “Peptide identification parameters”, supporting information), subsequently using an intensity-rank-based peak matching procedure on charge-deconvoluted lists of precursor ions (Fig. 3d–f) and finally using an intensity-rank-based peak matching procedure on isotope- and charge-deconvoluted peak lists based on pre-processing the single-stage LC–MS data with the OpenMS peak detection method (Fig. 3g–i) [34] (see Section 4.1 “Pre-processing of single stage LC–MS data”, supporting information). This step was followed by applying our method to select AMPPs (see step 2 in Fig. 1). The peak matching procedures were applied to two LC–MS chromatograms that were acquired in intra- (Fig. 3a, c, d, f, g and i) or inter-laboratory studies (Fig. 3b, e and h). Fig. 3a–c shows results for MPPs obtained with peptide-sequence-based peak matching. Larger fluctuation of AMPPs is observed in inter-laboratory LC–MS/MS chromatograms (middle scatter plot) around the main monotonic retention time correspondence trend compared to AMPPs obtained from intra-laboratory LC–MS peak lists (Fig. 3a and c). This indicates a considerable extent of elution order inversion of common peaks (see Fig. 5 for an example) with large retention time differences and thus indicates some orthogonality of separations between the two inter-laboratory chromatograms. Precursor ion peak lists allow estimation of the peak pairing error that occurs when using intensity-rank-based peak


Fig. 3. Scatter plots showing retention times of MPPs between two LC–MS(/MS) chromatograms after the first step of the quality control method. All chromatograms were reconstituted from LC–MS/MS data of sample 6 from the trypsin-digested rat CSF dataset. The first row (a, b, c) shows the outcome of peak matching using identified peptide sequences and PTMs on precursor ion peak lists, the second row (d, e, f) shows results for two precursor ion peak lists using intensity-rank-based peak matching and the third row (g, h, i) shows results for two single-stage LC–MS peak lists matched with the intensity-rank-based peak matching procedure. The scatter plots in the middle column were obtained from two LC–MS chromatograms that were acquired in two different laboratories using the same sample, while the right and left columns were obtained with two LC–MS(/MS) chromatograms from the same laboratory on two different samples, one of which was used in the middle scatter plot. The main non-linear monotonic retention time correspondence functions obtained with LOWESS regression are represented as solid red lines and the boundaries of AMPPs are presented as red dashed lines. Blue dots are MPPs and green circles represent selected peak pairs considered to be accurately matched (AMPPs). These plots show that highly similar non-linear monotonic retention time trends and a similar extent of fluctuation of AMPPs along this trend were obtained independently of the type of peak list or peak matching method. For all combinations of peak lists and peak matching methods, a larger orthogonal variance of AMPPs was observed between two inter-laboratory chromatograms compared to results obtained between two intra-laboratory chromatograms acquired in the same analysis batch. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

matching. The second row of Fig. 3 (scatter plots d–f) was obtained by matching the precursor ion peak lists of common peptides in the two LC–MS/MS chromatograms with the intensity-rank-based peak matching method. The intensity-rank-based peak matching method found 61.3 ± 8.2% of the MPPs that were matched based on peptide sequence information in the seven samples acquired in the two laboratories, while the number of IMPPs was low (7.86 ± 1.95 IMPPs out of a total of 114.0 ± 81.9 AMPPs). Based on these results we endeavored to match single-stage LC–MS peak lists from the inter-laboratory study of the rat CSF proteome using the intensity-rank-based peak matching procedure. The peak list in this case was obtained after pre-processing the single-stage LC–MS part of the LC–MS/MS chromatograms with OpenMS (Fig. 3g–i). The intensity-rank-based peak matching procedure on the obtained peak lists provides a precise extraction of AMPPs and derives a retention time correspondence function that is similar to the function obtained with precursor ion peak lists using both peak matching methods. Similarly, the AMPPs show much larger fluctuation around the main monotonic time alignment trend when the data were acquired in different laboratories compared to intra-laboratory chromatogram pairs. Single-stage peak lists paired with the intensity-rank-based peak matching procedure resulted in fewer IMPPs in intra-laboratory chromatograms (Fig. 3g and i) compared to chromatograms that were run in different laboratories (Fig. 3h), despite the fact that the intra-laboratory data were obtained from different animals of the same treatment group. A larger fluctuation was observed for the AMPPs around the main monotonic trend for peak lists that were matched with the intensity-rank-based method on precursor ion peak lists derived from MS/MS data (Fig. 3d and f) than for single-stage LC–MS peak lists using the same peak matching procedure (Fig. 3g and i). In both cases the LC–MS/MS data were generated within the same laboratory and the same analysis batches. This difference can be attributed to the more accurate determination of peak maxima using single-stage MS data compared with the rather imprecise retention time value of the precursor ion. This imprecision is due to switching between MS and MS/MS modes in a data-dependent manner, where precursor ion retention times are not precisely defined due to random sampling of the chromatographic peak above the MS/MS triggering

LC–MS/MS data from rat CSF dataset (Fig. 3)

Peak matching: identified peptide sequence and PTM. Type of peak list: precursor ion. F-tests a: 94.82 (455, 859); 21.09 (455, 218).
Plot label | Chromatograms involved in the alignment procedure | Peak width LC–MS X axis | Peak width LC–MS Y axis | Maximal distance
a | Lab1 sample 6 vs Lab1 sample 12 | 0.453 ± 0.167 | 0.468 ± 0.230 | 7.18
b | Lab1 sample 6 vs Lab2 sample 6 | 0.453 ± 0.167 | 0.510 ± 0.300 | 50.75
c | Lab2 sample 6 vs Lab2 sample 12 | 0.510 ± 0.300 | 0.597 ± 0.322 | 7.95

Peak matching: intensity-rank-based. Type of peak list: precursor ion. F-tests a: 398.50 (253, 868); 56.90 (253, 111).
Plot label | Chromatograms involved in the alignment procedure | Peak width LC–MS X axis | Peak width LC–MS Y axis | Maximal distance
d | Lab1 sample 6 vs Lab1 sample 12 | 0.453 ± 0.167 | 0.468 ± 0.230 | 6.68
e | Lab1 sample 6 vs Lab2 sample 6 | 0.453 ± 0.167 | 0.510 ± 0.300 | 43.37
f | Lab2 sample 6 vs Lab2 sample 12 | 0.510 ± 0.300 | 0.597 ± 0.322 | 7.67

Peak matching: intensity-rank-based. Type of peak list: single stage MS. F-tests a: 1181.5 (212, 266); 71.55 (212, 118).
Plot label | Chromatograms involved in the alignment procedure | Peak width LC–MS X axis | Peak width LC–MS Y axis | Maximal distance
g | Lab1 sample 6 vs Lab1 sample 12 | 0.453 ± 0.167 | 0.468 ± 0.230 | 6.14
h | Lab1 sample 6 vs Lab2 sample 6 | 0.453 ± 0.167 | 0.510 ± 0.300 | 23.15
i | Lab2 sample 6 vs Lab2 sample 12 | 0.510 ± 0.300 | 0.597 ± 0.322 | 7.80

LC–MS data from the National Cancer Institute’s Mouse Proteomic Technology Initiative (Fig. S10 in supporting information)

Peak matching: intensity-rank-based. Type of peak list: single stage MS. F-tests a: 26.19 (109, 220); 16.09 (109, 144).
Plot label | Chromatograms involved in the alignment procedure | Peak width LC–MS X axis | Peak width LC–MS Y axis | Maximal distance
a | Lab1 Gly LungEGFR tumor vs Lab1 Gly LungEGFR healthy | 0.823 ± 0.517 | 0.853 ± 0.602 | 4.18
b | Lab1 Gly LungEGFR tumor vs Lab2 Gly LungEGFR tumor | 0.823 ± 0.517 | 0.406 ± 0.230 | 9.87
c | Lab2 Gly LungEGFR tumor vs Lab2 Gly LungEGFR healthy | 0.406 ± 0.230 | 0.437 ± 0.244 | 5.88

a Numbers in parentheses show the two degrees of freedom of the F-test.

threshold (see Fig. S1, supporting information for further explanation). The difference between intensity-rank-based peak matching of precursor ions and single-stage MS peak lists is further magnified by the considerable amount of peak tailing (6–7 min) (see tailing peak example and histogram of peak width at half maximum in Fig. S8, supporting information). Table 1 shows the key outcome of the two F-tests that were performed between the orthogonal residual variances of chromatograms obtained in inter- and intra-laboratory studies for each type of peak matching procedure and type of peak list. The F-tests (test statistics between 21.09 and 398.50, with p-values < 0.01) indicate large differences in the fluctuation of peak pairs along the main monotonic retention time correspondence trend. This provides strong evidence of peak order inversion between common peaks acquired in two laboratories, independently of the type of peak list and peak matching procedure. When the F-test was performed on orthogonal residuals of two pairs of chromatograms from the same laboratory, the p-value of the null hypothesis varied between 0.0406 and 0.304 (data not shown in Table 1), indicating a high probability for the same elution order of common peaks. As expected, the largest retention time difference of selected AMPPs with a different elution order (maxD) and the amount of the orthogonal residuals were much more pronounced for the inter-laboratory than for the intra-laboratory peak lists (Fig. 3 and Table 1). The largest observed maxD in intra-laboratory AMPPs was 6.2–11.2 times the peak width at half maximum, while the maxD observed in inter-laboratory AMPPs was between 47.5 and 116.6 times the peak width at half maximum.
This large difference between values is partially due to peak elution order inversion of common compounds, caused by the different gradient programs and different C18 columns that were used in the two laboratories, and partially due to the difference in determining peak locations (precursor ion MS/MS event versus single-stage MS peak maxima). The maxD of AMPPs obtained with intensity-rank-based peak matching of inter-laboratory single-stage LC–MS peak lists was lower (53.4 and 47.5 times the peak width at half maximum) than in precursor ion peak lists matched using peptide sequence information (116.6 and 103.5 times the peak width) or in precursor ion peak lists matched by intensity rank (95.3 and 84.7 times the peak width at half maximum). maxD defines the retention time error made when determining corresponding retention time locations of the sample chromatogram in the reference chromatogram by time alignment. Fig. 4a shows the boundary of maxD superimposed on scatter plots of peak pairs that were matched using peptide sequences and PTMs of inter-laboratory LC–MS/MS chromatograms. The maxD for inter-laboratory LC–MS/MS chromatogram pairs (red dashed line) is considerably larger than the maxD for intra-laboratory LC–MS/MS chromatogram pairs (two green dashed lines), emphasizing the “inter-laboratory effect” and the presence of peaks that change elution order with large retention time differences. The best fit monotonic retention time correction function obtained with the LOWESS procedure using AMPPs for inter-laboratory chromatograms of sample 6 yields an ambiguous alignment. Fig. 5 shows an example of extracted ion chromatograms for three peaks in both chromatograms after aligning each chromatogram to the other. The extracted ion chromatograms before alignment are shown in Fig. S9 (supporting information).
The green peaks located on the main monotonic function are perfectly aligned, while the two other peaks, which are located far from the monotonic retention time correspondence function, remain far from each other. Any other type of correction using one monotonic retention time correspondence function will yield similarly poor results: peaks close to the correcting function will be well corrected, while peaks distant from this function will be misaligned. Although in the presented chromatograms all peaks become closer


Table 1. Main characteristics (peak width at half maximum, maximal distance of orthogonal residuals in minutes and F-test test statistics for the residual orthogonal variance) of LC–MS chromatograms and peak pairing results to assess the probability of peak order inversion. Data are provided for the rat CSF and mouse serum datasets.


Fig. 4. Scatter plot showing retention times of MPPs matched using peptide sequence and PTM information (blue dots) from two chromatograms of sample 6 from the trypsin-digested rat CSF datasets acquired in two laboratories. The retention time correspondence function is shown as a red solid line. The maxD shown as a red dashed line at a distance of ±17.8 min (red arrow) from the main monotonic retention time trend represents the maximal retention time difference of AMPPs that change elution order in the two inter-laboratory chromatograms and reflects the largest error after time alignment. The maxD shown as green dashed lines at distances of ±4.8 and ±5.0 min (green arrow) from the main monotonic retention time trend represents the maximal retention time difference of peaks changing elution order in two intra-laboratory chromatograms that were run in the same acquisition batch. maxD shows the uncertainty in determining corresponding peak locations. The peak pairs marked with red, blue and green circles in the black dashed box correspond to the three peak pairs that are used to illustrate the effect of peak elution order inversion in Fig. 5. The fluctuation of AMPPs in two intra-laboratory chromatograms acquired under the same chromatographic conditions is much smaller than in two inter-laboratory chromatograms, where peak elution order changes to a large extent due to different chromatographic conditions such as the use of different types of C18 columns and organic eluent gradient programs. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

in alignment, the improvement remains non-uniform and results in considerable overall misalignment of peaks. The non-uniform alignment of chromatographic locations makes it difficult to determine the exact peak correspondence after alignment using retention time information alone, since the maximal error in determining corresponding locations is equivalent to maxD in Fig. 4. This problem manifests most obviously in single-stage LC–MS data, where retention time and m/z information are used to determine peak correspondence, but it persists for LC–MS/MS as well, since not all peaks observed in the single-stage MS signal are fragmented and identified [38]. From Fig. 4 it is also obvious that a retention time correspondence function passing through all matched peak pairs would be a spurious function that aligns only the peaks used to derive it and would therefore result in over-fitting. Such a function would clearly not be monotonic and therefore could not provide an invertible one-to-one correspondence between two chromatograms, regardless of which one is aligned to the other. Because any monotonic retention time correspondence function will result in an alignment quality between perfect alignment and a maximal shift of maxD for corresponding peak pairs, these chromatograms cannot be aligned with uniformly high accuracy. The locations of the three peaks presented in Fig. 5 are highlighted in the scatter plot of Fig. 4. The maximal alignment error maxD is given in Table 1 and shown as distance “D” in Fig. 4.

4.3.2. Mouse serum LC–MS data

We further tested the accuracy of our two-step time alignment procedure on a dataset in which the same samples were analyzed by single-stage MS only in two different laboratories as part of the National Cancer Institute’s Mouse Proteomic Technology Initiative (see details in Sections 1 and 2 of the supporting information).
The underlying experimental design aims to reveal the effect of four factors on the mouse serum proteome: (1) the cancer type, (2) the serum depletion method, (3) the effect of healthy versus cancer and (4) the effect of the different laboratories performing the LC–MS analysis. The dataset contained 48 analyses performed in two different laboratories with 24 identical samples. Fig. S10 (supporting information) shows the outcome of the intensity-rank-based peak matching procedure using single-stage MS peak lists, following the same outline as presented in Fig. 3. Fig. S10 illustrates that, when comparing the fluctuation of orthogonal residuals in inter- and intra-laboratory LC–MS chromatogram pairs, AMPPs have a slightly smaller but still considerable variability along the main retention time trend than in the rat CSF LC–MS/MS datasets described above. This indicates that the LC–MS acquisition conditions used in the two laboratories resulted in a smaller maxD relative to the peak widths than in the previously examined rat CSF LC–MS/MS datasets. In this case, maxD in inter-laboratory chromatograms was between 12.0 and 24.3 times the peak width at half maximum, compared to between 4.9 and 13.5 times the peak width at half maximum in the intra-laboratory analyses, leading to a smaller retention time error in determining the retention time correspondence of peaks. However, the F-test statistics comparing equality of orthogonal residual variance of the AMPPs of the two intra- and inter-laboratory LC–MS chromatogram pairs were 26.19 and 16.09, respectively, indicating that the probability of the null hypothesis that common peaks follow the same elution order is low in this dataset as well. In this case it is not possible to test peak order inversion by matching peaks using identified peptide sequence information, because this dataset contains no MS/MS information.
The high test statistics and low probabilities of the F-test’s null hypothesis are good indicators of peak elution order inversion; however, the maximum retention time difference between peaks changing elution order is much more limited than in the previously examined rat CSF dataset. This leads to the conclusion that in this dataset the maximal error of retention time difference between corresponding peaks is smaller, but still considerable.
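The variance comparison behind the F-test statistics quoted above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `variance_ratio_f`, the residual values and the choice to place the larger sample variance in the numerator are all assumptions made for the example.

```python
from statistics import variance


def variance_ratio_f(residuals_a, residuals_b):
    """Return (F, df_numerator, df_denominator) for two residual samples.

    Assumption for this sketch: the larger sample variance goes in the
    numerator, so F >= 1.
    """
    var_a = variance(residuals_a)  # unbiased sample variance
    var_b = variance(residuals_b)
    df_a = len(residuals_a) - 1
    df_b = len(residuals_b) - 1
    if min(var_a, var_b) == 0:
        raise ValueError("a residual sample has zero variance")
    if var_a >= var_b:
        return var_a / var_b, df_a, df_b
    return var_b / var_a, df_b, df_a


# Hypothetical orthogonal residuals (minutes) around the monotonic trend.
intra_lab = [-0.2, 0.1, 0.0, 0.3, -0.1, -0.1]
inter_lab = [-2.0, 1.0, 0.0, 3.0, -1.0, -1.0]

f_stat, df_num, df_den = variance_ratio_f(inter_lab, intra_lab)
print(f"F = {f_stat:.2f} with ({df_num}, {df_den}) degrees of freedom")
```

Turning the statistic into a p-value additionally requires the survival function of the F distribution, which is not part of the Python standard library; `scipy.stats.f.sf(f_stat, df_num, df_den)` is one option if SciPy is available.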


Fig. 5. Extracted ion chromatograms (EIC) of three peptides after aligning one of the chromatograms to the other. The chromatograms were obtained from sample 6 of the rat CSF dataset, analyzed in two laboratories. In (a) the chromatogram of laboratory 1 was aligned to the chromatogram of laboratory 2, while in (b) the chromatogram of laboratory 2 was aligned to the chromatogram of laboratory 1. Peptide LTLPQLEIR (green arrows) is located on the monotonic retention time correspondence function, while the peptides DIAPTLTLYVGK (red arrows) and VHQFFNVGLIQPGSVK (blue arrows) are located far from this function. Retention time alignment using the main monotonic retention time correspondence function provides well-aligned peaks for the first peptide (green traces). The two other peptides (red and blue arrows) suffer from considerable misalignment, with a retention time error close to maxD, due to peak elution order inversion in the dataset. The locations of the three peaks are shown in the scatter plot of Fig. 4 with corresponding red, green and blue circles. The EICs are normalized to the highest peaks and the Y axis represents ion counts relative to the most abundant signal intensity in the three EICs. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

4.3.3. Robustness of the alignment function

We have further tested the robustness of the intensity-rank-based peak matching algorithm and of the approach based on matching peptides with identified sequences and PTMs, with subsequent alignment of all chromatogram pairs of the same samples acquired in two laboratories (7 and 24 pairs for the rat CSF and mouse serum datasets, respectively). Fig. S11 in the supporting information shows overlay plots of the main monotonic retention time correspondence trends that were obtained using inter-laboratory LC–MS(/MS) chromatogram pairs of the same samples. In both datasets, all chromatograms of one laboratory were acquired in the same analysis batch; it is therefore expected that the retention time correspondence functions in inter-laboratory LC–MS(/MS) chromatogram pairs of the same sample are highly similar. The obtained plots show that the main retention time correspondence functions are indeed highly similar, independent of the peak matching procedure and the type of peak list. The main monotonic retention time correspondence function obtained with intensity-rank-based peak matching of single-stage LC–MS peak lists (Fig. S11c) shows a slightly higher variability than the analysis of precursor ion LC–MS peak lists using peak matching based on the agreement of identified peptide sequences and PTMs (Fig. S11a). This is most probably due to the higher ratio of IMPPs that were assigned as AMPPs by the intensity-rank-based peak matching method, which translates into a slightly less precise determination of the main monotonic retention time correspondence trend. However, this effect is limited and has little influence on the computational accuracy of the main monotonic retention time trend, even in datasets with considerable peak elution order inversion of common compounds.
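The quantities discussed in Section 4.3, peak pairs that change elution order and their distance from the main monotonic trend, can be illustrated with a small sketch. It is not the authors' algorithm: the peak pairs are invented, the trend is simplified to a straight line with an assumed slope and intercept (the paper fits it with LOWESS), and the largest orthogonal distance computed here only approximates the idea behind maxD, whose exact definition is given elsewhere in the paper.

```python
from itertools import combinations
from math import sqrt


def count_order_inversions(pairs):
    """Count pairs of matched peaks whose elution order differs between runs."""
    inversions = 0
    for (x1, y1), (x2, y2) in combinations(pairs, 2):
        # Opposite signs mean the two peaks swap order from run 1 to run 2.
        if (x1 - x2) * (y1 - y2) < 0:
            inversions += 1
    return inversions


def orthogonal_distances(pairs, slope, intercept):
    """Signed orthogonal distance of each point to y = slope * x + intercept."""
    norm = sqrt(1.0 + slope * slope)
    return [(y - (slope * x + intercept)) / norm for x, y in pairs]


# Hypothetical matched peak pairs: (retention time in run 1, run 2), in minutes.
pairs = [(10.0, 12.0), (20.0, 22.0), (30.0, 27.0),
         (40.0, 42.0), (50.0, 52.0), (25.0, 33.0)]
slope, intercept = 1.0, 2.0  # assumed linear stand-in for the monotonic trend

n_inverted = count_order_inversions(pairs)
max_distance = max(abs(d) for d in orthogonal_distances(pairs, slope, intercept))
print(f"{n_inverted} inverted pair(s), largest orthogonal distance "
      f"{max_distance:.2f} min")
```

Because a monotonic function can never map two peaks into swapped order, every inverted pair contributes an alignment error that no choice of monotonic trend removes.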


5. Conclusions and outlook

We have developed a quality control method that tests whether the two critical conditions for aligning two LC–MS(/MS) chromatograms with uniformly high accuracy are met: identification of a sufficient number of common peaks and a high probability that these peaks follow the same elution order. We have observed that the elution position of the same peptide changes by several peak widths at half maximum in complex proteomics datasets that were acquired under dissimilar chromatographic conditions, as often occurs in multi-laboratory setups. Peak elution order changes of common compounds result in an increased error, maxD, in determining corresponding peak locations in two chromatograms by time alignment algorithms using only m/z and retention time information. Peak elution order changes of common compounds not only limit retention time alignment of LC–MS data, but also present one of the major obstacles to the use of the accurate mass and time tag (AMT) approach for matching peaks in two single-stage LC–MS chromatograms [39–41] using retention time and m/z information. AMT can only be applied to data that were acquired under well-controlled chromatographic conditions, which implies a highly similar elution order of common compounds in the two chromatograms. Tests for peak elution order inversion of common compounds should be performed before applying the AMT approach to avoid possible errors in the use of time tags for peak matching. Our advice to experimental scientists who want to assess parallel separation of chromatograms acquired in different analysis batches is to analyse a representative standard sample, such as a trypsin-digested yeast standard [5], using data-dependent LC–MS/MS acquisition. The data obtained from this reference standard sample in the different analysis batches can be used to assess peak elution order inversion of common compounds due to different chromatographic conditions.
This situation occurs often in biomarker research when validation datasets are acquired in different acquisition batches, often several months or even years later than the original discovery dataset. LSS theory provides an explanation for the elution order changes and for the resulting orthogonal separation in chromatograms acquired under different chromatographic conditions, such as different column sizes, eluent flow rates or gradient elution programs. This means that common compounds follow the same elution order, and can therefore all be aligned with the same accuracy, only if all influencing parameters are kept constant. Retention time prediction algorithms using peptide sequences are becoming more and more accurate, for example by modeling the compound’s S slope in a linear model of retention factors using molecular parameters derived from peptide sequences such as molecular weight, charge and hydrophobicity [2,26,27]. With the development of accurate retention time prediction models it will be possible to predict more precisely the location of peaks with known peptide sequence in other chromatograms acquired under different chromatographic conditions, such as different gradient programs. Currently these models have only been tested on tryptic peptides, and further work is needed to extend prediction models to non-tryptic peptides, to peptides carrying post-translational or chemical modifications, and to any other chromatographic parameters that can vary between proteomics laboratories. However, generic time alignment algorithms will maintain their importance for peaks without identification and for compounds that cannot be precisely modeled by retention time prediction algorithms. The performance of the F-test may depend on the heterogeneity of the AMPP distribution in the retention space of the two chromatograms.
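How the LSS model can produce an elution order inversion when only the gradient program changes can be shown with a short numerical sketch. It uses the standard LSS gradient approximation t_R = (t0/b) log10(2.3 k0 b + 1) + t0 with b = t0 Δφ S / t_G, ignoring dwell time. The two compounds, their k0 and S values, the dead time and the gradient range are hypothetical and chosen only to make the effect visible.

```python
from math import log10


def lss_retention_time(k0, s, t0, delta_phi, t_gradient):
    """Gradient retention time from the LSS approximation (dwell time ignored).

    k0: retention factor at the start of the gradient
    s: LSS slope S of the compound
    t0: column dead time (min)
    delta_phi: change in organic modifier fraction over the gradient
    t_gradient: gradient duration (min)
    """
    b = t0 * delta_phi * s / t_gradient  # gradient steepness parameter
    return (t0 / b) * log10(2.3 * k0 * b + 1.0) + t0


# Hypothetical compounds: (k0, S). Compound B is retained more strongly at
# the start of the gradient but responds more steeply to the organic modifier.
compound_a = (1.0e3, 10.0)
compound_b = (1.0e5, 20.0)

t0 = 1.0         # assumed dead time in minutes
delta_phi = 0.5  # assumed gradient range

for t_gradient in (10.0, 100.0):
    rt_a = lss_retention_time(*compound_a, t0, delta_phi, t_gradient)
    rt_b = lss_retention_time(*compound_b, t0, delta_phi, t_gradient)
    first = "A" if rt_a < rt_b else "B"
    print(f"gradient {t_gradient:5.0f} min: A = {rt_a:6.2f} min, "
          f"B = {rt_b:6.2f} min, elutes first: {first}")
```

With these assumed values, compound B elutes first under the short gradient and compound A elutes first under the long one, so no monotonic retention time function can map one run onto the other for both compounds.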
The chromatograms in these studies show a homogeneous distribution of AMPPs, but further studies are needed to determine the ability of the F-test to detect orthogonal separation in chromatograms showing a heterogeneous distribution of AMPPs. Source code, installation guide, user manual and example datasets are available at https://trac.nbic.nl/pre-alignment/.

Acknowledgment

We thank the National Cancer Institute’s Mouse Proteomic Technology Initiative and Dr. Theo Luider (Erasmus Medical Center, Rotterdam) for providing the experimental design data set of mouse serum and the rat CSF Orbitrap data, respectively. We thank the Netherlands Proteomics Center II (Project Bioinformatics in Proteomics E4.2) and the Netherlands Bioinformatics Center (Gaining Momentum Initiative) for funding.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.chroma.2014.10.101.

References

[1] Y. Ishihama, K. Shinoda, M. Sugimoto, M. Tomita, Informatics for peptide retention properties in proteomic LC–MS, Proteomics 8 (2008) 787–798.
[2] O.V. Krokhin, Sequence-specific retention calculator. Algorithm for peptide retention prediction in ion-pair RP-HPLC: application to 300- and 100-Å pore size C18 sorbents, Anal. Chem. 78 (2006) 7785–7795.
[3] D.C. Liebler, Summary of United States human proteome organisation (HUPO) symposium entitled standardized clinical proteomics platforms, Mol. Cell. Proteomics 8 (2009) 1165–1166.
[4] P.A. Rudnick, K.R. Clauser, L.E. Kilpatrick, D.V. Tchekhovskoi, P. Neta, N. Blonder, D.D. Billheimer, R.K. Blackman, D.M. Bunk, H.L. Cardasis, A.J. Ham, J.D. Jaffe, C.R. Kinsinger, M. Mesri, T.A. Neubert, B. Schilling, D.L. Tabb, T.J. Tegeler, L. Vega-Montoto, A.M. Variyath, M. Wang, P. Wang, J.R. Whiteaker, L.J. Zimmerman, S.A. Carr, S.J. Fisher, B.W. Gibson, A.G. Paulovich, F.E. Regnier, H. Rodriguez, C. Spiegelman, P. Tempst, D.C. Liebler, S.E. Stein, Performance metrics for liquid chromatography-tandem mass spectrometry systems in proteomics analyses, Mol. Cell. Proteomics 9 (2010) 225–241.
[5] A.G. Paulovich, D. Billheimer, A.J. Ham, L. Vega-Montoto, P.A. Rudnick, D.L. Tabb, P. Wang, R.K. Blackman, D.M. Bunk, H.L. Cardasis, K.R. Clauser, C.R. Kinsinger, B. Schilling, T.J. Tegeler, A.M. Variyath, M. Wang, J.R. Whiteaker, L.J. Zimmerman, D. Fenyo, S.A. Carr, S.J. Fisher, B.W. Gibson, M. Mesri, T.A. Neubert, F.E. Regnier, H. Rodriguez, C. Spiegelman, S.E. Stein, P. Tempst, D.C. Liebler, Interlaboratory study characterizing a yeast performance standard for benchmarking LC–MS platform performance, Mol. Cell. Proteomics 9 (2010) 242–254.
[6] K.M. Aberg, E. Alm, R.J. Torgrip, The correspondence problem for metabonomics datasets, Anal. Bioanal. Chem. 394 (2009) 151–162.
[7] C. Christin, R. Bischoff, P. Horvatovich, Data processing pipelines for comprehensive profiling of proteomics samples by label-free LC–MS for biomarker discovery, Talanta 83 (2011) 1209–1224.
[8] C. Christin, H.C. Hoefsloot, A.K. Smilde, F. Suits, R. Bischoff, P.L. Horvatovich, Time alignment algorithms based on selected mass traces for complex LC–MS data, J. Proteome Res. 9 (2011) 1483–1495.
[9] C. Christin, A.K. Smilde, H.C. Hoefsloot, F. Suits, R. Bischoff, P.L. Horvatovich, Optimized time alignment algorithm for LC–MS data: correlation optimized warping using component detection algorithm-selected mass chromatograms, Anal. Chem. 80 (2008) 7012–7021.
[10] F. Suits, J. Lepre, P. Du, R. Bischoff, P. Horvatovich, Two-dimensional method for time aligning liquid chromatography–mass spectrometry data, Anal. Chem. 80 (2008) 3095–3104.
[11] G. Tomasi, F. van den Berg, C. Andersson, Correlation optimized warping and dynamic time warping as preprocessing methods for chromatographic data, J. Chemom. 18 (2004) 231–241.
[12] J. Listgarten, M.R. Neal, T.S. Roweis, A. Emili, Multiple Alignment of Continuous Time Series, The MIT Press, Cambridge, 2005.
[13] L.N. Mueller, O. Rinner, A. Schmidt, S. Letarte, B. Bodenmiller, M.Y. Brusniak, O. Vitek, R. Aebersold, M. Muller, SuperHirn—a novel tool for high resolution LC–MS-based peptide/protein profiling, Proteomics 7 (2007) 3470–3480.
[14] T. Pluskal, S. Castillo, A. Villar-Briones, M. Oresic, MZmine 2: modular framework for processing, visualizing, and analyzing mass spectrometry-based molecular profile data, BMC Bioinform. 11 (2010) 395.
[15] A. Prakash, P. Mallick, J. Whiteaker, H. Zhang, A. Paulovich, M. Flory, H. Lee, R. Aebersold, B. Schwikowski, Signal maps for mass spectrometry-based comparative proteomics, Mol. Cell. Proteomics 5 (2006) 423–432.
[16] B. Voss, M. Hanselmann, B.Y. Renard, M.S. Lindner, U. Kothe, M. Kirchner, F.A. Hamprecht, SIMA: simultaneous multiple alignment of LC/MS peak lists, Bioinformatics 27 (2011) 987–993.


[17] K. Podwojski, A. Fritsch, D.C. Chamrad, W. Paul, B. Sitek, K. Stuhler, P. Mutzel, C. Stephan, H.E. Meyer, W. Urfer, K. Ickstadt, J. Rahnenfuhrer, Retention time alignment algorithms for LC/MS data must consider non-linear shifts, Bioinformatics 25 (2009) 758–764.
[18] E. Lange, R. Tautenhahn, S. Neumann, C. Gropl, Critical assessment of alignment procedures for LC–MS proteomics and metabolomics measurements, BMC Bioinform. 9 (2008) 375.
[19] N.-P.V. Nielsen, J.M. Carstensen, J. Smedsgaard, Aligning of single and multiple wavelength chromatographic profiles for chemometric data analysis using correlation optimised warping, J. Chromatogr. A 805 (1998) 17–35.
[20] M. Vandenbogaert, S. Li-Thiao-Te, H.M. Kaltenbach, R. Zhang, T. Aittokallio, B. Schwikowski, Alignment of LC–MS images, with applications to biomarker discovery and protein identification, Proteomics 8 (2008) 650–672.
[21] K. Marc, S. Benjamin, S. Hanno, A.J.S. Judith, A.H. Fred, amsrpm: robust point matching for retention time aligment of LC/MS data with R, J. Stat. Softw. 18 (2007) 12.
[22] L.R. Snyder, J.W. Dolan, High-Performance Gradient Elution: The Practical Application of the Linear-Solvent-Strength Model, Wiley-Interscience, Hoboken, 2007.
[23] T.G. Bloemberg, J. Gerretzen, H.J.P. Wouters, J. Gloerich, M. van Dael, H.J.C.T. Wessels, L.P. van den Heuvel, P.H.C. Eilers, L.M.C. Buydens, R. Wehrens, Improved parametric time warping for proteomics, Chemom. Intell. Lab. Syst. 104 (2010) 65–74.
[24] J.L. Glajch, M.A. Quarry, J.F. Vasta, L.R. Snyder, Separation of peptide mixtures by reversed-phase gradient elution. Use of flow rate changes for controlling band spacing and improving resolution, Anal. Chem. 58 (1986) 280–285.
[25] L.V. Perez-Arribas, F.J. Manuel de Villena-Rueda, M.E. Leon-Gonzalez, R. Gonzalo-Lumbreras, L.M. Polo-Diez, New approach to optimize HPLC separations of acid-base compounds with elution order involved, by using combined three-band resolution maps, Anal. Bioanal. Chem. 396 (2010) 2647–2656.
[26] V. Spicer, M. Grigoryan, A. Gotfrid, K.G. Standing, O.V. Krokhin, Predicting retention time shifts associated with variation of the gradient slope in peptide RP-HPLC, Anal. Chem. 82 (2010) 9678–9685.
[27] H. Vu, V. Spicer, A. Gotfrid, O.V. Krokhin, A model for predicting slopes S in the basic equation for the linear-solvent-strength theory of peptide separation by reversed-phase high-performance liquid chromatography, J. Chromatogr. A 1217 (2010) 489–497.
[28] M.R. Euerby, P. Petersson, Chromatographic classification and comparison of commercially available reversed-phase liquid chromatographic columns using principal component analysis, J. Chromatogr. A 994 (2003) 13–36.
[29] A.B. Chakraborty, S.J. Berger, Optimization of reversed-phase peptide liquid chromatography ultraviolet mass spectrometry analyses using an automated blending methodology, J. Biomol. Tech. 16 (2005) 327–335.
[30] E. Reh, U. Kapfer, Examination of new stationary phases for the reversed-phase chromatography of peptides, Chromatographia 30 (1990) 663–674.
[31] L. Csenki, E. Alm, R.J. Torgrip, K.M. Aberg, L.I. Nord, I. Schuppe-Koistinen, J. Lindberg, Proof of principle of a generalized fuzzy Hough transform approach to peak alignment of one-dimensional 1H NMR data, Anal. Bioanal. Chem. 389 (2007) 875–885.
[32] E. Tengstrand, J. Lindberg, K.M. Aberg, TracMass 2—a modular suite of tools for processing chromatography-full scan mass spectrometry data, Anal. Chem. 86 (2014) 3435–3442.
[33] T. Rosenling, M.P. Stoop, A. Attali, H. van Aken, E. Suidgeest, C. Christin, C. Stingl, F. Suits, P. Horvatovich, R.Q. Hintzen, T. Tuinstra, R. Bischoff, T.M. Luider, Profiling and identification of cerebrospinal fluid proteins in a rat EAE model of multiple sclerosis, J. Proteome Res. 11 (2012) 2048–2060.
[34] O. Kohlbacher, K. Reinert, C. Gropl, E. Lange, N. Pfeifer, O. Schulz-Trieglaff, M. Sturm, TOPP—the OpenMS proteomics pipeline, Bioinformatics 23 (2007) e191–e197.
[35] M. Sturm, A. Bertsch, C. Gropl, A. Hildebrandt, R. Hussong, E. Lange, N. Pfeifer, O. Schulz-Trieglaff, A. Zerck, K. Reinert, O. Kohlbacher, OpenMS—an open-source software framework for mass spectrometry, BMC Bioinform. 9 (2008) 163.
[36] F.N. Fritsch, R.E. Carlson, Monotone piecewise cubic interpolation, SIAM J. Numer. Anal. 17 (1980) 238–246.
[37] P. Horvatovich, N.I. Govorukhina, T.H. Reijmers, A.G. van der Zee, F. Suits, R. Bischoff, Chip-LC–MS for label-free profiling of human serum, Electrophoresis 28 (2007) 4493–4505.
[38] A. Michalski, J. Cox, M. Mann, More than 100,000 detectable peptide species elute in single shotgun proteomics runs but the majority is inaccessible to data-dependent LC–MS/MS, J. Proteome Res. 10 (2011) 1785–1793.
[39] J.S. Zimmer, M.E. Monroe, W.J. Qian, R.D. Smith, Advances in proteomics data analysis and display using an accurate mass and time tag approach, Mass Spectrom. Rev. 25 (2006) 450–482.
[40] L. Pasa-Tolic, C. Masselon, R.C. Barry, Y. Shen, R.D. Smith, Proteomic analyses using an accurate mass and time tag strategy, Biotechniques 37 (2004) 621–624 (626–633, 636 passim).
[41] R.D. Smith, G.A. Anderson, M.S. Lipton, L. Pasa-Tolic, Y. Shen, T.P. Conrads, T.D. Veenstra, H.R. Udseth, An accurate mass tag strategy for quantitative and high-throughput proteome measurements, Proteomics 2 (2002) 513–523.
