time-of-flight mass spectrometry-based metabolomics data.

Article pubs.acs.org/ac

Ion Trace Detection Algorithm to Extract Pure Ion Chromatograms to Improve Untargeted Peak Detection Quality for Liquid Chromatography/Time-of-Flight Mass Spectrometry-Based Metabolomics Data

San-Yuan Wang,†,‡ Ching-Hua Kuo,‡,§,∥,# and Yufeng J. Tseng*,†,‡,§,⊥,#

†Department of Computer Science and Information Engineering, ‡The Metabolomics Core Laboratory, Center of Genomic Medicine, §School of Pharmacy, College of Medicine, ∥Department of Pharmacy, National Taiwan University Hospital, ⊥Graduate Institute of Biomedical Electronics and Bioinformatics, #Drug Research Center, College of Medicine, National Taiwan University, 106 Taipei, Taiwan

ABSTRACT: Able to detect known and unknown metabolites, untargeted metabolomics has shown great potential in identifying novel biomarkers. However, elucidating all possible liquid chromatography/time-of-flight mass spectrometry (LC/TOF-MS) ion signals in a complex biological sample remains challenging since many ions are not the products of metabolites. Methods of reducing ions not related to metabolites, or of directly detecting metabolite-related (pure) ions, are important. In this work, we describe PITracer, a novel algorithm that accurately detects the pure ions of an LC/TOF-MS profile to extract pure ion chromatograms and detect chromatographic peaks. PITracer estimates the relative mass difference tolerance of ions and calibrates the mass over charge (m/z) values for peak detection algorithms, with an additional option for further mass correction with respect to a user-specified metabolite. PITracer was evaluated using two data sets: one containing 373 human metabolite standards, including 5 saturated standards that appear as split peaks because of large m/z fluctuations, and one containing 12 urine samples spiked with 50 forensic drugs at varying concentrations. Analysis of these data sets shows that PITracer outperformed the existing state-of-the-art algorithm: it extracted the pure ion chromatograms of the 5 saturated standards without generating split peaks and detected the forensic drugs with high recall, precision, and F-score and small mass error.

Liquid chromatography/time-of-flight mass spectrometry (LC/TOF-MS) has recently become an important and standard technique for metabolomics studies associated with environmental and stress,1−3 functional genomics,4−6 biomarker discovery,7−10 and integrative systems biology.11−14 Metabolomics experiments usually follow one of two main strategies: targeted or untargeted studies. Targeted metabolomics allows for the simultaneous, absolute quantification of a limited number of known metabolites in biological samples by comparing with internal standards.15−21 Untargeted metabolomics tries to detect all metabolites, both known and unknown, in a sample.20 With the ability to detect a larger number of molecules, untargeted metabolomics has shown great potential in identifying novel biomarkers. For example, untargeted metabolomics approaches were used to identify novel substrates of various enzymes, including N-acyl taurines for fatty acid amide hydrolase22 and fatty acids for different families of human cytochrome P450 enzymes.23 In addition, untargeted metabolomics was applied successfully to identify both known and novel vitamin E metabolites down-regulated upon activation of the pregnane X receptor, a member of the mammalian nuclear receptor superfamily.24 In bacterial systems, an untargeted metabolomics approach was recently used to compare hydrophobic metabolite profiles of Pseudomonas aeruginosa strains lacking a functional pyochelin gene cluster.25 These experiments revealed that this cluster regulates many metabolites, in addition to pyochelin, including a family of novel metabolites that were characterized as 2-alkyl-4,5-dihydrothiazole-4-carboxylates. Moreover, it has been successfully applied in discovering key metabolites responsible for the bacterial protection mechanisms in the high proton concentration and metal-rich environment of biofilms growing in pH ∼0.9 acid mine drainage.26 However, elucidating all of the possible LC/TOF-MS ion signals in a complex biological sample remains challenging. For example, the electrospray ionization process in LC/TOF-MS experiments tends to produce 90% or more ions not produced from metabolites.27 Therefore, methods of reducing any ions not related to metabolites, or of directly detecting metabolite-related ions, are of interest. Two major ion sources

Received: December 18, 2014 Accepted: January 26, 2015


DOI: 10.1021/ac504711d Anal. Chem. XXXX, XXX, XXX−XXX

Analytical Chemistry


and to identify peaks in the pure ion chromatograms. Before execution of PITracer, the LC/TOF-MS raw data must first be converted to the mzXML44,45 or mzML46 format in centroid mode. In the first step, the relative mass difference tolerance is estimated based on the distribution of relative mass differences between the adjacent m/z values in a sorted m/z value list. In the second step, the most abundant base ion is extracted as the reference ion used to calibrate the other m/z values to reduce the relative mass differences (this calibration step is optional; for details, please see the Supporting Information). In the third step, the pure ion chromatograms are extracted according to the estimated relative mass tolerance of the calibrated LC/TOF-MS profile. Finally, the peaks in each pure ion chromatogram are detected recursively based on a scoring function presented in our previously implemented peak detection algorithm, TIPick.33 Additionally, the m/z values of the peaks reported by TIPick can optionally be corrected according to a known (user-specified) metabolite present in the samples. The overall procedure of PITracer is illustrated in Figure 1.

not produced by the analytes themselves originate from random noise and background ions.28 While the random noise is inevitably generated by commonly used ionization sources in LC/TOF-MS,29−31 background ions are mainly the result of the contribution from contaminants, the mobile phase, buffers, and the sample matrix.28,32,33 One could use background subtraction and chromatogram selection algorithms to reduce the background ions and the random noise effect.32−36 However, these algorithms can falsely remove signals and generate artificial split peaks that complicate subsequent data analysis if the parameters of the algorithms are not selected appropriately.33 Another strategy is therefore to identify the “pure” ions, which are defined as the ions that originate from the same analyte. It is not easy to extract pure ions, however; split peaks and incorrect assignments can often be found as a result of ions from a peak being split into more than two adjacent mass over charge (m/z) bins and of ions from coeluting peaks with extremely similar m/z values being assigned to the same bin, despite the use of a smaller bin size.37−41 The XCMS package (matchedFilter algorithm) combines the maximum signal intensity from adjacent extracted ion chromatograms (EICs) to reduce this binning issue; however, coeluting peaks with extremely similar m/z values assigned to the same bin can still be seen.37 Another function of the XCMS package, centWave, utilizes a findROI function that locates the region containing potentially interesting masses, and then finds a region of interest (ROI) based on the mass accuracy and determines whether the ROI is contributed by an analyte based on the expected chromatographic peak width. However, the relative mass differences of the adjacent ions in a chromatographic peak are proportional to the intensities of the ions,33,42 and the relative mass difference tolerances of different LC/TOF-MS profiles might vary. Moreover, fluctuations of the ambient temperature can affect the mass accuracy of TOF instruments due to the high voltage drift during heating from the power supplied and other voltage fluctuations.43 Hence, algorithms using a fixed relative mass difference tolerance would also produce split peaks due to the inconsistent relative mass differences. To solve the aforementioned problems, we provide a novel way to extract the pure ions for untargeted metabolomics studies in LC/TOF-MS data, called PITracer (pure ion tracer). PITracer can estimate the relative mass difference tolerance adaptively for each scan and calibrate m/z values to remove ions that are not continually present in the LC/TOF-MS profile. As a result, purer ion chromatograms can be generated for more precise chromatographic peak detection. Also, the file sizes of LC/TOF-MS profiles can be reduced significantly when compared to current untargeted LC/TOF-MS studies. PITracer allows the full analytical power of LC/TOF-MS to be utilized for automated exploratory analytical chemistry. The performance of PITracer was demonstrated on two data sets containing (1) 373 human metabolite standards and (2) 12 urine samples spiked with 50 forensic drugs at varying concentrations, and was benchmarked against other conventional software.

Figure 1. Workflow of PITracer.

Estimation of the Relative Mass Difference Tolerance. In this step (the first step in the overall procedure), all m/z values in an LC/TOF-MS profile are sorted. Then the relative mass differences between adjacent m/z values in the sorted list are calculated with eq 1.

$$\delta_o = \frac{\left|(m/z)_o - (m/z)_{o-1}\right|}{(m/z)_o} \times 10^{6} \qquad (1)$$

where o is the order of the m/z value in the sorted list. The list of the relative mass differences comprises two components. The first component consists of the relative mass differences between m/z values from the same metabolite or from metabolites with the same molecular weight; it is caused by measurement variation and tends to be extremely small. The second component consists of the differences between m/z values of different analytes or random noise. Because the m/z values are sorted before the relative mass differences are calculated, the distribution of the relative differences between m/z values of different analytes or random noise can be approximated by a uniform distribution, so long as the signal and noise data points are independent.47 To estimate the relative mass difference tolerance, we used a Gaussian kernel to calculate the density of each relative mass difference and then used the average density plus the standard deviation of the densities of the relative differences as the minimum density of the relative mass differences between m/z values in the same peak. However, the relative mass differences of adjacent true ions in a saturated chromatographic peak are proportional to their
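The tolerance estimation described in this step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the PITracer implementation: the function names, the kernel bandwidth of 1 ppm, and the rule that the tolerance is the largest relative mass difference whose density reaches the cutoff (average density plus one standard deviation) are all choices made for this sketch.

```python
import math
from statistics import mean, pstdev


def relative_mass_differences(mz_values):
    """Eq 1: ppm difference between adjacent m/z values in the sorted list."""
    mz = sorted(mz_values)
    return [abs(mz[o] - mz[o - 1]) / mz[o] * 1e6 for o in range(1, len(mz))]


def gaussian_kde(points, bandwidth):
    """Gaussian kernel density estimate evaluated at each input point."""
    norm = 1.0 / (len(points) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - y) / bandwidth) ** 2) for y in points)
        for x in points
    ]


def estimate_tolerance(mz_values, bandwidth=1.0):
    """Return a tolerance in ppm, or None if no difference is dense enough.

    Assumption of this sketch: the tolerance is the largest relative mass
    difference whose density is at least mean(density) + SD(density).
    The bandwidth (in ppm) is a hypothetical default.
    """
    deltas = relative_mass_differences(mz_values)
    if not deltas:
        return None
    densities = gaussian_kde(deltas, bandwidth)
    cutoff = mean(densities) + pstdev(densities)
    dense = [d for d, rho in zip(deltas, densities) if rho >= cutoff]
    return max(dense) if dense else None


# Toy profile: one ion measured repeatedly with about 2 ppm spacing,
# plus widely scattered noise m/z values.
signal = [200.0 + 0.0004 * k for k in range(11)]
noise = [300.0 + 0.5 * k * k for k in range(1, 21)]
print(estimate_tolerance(signal + noise))
```

In this toy profile the small differences within the repeated ion form a dense cluster near 2 ppm, while the differences between unrelated m/z values are spread thinly, so only the former exceed the density cutoff.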

■ WORKFLOW, METHODS, AND MATERIALS

Experimental Section. Chemicals, sample preparation, and chromatographic and mass spectrometric analysis are detailed in the Supporting Information.

Overall Procedures of the Main Algorithm. PITracer comprises four major steps to extract pure ion chromatograms




relative mass difference tolerance ($\hat{\delta}_{i,j}$) or the relative mass difference of the base ion in each adjacent scan ($\delta_{i,b}$). The relative mass difference of the base ion is calculated with the equation given in the Supporting Information (S-1). We add an edge from ion$_{i-1,k}$ to ion$_{i,j}$ when

$$\frac{\left|(m/z')_{i,j} - (m/z')_{i-1,k}\right|}{(m/z')_{i,j}} \le \frac{\max(\hat{\delta}_{i,j}, \delta_{i,b})}{10^{6}}$$

and either $(m/z')_{i-1,k+1} \ge (m/z')_{i,j} \ge (m/z')_{i-1,k} \ge (m/z')_{i,j-1}$ or $(m/z')_{i-1,k-1} \le (m/z')_{i,j} \le (m/z')_{i-1,k} \le (m/z')_{i,j+1}$. The left part of Figure 2A shows the result of adding edges. The black edges indicate ion pairs whose relative mass differences are less than the estimated relative mass difference tolerance, and the green edges indicate ion pairs whose relative mass differences are larger than the estimated relative mass difference tolerance. (B) Then, construct the pure ion trace. We first sort the edges according to the relative mass differences of the ion pairs and select the edge with the minimum relative mass difference (the red line in Figure 2B). We then check the trace id of the ion on the left of the edge (in the (i − 1)th scan). If the trace id of this ion is zero, we increase TraceIDlast and assign the new TraceIDlast to the ion pair (the trace id of the ion pair is 1 in Figure 2B); otherwise, we assign the trace id of the left ion in the (i − 1)th scan to the ion on its right in the ith scan. Finally, all of the edges of the selected ions are removed (the gray dashed lines and red solid lines in Figure 2B). (C) If any edge remains with a relative mass difference smaller than the estimated relative mass difference tolerance, step 2B is repeated, without sorting the edges again, to select another trace (left part of Figure 2C). If no ion pair with a relative mass difference under the estimated tolerance remains, the tracker continues to the next scan and performs step 2A (the right part of Figure 2C).
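The edge construction and greedy trace assignment of step 2, together with the minimum-length filter of the cleanup step, can be sketched as follows. This is a simplified illustration rather than the PITracer implementation: it assumes a single fixed tolerance `tol_ppm` instead of the per-ion tolerance and base-ion term, it omits the neighbor-ordering conditions and the trace merging of step 3, and the function name and data layout (one list of m/z values per scan) are hypothetical.

```python
def trace_ions(scans, tol_ppm, min_length=5):
    """Assign a trace id to every ion; 0 means the ion belongs to no trace.

    scans: list of scans, each a list of m/z values (hypothetical layout).
    tol_ppm: fixed relative mass difference tolerance (simplifying assumption).
    min_length: minimum number of ions a trace must contain to be kept.
    """
    trace_ids = [[0] * len(scan) for scan in scans]
    last_id = 0
    for i in range(1, len(scans)):
        prev, cur = scans[i - 1], scans[i]
        # Step 2A: candidate edges between scan i-1 and scan i within tolerance.
        edges = []
        for k, mz_prev in enumerate(prev):
            for j, mz_cur in enumerate(cur):
                delta = abs(mz_cur - mz_prev) / mz_cur * 1e6
                if delta <= tol_ppm:
                    edges.append((delta, k, j))
        # Steps 2B and 2C: sort once, then repeatedly take the remaining edge
        # with the smallest difference; skipping ions that are already used
        # has the same effect as deleting their other edges.
        edges.sort()
        used_prev, used_cur = set(), set()
        for _, k, j in edges:
            if k in used_prev or j in used_cur:
                continue
            if trace_ids[i - 1][k] == 0:
                last_id += 1
                trace_ids[i - 1][k] = last_id
            trace_ids[i][j] = trace_ids[i - 1][k]
            used_prev.add(k)
            used_cur.add(j)
    # Cleanup: discard traces that contain fewer than min_length ions.
    counts = {}
    for row in trace_ids:
        for t in row:
            if t:
                counts[t] = counts.get(t, 0) + 1
    return [
        [t if counts.get(t, 0) >= min_length else 0 for t in row]
        for row in trace_ids
    ]


scans = [
    [200.0000, 350.0],
    [200.0004, 200.0030, 410.0],
    [200.0002, 500.0],
]
print(trace_ions(scans, tol_ppm=6.0, min_length=3))
```

Because each ion can be claimed by at most one edge per scan pair, two ions in the same scan never share a trace id, which is what keeps coeluting analytes with similar m/z values in separate chromatograms.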
(3) Ion trace merging: for any two neighboring mass traces, if the relative mass difference of the median m/z values of the two mass traces is smaller than the estimated relative mass difference tolerance, and the distance between their retention times is smaller than a user-defined value (the default value is 1 s), then the two ion traces are merged into a single ion trace.

(4) Cleanup: (A) Remove all the ions, which were found having any connection, and (B) remove all the traces that contain fewer than Lmin ions, where Lmin is the minimum length of the ion trace in scans. Once the trace id of each ion is known, the intensities and scan indices of the ions with the same trace id are used directly as a pure ion chromatogram. Because no two ions in the same scan share a trace id, the trace ids of the ions of coeluting analytes will be different. The pure ion chromatograms are then used for chromatographic peak detection (the last step in the overall procedure).

Performance Evaluation of PITracer for Pure Ion Chromatogram Extraction. To evaluate the performance of PITracer for pure ion chromatogram extraction, we used 507 chemical standards to create the evaluation data set. The 507 chemical standards were divided into 31 mixtures for chromatographic and mass spectrometric analysis, and the data were collected using both positive and negative ionization modes. Among them, 373 chemical standards were clearly visible in the LC/TOF-MS experiments and were used as the evaluation data set. To evaluate the performance of PITracer in chromatograms with saturated chromatographic peaks, we examined the EICs of the 373 chemical standards of the LC/TOF-MS data set with the tolerance of relative mass accuracy set at ±10 ppm. The relative mass accuracy is defined as ((m/zE) − (m/zT))/(m/

intensities.33,42 Because the maximum intensity of the saturated chromatographic peaks ($I_{\mathrm{saturated}}$) and the corresponding relative mass difference tolerance ($\delta_{\mathrm{saturated}}$) are either known or can be easily assessed, we can directly assign the relative mass difference tolerance ($\hat{\delta}_{i,j}$) according to the intensity ($I_{i,j}$) of the jth ion in the ith scan with eq 2 to resolve the issue of the higher relative mass differences of the adjacent pure ions in a saturated chromatographic peak.

$$\hat{\delta}_{i,j} = \begin{cases} \delta_{\mathrm{saturated}} & \text{if } I_{i,j} \ge R \times I_{\mathrm{saturated}} \\ \hat{\delta}_{\mathrm{overall}} & \text{otherwise} \end{cases}, \qquad R = \frac{p(x = 0.6745)}{p(x = 0)} \qquad (2)$$
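As an illustration of eq 2, the following Python sketch computes R from the standard normal density and selects the tolerance for a single ion. The function names are hypothetical, and the numeric inputs in the usage lines (an overall tolerance of 6.22 ppm, a saturated tolerance of 13 ppm, and a saturation intensity of 1 050 000 counts) are taken from the values reported later in this article for illustration only.

```python
import math


def standard_normal_pdf(x):
    """Probability density function p(x) of the standard normal distribution."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)


# R = p(x = 0.6745) / p(x = 0), as defined alongside eq 2.
R = standard_normal_pdf(0.6745) / standard_normal_pdf(0.0)


def ion_tolerance(intensity, delta_overall, delta_saturated, i_saturated):
    """Eq 2: use the wider tolerance for ions close to saturation."""
    if intensity >= R * i_saturated:
        return delta_saturated
    return delta_overall


print(round(R, 4))
print(ion_tolerance(900_000, 6.22, 13.0, 1_050_000))  # above R * I_saturated
print(ion_tolerance(500_000, 6.22, 13.0, 1_050_000))  # below R * I_saturated
```

With these inputs R is about 0.797, so ions above roughly 836 000 counts receive the saturated tolerance and all other ions keep the overall estimate.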

In eq 2, $\hat{\delta}_{\mathrm{overall}}$ is the estimated relative mass difference between m/z values in the same peak, R is the ratio of the intensity of the ion with higher relative mass difference to the intensity of the saturated ion, and $p(x) = \frac{1}{\sqrt{2\pi}} e^{-x^{2}/2}$ is the probability density function of the standard normal distribution.

Extraction of Pure Ion Traces Using the Estimated Relative Mass Difference Tolerance Based on Density, Intensity, and Time. To avoid incorrect assignments when using the binning technique, we developed a computing approach to directly detect and collect the ion traces of pure ions that originate from the same analyte. Figure 2 shows the

Figure 2. Schematic illustration of the pure ion trace detection of LC/TOF-MS data. Three consecutive scans are shown in more detail with m/z value on the y-axis and scan index on the x-axis. (A) Initialization: the tracker finds the ion pairs in the (i − 1)th and the ith scans with relative mass differences under the relative mass difference tolerance, where i = 2 (black edges). (B) Then, PITracer selects the edge with the minimum relative mass difference (red edge), assigns the trace id to the ion pair, and removes the other edges connected to the ion pair (gray dashed edges). (C and D) After assigning the trace ids to the ions in the (i − 1)th and ith scans, the tracker repeats the same process with the ions in the ith and (i + 1)th scans.

workflow of the pure ion trace detection process. Each ion trace consists of the ions that appear continuously over a window of retention time within a machine-dependent relative mass difference tolerance. The ion traces are obtained by the following processes. (1) Initialization: We first set the last trace id (TraceIDlast) and the trace id (TraceIDi,j) of each ion to 0:

$$\forall\, i = 1, \ldots, N,\ j = 1, \ldots, M:\quad \mathrm{TraceID}_{i,j} = 0, \qquad \mathrm{TraceID}_{\mathrm{last}} = 0$$

where N is the number of scans and M is the number of ions in each scan. (2) For each scan i = 2, ..., N, (A) find the ion pairs in the (i − 1)th and the ith scans satisfying the condition of the relative mass differences of the ion pairs being less than the




zT)) × 10^6, where (m/zE) is the m/z value provided by the mass spectrometry data and (m/zT) is the m/z value calculated from the empirical formula of each chemical standard. We manually inspected the EIC of each chemical standard and observed that 5 chemical standards appeared as split peaks because of instrumental saturation. The chromatograms of these 5 chemical standards were used to evaluate the chromatogram extraction and peak detection of PITracer when larger fluctuations in m/z values occur.

Performance Evaluation of PITracer for Untargeted Peak Detection. To evaluate the performance of PITracer for untargeted peak detection, we spiked 50 forensic drugs at two concentrations (high, 10 times the cut off values listed in Table S-1; and low, 1 times the cut off values of each forensic drug) into 6 urine samples. The LC/TOF-MS profiles of the 12 spiked urine samples and of the metabolite mixtures that include saturated chromatographic peaks were used as the evaluation data set. The chromatograms generated by PITracer from each LC/TOF-MS profile were used in place of those generated by the targeted ion chromatogram extraction procedure of TIPick for chromatographic peak detection. One of the most well-known algorithms for chromatogram generation and peak detection in LC/TOF-MS-based metabolomics data, centWave, has been shown to have high performance among current peak detection algorithms.38 The centWave algorithm has been employed in several previous studies48−51 and was selected for comparison against the combination of PITracer and TIPick. To evaluate the performance of the above-mentioned algorithms, recall (sensitivity), precision, and F-score were selected as the measurements.

Recall is one of the most commonly used measurements for peak detection algorithms.38,52,53 The recall of peak detection is defined as ((TP)/(TP + FN)), where TP (true positive) is the number of true peaks reported by the algorithm and FN (false negative) is the number of true peaks that were not detected by the algorithm. If the peak boundary of the detected peak covers the retention time of the forensic drug and the relative mass error between the m/z value of the detected peak and the m/z value of the protonated peak of the forensic drug is within the ±20 ppm threshold, the detected peak is considered a true peak. Greater recall values indicate that more true peaks are detected. However, the recall value does not consider type I errors (false positives). Another measurement used in the evaluation is the precision value. It is defined as ((TP)/(TP + FP)), where FP (false positive) is the number of false peaks reported by the algorithm. If the peak boundary of the detected peak does not cover the retention time span of the forensic drug, the retention time difference between the detected peak and the forensic drug is less than 3 s, and the relative mass error between the m/z value of the detected peak and the m/z value of the protonated peak of the forensic drug is within the ±20 ppm threshold, then the detected peak is considered a false peak. High precision values indicate that more true positives than false positives are detected. For each algorithm, a reported peak should be present in at least two of the three duplicate injections. An ideal peak detection algorithm should generate a peak list with both high recall and high precision values. A single value that combines precision and recall is the F-score used in information retrieval.54 The F-score was used to evaluate the performance of peak detection algorithms in previous studies.33,38 The F-score is defined as ((2 × recall × precision)/(recall + precision)).
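The evaluation measures defined in this section can be sketched in Python as follows. This is an illustrative sketch, not the evaluation script used in the study: the function names and the dictionary layout for peaks and drugs are hypothetical, and the counts in the usage lines are made-up numbers chosen only to show the arithmetic.

```python
def relative_mass_error_ppm(mz_measured, mz_theoretical):
    """Relative mass accuracy: ((m/z_E) - (m/z_T)) / (m/z_T) x 10^6."""
    return (mz_measured - mz_theoretical) / mz_theoretical * 1e6


def is_true_peak(peak, drug, ppm_threshold=20.0):
    """True-peak criterion: the peak boundary covers the drug's retention
    time and the relative mass error is within the ppm threshold.

    peak: dict with keys rt_start, rt_end, mz (hypothetical layout).
    drug: dict with keys rt, mz (m/z of the protonated ion).
    """
    covers_rt = peak["rt_start"] <= drug["rt"] <= peak["rt_end"]
    error = relative_mass_error_ppm(peak["mz"], drug["mz"])
    return covers_rt and abs(error) <= ppm_threshold


def recall(tp, fn):
    return tp / (tp + fn)


def precision(tp, fp):
    return tp / (tp + fp)


def f_score(recall_value, precision_value):
    return 2 * recall_value * precision_value / (recall_value + precision_value)


# Made-up counts: 41 of 50 spiked drugs detected, no false peaks.
r = recall(tp=41, fn=9)
p = precision(tp=41, fp=0)
print(r, p, round(f_score(r, p), 4))
```

Because the F-score is the harmonic mean of recall and precision, it stays low whenever either of the two is low, which is why it is reported alongside them.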

■ RESULTS AND DISCUSSION

Modeling the Distribution of the Relative Mass Differences with PITracer. Split peaks or incorrect assignments can be found if relative mass difference tolerances that are too small or too large are used while extracting pure ions. Therefore, choosing a relative mass difference tolerance is important in extracting the pure ions. PITracer estimates the relative mass difference tolerance from the distribution of the relative mass differences between m/z values of the nearest neighboring ion pairs. According to the uniform distribution, the distribution of the relative mass differences between the ions in different analytes could be estimated by the average density plus the half standard deviation of the densities of the relative mass differences. The red line in Supporting Information Figure S-1 indicates the estimated uniform distribution formed by the relative mass differences. The red point is the estimated relative mass difference tolerance between m/z values (different analytes) in the same peak. The average estimated relative mass difference tolerances found for the positive and negative modes of the standard library data set and for the forensic drugs data set are 6.22 (5.71−8.00), 6.49 (5.67−8.00), and 6.31 (5.71−6.86) ppm, respectively. Moreover, the intensity (Isaturated) and relative mass difference (δsaturated) of each saturated chromatographic peak were about 1 050 000 counts and 13 ppm, respectively.

Construction of Purer LC/TOF-MS Profiles and Pure Ion Chromatograms with PITracer. PITracer considers only ions that are continuous in both the m/z and the retention time domains as pure ions. PITracer ignores short ion traces and isolated ions (with no nearby neighboring ions). Figure 3

Figure 3. An example region of a pure ion trace (solid circle points) and random noise (cross points). The ion traces in red boxes span 5 (the minimum peak width in this study) or more consecutive mass spectra, and the ion traces in blue boxes span fewer than 5 consecutive mass spectra.

shows pure ion traces and random noise in an example region of a chromatogram. The ion traces in red boxes span 5 (2.5 s, the minimum peak width in this study) or more consecutive mass spectra, and the ion traces in blue boxes span fewer than 5 consecutive mass spectra. The ions in the blue boxes are ignored, while the ions with a long enough ion trace length (in the red boxes) are kept. About 20% and 30% of the ions were considered to be pure ions in the standard library mixtures data set and the forensic drugs-spiked urine samples, respectively, after the pure ion trace detection. This shows that most peaks in the untargeted metabolomics data were not continuous ion traces and can therefore potentially be omitted from the subsequent peak identification steps. It is important to point out that all peaks (chromatograms) of the




373 chemical standards from the NTU MetaCore LC/TOF-MS library data set and the 50 forensic drugs spiked in the human urine were preserved after construction of the purer LC/TOF-MS profiles, showing that this procedure does not miss important ions. Supporting Information Figures S-2 and S-3 show the LC/TOF-MS profiles before and after ignoring the random noise of the standard library and forensic drugs data sets, respectively. In both cases, peak numbers are greatly reduced after extracting pure ions.

Extraction of Chromatograms with Saturated Chromatographic Peaks. To evaluate PITracer for improving the detection of peaks that are split because of instrumental saturation, PITracer was applied to generate the pure ion chromatograms of human metabolite standard mixtures from the NTU MetaCore metabolomics standard library, which includes the 5 saturated chromatograms identified in our previous study.33 PITracer successfully detected the 5 pure ion chromatograms using the higher relative mass difference tolerance for the saturated chromatographic peaks, which PITracer identified automatically. For example, Figure 4A shows a typical split peak in an extracted ion chromatogram with an RT range from 333 to 350 s. In Figure 4A, the pure ion chromatogram of dimethyl benzimidazole extracted by PITracer is shown as circles, and the EIC of dimethyl benzimidazole with the relative mass error within ±10 ppm, a mass error tolerance that is often used in fixed mass error tolerance algorithms, is shown as the black line. If the higher relative mass difference tolerance for the saturated ions is not applied, the pure ion chromatogram of dimethyl benzimidazole is separated into several segments, causing split peaks like those in Figure 4A. Figure 4B and 4C show that the m/z value and the relative mass error increased with intensity (see the corresponding intensities in Figure 4A) and that the relative mass error of saturated ions was greater than 30 ppm. Figure 4D shows that the adjacent relative mass differences of the saturated ions were greater than 12 ppm. Table 1 shows that the 5 peaks considered to be split due to instrumental saturation could be detected by PITracer, while centWave could not correctly detect the 5 saturated peaks and generated split peaks for N,N-dimethylaniline and 6-methyladenine. PITracer successfully extracted and detected the 5 human metabolite standards without generating split peaks. Supporting Information Figures S-4−S-7 show the ions of the other four saturated peaks detected by PITracer.

Application of PITracer for Peak Detection in an Untargeted Forensic Drug Screening. PITracer can be used for untargeted forensic drug screening and general metabolomics studies to extract the complete pure ion chromatograms of chemicals in a complex matrix. Those chromatograms can be used to improve the quality of peak detection. To compare and evaluate the performance of PITracer and centWave for detecting the analytes in a practical case, we analyzed the urine samples spiked with 50 forensic drugs at two concentration levels (high, 10 times the cut off values; and low, 1 times the cut off values of each forensic drug). The cut off values listed in Supporting Information Table S-1 for the 50 forensic drugs were defined according to previous studies and regulations reported by the Ministry of Health and Welfare, Republic of China (Taiwan).55−58 For each experiment with different concentrations, the relative mass difference tolerance was estimated by PITracer and the minimal length of an ion trace was set as 5 scans with a

Figure 4. Chromatogram, m/z trace, relative mass error, and the adjacent relative mass difference of the detected ions of dimethyl benzimidazole. Panel A shows the intensity of each pure ion of dimethyl benzimidazole and the EIC of dimethyl benzimidazole [M + H]+ signals with the relative mass error within ±10 ppm. Panel B shows the m/z trace of the dimethyl benzimidazole [M + H]+ signals. The corresponding relative mass error and the relative mass differences between two adjacent scans are shown in panels C and D. The solid circles are the ions with intensities lower than R × Isaturated, and the open circles are the ions with intensities greater than R × Isaturated. The color bar in each panel indicates the intensity.

Table 1. Result of the Saturated Chromatographic Peaks Detection Using PITracer and centWave

                          PITracer                        centWave
compound name             completed peak   split peak     completed peak   split peak
1-methyladenosine         1                0              0                0
decanoylcarnitine         1                0              0                0
N,N-dimethylaniline       1                0              0                2
6-methyladenine           1                0              0                2
dimethyl benzimidazole    1                0              0                0




minimal peakwidth set to 5 scans. The centWave parameters recommended for XCMS in Nature Protocols for metabolomics data were used,59 while the parameters ppm and peakwidth, which are the maximal tolerated m/z deviation in consecutive scans and the peak width of a chromatographic peak, were adjusted to ensure that optimal parameters could be used for centWave. The following centWave parameters were used for performance evaluation: maximal tolerated m/z deviation of 5, 10, 15, 20, 25, 30, 40, and 50 ppm; peak width of c(2, 15), c(2, 20), c(2, 30), c(2, 60), c(5, 15), c(5, 20), c(5, 30), and c(5, 60), that is, a minimal peak width of 2 or 5 s and a maximal peak width of 15, 20, 30, or 60 s; and prefilter of c(0, 0), that is, containing at least 0 ions with intensity greater than 0. The optimal parameters for centWave were selected from the above ranges according to the best results; the optimized parameter values are a maximal tolerated m/z deviation of 30 ppm and a peak width of c(5, 30). After extracting the pure ion chromatograms of the spiked urine samples, we manually examined whether the chromatograms of the drugs were extracted correctly by PITracer. PITracer was able to extract all chromatograms caused by the 50 forensic drugs spiked in human urine samples (with two different concentrations and three technical repeats). It is worth mentioning that PITracer could detect chromatograms with abnormal peak shapes that were ignored by centWave. Forty-eight out of the 50 forensic drugs can be detected in the chromatograms generated by PITracer without performing an additional mass calibration process (see Table 2). Even with a

and from 0.95 to 0.98 with the additional mass calibration. The results of the forensic drug detection in the chromatograms calibrated by the most often seen base ions showed that this calibration helped to improve the number of true positives (Tables 2) and F-score values (Tables 3) in the urine samples. Table 3. Performance for the Untargeted Forensic Drug Detection in High and Low Concentration-Spiked Urine Samples Using PITracer and centWave PITracer

Ma recall precision F-score recall precision F-score recall precision F-score a

Table 2. Average Number of True Positives of Untargeted Forensic Drug Detection in High and Low ConcentrationSpiked Urine Samples Using PITracer and centWavea PITracer without mass calibration

H1 H2 H3 H4 H5 H6 L1 L2 L3 L4 L5 L6

47.33 38.67 35.33 40.00 42.67 42.67 48.00 44.33 45.33 40.67 46.67 46.00

with mass calibration

XCMS recommended parameters

average number of true positives 50.00 39.33 50.00 26.33 50.00 22.00 49.67 28.33 50.00 39.67 49.67 39.00 49.33 46.67 48.67 37.67 49.00 38.33 47.33 35.00 49.00 41.67 47.00 41.67

with mass calibration

high concentration-spiked urine samples 0.82 >0.99 0.65 >0.99 >0.99 0.82 0.90 >0.99 0.72 low concentration-spiked urine samples 0.90 0.97 0.80 >0.99 >0.99 0.91 0.95 0.98 0.85 all spiked urine samples 0.86 0.98 0.72 >0.99 >0.99 0.87 0.92 0.99 0.78

optimized parameters 0.85 >0.99 0.91 0.92 >0.99 0.96 0.88 >0.99 0.93

M is the abbreviation of measurement.

High recall, precision, and F-score with the values of 0.98, 0.99, and 0.99 using PITracer for untargeted forensic drug detection in high and low concentration-spiked urine samples can also be seen in Table 3. XCMS (centWave) on the contrary could not detect many forensic drugs in the high concentration-spiked samples (see Table 2) with either the recommended parameters or with optimized parameters. In short, PITracer gave better recall and F-scores than centWave with similar precision in forensic drug-spiked urine samples showing PITracer tends to identify more analytes and generate less m/z error (usually less than 5 ppm, see Supporting Information Figure S-8). centWave failed to detect 2 forensic drugs completely (none in three repeats), PMMA and PMA while they were detectable by PITracer (Figure S-8). Moreover, centWave tends to generate much higher relative mass error (around 10−20 ppm in high concentration, and 5 to 20 ppm in low concentration) as we can see from the forensic drugs cases with varying concentration-spiked urine samples (see Supporting Information Figure S-8). Additional comparisons of MZmine60 in forensic drug-spiked urine samples can also be found in the supplementary (see Supporting Information Table S-2−S-4). PITracer outperforms both centWave and MZmine in the analysis of forensic drug-spiked urine samples, and also MZmine gave poorer recall and F-score than centWave. In summary, we demonstrate that PITracer can correctly extract the pure ion chromatograms and improve the peak detection using mass calibration for better mass precision and accuracy. However, cautions should be taken when performing mass calibration according to the most often present m/z value in base ions, one must ensure that the selected ion for mass calibration can be detected in every sample.

[Table fragment; recovered column headers: sample ID; without mass calibration; centWave, XCMS recommended parameters; centWave, optimized parameters]

centWave, optimized parameters: 47.33 41.33 34.67 39.67 45.67 45.00 48.00 44.67 45.00 42.67 47.33 47.33

a With two different concentrations, H and L, and three technical repeats. H1, H2, ..., H6: The 6 high concentration-spiked urine samples. L1, L2, ..., L6: The 6 low concentration-spiked urine samples.

complex matrix effect, in both the high and low concentration-spiked urine samples, PITracer was able to recover at least 35 out of the 50 and 40 out of the 50 drugs, respectively. After performing additional mass calibration and correcting the mass values according to the selected target, tryptophan, PITracer correctly detected 49 and 47 drugs in high and low concentration-spiked urine samples on average, respectively (see Table 2). The average F-score of PITracer for high and low concentration-spiked urine samples increased from 0.90 to 0.99



■


(4) Wilson, I. D.; Nicholson, J. K.; Castro-Perez, J.; Granger, J. H.; Johnson, K. A.; Smith, B. W.; Plumb, R. S. J. Proteome Res. 2005, 4, 591−598.
(5) Gygi, S. P.; Rist, B.; Gerber, S. A.; Turecek, F.; Gelb, M. H.; Aebersold, R. Nat. Biotechnol. 1999, 17, 994−999.
(6) Ham, A.-J. L.; Engelward, B. P.; Koc, H.; Sangaiah, R.; Meira, L. B.; Samson, L. D.; Swenberg, J. A. DNA Repair 2004, 3, 257−265.
(7) Xiao, J. F.; Varghese, R. S.; Zhou, B.; Nezami Ranjbar, M. R.; Zhao, Y.; Tsai, T.-H.; Di Poto, C.; Wang, J.; Goerlitz, D.; Luo, Y. J. Proteome Res. 2012, 11, 5914−5923.
(8) Sasada, S.; Miyata, Y.; Tsutani, Y.; Tsuyama, N.; Masujima, T.; Hihara, J.; Okada, M. Oncol. Rep. 2013, 29, 925−931.
(9) Armitage, E. G.; Barbas, C. J. Pharmaceut. Biomed. 2014, 87, 1−11.
(10) Zhang, T.; Wu, X.; Ke, C.; Yin, M.; Li, Z.; Fan, L.; Zhang, W.; Zhang, H.; Zhao, F.; Zhou, X. J. Proteome Res. 2012, 12, 505−512.
(11) Weckwerth, W. Bioanalysis 2010, 2, 829−836.
(12) Liberman, L. M.; Sozzani, R.; Benfey, P. N. Curr. Opin. Plant Biol. 2012, 15, 162−167.
(13) Bellew, M.; Coram, M.; Fitzgibbon, M.; Igra, M.; Randolph, T.; Wang, P.; May, D.; Eng, J.; Fang, R.; Lin, C.; Chen, J.; Goodlett, D.; Whiteaker, J.; Paulovich, A.; McIntosh, M. Bioinformatics 2006, 22, 1902−1909.
(14) Allwood, J. W.; De Vos, R. C. H.; Moing, A.; Deborde, C.; Erban, A.; Kopka, J.; Goodacre, R.; Hall, R. D. In Methods in Enzymology, Jameson, D., Verma, M., Westerhoff, H. V., Eds.; Academic Press: Amsterdam, 2011; pp 299−336.
(15) Nicholson, J. K.; Connelly, J.; Lindon, J. C.; Holmes, E. Nat. Rev. Drug Discovery 2002, 1, 153−161.
(16) Werf, M. v.; Jellema, R.; Hankemeier, T. J. Ind. Microbiol. Biotechnol. 2005, 32, 234−252.
(17) Watkins, S.; German, J. Curr. Opin. Mol. Ther. 2002, 4, 224.
(18) Griffin, J. L. Curr. Opin. Chem. Biol. 2006, 10, 309−315.
(19) Quinones, M. P.; Kaddurah-Daouk, R. Neurobiol. Dis. 2009, 35, 165−176.
(20) Kenar, E.; Franken, H.; Forcisi, S.; Wormann, K.; Haring, H. U.; Lehmann, R.; Schmitt-Kopplin, P.; Zell, A.; Kohlbacher, O. Mol. Cell. Proteomics 2014, 13, 348−359.
(21) Shulaev, V. Briefings Bioinf. 2006, 7, 128−139.
(22) Saghatelian, A.; Trauger, S. A.; Want, E. J.; Hawkins, E. G.; Siuzdak, G.; Cravatt, B. F. Biochemistry 2004, 43, 14332−14339.
(23) Tang, Z.; Martin, M. V.; Guengerich, F. P. Anal. Chem. 2009, 81, 3071−3078.
(24) Cho, J.-Y.; Kang, D. W.; Ma, X.; Ahn, S.-H.; Krausz, K. W.; Luecke, H.; Idle, J. R.; Gonzalez, F. J. J. Lipid Res. 2009, 50, 924−937.
(25) Vinayavekhin, N.; Saghatelian, A. ACS Chem. Biol. 2009, 4, 617−623.
(26) Mosier, A. C.; Justice, N. B.; Bowen, B. P.; Baran, R.; Thomas, B. C.; Northen, T. R.; Banfield, J. F. mBio 2013, 4, e00484−00412.
(27) Bueschl, C.; Kluger, B.; Berthiller, F.; Lirk, G.; Winkler, S.; Krska, R.; Schuhmacher, R. Bioinformatics 2012, 28, 736−738.
(28) Keller, B. O.; Sui, J.; Young, A. B.; Whittal, R. M. Anal. Chim. Acta 2008, 627, 71−81.
(29) Fleming, C. M.; Kowalski, B. R.; Apffel, A.; Hancock, W. S. J. Chromatogr. A 1999, 849, 71−85.
(30) Andreev, V. P.; Rejtar, T.; Chen, H. S.; Moskovets, E. V.; Ivanov, A. R.; Karger, B. L. Anal. Chem. 2003, 75, 6314−6326.
(31) Katajamaa, M.; Oresic, M. J. Chromatogr. A 2007, 1158, 318−328.
(32) Windig, W.; Phalp, J. M.; Payne, A. W. Anal. Chem. 1996, 68, 3602−3606.
(33) Ho, T. J.; Kuo, C. H.; Wang, S. Y.; Chen, G. Y.; Tseng, Y. J. J. Mass Spectrom. 2013, 48, 234−242.
(34) Windig, W. Chemom. Intell. Lab. Syst. 2005, 77, 206−214.
(35) Windig, W.; Smith, W. F. J. Chromatogr. A 2007, 1158, 251−257.
(36) Li, Y.; Qu, H.; Cheng, Y. Anal. Chim. Acta 2008, 612, 19−22.
(37) Smith, C. A.; Want, E. J.; O’Maille, G.; Abagyan, R.; Siuzdak, G. Anal. Chem. 2006, 78, 779−787.

CONCLUSIONS

PITracer, a pure ion detection algorithm, was developed to generate pure ion chromatograms and detect the chromatographic peaks produced by metabolites in LC/TOF-MS-based metabolomics data. PITracer estimates the relative mass difference tolerance for each LC/TOF-MS profile, considers the relative mass difference tolerance of saturated chromatographic peaks, and then uses the most commonly present m/z value in base ions to calibrate m/z values, which reduces the relative mass differences and improves mass precision. The pure ion chromatograms are extracted from the calibrated LC/TOF-MS profile according to the estimated relative mass difference tolerance and are then passed to peak detection algorithms. The m/z values of the detected peaks reported by the peak detection algorithms can additionally be corrected according to a user-specified metabolite. PITracer can extract chromatograms with varying m/z value fluctuations and detect low-intensity analytes with sensitivity and precision. Hence, analyzing LC/TOF-MS-based metabolomics data with PITracer can detect metabolites more thoroughly and might reveal new metabolites. PITracer allows the full analytical power of LC/TOF-MS to be utilized for automated exploratory analytical chemistry in metabolomics studies and is available at http://PITracer.cmdm.tw.
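The two ideas summarized above, calibrating m/z values against a reference ion and collecting ions within a relative mass difference tolerance, can be pictured with a simplified Python sketch. This is not the PITracer implementation: the proportional calibration, the fixed seed m/z, the choice of the most intense ion per scan, and all names and numbers below are assumptions made only for illustration.

```python
from typing import List, Optional, Tuple

Ion = Tuple[float, float]  # (m/z, intensity)


def ppm_difference(mz: float, reference_mz: float) -> float:
    """Relative mass difference of mz with respect to reference_mz, in ppm."""
    return (mz - reference_mz) / reference_mz * 1e6


def calibrate_mz(mz: float, observed_ref_mz: float, theoretical_ref_mz: float) -> float:
    """Scale an m/z value so that the observed reference ion maps onto its
    theoretical m/z (proportional correction; an assumption of this sketch)."""
    return mz * theoretical_ref_mz / observed_ref_mz


def extract_ion_trace(
    scans: List[List[Ion]], seed_mz: float, tolerance_ppm: float
) -> List[Optional[Ion]]:
    """For each scan, keep the most intense ion within tolerance_ppm of seed_mz,
    or None when the scan contains no such ion."""
    trace: List[Optional[Ion]] = []
    for scan in scans:
        candidates = [
            ion for ion in scan
            if abs(ppm_difference(ion[0], seed_mz)) <= tolerance_ppm
        ]
        trace.append(max(candidates, key=lambda ion: ion[1]) if candidates else None)
    return trace


# Hypothetical scans, each a list of (m/z, intensity) pairs.
scans = [
    [(200.0010, 120.0), (350.2000, 40.0)],
    [(200.0012, 300.0), (200.0500, 15.0)],
    [(350.2001, 55.0)],
]
trace = extract_ion_trace(scans, seed_mz=200.0010, tolerance_ppm=10.0)
calibrated = calibrate_mz(200.0010, observed_ref_mz=200.0010, theoretical_ref_mz=200.0000)
print(trace, calibrated)
```

In this toy input, the ion at m/z 200.0500 lies about 245 ppm from the seed and is excluded, while the ion at m/z 200.0012 lies about 1 ppm away and joins the trace. A tolerance estimated per profile, as the text describes, would replace the fixed 10 ppm used here.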

■

ASSOCIATED CONTENT

Supporting Information

Mass calibration by base ions, experimental details, data analysis, and comparisons of MZmine, figures showing the distribution of the relative mass difference, m/z vs scan index plots, chromatogram, m/z trace, relative mass error, relative mass difference, and relative mass error, and tables showing information on 50 forensic drugs, parameters and set values for chromatogram deconvolution algorithms, average number of true positives, performance of untargeted drug detection, and additional references. This material is available free of charge via the Internet at http://pubs.acs.org.

■

AUTHOR INFORMATION

Corresponding Author

*Voice: +886.2.3366.4888#529. Fax: +886.2.23628167. E-mail: [email protected].

Notes

The authors declare no competing financial interest.

■

ACKNOWLEDGMENTS

This work was funded by the Ministry of Science and Technology, Taiwan, grant numbers NSC101-2627-B-002-004 and NSC100-2321-B-002-019. Resources of the National Taiwan University’s Laboratory of Computational Molecular Design and Metabolomics, Metabolomics Core Laboratory, and Department of Computer Science and Information Engineering were used in performing these studies.

■

REFERENCES

(1) Andersen, J. S.; Lam, Y. W.; Leung, A. K. L.; Ong, S.-E.; Lyon, C. E.; Lamond, A. I.; Mann, M. Nature 2005, 433, 77−83.
(2) Nesatyy, V. J.; Suter, M. J.-F. Environ. Sci. Technol. 2007, 41, 6891−6900.
(3) Oksman-Caldentey, K.-M.; Inzé, D. Trends Plant Sci. 2004, 9, 433−440.




(38) Tautenhahn, R.; Bottcher, C.; Neumann, S. BMC Bioinf. 2008, 9, 504.
(39) Yu, T.; Park, Y.; Johnson, J. M.; Jones, D. P. Bioinformatics 2009, 25, 1930−1936.
(40) Zhou, B.; Xiao, J. F.; Tuli, L.; Ressom, H. W. Mol. BioSyst. 2012, 8, 470−481.
(41) Castillo, S.; Gopalacharyulu, P.; Yetukuri, L.; Orešič, M. Chemom. Intell. Lab. Syst. 2011, 108, 23−32.
(42) Mihaleva, V. V.; Vorst, O.; Maliepaard, C.; Verhoeven, H. A.; de Vos, R. C. H.; Hall, R. D.; van Ham, R. C. H. J. Metabolomics 2008, 4, 171−182.
(43) Gadgil, H. S.; Pipes, G. D.; Dillon, T. M.; Treuheit, M. J.; Bondarenko, P. V. J. Am. Soc. Mass. Spectr. 2006, 17, 867−872.
(44) Pedrioli, P. G.; Eng, J. K.; Hubley, R.; Vogelzang, M.; Deutsch, E. W.; Raught, B.; Pratt, B.; Nilsson, E.; Angeletti, R. H.; Apweiler, R.; Cheung, K.; Costello, C. E.; Hermjakob, H.; Huang, S.; Julian, R. K.; Kapp, E.; McComb, M. E.; Oliver, S. G.; Omenn, G.; Paton, N. W.; Simpson, R.; Smith, R.; Taylor, C. F.; Zhu, W.; Aebersold, R. Nat. Biotechnol. 2004, 22, 1459−1466.
(45) Lin, S. M.; Zhu, L.; Winter, A. Q.; Sasinowski, M.; Kibbe, W. A. Expert Rev. Proteomics 2005, 2, 839−845.
(46) Deutsch, E. Proteomics 2008, 8, 2776−2777.
(47) Yu, T.; Park, Y.; Li, S.; Jones, D. P. J. Proteome Res. 2013, 12, 1419−1427.
(48) Xia, J.; Psychogios, N.; Young, N.; Wishart, D. S. Nucleic Acids Res. 2009, 37, W652−660.
(49) Huang, S.-M.; Toh, W.; Benke, P.; Tan, C.; Ong, C. Metabolomics 2014, 10, 1084−1093.
(50) Kiefer, P.; Schmitt, U.; Vorholt, J. A. Bioinformatics 2013, 29, 963−964.
(51) Juarez, P.; Fernandez-del-Carmen, A.; Rambla, J. L.; Presa, S.; Mico, A.; Granell, A.; Orzaez, D. J. Agric. Food Chem. 2014, 62, 8158−8168.
(52) Du, P.; Kibbe, W. A.; Lin, S. M. Bioinformatics 2006, 22, 2059−2065.
(53) Zhang, J.; Gonzalez, E.; Hestilow, T.; Haskins, W.; Huang, Y. Curr. Genomics 2009, 10, 388−401.
(54) Jensen, L. J.; Saric, J.; Bork, P. Nat. Rev. Genet. 2006, 7, 119−129.
(55) Tsai, I. L.; Weng, T. I.; Tseng, Y. F. J.; Tan, H. K. L.; Sun, H. J.; Kuo, C. H. J. Anal. Toxicol. 2013, 37, 642−651.
(56) Moriya, F. Leg. Med. 2009, 11 (Suppl 1), S434−436.
(57) Freye, E.; Levy, J. V. Pharmacology and Abuse Cocaine, Amphetamines, Ecstasy and Related Designer Drugs; Springer: Dordrecht, the Netherlands, 2009.
(58) Ministry of Health and Welfare, Republic of China (Taiwan) Regulations Governing Drug Abuse Urine Testing Operations, 2008. http://mohwlaw.mohw.gov.tw/Chi/EngContent.asp?Msgid=114 (accessed 18 December 2014).
(59) Patti, G. J.; Tautenhahn, R.; Siuzdak, G. Nat. Protoc. 2012, 7, 508−516.
(60) Pluskal, T.; Castillo, S.; Villar-Briones, A.; Oresic, M. BMC Bioinf. 2010, 11, 395.
