G Model

ARTICLE IN PRESS

CHROMA-356449; No. of Pages 9

Journal of Chromatography A, xxx (2015) xxx–xxx

Contents lists available at ScienceDirect

Journal of Chromatography A journal homepage: www.elsevier.com/locate/chroma

A new chemometric-assisted method based on gas chromatography–mass spectrometry for metabolic profiling analysis

Yong-Jie Yu a,∗,1, Hai-Yan Fu b,1, Li Zhang c, Xiao-Yu Wang a, Pei-Jian Sun a, Xiao-Bing Zhang a, Fu-Wei Xie a,∗

a Zhengzhou Tobacco Research Institute of CNTC, Zhengzhou 450001, China
b College of Pharmacy, South-Central University for Nationalities, Wuhan 430074, China
c Technology Center of China Tobacco Guizhou Industrial Co. Ltd., Guiyang 550009, China

Article info

Article history: Received 5 December 2014; received in revised form 23 March 2015; accepted 16 April 2015; available online xxx

Keywords: Metabolic profiling analysis; Plant sample; Automatic chromatographic peak detection; Peak-shift alignment

Abstract

An automatic and efficient data analysis method for comprehensive metabolic profiling analysis is urgently required. In this study, a new chemometric-assisted method for metabolic profiling analysis (CAMMPA) was developed to discover potentially valuable metabolites automatically and efficiently. The proposed method consists of three stages. First, automatic chromatographic peak detection is performed on the total ion chromatograms (TICs) of the samples to extract chromatographic peaks that can be accurately quantified. Second, a novel peak-shift alignment technique based on the peak detection results is applied to resolve time shifts across samples, yielding aligned results such as aligned chromatograms and peak area tables. Third, statistical analysis combining unsupervised and supervised methods, including ANOVA and partial least squares-discriminant analysis (PLS-DA), is performed to extract potential metabolites. To demonstrate the proposed technique, a complex GC–MS metabolic profiling dataset was acquired to identify potential metabolites in tobacco plants at different growth stages and in different plant tissues after maturation. The results indicate that, with the aid of CAMMPA, the efficiency of the routine metabolic profiling procedure can be significantly improved and potential metabolites can be accurately identified. © 2015 Published by Elsevier B.V.

∗ Corresponding authors. Tel.: +86 371 67672502; fax: +86 731 67670260. E-mail addresses: [email protected] (Y.-J. Yu), [email protected] (F.-W. Xie).
1 These authors contributed equally to this work.

http://dx.doi.org/10.1016/j.chroma.2015.04.029
0021-9673/© 2015 Published by Elsevier B.V.

Please cite this article in press as: Y.-J. Yu, et al., A new chemometric-assisted method based on gas chromatography–mass spectrometry for metabolic profiling analysis, J. Chromatogr. A (2015), http://dx.doi.org/10.1016/j.chroma.2015.04.029

1. Introduction

Metabolomics aims at the large-scale identification and quantification of metabolites to study metabolite changes, and it is widely applied in various scientific fields [1–5]. Metabolic analysis can usually be classified as either targeted or untargeted [2,3,6]. Untargeted analysis aims to detect all metabolite species simultaneously and is commonly achieved with techniques such as GC–MS and LC–MS. These techniques provide large amounts of data covering a broad range of compound classes and are thus valuable for metabolic profiling analysis [7,8]. Compared with these instrumental advances, data analysis methods for metabolic profiling have lagged behind in recent years. In most practical applications, hundreds of chromatographic peaks elute in a single total ion chromatogram (TIC). Moreover, manual verification of large-scale datasets is cumbersome, inefficient, error-prone, and even irreproducible in complex sample analysis [2,5,9]. Therefore, a data analysis method is needed that converts raw datasets into organized data in a large-scale, high-throughput manner for data visualization and potential metabolite extraction.

A number of methods and software tools are available for processing GC–MS or LC–MS data, and they are frequently applied in metabolic studies. Examples include ADAPGC 1.0 and 2.0 [10,11], MRMPROBS [12], MET-IDEA [13], CAMERA [14], MetSign [15], MetAlign [16], and XCMS [17–20], among others [21–36]. When metabolic profiling datasets obtained by chromatography coupled to MS are processed, the performance of a data-mining method mainly depends on two critical stages: (1) peak detection and integration [11,23] and (2) peak-shift alignment [31,34]. Various algorithms have been proposed for either chromatographic peak detection or time-shift alignment [37–51]; widely used time-shift alignment methods include correlation-optimized warping and dynamic time warping [37,52]. However, in practical applications, the analysis of complicated samples continues to challenge researchers, especially when large numbers of chromatograms are analyzed for metabolic profiling.

An automatic chromatographic peak detection (ACPD) method was recently proposed in our previous study [53]. This method can efficiently and accurately detect valuable chromatographic peaks with signal-to-noise ratios greater than 10 by means of an automatic instrumental noise estimation strategy, and it provides reasonable background-drift correction. Using the peak detection results as a basis, a novel peak-shift alignment strategy was further established to align chromatographic peaks across samples. In the present study, a chemometric-assisted method based on GC–MS was developed for metabolic profiling analysis. This method involves three phases: (1) the TICs of the samples are analyzed to extract the underlying chromatographic peaks; (2) peak-shift alignment is performed based on the ACPD results; and (3) a peak area list table that can be used directly for subsequent statistical analysis, such as ANOVA coupled with partial least squares-discriminant analysis (PLS-DA), is constructed to identify potentially valuable metabolites. To demonstrate the performance of the proposed method, 30 plant samples, including 12 samples from different growth periods and 18 samples from various tissues of mature tobacco plants, were analyzed with a GC–MS system. More than 240 chromatographic peaks were detected in each TIC, and a large number of these peaks overlapped. Analysis of such datasets is a laborious and challenging endeavor for most analysts. The method proposed in the present work allows automatic and efficient data processing with satisfactory classification results. Finally, a number of potential metabolites showing significant differences among growth stages and among plant tissues after maturation were extracted.

2. Experiment

2.1. Sample preparation

Thirty plant samples used for metabolic profiling analysis were collected from Zunyi, Guizhou Province, China.
Of these samples, 18 were obtained from the lower, middle, and upper leaves of mature tobacco plants (six samples per leaf position). The other 12 samples were collected at two growth stages, the rosette and vigorous growth stages (six samples per stage). All samples were rapidly frozen in liquid nitrogen and stored until analysis. The samples were then freeze-dried for 24 h and ground to powder. Metabolites were extracted with a solvent mixture of MeOH (JT Baker, USA), CHCl3 (JT Baker, USA), and H2O (5/2/2, v/v/v). About 20 mg of each sample was placed in a 5 mL tube and mixed with 2 mL of solvent. The mixture was sonicated for 40 min at room temperature, after which approximately 400 μL of the supernatant was transferred to a 2.5 mL Eppendorf tube and dried under a gentle stream of helium. Derivatization was performed to minimize the influence of sugars and to improve the volatility of non-volatile compounds. Finally, 1 μL of the supernatant was analyzed on an Agilent 7890-5975 GC–MS system. Details of the sample preparation for GC–MS are given in our previous work [54].

2.2. GC–MS analysis

A DB-5MS column (30 m × 0.25 mm, 0.25 μm) was used. The injector was operated at 290 °C with a split ratio of 1:5. Helium was used as the carrier gas at a constant flow rate of 1.0 mL min−1. The column temperature was held at 70 °C for 4 min, increased to 310 °C at 5 °C min−1, and held at 310 °C for 10 min. The transfer line and ion source temperatures were set to 280 and 230 °C, respectively. Mass spectra were acquired in full scan mode over an m/z range of 40–510, and a TIC with more than 10,000 elution channels was obtained for each sample. The GC–MS datasets were exported in netCDF format and read into Matlab 2013b (MathWorks, USA).

3. Methodology

The proposed CAMMPA method comprises three stages: ACPD of the TICs, a novel peak-shift alignment, and statistical analysis (Fig. 1). Each stage is discussed in detail in this section.

3.1. Data preprocessing

A denoising step was applied to the TICs prior to data analysis. Our investigation indicated that three-point moving-window averaging is suitable for most cases.

3.2. ACPD

Because the ACPD strategy was explained in our previous work [53], only its basic details are given here. The TIC of a sample was analyzed to obtain chromatographic peaks that can be accurately quantified, i.e., chromatographic peaks with signal-to-noise ratios above 10. Signals that continuously increased or decreased over more than three consecutive elution channels were provisionally recognized as evidence of analytes:

$$x_i < x_{i+1} < x_{i+2} < x_{i+3} < \cdots \qquad (1)$$

$$x_j > x_{j+1} > x_{j+2} > x_{j+3} > \cdots \qquad (2)$$

where x is the recorded signal, and i and j denote the ith and jth elution channels, respectively. After these signals are removed, the instrumental noise level can be estimated directly by robust statistical analysis. A pseudo-peak elimination step is then used to separate artifacts from real peaks. With a chromatographic peak clustering strategy, the peak baseline can be accurately estimated, and peak information, including retention time, peak area, elution range, and peak height, is obtained simultaneously; these data are very helpful for the subsequent peak-shift alignment. Small peaks with average signal-to-noise ratios below 10 are eliminated because these compounds cannot be accurately quantified.

3.3. Peak-shift alignment

A peak-shift alignment procedure was performed based on the TICs of the samples. Fig. 1 summarizes the peak alignment procedure, and Fig. 2 illustrates it with an example. The reference sample must be chosen before the analysis. In general, the sample in which the largest number of chromatographic peaks is separated can be selected as the reference; in this work, the TIC of the 10th sample was selected. The chromatographic peaks of the reference and each test sample were aligned individually. When a chromatographic peak is selected, its maximum time-shift range should be estimated beforehand; in our experience, a time-shift window of ±0.5 min is suitable for most GC–MS-based metabolic profiling analyses. Fig. 2A shows the elution segment of t2, a chromatographic peak in the test sample that must be aligned to the reference sample. The corresponding elution segment in the reference sample, r2, is also shown in Fig. 2A, where three chromatographic peaks are detected by ACPD. A rough alignment step was first used to correct the time shift between the segment in the reference sample and that in the test sample.
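The reference choice and segment extraction just described can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' Matlab code: the hypothetical `Peak` record holds only ACPD outputs named in Section 3.2, retention times are in minutes, the reference is picked automatically as the sample with the most detected peaks, and a test peak is taken into the search segment when its apex lies within the ±0.5 min window around the selected reference peak.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Peak:
    """Hypothetical record for one ACPD peak (fields follow Section 3.2)."""
    rt: float    # retention time of the apex, min
    start: int   # first elution channel of the peak
    end: int     # last elution channel of the peak (inclusive)
    area: float  # baseline-corrected peak area

def choose_reference(peak_lists: List[List[Peak]]) -> int:
    """Index of the sample with the most detected peaks (first one on ties)."""
    return max(range(len(peak_lists)), key=lambda k: len(peak_lists[k]))

def extract_segment(peaks: List[Peak], rt_center: float,
                    max_shift: float = 0.5) -> List[Peak]:
    """Test-sample peaks whose apex lies within rt_center +/- max_shift min."""
    return [p for p in peaks if abs(p.rt - rt_center) <= max_shift]

# Usage: search one test sample for the reference peak eluting at 10.30 min.
ref = [Peak(10.00, 100, 110, 5.0), Peak(10.30, 130, 140, 3.0), Peak(10.60, 160, 170, 2.0)]
test = [Peak(10.05, 105, 115, 4.8), Peak(11.20, 220, 230, 1.0)]
print(choose_reference([test, ref]))                 # 1
print([p.rt for p in extract_segment(test, 10.30)])  # [10.05]
```

Filtering on apex position keeps the sketch simple; a channel-based window over the elution ranges would work equally well.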
A peak-to-peak alignment strategy was used to improve computational efficiency. Each of the three chromatographic peaks in the reference segment was in turn aligned to each chromatographic peak in the test segment, and the results of this procedure are illustrated in Fig. 2B. When the first peak of the reference sample is aligned to the first peak of the test sample, the cosine value of the first reference peak is calculated as

$$\mathrm{cosine}_1 = \frac{r_1^{T} x_2}{\sqrt{r_1^{T} r_1}\,\sqrt{x_2^{T} x_2}} \qquad (3)$$

where r1 is the centered chromatographic signal of the first reference peak and x2 is the centered chromatographic signal of the test sample over the aligned elution channels of r1. The cosine values of the second and third peaks are calculated similarly; for this alignment, the cosine values of the first, second, and third reference peaks are 0.9977, 0.8123, and 0.5809, respectively. The roughly aligned chromatograms of the reference and test samples are depicted in Fig. 2B. In this work, a total COS value was used to evaluate the quality of an aligned chromatogram; it is the weighted sum of the cosine values of the three reference peaks:

$$\mathrm{COS} = w_1\,\mathrm{cosine}_1 + w_2\,\mathrm{cosine}_2 + w_3\,\mathrm{cosine}_3 \qquad (4)$$

where w1 = l1/(l1 + l2 + l3), w2 and w3 are defined analogously, and l1, l2, and l3 are the numbers of elution channels of the first, second, and third reference peaks, respectively. The rough alignment result is the alignment with the maximum COS value (Fig. 2B).

A precise alignment step was then performed (Fig. 2C). The relative distance, cosine value, and real distance between the reference peak r2 and each of the test peaks (t1, t2, and t3) were calculated independently, and the test peak with the minimum absolute distance and the maximum cosine value, here t2, was aligned to r2. This procedure was repeated until all peaks in the reference sample were aligned, yielding an alignment table. In our strategy, if two chromatographic peaks in the reference sample are aligned to the same chromatographic peak in the test sample, the one with the smaller cosine value is deleted (see the upper-right table in Fig. 2D, where the first reference peak has no matching peak in the test sample). To recover such peaks, the roles of the reference and test samples were reversed and the alignment was repeated; as Fig. 2D shows, no peaks are missing after the roles are switched. The alignment results for all reference peaks are obtained by merging the two tables. Finally, alignment results including an aligned chromatogram plot, a retention time list table, an area list table, and a peak number list table are obtained (Fig. 2E).

The proposed method does not require every peak in the reference sample to be present in the test sample, but it assumes that the peak elution order in the reference sample is consistent with that in the test samples. If an analyte in a test sample is absent from the reference sample, the corresponding peak is treated as an additional peak. Conversely, if an analyte in the reference sample is absent from a test sample, its peak area in that sample is set to zero.

Fig. 1. Flowchart of CAMMPA for extracting potential metabolites.
Inputs: TICs of the analyzed samples; TIC of the reference sample.
Data smoothing.
Automatic chromatographic peak detection stage:
1. Rough chromatographic peak detection.
2. Instrumental noise level estimation.
3. Pseudo-peak elimination.
4. Peak clustering and baseline estimation.
5. Peak integration.
6. Small peak elimination.
7. Provide peak information: elution range, retention time, peak area, etc.
Peak alignment stage:
1. Select a peak in the reference sample and get its elution segment.
2. Extract peaks from the test sample in the corresponding elution segment.
3. Perform rough alignment of the selected segment.
4. Implement precise alignment of the reference sample.
5. Register aligned chromatographic peaks.
6. Create a peak alignment table.
7. Change the roles of the reference and test samples to obtain another peak alignment table.
8. Merge the two tables to create the final peak alignment table.
9. Get peak alignment results: aligned TICs, aligned peak area list, etc.
Statistical analysis stage:
1. Perform unsupervised classification methods such as principal component analysis and hierarchical cluster analysis to visualize the distributions of the analyzed samples.
2. Perform ANOVA to extract compounds showing significant differences among groups.
3. Perform PLS-DA to obtain the VIPs of compounds and identify potentially valuable metabolites.
4. Extract information on potentially valuable metabolites: retention time, peak area, distribution characteristics in various groups, etc.
Export results for further metabolite identification and other analyses.

Fig. 2. Peak-shift alignment workflow of CAMMPA using a segment of the second compound, r2. (A) Segments in the reference and test samples are independently extracted using the retention times of r2 and t2, respectively; chromatographic peaks from the ACPD stage of CAMMPA are marked in both segments. (B) COS values obtained by rough alignment of the three test peaks to the corresponding reference peaks; a high COS value indicates that rough alignment is achieved. (C) A precise alignment step aligns the reference peak r2 to the correct chromatographic peak. (D) Registration of the alignment results to create a peak alignment table (right table). By switching the roles of the reference and test samples, another peak alignment table is created (left table); merging the two tables gives the final alignment table (bottom). (E) Peak alignment results obtained for subsequent data analysis.
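The rough alignment of Eqs. (3) and (4) and the rule for two reference peaks claiming the same test peak can be sketched as below. This is an illustrative sketch, not the published implementation: segments are plain lists of smoothed TIC intensities, peaks are hypothetical (first, last) channel pairs within a segment, candidate shifts are assumed to come from pairing every reference apex with every test apex, and a reference peak shifted outside the test segment is assumed to contribute nothing to COS. The precise step of Fig. 2C is not shown, because the text does not specify how distance and cosine are combined.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

PeakRange = Tuple[int, int]  # (first, last) elution channel within a segment

def centered_cosine(r: Sequence[float], x: Sequence[float]) -> float:
    """Eq. (3): cosine between the mean-centred signals r and x."""
    mr, mx = sum(r) / len(r), sum(x) / len(x)
    rc = [v - mr for v in r]
    xc = [v - mx for v in x]
    num = sum(a * b for a, b in zip(rc, xc))
    den = math.sqrt(sum(a * a for a in rc)) * math.sqrt(sum(b * b for b in xc))
    return num / den if den > 0 else 0.0

def apex(signal: Sequence[float], peak: PeakRange) -> int:
    """Channel of maximum intensity inside a peak range."""
    s, e = peak
    return max(range(s, e + 1), key=lambda i: signal[i])

def total_cos(ref: Sequence[float], test: Sequence[float],
              ref_peaks: List[PeakRange], shift: int) -> float:
    """Eq. (4): COS = sum_k w_k * cosine_k with w_k = l_k / (l_1 + ... + l_K)."""
    total_len = sum(e - s + 1 for s, e in ref_peaks)
    score = 0.0
    for s, e in ref_peaks:
        lo, hi = s + shift, e + shift + 1
        if lo < 0 or hi > len(test):
            continue  # assumption: a peak shifted out of the segment adds 0
        weight = (e - s + 1) / total_len
        score += weight * centered_cosine(ref[s:e + 1], test[lo:hi])
    return score

def rough_alignment(ref: Sequence[float], test: Sequence[float],
                    ref_peaks: List[PeakRange],
                    test_peaks: List[PeakRange]) -> Tuple[int, float]:
    """Treat every reference-apex/test-apex pairing as a candidate shift
    (in channels) and return the (shift, COS) pair with the largest COS."""
    best_shift, best_cos = 0, -math.inf
    for rp in ref_peaks:
        for tp in test_peaks:
            shift = apex(test, tp) - apex(ref, rp)
            c = total_cos(ref, test, ref_peaks, shift)
            if c > best_cos:
                best_shift, best_cos = shift, c
    return best_shift, best_cos

def resolve_conflicts(matches: Dict[str, Tuple[str, float]]) -> Dict[str, Optional[str]]:
    """matches: reference-peak id -> (test-peak id, cosine). When several
    reference peaks claim one test peak, only the largest cosine keeps it;
    the others are marked missing (None)."""
    owner: Dict[str, str] = {}
    for ref_id, (test_id, cos) in matches.items():
        if test_id not in owner or cos > matches[owner[test_id]][1]:
            owner[test_id] = ref_id
    winners = set(owner.values())
    return {ref_id: (test_id if ref_id in winners else None)
            for ref_id, (test_id, _) in matches.items()}

ref = [0, 1, 3, 1, 0, 0, 2, 5, 2, 0, 0, 0]
test = [0, 0, 0, 1, 3, 1, 0, 0, 2, 5, 2, 0]  # same profile delayed by 2 channels
shift, cos_total = rough_alignment(ref, test, [(1, 3), (6, 8)], [(3, 5), (8, 10)])
print(shift, round(cos_total, 4))  # 2 1.0
print(resolve_conflicts({"r1": ("t1", 0.95), "r2": ("t1", 0.80), "r3": ("t2", 0.99)}))
# {'r1': 't1', 'r2': None, 'r3': 't2'}
```

Peaks marked None by `resolve_conflicts` are the ones the role-reversal pass is meant to recover.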

3.4. Statistical analysis for extracting potentially valuable metabolites

Statistical analysis was performed on the area list table. Unsupervised classification methods, such as principal component analysis (PCA) and hierarchical cluster analysis (HCA), were used to visualize the distribution characteristics of the analyzed samples. Supervised methods, namely ANOVA and PLS-DA, were then used to extract potential metabolites. Potentially valuable metabolites are compounds that show significant differences among groups (according to the ANOVA p value) and that are important for classification according to their variable importance in projection (VIP) in the PLS-DA model. Finally, information on the potential metabolites, including their retention times, peak areas, distribution characteristics in various groups, and mass spectra, was obtained to allow further filtering and identification of valuable compounds.
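The two filters of this section can be sketched as follows, assuming the area list table is held as a mapping from each compound to its peak areas across samples and that VIP values come from a separately fitted PLS-DA model (the PLS-DA fit itself is not shown). Only the one-way ANOVA F statistic is computed here; converting it into the p value used in this work needs the F distribution's survival function with (k − 1, N − k) degrees of freedom, which the Python standard library does not provide (SciPy's `scipy.stats.f.sf` does). All function names are hypothetical.

```python
from typing import Dict, List, Sequence

def one_way_anova_f(groups: List[Sequence[float]]) -> float:
    """One-way ANOVA F = [SS_between / (k - 1)] / [SS_within / (N - k)]."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((v - m) ** 2 for v in g) for g, m in zip(groups, means))
    if ss_within == 0:
        # no within-group spread: infinite F if the group means differ, else 0
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

def anova_by_compound(areas: Dict[str, Sequence[float]],
                      labels: Sequence[str]) -> Dict[str, float]:
    """F statistic for each compound (row) of the peak area list table."""
    result = {}
    for compound, row in areas.items():
        groups: Dict[str, List[float]] = {}
        for value, label in zip(row, labels):
            groups.setdefault(label, []).append(value)
        result[compound] = one_way_anova_f(list(groups.values()))
    return result

def select_metabolites(p_values: Dict[str, float], vips: Dict[str, float],
                       alpha: float = 0.001, vip_min: float = 1.0) -> List[str]:
    """Compounds passing both filters: ANOVA p < alpha and PLS-DA VIP > vip_min."""
    return sorted(c for c, p in p_values.items()
                  if p < alpha and vips.get(c, 0.0) > vip_min)

# Usage with toy numbers (two groups of three samples):
labels = ["A", "A", "A", "B", "B", "B"]
print(anova_by_compound({"c1": [1, 2, 3, 4, 5, 6]}, labels))  # {'c1': 13.5}
print(select_metabolites({"c1": 1e-4, "c2": 1e-2, "c3": 5e-4},
                         {"c1": 1.4, "c2": 2.0, "c3": 0.7}))  # ['c1']
```

The defaults mirror the thresholds reported in Section 4.3 (p < 0.001, VIP > 1).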


3.5. Data analysis

For comparison, the GC–MS data were also processed manually. AMDIS was first run on an Agilent workstation to deconvolute overlapping peaks. Quantification results were then obtained with the workstation based on a list of ions and retention times, and the peak detection and integration results were corrected manually after visual inspection of the data. In addition, all of the data were analyzed with MET-IDEA [55]. All calculations were carried out in Matlab (MathWorks, USA) on a computer running Windows 7 (64-bit) with an Intel® Core™ i5 CPU (2.8 GHz) and 8 GB of RAM.

4. Results and discussion

Metabolic profiling of the 30 samples was performed in two stages. First, metabolite changes across growth stages were investigated; then, the distribution characteristics of key compounds in different tissues of the mature plant were identified. Routine analysis of the GC–MS data, which involves visual inspection and manual correction of integrated chromatographic peaks, is time consuming because hundreds of chromatographic peaks may be present in each TIC. CAMMPA was therefore used to extract potentially valuable metabolites. A three-point moving-window averaging step was applied to the TICs of all samples to improve their signal-to-noise ratios while preserving peak areas. The results from each stage of CAMMPA are discussed in detail in the following sections.

4.1. Chromatographic peak detection results

ACPD is the first and arguably the most critical stage of the data analysis procedure, because the subsequent peak-shift alignment depends largely on its peak detection results. A significant advantage of ACPD is that instrumental noise levels are estimated automatically. The estimated noise levels in the TICs of the 30 samples are shown in Fig. 3A; they vary considerably among samples.
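The detection logic summarized in Section 3.2 and evaluated here (smoothing, the monotonic-run test of Eqs. (1) and (2), noise estimation on the remaining signal, and the signal-to-noise cut at 10) can be sketched as below. It is a simplified illustration, not the ACPD algorithm of [53]: the median-absolute-deviation noise estimator, the edge handling of the moving average, the use of the lower boundary value as the baseline, and every function name are assumptions, and pseudo-peak elimination and peak clustering are omitted.

```python
import statistics
from typing import List, Optional, Sequence, Tuple

def moving_average3(x: Sequence[float]) -> List[float]:
    """Three-point moving-window average (Section 3.1); the two end channels
    are averaged over the values that exist (assumed edge handling)."""
    out = []
    for i in range(len(x)):
        window = x[max(0, i - 1): i + 2]
        out.append(sum(window) / len(window))
    return out

def monotonic_mask(x: Sequence[float], min_steps: int = 3) -> List[bool]:
    """Flag channels inside a strictly rising (Eq. 1) or strictly falling
    (Eq. 2) run of at least `min_steps` consecutive steps."""
    n = len(x)
    mask = [False] * n
    for sign in (1, -1):  # +1 tests Eq. (1), -1 tests Eq. (2)
        start = 0
        for i in range(1, n + 1):
            if i < n and sign * (x[i] - x[i - 1]) > 0:
                continue  # the current run keeps rising (or falling)
            if (i - 1) - start >= min_steps:  # run covers channels start..i-1
                for j in range(start, i):
                    mask[j] = True
            start = i
    return mask

def estimate_noise(x: Sequence[float], mask: Sequence[bool]) -> float:
    """Noise level as the scaled median absolute deviation (MAD) of the
    first-order derivative over channel pairs that were not flagged."""
    d = [x[i + 1] - x[i] for i in range(len(x) - 1)
         if not mask[i] and not mask[i + 1]]
    med = statistics.median(d)
    mad = statistics.median([abs(v - med) for v in d])
    return 1.4826 * mad  # 1.4826 * MAD estimates sigma for Gaussian noise

def candidate_peaks(mask: Sequence[bool]) -> List[Tuple[int, int]]:
    """(first, last) channel of each contiguous run of flagged channels."""
    peaks: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, flagged in enumerate(list(mask) + [False]):  # sentinel ends last run
        if flagged and start is None:
            start = i
        elif not flagged and start is not None:
            peaks.append((start, i - 1))
            start = None
    return peaks

def keep_quantifiable(x: Sequence[float], peaks: Sequence[Tuple[int, int]],
                      noise: float, min_snr: float = 10.0) -> List[Tuple[int, int]]:
    """Keep peaks whose height above the lower boundary value is at least
    min_snr times the noise level; a zero noise level keeps every peak."""
    kept = []
    for s, e in peaks:
        height = max(x[s:e + 1]) - min(x[s], x[e])
        if noise <= 0 or height / noise >= min_snr:
            kept.append((s, e))
    return kept

raw = [0, 0, 1, 2, 3, 4, 3, 2, 1, 0, 0, 0]
print(moving_average3([3, 6, 9]))                       # [4.5, 6.0, 7.5]
print(candidate_peaks(monotonic_mask(raw)))             # [(1, 9)]
print(estimate_noise([0, 1, 0, 2, 0, 1], [False] * 6))  # 1.4826
print(keep_quantifiable(raw, [(1, 9)], noise=0.2))      # [(1, 9)]
print(keep_quantifiable(raw, [(1, 9)], noise=1.0))      # []
```

Scaling the MAD by 1.4826 makes it comparable to the standard deviation of a Gaussian fit such as the one shown in Fig. 3B, while staying insensitive to residual peak tails.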
Across the 30 samples, the maximum estimated noise level is almost twice the minimum because of changes in background drift. Fig. 3B shows the distribution of the first-order derivative of the 30th TIC after removal of the chromatographic peaks; a Gaussian curve with a mean of −46.31 and a standard deviation of 547.58 fits the data well. To verify the noise estimate, the TIC of the 30th sample and its first-order derivative are plotted together with the estimated instrumental noise level in Fig. 3C, which shows that the chromatographic peaks are successfully separated from the instrumental noise. Fig. 3D shows the detected chromatographic peaks marked with red lines, indicating that useful peaks are reasonably extracted and integrated.

Fig. 4 provides an overview of the chromatographic peaks extracted from the reference sample. A total of 248 chromatographic peaks that can be accurately quantified are detected. In practical applications, analysts must manually check the eluted compounds and correct unreasonable peak integrations produced by the workstation; this task is trivial but very time consuming. With the ACPD stage of CAMMPA, these peaks are integrated reasonably even though several of them overlap significantly. The zoomed-in plots show the chromatographic peaks in three elution ranges in greater detail; the separation shown in Fig. 4 is typical of what is encountered in metabolic profiling of complex samples. The peaks in R1 of Fig. 4 are incompletely separated; however, these compounds can be accurately quantified after a reasonable split, indicated by the red lines in R1. The separation in R2 is satisfactory because the metabolites eluted in this range are well separated, and the ACPD results show that these peaks are precisely extracted with acceptable integration. By contrast, the separation in R3 is poor because several compounds co-elute and overlap seriously; nevertheless, the integrated peak results may still be considered acceptable for practical analysis.

4.2. Peak-shift alignment results

Fig. 5A shows the chromatographic peak areas detected in the 30 samples; together, these patterns contain over 7000 peaks that can be accurately quantified, and the peaks are clearly shifted among samples. Fig. 5B shows that, after processing by CAMMPA, the chromatographic peaks are properly aligned, and Fig. 5C shows several ranges of Fig. 5B in more detail. The results in Fig. 5B and C confirm that the peak-shift alignment of the detected peaks is acceptable for data analysis. To verify the alignment performance of CAMMPA further, the TICs of the 30 samples are shown in Fig. 6A and the corresponding CAMMPA results in Fig. 6B. Fig. 6(a1)–(a5) depicts five elution segments selected from the original TICs. The time shifts differ among segments and are more serious in Fig. 6(a1) and (a3) than in the others. Moreover, time shifts sometimes differ among compounds eluted within the same segment: the compounds eluted between 10.7 and 10.8 min shifted much less than those eluted between 10.3 and 10.5 min. CAMMPA sufficiently resolves these time-shift problems (Fig. 6(b1)–(b5)). The profiles depicted in the second column of Fig.
6 are based on the peaks extracted by CAMMPA, so obvious break points can be seen in Fig. 6B. The profiles shown in Fig. 6(b1)–(b5) confirm that chromatographic peaks can be accurately detected by CAMMPA and that the peak-shift alignment results are acceptable for subsequent metabolic analysis.

Fig. 3. Automatic chromatographic peak detection stage of CAMMPA. (A) Estimated instrumental noise levels in the TICs of the 30 samples. (B) Verification of the estimated noise level using the distribution of the first-order derivative of the 30th TIC as an example; the estimated Gaussian curve fits the data well. (C) The estimated noise level successfully separates useful signals from artifacts. (D) Chromatographic peaks detected in the elution range shown in (C).

Fig. 4. Automatic chromatographic peak detection results of CAMMPA, using the TIC of the reference sample as an example. The three rectangular insets show detailed peak detection results within different elution ranges.

Fig. 5. Peak alignment results of CAMMPA. (A) Original peak detection results from the ACPD stage of CAMMPA. (B) Peak alignment results of CAMMPA. (C) Four detailed peak ranges after peak alignment.

Fig. 6. Peak-shift alignment results of CAMMPA. (A) Original TICs of the 30 samples and (B) results obtained by CAMMPA after automatic peak detection and alignment. (a1)–(a5) Five selected elution windows marked in (A) and (b1)–(b5) the corresponding alignment results. Profiles in the second column are plotted based on ACPD results.

4.3. Statistical analysis results

Statistical analysis by CAMMPA was performed on the area list table obtained from the peak-shift alignment results (Fig. 6). In this study, a peak area list table with a size of 248 × 30 was obtained. Unsupervised analysis based on PCA and HCA was performed first. Fig. 7A shows the scores of the 30 samples projected onto the subspace spanned by the first two principal components. Although the first two principal components capture only part of the total information (60.0%), five groups can be observed (Fig. 7A). The 12 growth-stage samples, located in the top-left corner of Fig. 7A, are clearly separated from the mature samples, and they can be further divided into two subgroups: rosette-stage samples and vigorous growth-stage samples. The mature samples, comprising lower-, middle-, and upper-leaf samples, are also successfully separated (Fig. 7A). The clustering results were confirmed by HCA, which also yielded five groups (data not shown). These results indicate that the unsupervised classification results are acceptable for preliminary data analysis.

For comparison, unsupervised classification was also performed on the peak area list tables obtained from the manually treated data and from MET-IDEA. The score plots of the 30 samples on the first two principal components are shown in Fig. 7B (manual treatment) and Fig. 7C (MET-IDEA). The results in Fig. 7 indicate that the samples in Fig. 7A are reasonably grouped. In the manually treated data, a total of 470 chromatographic peaks were quantified, and even more peaks (698) were obtained with MET-IDEA; both numbers far exceed the number of peaks quantified by CAMMPA (248). Although CAMMPA can determine potential biomarkers that support the clustering results, it cannot resolve overlapping chromatographic peaks. Partially co-eluted peaks are forcibly split into individual peaks at the positions of local minima, whereas two peaks that overlap so heavily that no local minimum exists between them are merged into a single chromatographic peak. We therefore recommend that users manually resolve overlapping peaks that CAMMPA identifies as potential biomarkers.

Potential metabolites are extracted with the following strategy. First, ANOVA is used to identify compounds showing significant differences among groups; in this work, compounds with p < 0.001 were retained for subsequent analysis. PLS-DA is then used to calculate the VIPs of these candidates, and those with VIPs exceeding 1 are retained as potential metabolites for further analysis, such as metabolite identification. Over 50 potential metabolites were identified across the growth stages, and relevant information, including retention times, mass spectra, peak areas, and relative distributions in various groups, was obtained through CAMMPA. Similarly, over 70 potential metabolites showing significant differences among the lower, middle, and upper regions of mature leaves were found. These metabolites require further characterization by skilled analysts.

The efficiency of metabolic profiling analysis is an important consideration in practical applications. Our investigation indicated that CAMMPA requires no more than 2 min to analyze the 30 samples (a data size of almost 800 Mb). Given that the GC–MS dataset contains over 10,535 × 471 × 30 data points and that about 7000 chromatographic peaks must be aligned, CAMMPA can be regarded as an efficient method for metabolic profiling analysis. Although the example in the present work was obtained by GC–MS, data obtained by LC–MS can also be analyzed by CAMMPA.

Fig. 7. (A) Unsupervised classification results based on PCA of the CAMMPA peak areas. (B) Sample score plot obtained by PCA of manually integrated peak areas. (C) Sample score plot obtained by PCA of the MET-IDEA results. Samples 1–6 are rosette-stage samples, 7–12 vigorous growing-stage samples, 13–18 lower-leaf samples, 19–24 middle-leaf samples, and 25–30 upper-leaf samples.

5. Conclusion

A new CAMMPA technique was developed for automatic metabolic profiling analysis and potential metabolite identification in complex samples. The proposed method automatically detects chromatographic peaks and provides accurate peak-shift alignment results. Potential metabolites can be obtained by analyzing the aligned peak area list with statistical methods, including PCA, HCA, ANOVA, and PLS-DA. A significant advantage of CAMMPA is its short data-processing time, which makes data analysis efficient. Results from plant samples confirmed the applicability of the proposed method to discovering potential metabolites in complex samples. Thus, CAMMPA offers a good approach for efficient metabolic profiling analysis.

Acknowledgements

The authors gratefully acknowledge the financial support of the Foundation of ZTRI (Grant No. 322013CA0290). Hai-Yan Fu is also grateful for financial support from the open funds of the State Key Laboratory of Chemo/Biosensing and Chemometrics of Hunan University (No. 201111) and the National Natural Science Foundation of China (No. 21205145).

References

[1] N.L. Kuehnbaum, P. Britz-McKibbin, New advances in separation science for metabolomics: resolving chemical diversity in a post-genomic era, Chem. Rev. 113 (2013) 2437–2468.
[2] M. Li, L. Yang, Y. Bai, H. Liu, Analytical methods in lipidomics and their applications, Anal. Chem. 86 (2014) 161–175.
[3] C.R. Morris, J.T. Scott, H.-m. Chang, R.R. Sederoff, D. O’Malley, J.F. Kadla, Metabolic profiling: a new tool in the study of wood formation, J. Agric. Food Chem. 52 (2004) 1427–1434.
[4] T.D. Veenstra, M. Zhou, Tissue proteomics and metabolomics an excellent start and a promising future, J. Proteome Res. 8 (2009) 1617.
[5] A. Zhang, H. Sun, P. Wang, Y. Han, X. Wang, Modern analytical techniques in metabolomics analysis, Analyst 137 (2012) 293–300.
[6] K. Dettmer, P.A. Aronov, B.D. Hammock, Mass spectrometry-based metabolomics, Mass Spectrom. Rev. 26 (2007) 51–78.
[7] Y. Pico, M. la Farre, C. Soler, D. Barcelo, Identification of unknown pesticides in fruits using ultra-performance liquid chromatography-quadrupole time-of-flight mass spectrometry. Imazalil as a case study of quantification, J. Chromatogr. A 1176 (2007) 123–134.
[8] M. Farre, Y. Pico, D. Barcelo, Application of ultra-high pressure liquid chromatography linear ion-trap orbitrap to qualitative and quantitative assessment of pesticide residues, J. Chromatogr. A 1328 (2014) 66–79.
[9] M. Ernst, D.B. Silva, R.R. Silva, R.Z. Vencio, N.P. Lopes, Mass spectrometry in plant metabolomics strategies: from analytical platforms to data acquisition and processing, Nat. Prod. Rep. 31 (2014) 784–806.
[10] W. Jiang, Y. Qiu, Y. Ni, M. Su, W. Jia, X. Du, An automated data analysis pipeline for GC-TOF-MS metabonomics studies, J. Proteome Res. 9 (2010) 5974–5981.
[11] Y. Ni, Y. Qiu, W. Jiang, K. Suttlemyre, M. Su, W. Zhang, W. Jia, X. Du, ADAP-GC 2.0: deconvolution of coeluting metabolites from GC/TOF-MS data for metabolomics studies, Anal. Chem. 84 (2012) 6619–6629.
[12] H. Tsugawa, M. Arita, M. Kanazawa, A. Ogiwara, T. Bamba, E. Fukusaki, MRMPROBS: a data assessment and metabolite identification tool for large-scale multiple reaction monitoring based widely targeted metabolomics, Anal. Chem. 85 (2013) 5191–5199.
[13] C.D. Broeckling, I.R. Reddy, A.L. Duran, X. Zhao, L.W. Sumner, MET-IDEA: data extraction tool for mass spectrometry-based metabolomics, Anal. Chem. 78 (2006) 4334–4341.
[14] C. Kuhl, R. Tautenhahn, C. Böttcher, T.R. Larson, S. Neumann, CAMERA: an integrated strategy for compound spectra extraction and annotation of liquid chromatography/mass spectrometry data sets, Anal. Chem. 84 (2011) 283–289.
[15] X. Wei, W. Sun, X. Shi, I. Koo, B. Wang, J. Zhang, X. Yin, Y. Tang, B. Bogdanov, S. Kim, Z. Zhou, C. McClain, X. Zhang, MetSign: a computational platform for high-resolution mass spectrometry-based metabolomics, Anal. Chem. 83 (2011) 7668–7675.
[16] A. Lommen, MetAlign: interface-driven, versatile metabolomics tool for hyphenated full-scan mass spectrometry data preprocessing, Anal. Chem. 81 (2009) 3079–3086.
[17] H.P. Benton, D.M. Wong, S.A. Trauger, G. Siuzdak, XCMS2: processing tandem mass spectrometry data for metabolite identification and structural characterization, Anal. Chem. 80 (2008) 6382–6389.
[18] C.A. Smith, E.J. Want, G. O’Maille, R. Abagyan, G. Siuzdak, XCMS: processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching, and identification, Anal. Chem. 78 (2006) 779–787.
[19] R. Tautenhahn, G.J. Patti, E. Kalisiak, T. Miyamoto, M. Schmidt, F.Y. Lo, J. McBee, N.S. Baliga, G. Siuzdak, metaXCMS: second-order analysis of untargeted metabolomics data, Anal. Chem. 83 (2010) 696–700.
[20] R. Tautenhahn, G.J. Patti, D. Rinehart, G. Siuzdak, XCMS Online: a web-based platform to process untargeted metabolomic data, Anal. Chem. 84 (2012) 5035–5039.


[21] S. Chen, H. Kong, X. Lu, Y. Li, P. Yin, Z. Zeng, G. Xu, Pseudotargeted metabolomics method and its application in serum biomarker discovery for hepatocellular carcinoma based on ultra high-performance liquid chromatography/triple quadrupole mass spectrometry, Anal. Chem. 85 (2013) 8326–8333.
[22] M. Daszykowski, Y. Vander Heyden, C. Boucon, B. Walczak, Automated alignment of one-dimensional chromatographic fingerprints, J. Chromatogr. A 1217 (2010) 6127–6133.
[23] S.S. Fong, P. Rearden, C. Kanchagar, C. Sassetti, J. Trevejo, R.G. Brereton, Automated peak detection and matching algorithm for gas chromatography–differential mobility spectrometry, Anal. Chem. 83 (2011) 1537–1546.
[24] H. Gu, G.A.N. Gowda, F.C. Neto, M.R. Opp, D. Raftery, RAMSY: ratio analysis of mass spectrometry to improve compound identification, Anal. Chem. 85 (2013) 10771–10779.
[25] P. Jonsson, A.I. Johansson, J. Gullberg, J. Trygg, J. A, B. Grung, S. Marklund, M. Sjöström, H. Antti, T. Moritz, High-throughput data analysis for detecting and identifying differences between samples in GC/MS-based metabolomic analyses, Anal. Chem. 77 (2005) 5635–5642.
[26] X. Li, X. Lu, J. Tian, P. Gao, H. Kong, G. Xu, Application of fuzzy c-means clustering in data analysis of metabolomics, Anal. Chem. 81 (2009) 4468–4475.
[27] Y. Li, Q. Ruan, G. Ye, X. Lu, X. Lin, G. Xu, A novel approach to transforming a non-targeted metabolic profiling method to a pseudo-targeted method using the retention time locking gas chromatography/mass spectrometry-selected ions monitoring, J. Chromatogr. A 1255 (2012) 228–236.
[28] X. Liu, Z. Ser, J.W. Locasale, Development and quantitative evaluation of a high-resolution metabolomics technology, Anal. Chem. 86 (2014) 2175–2184.
[29] T.D. Mak, E.C. Laiakis, M. Goudarzi, A.J. Fornace, MetaboLyzer: a novel statistical workflow for analyzing postprocessed LC–MS metabolomics data, Anal. Chem. 86 (2013) 506–513.
[30] N.K.N. Neumann, S.M. Lehner, B. Kluger, C. Bueschl, K. Sedelmaier, M. Lemmens, R. Krska, R. Schuhmacher, Automated LC-HRMS(/MS) approach for the annotation of fragment ions derived from stable isotope labeling-assisted untargeted metabolomics, Anal. Chem. 86 (2014) 7320–7327.
[31] A. Nordström, G. O’Maille, C. Qin, G. Siuzdak, Nonlinear data alignment for UPLC-MS and HPLC-MS based metabolomics: quantitative analysis of endogenous and exogenous metabolites in human serum, Anal. Chem. 78 (2006) 3289–3295.
[32] M. Sturm, O. Kohlbacher, TOPPView: an open-source viewer for mass spectrometry data, J. Proteome Res. 8 (2009) 3760–3763.
[33] A.A. Vaughan, W.B. Dunn, J.W. Allwood, D.C. Wedge, F.H. Blackhall, A.D. Whetton, C. Dive, R. Goodacre, Liquid chromatography–mass spectrometry calibration transfer and metabolomics data fusion, Anal. Chem. 84 (2012) 9848–9857.
[34] X. Wei, X. Shi, S. Kim, L. Zhang, J.S. Patrick, J. Binkley, C. McClain, X. Zhang, Data preprocessing method for liquid chromatography–mass spectrometry based metabolomics, Anal. Chem. 84 (2012) 7963–7971.
[35] A.-h. Zhang, H. Sun, Y. Han, G.-l. Yan, Y. Yuan, G.-c. Song, X.-x. Yuan, N. Xie, X.-j. Wang, Ultraperformance liquid chromatography–mass spectrometry based comprehensive metabolomics combined with pattern recognition and network analysis methods for characterization of metabolites and metabolic pathways from biological data sets, Anal. Chem. 85 (2013) 7606–7612.
[36] W. Zhang, J. Chang, Z. Lei, D. Huhman, L.W. Sumner, P.X. Zhao, MET-COFEA: a liquid chromatography/mass spectrometry data processing platform for metabolite compound feature extraction and annotation, Anal. Chem. 86 (2014) 6245–6253.
[37] T.G. Bloemberg, J. Gerretzen, A. Lunshof, R. Wehrens, L.M.C. Buydens, Warping methods for spectroscopic and chromatographic signal alignment: a tutorial, Anal. Chim. Acta 781 (2013) 14–32.
[38] S.A. Bortolato, J.A. Arancibia, G.M. Escandar, A.C. Olivieri, Time-alignment of bidimensional chromatograms in the presence of uncalibrated interferences using parallel factor analysis: application to multi-component determinations using liquid-chromatography with spectrofluorimetric detection, Chemom. Intell. Lab. Syst. 101 (2010) 30–37.
[39] C. Christin, H.C.J. Hoefsloot, A.K. Smilde, F. Suits, R. Bischoff, P.L. Horvatovich, Time alignment algorithms based on selected mass traces for complex LC-MS data, J. Proteome Res. 9 (2010) 1483–1495.
[40] M. Daszykowski, Y. Vander Heyden, C. Boucon, B. Walczak, Automated alignment of one-dimensional chromatographic fingerprints, J. Chromatogr. A 1217 (2010) 6127–6133.
[41] B. Debrus, P. Lebrun, A. Ceccato, G. Caliaro, B. Govaerts, B.A. Olsen, E. Rozet, B. Boulanger, P. Hubert, A new statistical method for the automated detection of peaks in UV-DAD chromatograms of a sample mixture, Talanta 79 (2009) 77–85.
[42] K.H. Jarman, D.S. Daly, K.K. Anderson, K.L. Wahl, A new approach to automated peak detection, Chemom. Intell. Lab. Syst. 69 (2003) 61–76.
[43] I. Latha, S.E. Reichenbach, Q. Tao, Comparative analysis of peak-detection techniques for comprehensive two-dimensional chromatography, J. Chromatogr. A 1218 (2011) 6792–6798.
[44] J.S. Nadeau, B.W. Wright, R.E. Synovec, Chemometric analysis of gas chromatography–mass spectrometry data using fast retention time alignment via a total ion current shift function, Talanta 81 (2010) 120–128.
[45] H. Parastar, N. Akvan, Multivariate curve resolution based chromatographic peak alignment combined with parallel factor analysis to exploit second-order advantage in complex chromatographic measurements, Anal. Chim. Acta 816 (2014) 18–27.
[46] H. Parastar, M. Jalali-Heravi, R. Tauler, Comprehensive two-dimensional gas chromatography (GC × GC) retention time shift correction and modeling using bilinear peak alignment, correlation optimized shifting and multivariate curve resolution, Chemom. Intell. Lab. Syst. 117 (2012) 80–91.
[47] W. Struck, P. Wiczling, M. Waszczuk-Jankowska, R. Kaliszan, M.J. Markuszewski, New supervised alignment method as a preprocessing tool for chromatographic data in metabolomic studies, J. Chromatogr. A 1256 (2012) 150–159.
[48] W. Yao, X. Yin, Y. Hu, A new algorithm of piecewise automated beam search for peak alignment of chromatographic fingerprints, J. Chromatogr. A 1160 (2007) 254–262.
[49] W. Yu, B. Wu, N. Lin, K. Stone, K. Williams, H. Zhao, Detecting and aligning peaks in mass spectrometry data with applications to MALDI, Comput. Biol. Chem. 30 (2006) 27–38.
[50] Y.J. Yu, H.L. Wu, J.F. Niu, J. Zhao, Y.N. Li, C. Kang, R.Q. Yu, A novel chromatographic peak alignment method coupled with trilinear decomposition for three dimensional chromatographic data analysis to obtain the second-order advantage, Analyst 138 (2013) 627–634.
[51] Z.-M. Zhang, Y.-Z. Liang, H.-M. Lu, B.-B. Tan, X.-N. Xu, M. Ferro, Multiscale peak alignment for chromatographic datasets, J. Chromatogr. A 1223 (2012) 93–106.
[52] G. Tomasi, F. van den Berg, C. Andersson, Correlation optimized warping and dynamic time warping as preprocessing methods for chromatographic data, J. Chemom. 18 (2004) 231–241.
[53] Y.-J. Yu, Q.-L. Xia, S. Wang, B. Wang, F.-W. Xie, X.-B. Zhang, Y.-M. Ma, H.-L. Wu, Chemometric strategy for automatic chromatographic peak detection and background drift correction in chromatographic data, J. Chromatogr. A 1359 (2014) 262–270.
[54] L. Zhang, X. Wang, J. Guo, Q. Xia, G. Zhao, H. Zhou, F. Xie, Metabolic profiling of Chinese tobacco leaf of different geographical origins by GC-MS, J. Agric. Food Chem. 61 (2013) 2597–2605.
[55] Z. Lei, H. Li, J. Chang, P.X. Zhao, L.W. Sumner, MET-IDEA version 2.06; improved efficiency and additional functions for mass spectrometry-based metabolomics data processing, Metabolomics 8 (2012) 105–110.
