Journal of Chromatography A, 1364 (2014) 241–248


Advanced scripting for the automated profiling of two-dimensional gas chromatography-time-of-flight mass spectrometry data from combustion aerosol☆

Benedikt A. Weggler a,b,c,1, Thomas Gröger a,b,∗, Ralf Zimmermann a,b,c,1

a Joint Mass Spectrometry Centre, Cooperation Group “Comprehensive Molecular Analytics”, Helmholtz Zentrum Muenchen, D85764 Neuherberg, Germany
b Joint Mass Spectrometry Centre, Institute of Chemistry, Chair of Analytical Chemistry, University of Rostock, D18057 Rostock, Germany
c Helmholtz Virtual Institute of Complex Molecular Systems in Environmental Health – Aerosol and Health (HICE)

Article info

Article history:
Received 21 March 2014
Received in revised form 28 July 2014
Accepted 27 August 2014
Available online 1 September 2014

Keywords: Comprehensive two dimensional gas chromatography; Data mining; Ship fuel combustion; Aerosol

Abstract

Multidimensional gas chromatography is an appropriate tool for the non-targeted and comprehensive characterisation of complex samples generated from combustion processes. Particulate matter (PM) emission is composed of a large number of compounds, including condensed semi-volatile organic compounds (SVOCs). However, the large amount of complex information gained from such comprehensive techniques makes data analysis difficult and time-consuming. Because of this obstacle, two-dimensional gas chromatography still receives relatively little use in aerosol science [1–4]. To remedy this problem, advanced scripting algorithms based on knowledge-based rules (KBRs) were developed in-house and applied to GCxGC-TOFMS data. Previously reported KBRs and newer findings were considered for the development of these algorithms. The novelty of the presented advanced scripting tools is a notably selective search criterion for data screening, which is primarily based on fragmentation patterns and the presence of specific fragments. Combined with “classical” approaches based on retention times, a fast, accurate and automated data evaluation method was developed, which was evaluated qualitatively and quantitatively for type 1 and type 2 errors. The method’s applicability was further tested for PM filter samples obtained from ship fuel combustion. Major substance classes, including polycyclic aromatic hydrocarbons (PAH), alkanes, benzenes, esters and ethers, can be targeted. This approach allows the classification of approximately 75% of the peaks of interest within real PM samples. Various conditions of combustion, such as fuel composition and engine load, could be clearly characterised and differentiated.

© 2014 Elsevier B.V. All rights reserved.

☆ Presented at the 13th International Symposium on Hyphenated Techniques in Chromatography and Separation Technology, Bruges, Belgium, 29–31 January 2014.
∗ Corresponding author at: Kooperationsgruppe “Comprehensive Molecular Analytics (CMA)”, Helmholtz Zentrum München, Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH), Ingolstädter Landstraße 1, 85764 Oberschleißheim/Deutschland, Germany. Tel.: +49 89 3187 4554; fax: +49 89 3187 3371. E-mail address: [email protected] (T. Gröger). URL: http://www.helmholtz-muenchen.de/en/neu-cma/ (T. Gröger).
1 www.hice-vi.eu.
http://dx.doi.org/10.1016/j.chroma.2014.08.091
0021-9673/© 2014 Elsevier B.V. All rights reserved.

1. Introduction

1.1. Aerosol and health/HICE

Epidemiological studies indicate that fine air particulate matter and traffic-related air pollution are correlated with severe health effects, including increased mortality and prevalence of cardiovascular and respiratory diseases and allergies [5]. These studies also support the hypothesis that both the physical (particle size, shape, surface) and chemical (dissolved and adsorbed chemicals, surface catalytic reactions) properties of the particles are involved in the toxic, genotoxic and carcinogenic mechanisms of inhaled particulates. In the Helmholtz Virtual Institute of Complex Molecular Systems in Environmental Health (HICE), local and global competencies in the fields of analytical chemistry, aerosol science and chemical toxicology are joined to investigate the long-term effects of environmentally related diseases. One of the main research points is the investigation of reactive organic compounds in environmental aerosols and the synergistic effects of the gaseous and particulate phases. Many of these effects are supposed to be induced by the organic PM compounds or corresponding ageing products. Unfortunately, the elucidation of the organic fraction is rather complex. Depending on the sampling sites, the contribution of organic material is between 20 and 90% of the total aerosol fine PM mass [6] and often consists of several thousand compounds. For a better understanding of aerosol-related health issues and to link the biological response to specific compounds, a non-targeted, comprehensive screening of the aerosol components, especially the organic fraction, is needed. Hyphenated techniques with great separation power, such as two-dimensional gas chromatography-time-of-flight mass spectrometry (GCxGC-TOFMS), are well suited for aerosol research. However, these techniques suffer from time-consuming analysis due to the vast amount of data obtained using non-targeted comprehensive screening. To address this issue, automated compound classifications (ACCs) were further developed and applied.

1.2. State of the art

With the advent of one-dimensional GC–MS techniques, various chemometric data-mining techniques for mass spectral information have been developed. The commonly used compound classification in GCxGC-TOFMS is not applied directly. Usually, a NIST or in-house library search is performed, and substances are identified according to the spectral matching factors and retention times. Depending on the identification, the compound classes can be assigned. However, this indirect method is not suitable for untargeted screening in complex matrices with thousands of compounds and a large variety of compound classes. The first of these approaches are reviewed in [7]. Chemometric classifiers are of special interest for ACC. Based on the properties of the mass spectra, it is possible to determine substructures based on fragment prediction. Attempts to use such classifiers for two-dimensional data are published in [8]. However, the development of chemometric classifiers is rather complex. It is also mentioned in [8] that the substances classified in this way exhibit poor matching factors with the confirmation system used. Further possibilities, such as retention time prediction according to quantitative structure-retention relationship (QSRR) in GCxGC-TOFMS, have been reported by D’Archivio et al. [9] and Kiralj et al. [10].

The approaches described above are mainly based on information that is directly accessible from the mass spectra. There are two main principles in ACC. To classify a peak, Welthagen et al. and Vogt et al. [1,8,11] used so-called prominent features, e.g., m/z 74 as the base peak and m/z 87 with an abundance of at least 40% of the base peak’s abundance for the substance class of n-alkane acid methyl esters. This approach leads to easily implementable algorithms for some substance classes. For two-dimensional GCxGC-TOFMS data, the most common way to apply this approach is using the LECO ChromaTOF® scripting and classification tools, which are part of a special package not included in the basic software distribution. However, even in the recent literature, this approach has only yielded rough algorithms for the characterisation of, e.g., tobacco smoke and fuel products [12,13]. Despite the simplicity of this approach, the accuracy of the delivered results is insufficient. To refine these algorithms, the retention regions have to be drawn directly and manually in the chromatogram. Depending on the extent of the differentiation, this demands sophisticated, artistic drawing skills and must be performed for every matrix. Nevertheless, warping and overlapping compounds with similar prominent features are still classified in this manner. It also must be mentioned that the retention shift between chromatographic runs influences the accuracy and usage of these scripts.

The second main approach was published in 2010 by Hilton et al. [14]. Hilton et al. use a complex algorithm, based on textbook criteria, to determine the molecular ion in the mass spectrum. Based on the molecular ion, they were able to identify chlorinated and brominated compounds out of 10,000 substances based on the isotopic ratios. The use of this algorithm leads to good results for compounds with strong and abundant molecular ions and distinct isotopic patterns. Nevertheless, this approach always produces “molecular ions”, even for substances with weak or absent molecular ions, e.g., alcohols. Due to the complexity and the massive drawback of always producing molecular ions, this approach is limited to a small number of compounds and is not yet well known or commonly used. This paper presents a fast automated approach to classifying substances in large two-dimensional GC-TOFMS datasets in a non-targeted, comprehensive way.

2. Material and methods

2.1. Sampling

Anthropogenic aerosols were generated during the HICE field campaign in 2012. For this study, a single-cylinder diesel research engine was provided by the Institute of Piston Machines and Internal Combustion Engines in Rostock. It was operated with diesel fuel (LFO) according to DIN EN 5900 and heavy fuel oil 188 (HFO). Publications containing further information on the HICE Project and the related field campaigns are in submission. PM2.5 filter samples for off-line analyses were obtained with a modified speciation sampler (Rupprecht & Patashnik 2300, Thermo Scientific, Waltham, USA), which was able to sample four filters in parallel. The samples were collected after a first dilution step via a porous tube diluter directly followed by the first ejector diluter. This process results in an average dilution ratio of 40 for LFO feedstock and 100 for HFO feedstock. PM samples were collected on quartz fibre filters (QFF, T293, Munktell, Sweden) and PTFE membrane filters (PFF, Zefluor 1 µm, Pall, USA) for each of the experiments. The QFF were conditioned by baking at 500 °C for at least 12 h and then stored in sealed glass containers until sampling. After collection of PM, the filter samples were immediately frozen at −25 °C and stored at this temperature until analysis.

2.2. Extraction

The filter samples were extracted in an ultrasonic bath using 25 mL of methanol/dichloromethane (50/50, v/v) for 15 min. The extracts from two extraction steps were combined, filtered (PTFE filter, 0.2 µm, 25 mm, VWR, Germany) and divided into three identical aliquots for targeted and non-targeted analyses.

2.3. Chromatography

For the non-targeted analysis by GCxGC-TOFMS, 10 mL of the obtained extract was reduced to 500 µL and infused with 10 µL of isotope-labelled internal standard solution (acenaphthene d10, benz(a)anthracene d12, benzo(b)fluoranthene d12, chrysene d12, fluoranthene d12, fluorene d10, perylene d12, phenanthrene d10, pyrene d10; each 0.3 mg L−1; Cambridge Isotope Laboratories, Inc., USA). GCxGC measurements were carried out using an Agilent 6890 gas chromatograph equipped with a LECO Pegasus 4D detector, including a Pegasus III TOF-MS (LECO, USA) using helium as the carrier gas according to Welthagen et al. [1]. An Optic 3 inlet system (ATAS GL, Netherlands) was used. Data collection was performed using LECO ChromaTOF® software v. 2.00. Chromatographic separation was performed on a 60 m × 0.25-mm i.d. × 0.25-µm df BPX5 (SGE, Australia) column and then a 1.9 m × 0.1-mm i.d. × 0.10-µm df BPX50 (SGE, Australia) column. The modulation period was adjusted to 3 s with a modulator temperature offset of 75 °C. The oven was heated from 40 °C to 160 °C at 15 K/min followed by a temperature ramp of 2 °C min−1 to 320 °C. In addition, 18-µL splitless injection with temperature-controlled evaporation was used.
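As a rough orientation on the size of the resulting datasets, the ramp time of the oven programme and the number of modulation cycles it spans can be estimated from the values given above. The following Python sketch is illustrative only: isothermal hold times are not stated in the text and are therefore assumed to be zero, and the function and variable names are hypothetical.

```python
# Minimal sketch: duration of the oven ramps and number of 3-s modulation cycles.
# Assumption: no isothermal hold times (none are stated in the text).

def ramp_minutes(start_c, end_c, rate_c_per_min):
    """Time in minutes needed to heat from start_c to end_c at a constant rate."""
    return (end_c - start_c) / rate_c_per_min

# (start temperature in deg C, end temperature in deg C, rate in deg C per min)
oven_segments = [
    (40.0, 160.0, 15.0),   # first ramp, 15 K/min
    (160.0, 320.0, 2.0),   # second ramp, 2 deg C/min
]

ramp_time_min = sum(ramp_minutes(*segment) for segment in oven_segments)

modulation_period_s = 3.0
modulation_cycles = int(ramp_time_min * 60.0 / modulation_period_s)

print(f"Total ramp time: {ramp_time_min:.0f} min")
print(f"Modulation cycles during the ramps: {modulation_cycles}")
```

Under these assumptions, each run contains well over a thousand second-dimension chromatograms, which illustrates why manual evaluation quickly becomes impractical.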


Fig. 2. Fragmentation pathway for n-alkane acid methyl ester. According to [13], two ion series are possible, and two typical fragments, m/z 59 and m/z 31, are produced in EI mode.

Fig. 1. Chromatogram of an aerosol extract generated from ship fuel combustion with HFO as feedstock. This chromatogram consists of approximately 4500 peaks, shown here as a bubble plot. For better visualisation, the chromatogram is shifted by 1.5 s in the second dimension.

The initial temperature was 70 °C, which was increased to 300 °C at a rate of 10 °C/min and held until the end of the run. Three repetitions (technical replicates) were performed for each analytical sample.

2.4. Data processing

The data were processed and analysed using ChromaTOF® v. 4.44.0.0. Automated peak finding with subsequent mass deconvolution was applied to create lists of peaks with a signal-to-noise ratio (S/N) of greater than 50. The baseline offset was set to 0.8 without peak smoothing. The in-house-written algorithms for compound classification were applied using the implemented scripting feature. Fig. 1 shows a typical chromatogram with these settings. This procedure allowed the deconvolution of approximately 4500 peaks (here represented using a bubble plot).

2.4.1. Scripting

Utilising recognisable features in the fragmentation patterns (e.g., the presence of specific masses, specific abundance ratios, specific fragment losses, etc.), the scripting feature embedded in the LECO ChromaTOF® software facilitates the classification of chromatographic peaks. The algorithms used in this feature, called scripts, are written in Microsoft® VBScript language. Scripting is not a matter of software. To obtain functioning algorithms, it is necessary to have properly separated or deconvoluted peaks with clean spectra. Therefore, scripting is not bound to specific software and can be performed, to our knowledge, using various computer programmes, e.g., Excel and Matlab, as previously reported [1]. Full integration in a software package has so far been achieved by the ChromaTOF® software and the GCImage™ software. We hereinafter only address the application of the algorithms using ChromaTOF® software. Brokl et al. [12] note some accuracy issues with the scripts published to date. The accuracy of such scripts can be improved by adding more specific features, such as ion series (fragmentation patterns) or molecular ion information.

To visualise the classified peaks, a bubble plot representation was used. In ChromaTOF® software, the peak bubbles are centred on peak apexes and superimposed on a GC×GC contour plot. Bubble radii correspond to the relative area of the peak represented. Each class of compounds can be assigned a different colour, allowing fast visual analysis of group-type classifications. Peaks classified in more than one group are filled with stripes of the corresponding group colours. The developed scripts can be found in the supplemental material.

2.4.1.1. Scripting based on fragmentation patterns

The method described in Section 2.3 results in a chromatogram containing several thousand peaks with corresponding mass spectra. ACC was applied to manage these datasets in a non-targeted, comprehensive way. To this end, three basic steps for mass spectral interpretation were defined: the molecular ion is investigated, prominent features are identified and specific fragments and ion series are searched for. These steps are based on manual interpretation of the mass spectra. Both of the common approaches for ACC utilise at least one of them. However, ACC utilising fragmentation patterns and ion series or a combination thereof has not been reported so far. Fig. 2 shows the fragmentation pathway for n-alkane acid methyl esters according to McLafferty [15]. According to the literature, the prominent features are m/z 74 as a base peak and m/z 87 with at least 80% of the abundance of the base peak [1]. To include the fragmentation pathway as a classification criterion, a looping structure that searches for the fragmentation series (pathway a) was implemented. According to [15], the ion series should be obtained as products of the decarboxylation of acylium ions. Therefore, the algorithm adds 28 to the possible fragment of the series to check whether the investigated fragment is a plausible decarboxylation product. Fig. 3 shows the core part of the script for n-alkane acid methyl esters and the resulting chromatogram. For visualisation, all non-classified peaks were filtered out. For other classes, e.g., linear saturated hydrocarbons, specific regions were added to the chromatographic plane to enhance the accuracy of the scripts by reducing the false hit rates.

2.4.1.2. Scripting based on molecular ions
For some substance classes, e.g., PAH and cyclic hydrocarbons, the approach described in Section 2.4.1.1 is not suitable due to lower fragmentation and unclear fragmentation patterns or series. In this case, other criteria were considered. The phenanthrenes/anthracenes substance class is used as an example to describe the utilisation of the molecular ion algorithm of Hilton et al. [14] in more detail. In general, this algorithm is split into two major parts. The first (base) part identifies possible phenanthrenes and anthracenes according to the m/z of the molecular ion and whether the molecular ion is the base peak or at least of significant abundance. The resulting chromatogram (filtered) and the base part of the algorithm are shown in Fig. 4. The second part of the algorithm uses retention information to discriminate between different methylation grades of the classified substances. The application of the second part is shown in Figure A5, which can be found in the supplemental material. This approach allows differentiation according to the methylation grade in an untargeted way.

Fig. 3. Chromatogram of an aerosol sample generated by HFO combustion. The script for n-alkane acid methyl esters was applied after basic processing. No regional information was used during the classification process. The chromatogram shown is filtered so that only the results of the applied script are shown. For better visualisation, the chromatogram is shifted by 1.5 s in the second dimension. The code depicted here represents only a small portion of the script for n-alkane acid methyl esters. This part illustrates how the search for the described ion series is performed.

Fig. 4. Scripting based on the molecular ion, shown here for phenanthrenes and anthracenes. The algorithm (a) identifies the phenanthrenes/anthracenes based on the molecular ion and (b) distinguishes between the methylation grades using the retention information. The code depicted here represents only a small portion of the script for phenanthrene, anthracene and their methylated homologues. This part illustrates how the search for the molecular ion series is performed.

2.4.2. Evaluation of automated compound classification

To quantify the accuracy of the algorithms, the compounds identified were evaluated according to type 1 and type 2 errors. In this way, type 1 errors (null hypothesis is true but rejected) are false positive hits (FPs), and type 2 errors (null hypothesis is false but fails to be rejected) are false negative hits (FNs). The null hypothesis in this case would be “the substance does not belong to the class”. Here, FPs are meant to be substances that, despite being classified by the script (null hypothesis rejected), should not be assigned to the corresponding class of substances (null hypothesis is true) after manual revision based upon the chromatographic position or abnormalities in the mass spectrum. In contrast, FNs are substances not classified by the script (null hypothesis fails to be rejected) but should be assigned to the substance class after manual revision (null hypothesis is false). Further division into two categories is also necessary. Substances are either defined compounds, in which the exact number of the possible components is known, or undefined compounds, in which the exact number is not known. Defined substances, such as homologues, were checked for completeness. Missing homologues were counted as FNs and additional peaks as FPs. Assuming that chromatographically adjacent substances belong to the same or similar classes of substances, the probability for misclassification in the same chromatographic class is increased. To investigate the accuracy of the scripts, the retention times for all marked substances of a single class were averaged to generate a fictitious peak. Around this peak, a chromatographic window was defined with ±120 s in the RT1 direction and ±0.15 s in the RT2 direction. All peaks in this chromatographic window were manually reviewed. Figure A6, which can be found in the supplemental material, illustrates this procedure.

3. Results/discussion

In the following, the results of the scripting evaluation are shown and discussed critically. A particular focus is the accuracy of the introduced algorithms compared with previously published ACC algorithms. In addition, the reproducibility of the method and the transferability of the developed scripts are assessed.

3.1. Comparison of different published scripting approaches

For the evaluation of the newly developed scripts, a direct comparison using the same raw data would be indicated. However, this is only meaningful for scripts with an identical objective, e.g., the same compound class or subclass. The only specific compound class found in all previous publications [1,9,11] that is also present in our approach is the algorithm for linear hydrocarbons. For all other substance classes, the scripts presented herein are more specific (including subclasses) and can therefore not be compared directly to approaches only addressing main classes. To compare the accuracy of the different approaches, the individual algorithms for linear hydrocarbons were applied to an LFO-generated aerosol sample. In Table 1, the outcome of the comparative study is shown. The differences for the scripts of linear hydrocarbons from C14 to C29 are listed. These carbon numbers were chosen because the GCxGC-TOFMS method delivers good results in this range. Further below C14, the chromatogram is loaded with artefacts, and the method is not well suited for higher-boiling-point substances. For each script, the same chromatographic region was used. Fig. 5 shows the resulting chromatograms after application of the different scripts. Every approach was capable of recognising all linear hydrocarbons; hence, no approach produced false negatives, i.e., type 2 errors as defined in Section 2.4.2 (the null hypothesis (H0), “the substance does not belong to the class of linear hydrocarbons”, is false but fails to be rejected). Due to the high concentration of those compounds, the deconvolution works well, and even determination by prominent features is sufficient. It can also be observed that even when using retention information, the script based exclusively on prominent features published by Welthagen et al. [1] yields the most FP hits. This indicates that the selectivity of prominent features alone is, in this case, not sufficient for accurate ACC.
Using retention information in addition to the prominent features, the quality of the classification is increased significantly. The algorithm published by Lissitsyna et al. [13] yields slightly better results. This can be explained by the addition of specific molecular fragments as a criterion in the algorithm. The best classification performance is obtained by the scripts introduced herein, which are mainly based on fragmentation pathways. In the case of linear hydrocarbons, this represents the search for the ion series CnH2n+1+. The difference from the prominent features approach lies in the search for complete ion series and specific mass fragments instead of only one or two mass/charge ratios. A manual revision of the four FPs leads to spectra that fit the required search criterion of the algorithm. A library search using the NIST library results in library hits that suggest linear hydrocarbons with a similarity of greater than 850 each. It must also be mentioned that the peaks classified as false positives are located in the front part of the chromatogram. The chance of producing false hits in this region is higher because of the occurrence and overlaps of low-boiling-point substances; as the boiling point increases, the complexity of the sample decreases significantly. Furthermore, the fragmentation for lower-saturation hydrocarbons is less distinct than that for higher-saturation hydrocarbons. The fragmentation is strongly influenced by the chain length.

Table 1
Accuracy of the algorithms developed here compared with the accuracy of previously published algorithms, applied to the chromatogram obtained for LFO combustion. “Regio” marks the columns labelled so in the original table (script combined with regional information).

Script                             Peaks   FP    FN   False classified [%]
Welthagen et al.                   134     118   0    88
Welthagen et al., Regio            66      50    0    75
Lissitsyna et al.                  114     98    0    85
Lissitsyna et al., Regio           34      18    0    52
Developed for this study           57      41    0    71
Developed for this study, Regio    20      4     0    20
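The idea of searching for a complete ion series rather than one or two prominent m/z values can be illustrated for the alkyl series CnH2n+1+, whose nominal masses are 14n + 1 (m/z 43, 57, 71, 85, ...). The following Python sketch is not the VBScript code used in this work (which is provided in the supplemental material); the spectrum representation, the function names and all thresholds are assumptions chosen for illustration.

```python
# Minimal sketch of an ion-series test for linear saturated hydrocarbons.
# Assumptions (not from the paper): spectra are dicts of nominal m/z -> abundance;
# the thresholds min_rel_abundance and min_members are illustrative only.

def alkyl_series_mz(max_mz, min_carbons=3):
    """Nominal m/z values of CnH2n+1+ ions (14n + 1) up to max_mz."""
    series = []
    n = min_carbons
    while 14 * n + 1 <= max_mz:
        series.append(14 * n + 1)
        n += 1
    return series

def has_alkyl_series(spectrum, min_rel_abundance=0.05, min_members=4, max_mz=127):
    """True if enough members of the alkyl ion series are present in the spectrum."""
    if not spectrum:
        return False
    base_abundance = max(spectrum.values())
    present = [
        mz for mz in alkyl_series_mz(max_mz)
        if spectrum.get(mz, 0) >= min_rel_abundance * base_abundance
    ]
    return len(present) >= min_members

# Hypothetical example spectra (abundances are invented for illustration).
alkane_like = {43: 800, 57: 1000, 71: 600, 85: 400, 99: 60}
ester_like = {74: 1000, 87: 600, 43: 200, 55: 150}

print(has_alkyl_series(alkane_like))  # several series members present
print(has_alkyl_series(ester_like))   # only one series member present
```

In practice, such a test would be combined with retention information, as described above, to reduce false hits.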

3.2. Accuracy for the developed algorithm types

For the ACC approach, three types of algorithms were developed. Each uses a slightly different basis for the classification: ion series, molecular ion or both. To determine the accuracy of the algorithms, six representative substance classes were considered in particular: linear and branched saturated hydrocarbons, linear and branched methyl esters and unsubstituted and substituted phenanthrenes/anthracenes. These compound classes were considered because each uses a different type of scripting. For other compound classes, one of those three algorithm types was applied. The basis for each of those algorithms is described in Table 2, together with the other compound classes for which it was used. Assuming the same behaviour for each of these algorithm types, the accuracy analysis here is limited to those six classes, and the result found is also assumed for the other developed compound classes.

Fig. 5. (A1) Script developed herein using a fragmentation pattern. (A2) Script developed herein using regional information. (B1) Script developed by Welthagen et al. (B2) Script developed by Welthagen et al. using retention information. The region used for each script can be found in the supplemental material.


Table 2
Basis for the chosen scripts and the application for other compound classes.

Main type     Compound class                      Defined   Script based on                                          Applied for
Hydrocarbon   Linear, saturated                   Yes       Ion series, retention information                        Saturated and unsaturated hydrocarbons
              Branched, saturated                 No
Ester         Linear, saturated methyl ester      Yes       Ion series, retention information, specific fragments    Heterocompound containing substances, except hetero PAH
              Branched, saturated methyl ester    No
PAH           Phenanthrene/anthracene unsubst.    Yes       Molecular ion, abundance distribution                    Subst. and unsubst. PAH
              Phenanthrene/anthracene subst.      No
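The molecular-ion-based entry in Table 2 can be illustrated for phenanthrene/anthracene (C14H10, nominal mass 178) and its alkylated homologues, whose molecular ions are expected at m/z 178 + 14n. The following Python sketch covers only the mass spectral part; the retention-based discrimination of methylation grades described in Section 2.4.1.2 is deliberately omitted. The function name, the spectrum representation, the abundance threshold and the maximum grade are assumptions and are not taken from the published scripts.

```python
# Minimal sketch of a molecular-ion test for phenanthrenes/anthracenes.
# Assumptions (not from the paper): spectra are dicts of nominal m/z -> abundance;
# "significant abundance" is taken as >= 50% of the base peak; grades 0..4 are tested.

PARENT_MZ = 178  # nominal mass of C14H10 (phenanthrene/anthracene)

def methylation_grade(spectrum, max_grade=4, min_rel_abundance=0.5):
    """Return the alkylation grade n whose molecular ion (178 + 14n) is the base
    peak or of significant abundance, or None if no candidate is found.
    Higher grades are checked first so that the heaviest matching ion is used."""
    if not spectrum:
        return None
    base_abundance = max(spectrum.values())
    for n in range(max_grade, -1, -1):
        candidate_mz = PARENT_MZ + 14 * n
        if spectrum.get(candidate_mz, 0) >= min_rel_abundance * base_abundance:
            return n
    return None

# Hypothetical example spectra (abundances are invented for illustration).
c0_like = {178: 1000, 176: 180, 89: 90}
c1_like = {192: 1000, 191: 550, 189: 250}
ester_like = {74: 1000, 87: 600}

print(methylation_grade(c0_like), methylation_grade(c1_like), methylation_grade(ester_like))
```

A purely mass-based test of this kind cannot separate isomers or rule out unrelated compounds with the same nominal mass, which is why the scripts additionally rely on retention information.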

Table 3
FPs and FNs for the six substance classes for measurements of aerosols generated from LFO|HFO combustion. All values are given as LFO|HFO.
+ For defined compounds, this number represents the classified peaks; for undefined compounds (*), it represents the number of investigated peaks.
# Only two peaks should occur here, but due to the high concentration of phenanthrenes and anthracenes, the phenanthrene peak is picked twice with the data processing settings described in Section 2.4.1.

Compound class    Peaks+     FP     FN     False classified peaks [%]
Linear HC         20|21      4|5    0|0    20|22
Branched HC*      72|81      3|2    1|4    6|7
Linear ME         17|16      3|4    1|1    23|31
Branched ME*      123|173    8|6    1|1    7|4
PhenAnthra C0     3#|3#      0|0    0|0    0|0
PhenAnthra Cn*    25|122     0|8    1|0    4|7
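The percentage of falsely classified peaks is not defined by an explicit formula in the text. The tabulated values are close to (FP + FN) / Peaks expressed in percent, with small deviations in rounding (for instance for linear HC in the HFO sample), so the following Python sketch should be read as an assumption about how the numbers were obtained and not as the authors' procedure. The function and variable names are hypothetical.

```python
# Minimal sketch: share of falsely classified peaks from Peaks, FP and FN counts.
# Assumption (not stated in the paper): percent = 100 * (FP + FN) / Peaks.

def false_classified_percent(peaks, fp, fn):
    """Percentage of falsely classified peaks relative to the number of peaks."""
    if peaks <= 0:
        raise ValueError("peaks must be a positive number")
    return 100.0 * (fp + fn) / peaks

# (Peaks, FP, FN) for the LFO sample, taken from Table 3.
table3_lfo = {
    "Linear HC": (20, 4, 0),
    "Branched HC": (72, 3, 1),
    "Linear ME": (17, 3, 1),
    "Branched ME": (123, 8, 1),
    "PhenAnthra C0": (3, 0, 0),
    "PhenAnthra Cn": (25, 0, 1),
}

for compound_class, counts in table3_lfo.items():
    print(f"{compound_class}: {false_classified_percent(*counts):.1f} %")
```

Because the denominator is the number of classified (or investigated) peaks, a single FP or FN weighs much more heavily for small classes such as the linear methyl esters than for large ones such as the branched methyl esters.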

In Table 3, the total number of peaks classified is listed together with the results of manual revision. The six compound classes were investigated for HFO- and LFO-generated samples. It can be clearly observed that the variance (peak count) for all substances is below 10%. The percentage of falsely classified peaks is within the same range for the HFO and LFO samples.

3.3. Transferability for different matrices

The presented algorithms primarily use fragmentation behaviour for classification. Retention information is only used to improve the accuracy. This allows the algorithms to be readily transferable from one sample matrix to another. For all matrices tested so far, this expectation was completely fulfilled. For example, n-alkane acid methyl esters, 2-ketones and linear aldehydes could be distinguished in a cinnamon oil sample that mainly contains carboxylic and nitrogen compounds, with each of the classified peaks showing an entry in the NIST library with a similarity greater than 830.

3.4. Method reproducibility

To evaluate the whole method, some points have to be considered. Assuming that the sample preparation, measurement and data evaluation are reproducible, the relative standard deviation for the substance classes should be low between analytical replicates. Here, analytical replicates are meant to be samples generated from QFF sampled on different days with the same engine settings. Each analytical replicate consists of three technical replicates using independent randomised injection of the aerosol extracts. In Fig. 6, the relative standard deviations based on the number of classified peaks for HFO- and LFO-generated samples are shown. For such compound classes as hydrocarbons and benzenes, a high standard deviation can be found among the HFO samples. This is a result of some of the technical replicates showing more FPs and FNs than others. However, evaluation of the GCxGC-TOFMS data is usually not based on peak numbers. For comparative analysis, quantitative information obtained from peak areas is more important. In Fig. 7, a box-and-whisker chart displays the percentage share of the cumulated peak areas for aerosol samples generated via HFO combustion (the box plot for LFO can be found in the supplemental material). Despite the high standard deviation, it can be observed that the boxes are relatively narrow and that the whiskers are not very distant. Thus, it can be said that the contribution of FP and FN peaks in technical replications does not affect the quantitative information very much. This finding indicates that the FP and FN peaks do not have large peak areas, which could also be assumed to be the reason for the inaccurate classification. Small peak areas are normally also accompanied by small peak heights and a low signal-to-noise ratio, leading to imprecise deconvoluted spectra. Furthermore, it can be seen in Fig. 7 that, with respect to the percentage share of the complete peak areas, the scattering between analytical repetitions is below 10% for all replicates and less than 5% for at least 2/3 of the replicates.

Fig. 6. Relative standard deviations calculated for the integrated compound classes for HFO- and LFO-generated samples based on the absolute number of peaks.

Fig. 7. Box-and-whisker chart for HFO-generated aerosol. Dots represent outliers. (y-axis: percentage share of the summed peak area, 0–30%; x-axis: compound class, with Carbonyl, Hydrocarbons, PAH, unsaturated and Benzene.)

3.5. Application results based on peak count

Based on the number of peaks classified, qualitative information on the composition of aerosols generated from HFO and LFO combustion can be obtained. Fig. 8 (pattern fill) shows the logarithmic ratio for the cumulative substance classes. It is easy to see from such ratio plots that there are approximately 5 times more peaks for, e.g., PAH in the HFO samples than in the LFO samples. The number of peaks for hydrocarbons seems to be the same for both sample matrices. For the HICE Project, dilution was needed for the combustion experiments. This is why normalised peak areas are usually considered for quantitative statements. Taking this dilution into account and assuming a linear relation between the dilution and mass spectral response, the area ratios are shown in Fig. 8 (solid fill). From this figure, it can be concluded that the most prevalent compounds in LFO exhaust are hydrocarbons and that those in HFO exhaust are benzenes.

Fig. 8. Ratio for the peak count (pattern fill) and area ratio (solid fill) based on the emission data for the HFO- and LFO-generated samples.

4. Conclusions/outlook

Due to its high separation efficiency and peak capacity, among other useful properties, two-dimensional gas chromatography is well suited for complex matrices. With regard to the identification of health-endangering substances and their assignment to a biological response, this technique should be considered more often in the field of aerosol research. The algorithms presented herein allow the analysis of large amounts of data generated from combustion aerosol samples in a finite time with a high degree of automatisation. It has been shown that, compared to previously published approaches for ACC, the accuracy of the classification algorithms could be increased significantly by adding fragmentation patterns, molecular ion information and specific molecular fragments. In addition, the algorithms and the presented method provide high reproducibility and transferability. The most significant feature of this scripting approach is that it allows a non-targeted analysis of complex samples that are usually investigated in a targeted way. The results of the introduced classification algorithm were also evaluated for false positive and false negative errors following an objective routine. A major outcome is the good accuracy of the proposed algorithm compared to earlier approaches. It was also found that classification errors have only a small effect on the quantitative information obtained for the substance classes. If implemented properly, this approach allows samples with over 4000 peaks to be screened in less than 20 min without a deep knowledge of the sample composition. Certainly, only a small number of the possibilities of the scripting approach for ACC were touched upon in this work. Considering the isotopic ratios of certain molecular fragments would further enhance the selection rules for the classification of heteroatom-containing compounds, as would the use of different ionisation techniques, such as photoionisation, to obtain access to higher-abundance molecular ions. Furthermore, the use of high-mass-resolution detection would provide even more powerful information to the ACC approach presented here by adding accurate mass information. These options will be investigated in more detail in future work.

Acknowledgements

We would like to thank all our co-workers on the HICE Project, without whose help this work would not have been possible. We would also like to extend special thanks to Jürgen Orasche for his help regarding particulate matter.

3.6. List of compounds and performance

Table A4 in the supplemental material lists all of the compound classes for which algorithms were developed in this study. With these algorithms, it is possible to classify, on average, over 50% of the peaks in the chromatograms generated from HFO and LFO combustion. Taking into account that the chromatographic area below pentadecane contains artefacts (such as blow-on and blow-off effects, which are normally neglected in SVOC analysis), the share of classified peaks increases to almost 75%. The entire data processing procedure, comprising basic processing and classification, takes 32 min for a chromatogram consisting of 4500 peaks. Of those 32 min, only 18 min 30 s were required for the classification process. Published scripts that depend on prominent features alone perform faster, as expected, because those scripts search for only a few defined characteristics of the spectra. To determine the affiliation of a peak, only a few defined yes/no criteria have to be evaluated. In contrast, the search for ion series requires every plausible peak in the mass spectrum to be scanned until the abort criterion is fulfilled. In exchange, the accuracy of the results is significantly enhanced.
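The trade-off between the two search strategies can be illustrated with a small Python sketch. It is not the scripting code used in this work: the function names, the thresholds `min_rel` and `max_missing`, and the example spectra are hypothetical and chosen only for illustration. The only chemistry it relies on is the well-known alkyl fragment ion series of alkanes (m/z 43, 57, 71, 85, 99, spaced by 14 for CH2).

```python
# Alkyl fragment ions CnH2n+1 of alkanes, spaced by 14 (CH2).
ALKANE_SERIES = (43, 57, 71, 85, 99)


def has_prominent_feature(spectrum: dict, mz: int, min_rel: float = 0.5) -> bool:
    """Single yes/no criterion: is one ion intense relative to the base peak?"""
    if not spectrum:
        return False
    base = max(spectrum.values())
    return spectrum.get(mz, 0.0) / base >= min_rel


def matches_ion_series(spectrum: dict, series=ALKANE_SERIES,
                       min_rel: float = 0.05, max_missing: int = 1) -> bool:
    """Scan every ion of the series and stop once the abort criterion is met.

    The abort criterion here (hypothetical) is more than `max_missing`
    series ions falling below `min_rel` of the base peak.
    """
    if not spectrum:
        return False
    base = max(spectrum.values())
    missing = 0
    for mz in series:
        if spectrum.get(mz, 0.0) / base < min_rel:
            missing += 1
            if missing > max_missing:
                return False  # abort: too many series ions are absent
    return True


if __name__ == "__main__":
    # Invented spectra (m/z -> intensity), for illustration only.
    alkane_like = {43: 800, 57: 999, 71: 600, 85: 400, 99: 100}
    interfering = {91: 999, 57: 600, 65: 120}
    for name, spec in (("alkane-like", alkane_like), ("interfering", interfering)):
        print(name, has_prominent_feature(spec, 57), matches_ion_series(spec))
```

In this toy case both spectra pass the single-ion test on m/z 57, but only the alkane-like spectrum passes the series scan. The scan touches more ions per peak, which mirrors the longer run time reported above in exchange for fewer false positives.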

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.chroma.2014.08.091.

References

[1] W. Welthagen, J. Schnelle-Kreis, R. Zimmermann, Search criteria and rules for comprehensive two-dimensional gas chromatography-time-of-flight mass spectrometry analysis of airborne particulate matter, J. Chromatogr. A 1019 (2003) 233–249.
[2] M. Kallio, M. Jussila, T. Rissanen, P. Anttila, K. Hartonen, A. Reissell, R. Vreuls, M. Adahchour, T. Hyötyläinen, Comprehensive two-dimensional gas chromatography coupled to time-of-flight mass spectrometry in the identification of organic compounds in atmospheric aerosols from coniferous forest, J. Chromatogr. A 1125 (2006) 234–243.
[3] J.F. Hamilton, Using comprehensive two-dimensional gas chromatography to study the atmosphere, J. Chromatogr. Sci. 48 (2010) 274–282.
[4] M.Z. Ozel, J.F. Hamilton, A.C. Lewis, New sensitive and quantitative analysis method for organic nitrogen compounds in urban aerosol samples, Environ. Sci. Technol. 45 (2011) 1497–1505.


[5] U. Pöschl, Atmospheric aerosols: composition, transformation, climate and health effects, Angew. Chem. Int. Ed. 44 (2005) 7520–7540.
[6] M. Kanakidou, J.H. Seinfeld, S.N. Pandis, I. Barnes, F.J. Dentener, M.C. Facchini, R. Van Dingenen, B. Ervens, A. Nenes, C.J. Nielsen, E. Swietlicki, J.P. Putaud, Y. Balkanski, S. Fuzzi, J. Horth, G.K. Moortgat, R. Winterhalter, C.E.L. Myhre, K. Tsigaridis, E. Vignati, E.G. Stephanou, J. Wilson, Organic aerosol and global climate modelling: a review, Atmos. Chem. Phys. 5 (2005) 1053–1123.
[7] K. Varmuza, Chemometrics in mass spectrometry, Int. J. Mass Spectrom. Ion Processes 118–119 (1992) 811–823.
[8] L. Vogt, T. Gröger, R. Zimmermann, Automated compound classification for ambient aerosol sample separations using comprehensive two-dimensional gas chromatography-time-of-flight mass spectrometry, J. Chromatogr. A 1150 (2007) 2–12.
[9] A.A. D'Archivio, A. Incani, F. Ruggieri, Retention modelling of polychlorinated biphenyls in comprehensive two-dimensional gas chromatography, Anal. Bioanal. Chem. 399 (2011) 903–913.

[10] R. Kiralj, M.M.C. Ferreira, Is your QSAR/QSPR descriptor real or trash? J. Chemom. 24 (2010) 681–693.
[11] T. Gröger, W. Welthagen, S. Mitschke, M. Schaffer, R. Zimmermann, Application of comprehensive two-dimensional gas chromatography mass spectrometry and different types of data analysis for the investigation of cigarette particulate matter, J. Sep. Sci. 31 (2008) 3366–3374.
[12] M. Brokl, L. Bishop, C.G. Wright, C. Liu, K. McAdam, J.F. Focant, Analysis of mainstream tobacco smoke particulate phase using comprehensive two-dimensional gas chromatography time-of-flight mass spectrometry, J. Sep. Sci. 36 (2013) 1037–1044.
[13] K. Lissitsyna, S. Huertas, L.C. Quintero, L.M. Polo, PIONA analysis of kerosene by comprehensive two-dimensional gas chromatography coupled to time of flight mass spectrometry, Fuel 116 (2014) 716–722.
[14] D.C. Hilton, R.S. Jones, A. Sjödin, A method for rapid, non-targeted screening for environmental contaminants in household dust, J. Chromatogr. A 1217 (2010) 6851–6856.
[15] F.W. McLafferty, Interpretation of Mass Spectra, University Science Books, 1993.
