Journal of Magnetic Resonance 242 (2014) 67–78

Contents lists available at ScienceDirect

Journal of Magnetic Resonance journal homepage: www.elsevier.com/locate/jmr

Quantitative Quantum Mechanical Spectral Analysis (qQMSA) of 1H NMR spectra of complex mixtures and biofluids Mika Tiainen, Pasi Soininen, Reino Laatikainen ⇑ School of Pharmacy, University of Eastern Finland, P.O. Box 1627, FIN-70211 Kuopio, Finland

a r t i c l e

i n f o

Article history: Received 1 October 2013 Revised 24 December 2013 Available online 18 February 2014 Keywords: Quantum mechanical Quantitative NMR Spectral analysis QMSA Serum Mixtures Biofluids

a b s t r a c t The quantitative interpretation of 1H NMR spectra of mixtures like the biofluids is a demanding task due to spectral complexity and overlap. Complications may arise also from water suppression, T2-editing, protein interactions, relaxation differences of the species, experimental artifacts and, furthermore, the spectra may contain unknown components and macromolecular background which cannot be easily separated from baseline. In this work, tools and strategies for quantitative Quantum Mechanical Spectral Analysis (qQMSA) of 1H NMR spectra from complex mixtures were developed and systematically assessed. In the present approach, the signals of well-defined, stoichiometric components are described by a QM model, while the background is described by a multiterm baseline function and the unknown signals using optimizable and adjustable lines, regular multiplets or any spectral structures which can be composed from spectral lines. Any prior knowledge available from the spectrum can also be added to the model. Fitting strategies for weak and strongly overlapping spectral systems were developed and assessed using two basic model systems, the metabolite mixtures without and with macromolecular (serum) background. The analyses show that if the spectra are measured in high-throughput manner, the consistent absolute quantification demands some calibration to compensate the different response factors of the protons and compounds. On the other hand, the results show that also the T2-edited spectra can be measured so that they obey well the QM rules. In general, qQMSA exploits and interprets the spectral information in maximal way taking full advantage from the QM properties of the spectra and, at the same time, offers chemical confidence which means that individual components can be identified with high confidence on the basis of their accurate spectral parameters. Ó 2014 Elsevier Inc. All rights reserved.

1. Introduction Quantitative 1H NMR (qHNMR) spectroscopy is widely used to numerous applications as recently reviewed by Pauli et al. [1], from the composition of natural products and drug impurity analysis [2] to metabolomics of complex biological samples such as urine, cerebrospinal fluid, serum, fecal extracts and tissue extracts [3–9]. The advantages of NMR are its quantitative and nondestructive nature, high reproducibility and that measurements can be done with minimal sample preparation and calibration. Thus, the cost of one such analysis is very low when the number of quantified metabolites is taken into account – a single NMR sample measured in high-throughput manner, can be transformed into more than 200 metabolic descriptors [10]. In a recent review, the authors ended to conclude also that NMR is the best method for identifying and quantifying urinary compounds and that it permits measure⇑ Corresponding author. E-mail address: Reino.Laatikainen@uef.fi (R. Laatikainen). http://dx.doi.org/10.1016/j.jmr.2014.02.008 1090-7807/Ó 2014 Elsevier Inc. All rights reserved.

ment of the largest number of metabolites (209) and also yields the greatest chemical diversity [9]. The information content of 1H NMR spectra of biological samples is high, but transformation of this complex information into accurate concentrations of individual compounds is not necessarily straightforward. Each compound may contribute up to tens of individual signals and the signals may overlap seriously with each other. In addition, the spectrum may contain signals from unknown components and macromolecular background. Further complications may arise from water suppression, T2-editing, protein interactions, and any instrumental or experimental artifacts. Furthermore, pH and ionic strength may cause variations to the chemical shifts and line widths which complicate automated quantification. There are two fundamentally different approaches for NMR metabolomic analysis. In the chemometric approaches (non-targeted metabolomics), the pattern-recognition methods are used to analyze whole spectra and only the signals that seem to be responsible for the selected patterns are identified [11,12], whereas in

68

M. Tiainen et al. / Journal of Magnetic Resonance 242 (2014) 67–78

quantitative metabolomics (targeted metabolomics) the concentrations of known metabolites are determined [13,14]. An advantage of the chemometric methods is that spectra can be analyzed without any prior knowledge of metabolites present in the sample and thus these methods can be used even for wholly novel compounds or sample materials. Nonetheless, underlying any statistical treatment of NMR spectra in metabolomics is the basic notion that metabolites are the actual variables of interest, since they represent the underlying physical model that generated the observed data, and that is why the quantitative data on metabolites is invaluable. In principle, a compound can be quantified from the 1H NMR spectrum of a complex mixture whenever even a single line of the compound can be identified. The separation of the signals can be further improved by expanding the spectrum to more than one dimension but, unfortunately, the 2D NMR spectra are usually lower in sensitivity. Also, the relationship between the signal intensity and concentration is not so straightforward in 2D spectra and needs calibration, whereas in 1D 1H spectra the signal area is usually nearly proportional to the concentration even in very low concentrations [2]. The use of the selective total correlation spectroscopy (TOCSY) experiments [15,16] is an alternative, but these experiments are typically meant for detection of only a few targeted metabolites. Nevertheless, the multidimensional methods can be invaluable in screening the composition of the sample as based on diagnostic correlations. The strategy combining 2D and 1D suits well for applications in which the variation of the chemical diversity is broad, like in plant metabolomics [17]. However, if the number of the samples is large and the concentrations are in the micromolar range, as common with, e.g. biological samples, the final quantification based on a 1D spectrum is desirable. A distinctive feature of high-resolution 1D NMR spectra is that even the most complex spectrum of a compound, composed of thousands of individual spectral lines, can be described by a few spectral parameters within experimental accuracy, employing a quantum mechanical (QM) theory [18]. Thus, even in the case of the 1H NMR spectrum of a complex mixture, there are strict QM rules between the lines of individual compounds. The utilization of this feature in structural analysis has a long tradition since the legendary LAOCOON3 program [19] was published in 1964. However, none of the presently available quantitative analysis methods like the spectral binning [20], peak alignment algorithms and curve-fitting methods [21–24], singular value decomposition [25] or targeted profiling [13,26] fully utilizes these rules. Our objective was to build up and assess a spectral analytical tool that uses and interprets the spectral information in maximal way taking also maximal advantage from any prior knowledge available from the sample and its components. The approach, called herein quantitative Quantum Mechanical Spectral Analysis (qQMSA), is based on the strict QM rules between the molecular structure of a compound and the positions and intensities of its observed NMR spectral lines and merely means the complete iterative analysis of the spectra, based on the QM spin Hamiltonian [27]. In our approach, the signals of well-defined, stoichiometric, components can be described by QM model while the background is described by a multiterm baseline function [28] and the unknown signals using optimizable and adjustable lines, regular multiplets or any spectral structures which can be composed from spectral lines, taking into account also the available prior knowledge about the sample. The program qQMTLS (quantitative Quantum Mechanical Total-Line-Shape) combines our previous constrained total-line-shape (CTLS) approach [2,29] and the iterative QM spectral analysis [30]. In this work, we developed and assessed strategies for qQMSA of biological samples using human serum as the model system. We also tried to find answers to such fundamental questions like ‘‘how strictly the NMR signal areas are proportional to the concentrations if the spectra are measured with

high-throughput manner’’ and ‘‘do the T2-edited and water suppressed spectra really obey exactly the QM rules’’. qQMSA is in fact the method to provide the answers to these questions which are usually left unanswered.

2. Materials and methods 2.1. Sample preparation To avoid the problems arising from unknown samples, the assessment was done by using a series of samples containing common metabolites listed in Table 1 (+ urea and acetate) and background of human serum from five individuals. The metabolites included to the samples are those that are abundant in human serum samples and visible in the high-throughput manner measured T2-edited spectra [4]. The backgrounds are actually human serum protein and lipoprotein samples with the small molecules dialyzed out. All the metabolites used in this work were their natural enantiomers. Dialyses cassettes (PIERCE, Slide-A-Lyzer Dialyses Cassette, 10 K MWCO) were washed in 300 ml dialyses buffer (10 mM Na2HPO4, 150 mM NaCl and 2.7 mM KCl, pH 7.4) to remove glycerol from the membrane (5 min magnetic stirring). Five 3 ml aliquots of human serum from five different individuals were dialyzed in three phases with magnetic stirring: (1) 1.5 l dialysis buffer, 1 h at room temperature; (2) 1.5 l dialysis buffer, 2 h at 4 °C; (3) 1.5 l dialysis buffer, overnight (20 h) at 4 °C. Protein concentrations of the sera were measured before and after dialyses using the biuret method [31,32]. After dialyses, the sera were concentrated by ultrafiltration in order to obtain the original protein concentrations. The centrifugal filters (VWR Centrifugal Filter, 10 K MWCO) were washed several times with water to remove glycerol from the filter membrane. The sera were filtered at 10,000g and 4 °C. Samples with physiological metabolite concentrations of human serum (Table S-1) were prepared by using sodium phosphate buffer (75 mM Na2HPO4 in 80%/20% H2O/D2O, pH 7.4; including also 0.08% sodium 3-(trimethylsilyl)propionate-2,2,3,3-d4 and 0.04% sodium azide) as a solvent. The NMR samples without and with human serum background (pairs of parallel samples with the same metabolite concentrations) were prepared by adding 300 ll the above mentioned metabolite mixtures to 300 ll dialysis buffer or dialyzed serum, respectively.

2.2. NMR Spectroscopy All the measurements were performed on Bruker AVANCE III 500 (Bruker, Karlsruhe, Germany) NMR spectrometer operating at 500.36 MHz equipped with a 5 mm selective inverse probe. Shimming and tuning of the samples were performed automatically. The baseopt digital filter, which yields the perfectly flat baseline, was used. The spectra were measured at the physiological temperature of 310 K. The 1H NMR spectra of the mixtures without serum background were recorded with 160k data points after 4 dummy scans using 8 transients acquired with an automatically calibrated 90° pulse and applying a Bruker noesypresat pulse sequence with mixing time of 10 ms and irradiation field of 25 Hz to suppress the water peak. The acquisition time was 5.5 s, the relaxation delay (d1) 3.0 s and the additional relaxation delay before suppression (d2) 51.5 s. The T2-edited spectra of the mixtures with serum background were measured in the high-throughput manner [4]. The spectra were collected with 64k data points using 24 transients acquired after 4 dummy scans with a Bruker 1D CPMG pulse sequence with water peak suppression and a 78 ms T2-filter with a fixed echo

69

M. Tiainen et al. / Journal of Magnetic Resonance 242 (2014) 67–78 Table 1 Concentration ranges (lM) of metabolites in metabolite mixtures. Metabolite

Concentration range (lM)

Metabolite

Concentration range (lM)

3-Hydroxybutyrate Acetoacetate Alanine Citrate Creatine Creatinine Glucose Glutamine Glycerol

60–271 24–118 312–491 56–130 20–80 42–93 3880–6580 361–586 26–104

Glycine Histidine Isoleucine Lactate Leucine Phenylalanine Pyruvate Tyrosine Valine

169–286 39–83 36–86 555–1680 64–126 46–100 35–99 38–76 155–262

delay of 403 ls to minimize diffusion and J-modulation effects. The acquisition time was 3.3 s and the relaxation delay 3 s. T2 times of the metabolites were estimated by measuring T2edited spectra of a metabolite mixture with different T2-filters (60–4000 ms). The spectra were recorded with 64k data points using 128 transients acquired after 4 dummy scans. The acquisition time and the relaxation delay were 3.3 s and 60 s, respectively. 2.3. Program qQMTLS The program qQMTLS was written by employing Compaq Visual FORTRAN under MicrosoftÒ Developer Studio and the program runs in standard desktop PC. The graphical interface allows all the spectral preparations possibly useful for qQMSA, although normally the analysis can be started without baseline corrections or any other manipulations from the spectrum imported from a modern spectrometer where it has been prepared as described above. The experimental spectra can be imported into the program in the JDX or PERCH formats. The program and the templates for the serum analyses are freely available for non-commercial academic purposes from the corresponding author. However, the file and case management of the present version demands the support of PERCH NMR Software (http://www.perchsolutions.com). The CPU time for analysis does not form a limitation for the method: a simulation of the spectrum or one iteration cycle of the case corresponding to the model J (Table 6) takes ca. one second in a microcomputer equipped with IntelÒXeonÒ CPU E31245 V2 (3.40 GHz) processor. This means that the CPU time burden for a typical analysis is less than 1 min. In practice, in the manual analysis one needs to reserve a few minutes of human time. Tools for automation, model building and peak alignment are under development. Example of the spectral parameter file and the corresponding output file are included in the supporting material (Tables S-2 and S-3). 2.4. Line Shape function The goal of qQMSA is to minimize the residual root meansquare (rrms) between the observed and calculated spectral intensities I(mi)

rrms ¼

X

½Iobs ðmi Þ  Ical ðmi Þ2

ð1Þ

at spectral points i. In qQMTLS, the total line shape I(m) of an NMR frequency domain spectrum (intensity I vs. spectral frequency m) is expressed as a sum of the QM systems Qn(m), the signals Sm(m) which are not described using QM model, and the baseline B(m) which represents the macromolecular background and instrumental artifacts (Fig. 1):

IðmÞ ¼

X

X n Q n ðmÞ þ

X

xm Sm ðmÞ þ BðmÞ

ð2Þ

Table 2 Response factors of glucose determined from the spectra measured with different settings. qHa D2O

Hb D2 O

qpresatc D2O

presatd D2O

qpresatc H2O + D2O

presatd H2O + D2O

a-Glucose a-H1 a-H2 a-H3 a-H4 a-H5 a-H6A a-H6B

0.962 0.974 1.000 0.978 0.965 0.977 0.975

0.875 0.993 0.910 0.953 0.997 0.997 1.000

0.960 0.965 1.000 0.990 0.975 0.953 0.955

0.880 0.993 0.920 0.990 1.000 0.994 0.981

0.950 0.904 0.969 1.000 0.850 0.884 0.811

0.924 0.909 1.000 0.978 0.885 0.868 0.840

b-Glucose b-H1 b-H2 b-H3 b-H4 b-H5 b-H6A b-H6B

nd 0.988 0.996 0.986 0.989 1.000 0.982

nd 0.869 0.955 0.959 0.993 1.000 0.987

nd 0.949 0.978 0.951 1.000 0.913 0.904

nd 0.840 0.945 0.926 1.000 0.914 0.908

nd 1.000 0.986 0.952 0.974 0.870 0.845

nd 0.993 1.000 0.954 0.989 0.881 0.863

a Basic proton spectrum (zg): 128k data points (td), 4 dummy scans (ds), 8 transients (ns), acquisition time (aq) 7.7 s, relaxation delay (d1) 52.3 s and 90° pulse. b Basic proton spectrum (zg): td = 128k, ds = 4, ns = 32, aq = 7.7 s, d1 = 2.3 s and 90° pulse. c Noesypresat pulse sequence (noesygppr1d): 10 ms mixing time, td = 128k, ds = 4, ns = 8, aq = 7.7 s, d1 = 3.0 s, additional relaxation delay before suppression (d2) 49.3 s and 90° pulse. d As in c, but d2 = 0.

The concentrations Xn and xm are the parameters whose values are wanted. The QM spectra Qn(m), can be described by the equation

Q n ðmÞ ¼ F n ðm; w; J; R; D; line shapeÞ

ð3Þ

where the function Fn is a non-explicit function defined by the spin Hamiltonian and the vectors w, J, R and D contain the chemical shifts, coupling constants, response factors and line widths needed to describe the spin system, respectively. The chemical shifts, coupling constants and line widths depend on the chemical structure of the component and may also depend on the chemical composition of the sample. The response factors, line widths and the line shape parameters depend also on experimental conditions and instrumentation. The response factors account for the small differences caused mainly by differential relaxation of different species and they should ideally be close to 1.0. All these parameters reflect the physical and chemical conditions of the sample and can be therefore worth an inspection after spectral analysis. The line shape is expressed as an optimizable sum of Lorentzian, Gaussian and dispersion functions [29]. The non-quantum mechanical signals Sm(m) can be used to describe the signals which arise from unknown or chemically nonstoichiometric components. Examples of the latter are the broad

70

M. Tiainen et al. / Journal of Magnetic Resonance 242 (2014) 67–78

Table 3 The mean absolute error (MAE) and the linear regression calibrated MAE percentages of the concentrations (calculated for 20 metabolite mixtures) in quantification of the 18 metabolites using 5 different analysis protocols. Calibrated MAE %f

MAE % A 3-Hydroxybutyrate Acetoacetate Alanine Citrate Creatine Creatinine Glucose Glutamine Glycerol Glycine Histidine Isoleucine Lactate Leucine Phenylalanine Pyruvate Tyrosine Valine Average

a

B

4.2 17.4 5.3 6.3 7.3 11.4 9.8 16.5 23.9 1.7 21.5 4.4 1.0 20.1 12.6 4.0 3.1 5.7 9.8

b

C

7.4 4.2 3.7 6.1 7.3 7.1 1.6 8.2 37.2 2.2 21.2 6.7 2.4 9.4 7.3 1.7 3.5 1.4 7.7

c

3.7 11.5 3.8 11.3 7.9 10.5 2.1 6.0 16.8 2.8 14.1 5.1 2.1 10.5 3.1 4.8 2.3 7.1 7.0

D

d

5.5 6.8 1.9 4.5 7.0 5.5 1.8 4.8 15.5 2.4 3.4 3.5 1.4 5.5 5.5 2.3 6.1 1.4 4.7

E

e

5.3 8.2 2.8 1.6 5.1 2.2 2.8 2.5 9.0 1.7 2.6 2.7 2.2 6.6 4.3 1.2 5.7 1.9 3.8

Aa

Bb

Cc

Dd

Ee

3.8 4.2 1.4 3.9 6.3 4.9 1.3 3.7 24.3 1.4 11.4 4.4 0.9 3.1 6.6 1.1 3.3 1.1 4.8

4.3 3.7 1.8 6.1 7.1 7.1 1.2 3.6 29.4 2.0 14.1 5.6 1.1 2.5 6.5 1.7 3.5 1.4 5.7

3.2 4.8 1.5 2.8 7.0 3.7 1.3 2.9 10.3 1.4 2.4 3.2 1.0 1.6 2.7 1.1 2.4 1.4 3.0

2.7 4.6 1.5 4.5 7.1 5.6 1.0 2.3 9.3 2.4 3.5 2.7 1.1 1.4 4.0 2.0 3.6 1.2 3.4

2.6 3.0 1.3 1.6 5.0 2.0 0.5 2.5 8.9 1.6 2.3 2.7 0.7 1.9 2.1 1.2 2.2 0.9 2.4

a

Constant line width (1.5 Hz) with 100% Lorenzian line shape, linear baseline, no weighting. Line width optimized for all species independently, 100% Lorenzian line shape, linear baseline, no weighting. c Same line width for all species and Gaussian contribution to line shape optimized, intensity overweighting for weak signals, the regions overweighted: 0.90–1.23, 3.52– 3.69 and 6.80–7.90 ppm, 256 baseline terms. d Line width and Gaussian contribution to line shape optimized, intensity overweighting for weak signals, 100 baseline terms, overweighting as in the protocol C, constant line width (1.2 Hz) for glycerol. e Line width and Gaussian contribution to line shape optimized, 50 baseline terms, overweighting as in the protocol C, fit-and-lock protocol for pyruvate; acetoacetate; glycerol and aromatic metabolites, coupling constants and response factors optimized for glucose and lactate, template used for line widths. f See Eq. (7). The calibrated MAE takes into account the bias arising mainly from the differences between the response factors of different compounds. The linear regression coefficients (k) are given in Table 4. b

Table 4 The results of the best fitting protocol (E in Table 3). The slopes k and R2 values are from the linear regression analysis of the theoretical and raw concentrations (see Eq. (7)) for data of 20 samples. Metabolite 3-Hydroxybutyrate Acetoacetate Alanine Citrate Creatine Creatinine Glucose Glutamine Glycerol Glycine Histidine Isoleucine Lactate Leucine Phenylalanine Pyruvate Tyrosine Valine

k

a

0.959(0.006) 0.929(0.007) 0.977(0.004) 1.006(0.004) 1.024(0.097) 0.991(0.007) 0.972(0.002) 1.010(0.007) 1.013(0.019) 1.007(0.004) 1.018(0.006) 1.005(0.007) 0.981(0.002) 1.067(0.006) 0.960(0.007) 1.002(0.004) 0.945(0.006) 1.019(0.003)

R

2

0.994 0.992 0.982 0.994 0.983 0.984 0.999 0.955 0.916 0.985 0.984 0.982 0.999 0.988 0.985 0.997 0.984 0.995

MAE % 5.3 8.2 2.8 1.6 5.1 2.2 2.8 2.5 9.0 1.7 2.6 2.7 2.2 6.6 4.3 1.2 5.7 1.9

Metabolite

T2/s

3-Hydroxybutyrate Acetate Alanine Citrate Creatine Creatinine Glucose Glutamine Histidine Lactate Leucine Phenylalanine Pyruvate Threonine Tyrosine Valine

1.2 4.3 1.0 0.4 1.2 2.3 0.8 0.8 1.3 1.5 1.0 2.1 3.8 1.0 1.3 1.0

Calibrated MAE % 2.6 3.0 1.3 1.6 5.0 2.0 0.5 2.5 8.9 1.6 2.3 2.7 0.7 1.9 2.1 1.2 2.2 0.9

a The numbers in parentheses give the standard errors of the regression coefficients k.

signals in the T2-edited serum spectra arising from the CH3- and CH2-protons of lipids in lipoproteins (Fig. 1). The general formula of Sm(m) is

Sm ðmÞ ¼ Gm ðm; w; J; D; line shapeÞ

Table 5 The determined T2 times and the recovery percentages for two different T2-filters.

ð4Þ

where G is a multiplet function (singlet, doublet, triplet, etc.) which can be regular (1:1 doublet, 1:2:1 triplet, etc.) with a splitting described by J or irregular one for which the relative positions, intensities and line widths are freely optimizable or constrained to a template model. To rationalize the output, the lines can be arranged

Recovery-% with given T2-filter 80 ms

400 ms

95 99 95 88 96 98 93 93 96 96 94 97 99 95 96 95

79 94 76 53 80 89 70 71 80 83 75 87 93 76 81 77

into ‘‘complexes’’ so that their total areas are reported at the end of the analysis. For example, the CH3 signal of lipoproteins in T2-edited serum spectra can be described well using from 6 to 9 lines forming the ‘‘complex’’. Also the baseline B(m) can be optimized during iteration. Modern NMR spectrometers have a very linear baseline, thus B(m) describes mostly broad signals originating from macromolecules, not instrumental artifacts. In qQMTLS, an optimizable multiterm baseline function

BðmÞ ¼

X bn bðmÞ

ð5Þ

n

is employed [28]. The b(m) terms are cos2-functions

bðmÞ ¼ cos2 ðm  mn Þ=dm

ð6Þ

71

M. Tiainen et al. / Journal of Magnetic Resonance 242 (2014) 67–78

Table 6 The non-calibrated and calibrated mean absolute errors (MAEs) of the concentrations, calculated for 23 metabolite mixtures with serum background, using different analysis protocols. Calibrated MAE %f

MAE % a

3-Hydroxybutyrate Acetoacetate Alanine Citrate Creatine Creatinine Glucose Glutamine Glycine Histidine Isoleucine Lactate Leucine Phenylalanine Pyruvate Tyrosine Valine Average

F (%)

G (%)

40.0 28.7 6.6 5.2 27.9 12.3 2.6 7.8 10.1 48.2 55.7 9.4 21.8 7.6 28.2 21.9 4.1 19.9

21.3 27.9 7.2 4.8 24.0 15.2 1.3 10.9 9.1 15.0 16.9 9.7 18.7 13.2 26.9 13.4 12.8 14.6

b

c

d

H (%)

I (%)

J (%)

22.0 23.1 6.2 4.6 31.4 10.2 1.7 9.1 11.1 9.6 23.7 7.7 24.8 8.4 18.2 7.0 4.4 13.1

10.7 37.9 6.4 9.9 9.0 7.3 1.2 6.8 3.6 13.0 16.0 7.7 25.1 4.6 22.8 6.4 6.4 11.5

17.3 32.6 6.0 3.0 7.3 7.6 1.7 2.5 7.1 12.5 11.5 2.4 18.3 6.4 22.4 7.3 9.0 10.3

e

Fa

Gb

Hc

Id

Je

31.1 27.5 3.3 3.7 23.2 12.0 1.7 3.6 5.1 13.7 25.3 5.1 14.1 7.5 6.6 10.3 3.2 11.6

20.8 25.3 3.6 3.8 18.2 10.2 1.3 3.4 6.1 13.5 17.1 4.8 13.6 11.5 10.4 13.2 8.6 10.9

20.7 16.5 3.1 3.1 24.0 8.5 1.5 4.7 5.0 9.7 15.1 5.8 11.5 6.7 10.9 7.2 3.8 9.3

11.3 12.3 3.8 7.9 8.9 7.6 0.9 6.4 3.4 13.1 8.1 2.8 13.1 4.5 5.7 6.0 2.8 7.0

11.7 13.6 2.8 2.7 5.8 6.5 1.1 2.6 4.1 10.4 11.4 2.2 8.4 3.4 7.2 5.9 4.5 6.1

a

Same line width for all species optimized, regions overweighted: 0.90–1.23, and 6.80–7.90 ppm, intensity overweighting for weak signals. Otherwise like F but instead of overweighting, fit-and-lock protocol was used. Same line width for all species optimized, fit-and-lock protocol for pyruvate; acetoacetate; isoleucine; leucine; 3-hydroxybutyrate; valine and the aromatic metabolites. d Template used for line widths, fit-and-lock as in H, the response factors from the T2-edited spectra of individual metabolites applied. e Otherwise like I but the line widths were more strongly (doubly) forced to the template values and the values of 3-hydroxybutyrate were obtained as described later on in the text. f See footnote f in Table 3. The linear regression coefficients (k) are given in Table 7. b

c

2.5. Iterative solution of the non-linear equations

Fig. 1. A part of 500 MHz T2-edited (T2-filter time 80 ms) 1H NMR spectra of human serum (black = observed spectrum, red = calculated spectrum, blue = QM lines, violet = fitted lines, grey = baseline, and green = difference between the calculated and the observed spectrum). The calculated spectrum is the sum of the QM lines, fitted lines and baseline. Spectral insert shows the complex transition pattern of leucine methyls. The lipid signals can be automatically created in qQMTLS by defining the number of lines in the particular range and allowing the program to construct and optimize the multiplet. One should note that a good fit of these signals can be obtained with many line combinations.

where b(m) = 0 when m > mn + dm or m < mn  dm and they are set to evenly distributed positions (mn + 1 = mn + dm) so that their sum B(m) = 1.0 at every point when all bn = 1. When compared to the ‘‘global’’ Fourier expansion used in the PERCHit iterator [30], the advantage of the ‘‘localized’’ function is that any part of the spectrum can be fitted independently of the rest of the spectrum, preserving the exact form of the baseline function in memory (One should note that a linear baseline is obtained by setting all b equal and that, in order to prevent explosion of the coefficients, one has to add a constraint to keep them as small as possible).

Eq. (1) defines a set of linear equations for Xn, xm and coefficients bn. If the model spectra (Qn(m) in Eq. (1)) are available for the components and their chemical shifts match the observed spectrum, the concentrations Xn can be obtained by solving the equations. If the model spectra are experimental, they may contain instrumental artifacts and impurity signals. More serious is that the spectral parameters, especially the chemical shifts, and line shapes may differ from those of the sample to analyzed so that a significant bias is induced to the regression results. We have recently outlined an approach based on the adaptive spectral libraries (ASLs) in which the model spectra are obtained using QMSA and stored in the form of spectral parameters, which can be used to simulate the model spectra in any magnetic field and with any line width and shape [33]. In addition, the ASLs offer an efficient and convenient way to store the spectra. If also the chemical shifts, coupling constants, line widths and line shape parameters need to be adjusted, the Eqs. (1)–(6) define a set of nonlinear equations and, for their iterative solution, the trial values of the spectral descriptors need to be given prior to the fitting. These trial values can be obtained from ASL. The iterator algorithm has been outlined before [30] and not discussed here. Because the concentration of a component may be defined by some very weak but otherwise well-defined and separated signal, the principal component regression (PCR) is not useful in qQMSA, contrary to the single compound applications. However, PCR can be used if an integral transform broadening [27,30] is used in an early phase of the analysis. The number of the terms that are optimizable in qQMSA grows when the number of components grows. This problem can be handled by separating the non-correlated parameters like the baseline terms and chemical shifts. Moreover, most of the coupling constants can be kept fixed since they are insensitive to experimental conditions. Those coupling constants that may need to be adjusted can be defined case-specifically. The dimension of the largest Hessian matrix needed in the present application is only ca. 200 and,

72

M. Tiainen et al. / Journal of Magnetic Resonance 242 (2014) 67–78

thus, the number of metabolites could be easily multiplied without computational resource problems. Fortunately, for the biofluids like serum and cerebrospinal fluid the number of common lowmolecular-weight metabolites analyzable or to be analyzed from their 1H NMR spectrum is usually at most ca. 50 [9].

these target values. Instead of a template, when the trial values are known to be good, they can be used as the target values.

2.6. Dynamic range of the populations: weighting, locking and ignore

The basic protocol of qQMTLS demands that the corresponding calculated and observed signals overlap in the beginning of the iteration [27], which means that the trial chemical shifts should rather be within 1–2 times the line width from the correct ones. The problem is common with other protocols and many efforts have been used to develop strategies [26,34,35] for the peak alignment and its automation. The property that the exact structures of multiplets from each nucleus can be simulated in any field offers special tools also for peak alignment in qQMSA. In the case of serum spectra, aligning the alanine methyl doublets is usually enough to align the signals of the other compounds in the samples. If there are larger differences for certain signals, in qQMTLS one can use an interactive graphical tool to overlap individual multiplets. Alternatively, one can apply a broadening based on Bartlett window, which allows an artificial overlap of the observed and calculated signals [27] and works well if the signals are well defined but not so well with the very complex signals of, for example, the CH protons of some amino acids. However, because the alignment problem is not an issue in this work, it is not discussed widely here. The second-order effects are related to the alignment problem and can therefore be discussed here. For example, the chemical shifts of metabolites like glutamine and glutamic acid are sensitive to cations and may sometimes overlap with each others, too. This forms a challenge especially in analysis of urine but may disturb also the analyses of cerebrospinal fluid and cell extracts. In our case, the chemical shifts of the diastereotopic CH2 protons of glutamine differ only by 11–13 Hz (at 500 MHz) while the geminal coupling constants are ca. 15 Hz. This leads to strong second-order effects and make the outlook of the signals sensitive to small variations of their relative chemical shifts. In QMSA the signals can be simulated with any chemical shift difference on fly during the iterative process. The second-order effects depend also on field strengths and therefore the same experimental model spectra cannot be used for the spectra measured at different field strengths. In qQMSA the same model can be used; changing the field demands just the change of the FIELD parameter in the spectral parameter file (see Table S-2).

The large dynamic range of the populations forms a challenge for the qQMSA of complex mixtures. Our objective that even the smallest populations of the spectral metabolites varying from 1 to 100 M units should be quantified with relative standard error smaller than 10%, demands special solutions for problems arising from signal overlap. The typical rrms-error of fitting is ca. 0.2% of the maximum intensity, which means that weak signals at the root of the intensive signal can be poorly defined and the line shape artifacts may hinder the accurate estimation of their populations. In applications like the drug impurity analysis 13C decoupling can be useful in quantification of very weak signals [2]. A typical spectrum of a biological sample consists of an intensive high field region and a weak low field region. The latter region contains, for example, signals from aromatic amino acids which have signals also in the more crowded area at the high field. If both the regions are fitted simultaneously, the high field spectral features, such as intensive signals of glucose or protein background, may disturb the analysis so that the fit for the aromatic region suffers. qQMTLS offers two solutions to this problem: (i) the aromatic region is strongly overweighted in the fitting or (ii) the aromatic region is fitted first and the populations and spectral parameters defined by that region are then locked for the analysis of the high field region (called later fit-and-lock). In the case of T2-edited serum spectra, another example of information rich regions with weak signals is the region between 0.75 and 1.10 ppm. It can be also worthwhile to use a weighting protocol in which the strong intensities obtain lower weights in the least-square fitting. In the case of serum, this suits well for the range dominated by strong glucose signals. The weight can also be bound to the overlap of the signals or to the rrms-fit, so that the crowded areas or the areas with poor consensus between the observed and calculated spectra obtain lower weights in the fitting. In addition, the poorly defined ranges, for example around the water signal, can be completely ignored from the analysis. Also, when a proton gives a well-defined signal and the other signals of the compound are complex and composed of numerous weak lines, the latter can be ignored from the analysis. For example the leucine CH- and CH2signals, which are complex and overlap with protein background but not with any invaluable metabolite information, can be ignored from the analysis because the methyl signals (which have strong and well-defined lines, although quantum mechanically rather complex, see Fig. 1) contain the quantitative information in more easily digestible form. 2.7. Prior knowledge, constraints and templates If a spectral parameter, like the chemical shifts of some minor components in the glucose range of serum spectrum, is not well defined, one can also define the range within which a parameter is constrained to stay during the iteration. Also the linear constraints, similar to those used in CTLS analysis [2,29], are useful when a parameter is poorly defined by the spectrum or it correlates with others. In qQMTLS one can do all this also by a template: the spectral parameters, for example chemical shifts and line widths, typical for similar samples are defined in a template file, which is an ordinary spectral parameter file (see S-2), and then the least-square forces are used to constraint the parameters to

2.8. Alignment problem, second-order effects and different field strengths

3. Results and discussion The performance of qQMSA was assessed with metabolite mixtures without and with a human serum background (pairs of parallel samples with the same metabolite concentrations). The intent of the first assessment was to test the performance of program, different fitting protocols and parameters with known metabolite mixtures without complex background. These samples correspond for example to the situation in which the proteins are removed by filtration of serum or protein content of the samples is low, e.g. in the case of urine. The intent of the second assessment was to test the performance of qQMSA with serum spectra in the presence of protein background and as measured in the high-throughput manner [4]. Because all the spectra were measured with automatic shimming and identical settings with those the routine spectra measured in the metabolomics projects [4], our statistics should be close to the statistics obtainable with corresponding real samples. One might say that the spectra without the macromolecular back-

73

M. Tiainen et al. / Journal of Magnetic Resonance 242 (2014) 67–78

ground represent almost ideal ones for quantification while the spectrum with the background is opposite, suffering about almost all the possible complications mentioned above. The tools for model building and for automation are under development and therefore not discussed here. Anyhow, the combination of QMSA and ASL’s offers special features also for implementation of this phase of the analysis.

following (Tables 3, 4, 6 and 7), we report the statistics assuming, first, that the ratio between the signal area and the concentration is the same for all the compounds, and second, the statistics where the differences between compounds are taken into account through calibration based on linear regression analysis using the equation

3.1. Response factors and linear regression calibration

where the raw concentration is obtained by comparing the signal area to that of the reference compound, given by qQMTLS when the molarity of the reference compound is given. The calibrated MAE (Mean Absolute Error) is calculated by using these concentrations instead of the raw ones. The inverse of the slope (1/k) gives the relative response factor of the compound as compared to that of the reference compound. The calibration may remove also bias arising from the sample preparation.

The area of NMR signal per proton is ideally the same for all the protons of the same compound and qQMSA should give the same value (1.0) for their response factors. However, there are reasons why this is not fulfilled in practice: different relaxation times, water suppression, macromolecular interactions (with T2-editing), exchangeable protons, imperfect pulse calibration and proton-13C isotope couplings. In order to explore some of these effects, we analyzed the spectra of glucose measured with different settings (Table 2). If the spectrum was measured quantitatively with a long relaxation delay, the response factors were almost identical as could be presumed. However, in the other cases significant effects are visible so that the response factors vary from 0.81 to 1.00, which means that corresponding bias (up to 19%) could be induced to the integrals. These effects have a special importance in the accurate quantification of low concentration components overlapping with high concentration components, like glucose in many biological samples. In the case of T2-edited spectra measured in the high-throughput manner, some response factors need special care. For example, the response factor of lactate CH proton can be as low as 0.70 when compared to that of the lactate CH3 protons. Other examples of such lowered response factors are the aromatic protons. On the basis of these examples, we ended up to the practice that the largest response factor of a compound is set to 1.0. In cases, where the response factors cannot be obtained accurately or the concentration of the component is low (so that its response factor has not much effect on the total fit), they can be fixed to values determined elsewhere. The response factors (when defined as above, 1.0 for the largest value of each compound) take into account the differences between different protons of a compound but the response differences between compounds are not taken into account. This may lead to bias in the concentrations if no calibration is done. In the

concentration ¼ k  raw concentration

ð7Þ

3.2. Quantification of metabolite mixtures without macromolecular background The performance of qQMTLS was tested with 20 metabolite mixtures each containing 18 metabolites (+ acetate and urea) with physiological concentrations of human serum (Fig. 2 and Table 1). Numerous different protocols were tested (selected protocols in the Table 3), starting from fitting with constant line width and no weightings or constraints ending up to fitting with optimized line widths and shapes, weighted areas, locked integrals, optimization of some response factors, and use of a template. A majority of the metabolites could be quantified with an average error

Quantitative Quantum Mechanical Spectral Analysis (qQMSA) of (1)H NMR spectra of complex mixtures and biofluids.

The quantitative interpretation of (1)H NMR spectra of mixtures like the biofluids is a demanding task due to spectral complexity and overlap. Complic...
1MB Sizes 2 Downloads 3 Views