Accepted Manuscript
Using FT-NIR spectroscopy technique to determine arginine content in fermented Cordyceps sinensis mycelium
Chuanqi Xie, Ning Xu, Yongni Shao, Yong He
PII: S1386-1425(15)00616-2
DOI: http://dx.doi.org/10.1016/j.saa.2015.05.028
Reference: SAA 13695
To appear in: Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy
Received Date: 19 October 2014
Revised Date: 8 May 2015
Accepted Date: 9 May 2015
Please cite this article as: C. Xie, N. Xu, Y. Shao, Y. He, Using FT-NIR spectroscopy technique to determine arginine content in fermented Cordyceps sinensis mycelium, Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy (2015), doi: http://dx.doi.org/10.1016/j.saa.2015.05.028
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Using FT-NIR spectroscopy technique to determine arginine content in fermented Cordyceps sinensis mycelium
Chuanqi Xie a, Ning Xu b, Yongni Shao a, Yong He a,*
a College of Biosystems Engineering and Food Science, Zhejiang University, 866 Yuhangtang Road, Hangzhou 310058, China
b College of Pharmaceutical Science, Zhejiang University of Technology, Hangzhou 310014, China
* Corresponding author: Yong He, Tel: +86-571-88982143, Fax: +86-571-88982143, Email: [email protected]
ABSTRACT
This research investigated the feasibility of using the Fourier transform near-infrared (FT-NIR) spectral technique for determining arginine content in fermented Cordyceps sinensis (C. sinensis) mycelium. Three different models were built to predict the arginine content. Wavenumber selection methods, namely competitive adaptive reweighted sampling (CARS) and successive projections algorithm (SPA), were used to identify the most important wavenumbers and reduce the high dimensionality of the raw spectral data. Only a few wavenumbers were selected by CARS and CARS-SPA as the optimal wavenumbers. Among the prediction models, the CARS-least squares-support vector machine (CARS-LS-SVM) model performed best, with the highest values of the coefficient of determination of prediction (Rp²=0.8370) and residual predictive deviation (RPD=2.4741) and the lowest value of root mean square error of prediction (RMSEP=0.0841). Moreover, the number of input variables was forty-five, which accounts for only 2.04% of the full set of wavenumbers. The results showed that the FT-NIR spectral technique has the potential to be an objective and non-destructive method to detect arginine content in fermented C. sinensis mycelium.
Keywords: Fourier transform near-infrared (FT-NIR) spectra; Arginine; Competitive adaptive reweighted sampling (CARS); Successive projections algorithm (SPA); Prediction
Introduction
Cordyceps sinensis (Clavicipitaceae, Ascomycete; C. sinensis), which is also called “winter-worm and summer-grass”, is composed of a parasitic fungus of Cordyceps sp. and its host [1]. It is one of the most famous traditional medicines and is found only on the Qinghai-Tibet Plateau of China. It is popular because of its pharmacological activities, such as protecting lung and kidney function, modulating the immune response, improving hyperlipidemia, hyperglycemia and sexual function, inhibiting tumor growth and scavenging free radicals [2-4]. Since natural C. sinensis is expensive and in short supply, the cultivated mycelia of C. sinensis, which possess the same functions, have become the major substitute for the natural species [5]. Amino acids are considered to be among the most important components for the function of C. sinensis [6]. Arginine is a common amino acid in C. sinensis. It is known to possess many modulatory functions on the endocrine and immune systems, and it can improve nitrogen balance after trauma and promote wound healing through angiogenesis, cell proliferation, collagen synthesis and epithelialization [7]. Although arginine is not considered an essential amino acid for adults, it is regarded as a conditionally essential amino acid and is equally important to human beings [8].
The most common method to detect amino acid content is high performance liquid chromatography (HPLC) combined with the amino acid analyzer technique [9]. Other detection methods are mass spectrometry (MS) and capillary electrophoresis (CE) [10]. However, all of these methods have limitations: they are time-consuming, laborious and inefficient, and they require professional operation. In addition, they cannot be applied to on-line detection. Therefore, a fast, objective and non-destructive method is urgently needed.
At present, the Fourier transform near-infrared (FT-NIR) spectral technique is widely used in many fields [11-12] because it is fast, non-destructive, low in cost, simple and accurate, and requires little sample preparation. The principle of the FT-NIR spectral technique is to obtain the content or structure of different components of samples through the analysis of spectral data in the region of 12500 to 4000 cm⁻¹ [13]. Three different models, namely partial least squares regression (PLSR), principal components regression (PCR) and least squares-support vector machine (LS-SVM), were established in this study. Establishing a prediction model on the full spectrum is time-consuming and runs against the high-speed character of the spectral technique [14]. For this reason, this study also selected effective wavenumbers from the full spectral data.
In this study, the Fourier transform near-infrared (FT-NIR) spectral technique was used for determining arginine content in fermented C. sinensis mycelium. The objectives of this work were: (1) to find the quantitative relationships between the FT-NIR spectral data and arginine content; (2) to acquire optimal wavenumbers using competitive adaptive reweighted sampling (CARS) and successive projections algorithm (SPA), respectively; (3) to compare the performance of different models based on the full spectrum and on selected wavenumbers, respectively; (4) to identify the optimal model for the prediction of arginine content; (5) to explain why the selected wavenumbers could be used to detect arginine content.
Materials and methods
Instrument and software
In this study, a Bruker MPA FT-NIR spectrometer (Bruker Optics, Ettlingen, Germany) with a spectral range of 12500-4000 cm⁻¹ and a resolution of 8 cm⁻¹ was used to obtain the spectral absorbance information of the samples. The spectrometer includes a Rock-Solid interferometer and an integrating sphere diffuse reflector. The signal-to-noise ratio (SNR) of this system is not less than 800:1. The detector was inserted into the sample powder to acquire the spectral absorbance information, and each spectrum was obtained as an average of 32 scans. The spectral data were acquired with OPUS 5.5 software and saved as OPUS files. The Unscrambler 9.7 (CAMO Software AS, Oslo, Norway) and MATLAB R2009a (The MathWorks, Natick, USA) were used to preprocess the spectral data and establish models.
Samples preparation
A total of 195 dried fermented C. sinensis mycelium samples, which were provided by Hangzhou Zhongmei Huadong Pharmaceutical Co. Ltd, were used in this study. The arginine content was determined by an Automatic Amino Acids Analyzer system (Biochrom 30+ Series, Biochrom Ltd, Cambridge, UK). A total of 60 mg of each sample was placed in a headspace vial (volume 20 ml), and 10 ml HCl (6 mol/L) was added to the sample. After the air was removed with nitrogen, the sample was kept in an oven at 110±1℃ for 24 h. After cooling, the hydrolyzed sample was transferred into a volumetric flask (volume 50 ml) and diluted to 50 ml with purified water. Then 5 ml of the solution was filtered, and 0.5 ml of the filtrate was dried in a vacuum drying oven at 60℃. About 1 ml sodium citrate buffer (pH=2.2) was added to the residue. Finally, 50 µl of the solution was injected into the Automatic Amino Acids Analyzer system. The wavelengths used for detection were 440 and 570 nm.
In order to avoid bias in subset partition, the 195 samples were arranged in ascending order of their Y values (arginine content) and then divided into the calibration set and the prediction set at a ratio of 2:1 [15]. One sample was picked out of every three consecutive samples, which resulted in 130 samples for the calibration set and 65 for the prediction set. Full cross-validation was performed for the calibration and validation sets. The statistics of arginine content in each set are shown in Table 1.
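The partition described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `split_by_rank` is hypothetical, the values are synthetic, and the text does not say which member of each consecutive triple goes to the prediction set, so the sketch arbitrarily takes the last one.

```python
def split_by_rank(y_values, every=3):
    """Sort samples by reference value; send one of every `every` to prediction."""
    order = sorted(range(len(y_values)), key=lambda i: y_values[i])
    calibration, prediction = [], []
    for rank, idx in enumerate(order):
        # Assumption: the last sample of each consecutive triple is held out.
        if rank % every == every - 1:
            prediction.append(idx)
        else:
            calibration.append(idx)
    return calibration, prediction


if __name__ == "__main__":
    # Synthetic reference values, for illustration only (not the study's data).
    y = [0.5 + 0.01 * i for i in range(195)]
    cal, pred = split_by_rank(y)
    print(len(cal), len(pred))  # prints: 130 65
```

Because the samples are ranked before being split, both subsets span the same range of arginine content, which is the stated reason for this scheme.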
Test flow chart
The main steps of this study are illustrated in Fig. 1. In the first step, the spectral information of the samples was obtained by the FT-NIR spectrometer across the wavenumbers of 12500 to 4000 cm⁻¹. Then arginine content was determined using the HPLC method. All samples were divided into two sets (calibration and prediction) at a ratio of 2:1. After pre-processing and effective variable selection, different models were established to predict arginine content based on the full spectral wavenumbers and on the selected wavenumbers, respectively. Finally, the optimal model was determined on the basis of the coefficient of determination (R²), residual predictive deviation (RPD), root mean square error of calibration (RMSEC) and root mean square error of prediction (RMSEP). Each of these steps is described below.
Calibration algorithms
PLSR establishes a linear regression model between the variable matrix Y (arginine content) and the variable matrix X (spectral information), and it has been widely used in many studies [16-18]. The prediction is obtained by extracting a set of orthogonal factors that have strong predictive ability [19]. The PLSR algorithm can be described as follows:
$$Y = aX + b \qquad (1)$$
where Y is the response matrix of the samples, X is the predictor matrix of the samples, a is the matrix of regression coefficients obtained from PLSR, and b is the matrix of residual information.
PCR can effectively compress the high dimensionality of the original variables and speed up the calculation by ignoring the minor components. This method has been widely studied and has produced many successful applications [20-21]. In the PCR algorithm, the multi-collinearity problem, which may make the prediction model unstable, can be effectively avoided.
LS-SVM can handle both linear and nonlinear multivariate problems quickly, and has therefore been widely used in many fields [22-23]. It employs a nonlinear mapping function to map the input features to a high-dimensional space, thus changing the optimization problem into one with equality constraints [24]. The LS-SVM model can be written as follows:
$$y(x) = \sum_{k=1}^{N} \alpha_k K(x, x_k) + b \qquad (2)$$
where α_k are the Lagrange multipliers, K(x, x_k) is the kernel function and b is the bias value. The RBF kernel was used as the kernel function of LS-SVM in this study. The LS-SVM parameters were the regularization parameter gam (γ) and the width parameter sig2 (σ²). The parameter γ determined the tradeoff between minimizing the training error and minimizing model complexity, and σ² was the bandwidth, which implicitly defined the nonlinear mapping from the input space to the high-dimensional feature space [25]. Grid search was used to find the optimal values of (γ, σ²) in this study. The calculation was carried out with the free LS-SVM toolbox (LS-SVM v1.5, Suykens, Leuven, Belgium) in MATLAB R2009a.
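To make Eq. (2) concrete, the following is a minimal standard-library Python sketch of LS-SVM regression. It is not the LS-SVM toolbox used in the study. It assumes the usual LS-SVM formulation, in which α and b are obtained from the linear system [[0, 1ᵀ], [1, K + I/γ]] [b; α] = [0; y], and an RBF kernel of the form exp(-‖u − v‖²/σ²); the function names and the small Gaussian-elimination solver are hypothetical helpers written for illustration.

```python
import math


def rbf_kernel(u, v, sig2):
    """RBF kernel exp(-||u - v||^2 / sig2); the exact scaling is an assumption."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(u, v)) / sig2)


def solve_linear(a, rhs):
    """Solve a @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def lssvm_fit(xs, ys, gam, sig2):
    """Return (alpha, b) from the system [[0, 1^T], [1, K + I/gam]]."""
    n = len(xs)
    system = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0]
        for j in range(n):
            k = rbf_kernel(xs[i], xs[j], sig2)
            row.append(k + (1.0 / gam if i == j else 0.0))
        system.append(row)
    solution = solve_linear(system, [0.0] + list(ys))
    return solution[1:], solution[0]


def lssvm_predict(x, xs, alpha, b, sig2):
    """Evaluate y(x) = sum_k alpha_k K(x, x_k) + b, as in Eq. (2)."""
    return sum(a * rbf_kernel(x, xk, sig2) for a, xk in zip(alpha, xs)) + b
```

A grid search over (γ, σ²) would call `lssvm_fit` for each candidate pair and keep the pair with the lowest cross-validation error. A larger γ penalizes training errors more strongly, while a larger σ² gives a smoother fit.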
Wavenumbers selection
The spectral data over the wavenumber range of 12500 to 4000 cm⁻¹ are characterized by high dimensionality, with redundancy among contiguous wavenumbers. In most cases, using all wavenumbers does not improve model performance, since some wavenumbers carry irrelevant information while others have low SNR [26]. Selecting a few wavenumbers that are related to the chemical information is a critical step in spectral analysis [27]. The selected wavenumbers can be as efficient as, or more efficient than, the full wavenumbers [28].
Competitive adaptive reweighted sampling (CARS) is an effective variable selection method. It selects optimal wavenumbers from the full spectral wavenumbers according to the “survival of the fittest” principle. The first step of CARS is to remove the wavenumbers with small regression coefficients by an exponentially decreasing function (EDF), and in the second step the ratios of wavenumbers to keep are calculated by an EDF equation [29]. Each sampling run contains four steps [30]: (a) model sampling based on Monte Carlo (MC); (b) wavenumber selection by EDF; (c) competitive wavenumber selection by adaptive reweighted sampling (ARS); (d) evaluation of the subset using cross validation. In this way, the wavenumbers with little or no effective information are eliminated and the effective wavenumbers are retained.
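The text does not give the EDF itself. As an illustration, the sketch below assumes the form commonly used with CARS, r_i = a·exp(−k·i), with a and k chosen so that all p variables are kept in the first run and only two remain in the last of N runs. The function name `edf_ratio` and the choice of 50 sampling runs are assumptions. The figure of 2203 variables is the total stated in the Conclusions.

```python
import math


def edf_ratio(run, n_runs, n_vars):
    """Fraction of variables kept at sampling run `run` (1-based) under the EDF."""
    # Assumed parametrisation: r_i = a * exp(-k * i), with r_1 = 1 and r_N = 2 / p.
    k = math.log(n_vars / 2.0) / (n_runs - 1)
    a = (n_vars / 2.0) ** (1.0 / (n_runs - 1))
    return a * math.exp(-k * run)


if __name__ == "__main__":
    n_vars, n_runs = 2203, 50  # 50 runs is an illustrative choice
    for run in (1, 10, 25, 50):
        kept = round(edf_ratio(run, n_runs, n_vars) * n_vars)
        print(run, kept)
```

The ratio drops quickly in the early runs and slowly in the late ones, which matches the two-stage behaviour of the number of sampled variables reported for Fig. 3(a).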
SPA is a forward variable selection method designed to solve collinearity problems by selecting optimal variables with minimal redundancy [31]. It applies a projection operation in a vector space to select variables with small collinearity [32]. The CARS and SPA calculations were carried out in MATLAB R2009a.
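The projection step can be sketched as follows. This is a simplified standard-library illustration of the SPA idea, not the MATLAB implementation used in the study: starting from one column, it repeatedly projects the remaining columns onto the subspace orthogonal to the last selected column and picks the one with the largest remaining norm. The function name `spa_select` is hypothetical, and the choice of starting column and of the number of variables, which is normally made by validation, is left to the caller.

```python
def spa_select(columns, start, n_select):
    """Pick `n_select` column indices with low collinearity, starting from `start`."""
    work = [col[:] for col in columns]  # working copies, projected in place
    selected = [start]
    for _ in range(n_select - 1):
        ref = work[selected[-1]]
        ref_norm2 = sum(v * v for v in ref)  # assumed non-zero
        best, best_norm2 = None, -1.0
        for j, col in enumerate(work):
            if j in selected:
                continue
            # Remove from column j its component along the last selected column.
            coef = sum(a * b for a, b in zip(col, ref)) / ref_norm2
            work[j] = [a - coef * b for a, b in zip(col, ref)]
            norm2 = sum(v * v for v in work[j])
            if norm2 > best_norm2:
                best, best_norm2 = j, norm2
        selected.append(best)
    return selected


if __name__ == "__main__":
    # Four toy "wavenumber" columns; column 1 is collinear with column 0.
    cols = [[1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 1.0]]
    print(spa_select(cols, start=0, n_select=3))  # prints: [0, 2, 3]
```

In the toy example, the column that is collinear with the starting column has zero norm after projection and is never chosen, which is the redundancy-reducing behaviour described above.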
Model evaluation index
The performance of the model was evaluated by the values of R², RPD, RMSEC and RMSEP [33]. A robust and accurate model should have low values of RMSEC and RMSEP and high values of R² and RPD [34]. An RPD value of less than 1.0 indicates a very poor model; between 1.0 and 1.4, a poor model in which only high and low values can be distinguished; between 1.4 and 1.8, a fair model which may be used for assessment and correlation; between 1.8 and 2.0, a good model; between 2.0 and 2.5, a very good quantitative model; and greater than 2.5, excellent model performance [35].
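As a rough illustration of these indices, the sketch below computes R² (as the squared correlation between measured and predicted values), RMSE, bias, SEP and RPD = STD/SEP, and maps an RPD value onto the bands listed above. The function names are hypothetical; the use of the sample standard deviation for STD and the treatment of each band's lower bound as inclusive are assumptions, since the text does not specify them.

```python
import math


def evaluate(measured, predicted):
    """Return R2, RMSE, bias, SEP and RPD for paired measured/predicted values."""
    n = len(measured)
    pairs = list(zip(measured, predicted))
    mean_x = sum(measured) / n
    mean_y = sum(predicted) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x in measured)
    syy = sum((y - mean_y) ** 2 for y in predicted)
    r2 = sxy ** 2 / (sxx * syy)
    rmse = math.sqrt(sum((y - x) ** 2 for x, y in pairs) / n)
    bias = sum(y - x for x, y in pairs) / n
    sep = math.sqrt(sum((y - x - bias) ** 2 for x, y in pairs) / (n - 1))
    std = math.sqrt(sxx / (n - 1))  # assumption: sample standard deviation
    return {"R2": r2, "RMSE": rmse, "Bias": bias, "SEP": sep, "RPD": std / sep}


def rpd_class(rpd):
    """Map an RPD value to the qualitative bands quoted in the text."""
    # Assumption: the lower bound of each band is inclusive.
    bands = [(1.0, "very poor"), (1.4, "poor"), (1.8, "fair"),
             (2.0, "good"), (2.5, "very good")]
    for upper, label in bands:
        if rpd < upper:
            return label
    return "excellent"


if __name__ == "__main__":
    print(rpd_class(2.4741))  # prints: very good
```

RMSEC and RMSEP are the same RMSE computed on the calibration set and the prediction set, respectively.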
The R², RPD and RMSE could be calculated by the following equations:
$$R^2 = \frac{\left[\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})\right]^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2\,\sum_{i=1}^{n}(y_i-\bar{y})^2} \qquad (3)$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i-x_i)^2} \qquad (4)$$
$$\mathrm{RPD} = \frac{\mathrm{STD}}{\mathrm{SEP}} \qquad (5)$$
$$\mathrm{SEP} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(y_i-x_i-\mathrm{Bias})^2}, \qquad \mathrm{Bias} = \frac{1}{n}\sum_{i=1}^{n}(y_i-x_i) \qquad (6)$$
where x_i is the measured value of sample i; x̄ is the average value of x_i; y_i is the predicted value of sample i; ȳ is the average value of y_i; STD is the standard deviation of the measured values; and n is the number of samples.
Results and Discussion
Spectral feature
Fig. 2 shows the absorbance spectra of the samples over the wavenumbers of 12500 to 4000 cm⁻¹ (x-axis: wavenumber/cm⁻¹; y-axis: absorbance value). The general trend is that absorbance decreased initially and then increased. Low absorbance was seen in the region from 10000 to 7550 cm⁻¹, with higher absorbance in the first overtone region from 7550 to 5250 cm⁻¹ and the highest absorbance in the combination region from 5250 to 4000 cm⁻¹. The most dominant absorption bands in the NIR spectral region are due to hydrogen-containing bonds such as C-H, O-H, N-H, S-H and P-H, as they give strong overtones and combinations [36]. For arginine, the strong absorptions observed around the wavenumbers of 7000 to 4000 cm⁻¹ correspond to C=O, N-H, C-H and C-C bonds [37]. The bands from 7500 to 5500 cm⁻¹ are related to the C-H first overtone stretch vibration modes in CH3 and CH2 groups. The absorption bands from 5000 to 4000 cm⁻¹ are due to amide [38] and C-H combination bands, which are characteristic bands for proteins and amino acids [39].
Pre-processing and PLSR models
In this study, PLSR models were applied to determine the best pre-processing method in terms of the values of R², RPD, RMSEC and RMSEP. Nine different pre-processing methods were used in this study, including moving average smoothing (MAS), Savitzky-Golay smoothing (SGS), median filter smoothing (MFS), Gaussian filter smoothing (GFS), multiplicative scatter correction (MSC), Savitzky-Golay derivatives (SGD) and standard normal variate (SNV). The results for the raw spectra and for the different pre-processing methods in the calibration and prediction sets are shown in Table 2. According to the model evaluation standard, the raw spectral data without any pre-processing performed best, with the highest values of Rc² (0.8463), Rp² (0.7862) and RPD (2.1789) and the lowest values of RMSEC (0.0807) and RMSEP (0.0962). Therefore, all of the subsequent analysis was carried out on the raw spectral data.
Regression models based on full wavenumbers
In this study, another two models (PCR and LS-SVM) were established to predict arginine content. From Table 3, it can be seen that the PCR model performed the better of the two, with higher values of Rp² (0.7470) and RPD (1.9880) and a lower value of RMSEP (0.1047). The LS-SVM model also obtained an acceptable result. Among the three models, the PLSR model performed best. However, the number of input variables used in the three models was too large.
Effective wavenumbers
In order to simplify the model and improve the prediction ability, CARS and SPA were used to select the optimal wavenumbers in this study. Most of the selected wavenumbers are concentrated in the region of 8000 to 4000 cm⁻¹. This is because most of the sensitive wavenumbers that are correlated with the chemical groups in the arginine molecule are located in this region. The region from 10000 to 7500 cm⁻¹, which is assigned to second and third overtones, is of low intensity and SNR [40]. In other words, this region contains little useful information for arginine detection.
Models based on CARS
In order to improve the performance of the prediction model, CARS was first used to select the effective wavenumbers in this study. Fig. 3(a) shows that the number of sampled variables decreased fast in the first stage of EDF and then slowly in the second stage. In Fig. 3(b), the root mean square error of cross validation (RMSECV) first descended, which indicates that some uninformative variables were eliminated; it then changed only slightly, showing that the variables did not change significantly; and it finally increased because some useful variables were eliminated. Each line in Fig. 3(c) represents the coefficient of one variable at different sampling runs. A subset of variables was extracted in each sampling run, and the optimal subset, with the lowest value of RMSECV, is marked by the vertical asterisk line. After the asterisk line, the RMSECV began to increase, which was ascribed to the removal of some effective variables. Table 4 shows that a total of 45 wavenumbers were selected by CARS. The number of selected variables was only 2.04% of the full set of wavenumbers. These wavenumbers were then used in place of the full wavenumbers for building prediction models.
Three models were established based on the selected wavenumbers. The results are shown in Table 5. The CARS-LS-SVM model obtained the best results, with the highest values of Rp² (0.8370) and RPD (2.4741) and the lowest value of RMSEP (0.0841). Compared with the other two models (CARS-PLSR and CARS-PCR), the value of Rp² in the CARS-LS-SVM model increased by 6.49% and 17.89%, RPD increased by 14.30% and 34.21%, and RMSEP decreased by 12.49% and 25.51%, respectively. In the CARS-LS-SVM model, the values of R² increased by 5.29% in the calibration set and 15.93% in the prediction set, and RMSE decreased by 17.55% in the calibration set and 23.48% in the prediction set, compared with the LS-SVM model. Although there was a small decrease in the Rp² values of the CARS-PLSR and CARS-PCR models, the results were acceptable. Thus, CARS was effective in searching for the optimal wavenumbers in this study.
Models based on CARS-SPA
However, 45 wavenumbers were still rather many for spectral analysis. Therefore, CARS combined with SPA (CARS-SPA) was finally carried out to select the most useful wavenumbers. Table 4 shows that fourteen wavenumbers (12459, 12420, 12378, 8278, 7541, 7047, 6172, 5929, 5145, 4980, 4868, 4355, 4154, and 4065 cm⁻¹) were selected as the optimal input variables. The number of selected variables was only 0.64% of the full set of wavebands. The fourteen selected wavenumbers were then treated as new input variables for establishing prediction models. The predicted results based on CARS-SPA are also shown in Table 5. Of the three models established based on CARS-SPA, the CARS-SPA-LS-SVM model performed excellently, with the highest values of R² and RPD and the lowest values of RMSEC and RMSEP (Rc²=0.8560, Rp²=0.8160, RPD=2.3277, RMSEC=0.0798, RMSEP=0.0894). Compared with the other two models (CARS-SPA-PLSR and CARS-SPA-PCR), in the CARS-SPA-LS-SVM model the values of Rc² increased by 3.98% and 5.74%, Rp² increased by 0.62% and 6.39%, RPD increased by 1.03% and 12.54%, RMSEC decreased by 7.85% and 11.23%, and RMSEP decreased by 1.00% and 11.331%, respectively. Each model based on CARS-SPA obtained a better result than the corresponding model built on the full spectral wavebands, indicating that useful wavebands were selected while those containing redundant information were rejected by the CARS-SPA method. Compared with the CARS-PLS and CARS-PCR models, Rp² increased in the CARS-SPA-PLS and CARS-SPA-PCR models. Although there was a small decrease of Rp² in CARS-SPA-LS-SVM, the result is also excellent (Rp²=0.816). The result demonstrated that CARS combined with SPA is also good at selecting effective wavebands.
Optimal models
The Raw-PLSR model performed best among the models established on the full wavebands, with the highest values of Rc², Rp² and RPD and the lowest values of RMSEC and RMSEP (Rc²=0.8463, Rp²=0.7862, RPD=2.1789, RMSEC=0.0807, RMSEP=0.0962). However, the number of input variables was too large. Among the models based on selected wavebands, the CARS-LS-SVM model obtained an excellent result, with the highest values of Rc², Rp² and RPD and the lowest values of RMSEC and RMSEP (Rc²=0.8950, Rp²=0.8370, RPD=2.4741, RMSEC=0.0686, RMSEP=0.0841). Moreover, the number of input variables decreased greatly, which means simpler models can be obtained. Thus, the selected wavenumbers are more efficient than the full wavenumbers. The predicted results of the Raw-PLSR and CARS-LS-SVM models are shown in Fig. 4.
Discussion
In this study, nine different pre-processing methods were compared in order to select the best one. The worst result was acquired by the SGD pre-processing, with an Rc² of 0.6954 and an Rp² of 0.1172. There is also a big difference between the values of Rc² and Rp², which means the SGD-based model did not perform well. For the other eight pre-processing methods, the results are very similar. However, the best result was obtained by the raw data, with an Rc² of 0.8463, Rp² of 0.7862, RPD of 2.1789, RMSEC of 0.0807 and RMSEP of 0.0962. Among all the results, these values of Rc², Rp² and RPD are the highest, and the RMSEC and RMSEP are the lowest. The RPD value (2.1789) is between 2.0 and 2.5, indicating that the prediction model is very good. Thus, all the analyses were based on the original data. For the PCR and LS-SVM models, the results were a little worse than those of the PLSR models, except the SGD-based model. However, the number of input variables for these models is too large.
Thus, the CARS and SPA methods were carried out to identify the useful wavenumbers. For CARS and CARS-SPA, forty-five and fourteen wavenumbers were obtained, respectively. Based on these selected wavenumbers, PLSR, PCR and LS-SVM models were re-established. Table 5 shows that the CARS-SPA-based models performed better than the full spectrum-based models. This is because the raw spectral data carry too much useless information at some wavenumbers, and effective wavenumbers can be selected by CARS-SPA, which helps to build an accurate and robust model. After wavenumber selection, CARS-LS-SVM and CARS-SPA-LS-SVM performed the best. The RPD values of the two models are 2.4741 and 2.3277, which means they are very good models and very close to excellent models. The results acquired with different pre-processing methods and models show that the FT-NIR spectral signature can be used for arginine content detection. The CARS-LS-SVM model performed better than the full spectrum-based models because it rejected redundant information and retained useful information from the full wavenumbers.
There are many different groups, such as C-H, N-H, C-C, C-N, C=O and O-H, in the arginine molecule (Fig. 5). The sensitive wavenumbers that correspond to different groups are not the same. Therefore, the results obtained using different wavenumber selection methods also differ. The wavenumbers of 7093, 7070 and 7047 cm⁻¹ were assigned to the first NH/OH stretching overtones (6200-7400 cm⁻¹); 4991, 4987, 4983, 4980, 4976 and 4868 cm⁻¹ were assigned to the vibrational overtone of combined C=O amide and amino acid N-H (5000-4000 cm⁻¹); 4359, 4355, 4154 and 4065 cm⁻¹ were assigned to C-H stretching vibration and C-H deformation [41-43]. Many of the selected wavenumbers have a close correlation with the arginine content. This might be the reason why FT-NIR spectra could be used to detect arginine content in fermented C. sinensis mycelium. Thirteen selected wavenumbers (7093, 7070, 7047, 4991, 4987, 4983, 4980, 4976, 4868, 4359, 4355, 4154, and 4065 cm⁻¹) suggested by CARS and six wavenumbers (7047, 4980, 4868, 4355, 4154 and 4065 cm⁻¹) suggested by CARS-SPA could be considered to have a correlation with the arginine content. This might be the reason why the predicted results based on the CARS method were the best among all models.
Conclusions
This study was carried out to evaluate the feasibility of using an FT-NIR spectrometer, which covers the spectral range of 12500-4000 cm⁻¹, to determine arginine content in fermented C. sinensis mycelium. The results indicate that the FT-NIR spectral technique has the potential to be adopted as a fast, objective and non-destructive method to predict the arginine content. Out of the 2203 variables, only a few effective variables were selected by the CARS and CARS-SPA methods. On the basis of the selected wavenumbers, the CARS-LS-SVM model performed the best. Also, a simple system based on the selected wavebands could be developed to replace the current FT-NIR spectrometer for detecting arginine content. The selected wavenumbers not only simplified and improved the prediction models but also explained why FT-NIR spectra could be used to detect arginine content.
However, this research represents only preliminary work. In future studies, more samples should be used to improve the robustness and accuracy of the prediction. Other algorithms for selecting optimal wavenumbers with higher accuracy and fewer variables should also be considered.
Acknowledgements
This work was supported by 863 National High-Tech Research and Development Plan (2013AA102301, 2011AA100705), Zhejiang Provincial Natural Science Foundation of China (Z3090295) and the Fundamental Research Funds for the Central Universities of China (2012FZA6005).
References
[1] Z.Y. Zhang, Z.F. Lei, Y. Lü, Z.Z. Lü, Y. Chen, J. Biosci. Bioeng. 106(2) (2008) 188-193.
[2] J.W. Bok, L. Lermer, J. Chilton, H.G. Klingeman, G.H.N, Phytochemistry 51(7) (1999) 891-898.
[3] B.J. Wang, S.J. Won, Z.R. Yu, C.L. Su, Food Chem. Toxicol. 43(4) (2005) 543-552.
[4] T.H. Hsu, L.H. Shiao, C. Hsieh, D.M. Chang, Food Chem. 78(4) (2002) 463-469.
[5] J.Y. Yang, W.Y. Zhang, P.H. Shi, J.P. Chen, X.D. Han, Y. Wang, Pathol. Res. Pract. 201(11) (2005) 745-750.
[6] X.Z. Zhou, Z.H. Gong, Y. Su, J. Lin, K.X. Tang, J. Pharm. Pharmacol. 61(3) (2009) 279-291.
[7] M.B. Witte, F.J. Thornton, U. Tantry, A. Barbul, Metabolism 51(10) (2001) 1269-1273.
[8] W.J.D. Jonge, B. Marescau, R.D. Hooge, P.P.D. Deyn, Nutr. Neurosci. 131(10) (2001) 2732-2740.
[9] T. Teerlink, R.J. Nijveldt, S.D. Jong, P.A.M.V. Leeuwen, Anal. Biochem. 303(2) (2002) 131-137.
[10] C.H. Petter, N. Heigl, S. Bachmann, V.A.C. Huck-Pezzei, M. Najam-ul-Haq, R. Bakry, A. Bernkop-Schnürch, G. Bonn, C.W. Huck, Amino Acids 34(4) (2008) 605-616.
[11] H.Y. Cen, Y. He, Trends Food Sci. Tech. 18(2) (2007) 72-93.
[12] B.B. Wedding, C. Wright, S. Grauf, R.D. White, B. Tilse, P. Gadek, Postharvest Biol. Tec. 75 (2013) 9-16.
[13] D. Wu, J.Y. Chen, B.Y. Lu, L.N. Xiong, Y. He, Y. Zhang, Food Chem. 135(4) (2012) 2147-2156.
[14] D. Wu, X.J. Chen, P.Y. Shi, S.H. Wang, F.Q. Feng, Y. He, Anal. Chim. Acta 634 (2009) 166-171.
[15] C.Q. Xie, H.L. Wang, Y.N. Shao, Y. H, Intell. Autom. Soft Co. 21(3) (2015) 395-407.
[16] D. Wu, X. Chen, X. Zhu, X. Guan, G. Wu, Anal. Methods 3(8) (2011) 1790-1796.
[17] Y. He, M. Huang, A. Garcia, A. Hernandez, H. Song, Comput. Electron. Agr. 58 (2007) 144-153.
[18] L.L. Jiang, F. Liu, Y. He, Sensors 12 (2012) 3498-3511.
[19] M. Kamruzzaman, G. ElMasry, D.W. Sun, P. Allen, Anal. Chim. Acta 714 (2012) 57-67.
[20] X.G. Shao, W. Wang, Z.Y. Hou, W.S. Cai, Talanta 69(3) (2006) 676-680.
[21] W. Wang, Y.K. Peng, H. Huang, J.H. Wu, Sens. Lett. 9(3) (2011) 1024-1030.
[22] X.J. Chen, D. Wu, Y. He, S. Liu, Food Bioprocess Tech. 4(5) (2011) 753-761.
[23] X.L. Zhang, F. Liu, Y. He, X.L. Li, Sensors 12 (2012) 17234-17246.
[24] D. Wu, D.W. Sun, Talanta 111(15) (2013) 39-46.
[25] F. Liu, Y.H. Jiang, Y. He, Anal. Chim. Acta 635(1) (2009) 45-52.
[26] D. Wu, D.W. Sun, Innov. Food Sci. Emerg. 19 (2013) 1-14.
[27] D.F. Barbin, G. ElMasry, D.W. Sun, P. Allen, Anal. Chim. Acta 719 (2012) 30-42.
[28] M. Kamruzzaman, G. ElMasry, D.W. Sun, P. Allen, J. Food Eng. 104(3) (2011) 332-340.
[29] H.D. Li, Y.Z. Liang, Q.S. Xu, D.S. Cao, Anal. Chim. Acta 648 (2009) 77-84.
[30] X. Wei, N. Xu, D. Wu, Y. He, Food Bioprocess Tech. 7 (2014) 184-190.
[31] M.C.U. Araújo, T.C.B. Saldanha, R.K.H. Galvão, T. Yoneyama, H.C. Chame, V. Visani, Chemometr. Intell. Lab. 57(2) (2001) 65-73.
[32] R.K.H. Galvão, M.C.U. Araújo, W.D. Fragoso, E.C. Silva, G.E. José, S.F.C. Soares, H.M. Paiva, Chemometr. Intell. Lab. 92(1) (2008) 83-91.
[33] A.H. Gómez, Y. He, A.G. Pereira, J. Food Eng. 77(2) (2006) 313-319.
[34] X.L. Li, Y. He, Food Bioprocess Tech. 3(5) (2010) 651-661.
[35] R.A. Viscarra Rossel, R.N. McGlynn, A.B. McBratney, Geoderma 137(1-2) (2006) 70-82.
[36] P.K. Ghosh, D.S. Jayas, Sens. & Instrumen. Food Qual. 3(1) (2009) 3-11.
[37] M. Mecozzi, M. Pietroletti, A. Tornambè, Spectrochim. Acta A 78(5) (2011) 1572-1580.
[38] S.W. Bruun, J. Holm, S.I. Hansen, S. Jacobsen, Appl. Spectrosc. 60(7) (2006) 737-746.
[39] Y. Chen, M.Y. Xie, H. Zhang, Y.X. Wang, S.P. Nie, C. Li, Food Chem. 135(1) (2012) 268-275.
[40] N. Gierlinger, M. Schwanninger, R. Wimmer, J. Near Infrared Spec. 12(2) (2004) 113-119.
[41] J. Wang, M.G. Sowa, M.K. Ahmed, H.H. Mantsch, J. Phys. Chem. 98(17) (1994) 4748-4755.
[42] M. Miyazawa, M. Sonoyama, J. Near Infrared Spec. 6(1) (1998) 253-257.
[43] X.L. Chu, Y.P. Xu, G.Y. Tian, Chemical Industry Press, Beijing, 2009, pp. 82-84.
353
13
Figure captions
Fig. 1. Main steps of the study.
Fig. 2. Raw spectral absorbance curves of arginine in fermented C. sinensis mycelium.
Fig. 3. The calculation of CARS: (a) the changing trend of the number of sampled variables, (b) 10-fold root mean square error of cross validation (RMSECV) values and (c) regression coefficients of each variable with increasing sampling runs. The line marked by an asterisk indicates the optimal point, where the 10-fold RMSECV reaches its lowest value.
Fig. 4. Scatter plots of measured vs. predicted values of arginine of Raw-PLSR and CARS-LS-SVM models: (a) calibration (b) prediction (c) calibration (d) prediction.
Fig. 5. Chemical structure of the arginine molecule.
Table 1
Statistical values of arginine content in calibration and prediction sets (g/100 g)

Data sets     Number   Range           Mean     S.D.
Calibration   130      2.5826-3.5633   3.0740   0.2067
Prediction    65       2.6595-3.6615   3.0825   0.2097
All           195      2.5826-3.6615   3.0768   0.2072

S.D.: Standard Deviation
Table 2
Predicted results of PLSR models using different pre-processing methods

                Calibration         Prediction
Preprocessing   Rc2      RMSEC      Rp2      RMSEP     Latent variables   RPD
Raw             0.8463   0.0807     0.7862   0.0962    11                 2.1789
MAS             0.8318   0.0845     0.7823   0.0970    11                 2.1581
SGS             0.8301   0.0849     0.7802   0.0975    11                 2.1469
MFS             0.8329   0.0842     0.7800   0.0975    11                 2.1471
GFS             0.8412   0.0821     0.7827   0.0970    11                 2.1601
Normalize       0.8451   0.0810     0.7832   0.0965    10                 2.1640
MSC             0.8122   0.0892     0.7572   0.1025    8                  2.0443
SGD             0.6954   0.1136     0.1172   0.2022    3                  1.0333
Baseline        0.8045   0.0910     0.7494   0.1038    9                  2.0129
SNV             0.8253   0.0860     0.7693   0.1000    9                  2.0979
MAS: Moving Average Smoothing; SGS: Savitzky-Golay Smoothing; MFS: Median Filter Smoothing; GFS: Gaussian Filter Smoothing; MSC: Multiplicative Scatter Correction; SGD: Savitzky-Golay Derivatives; SNV: Standard Normal Variate
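The R2, RMSE and RPD columns reported in the tables can be illustrated with a minimal Python sketch. It assumes the common chemometric definitions (R2 as the coefficient of determination, RPD as the standard deviation of the reference values divided by the RMSE); these definitions are not stated in this part of the manuscript, so they are assumptions, and the helper name `regression_metrics` is hypothetical. The final lines cross-check the assumed RPD definition against the values reported in Tables 1 and 2 for the Raw-PLSR model.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return (R2, RMSE, RPD) for measured vs. predicted values.

    Assumed definitions (not confirmed by the manuscript):
      R2   = 1 - SS_res / SS_tot
      RMSE = sqrt(mean(residual^2))
      RPD  = sample S.D. of measured values / RMSE
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    rmse = np.sqrt(np.mean(residuals ** 2))
    sd = np.std(y_true, ddof=1)  # sample standard deviation
    rpd = sd / rmse
    return r2, rmse, rpd


# Toy data for illustration only (not from the study).
r2, rmse, rpd = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])

# Cross-check with reported values: prediction-set S.D. (Table 1)
# and RMSEP of the Raw-PLSR model (Table 2).
sd_prediction = 0.2097
rmsep_raw = 0.0962
rpd_check = sd_prediction / rmsep_raw  # close to the reported RPD of 2.1789
```

Under these assumptions the ratio S.D./RMSEP comes out close to the RPD reported for the Raw-PLSR model; the small remaining difference may reflect rounding or a slightly different definition used by the authors.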
Table 3
Predicted results of different models based on full wavenumbers

Model    Input   Sets          No.   R2       RMSE     RPD
PCR      2203    Calibration   130   0.7505   0.1029
                 Prediction    65    0.7470   0.1047   1.9880
LS-SVM   2203    Calibration   130   0.8500   0.0832
                 Prediction    65    0.7220   0.1099   1.8951
Table 4
Effective wavenumbers selected by CARS

Algorithm   Number   Selected wavenumber/cm-1
CARS        45       12478, 12459, 12420, 12378, 12362, 12247, 12200, 12115, 12112, 12073, 12065,
                     12019, 11734, 11599, 11595, 11564, 10777, 8278, 7541, 7537, 7533, 7093, 7070,
                     7047, 6172, 6168, 6164, 5948, 5944, 5940, 5936, 5932, 5929, 5211, 5145, 4991,
                     4987, 4983, 4980, 4976, 4868, 4359, 4355, 4154, 4065
CARS-SPA    14       12495, 12420, 12378, 8278, 7541, 7047, 6172, 5929, 5145, 4980, 4868, 4355,
                     4154, 4065
Table 5
Predicted results of different models based on CARS

Model              Input   Sets          No.   R2       RMSE     RPD
CARS-PLSR          45      Calibration   130   0.8918   0.0677
                           Prediction    65    0.7860   0.0961   2.1645
CARS-PCR           45      Calibration   130   0.9049   0.0635
                           Prediction    65    0.7100   0.1129   1.8435
CARS-LS-SVM        45      Calibration   130   0.8950   0.0686
                           Prediction    65    0.8370   0.0841   2.4741
CARS-SPA-PLSR      14      Calibration   130   0.8232   0.0866
                           Prediction    65    0.8110   0.0903   2.3040
CARS-SPA-PCR       14      Calibration   130   0.8095   0.0899
                           Prediction    65    0.7670   0.1008   2.0683
CARS-SPA-LS-SVM    14      Calibration   130   0.8560   0.0798
                           Prediction    65    0.8160   0.0894   2.3277
Highlights
1) The spectral features of arginine were studied.
2) The FT-NIR technique was a non-destructive method to detect arginine content.
3) CARS and SPA were effective methods to select useful wavenumbers.