Accepted Manuscript
Discrimination of tomatoes bred by spaceflight mutagenesis using visible/near infrared spectroscopy and chemometrics
Yongni Shao, Chuanqi Xie, Linjun Jiang, Jiahui Shi, Jiajin Zhu, Yong He
PII: S1386-1425(15)00028-1
DOI: http://dx.doi.org/10.1016/j.saa.2015.01.018
Reference: SAA 13186
To appear in:
Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy
Received Date: 13 October 2014
Revised Date: 5 January 2015
Accepted Date: 11 January 2015
Please cite this article as: Y. Shao, C. Xie, L. Jiang, J. Shi, J. Zhu, Y. He, Discrimination of tomatoes bred by spaceflight mutagenesis using visible/near infrared spectroscopy and chemometrics, Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy (2015), doi: http://dx.doi.org/10.1016/j.saa.2015.01.018
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Discrimination of tomatoes bred by spaceflight mutagenesis using visible/near infrared spectroscopy and chemometrics
Yongni Shao a, Chuanqi Xie a, Linjun Jiang a, Jiahui Shi b, Jiajin Zhu a, Yong He a,*
a College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou 310058, China
b Zhejiang Sports Science Research Institute, Hangzhou, P.R. China
* Corresponding author: Yong He, Tel: +86-571-88982143, Fax: +86-571-88982143, Email: [email protected]
ABSTRACT
Visible/near infrared spectroscopy (Vis/NIR) based on sensitive wavelengths (SWs) and chemometrics was proposed to discriminate different tomatoes bred by spaceflight mutagenesis from their leaves or fruits (green or mature). The tomato breeds were mutant M1, mutant M2 and their parent. Partial least squares (PLS) analysis and least squares-support vector machine (LS-SVM) were implemented for calibration models. PLS models were built with different wavebands, including the visible region (400-700 nm) and the near infrared region (700-1000 nm). The best PLS models were achieved in the visible region for the leaf and green fruit samples and in the near infrared region for the mature fruit samples. Furthermore, different latent variables (4-8 LVs for leaves, 5-9 LVs for green fruits, and 4-9 LVs for mature fruits) were used as inputs of LS-SVM to develop the LV-LS-SVM models with the grid search technique and radial basis function (RBF) kernel. The optimal LV-LS-SVM models were achieved with six LVs for the leaf samples, seven LVs for green fruits, and six LVs for mature fruits, and they outperformed the PLS models. Moreover, independent component analysis (ICA) was executed to select several SWs based on loading weights. The optimal LS-SVM model was achieved with SWs of 550-560 nm, 562-574 nm, 670-680 nm and 705-715 nm for the leaf samples; 548-556 nm, 559-564 nm, 678-685 nm and 962-974 nm for the green fruit samples; and 712-718 nm, 720-729 nm, 968-978 nm and 820-830 nm for the mature fruit samples. All of them performed better than the PLS and LV-LS-SVM models, with correlation coefficient (rp), root mean square error of prediction (RMSEP) and bias of 0.9792, 0.2632 and 0.0901 for leaf discrimination, 0.9837, 0.2783 and 0.1758 for green fruit discrimination, and 0.9804, 0.2215 and -0.0035 for mature fruit discrimination, respectively. The overall results indicated that ICA was an effective way to select SWs, and that Vis/NIR combined with LS-SVM models had the capability to predict the different breeds (mutant M1, mutant M2 and their parent) of tomatoes from leaves and fruits.
Keywords: Vis/near infrared spectroscopy; tomato; independent component analysis; partial least squares analysis; least squares-support vector machine
1. Introduction
Over the past 40 years, breeding by spaceflight mutagenesis has become a new trend in high-technology agriculture. China is one of the three countries (China, USA and Russia) that have mastered spaceflight return technology. Since 1987, China has successfully conducted spaceflight tests carrying plant seeds. Through the efforts of researchers, excellent breeds have been obtained, demonstrating the obvious advantages of this technique. Spaceflight mutagenesis is a new breeding technology that combines spaceflight, biology and agricultural breeding. The tomato is a vegetable that is also regarded as a fruit. It is rich in nutrients such as vitamin C, organic acids and minerals, and especially in lycopene, and it is very popular with consumers because of its nutritional value. With the improvement of living standards, the breeding and quality of tomatoes have attracted the attention of many researchers. Spaceflight mutagenesis technology has been applied to a variety of crops in order to achieve highly efficient genetic improvement. At present, research on the mechanism of spaceflight mutagenesis breeding is still at a preliminary stage.
Wang et al. analyzed tomato seeds subjected to spaceflight mutagenesis using FT-IR. The results showed that the space environment could affect the C-O stretching vibration of the carbohydrate structures in the space tomato seeds [1]. Guo et al. used X-ray fluorescence analysis to measure and analyze the elements in fourth-generation radix isatidis bred by spaceflight mutagenesis [2]. In our earlier study, we used visible/near infrared spectroscopy (Vis/NIRS) combined with chemometrics to discriminate tomato breeds obtained by spaceflight mutagenesis from their green fruits [3]. In this study, we extend our samples to tomato leaves and fruits (green or mature), use Vis/NIR to find the characteristic spectral variations, and compare the discrimination capacity among them.
Near infrared spectroscopy (NIR), utilizing the spectral range from 780 to 2500 nm, has served for the past 30 years as a method to predict the quality of different foods and agricultural products because of its rapid analysis, minimal sample preparation and low cost [4]. In our previous study, we used Vis/NIRS to analyze the physiological properties of tomatoes, including firmness, soluble solids content and pH value [5]. Xu et al. used near infrared spectroscopy to detect minor damage to tomato leaves, and showed that the sensitive bands of 1450 and 1900 nm, modeled against severity level, provided the highest correlation coefficient [6]. Pedro and Ferreira adopted near infrared spectroscopy to estimate solids and carotenoids in tomatoes; the best models gave RMSEP and r values of 0.4157 and 0.9998 for total solids, 0.6333 and 0.9996 for soluble solids, 21.5779 and 0.9996 for lycopene, and 0.7296 and 0.9981 for β-carotene [7]. Kamil et al. also evaluated tomato products (lycopene, β-carotene, starch, allura red pigment and paprika) using Fourier transform infrared spectroscopy [8]. Sirisomboon et al. used near infrared spectroscopy to classify the maturity level and to predict the textural properties of the tomato variety "Momotaro". For maturity classification, an accuracy of 100.00% was obtained for red and pink tomatoes, and 96.85% for mature green tomatoes [9].
Various calibration methods have been used to relate near-infrared spectra to measured properties of materials. Principal components regression (PCR), partial least squares (PLS), multiple linear regression (MLR) and artificial neural networks (ANN) are the most widely used multivariate calibration techniques for NIRS [10]. PLS has been applied in a large number of food analysis applications and is widely used in multivariate calibration because it takes advantage of the correlation that already exists between the spectral data and the constituent concentrations. However, PLS is based on linear models, and unsatisfactory results may be obtained when non-linearity is present.
Least squares-support vector machine (LS-SVM) can handle both linear and nonlinear relationships between the spectra and the response chemical constituents [11-12]. Therefore, a combination of ICA with LS-SVM was proposed as a nonlinear calibration model for quantitative analysis using spectroscopic techniques. In our earlier study, we had already used this method to analyze irradiated rice with different irradiation doses [13].
In this study, we use Vis/NIR to find the characteristic spectral variations of tomatoes bred by spaceflight mutagenesis. The objectives of this paper were (1) to study the feasibility of using Vis/NIR spectroscopy to predict the different breeds (mutant M1, mutant M2 and their parent) of spaceflight tomatoes from their leaves or fruits (green or mature); (2) to compare the prediction precision obtained with different numbers of latent variables (4-9 LVs) as inputs to the least squares-support vector machine (LS-SVM); and (3) to select the optimal sensitive wavelengths (SWs) for the development of portable instruments and online monitoring for commercial discrimination of different tomato breeds.
2. Experiment
2.1. Vis/NIR analysis
Spectra were collected using a Vis/NIR scanning spectroradiometer (ASD Handheld FieldSpec, Boulder, USA) in reflectance mode. Measurements were made at ambient temperature (18-20 °C) over the wavelength range of 325-1075 nm at intervals of 1.5 nm. Reflectance measurements were taken for the leaves, green fruits and mature fruits of each tomato plant. A Lowell pro-lamp interior light source (Assembly/128930) with a Lowell pro-lamp 14.5 V bulb (128690, tungsten halogen, made in China), which could be used in both the visible and near infrared regions, was placed at a distance of 300 mm from the sample surface. For the leaf measurements, the reflectance spectra were taken at the center of each leaf, and the scan number for each spectrum was set to 30. For the fruit measurements, the reflectance spectra were taken around the equator of each fruit, and the scan number for each spectrum was also set to 30. The signals were preprocessed using ViewSpec Pro V2.14 (Analytical Spectral Devices, Inc., Boulder, CO 80301). Because of scattering noise from the collection system, only spectral data in the wavelength range of 400-1000 nm were used. In the following regression, the wavelengths were divided into the visible region (400-700 nm) and the near infrared region (700-1000 nm). For the tomato leaf trial, a total of 150 samples were prepared, covering the breeds mutant M1, mutant M2 and their parent, with 50 samples for each breed. When building the model, they were randomly divided into a calibration set of 105 samples (35 samples for each breed) and a prediction set of 45 samples (15 samples for each breed). The same sample numbers and testing scheme were applied to the green and mature tomato fruits.
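The random division into calibration and prediction sets can be illustrated with a short Python sketch. It is only an illustration: the numeric breed codes 1, 2 and 3, the function name `stratified_split` and the random seed are assumptions, and the authors' actual procedure is not described beyond the set sizes.

```python
import random

def stratified_split(labels, n_cal_per_class, seed=0):
    """Return (calibration_indices, prediction_indices), drawn separately per class."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    cal, pred = [], []
    for label in sorted(by_class):
        members = by_class[label][:]
        rng.shuffle(members)
        cal.extend(members[:n_cal_per_class])
        pred.extend(members[n_cal_per_class:])
    return sorted(cal), sorted(pred)

# 50 samples per breed; the codes 1, 2, 3 for M1, M2 and parent are assumed.
labels = [1] * 50 + [2] * 50 + [3] * 50
cal_idx, pred_idx = stratified_split(labels, n_cal_per_class=35)
print(len(cal_idx), len(pred_idx))
```

Splitting within each breed keeps the 35/15 ratio stated in the text for every class.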
2.2. Spectral preprocessing
The spectral data were preprocessed before the calibration stage. Savitzky-Golay smoothing with a window width of 7 (3-1-3) points was used to reduce noise [14-15]. Multiplicative scatter correction (MSC) was used to correct additive and multiplicative effects in the spectra [16].
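A minimal NumPy sketch of the two preprocessing steps follows. The 7-point window comes from the text; the polynomial order of 2, the edge padding and the use of the mean spectrum as the MSC reference are assumptions, since the manuscript does not state them. The function names are illustrative only.

```python
import numpy as np

def savgol_smooth(spectra, half_window=3, polyorder=2):
    """Savitzky-Golay smoothing along the wavelength axis (rows = samples)."""
    offsets = np.arange(-half_window, half_window + 1)
    # A least-squares polynomial fit evaluated at the window centre
    # reduces to a fixed set of convolution weights.
    vander = np.vander(offsets, polyorder + 1, increasing=True)
    weights = np.linalg.pinv(vander)[0]
    padded = np.pad(spectra, ((0, 0), (half_window, half_window)), mode="edge")
    return np.stack([np.convolve(row, weights[::-1], mode="valid") for row in padded])

def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference (default: mean spectrum)."""
    ref = spectra.mean(axis=0) if reference is None else reference
    corrected = np.empty_like(spectra, dtype=float)
    for i, row in enumerate(spectra):
        # Regress each spectrum on the reference: row = slope * ref + intercept.
        slope, intercept = np.polyfit(ref, row, 1)
        corrected[i] = (row - intercept) / slope
    return corrected

# Synthetic spectra that differ only by a scale and an offset.
base = np.sin(np.linspace(0.0, 3.0, 50)) + 2.0
raw = np.vstack([1.2 * base + 0.1, 0.8 * base - 0.05, 1.0 * base])
corrected = msc(savgol_smooth(raw))
```

In this synthetic case the three corrected spectra coincide, because MSC removes exactly the additive and multiplicative differences that were introduced.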
2.3. Partial least squares analysis
In the development of the PLS models, calibration models were built between the spectra and the breeds of the tomato leaves or fruits. Full cross-validation was used to evaluate the quality of the calibration models and to prevent overfitting. The optimal number of LVs was determined by the lowest value of the predicted residual error sum of squares (PRESS). The prediction performance was evaluated by the correlation coefficient (r) and the root mean square error of calibration (RMSEC) or prediction (RMSEP). The ideal model should have a high r value and low RMSEC and RMSEP values. The prediction set was used to evaluate the accuracy of the models in classifying tomato breeds from their leaves or fruits.
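The evaluation statistics used throughout the paper (r, RMSEP and bias) can be computed as in the sketch below. The sign convention for bias (predicted minus reference) is an assumption, as the manuscript does not define it, and `prediction_metrics` is an illustrative name.

```python
import numpy as np

def prediction_metrics(y_ref, y_pred):
    """Return (r, RMSEP, bias) for reference and predicted values."""
    y_ref = np.asarray(y_ref, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residual = y_pred - y_ref              # assumed convention: predicted - reference
    r = np.corrcoef(y_ref, y_pred)[0, 1]   # Pearson correlation coefficient
    rmsep = np.sqrt(np.mean(residual ** 2))
    bias = residual.mean()
    return r, rmsep, bias

r, rmsep, bias = prediction_metrics([1, 2, 3], [1.1, 2.1, 3.1])
```

The same function gives RMSEC when applied to the calibration set instead of the prediction set.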
2.4. Independent component analysis
ICA was originally developed to deal with problems similar to the cocktail-party problem [17]. As an effective approach to blind signal separation, ICA has recently attracted broad attention and has been successfully used in many fields, e.g. medical signal analysis, image processing, dimension reduction, fault detection and near-infrared spectral data analysis [18-23].
ICA is a well-established statistical signal processing technique that aims to decompose a set of multivariate signals into a basis of statistically independent components with minimal loss of information content. The independent components (ICs) are latent variables, meaning that they cannot be directly observed, and they must have non-Gaussian distributions. The basic noise-free ICA model can be written as:

$$x = As \qquad (1)$$

where x denotes the recorded data matrix, and s and A represent the independent components and the coefficient matrix, respectively. The goal of ICA is to find a proper linear representation of non-Gaussian vectors so that the estimated vectors are as independent as possible, and the mixed signals can be expressed as linear combinations of these independent components. The ICs are obtained using high-order statistics, which impose a much stronger condition than orthogonality. This goal is equivalent to finding a separating matrix W that satisfies

$$\hat{s} = Wx \qquad (2)$$

where $\hat{s}$ is the estimate of s. The separating matrix W can be trained as the weight matrix of a two-layer feed-forward neural network in which x is the input and $\hat{s}$ is the output.
There are many algorithms for performing ICA [24-25]. Among these algorithms, the fast fixed-point algorithm (FastICA), which was developed by Hyvärinen and Oja in 2000 [26], is highly efficient for ICA estimation.
FastICA was chosen for ICA and carried out in Matlab 7.0 (The MathWorks, Natick, USA) according to the following steps [21]:
(1) Choose an initial random weight vector w(0) and let k = 1, where w is an l-dimensional weight vector in the weight matrix W and k is the iteration index.
(2) Let $w(k) = E\{x\,g(w(k-1)^T x)\} - E\{g'(w(k-1)^T x)\}\,w(k-1)$, where g is the first derivative of the function G (any nonquadratic function) and $E\{(w^T x)^2\} = 1$.
(3) Let $w(k) = w(k) / \lVert w(k) \rVert$.
(4) If $|w(k)^T w(k-1)|$ is not close enough to 1, let k = k + 1 and go back to step 2. Otherwise, output the vector w(k).
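The four steps above can be sketched for a single weight vector in NumPy as follows. This is not the authors' Matlab code: the choice g = tanh, the whitening step (which enforces the unit-variance constraint in step 2), the tolerance and the synthetic two-source data are all assumptions made for illustration.

```python
import numpy as np

def whiten(x):
    """Centre and whiten x (rows = signals, columns = observations)."""
    x = x - x.mean(axis=1, keepdims=True)
    eigval, eigvec = np.linalg.eigh(np.cov(x))
    return (eigvec / np.sqrt(eigval)) @ eigvec.T @ x

def fastica_one_unit(z, max_iter=200, tol=1e-8, seed=0):
    """Estimate one separating vector w from whitened data z."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(z.shape[0])       # step 1: random initial vector
    w /= np.linalg.norm(w)
    for _ in range(max_iter):
        wx = w @ z
        g = np.tanh(wx)                       # assumed nonlinearity g
        g_prime = 1.0 - g ** 2
        w_new = (z * g).mean(axis=1) - g_prime.mean() * w   # step 2
        w_new /= np.linalg.norm(w_new)        # step 3
        converged = abs(w_new @ w) > 1.0 - tol   # step 4
        w = w_new
        if converged:
            break
    return w

# Two synthetic non-Gaussian sources, linearly mixed (hypothetical data).
rng = np.random.default_rng(1)
sources = np.vstack([rng.uniform(-1, 1, 5000), rng.laplace(size=5000)])
mixed = np.array([[1.0, 0.5], [0.3, 1.0]]) @ sources
z = whiten(mixed)
w = fastica_one_unit(z)
recovered = w @ z
```

Further components would be obtained by repeating the iteration with a decorrelation step against the vectors already found, which is omitted here for brevity.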
2.5. Least squares-support vector machine
LS-SVM can handle linear or non-linear regression and multivariate function estimation relatively quickly [27-29]. It solves a linear set of equations instead of a quadratic programming (QP) problem to obtain the support vectors (SVs). The details of the LS-SVM algorithm can be found in the literature [29-30]. The LS-SVM model can be expressed as:

$$y(x) = \sum_{k=1}^{N} \alpha_k K(x, x_k) + b \qquad (3)$$

where K(x, x_k) is the kernel function, x_k is the k-th input vector, α_k is the Lagrange multiplier (called the support value), and b is the bias term.
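To make Eq. (3) concrete, the sketch below fits an LS-SVM regressor by solving the usual bordered linear system for α and b, and then evaluates the model. It is a NumPy illustration, not the LS-SVM toolbox used by the authors; the RBF scaling exp(-d²/σ²) is an assumption (some implementations use 2σ² in the denominator), and the sine data are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, sig2):
    """RBF kernel matrix between the rows of a and the rows of b."""
    sq = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / sig2)   # assumed scaling convention

def lssvm_fit(x, y, gam, sig2):
    """Solve [[0, 1^T], [1, K + I/gam]] [b; alpha] = [0; y]."""
    n = len(y)
    system = np.zeros((n + 1, n + 1))
    system[0, 1:] = 1.0
    system[1:, 0] = 1.0
    system[1:, 1:] = rbf_kernel(x, x, sig2) + np.eye(n) / gam
    solution = np.linalg.solve(system, np.concatenate(([0.0], y)))
    return solution[1:], solution[0]             # alpha, b

def lssvm_predict(x_train, alpha, b, sig2, x_new):
    """Evaluate y(x) = sum_k alpha_k K(x, x_k) + b."""
    return rbf_kernel(x_new, x_train, sig2) @ alpha + b

x_train = np.linspace(0.0, 2.0 * np.pi, 30)[:, None]
y_train = np.sin(x_train[:, 0])
alpha, b = lssvm_fit(x_train, y_train, gam=1000.0, sig2=1.0)
fitted = lssvm_predict(x_train, alpha, b, 1.0, x_train)
```

Because only one linear system is solved, every training sample receives a non-zero support value, in contrast to the sparse solution of a standard SVM.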
In the model development using LS-SVM with the radial basis function (RBF) kernel, the optimal combination of the gam (γ) and sig2 (σ²) parameters was the one that gave the smallest root mean square error of cross validation (RMSECV). In this study, γ was optimized in the range of $2^{-1}$ to $2^{10}$ and σ² in the range of 2 to $2^{15}$, with adequate increments. These ranges were chosen from previous studies in which the magnitude of the parameters was optimized. The grid search had two steps: a crude search with a large step size, followed by a refined search with a small step size. The free LS-SVM toolbox (LS-SVM v 1.5, Suykens, Leuven, Belgium) was used with MATLAB 7.0 to develop the calibration models.
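The two-step grid search can be sketched generically as below. The exponent ranges follow the text; the step sizes (integer exponents for the crude pass, five points per axis for the refined pass) are assumptions because the manuscript only says "adequate increments". The `toy_rmsecv` function is a hypothetical stand-in for a real cross-validation error.

```python
import itertools
import math

def two_step_grid_search(score, gam_exp=(-1, 10), sig2_exp=(1, 15), fine_points=5):
    """Crude search over integer powers of two, then a finer search near the best cell."""
    def best_on(gam_exps, sig2_exps):
        return min(itertools.product(gam_exps, sig2_exps),
                   key=lambda p: score(2.0 ** p[0], 2.0 ** p[1]))

    def refine(centre, bounds):
        lo, hi = max(centre - 1, bounds[0]), min(centre + 1, bounds[1])
        return [lo + (hi - lo) * i / (fine_points - 1) for i in range(fine_points)]

    # Step 1: crude search with a step of one in the exponent.
    g0, s0 = best_on(range(gam_exp[0], gam_exp[1] + 1),
                     range(sig2_exp[0], sig2_exp[1] + 1))
    # Step 2: refined search around the crude optimum.
    g1, s1 = best_on(refine(g0, gam_exp), refine(s0, sig2_exp))
    return 2.0 ** g1, 2.0 ** s1

def toy_rmsecv(gam, sig2):
    """Hypothetical error surface with its minimum at gam = 2^3.5, sig2 = 2^6.5."""
    return (math.log2(gam) - 3.5) ** 2 + (math.log2(sig2) - 6.5) ** 2

best_gam, best_sig2 = two_step_grid_search(toy_rmsecv)
```

In practice `score` would train an LS-SVM for each (γ, σ²) pair and return its RMSECV.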
LVs obtained from PLS were applied as inputs of the LS-SVM models to improve the training speed and reduce the training error; the resulting model was called the LV-LS-SVM model.
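As an illustration of how LV scores could be extracted for use as LS-SVM inputs, the sketch below implements a basic PLS1 (NIPALS) decomposition in NumPy. It is an assumed implementation, not the software used by the authors; the function name `pls1_fit` and the random test data are hypothetical.

```python
import numpy as np

def pls1_fit(x, y, n_lv):
    """Return (scores, rotation, x_mean) for a PLS1 model with n_lv latent variables."""
    x_mean, y_mean = x.mean(axis=0), y.mean()
    xr, yr = x - x_mean, y - y_mean
    w_all, p_all, t_all = [], [], []
    for _ in range(n_lv):
        w = xr.T @ yr                        # weight: covariance direction with y
        w /= np.linalg.norm(w)
        t = xr @ w                           # score vector (one latent variable)
        p = xr.T @ t / (t @ t)               # x loading
        xr = xr - np.outer(t, p)             # deflate x
        yr = yr - t * (yr @ t) / (t @ t)     # deflate y
        w_all.append(w)
        p_all.append(p)
        t_all.append(t)
    w_mat, p_mat = np.column_stack(w_all), np.column_stack(p_all)
    # The rotation maps centred spectra directly to scores: T = Xc @ rotation.
    rotation = w_mat @ np.linalg.inv(p_mat.T @ w_mat)
    return np.column_stack(t_all), rotation, x_mean

rng = np.random.default_rng(0)
x = rng.standard_normal((30, 50))
y = x[:, 0] + 0.5 * x[:, 1] + 0.1 * rng.standard_normal(30)
scores, rotation, x_mean = pls1_fit(x, y, n_lv=4)
new_scores = (x[:5] - x_mean) @ rotation     # scores for "new" samples
```

The score matrix for the calibration set, and the projected scores for the prediction set, would then replace the full spectra as LS-SVM inputs.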
ICA was applied for the selection of SWs, which could reflect the main features of the raw absorbance spectra. Wavelengths with the highest weights in each IC were selected as the SWs, and the selected SWs were used as the direct input of the LS-SVM model, which was called the SW-LS-SVM model in this case.
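One simple way to turn IC weights into sensitive wavelength bands is sketched below: for each IC, take the wavelength with the largest absolute weight and report a band around it. The fixed half-width of 5 nm and the synthetic weight curves are assumptions; the manuscript does not state how the band limits were chosen.

```python
import numpy as np

def select_sensitive_wavelengths(wavelengths, ic_weights, band_halfwidth=5.0):
    """For each IC (row of ic_weights), return the band around its largest |weight|."""
    bands = []
    for weights in ic_weights:
        peak = wavelengths[np.argmax(np.abs(weights))]
        bands.append((peak - band_halfwidth, peak + band_halfwidth))
    return bands

# Hypothetical weight curves peaking at 555 nm and 710 nm.
wavelengths = np.arange(400.0, 1001.0, 1.0)
ic_weights = np.vstack([
    np.exp(-((wavelengths - 555.0) / 10.0) ** 2),
    -np.exp(-((wavelengths - 710.0) / 10.0) ** 2),
])
bands = select_sensitive_wavelengths(wavelengths, ic_weights)
```

Using the absolute value of the weights lets strongly negative loadings count as sensitive as well.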
3. Results and discussion
3.1. Reflectance spectral investigation
Fig. 1 shows the average reflectance spectra from 400 to 1000 nm for tomato leaves of mutant M1, mutant M2 and their parent. There are distinct differences among the tomato breeds, mostly in the visible region (400-700 nm). The parent differs distinctly from mutant M1 and mutant M2 in the 500-700 nm waveband. At about 560 nm, all three breeds showed reflectance peaks, and that of mutant M1 was higher than those of mutant M2 and the parent. These wavelengths (550-600 nm) are related to the chlorophyll content of tomato leaves. In the near infrared region (700-1000 nm), the reflectance of mutant M1 and mutant M2 was higher than that of the parent, which is correlated with an increase in nitrogen content.
Fig. 2 shows the average spectra from 400 to 1000 nm of the green fruits of the three breeds. There are distinct differences among the tomato breeds, mostly in the visible region (400-700 nm). The parent has a lower reflectance than mutants M1 and M2 across all wavebands. All three breeds showed a similar trend between 500 and 700 nm, with a peak around 550-560 nm and a valley around 680-690 nm. The trends of the spectral curves in Fig. 1 and Fig. 2 are similar: the reflectance of mutant M1 was higher than that of mutant M2 and the parent, which may correspond to a higher chlorophyll or nitrogen content.
Fig. 3 shows the Vis/NIR spectra of the mature fruits of the three breeds. There are distinct differences among the tomato breeds, mostly in the near infrared region (700-1000 nm). For the mature fruits, the parent has a higher reflectance than mutants M1 and M2 across all wavebands. All three breeds showed a similar trend between 650 and 850 nm, with peaks around 710-725 nm and 810-820 nm. Taken together, Fig. 2 and Fig. 3 indicate that the maturity of the tomato fruit was relatively stable after spaceflight mutagenesis.
3.2. PLS models
The PLS model was developed using the spectral data preprocessed by Savitzky-Golay smoothing and MSC, and calibration models were built between the spectra and the breeds of the tomato leaf samples. The visible region (400-700 nm) and the near infrared region (700-1000 nm) were modeled separately to establish two models. Different numbers of LVs were applied to build the calibration models, and no outliers were detected in the calibration set during the development of the PLS models. The results for the calibration and validation sets in the two regions are shown in Table 1. The model built with the visible region turned out to be the best for predicting tomato breeds from leaf samples. For the prediction set of 45 unknown samples, the rp, RMSEP and bias of the visible model were 0.9412, 0.3998 and 0.1037, respectively.
Similar PLS models were built between the spectra and the green and mature fruit samples. Table 2 shows the prediction results in the two regions for both. The models built with the visible region were the best for predicting tomato breed from green fruits, and those built with the near infrared region were the best for mature fruits.
3.3. LV-LS-SVM models
Based on the performance of the PLS models, LVs from the visible region were used as new eigenvectors to enhance the spectral features and reduce the dimensionality of the spectral data matrix for the leaf and green fruit analysis, and LVs from the near infrared region were used for the mature fruit analysis. Several LVs were extracted from the spectra of 150 samples (leaf and fruit). Table 3 shows the explained variance of Y (tomato breeds) for the first four to nine LVs for leaves and fruits. For leaf samples, the first four LVs explained more than 90% of the total variance, and the eighth LV explained only an additional 0.892%, contributing less than the preceding LVs. For green fruits, 5-9 LVs were necessary, and for mature fruits 4-9 LVs.
Before the LS-SVM calibration model was built, three choices were crucial: the optimal input feature subset, a proper kernel function and the optimal kernel parameters. Firstly, the LVs (4-8 for leaves, 5-9 for green fruits, and 4-9 for mature fruits) obtained from the PLS analysis were used as the input data set. Secondly, the RBF kernel was chosen because it can handle the nonlinear relationships between the spectra and the target attributes. Finally, the two important parameters gam (γ) and sig2 (σ²) of the RBF kernel function were optimized as described in Section 2.5.
The performance of these models was evaluated with the 45 samples in the prediction set for leaves, green fruits and mature fruits. Comparing the results for the calibration and prediction sets, the best performance was achieved with six LVs for leaf samples, seven LVs for green fruits, and six LVs for mature fruits. The rp, RMSEP and bias for the prediction sets were 0.9654, 0.3270 and 0.1139 for leaves, 0.9742, 0.2603 and 0.00419 for green fruits, and 0.9718, 0.2711 and 0.0247 for mature fruits, respectively. The results for the calibration and prediction sets showed that the LS-SVM models outperformed the PLS models.
3.4. SW-LS-SVM models
Wavelengths with the highest weights in each of the first four ICs were selected as the SWs: 550-560 nm, 562-574 nm, 670-680 nm and 705-715 nm for leaf samples; 548-556 nm, 559-564 nm, 678-685 nm and 962-974 nm for green fruits; and 712-718 nm, 720-729 nm, 968-978 nm and 820-830 nm for mature fruits. To evaluate the performance of the SWs, they were used as the input data matrix to develop the SW-LS-SVM models. The prediction results for rp, RMSEP and bias were 0.9792, 0.2632 and 0.0901 for leaf samples, 0.9837, 0.2783 and 0.1758 for green fruits, and 0.9804, 0.2215 and -0.0035 for mature fruits, respectively. Fig. 4 shows the predicted versus reference plots; Fig. 4(a), (b) and (c) correspond to the results for leaves, green fruits and mature fruits, respectively. The solid line is the regression line corresponding to the correlation between the predicted and reference values. The SW-LS-SVM models achieved better performance than the best LV-LS-SVM models in both the calibration and prediction sets. The wavelengths around 550-580 nm correspond to the green peak, and 670-720 nm matches the red edge. The wavelengths at 700-730 nm, 820-830 nm and 960-980 nm possibly result from the third overtone of C-H stretching and the second and third overtones of O-H in tomatoes, as reported by Rodriguez-Saona et al. (2001) in their article on the rapid analysis of sugars in fruit juices by FT-NIR spectroscopy [31]. In our study, the reflectance spectral investigation (Section 3.1), combined with the SWs for tomato leaves and green fruits, indicates that mutant M1 and mutant M2 have a higher chlorophyll content than the parent. Comparing the spectral curves of the tomato fruits, we find that the fruits have a higher chlorophyll content after spaceflight mutagenesis during the green stage, while the sugar content is lower when they become mature. Therefore, the selection of SWs was suitable for this situation, and the effectiveness of the SWs was also validated. The SWs represented most of the features of the original spectra and could replace the whole wavelength region to predict different tomato breeds from leaves or fruits. Furthermore, the SWs might be important for the development of portable instruments and online monitoring for commercial applications involving tomato breeds obtained by spaceflight breeding.
3.5. Analysis of the results
Among the PLS models above, those built with the visible region (400-700 nm) turned out to be the best for leaf and green fruit samples, and those built with the near infrared region (700-1000 nm) turned out to be the best for mature fruit analysis. According to the reflectance spectral investigation, spaceflight mutagenesis mainly changed the chlorophyll content of the leaves and green fruits, and the sugar content of the mature fruits. The results showed that for tomato leaves and green fruits, mutant M1 and mutant M2 have a higher chlorophyll content than the parent. When the fruits become mature, mutant M1 and mutant M2 have a lower sugar content than the parent.
The SW-LS-SVM models performed better than the PLS models. The reason might be that the LS-SVM models took the nonlinear information in the spectral data into consideration, which improved the prediction precision. ICs from ICA are obtained using high-order statistics, which impose a much stronger condition than orthogonality, so the SWs selected from the ICs were more effective. This could be very helpful for the development of portable instruments or real-time monitoring for tomato breed discrimination.
4. Conclusions
The determination of tomato breeds obtained by spaceflight breeding could be successfully performed through Vis/NIR spectroscopy combined with the chemometric methods of PLS and SW-LS-SVM. Among the PLS models, those built with the visible region (400-700 nm) turned out to be the best for leaf and green fruit samples, with rp, RMSEP and bias of 0.9412, 0.3998 and 0.1037 for leaves, and 0.9687, 0.2841 and 0.0503 for green fruits, respectively. The models built with the near infrared region (700-1000 nm) turned out to be the best for mature fruits, with rp, RMSEP and bias of 0.9520, 0.3710 and -0.0206 for the prediction set. SWs selected from the ICs were used as the input data matrix of the SW-LS-SVM models, and a two-step grid search technique was used to find the optimal RBF kernel parameters (γ, σ²). The SW-LS-SVM models achieved the best prediction performance: the rp, RMSEP and bias for the prediction set were 0.9792, 0.2632 and 0.0901 for leaf samples, 0.9837, 0.2783 and 0.1758 for green fruits, and 0.9804, 0.2215 and -0.0035 for mature fruits, respectively, which were better than those of the PLS models. The overall results indicated that Vis/NIR spectroscopy had the capability to determine the breeds of tomatoes obtained by spaceflight breeding. ICA was a powerful way to select sensitive wavelengths, and Vis/NIR spectroscopy combined with LS-SVM models had a powerful capability to discriminate tomato breeds. Further work on input data selection, parameter optimization and interpretation of the results is needed to improve the generalization and stability of the calibration.
Acknowledgement
The research presented in this paper was partially supported by the Natural Science Foundation of Zhejiang Province, China (Q14C130002), the Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry, and the Fundamental Research Funds for the Central Universities (2013QNA6011).
References
[1] Y.L. Wang, Q. Yang, D. Yang, A.M. Yang, Chinese Journal of Light Scattering (in Chinese) 17 (2006) 412-415.
[2] X.H. Guo, Y.Y. Zhu, Y. Guan, Chinese J. Spectrosc. Lab. 27 (2010) 2311-2313.
[3] J.H. Shi, Z.L. Chen, Y.N. Shao, Y. He, P. Feng, J.J. Zhu, Spectrosc. Spect. Anal. 31 (2011) 387-389.
[4] B.G. Osborne, T. Fearn, P.H. Hindle, Practical NIR Spectroscopy with Applications in Food and Beverage Analysis, 2nd Ed., Longman Scientific and Technical, Harlow, Essex, U.K., 1993.
[5] Y.N. Shao, Y. He, A.H. Gomez, A.G. Pereir, Z.J. Qiu, Y. Zhang, J. Food Eng. 81 (2007) 672-678.
[6] H.R. Xu, Y.B. Ying, X.P. Fu, S.P. Zhu, Biosyst. Eng. 96 (2007) 447-454.
[7] A.M.K. Pedro, M.M.C. Ferreira, Anal. Chem. 77 (2005) 2505-2511.
[8] M.M. Kamil, G.F. Mohamed, M.S. Shaheen, J. Am. Sci. 7 (2011) 559-572.
[9] P. Sirisomboon, M. Tanaka, T. Kojima, P. Williams, J. Food Eng. 112 (2012) 218-226.
[10] J.J. Workman Jr., P.R. Mobley, B.R. Kowalski, R. Bro, Appl. Spectrosc. Rev. 31 (1996) 73-124.
[11] J.A.K. Suykens, J. Vandewalle, Neural Process. Lett. 9 (1999) 293-300.
[12] J.A.K. Suykens, T. Van Gestel, J. De Brabanter, B. De Moor, J. Vandewalle, Least Squares Support Vector Machines, World Scientific, Singapore, 2002.
[13] Y.N. Shao, Y. He, C.Q. Wu, J. Agric. Food Chem. 56 (2008) 3960-3965.
[14] A. Savitzky, M.J.E. Golay, Anal. Chem. 36 (1964) 1627-1639.
[15] P.A. Gorry, Anal. Chem. 62 (1990) 570-573.
[16] I.S. Helland, T. Naes, T. Isaksson, Chemom. Intell. Lab. Syst. 29 (1995) 233-241.
[17] S. Amari, A. Cichocki, H.H. Yang, Adv. Neural Inf. Process. Syst. 8 (1996) 757-763.
[18] A. Hyvärinen, Neural Comput. 11 (1999) 1739-1768.
[19] P.O. Hoyer, A. Hyvärinen, Netw.-Comput. Neural Syst. 11 (2000) 191-210.
[20] A. Hyvärinen, P.O. Hoyer, Neural Comput. 12 (2000) 1705-1720.
[21] J. Chen, X.Z. Wang, J. Chem. Inf. Comput. Sci. 41 (2001) 992-1001.
[22] X. Bi, T.H. Li, L. Wu, Chem. J. Chinese Universities 25 (2004) 1023-1027.
[23] X.G. Shao, G.Q. Wang, S.F. Wang, Q.D. Su, Anal. Chem. 76 (2004) 5143-5148.
[24] A. Hyvärinen, J. Karhunen, E. Oja, Independent Component Analysis, Wiley, New York, 2001.
[25] T.W. Lee, Independent Component Analysis: Theory and Application, Kluwer, Boston, MA, 1998.
[26] A. Hyvärinen, E. Oja, Neural Netw. 13 (2000) 411-430.
[27] J.A.K. Suykens, J. Vandewalle, Neural Process. Lett. 9 (1999) 293-300.
[28] A. Borin, M.F. Ferrao, C. Mello, D.A. Maretto, R.J. Poppi, Anal. Chim. Acta 579 (2006) 25-32.
[29] Q.S. Chen, J.W. Zhao, C.H. Fang, D.M. Wang, Spectroc. Acta Pt. A-Molec. Biomolec. Spectr. 66 (2007) 568-574.
[30] H. Guo, H.P. Liu, L. Wang, Journal of System Simulation (in Chinese) 18 (2006) 2033-2036, 2051.
[31] L.E. Rodriguez-Saona, F.S. Fry, M.A. McLaughlin, E.M. Calvey, Carbohydr. Res. 336 (2001) 63-74.
Figure captions
Fig. 1. The average reflectance spectral curves from 400-1000 nm for tomato leaves of mutant M1, mutant M2, and the parent.
Fig. 2. The average reflectance spectral curves from 400-1000 nm for green fruits of mutant M1, mutant M2, and the parent.
Fig. 3. The average reflectance spectral curves from 400-1000 nm for mature fruits of mutant M1, mutant M2, and the parent.
Fig. 4. (a, b, c) The predicted versus reference charts by SW-LS-SVM.
Table 1. The results of calibration and validation sets in two regions for tomato leaf

Data set      Region (nm)   Correlation coefficient   RMSEC (RMSECV)   Bias
Calibration   400-700       0.9941                    0.1211           4.015e-06
Calibration   700-1000      0.9818                    0.2123           -4.095e-06
Validation    400-700       0.9785                    0.2352           0.0080
Validation    700-1000      0.9488                    0.3532           -0.0050

Table 2. The prediction sets in two regions for both green and mature fruit

Sample         Region (nm)   Correlation coefficient (rp)   RMSEP    Bias
Green fruit    400-700       0.9687                         0.2841   0.0503
Green fruit    700-1000      0.9283                         0.4227   0.0521
Mature fruit   400-700       0.8433                         0.6030   0.0492
Mature fruit   700-1000      0.9520                         0.3710   -0.0206

Table 3. The explained variance of Y of the first four to nine LVs for both leaf and fruit

LVs (a)                  4        5        6        7        8        9
Leaf sample, EV (b) (%)  90.412   93.568   95.339   96.984   97.876   -
Green fruit, EV (%)      -        90.159   93.514   95.631   97.162   97.804
Mature fruit, EV (%)     90.528   93.591   95.841   97.024   98.034   98.519

(a) LV: latent variable; (b) EV: explained variance.
Highlights
Vis/NIR spectroscopy was used to discriminate tomato breeds obtained by spaceflight breeding from their leaves or fruits.
Tomato breeds were divided into mutant M1, mutant M2 and their parent.
The SW-LS-SVM models were better than the PLS and LV-LS-SVM models at predicting the tomato breeds.