S1386-1425(15)00028-1 http://dx.doi.org/10.1016/j.saa.2015.01.018 SAA 13186

To appear in:

Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy

Received Date: 13 October 2014

Revised Date: 5 January 2015

Accepted Date: 11 January 2015

Please cite this article as: Y. Shao, C. Xie, L. Jiang, J. Shi, J. Zhu, Y. He, Discrimination of tomatoes bred by spaceflight mutagenesis using visible/near infrared spectroscopy and chemometrics, Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy (2015), doi: http://dx.doi.org/10.1016/j.saa.2015.01.018

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Discrimination of tomatoes bred by spaceflight mutagenesis using visible/near infrared spectroscopy and chemometrics

Yongni Shao a, Chuanqi Xie a, Linjun Jiang a, Jiahui Shi b, Jiajin Zhu a, Yong He a,*

a College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou 310058, China

b Zhejiang Sports Science Research Institute, Hangzhou, P.R. China

* Corresponding author: Yong He, Tel: +86-571-88982143, Fax: +86-571-88982143, Email: [email protected]

ABSTRACT

Visible/near infrared (Vis/NIR) spectroscopy based on sensitive wavelengths (SWs) and chemometrics was proposed to discriminate tomatoes bred by spaceflight mutagenesis from their leaves or fruits (green or mature). The tomato breeds were mutant M1, mutant M2 and their parent. Partial least squares (PLS) analysis and least squares-support vector machine (LS-SVM) were used to build calibration models. PLS models were built separately for the visible region (400-700 nm) and the near infrared region (700-1000 nm). The best PLS models were achieved in the visible region for the leaf and green fruit samples and in the near infrared region for the mature fruit samples. Furthermore, different numbers of latent variables (4-8 LVs for leaves, 5-9 LVs for green fruits, and 4-9 LVs for mature fruits) were used as inputs of LS-SVM to develop LV-LS-SVM models with a grid search technique and a radial basis function (RBF) kernel. The optimal LV-LS-SVM models were achieved with six LVs for the leaf samples, seven LVs for green fruits, and six LVs for mature fruits, and they outperformed the PLS models. Moreover, independent component analysis (ICA) was used to select several SWs based on loading weights. The optimal LS-SVM models were achieved with SWs of 550-560 nm, 562-574 nm, 670-680 nm and 705-715 nm for the leaf samples; 548-556 nm, 559-564 nm, 678-685 nm and 962-974 nm for the green fruit samples; and 712-718 nm, 720-729 nm, 968-978 nm and 820-830 nm for the mature fruit samples. All of them performed better than PLS and LV-LS-SVM, with correlation coefficient (rp), root mean square error of prediction (RMSEP) and bias of 0.9792, 0.2632 and 0.0901 for leaf discrimination, 0.9837, 0.2783 and 0.1758 for green fruit discrimination, and 0.9804, 0.2215 and -0.0035 for mature fruit discrimination, respectively. The overall results indicated that ICA was an effective way to select SWs, and that Vis/NIR spectroscopy combined with LS-SVM models was able to predict the different breeds (mutant M1, mutant M2 and their parent) of tomatoes from leaves and fruits.

Keywords: Vis/near infrared spectroscopy; tomato; independent component analysis; partial least squares analysis; least squares-support vector machine

1. Introduction

Over the past 40 years, breeding by spaceflight mutagenesis has become a new trend in high-technology agriculture. China is one of the three countries (China, USA and Russia) that have mastered spaceflight return technology. Since 1987, China has successfully conducted spaceflight tests carrying plant seeds. Through the efforts of researchers, excellent breeds have been obtained, demonstrating the clear advantages of this technique. Spaceflight mutagenesis is a new breeding technology that combines spaceflight, biology and agricultural breeding. Tomato is a vegetable that is also regarded as a fruit. It is rich in nutrients such as vitamin C, organic acids and minerals, and especially in lycopene, and it is very popular with consumers because of its nutritional value. With the improvement of living standards, the breeding and quality of tomato have attracted the attention of many scientists. Spaceflight mutagenesis technology has been applied to the genetic improvement of a variety of crops, with the aim of improving them with high efficiency. At present, research on the mechanism of spaceflight mutagenesis breeding is still at a preliminary stage.

Wang et al. analyzed tomato seeds bred by spaceflight mutagenesis using FT-IR. The results showed that the space environment could affect the C-O stretching vibration in the structure of carbohydrates in the space tomato seeds [1]. Guo et al. used X-ray fluorescence analysis to measure and analyze the elements of fourth-generation radix isatidis bred by spaceflight mutagenesis [2]. In our earlier study, we used visible/near infrared spectroscopy (Vis/NIRS) combined with chemometrics to discriminate breeds of tomato bred by spaceflight mutagenesis from the green fruits [3]. In this study, we extend our samples to tomato leaves and fruits (green or mature), use Vis/NIR spectroscopy to find the characteristic spectral variations, and compare the discrimination capacity among the sample types.

Near infrared spectroscopy (NIR), utilizing the spectral range from 780 to 2500 nm, has served for the past 30 years as a method to predict the quality of different foods and agricultural products because of its rapid analysis, minimal sample preparation and low cost [4]. In our previous study, we used Vis/NIRS to analyze the physiological properties of tomatoes, including firmness, soluble solids content and pH [5]. Xu et al. used near infrared spectroscopy to detect minor damage to tomato leaves, and showed that the sensitive bands of 1450 and 1900 nm modeled with severity level provided the highest correlation coefficient [6]. Pedro and Ferreira adopted near infrared spectroscopy to estimate solids and carotenoids in tomatoes, and the best models gave RMSEP and r values of 0.4157 and 0.9998 for total solids, 0.6333 and 0.9996 for soluble solids, 21.5779 and 0.9996 for lycopene, and 0.7296 and 0.9981 for β-carotene [7]. Kamil et al. also evaluated tomato products (lycopene, β-carotene, starch, allura red pigment and paprika) using Fourier transform infrared spectroscopy [8]. Sirisomboon et al. used near infrared spectroscopy to classify the maturity level and to predict the textural properties of the tomato variety “Momotaro”. For maturity classification, an accuracy of 100.00% was obtained for red and pink tomatoes, and 96.85% for mature green tomatoes [9].

Various calibration methods have been used to relate near-infrared spectra to measured properties of materials. Principal component regression (PCR), partial least squares (PLS), multiple linear regression (MLR) and artificial neural networks (ANN) are the most widely used multivariate calibration techniques for NIRS [10]. PLS has been applied in a large number of food analysis applications and is widely used in multivariate calibration because it takes advantage of the correlation that already exists between the spectral data and the constituent concentrations. However, PLS is based on linear models, and unsatisfactory results may be obtained when non-linearity is present.

Least squares-support vector machine (LS-SVM) can handle both linear and nonlinear relationships between the spectra and the response chemical constituents [11-12]. Therefore, a combination of independent component analysis (ICA) with LS-SVM was proposed as a nonlinear calibration model for quantitative analysis using spectroscopic techniques. In our earlier study, we used this method to analyze rice irradiated with different doses [13].

In this study, we use Vis/NIR spectroscopy to find the characteristic spectral variations of tomatoes bred by spaceflight mutagenesis. The objectives of this paper were (1) to study the feasibility of using Vis/NIR spectroscopy to predict the different breeds (mutant M1, mutant M2 and their parent) of spaceflight tomatoes from their leaves or fruits (green or mature); (2) to compare the prediction precision of LS-SVM models using different numbers of latent variables (4-9 LVs); and (3) to select the optimal sensitive wavelengths (SWs) for the development of portable instruments and online monitoring for commercial discrimination of different breeds of tomatoes.

2. Experiment

2.1. Vis/NIR analysis

Spectra were collected using a Vis/NIR scanning spectroradiometer (ASD Handheld FieldSpec, Boulder, USA) in reflectance mode. Measurements were made at ambient temperature (18-20 °C) over the wavelength range of 325-1075 nm at intervals of 1.5 nm. Reflectance measurements were taken on the leaves, green fruits and mature fruits of each tomato plant. A Lowell pro-lamp interior light source (Assembly/128930) with a Lowell pro-lamp 14.5 V bulb (128690, tungsten halogen, made in China), which can be used in both the visible and near infrared regions, was placed at a distance of 300 mm from the sample surface. For the leaf measurements, the reflectance spectra were taken at the center of each leaf, and the scan number for each spectrum was set to 30. For the fruit measurements, the reflectance spectra were taken around the equator of each fruit, and the scan number for each spectrum was also set to 30. The signals were preprocessed using ViewSpec Pro V2.14 (Analytical Spectral Devices, Inc., Boulder, CO 80301). Because of scattering noise in the collection system, only spectral data in the range of 400-1000 nm were used. In the subsequent regression, the wavelengths were divided into the visible region (400-700 nm) and the near infrared region (700-1000 nm). For the tomato leaf trial, a total of 150 samples were prepared from mutant M1, mutant M2 and their parent, with 50 samples for each breed. When building the models, they were randomly divided into a calibration set of 105 samples (35 samples for each breed) and a prediction set of 45 samples (15 samples for each breed). The same sample numbers and testing scheme were applied to the green and mature fruits.
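The random division into calibration and prediction sets can be sketched as follows. This is only an illustration under assumptions: the function name `split_per_breed`, the random seed and the label strings are hypothetical, and the text does not say how the randomization was actually carried out.

```python
import numpy as np

def split_per_breed(labels, n_calibration=35, seed=0):
    """Randomly split sample indices so each breed contributes
    n_calibration samples to calibration and the rest to prediction."""
    rng = np.random.default_rng(seed)  # seed is an assumption, for repeatability
    labels = np.asarray(labels)
    calibration, prediction = [], []
    for breed in np.unique(labels):
        shuffled = rng.permutation(np.flatnonzero(labels == breed))
        calibration.extend(shuffled[:n_calibration])
        prediction.extend(shuffled[n_calibration:])
    return np.sort(calibration), np.sort(prediction)

# 50 samples per breed, as described in the text.
labels = np.repeat(["M1", "M2", "parent"], 50)
cal_idx, pred_idx = split_per_breed(labels)
```

Splitting within each breed, rather than over all 150 samples at once, is what keeps the 35/15 ratio identical for M1, M2 and the parent.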

2.2. Spectral preprocessing

The spectral data were preprocessed before the calibration stage. Savitzky-Golay smoothing with a window width of 7 points (3-1-3) was used to reduce noise [14-15]. Multiplicative scatter correction (MSC) was used to correct additive and multiplicative effects in the spectra [16].
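The two preprocessing steps can be sketched in NumPy as below. Several details are assumptions rather than facts from the text: the polynomial order of the Savitzky-Golay filter (2 here), the edge padding, the use of the mean spectrum as the MSC reference, and all function names.

```python
import numpy as np

def savgol_coefficients(half_width=3, poly_order=2):
    """Least-squares smoothing weights for a centred window of
    2*half_width + 1 points (7 points for the 3-1-3 window)."""
    offsets = np.arange(-half_width, half_width + 1)
    design = np.vander(offsets, poly_order + 1, increasing=True)
    # Row 0 of the pseudo-inverse returns the fitted value at the window centre.
    return np.linalg.pinv(design)[0]

def savgol_smooth(spectra, half_width=3, poly_order=2):
    """Smooth each row; edges are padded by repeating end values (assumption)."""
    weights = savgol_coefficients(half_width, poly_order)
    padded = np.pad(spectra, ((0, 0), (half_width, half_width)), mode="edge")
    return np.array([np.convolve(row, weights[::-1], mode="valid") for row in padded])

def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum."""
    reference = spectra.mean(axis=0)
    corrected = np.empty_like(spectra, dtype=float)
    for i, row in enumerate(spectra):
        # Fit row = intercept + slope * reference, then undo both effects.
        slope, intercept = np.polyfit(reference, row, 1)
        corrected[i] = (row - intercept) / slope
    return corrected
```

With a quadratic polynomial, the 7-point weights reduce to the tabulated values (-2, 3, 6, 7, 6, 3, -2)/21.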

2.3. Partial least squares analysis

In the development of the PLS models, calibration models were built between the spectra and the breeds of the tomato leaves or fruits. Full cross-validation was used to evaluate the quality of the calibration models and to prevent overfitting. The optimal number of LVs was determined by the lowest value of the predicted residual error sum of squares (PRESS). The prediction performance was evaluated by the correlation coefficient (r) and the root mean square error of calibration (RMSEC) or prediction (RMSEP). An ideal model should have a high r value and low RMSEC and RMSEP values. The prediction set was used to evaluate the accuracy of the models in classifying tomato breeds from their leaves or fruits.
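The evaluation statistics named above can be computed as in the following sketch. The numeric coding of the breeds (1, 2, 3 in the example) and the sign convention for bias (predicted minus reference) are assumptions, since the text does not state them; `prediction_metrics` is a hypothetical helper name.

```python
import numpy as np

def prediction_metrics(reference, predicted):
    """Return r, RMSE, bias and PRESS for reference vs. predicted values."""
    reference = np.asarray(reference, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    residual = predicted - reference           # sign convention is an assumption
    return {
        "r": np.corrcoef(reference, predicted)[0, 1],
        "rmse": np.sqrt(np.mean(residual ** 2)),  # RMSEC or RMSEP, depending on the set
        "bias": residual.mean(),
        "press": np.sum(residual ** 2),
    }

# Hypothetical breed codes 1, 2, 3 with a constant over-prediction of 0.1.
reference = np.array([1, 2, 3, 1, 2, 3], dtype=float)
metrics = prediction_metrics(reference, reference + 0.1)
```

The same function gives RMSEC when applied to the calibration set and RMSEP when applied to the prediction set.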

2.4. Independent component analysis

ICA was originally developed to deal with problems similar to the cocktail-party problem [17]. As an effective approach to blind signal separation, ICA has recently attracted broad attention and has been successfully used in many fields, e.g. medical signal analysis, image processing, dimension reduction, fault detection and near-infrared spectral data analysis [18-23].

ICA is a well-established statistical signal processing technique that aims to decompose a set of multivariate signals into a basis of statistically independent components with minimal loss of information. The independent components (ICs) are latent variables, meaning that they cannot be directly observed, and they must have non-Gaussian distributions. The noise-free ICA model can be written as:

$$x = As \tag{1}$$

where $x$ denotes the recorded data matrix, and $s$ and $A$ represent the independent components and the coefficient matrix, respectively. The goal of ICA is to find a proper linear representation of non-Gaussian vectors so that the estimated vectors are as independent as possible, and the mixed signals can be expressed as linear combinations of these independent components. The ICs are obtained using higher-order statistics, and independence is a much stronger condition than orthogonality. This goal is equivalent to finding a separating matrix $W$ that satisfies

$$\hat{s} = Wx \tag{2}$$

where $\hat{s}$ is the estimate of $s$.

The separating matrix $W$ can be trained as the weight matrix of a two-layer feed-forward neural network in which $x$ is the input and $\hat{s}$ is the output.

There are many algorithms for performing ICA [24-25]. Among them, the fast fixed-point algorithm (FastICA), described by Hyvärinen and Oja [26], is highly efficient for estimating the ICs.

FastICA was chosen for ICA and carried out in Matlab 7.0 (The MathWorks, Natick, USA) according to the following steps [21]:

(1) Choose an initial random weight vector $w(0)$ and let $k = 1$, where $w$ is an $l$-dimensional weight vector in the weight matrix $W$ and $k$ is the iteration counter.

(2) Let $w(k) = E\{x\,g(w(k-1)^T x)\} - E\{g'(w(k-1)^T x)\}\,w(k-1)$, where $g$ is the first derivative of the function $G$ (any nonquadratic function), and $E\{(w^T x)^2\} = 1$.

(3) Let $w(k) = w(k) / \|w(k)\|$.

(4) If $|w(k)^T w(k-1)|$ is not close enough to 1, let $k = k + 1$ and go back to step (2). Otherwise, output the vector $w(k)$.
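Steps (1) to (4) can be sketched for a single weight vector as follows. This is not the Matlab code used in the study: the choice g = tanh, the whitening step (which enforces the unit-variance constraint), the tolerance, the iteration cap and the function names are all assumptions made for illustration.

```python
import numpy as np

def whiten(x):
    """Centre x (variables x samples) and transform it to identity covariance."""
    x = x - x.mean(axis=1, keepdims=True)
    eigval, eigvec = np.linalg.eigh(np.cov(x))
    return (eigvec / np.sqrt(eigval)).T @ x

def fastica_one_unit(x_white, max_iter=200, tol=1e-8, seed=0):
    """One-unit fixed-point iteration with g = tanh (an assumed nonlinearity)."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(x_white.shape[0])        # step (1): random start
    w /= np.linalg.norm(w)
    for _ in range(max_iter):
        projection = w @ x_white
        g = np.tanh(projection)
        g_prime = 1.0 - g ** 2
        # step (2): w(k) = E{x g(w^T x)} - E{g'(w^T x)} w(k-1)
        w_new = (x_white * g).mean(axis=1) - g_prime.mean() * w
        w_new /= np.linalg.norm(w_new)               # step (3): normalise
        converged = abs(w_new @ w) > 1.0 - tol       # step (4): |w(k)^T w(k-1)| near 1
        w = w_new
        if converged:
            break
    return w

# Two synthetic non-Gaussian sources mixed linearly, x = As.
rng = np.random.default_rng(1)
sources = np.vstack([rng.uniform(-1, 1, 5000), rng.laplace(size=5000)])
mixed = np.array([[1.0, 0.5], [0.3, 1.0]]) @ sources
x_white = whiten(mixed)
w = fastica_one_unit(x_white)
estimate = w @ x_white   # one estimated independent component
```

Further components would be extracted by repeating the iteration with each new vector decorrelated from the ones already found, which is omitted here for brevity.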

2.5. Least squares-support vector machine

LS-SVM can handle linear or non-linear regression and multivariate function estimation relatively quickly [27-29]. It solves a set of linear equations instead of a quadratic programming (QP) problem to obtain the support vectors (SVs). Details of the LS-SVM algorithm can be found in the literature [29-30]. The LS-SVM model can be expressed as:

$$y(x) = \sum_{k=1}^{N} \alpha_k K(x, x_k) + b \tag{3}$$

where $K(x, x_k)$ is the kernel function, $x_k$ is the input vector, $\alpha_k$ is the Lagrange multiplier (called the support value), and $b$ is the bias term.
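A minimal NumPy sketch of LS-SVM function estimation in the form of Eq. (3) is given below. It solves the standard LS-SVM linear system rather than calling the LS-SVM toolbox; the RBF scaling exp(-d²/(2σ²)) is an assumption and may differ from the toolbox's convention, and the function names are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, sig2):
    """RBF kernel matrix; the 2*sig2 scaling is an assumed convention."""
    sq_dist = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sig2))

def lssvm_fit(x, y, gam, sig2):
    """Solve [[0, 1^T], [1, K + I/gam]] [b; alpha] = [0; y] for alpha and b."""
    n = len(y)
    system = np.zeros((n + 1, n + 1))
    system[0, 1:] = 1.0
    system[1:, 0] = 1.0
    system[1:, 1:] = rbf_kernel(x, x, sig2) + np.eye(n) / gam
    solution = np.linalg.solve(system, np.concatenate(([0.0], y)))
    return solution[1:], solution[0]      # support values alpha_k, bias b

def lssvm_predict(x_new, x_train, alpha, b, sig2):
    """Eq. (3): y(x) = sum_k alpha_k K(x, x_k) + b."""
    return rbf_kernel(x_new, x_train, sig2) @ alpha + b

# Toy one-dimensional example (synthetic data, not from the study).
x_train = np.linspace(0.0, 1.0, 30)[:, None]
y_train = np.sin(2.0 * np.pi * x_train[:, 0])
alpha, b = lssvm_fit(x_train, y_train, gam=100.0, sig2=0.05)
fitted = lssvm_predict(x_train, x_train, alpha, b, sig2=0.05)
```

Because training reduces to one linear solve, every training sample receives a support value, which is the practical difference from the QP-based SVM mentioned above.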

In the model development using LS-SVM with the radial basis function (RBF) kernel, the optimal combination of the parameters gam (γ) and sig2 (σ²) was the one giving the smallest root mean square error of cross validation (RMSECV). In this study, γ was optimized in the range of $2^{-1}$ to $2^{10}$ and σ² in the range of $2$ to $2^{15}$, with adequate increments. These ranges were chosen from previous studies in which the magnitude of the parameters was optimized. The grid search had two steps: a crude search with a large step size, followed by a refined search with a small step size. The free LS-SVM toolbox (LS-SVM v 1.5, Suykens, Leuven, Belgium) was used with MATLAB 7.0 to develop the calibration models.
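The two-step grid search can be sketched generically as below. The step sizes are assumptions (integer powers of two for the crude pass, nine points within one crude step for the refined pass), because the text only says "adequate increments"; `objective` stands for any function returning RMSECV for a given (γ, σ²) pair, and the toy objective is purely illustrative.

```python
import numpy as np

def two_step_grid_search(objective, gam_exp=(-1, 10), sig2_exp=(1, 15), fine_points=9):
    """Minimise objective(gam, sig2) over powers of two: crude, then refined."""
    def best_on_grid(g_values, s_values):
        return min((objective(2.0 ** g, 2.0 ** s), g, s)
                   for g in g_values for s in s_values)

    # Step 1: crude search on integer exponents across the full ranges.
    _, g0, s0 = best_on_grid(np.arange(gam_exp[0], gam_exp[1] + 1),
                             np.arange(sig2_exp[0], sig2_exp[1] + 1))
    # Step 2: refined search within one crude step of the crude optimum.
    g_fine = np.clip(np.linspace(g0 - 1, g0 + 1, fine_points), *gam_exp)
    s_fine = np.clip(np.linspace(s0 - 1, s0 + 1, fine_points), *sig2_exp)
    score, g1, s1 = best_on_grid(g_fine, s_fine)
    return 2.0 ** g1, 2.0 ** s1, score

def toy_objective(gam, sig2):
    """Stand-in for RMSECV with a known minimum at gam=2^3.25, sig2=2^7.5."""
    return (np.log2(gam) - 3.25) ** 2 + (np.log2(sig2) - 7.5) ** 2

best_gam, best_sig2, best_score = two_step_grid_search(toy_objective)
```

Searching in the exponent rather than in the raw parameter value keeps the grid evenly spaced across several orders of magnitude.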

LVs obtained from PLS were used as inputs of the LS-SVM models to improve the training speed and reduce the training error; the resulting model is called the LV-LS-SVM model.

ICA was applied for the selection of SWs, which reflect the main features of the raw absorbance spectra. The wavelengths with the highest weights in each IC were selected as the SWs, and the selected SWs were used as the direct input of the LS-SVM model, which is called the SW-LS-SVM model in this case.
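Selecting the wavelengths with the highest weights in each IC can be sketched as follows. The use of absolute weights, the number of wavelengths kept per IC and the function name are assumptions; the example weights are synthetic and are not the study's ICs.

```python
import numpy as np

def select_sensitive_wavelengths(wavelengths, ic_weights, n_per_ic=1):
    """For each IC (one row of ic_weights), return the n_per_ic wavelengths
    with the largest absolute weight, sorted by wavelength."""
    selected = []
    for weights in ic_weights:
        top = np.argsort(np.abs(weights))[::-1][:n_per_ic]
        selected.append(np.sort(wavelengths[top]))
    return selected

# Synthetic example: two ICs with weight peaks near 555 nm and 710 nm.
wavelengths = np.arange(400.0, 1000.0, 1.5)
ic_weights = np.vstack([
    np.exp(-((wavelengths - 555.0) / 10.0) ** 2),
    -np.exp(-((wavelengths - 710.0) / 10.0) ** 2),
])
sensitive = select_sensitive_wavelengths(wavelengths, ic_weights, n_per_ic=3)
```

The spectra at the selected wavelengths would then form the input matrix of the LS-SVM model in place of the full 400-1000 nm range.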

3. Results and discussion

3.1. Reflectance spectral investigation

Fig. 1 shows the average reflectance spectra from 400 to 1000 nm for tomato leaves of mutant M1, mutant M2 and their parent. There are distinct differences among the tomato breeds, mostly in the visible region (400-700 nm). The parent differs distinctly from mutant M1 and mutant M2 in the 500-700 nm waveband. At about 560 nm, all of the breeds had reflectance peaks, and that of mutant M1 was higher than those of mutant M2 and the parent. These wavelengths (550-600 nm) are related to the chlorophyll content of tomato leaves. In the near infrared region (700-1000 nm), the reflectance of mutant M1 and mutant M2 was higher than that of the parent, which is correlated with an increase in nitrogen content.

Fig. 2 shows the average spectra from 400 to 1000 nm for the green fruits of the three breeds. There are distinct differences among the tomato breeds, mostly in the visible region (400-700 nm). The parent has a lower reflectance than mutants M1 and M2 across all wavebands. All of them show a similar trend between 500 and 700 nm, with a peak around 550-560 nm and a valley around 680-690 nm. The trends of the spectral curves in Fig. 1 and Fig. 2 are similar: the reflectance of mutant M1 was higher than that of mutant M2 and the parent, which may correspond to a higher chlorophyll or nitrogen content.

Fig. 3 shows the Vis/NIR spectra of the mature fruits of the three breeds. There are distinct differences among the tomato breeds, mostly in the near infrared region (700-1000 nm). For the mature fruits, the parent has a higher reflectance than mutants M1 and M2 across all wavebands. All of them show a similar trend between 650 and 850 nm, with peaks around 710-725 nm and 810-820 nm. Fig. 2 and Fig. 3 indicate that the maturity of the tomato fruit was relatively stable after spaceflight mutagenesis.

3.2. PLS models

The PLS models were developed using spectra preprocessed by Savitzky-Golay smoothing and MSC, and calibration models were built between the spectra and the breeds of the tomato leaf samples. The visible region (400-700 nm) and the near infrared region (700-1000 nm) were modeled separately to establish two models. Different numbers of LVs were used to build the calibration models, and no outliers were detected in the calibration set during the development of the PLS models. The results for the calibration and validation sets in the two regions are shown in Table 1. The model built with the visible region was the best for predicting tomato breeds from leaf samples. For the prediction set of 45 unknown samples, the rp, RMSEP and bias of the visible-region model were 0.9412, 0.3998 and 0.1037, respectively.

Similar PLS models were built between the spectra and the green and mature fruit samples. Table 2 shows the results for the prediction sets in the two regions for both sample types. The models built with the visible region were the best for predicting tomato breed from green fruits, and those built with the near infrared region were the best for mature fruits.

3.3. LV-LS-SVM models

Based on the performance of the PLS models, LVs from the visible region were used as new eigenvectors to enhance the spectral features and reduce the dimensionality of the spectral data matrix for the leaf and green fruit analysis, and LVs from the near infrared region were used for the mature fruit analysis. Several LVs were extracted from the spectra of 150 samples (leaf and fruit). Table 3 shows the explained variance of Y (tomato breeds) for the first four to nine LVs for both leaves and fruits. For the leaf samples, the first four LVs explained more than 90% of the total variance, and the eighth LV explained only an additional 0.892%, contributing less than the preceding LVs. For green fruits, 5-9 LVs were necessary, and for mature fruits, 4-9 LVs.

Before the LS-SVM calibration models were built, three choices were crucial: the optimal input feature subset, a proper kernel function and the optimal kernel parameters. Firstly, the LVs (4-8 for leaves, 5-9 for green fruits, and 4-9 for mature fruits) obtained from PLS analysis were used as the input data set. Secondly, the RBF kernel was chosen because it can handle nonlinear relationships between the spectra and the target attributes. Finally, the two parameters gam (γ) and sig2 (σ²) of the RBF kernel were optimized as described in Section 2.5.

The performance of these models was evaluated with the 45 samples in each prediction set (leaves, green fruits and mature fruits). Comparing the results for the calibration and prediction sets, the best performance was achieved with six LVs for leaf samples, seven LVs for green fruits, and six LVs for mature fruits. The rp, RMSEP and bias for the prediction sets were 0.9654, 0.3270 and 0.1139 for leaves, 0.9742, 0.2603 and 0.00419 for green fruits, and 0.9718, 0.2711 and 0.0247 for mature fruits, respectively. The results for the calibration and prediction sets showed that the LV-LS-SVM models outperformed the PLS models.

3.4. SW-LS-SVM models

The wavelengths with the highest weights in each of the first four ICs were selected as the SWs: 550-560 nm, 562-574 nm, 670-680 nm and 705-715 nm for leaf samples; 548-556 nm, 559-564 nm, 678-685 nm and 962-974 nm for green fruits; and 712-718 nm, 720-729 nm, 968-978 nm and 820-830 nm for mature fruits. To evaluate the performance of the SWs, they were used as the input data matrix to develop the SW-LS-SVM models. The prediction results for rp, RMSEP and bias were 0.9792, 0.2632 and 0.0901 for leaf samples, 0.9837, 0.2783 and 0.1758 for green fruits, and 0.9804, 0.2215 and -0.0035 for mature fruits, respectively. Fig. 4 shows the predicted versus reference plots, with Fig. 4(a), (b) and (c) corresponding to the results for leaves, green fruits and mature fruits, respectively. The solid line is the regression line corresponding to the correlation between the predicted and reference values.

The SW-LS-SVM models performed better than the best LV-LS-SVM models in both the calibration and prediction sets. The wavelengths around 550-580 nm correspond to the green peak, and 670-720 nm corresponds to the red edge. The wavelengths at 700-730 nm, 820-830 nm and 960-980 nm possibly result from the third overtone of the C-H stretch and the second and third overtones of O-H in tomatoes, as reported by Rodriguez-Saona et al. (2001) in their article on the rapid analysis of sugars in fruit juices by FT-NIR spectroscopy [31]. In our study, the reflectance spectra (Section 3.1) combined with the SWs for tomato leaves and green fruits indicate that mutant M1 and mutant M2 have a higher chlorophyll content than the parent. Comparing the spectral curves of the fruits, the fruits had a higher chlorophyll content after spaceflight mutagenesis at the green stage, while the sugar content decreased when they became mature. Therefore, the selected SWs were suitable for the present study, and their effectiveness was validated. The SWs represented most of the features of the original spectra and could replace the whole wavelength region in predicting different breeds of tomato from leaves or fruits. Furthermore, SWs might be important for the development of portable instruments and online monitoring for commercial applications involving tomato breeds obtained by spaceflight breeding.

3.5. Analysis of the results

Among the PLS models, those built with the visible region (400-700 nm) were the best for leaf and green fruit samples, and those built with the near infrared region (700-1000 nm) were the best for mature fruits. From the reflectance spectra, spaceflight mutagenesis mainly changed the chlorophyll content of the leaves and green fruits, and the sugar content of the mature fruits. The results showed that in leaves and green fruits, mutant M1 and mutant M2 had a higher chlorophyll content than the parent, and that in mature fruits, mutant M1 and mutant M2 had a lower sugar content than the parent.

The SW-LS-SVM models performed better than the PLS models. The reason might be that the LS-SVM models took the nonlinear information in the spectral data into account, which improved the prediction precision. ICs from ICA are obtained using higher-order statistics, and independence is a much stronger condition than orthogonality, so the SWs selected from the ICs were more effective. This could be very helpful for the development of portable instruments or real-time monitoring for tomato breed discrimination.

4. Conclusions

The determination of tomato breeds obtained by spaceflight breeding was successfully performed through Vis/NIR spectroscopy combined with the chemometric methods of PLS and SW-LS-SVM. Among the PLS models, those built with the visible region (400-700 nm) were the best for leaf and green fruit samples, with rp, RMSEP and bias of 0.9412, 0.3998 and 0.1037 for leaves and 0.9687, 0.2841 and 0.0503 for green fruits, respectively. The models built with the near infrared region (700-1000 nm) were the best for mature fruits, with rp, RMSEP and bias of 0.9520, 0.3710 and -0.0206 for the prediction set. SWs selected from the ICs were used as the input data matrix of the SW-LS-SVM models, and a two-step grid search technique was used to find the optimal RBF kernel parameters (γ, σ²). The SW-LS-SVM models achieved the best prediction performance. The rp, RMSEP and bias for the prediction set were 0.9792, 0.2632 and 0.0901 for leaf samples, 0.9837, 0.2783 and 0.1758 for green fruits, and 0.9804, 0.2215 and -0.0035 for mature fruits, respectively, which were better than those of the PLS models. The overall results indicated that Vis/NIR spectroscopy was able to determine the breeds of tomatoes obtained by spaceflight breeding. ICA was a powerful way to select sensitive wavelengths, and Vis/NIR spectroscopy combined with LS-SVM models was highly capable of discriminating tomato breeds. Further work on input data selection, parameter optimization and interpretation of the results is needed to improve the generalization and stability of the calibration.

Acknowledgement

The research presented in this paper was partially supported by the Natural Science Foundation of Zhejiang province, China (Q14C130002), the Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry, and Fundamental Research Funds for the Central Universities (2013QNA6011).

References

[1] Y.L. Wang, Q. Yang, D. Yang, A.M. Yang, Chinese Journal of Light Scattering (in Chinese) 17 (2006) 412-415.

[2] X.H. Guo, Y.Y. Zhu, Y. Guan, Chinese J. Spectrosc. Lab. 27 (2010) 2311-2313.

[3] J.H. Shi, Z.L. Chen, Y.N. Shao, Y. He, P. Feng, J.J. Zhu, Spectrosc. Spect. Anal. 31 (2011) 387-389.

[4] B.G. Osborne, T. Fearn, P.H. Hindle, Practical NIR Spectroscopy with Applications in Food and Beverage Analysis, 2nd Ed., Longman Scientific and Technical, Harlow, Essex, U.K., 1993.

[5] Y.N. Shao, Y. He, A.H. Gomez, A.G. Pereir, Z.J. Qiu, Y. Zhang, J. Food Eng. 81 (2007) 672-678.

[6] H.R. Xu, Y.B. Ying, X.P. Fu, S.P. Zhu, Biosyst. Eng. 96 (2007) 447-454.

[7] A.M.K. Pedro, M.M.C. Ferreira, Anal. Chem. 77 (2005) 2505-2511.

[8] M.M. Kamil, G.F. Mohamed, M.S. Shaheen, J. Am. Sci. 7 (2011) 559-572.

[9] P. Sirisomboon, M. Tanaka, T. Kojima, P. Williams, J. Food Eng. 112 (2012) 218-226.

[10] J.J. Workman Jr., P.R. Mobley, B.R. Kowalski, R. Bro, Appl. Spectrosc. Rev. 31 (1996) 73-124.

[11] J.A.K. Suykens, J. Vandewalle, Neural Process. Lett. 9 (1999) 293-300.

[12] J.A.K. Suykens, T. Van Gestel, J. De Brabanter, B. De Moor, J. Vandewalle, Least Squares Support Vector Machines, World Scientific, Singapore, 2002.

[13] Y.N. Shao, Y. He, C.Q. Wu, J. Agric. Food Chem. 56 (2008) 3960-3965.

[14] A. Savitzky, M.J.E. Golay, Anal. Chem. 36 (1964) 1627-1639.

[15] P.A. Gorry, Anal. Chem. 62 (1990) 570-573.

[16] I.S. Helland, T. Naes, T. Isaksson, Chemom. Intell. Lab. Syst. 29 (1995) 233-241.

[17] S. Amari, A. Cichocki, H.H. Yang, Adv. Neural Inf. Process. Syst. 8 (1996) 757-763.

[18] A. Hyvarinen, Neural Comput. 11 (1999) 1739-1768.

[19] P.O. Hoyer, A. Hyvarinen, Netw.-Comput. Neural Syst. 11 (2000) 191-210.

[20] A. Hyvarinen, P.O. Hoyer, Neural Comput. 12 (2000) 1705-1720.

[21] J. Chen, X.Z. Wang, J. Chem. Inf. Comput. Sci. 41 (2001) 992-1001.

[22] X. Bi, T.H. Li, L. Wu, Chem. J. Chinese Universities 25 (2004) 1023-1027.

[23] X.G. Shao, G.Q. Wang, S.F. Wang, Q.D. Su, Anal. Chem. 76 (2004) 5143-5148.

[24] A. Hyvarinen, J. Karhunen, E. Oja, Independent Component Analysis, Wiley, New York, 2001.

[25] T.W. Lee, Independent Component Analysis: Theory and Application, Kluwer, Boston, MA, 1998.

[26] A. Hyvarinen, E. Oja, Neural Netw. 13 (2000) 411-430.

[27] J.A.K. Suykens, J. Vandewalle, Neural Process. Lett. 9 (1999) 293-300.

[28] A. Borin, M.F. Ferrao, C. Mello, D.A. Maretto, R.J. Poppi, Anal. Chim. Acta 579 (2006) 25-32.

[29] Q.S. Chen, J.W. Zhao, C.H. Fang, D.M. Wang, Spectroc. Acta Pt. A-Molec. Biomolec. Spectr. 66 (2007) 568-574.

[30] H. Guo, H.P. Liu, L. Wang, Journal of System Simulation (in Chinese) 18 (2006) 2033-2036, 2051.

[31] L.E. Rodriguez-Saona, F.S. Fry, M.A. McLaughlin, E.M. Calvey, Carbohydr. Res. 336 (2001) 63-74.

Figure captions

Fig. 1. The average reflectance spectral curves from 400-1000 nm for tomato leaves of mutant M1, mutant M2, and the parent.

Fig. 2. The average reflectance spectral curves from 400-1000 nm for green fruits of mutant M1, mutant M2, and the parent.

Fig. 3. The average reflectance spectral curves from 400-1000 nm for mature fruits of mutant M1, mutant M2, and the parent.

Fig. 4. (a, b, c) The predicted versus reference charts by SW-LS-SVM.


Table 1. Results of the calibration and validation sets in the two regions for tomato leaf

Data set      Waveband (nm)   Correlation coefficient   RMSEC (RMSECV)   Bias
Calibration   400-700         0.9941                    0.1211           4.015e-06
Calibration   700-1000        0.9818                    0.2123           -4.095e-06
Validation    400-700         0.9785                    0.2352           0.0080
Validation    700-1000        0.9488                    0.3532           -0.0050

Table 2. Results of the prediction sets in the two regions for green and mature fruit

Sample         Waveband (nm)   Correlation coefficient (rp)   RMSEP    Bias
Green fruit    400-700         0.9687                         0.2841   0.0503
Green fruit    700-1000        0.9283                         0.4227   0.0521
Mature fruit   400-700         0.8433                         0.6030   0.0492
Mature fruit   700-1000        0.9520                         0.3710   -0.0206

Table 3. The explained variance of Y for the first four to nine LVs for both leaf and fruit

Parameter                  LVs (a):   4        5        6        7        8        9
EV (b) (%), leaf sample               90.412   93.568   95.339   96.984   97.876   -
EV (b) (%), green fruit               -        90.159   93.514   95.631   97.162   97.804
EV (b) (%), mature fruit              90.528   93.591   95.841   97.024   98.034   98.519

(a) LV: latent variable; (b) EV: explained variance.

Highlights

Vis/NIR spectroscopy was used to discriminate tomato breeds obtained by spaceflight breeding from their leaves or fruits.

The tomato breeds were mutant M1, mutant M2 and their parent.

The SW-LS-SVM models were better than the PLS and LV-LS-SVM models at predicting the tomato breeds.