Food Chemistry 145 (2014) 342–348


Rapid identification of adulterated cow milk by non-linear pattern recognition methods based on near infrared spectroscopy

Li-Guo Zhang a, Xin Zhang b, Li-Jun Ni a,⇑, Zhi-Bin Xue a, Xin Gu b, Shi-Xin Huang b

a School of Chemistry and Molecular Engineering, East China University of Science & Technology, Shanghai 200237, China
b Shanghai Municipal Control Institute of Veterinary Drug & Feedstuff, Shanghai 201103, China

Article info

Article history: Received 12 May 2013; received in revised form 4 August 2013; accepted 14 August 2013; available online 27 August 2013.

Keywords: Rapid identification of adulterated cow milks; Near infrared spectroscopy; Uniform design; Improved support vector machine; Improved and simplified K nearest neighbours

Abstract

More than 800 representative milk samples, consisting of 287 raw cow milk samples from different pastures surrounding Shanghai, China, and 526 adulterated milk samples containing different pseudo proteins and thickeners, were collected and designed to demonstrate a method for rapidly discriminating adulterated milk based on near infrared (NIR) spectra. The NIR classification models were built by two non-linear supervised pattern recognition methods: improved support vector machine (I-SVM) and improved and simplified K nearest neighbours (IS-KNN). Uniform design theory was applied to optimize the parameters of the SVM, which reduced the amount of computation by 90%. Both methods showed good adaptability in discriminating adulterated milk from raw cow milk. Further investigation showed that the correct classification ratio increased with the level of adulteration solution in the adulterated milk; the concentration of adulterants is therefore an important factor influencing the discrimination results of the NIR pattern recognition models. The results demonstrate the usefulness of NIR spectra combined with non-linear pattern recognition methods as an objective and rapid method for the authentication of complicated raw cow milk. © 2013 Elsevier Ltd. All rights reserved.

⇑ Corresponding author. Address: Box 425, East China University of Science & Technology, 130 Meilong Road, Xuhui District, Shanghai 200237, China. Tel./fax: +86 21 64253045. E-mail address: [email protected] (L.-J. Ni).
http://dx.doi.org/10.1016/j.foodchem.2013.08.064

1. Introduction

Liquid milk has been an important source of human nutrition for thousands of years. However, the safety of milk has been challenged in recent years by illegal adulterants such as water, neutralizers and melamine (Balabin & Smirnov, 2011; Santos, Wentzell, & Pereia-Filho, 2012). Excessive addition of water to milk decreases the content of nutrients such as protein and solids (Santos & Pereia-Filho, 2013). To evade inspection, starch, whey or dextrin is often added to increase the solid content of adulterated milk (Borin, Ferrao, Mello, Maretto, & Poppi, 2006), and other adulterants such as pseudo proteins are added to increase its apparent 'nutrient' content. Melamine, a nitrogen-rich substance with a 1,3,5-triazine skeleton and 66% nitrogen by mass, is a preferred adulterant for raising the pseudo protein content of adulterated milk. Unfortunately, hundreds of Chinese babies who had long been fed milk powder containing melamine suffered from kidney stones in 2008 (http://news.sohu.com/s2008/babyshenjieshi/). Moreover, harmful nitrogen-rich chemicals such as urea and ammonium nitrate are water soluble and can also be dissolved in milk to raise the apparent 'protein' content of adulterated milk. However, the widely used Kjeldahl method (Chan, Griffiths, & Chan, 2008) cannot identify these chemicals, because it estimates protein content from the nitrogen content (Mauer, Chernyshova, Hiatt, Deeding, & Davis, 2009). The officially certified analytical methods for detecting melamine or other harmful substances include HPLC, LC–MS/MS, GC–MS and GC–MS/MS, which are time consuming, expensive and labour-intensive and require complex sample pretreatment (Lin, 2009). It is therefore extremely important for public health to develop an effective, convenient and quick method to detect adulteration and authenticate milk products.

As an alternative tool for monitoring the authenticity of milk, digital colour image analysis combined with chemometric methods has been successfully applied to detect adulteration in liquid milk (Santos et al., 2012) and to discriminate adulterated milk from authenticated milk (Santos & Pereia-Filho, 2013). The method requires no complicated preprocessing of milk samples and detects samples quickly. Near infrared (NIR) spectroscopy, another fast and non-destructive analytical method that offers quick analysis with little sample preparation, has been used to authenticate adulterated food (Downey, Fouratier, & Kelly, 2003; Downey, McIntyre, & Davies, 2002), to determine the content of adulterants in powdered milk (Borin et al., 2006; Mauer et al., 2009) and to detect melamine in dairy products (Balabin & Smirnov, 2011; Yuan, He, Ma, Wu, & Nie, 2009). Adulterated liquid cow milks containing botanical filling materials (Li & Ding, 2010), melamine (Dong et al., 2009), whey, synthetic milk, synthetic urine, urea and hydrogen peroxide (Santos, Pereira-Filho, & Rodrigues-Saona, 2013a, 2013b) have been well distinguished by NIR spectroscopy combined with chemometric methods such as partial least squares discriminant analysis (PLS-DA) and soft independent modelling of class analogy (SIMCA).

The authenticated liquid milk samples used in the above references were collected from only three dairy cattle (Li et al., 2010) or from local supermarkets, and only 9–22 commercial liquid milk samples from different production lots of one brand (or several brands) were included in the true milk category (Dong et al., 2009; Santos et al., 2013a, 2013b). These commercial milk samples were far fewer than the adulterated milks, because one authenticated milk sample was used to prepare many adulterated milk samples containing adulterants at different levels. Adding multiple replicates of authenticated milk samples to the true cow milk category cannot increase the actual number of authenticated milk samples, and it is not difficult to discriminate adulterated milks prepared from so few true milk samples. Berrueta et al. pointed out that a classification model would be over-optimistic if the training set contained too few samples (Berrueta, Alonso-Salces, & Héberger, 2007). Puchwein also indicated that the proper choice of samples is decisive for a good calibration (Puchwein, 1988). Our previous work (Ni, Zhang, Xie, & Luo, 2009) demonstrated that the representativeness and complexity of calibration samples greatly influence the classification results of NIR pattern recognition models. The selection of a representative training set has also been emphasized and investigated in applications of the NIR technique (deGroot, Postma, Melssen, & Buydens, 1999). Therefore, the effectiveness of a pattern recognition method for classifying adulterated milk based on NIR spectra should be validated with enough reasonable samples to reflect the actual status of authenticated and adulterated liquid milk.
To guarantee the representativeness of the milk samples used in the present work, about three hundred raw cow milk samples were collected from different pastures surrounding Shanghai, and more than five hundred adulterated milk samples were prepared by adding different amounts of five common adulterants (starch, dextrin, melamine, urea and ammonium nitrate) to raw cow milk. Starch or dextrin was used to raise the solid content of the adulterated milk, and melamine, urea or ammonium nitrate to raise its 'protein' content. In this way, the adulterated milk samples were made to approach their real state as closely as possible.

Supervised pattern recognition methods based on multivariate statistics are widely applied in qualitative analysis by near infrared spectroscopy (Berrueta et al., 2007). PLS-DA (Thissena, Pepersb, Üstüna, Melssena, & Buydens, 2004) is the most common method and often gives good classification results. However, this linear algorithm suffers when nonlinear behaviour exists in the near infrared spectra (Thissena et al., 2004; Wentzell & Montoto, 2003). Variation in ambient measurement conditions such as humidity and temperature (Thissena et al., 2004; Bertran et al., 2000), instrument variation and sample characteristics may induce nonlinearities in NIR spectra. In our previous study (Zhong, Zhang, Zhang, Gu, & Ni, 2010), PLS-DA coupled with NIR spectra identified adulterated milk samples containing melamine (or urea) and dextrin well, but gave very poor identification for most raw cow milk. Conversely, KNN, an identification method suited to raw milk, could not give a good result for adulterated milk. However, the NIR models built by the improved and simplified KNN (IS-KNN) method developed by our team (Ni, Zhang, Xie, & Luo, 2009) performed well on both raw cow milk and adulterated milk, and were robust for all sample sets.
We also applied PLS-DA, LDA, KNN and IS-KNN to discriminate adulterated milk containing dextrin and trace melamine (1–10 μm/L) from 110 different raw cow milks (Ni et al., 2009); the highest validation correct ratios of PLS-DA, LDA, KNN and IS-KNN were 66%, 77%, 78% and 94%, respectively. The detailed results are shown in Table 1 of the Supplemental materials. In our previous studies, every authenticated milk sample was used to prepare only one adulterated milk sample. The samples were therefore more complicated and representative than those used in published references, in which many adulterated samples were prepared from the same authenticated milk sample. Our experience indicated that linear discrimination methods are not suitable for distinguishing adulterated milk from complicated raw cow milk.

As a nonlinear technique, artificial neural networks (ANN) have been used to deal with NIR data. However, ANN often suffers from over-fitting, because many model parameters are needed while only limited samples are available in practical applications (Zhou et al., 2007). The support vector machine (SVM) (Cortes & Vapnik, 1995) is a method based on statistical learning theory. It fixes the classification decision function according to the structural risk minimization principle to avoid over-fitting. SVMs are useful for finding nonlinear, global solutions and are suitable for high-dimensional input vectors (Thissena et al., 2004). As a state-of-the-art classification technique, SVM combined with NIR spectroscopy has been applied to identify the genuineness of medicines (Yu & Cheng, 2006), to sort teas (Chen, Zhao, Fang, & Wang, 2007), to classify herbs (Lai, Ni, & Kokot, 2011) and so on. It should be noted that the classification results of NIR models built by SVM depend strongly on the SVM parameters, and that optimizing these parameters is time consuming. The present work proposes an effective method to optimize the parameters of SVM by applying uniform design theory (Fang, Lin, Winker, & Zhang, 2000; Liang, Fang, & Xu, 2001), and builds good NIR classification models with the improved SVM to discriminate adulterated milk from raw cow milk. Furthermore, the influence of the concentration of adulteration solutions on the discrimination results was investigated. For comparison, IS-KNN was also applied to discriminate adulterated milk based on NIR spectroscopy.

2. Material and methods

2.1. Experimental

2.1.1. Samples preparation

For the initial pure milk samples, three batches of raw milk samples, collected from different pastures surrounding Shanghai, were recorded as RM set 1, RM set 2 and RM set 3, respectively. Two batches of aqueous solutions were prepared by adding 3 g of dextrin or starch, respectively, to deionized water. The national standard for fresh milk in China requires that the protein content of milk be not lower than 2.95%. To each 100 mL of the aqueous dextrin or starch solution, 0.731 g of melamine, 1.048 g of urea or 1.402 g of ammonium nitrate was added, to ensure that the 'protein' content of the adulteration solutions detected by the Kjeldahl method was not lower than 3%. Detailed information on the adulteration solutions (AS1–AS6 for short) is listed in Table 1.

To ensure the representativeness and reasonableness of the milk sample sets, different kinds of adulterated milk samples were prepared by adding aqueous solutions of dextrin or starch containing pseudo proteins (melamine, urea or ammonium nitrate) to raw cow milk. Adulterated milk (AM for short) samples were prepared by adding 5%, 10% and 15% of the adulteration solutions AS1–AS6 to raw milk samples according to the schemes listed in the second and third columns of Table 2. For example, the adulterated milk samples in AM set 1 were prepared by adding 5% (or 10%, 15%) AS1 to the raw milk samples in RM set 3. Approximately one third of the samples in AM set 1 contained 5%, 10% and 15% of AS1, respectively. Thus, the 'protein' concentration detected by the Kjeldahl method in every AM sample would not be lower than 3%. According to Table 2, the AM samples of the six AM sets listed in the first column of Table 2 contained 5–15% adulteration solutions of dextrin and melamine, dextrin and urea, dextrin and ammonium nitrate, starch and melamine, starch and urea, and starch and ammonium nitrate, respectively. The original raw cow milk sets of these AM samples are listed in the fourth column of Table 2. The actual contents of the three pseudo proteins melamine, urea and ammonium nitrate in the adulterated milks were in the ranges of 0.04–0.11%, 0.05–0.16% and 0.07–0.21%, respectively; these ranges correspond to 5–15% of the 0.731 g, 1.048 g and 1.402 g per 100 mL present in the adulteration solutions.

Table 1. Information of adulteration solutions.

Adulteration solution   Thickener   Name of pseudo protein   Content of pseudo protein (%)
AS1                     Dextrin     Melamine                 3
AS2                     Dextrin     Urea                     3
AS3                     Dextrin     Ammonium nitrate         3
AS4                     Starch      Melamine                 3
AS5                     Starch      Urea                     3
AS6                     Starch      Ammonium nitrate         3

AS is the abbreviation of adulteration solution.

Table 2. Information of milk samples.

AM samples        Adulteration solution   AS content in AM samples   Original RM samples
AM set 1 (80)     AS1                     5%, 10%, 15%               RM set 3 (82)
AM set 2 (120)    AS2                     5%, 10%, 15%               RM set 1 (120)
AM set 3 (80)     AS3                     5%, 10%, 15%               RM set 3 (82)
AM set 4 (86)     AS4                     5%, 10%, 15%               RM set 2 (85)
AM set 5 (83)     AS5                     5%, 10%, 15%               RM set 2 (85)
AM set 6 (77)     AS6                     5%, 10%, 15%               RM set 2 (85)

Note: The figures in brackets in the first and last columns are the numbers of adulterated milk samples and raw cow milk samples in the corresponding sets, respectively.

2.1.2. Instrument and acquisition of near infrared spectra

Since liquid cow milk is an emulsion, it is not a transparent liquid, and transmission mode is not a good choice for measuring its NIR spectra. In our pre-experiments, the quality of liquid milk spectra measured in transflective and diffuse modes was evaluated by the standard variance spectrum of precision tests (SVSP for short). When the NIR spectrum of a sample is measured n times in succession, the n spectra do not coincide completely, owing to the influence of sample uniformity, instrument noise and measurement errors. To evaluate the quality of the spectra, we calculated the standard variance spectrum of these n spectra according to Eq. (1):

$$\mathrm{SVSP}(j) = \sqrt{\frac{\sum_{i=1}^{n}\left(X_{ij}-\bar{X}_{j}\right)^{2}}{n-1}} \tag{1}$$

where $X_{ij}$ is the absorbance at the j-th wavelength of the i-th spectrum and $\bar{X}_{j}$ is the average absorbance of the n spectra at the j-th wavelength. The smaller the SVSP, the higher the spectral quality. As shown in Fig. 1 of the Supplemental materials, the SVSP of the transflective mode in the region of 4000–8500 cm⁻¹ is considerably higher than that of the diffuse mode. Therefore, all NIR spectra were acquired in the range of 4000–10,000 cm⁻¹ in diffuse reflectance mode using an Antaris Fourier transform near infrared spectrometer (Thermo Fisher, USA) fitted with an InGaAs detector and a diffuse reflectance accessory. The NIR diffuse reflectance spectra of the milk samples were measured in the laboratory at a room temperature of 25 °C (±1 °C).

Since the raw milk samples were kept in cold-chain transportation, solid fat particles separated out, which would lead to poor spectral quality. To eliminate the harmful effect of non-uniform milk samples and improve the NIR spectral quality, every raw milk sample was placed in an ultrasonic vibrator for 20 min at 40 °C and then tested immediately. A quartz flat-base tube was used to hold about 10 mL of milk for NIR testing. The sample tube was rinsed with distilled water and dried in an oven at 80 °C under normal pressure for 5–6 min before use. Every sample was measured once, and each spectrum was the average of 64 scanned interferograms at a resolution of 8 cm⁻¹. After the NIR spectrum of a raw cow milk sample had been acquired, 5% (or 10%, or 15%) of one kind of adulteration solution was added to it to prepare an adulterated milk (AM) sample, and the NIR spectrum of this AM sample was then measured immediately.

2.2. Data analysis

2.2.1. Improved SVM methods

The detailed theoretical background of SVM can be found in (Vapnik & Wiley, 1998). Among the many modified SVM methods, the parameter ν of ν-SVM (Schölkopf, Smola, Williamson, & Bartlett, 2000) has a clear meaning and is easy to determine. Usually ν satisfies the following relationship:

$$\frac{a}{l} \le \nu \le \frac{b}{l} \tag{2}$$

where l, a and b denote the number of training samples, the number of mistakenly recognized samples and the number of support vectors, respectively. So the value of ν lies in the interval [0, 1]. Consider a set of vectors $(x_i, y_i)$, $i = 1, 2, \ldots, l$, $y_i \in \{-1, 1\}$, $x_i \in X = R^n$, where $x_i$ is the i-th vector belonging to class s (or t). The hyperplane that best separates the two classes with the widest margin can be determined by solving the following dual problem:

$$\max\left\{ L_D = -\frac{1}{2}\sum_{i=1}^{l_{st}}\sum_{j=1}^{l_{st}} \alpha_i^{st}\alpha_j^{st} y_i y_j K(x_i, x_j) \right\} \tag{3}$$

$$\begin{cases} 0 \le \alpha_i^{st} \le \dfrac{1}{2 l_{st}^{+}}, & y_i = 1 \\ 0 \le \alpha_i^{st} \le \dfrac{1}{2 l_{st}^{-}}, & y_i = -1 \\ \sum_{i=1}^{l_{st}} \alpha_i^{st} y_i = 0 \\ \sum_{i=1}^{l_{st}} \alpha_i^{st} = \nu \end{cases} \tag{4}$$

where $K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j)$ is called the kernel function. The Gaussian radial basis function (RBF) was selected as the kernel function:

$$K(x_i, x_j) = \exp\left(-\lVert x_i - x_j \rVert^{2} / 2\sigma^{2}\right) \tag{5}$$

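By solving the quadratic programming problem (3) under the constraints (4), the discrimination function for classes s and t is obtained as follows:

$$\begin{cases} D_{st}(x) = \sum_{x_i \in SV} \alpha_i^{st} y_i K(x_i, x) + b_{st} \\ D_{st}(x) = -D_{ts}(x) \\ s \ne t;\; i = 1, \ldots, l_{st} \end{cases} \tag{6}$$

where

$$\begin{cases} b_{st} = -\dfrac{1}{2}\left( \dfrac{1}{N_{+}} \sum_{i \in S_{+}} \sum_{j=1}^{l_{st}} \alpha_j y_j K(x_j, x_i) + \dfrac{1}{N_{-}} \sum_{k \in S_{-}} \sum_{j=1}^{l_{st}} \alpha_j y_j K(x_j, x_k) \right) \\ S_{+} = \{\, i \mid \alpha_i \in (0, \nu_{+}),\; y_i = 1 \,\} \\ S_{-} = \{\, i \mid \alpha_i \in (0, \nu_{-}),\; y_i = -1 \,\} \end{cases} \tag{7}$$

$N_{+}$ and $N_{-}$ denote the number of elements in $S_{+}$ and $S_{-}$, respectively.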
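As a minimal sketch of how Eqs. (5) and (6) are evaluated once the dual problem has been solved, the Python snippet below computes the RBF kernel and classifies a new sample by the sign of the decision value. The support vectors, multipliers, labels and bias are hypothetical toy values chosen only for illustration (they are not fitted to any milk data), and the function names are assumptions rather than part of the authors' MATLAB implementation.

```python
import math

def rbf_kernel(x, z, sigma):
    """Gaussian RBF kernel of Eq. (5): exp(-||x - z||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def decision_value(x, support_vectors, alphas, labels, bias, sigma):
    """Decision value of Eq. (6): sum_i alpha_i * y_i * K(x_i, x) + b."""
    weighted = sum(a * y * rbf_kernel(sv, x, sigma)
                   for sv, a, y in zip(support_vectors, alphas, labels))
    return weighted + bias

def classify(x, support_vectors, alphas, labels, bias, sigma):
    """Assign class +1 or -1 from the sign of the decision value."""
    d = decision_value(x, support_vectors, alphas, labels, bias, sigma)
    return 1 if d >= 0 else -1

# Hypothetical toy model: two support vectors in a 2-D score space.
support_vectors = [(0.0, 0.0), (4.0, 4.0)]
alphas = [0.5, 0.5]   # multipliers; they sum to an assumed nu of 1.0
labels = [1, -1]      # +1 and -1 stand for the two classes s and t
bias = 0.0
sigma = 2.0

print(classify((0.5, 0.5), support_vectors, alphas, labels, bias, sigma))
print(classify((3.5, 3.5), support_vectors, alphas, labels, bias, sigma))
```

Swapping the two class labels flips the sign of the decision value, which is the relation $D_{st}(x) = -D_{ts}(x)$ in Eq. (6) for a zero bias.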


In the present work, ν-SVM was applied to build NIR pattern recognition models for discriminating adulterated milk samples. The parameters ν and σ were optimized by means of uniform design theory, based on the NIR spectra and categories of the samples. The detailed steps of optimizing ν and σ by uniform design theory are described in Section 3.1.

2.2.2. Improved and simplified KNN method

As introduced in (Ni, Zhang, Xie, & Luo, 2009), the IS-KNN method assigns an unknown sample to the group whose centre is nearest to the sample in terms of Mahalanobis distance in the space of principal components (PCs). To avoid fitting instrument noise and experimental error, the maximum allowed number of PCs (mmax) is determined by comparing the standard deviation of the residuals of all objects on the selected PCs with the instrument noise level. The number of PCs (m) is optimized by leave-one-out cross-validation on the training set at all available m (1 < m < mmax), and the m giving the highest correct classification ratio is finally selected. Compared with LDA, KNN and PLS-DA, IS-KNN always exhibits good robustness and performance in both calibration and prediction, whether for solid powder samples (tobaccos) (Ni, Zhang, Xie, & Luo, 2009) or emulsion samples (milks) (Zhong, Zhang, Zhang, & Ni, 2010; Ni et al., 2009).
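The nearest-centre rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the published IS-KNN algorithm: the PC scores are hypothetical, the covariance of each group is assumed to be diagonal (so the Mahalanobis distance reduces to a variance-scaled Euclidean distance), and the selection of the number of PCs by leave-one-out cross-validation is omitted.

```python
import math
from statistics import mean, variance

def group_stats(scores):
    """Centre and per-PC sample variance of one group's PC scores."""
    columns = list(zip(*scores))
    return [mean(c) for c in columns], [variance(c) for c in columns]

def mahalanobis_diag(x, centre, variances):
    """Mahalanobis distance under the assumed diagonal covariance."""
    return math.sqrt(sum((xi - c) ** 2 / v
                         for xi, c, v in zip(x, centre, variances)))

def classify_nearest_centre(x, groups):
    """Return the label of the group whose centre is nearest to x."""
    best_label, best_dist = None, float("inf")
    for label, scores in groups.items():
        centre, variances = group_stats(scores)
        dist = mahalanobis_diag(x, centre, variances)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Hypothetical scores on two PCs for the two training groups.
groups = {
    "raw": [(0.0, 0.1), (0.2, -0.1), (-0.1, 0.0), (0.1, 0.2)],
    "adulterated": [(2.0, 1.9), (2.2, 2.1), (1.9, 2.0), (2.1, 2.2)],
}

print(classify_nearest_centre((0.05, 0.05), groups))
print(classify_nearest_centre((2.05, 2.0), groups))
```

A full covariance matrix per group would need a matrix inverse; the diagonal form is used here only to keep the sketch dependency-free.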

2.2.3. Evaluation on performance of NIR classification models

In the present work, one third of the samples were uniformly selected as the validation set, three times in turn, for every NIR classification model, and the remaining samples were used as the calibration set. The average validation correct ratio (VCR) over the three validation sets and the average calibration correct ratio (CCR) over the three training sets were used to evaluate the prediction and calibration performance of the models. CCR and VCR were calculated according to the following formulas:

$$\mathrm{CCR} = \frac{\sum_{k=1}^{3} N_{ck}}{\sum_{k=1}^{3} N_{k}} \tag{8}$$

where $N_k$ is the number of training samples selected for the k-th time, and $N_{ck}$ is the number of correctly recognized training samples for the k-th time.

$$\mathrm{VCR} = \frac{\sum_{k=1}^{3} M_{ck}}{\sum_{k=1}^{3} M_{k}} \tag{9}$$

where $M_k$ is the number of validation samples selected for the k-th time, and $M_{ck}$ is the number of validation samples correctly recognized for the k-th time.
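Eqs. (8) and (9) pool the counts over the three splits before dividing, which the following Python sketch makes explicit. The sample counts are hypothetical and serve only to show the arithmetic.

```python
def pooled_correct_ratio(correct_counts, total_counts):
    """Pooled ratio of Eqs. (8)-(9): sum of correct counts / sum of totals."""
    if len(correct_counts) != len(total_counts):
        raise ValueError("one correct count is needed per split")
    return sum(correct_counts) / sum(total_counts)

# Hypothetical counts for the three calibration/validation splits.
n_train = [108, 108, 108]          # N_k
n_train_correct = [104, 105, 103]  # N_ck
m_valid = [54, 54, 54]             # M_k
m_valid_correct = [50, 51, 49]     # M_ck

ccr = pooled_correct_ratio(n_train_correct, n_train)
vcr = pooled_correct_ratio(m_valid_correct, m_valid)
print(f"CCR = {100 * ccr:.2f}%, VCR = {100 * vcr:.2f}%")
# prints: CCR = 96.30%, VCR = 92.59%
```

When the three splits have equal size, as here, the pooled ratio equals the plain mean of the three per-split ratios.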

2.2.4. Data processing

In our previous study (Zhong et al., 2010), four kinds of NIR spectra (original, multiplicative scatter correction (MSC), standard normal variate (SNV) and first-derivative spectra) were used to build NIR pattern recognition models of milk. The classification results based on the first three kinds of spectra were better than those based on the first-derivative spectra; the noise amplified by differentiation may lead to poorer classification. Since there was no significant difference between the performances of the models based on original, MSC and SNV spectra, SNV was selected, for simplicity, to pre-process the NIR spectra of the milk samples in the present work. Principal component analysis (PCA) was applied to compress the information in the NIR spectra before modelling. All algorithms were implemented in MATLAB ver. 7.0 (The MathWorks, USA).
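For readers unfamiliar with SNV, the sketch below shows the usual definition: each spectrum is centred by its own mean and divided by its own standard deviation. It is written in Python rather than the MATLAB used in this work, and the absorbance values are hypothetical.

```python
from statistics import mean, stdev

def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own mean and SD."""
    mu = mean(spectrum)
    sd = stdev(spectrum)  # sample standard deviation (n - 1 in the denominator)
    if sd == 0:
        raise ValueError("a flat spectrum cannot be SNV-scaled")
    return [(a - mu) / sd for a in spectrum]

# Hypothetical absorbances, and the same spectrum with an additive
# baseline offset and a multiplicative scatter factor applied.
raw = [0.52, 0.61, 0.75, 0.68, 0.57]
shifted_scaled = [2.0 * a + 0.3 for a in raw]

print(snv(raw))
print(snv(shifted_scaled))
```

Both calls give the same values up to rounding, which illustrates how SNV removes offset and scaling differences between spectra.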


3. Results and discussions

Derivative transformation of NIR spectra amplifies spectral differences, emphasizes the steep edges of a peak and brings out small features over a broad background. We therefore identified outlier milk samples on the basis of their first-derivative spectra. Because some peaks (or valleys) of the first-derivative NIR spectra of five samples were significantly higher or lower than those of the other spectra, these five spectra were considered spectral outliers and excluded from further calculations. As shown in Table 2, 287 raw cow milk samples and 526 adulterated milk samples were finally retained. As an example, the original and SNV NIR spectra of RM set 1 of Table 2 are illustrated in Fig. 2 of the Supplemental materials. To compare the spectral change after SNV transformation simply, the original and SNV average spectra of the six sample sets of Table 2 are shown in Fig. 3 of the Supplemental materials. It is well known that preprocessing can eliminate baseline shift and some noise and reduce the scattering effects in NIR spectra. However, as shown in Fig. 3 of the Supplemental materials, the spectra were more concentrated and the difference between true and adulterated milk was narrowed after SNV preprocessing.

3.1. Optimization of the parameters of the classification models built by improved SVM methods

The number of principal components (PCs) is an important parameter that greatly affects the discrimination results of NIR models. In most published work using the NIR technique for quantitative and qualitative analysis, the number of PCs is usually no more than 10; if more than 10 PCs are selected, one is likely to wonder whether the model is over-fitted. However, our previous experience indicates that the more complicated the samples are, the more PCs are needed. The optimized number of PCs for NIR data sets consisting of 555–1129 tobacco samples from 10 cultivation areas of China was 10–24 and changed with the complexity of the data sets (Ni, Zhang, Xie, & Luo, 2009). We carried out PCA on the SNV spectra of RM set 3 and AM set 1 of Table 2 and listed the results in Table 2 and Fig. 4 of the Supplementary materials. The variances of the 12th–17th and 18th–22nd PCs were at the level of 1.0e−4 and 1.0e−5, respectively. The cumulative contribution ratio of the first twenty PCs was 99.9996%. Fig. 4 of the Supplemental materials shows that the loading spectra of the 21st–22nd PCs were very similar and looked like noise, whereas there was a larger difference between the loading spectra of the 19th–20th PCs. The original liquid cow milk samples in the present work (about 300) were collected from more than 100 pastures surrounding Shanghai, China, over one year, and every raw liquid milk sample was used to prepare only one or two adulterated samples. These milk samples were therefore much more complicated than those used in published references, and the PCs with small variances also play an important role in the pattern recognition of such complicated liquid milk samples. Furthermore, we inspected the change of VCR with the number of PCs on the basis of RM set 3 and AM set 1 by ν-SVM. As shown in Fig. 5 of the Supplemental materials, VCR generally increased as the number of PCs rose from 5 to 19, reached a high value at 19 and changed little thereafter. On the basis of Figs. 4 and 5 of the Supplemental materials, we set the number of PCs to 20 in the ν-SVM method.

To observe the effect of the parameters σ and ν on the prediction performance of the NIR model built by ν-SVM, a sample set consisting of the SNV spectra of AM set 2 and RM set 1 was randomly selected to build NIR classification models with different combinations of σ in the range of 50–1000 and ν in the range of 0–1 with adequate increments. According to Fig. 1, VCR fluctuates in the region of 92–95% with σ of 0–500 and ν of 0.1–0.2. Thus we selected 10 levels of the two parameters as follows:

$$\begin{aligned} \sigma &= \{50, 100, 150, 200, 250, 300, 350, 400, 450, 500\} \\ \nu &= \{0.1, 0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19\} \end{aligned} \tag{10}$$

Fig. 1. Mesh plot of VCR for validation samples at combinations of σ and ν by I-SVM (axes: ν from 0.1 to 0.2, σ from 0 to 1000, VCR from 0.92 to 0.96; marked data point: ν = 0.1, σ = 100, VCR = 0.9288).

The complete grid of combinations of σ and ν at the ten levels numbers 10 × 10 = 100. The uniform experiment design table $U_{10}(10^8)$ and its application table (Liang et al., 2001), which can arrange 10 experiments with 8 factors at 10 levels, were applied to reduce the amount of calculation needed to find the best combination of σ and ν. By means of the application table of $U_{10}(10^8)$, ten combinations of σ and ν were selected as the calculation scheme. Owing to limits on the length of the paper, the application table of $U_{10}(10^8)$ and the ten combinations of σ and ν based on it are given in Table 3 of the Supplemental materials. The best combination of σ and ν was finally determined by the highest VCR. In this way, the number of calculations needed to optimize ν and σ was reduced from 100 to 10.

Table 4. Discrimination results of adulterated milk and raw milk without considering the types of adulterants.

Method    VCR-T (%)   VCR-F (%)   CCR (%)   VCR (%)
I-SVM     86.75       92.86       99.63     96.62
IS-KNN    86.78       89.96       90.96     88.89

3.2. Discriminating adulterated milks from raw milks based on six sample sets

We defined ν-SVM combined with uniform design theory for optimizing its parameters as improved SVM (I-SVM for short) in the present work, and applied I-SVM and IS-KNN to discriminate adulterated milks from raw cow milks based on the six sets in Table 2. The classification results of the two methods are listed in Table 3, which shows that most average calibration correct ratios (CCR) were greater than 95%. The average VCR-F and the average VCR-T were very close, and most of them fluctuated in the region of 92–95%. The six NIR classification models built by I-SVM or IS-KNN have not only very good calibration performance but also good prediction performance, no matter what kinds of adulterants are added to the milk. There was no significant difference between the VCR (CCR) of the two methods. This indicates that the two non-linear pattern recognition methods can discriminate well between adulterated milk containing specific adulterants and its original raw milk.
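The saving obtained by replacing the full grid with a uniform design can be sketched as follows. The snippet builds the 100-point grid from the levels in Eq. (10) and a 10-run design in which every level of σ and ν appears exactly once. The design here is generated by a good-lattice-point construction with assumed generators (1, 7) modulo 11; it is not the $U_{10}(10^8)$ application table used in this work, which is given in the Supplemental materials and may pair the levels differently. The scoring function passed to `pick_best` is a hypothetical stand-in for training a ν-SVM model and returning its VCR.

```python
from itertools import product

sigma_levels = [50, 100, 150, 200, 250, 300, 350, 400, 450, 500]
nu_levels = [round(0.10 + 0.01 * k, 2) for k in range(10)]  # 0.10 ... 0.19

def lattice_design(n_runs, generators):
    """Good-lattice-point design: run i gets level (i * h) mod (n_runs + 1) per generator h."""
    modulus = n_runs + 1
    return [tuple((i * h) % modulus for h in generators)
            for i in range(1, n_runs + 1)]

def uniform_design_candidates(generators=(1, 7)):
    """Map the 1-based design levels onto (sigma, nu) pairs."""
    design = lattice_design(10, generators)
    return [(sigma_levels[a - 1], nu_levels[b - 1]) for a, b in design]

def pick_best(candidates, vcr_of):
    """Return the (sigma, nu) pair with the highest score from vcr_of."""
    return max(candidates, key=lambda pair: vcr_of(*pair))

full_grid = list(product(sigma_levels, nu_levels))
candidates = uniform_design_candidates()
print(len(full_grid), len(candidates))  # prints: 100 10

# Hypothetical scoring function used only to exercise pick_best.
best = pick_best(candidates, lambda sigma, nu: -sigma)
```

Because 11 is prime, each generator yields a permutation of the levels 1 to 10, so the ten runs spread evenly over both parameter ranges.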

3.3. Discriminating adulteration milks from raw cow milks regardless of the types of adulterants Whether milk is adulterated or not is more concerned than which kind of adulterants is added. When we did not pay attention to the type of pseudo proteins or thickeners illegally added to raw milk, we divided the milk samples listed in Table 2 into two groups. One was false milk group consisting of 526 adulterated milk samples in AM set 16. Another one was authenticated milk group consisting of 287 raw milk samples in RM set 13. Classification results given by I-SVM and IS-KNN were listed in Table 4 for comparison. Table 4 indicates that without considering the types of proteins and thickeners, I-SVM model has better performance than IS-KNN. The VCR-F of I-SVM model is greater than 92%, and higher than VCR-T about 6%. It means that I-SVM model tends to wrongly discriminate authenticated milk as adulterated milk. However, the VCR-F and VCR-T of any NIR model in Table 3 are very similar because samples’ numbers in the groups of authenticated and adulterated milk groups are very close. Berrueta et al. (2007) indicated that many multivariate techniques, such as LDA, KNN and class-modelling technique, were very sensitive to samples’ number in each class. If the samples’ number in each group of the training set is not approximately equal, the classification model will bias towards the group with the most representatives. Therefore, if possible, equal number of samples in each class should be ensured by designed experiments whatever pattern recognition methods are applied. Zhong et al. (2010) applied IS-KNN, KNN, PLS-DA to authenticate milk based on NIR spectra of 162 adulterated milks containing dextrin and melamine (or urea) and 162 different raw authenticated milks collected from more than 100 pastures. The VCR-T and VCR-F of IS-KNN, KNN, PLS-DA were 86% and 91%, 41% and 52%, 28% and 100%, respectively. 
The results of IS-KNN in this reference were similar to that of present work. It shows that the classification results of NIR models are not correlated to the

Table 3 Classification results of NIR models built by m-SVM and IS-KNN based on six sample sets of Table 2. Thickener type

Pseduo-protein type Urea

Ammonium nitrate

Melamine

VCR-T (%)

VCR-F (%)

CCR (%)

VCR-T (%)

VCR-F (%)

CCR (%)

VCR-T (%)

VCR-F (%)

CCR (%)

I-SVM

Dextrin Starch

92.71 94.46

90.83 94.09

94.12 95.16

95.49 93.37

96.34 91.03

96.48 95.73

95.13 94.06

93.86 97.62

96.07 95.16

IS-KNN

Dextrin Starch

86.46 95.49

95.00 95.28

90.44 97.72

96.48 96.01

95.39 94.77

95.83 94.76

95.48 94.07

91.48 97.65

96.48 95.70

VCR-T and VCR-F are the average validation correct ratio for true and false validation samples, respectively. CCR are the average calibration correct ratio for all training samples without considering their category.

347

Table 5
The effect of AS concentrations on milk discrimination results.

Method  5% AS                            10% AS                           15% AS
        VCR-T (%)  VCR-F (%)  CCR (%)    VCR-T (%)  VCR-F (%)  CCR (%)    VCR-T (%)  VCR-F (%)  CCR (%)
I-SVM   87.83      85.94      94.41      89.34      86.80      95.89      90.61      91.53      96.34
IS-KNN  86.76      87.08      90.75      90.33      86.75      91.33      92.35      92.68      95.70

adulterant type but are greatly influenced by the pattern recognition method when the milk samples are very complicated. Compared with I-SVM, the VCR-F and VCR-T given by the IS-KNN method are close to each other; IS-KNN is not as sensitive to the number of samples in each class as I-SVM.

3.4. Effect of concentration of adulteration solution on discriminating adulteration milks from raw cow milks

As reported in the literature (Dong et al., 2009), adulterated milk samples designed to contain 50% adulteration solution (AS) yield a very good NIR pattern recognition model, with 100% CCR and VCR. However, for a technique intended to solve practical problems, the sample selection and design behind an NIR classification model should be reasonable and should represent the practical status of samples. Neimeng province (Inner Mongolia) is the main milk production area in China. Based on a cost estimate (Hu, 2012), adding 10% adulteration solution to raw milk would earn a dairy farmer in Neimeng province an extra ¥60,000 annually, which is more than twice a dairy farmer's ordinary annual income. If too much adulteration solution is added to raw milk, the adulterated milk runs a high risk of being detected. Moreover, Brazilian police investigations indicate that adulterated liquid cow's milk contains 10–15% (v/v) added water (Santos et al., 2012). A reasonable and profitable AS concentration is therefore 10–15%, and the AS concentration in the present study was designed at three levels: 5%, 10% and 15%.

To investigate the influence of AS content on the NIR classification models, the 526 adulterated milk samples in AM sets 1–6 were divided into three groups according to AS content. AM groups 1, 2 and 3 consisted of AM samples with AS concentrations of 5%, 10% and 15%, respectively. Three NIR classification models were then established by each of I-SVM and IS-KNN, based on AM group 1 + RM sets 1–3, AM group 2 + RM sets 1–3 and AM group 3 + RM sets 1–3.
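The grouping procedure described above (one model per AS level, each trained on that AM group plus the raw milk samples) can be sketched as follows. This is a minimal illustration under stated assumptions: it uses a plain 1-nearest-neighbour rule, not the IS-KNN or I-SVM of this work, and the two-feature "spectra" are synthetic toy data. The assumption that a higher AS level shifts the signal further from raw milk is hypothetical and built into the toy data; all function names are invented for the example.

```python
import math
import random


def nearest_neighbour_label(x, train):
    """Plain 1-nearest-neighbour rule (Euclidean); not the paper's IS-KNN."""
    return min(train, key=lambda item: math.dist(x, item[0]))[1]


def validate(train, test):
    """Percentage of test samples whose predicted label matches the true one."""
    hits = sum(1 for x, y in test if nearest_neighbour_label(x, train) == y)
    return 100.0 * hits / len(test)


def make_samples(rng, centre, label, n):
    """Synthetic two-feature samples scattered around a centre (toy data)."""
    return [([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)], label)
            for _ in range(n)]


rng = random.Random(0)
raw_milk = make_samples(rng, 0.0, "raw", 60)
# Hypothetical assumption: a higher AS level moves samples further from raw milk.
am_groups = {level: make_samples(rng, shift, "adulterated", 60)
             for level, shift in [(5, 1.0), (10, 2.0), (15, 3.0)]}

results = {}
for level, group in am_groups.items():
    data = raw_milk + group          # one AM group + the same raw milk set
    rng.shuffle(data)
    split = int(0.7 * len(data))     # 70% calibration, 30% validation
    results[level] = validate(data[:split], data[split:])
    print(f"{level}% AS: validation correct ratio = {results[level]:.1f}%")
```

Keeping the raw milk set fixed across the three models means that any difference between the models comes from the AM group alone.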
Table 5 shows that both CCR and VCR increase as the concentration of adulteration solution increases. Santos and Pereia-Filho (2013) successfully applied KNN and SIMCA to digital images to discriminate 130–184 adulterated milk samples, containing water, whey, synthetic urine, hydrogen peroxide and synthetic milk respectively, from 135 fresh cow milks. They found that the samples were grouped according to their initial level of adulteration rather than the adulterant present. Their results and the findings of the present work illustrate that the AS level greatly influences the results of classification models, whichever kind of signal is used to authenticate milk. Increasing the AS concentration in adulterated milk can improve the performance of NIR models. However, it is of no practical significance to use adulterated milk samples containing more than 20% adulteration solution merely to obtain much better classification results by the NIR technique.

4. Conclusions

When the concentration of adulteration solution equals or exceeds 5%, the two non-linear pattern recognition methods I-SVM and IS-KNN, combined with NIR spectroscopy, can reliably distinguish authenticated raw milk from milk adulterated with various pseudo proteins and thickeners. More than 800 representative milk samples verified the methods proposed in this article. By means of a uniform design table and its application table, the amount of calculation needed to optimize the SVM parameters can be greatly reduced. I-SVM gives better results when the types of adulterants are not distinguished. Compared with I-SVM, the classification results of IS-KNN are not significantly influenced by the numbers of samples in the different groups. To build NIR classification models with good performance, it is better to keep the numbers of samples in the different groups close. Because pattern recognition methods are not matrix-independent, classification models should be built and validated with representative samples to ensure the model's applicability. The authenticated and adulterated liquid milk samples in the present work were carefully designed to reflect the practical status of milk as far as possible, so the proposed method should adapt well to the practice of monitoring milk authenticity by NIR spectroscopy.

The present study and the work of Santos and Pereia-Filho (2013) suggest that the performance of pattern recognition models for discriminating adulterated milks, whether based on NIR spectral signals or colour images, is positively correlated with the content of adulteration solution. Since water makes up more than 90% of the adulteration solutions, the results indicate that the more water is added to milk, the better the classification results are. It can be inferred that the water signal in the NIR spectra of milk is a key factor influencing the discrimination of adulterated milk. Further study and discussion will be reported in our subsequent work. The present study and the work of Zhong et al. (2010) and Ni et al. (2009) indicate that non-linear pattern recognition methods are more suitable for discriminating complicated adulterated milk samples prepared from different authenticated liquid milks.
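The saving from a uniform design, as opposed to an exhaustive grid over the SVM parameters, can be sketched as follows. This is an illustration under stated assumptions, not the design table used in this work. It builds a two-factor table by the good lattice point construction, one standard way of generating uniform designs. The parameter names (`nu`, `gamma`), their ten levels each and the generator vector (1, 3) are all hypothetical choices made for the example.

```python
from itertools import product
from math import gcd


def good_lattice_design(n, generators):
    """Build an n-run table: u_ij = (i * h_j) mod n, with 0 mapped to n.

    Each generator h_j must be coprime with n so that every column is a
    permutation of the levels 1..n.
    """
    if any(gcd(h, n) != 1 for h in generators):
        raise ValueError("each generator must be coprime with n")
    return [tuple(((i * h - 1) % n) + 1 for h in generators)
            for i in range(1, n + 1)]


# Hypothetical parameter levels (assumed for illustration only).
nu_levels = [0.05 * k for k in range(1, 11)]      # 10 levels
gamma_levels = [2.0 ** e for e in range(-8, 2)]   # 10 levels: 2^-8 .. 2^1

design = good_lattice_design(10, (1, 3))
# Map the 1-based level indices of each run to actual parameter values.
candidates = [(nu_levels[a - 1], gamma_levels[b - 1]) for a, b in design]

full_grid = list(product(nu_levels, gamma_levels))
saving = 100.0 * (1 - len(candidates) / len(full_grid))
print(f"full grid: {len(full_grid)} runs, uniform design: {len(candidates)} runs")
print(f"runs saved: {saving:.0f}%")
```

Each candidate pair would then be scored, for example by cross-validation, and the best-scoring pair kept. Every level of each parameter is visited exactly once, which is how the design spreads a small number of runs across the whole search region.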
Acknowledgements

We thank the Shanghai Dairy Research Institute, which provided and authenticated all raw cow milk samples used in the present work. None of the authors has a direct financial relationship with any commercial entity mentioned in this article.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.foodchem.2013.08.064.

References

Balabin, R. M., & Smirnov, S. V. (2011). Melamine detection by mid- and near-infrared (MIR/NIR) spectroscopy: A quick and sensitive method for dairy products analysis including liquid milk, infant formula, and milk powder. Talanta, 85, 562–568.
Berrueta, L. A., Alonso-Salces, R. M., & Héberger, K. J. (2007). Supervised pattern recognition in food analysis. Journal of Chromatography A, 1158, 196–214.
Bertran, E., Blanco, M., Maspoch, S., Ortiz, M. C., Sánchez, M. S., & Sarabia, L. A. (2000). Handling intrinsic non-linearity in near-infrared reflectance spectroscopy. Chemometrics and Intelligent Laboratory Systems, 49, 215–224.
Borin, A., Ferrao, M., Mello, C., Maretto, D., & Poppi, R. (2006). Least-squares support vector machines and near infrared spectroscopy for quantification of common adulterants in powdered milk. Analytica Chimica Acta, 579, 25–32.
Chan, E. Y. Y., Griffiths, S. M., & Chan, C. W. (2008). Public-health risks of melamine in milk products. The Lancet, 372, 1444–1445.


Chen, Q. S., Zhao, J. W., Fang, C. H., & Wang, D. M. (2007). Feasibility study on identification of green, black and Oolong teas using near-infrared reflectance spectroscopy based on support vector machine (SVM). Spectrochimica Acta Part A, 66, 568–574.
Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20, 273–297.
deGroot, P. J., Postma, G. J., Melssen, W. J., & Buydens, L. M. C. (1999). Selecting a representative training set for the classification of demolition waste using remote NIR sensing. Analytica Chimica Acta, 392, 67–75.
Dong, Y. W., Tu, Z. H., Zhu, D. Z., Liu, Y. W., Wang, Y. N., Huang, J. L., et al. (2009). Feasibility of using NIR spectroscopy to detect melamine in milk. Spectroscopy and Spectral Analysis (China), 29, 2934–2938.
Downey, G., Fouratier, V., & Kelly, J. D. (2003). Detection of honey adulteration by addition of fructose and glucose using near infrared transflectance spectroscopy. Journal of Near Infrared Spectroscopy, 11, 447–456.
Downey, G., McIntyre, P., & Davies, A. N. (2002). Detecting and quantifying sunflower oil adulteration in extra virgin olive oils from the eastern Mediterranean by visible and near-infrared spectroscopy. Journal of Agricultural and Food Chemistry, 50, 5520–5552.
Fang, K. T., Lin, D. K. J., Winker, P., & Zhang, Y. (2000). Uniform design: Theory and application. Technometrics, 42, 237–248.
http://news.sohu.com/s2008/babyshenjieshi/, accessed on 04.27.13.
Hu, M. H. (2012). Neimenggu statistical yearbook 2012. Beijing: Chinese Statistics Press.
Lai, Y. H., Ni, Y. N., & Kokot, S. (2011). Discrimination of Rhizoma Corydalis from two sources by near-infrared spectroscopy supported by the wavelet transform and least-squares support vector machine methods. Vibrational Spectroscopy, 56, 154–160.
Li, L., & Ding, W. (2010). Discriminant analysis of raw milk adulterated with botanical filling material using near infrared spectroscopy. Spectroscopy and Spectral Analysis (China), 30, 1238–1242.
Liang, Y. Z., Fang, K. T., & Xu, Q. S. (2001). Uniform design and its applications in chemistry and chemical engineering. Chemometrics and Intelligent Laboratory Systems, 58, 43–57.
Lin, M. (2009). A review of traditional and novel detection techniques for melamine and its analogues in foods and animal feed. Frontiers of Chemical Engineering in China, 3, 427–435.
Mauer, L. J., Chernyshova, A. A., Hiatt, A., Deeding, A., & Davis, R. (2009). Melamine detection in infant formula powder using near- and mid-infrared spectroscopy. Journal of Agricultural and Food Chemistry, 57, 3974–3980.
Ni, L. J., Zhang, L. G., Xie, J., & Luo, J. Q. (2009). Pattern recognition of Chinese flue-cured tobaccos by an improved and simplified K-nearest neighbors classification algorithm on near infrared spectra. Analytica Chimica Acta, 633, 43–50.
Ni, L. J., Huang, S. H., Guo, M. L., Zhang, X., Zhang, C., Zhang, L. G., & Wang, X. (2009). Rapid identification of adulteration milk by near infrared spectroscopy. In Proceedings of 14th international conference of near infrared spectroscopy, P182, November 8–13, 2009. Bangkok, Thailand.
Puchwein, G. (1988). Selection of calibration samples for near-infrared spectrometry by factor analysis of spectra. Analytical Chemistry, 60, 569–573.
Santos, P. M. D., Wentzell, P. D., & Pereia-Filho, E. R. (2012). Scanner digital images combined with color parameters: A case study to detect adulterations in liquid cow’s milk. Food Analytical Methods, 5, 89–95.
Santos, P. M. D., & Pereia-Filho, E. R. (2013). Digital image analysis-an alternative tool for monitoring milk authenticity. Analytical Methods (Print). http://dx.doi.org/10.1039/C3AY40561C.
Santos, P. M., Pereira-Filho, E. R., & Rodrigues-Saona, L. E. (2013a). Rapid detection and quantification of milk adulteration using infrared microspectroscopy and chemometrics analysis. Food Chemistry, 138, 19–24.
Santos, P. M., Pereira-Filho, E. R., & Rodrigues-Saona, L. E. (2013b). Application of handheld and portable infrared spectrometers in bovine milk analysis. Journal of Agricultural and Food Chemistry, 61, 1205–1211.
Schölkopf, B., Smola, A. J., Williamson, R. C., & Bartlett, P. L. (2000). New support vector algorithms. Neural Computation, 12, 1207–1245.
Thissen, U., Pepers, M., Üstün, B., Melssen, W. J., & Buydens, L. M. C. (2004). Comparing support vector machines to PLS for spectral regression applications. Chemometrics and Intelligent Laboratory Systems, 73, 169–179.
Vapnik, V. (1998). Statistical learning theory. New York: John Wiley & Sons Inc.
Wentzell, P. D., & Montoto, L. V. (2003). Comparison of principal components regression and partial least squares regression through generic simulations of complex mixtures. Chemometrics and Intelligent Laboratory Systems, 65, 257–279.
Yu, K., & Cheng, Y. (2006). Discriminating the genuineness of Chinese medicines using least squares support vector machines. Chinese Journal of Analytical Chemistry, 34, 561–564.
Yuan, S. L., He, Y., Ma, T. Y., Wu, D., & Nie, P. C. (2009). Fast determination of melamine content in milk base on Vis/NIR spectroscopy method. Spectroscopy and Spectral Analysis (China), 29, 2939–2942.
Zhong, Z. Z., Zhang, L. G., Zhang, X., Gu, X., & Ni, L. J. (2010). The feasibility study of discriminating adulteration raw milk by near infrared technology. Computers and Applied Chemistry (China), 27, 1691–1693.
Zhou, D. Z., Ji, B. P., Meng, C. Y., Shi, B. L., Tu, Z. H., & Qing, Z. S. (2007). The performance of ν-support vector regression on determination of soluble solids content of apple by acousto-optic tunable filter near infrared spectroscopy. Analytica Chimica Acta, 598, 227–234.
