Environmental Pollution 206 (2015) 217e226

Contents lists available at ScienceDirect

Environmental Pollution journal homepage: www.elsevier.com/locate/envpol

Rapid identification of soil cadmium pollution risk at regional scale based on visible and near-infrared spectroscopy Tao Chen a, *, Qingrui Chang a, **, J.G.P.W. Clevers b, L. Kooistra b a b

College of Resources and Environment, Northwest A&F University, Shaanxi, Yangling 712100, China Laboratory of Geo-Information Science and Remote Sensing, Wageningen University, Droevendaalsesteeg 3, 6708 PB Wageningen, The Netherlands

a r t i c l e i n f o

a b s t r a c t

Article history: Received 17 March 2015 Received in revised form 4 July 2015 Accepted 7 July 2015 Available online xxx

Soil heavy metal pollution due to long-term sewage irrigation is a serious environmental problem in many irrigation areas in northern China. Quickly identifying its pollution status is an important basis for remediation. Visible-near-infrared reflectance spectroscopy (VNIRS) provides a useful tool. In a case study, 76 soil samples were collected and their reflectance spectra were used to estimate cadmium (Cd) concentration by partial least squares regression (PLSR) and back propagation neural network (BPNN). To reduce noise, six pre-treatments were compared, in which orthogonal signal correction (OSC) was first used in soil Cd estimation. Spectral analysis and geostatistics were combined to identify Cd pollution hotspots. Results showed that Cd was accumulated in topsoil at the study area. OSC can effectively remove irrelevant information to improve prediction accuracy. More accurate estimation was achieved by applying a BPNN. Soil Cd pollution hotspots could be identified by interpolating the predicted values obtained from spectral estimates. © 2015 Elsevier Ltd. All rights reserved.

Keywords: Soil cadmium Visible and near-infrared spectroscopy Orthogonal signal correction Back propagation neural network Indicator kriging

1. Introduction Soil heavy metal pollution has become an important environmental issue (Tchounwou et al., 2012). Amongst various heavy metal elements, cadmium (Cd) is identified as an extremely significant pollutant because of its high transfer rate from soil to plants and strong bio-toxicity (Das et al., 1997; Mclaughlin and Singh, 1999; Satarug et al., 2010). Excessive Cd uptake can result in plant growth inhibition and even death (Gussarsson et al., 1996; Haghiri, 1974). Worse is that it also can enter the human body through the food chain, threatening health, especially for children and pregnant women (Peralta-Videa et al., 2009). Many studies have shown that soil Cd is an important source of human Cd intake (Jarup, 2003). Therefore, quickly obtaining information on its pollution level and spatial distribution is very important for remediation. However, the conventional method of determining Cd concentration in soil requires complex, multi-step laboratory analyses that involve strong acid digestion and concentration process (Gholizadeh et al., 2015). This method is time-consuming,

* Corresponding author. ** Corresponding author. E-mail addresses: [email protected] (T. Chen), [email protected] (Q. Chang). http://dx.doi.org/10.1016/j.envpol.2015.07.009 0269-7491/© 2015 Elsevier Ltd. All rights reserved.

expensive and requires excellent operation skills. Soil reflectance spectroscopy provides a promising tool for the efficient detection and monitoring of soil contaminants (Ben-Dor et al., 1997; Gholizadeh et al., 2015). Using reflectance spectroscopy measurements in the visible-near-infrared (VNIR 400e1100 nm) and shortwave-infrared (SWIR 1100e2500 nm) range, some direct and indirect soil properties can be extracted (Gholizadeh et al., 2015). These chemometric processes are termed NIRA (near-infrared reflectance analysis) (Stenberg and Rossel, 2010). Dalal and Henry (1986) used the NIRA method to predict soil moisture, organic carbon, and total nitrogen contents in Australia. Ben-Dor and Banin (1995) simultaneously evaluated soil clay content, surface area, cation exchange capacity, hygroscopic moisture, OM and CaCO3 contents in Israel. As regards heavy metals, the first report on quantitative prediction by NIRA was published by Malley and Williams (1997). Hereafter, some similar studies were performed in different mining areas and floodplains (Choe et al., 2008; Moros et al., 2009; Siebielec et al., 2004). In recent years, more and more focus was put on estimating low concentrations of heavy metals in agricultural soils (Ren et al., 2009; Wang et al., 2014; Wu et al., 2005), which had a close relationship with food safety. Among these existing studies, researchers found that certain specific information was not easy to be extracted and stripped from weak soil spectral signals. Due to the comparable size of the

218

T. Chen et al. / Environmental Pollution 206 (2015) 217e226

wavelengths in VNIR-SWIR electromagnetic radiation and particle sizes in soil samples, the reflectance spectroscopy is a battleground for undesired scatter effects (both baseline shift and nonlinearities) that influence the sample spectra (Rinnan et al., 2009). By applying suitable pre-treatments, these effects can largely be eliminated. For example, Ben-Dor et al. (1997) used first and second derivative absorption spectra to enhance the spectral information. Stenberg and Rossel (2010) compared the effects of log(1/R) transformation, first derivative, and SNV-Detrending on soil reflectance spectra. Hively et al. (2011) compared 30 combinations of spectral math pre-treatments and found that first derivative worked well for nearly all analytes (including soil carbon, particle size distribution, and some agronomical elements). Generally speaking, these pre-treatments are divided into two categories: scatter-correction and spectral derivative (Buddenbaum and Steffens, 2012). No matter which pre-treatment technique is used, its spectral math manipulations are performed only based on the spectral matrix X without considering soil physical-chemical components Y (response variable). However, orthogonal signal correction (OSC), as a commonly used approach in food science, can purposefully remove irrelevant information from X that is mathematically orthogonal to target matrix Y, or as close to orthogonal as possible (Wold et al., 1998). The result is that more valuable spectral information is retained that is closely related to the target variable (Y). Unfortunately, so far no study was reported about the application of OSC in soil science, especially in heavy metal estimation. In addition, choosing the appropriate calibration method can also help to achieve a more reliable prediction model in NIRA. At present, many studies used Multiple Linear regression (MLR) (Kemper and Sommer, 2002), Principal Component Regression (PCR) (Wu et al., 2005), or Partial Least Squares Regression (PLSR) (Kooistra et al., 2001) to estimate the concentrations of soil heavy metals, in which PLSR became one of the most popular calibration technologies. As a classic linear calibration method, it inevitably had some deficiencies when it was used to model some non-linear relationships. Artificial neural networks (ANN) overcomes this problems, which has the ability to model linear or non-linear relationships between a set of input and output variables (Ben-Dor et al., 1997). For example, Daniel et al. (2003) confirmed the capability of ANN to estimate soil macronutrients by solving difficulties incurred from high cross-channel correlations. Meanwhile, it has also been used to estimate soil salt (Farifteh et al., 2007), bulk density (Al-Asadi and Mouazen, 2014) and water infiltration rate (Goldshleger et al., 2012) etc. However, to our knowledge research on the application of ANN in estimating soil heavy metals is still rarely reported. Sewage irrigation is an important pollution source of soil heavy metals. In the past few decades, sewage irrigation in northern China has become a common phenomenon, especially in the arid and semi-arid regions. In recent years, to improve the eco-environment, remediation of the pollution is undertaken by the government by prohibiting agricultural practices, covering with new soil or phytoextraction. In this process, rapid and repeated assessment of the pollution status is necessary in order to decide the moment that the pollution has reduced to acceptable levels. However, it is difficult for the conventional method to meet the practical needs due to its high costs and time-consuming characteristics. Therefore, in this study, based on different spectral pretreatments and calibration methods, soil Cd pollution in a typical sewage irrigation area was estimated using the NIRA technique in the VNIR region. On this basis, the spatial distribution and pollution hotspots of Cd were obtained. This case study aims to: i) identify most effective pre-processing and calibration methods by comparing the estimation accuracy of soil Cd; ii) reveal the estimation mechanism in the typical sewage irrigation area; and iii)

explore the feasibility of rapidly identifying soil Cd pollution hotspots at regional scale based on the NIRA technique and geostatistics. 2. Materials and methods 2.1. Study area and sampling This study was conducted in the Fenghui irrigation region, which is situated in the northwest of Xi'an City, Shaanxi Province, China (Fig. 1). Since the early 1950s sewage irrigation has started in this region and gradually has expanded. With the enhancement of environmental consciousness, sewage irrigation fully stopped in 2010. This region has distinctive seasons with an annual mean temperature of 13.4  C and a mean annual precipitation of 580 mm (Chen et al., 2014). Local terrain is relatively flat, with an average elevation of 382 m. The dominant soil type is Loessial soil (HapliUstic Cambosols according to the Chinese soil taxonomy), with an average pH of 8.5 and a silty soil texture. Due to long-term sewage irrigation, soil nutrient contents are generally higher than the mean standard of Shaanxi soils (Chen et al., 2014). The crop rotation is mostly wheat (Triticum aestivum L.) followed by maize (Zea mays L.). Two soil sampling campaigns were carried out in May 2010 and May 2012. The first sampling consisted of 52 sites spread over the study area in a random design and in total 52 soil samples were collected (depth 0e20 cm). In the second sampling, 8 sampling sites were chosen according to the distance from the irrigation canal. At each site, three soil samples were collected, totaling 24 samples: 8 surface (depth 0e20 cm), 8 subsurface (20e40 cm), and 8 mixed (0e40 cm). The location of each sampling site was determined by GPS. As a result, a total of 76 samples were collected. Soil samples were air-dried at room temperature, removed of stones or other debris, and passed through a 0.149 mm mesh. The samples were analyzed for soil organic matter (SOM), total Fe2O3, and Cd. In the chemical analysis, SOM was determined by the K2Cr2O7eH2SO4 oxidation method and total Fe2O3 concentrations were determined by power X-ray fluorescence spectrometry. Concentrations of Cd were measured by an inductively coupled plasma mass spectrometer after digestion with nitric acid (HNO3) and perchloric acid (HClO4) (Agricultural Chemistry Committee of China, 1983). Duplicate tests were carried out, and quality control procedures were conducted by using certified reference materials (GSS-17 and GSS-19). 2.2. Spectral measurement and preprocessing Soil spectra were measured by a portable FieldSpec® Spectrometer (ASD Inc., USA), which covers the VNIR region (325e1075 nm) and offers a spectral resolution of 3 nm at 700 nm. Spectral sampling interval is 1.4 nm and data were resampled to 1 nm output values. The spectral measurements were performed in a dark room. A 50 W halogen lamp pointing at an angle of 30 from nadir was mounted on a tripod (40 cm above samples) as the stable light source. A 7.5 field-of-view at nadir was chosen and the spectrometer was fixed at 40 cm above the center of the samples. Each soil sample was uniformly tiled in a petri dish and was scanned 10 times. The average spectral curve was calculated and further used for pre-treatment and subsequent modeling. The spectrometer was recalibrated after every 10 samples using a white BaSO4 panel. In order to reduce noise and calculation time, the reflectance curves firstly were smoothed using a pseudo-Gaussian method, and the spectral bands in the intervals 325e399 nm and 1001e1075 nm were removed. Then, the reflectance values were resampled across a 9-nm wide window, which resulted in 67

T. Chen et al. / Environmental Pollution 206 (2015) 217e226

219

Fig. 1. Study area and sampling points.

reflectance values for each sample (400e1000 nm). PLSR results of the spectra without further processing are the reference for the other pre-processing methods. In the collected soil spectra, there are always some undesired systematic variations which are primarily caused by light scattering and differences in spectroscopic path length (Wold et al., 1998). These unwanted variations often constitute the major part of the total variation in the sample set, and can be observed as shifts in

baseline (multiplicative effects) and non-linearities (Rinnan et al., 2009). For removal of these systematic variations in soil spectra, two types of pre-treatments are commonly applied before calibration modeling, differentiation and signal correction (Wold et al., 1998). In this study, a total of 6 pre-processing were compared. The first category includes Multiplicative Scatter Correction (MSC), Standard Normal Variate (SNV), OSC, and Absorbance

220

T. Chen et al. / Environmental Pollution 206 (2015) 217e226

transformation (ABS, log(1/R)). The second category are mainly spectral derivatives which include first derivative of reflectance (1st D) and second derivative of reflectance (2nd D). MSC and SNV are often used to remove multiplicative interferences of scatter and particle size (Buddenbaum and Steffens, 2012). The concept behind MSC is that artifacts or imperfections (e.g., undesirable scatter effect) will be removed from the spectral matrix X prior to data modeling. However, because MSC cannot in a simple way be orthogonalized against target variables Y, it may remove from X information related to Y (Wold et al., 1998). The basic format for SNV is the same as that for the traditional MSC. As already discussed by Dhanoa et al. (1994), there is an obvious similarity between SNV and MSC. This means that MSC and SNV are the same for most practical applications. In addition, OSC could also accomplish a filtering similar to MSC and SNV, but without removing relevant information from X. In OSC pre-processing, the largest variation in matrix X having zero correlation with the target matrix Y can be selectively removed from X (Kim et al., 2005; Wold et al., 1984). In addition, the reflectance measurements are frequently converted to log(1/R) values, which are then used in a manner similar to optical density readings (Nicolaï et al., 2007). Spectral derivative transformation is one of the best methods for removing baseline effects (Duckworth, 2004). The first derivative removes only the baseline; the second derivative removes both baseline and linear trend (Rinnan et al., 2009). In this study, the first and second derivatives were calculated according to the Savitzky-Golay algorithm (Gorry, 1990). To find the derivative at center point, a polynomial is fitted in a symmetric window on the raw spectra. In the Savitzky-Golay 1st and 2nd derivation procedure, a first-order and second-order polynomial was fitted to a spectral window size of 7 data points, respectively. All pre-treatments were implemented in Matlab 7.14 (Works, Inc., USA).

Therefore, a reasonable network setting is of great importance. In this study, the network input layer related to soil reflectance, while the output layer related to soil Cd concentration. A tan-sigmoid transfer function was used in the hidden layer, which allows to approximate non-linear relations between input and output layers (Haykin, 1994). The number of neurons was determined using a trial and error approach suggested by Chiang et al. (2004). Detailed theory can be found in related literature (Haykin, 1994; Hornik et al., 1989). PLSR and BPNN modeling was performed in Matlab 7.14 (Works, Inc., USA), in which the libPLS package (Li et al., 2014) and the Neural Network Toolbox were used, respectively.

2.3.3. Assessment of the prediction quality In this study, all samples were divided into calibration (56 samples) and validation (20 samples) sets based on the KennardStone algorithm (Kennard and Stone, 1969). Calibration samples were used to establish an estimation model, whereas validation samples were used to assess the model's predictive ability (Pandit et al., 2010). Three parameters were calculated to assess prediction accuracies: the coefficient of determination (R2), root mean square error of prediction (RMSE), and residual prediction deviation (RPD). Generally speaking, a robust model should have high R2, RPD and low RMSE. Because RMSE depends on the range of measured values, R2 and RPD are often used to compare and evaluate prediction accuracies. In this study, goodness of prediction is evaluated following the qualitative classifications of Chang et al. (2001). They suggested: i) useless models with R2  0.5 or RPD  1.4; ii) medium quality models with 0.5 < R2  0.80 or 1.4 < RPD  2.0; and iii) excellent models with R2>0.80 or RPD > 2.0.

2.3. Calibration methods 2.3.1. Partial least squares regression PLSR is often adopted to construct predictive models when there are many predictor variables that are highly collinear (Viscarra Rossel et al., 2006). This method is based on latent variable decomposition of two blocks of variables, predictor matrices X (spectra) and response matrices Y (soil attributes) (Kooistra et al., 2001). By fitting a PLSR model, a few new factors, called latent variables (LVs), can be extracted. Generally, the LV matrices have a much lower dimension than matrices X. The selection of the number of LVs is critical to preventing over or under fitting (Volkan Bilgili et al., 2010). Leave-one-out cross-validation is used to determine the number of LVs. To select the optimal model, the root mean square error (RMSE) of predictions is computed. The model with the lowest RMSE is selected. Details on the PLSR algorithm can be found in Haenlein and Kaplan (2004). 2.3.2. Artificial neural network ANNs can be seen as universal model-free approximators representing any non-linear function (Shi et al., 2014). The most popular type of ANN is the back propagation neural network (BPNN). In BPNN, the learning process is based on a series of weight adjustments between so-called nodes in the network in order to minimize a global error between predicted outputs and measured values (Liu and Liu, 2013). The performance of BPNN is very sensitive to the proper setting of the neural network parameters, such as the learning rate, the number of neurons in a hidden layer, and the transfer functions. Too many neurons in the hidden layer not only require more calculation time, but also may cause over-fitting (Huang and Foo, 2002). If the learning rate is set too high, the algorithm can oscillate and become unstable (Liu and Liu, 2013).

2.4. Spatial distribution and risk assessment Many studies have proven that geostatistics is an effective tool in characterizing spatial dependence of soil variables and revealing their regional distribution (Chen et al., 2009). It can use the variogram to provide input parameters for spatial interpolation. These interpolation methods are termed kriging (Webster and Oliver, 2007), in which ordinary kriging is the most commonly used to obtain the spatial distribution of soil properties. Indicator kriging is another interpolation method, which can estimate the probability of exceeding a specific threshold, Zk, at a given location (Goovaerts, 1997). It does not require the data to follow a normal distribution and can inhibit the influence of specific values on the robustness of the variogram. In this method, the stochastic variable, Z(u), is transformed into an indicator variable with a binary distribution (Mamat et al., 2014):

 i ¼ ðu; Zk Þ ¼

1; 0;

ZðuÞ  Zk otherwise

Then, at an un-sampled location, U0, the indicator kriging estimator is obtained based on the following equation:

i* ðU0 ; Zk Þ ¼

n X

  lj i Uj ; Zk

j¼1

where i(Uj;Zk) is the value of the indicator at sampled location j, i*(U0;Zk) is the estimated value at the un-sampled location, n is the number of samples, and lj is the weight coefficient of i*(U0;Zk) and can be used to solve the following equations:

T. Chen et al. / Environmental Pollution 206 (2015) 217e226

221

Table 1 Statistical descriptions for soil Cd concentration (mg kg1) in the study area. Soil Cd mg kg1

Min

Max

Mean ± SD

Skewness

Kurtosis

C.V. %

Pollution ratio %

Calibration set (n ¼ 56) Validation set (n ¼ 20) Whole dataset (n ¼ 76)

0.37 0.392 0.37

5.60 3.596 5.60

1.55 ± 1.26 1.40 ± 0.97 1.51 ± 1.19

1.68 1.01 1.63

2.33 0.08 2.36

81.14 69.38 78.48

85.71 80.00 84.21

Pollution ratio: the percentage of polluted samples (threshold, 0.6 mg kg1).

Table 3 The correlation coefficient between Cd and soil properties (OM and Fe2O3) in the study area. Pearson correlation

OM

Fe2O3

Cd

OM Fe2O3 Cd

1 0.450a 0.335a

1 0.275b

1

a b

Correlation at the 0.01 significant level. Correlation at the 0.05 significant level.

8 n   P > > lj gi Ui  Uj ; Zk þ h ¼ gi ðUi  U0 ; Zk Þ >
0.80 or RPD > 2.0) suggested by Chang et al. (2001), the calibration model could not provide excellent prediction for soil Cd concentration. This

Table 2 Summary statistics for the estimation models of Cd obtained by PLSR based on different pre-treatments. Pre-treatments

Selected wavelengths

LVs

Original SNV MSC ABS 1st D 2nd D OSC

Original: raw spectra. 1st D: Savitzky-Golay 1st derivative. 2nd D: Savitzky-Golay 2nd derivative.

5 6 6 5 6 11 5

Calibration

Validation

RMSEc

R2 c

RMSEp

R2 p

RPD

0.7842 0.9572 0.9600 0.7705 0.8354 0.8086 0.6597

0.6064 0.4136 0.4101 0.6200 0.5533 0.5815 0.7214

0.6388 0.7834 0.7832 0.6398 0.6601 0.6752 0.5518

0.5684 0.3675 0.3671 0.5563 0.6198 0.5220 0.7186

1.5141 1.2346 1.2350 1.5118 1.4653 1.4326 1.7529

222

T. Chen et al. / Environmental Pollution 206 (2015) 217e226

phenomenon also existed in other researchers' works. Gholizadeh et al. (2015) obtained similar prediction results by PLSR in a brown coal mining dumpsite (R2 p ¼ 0.57 and RPD ¼ 1.68). Wu et al. (2007) also got poor prediction for Cd in Nanjing (R2 p ¼ 0.20 and RPD ¼ 1.23). It can thus be seen that soil Cd pollution was not easy to be identified by raw spectra. To obtain better estimations, it is necessary to use various pre-processing methods to purify spectral information. As indicated by R2 p and RMSEp, PLSR using OSC transformed spectra and selected wavelengths for predicting Cd concentration provided the best prediction and accuracy (R2 p ¼ 0.7186 and RMSEp ¼ 0.5518) of all pre-processing models (Table 2). The PLSR model based on the ABS transformation (log(1/R)) obtained the same precision for prediction (R2 p ¼ 0.5563 and RMSEp ¼ 0.6398). In addition, the rest of the pre-treatments (MSC, SNV, 1st D and 2nd D) could not acquire better prediction results relative to raw spectra, in which the worst pre-treatment was SNV, followed by MSC. By comparing the RPD values, the prediction accuracy followed the order: OSC > ABS or raw spectra >1st D or 2nd D > MSC or SNV. Only the OSC pre-treatment yielded better estimation results, whereas the others could not improve estimates. The main reason may be that OSC can effectively eliminate spectral noise that has no or little correlation with soil Cd. In OSC, the correction filter is expressed in such a way that its orthogonality to the target matrix, Y, can be quantified with a bilinear structure, i.e., a product of a ‘score’ matrix times a loading matrix. Through removing the bilinear structure, information irrelevant to Y can be eliminated from spectral matrix, X (Wold et al., 1998). So, application of OSC resulted in a tight correlation between soil spectra and Cd concentrations. However, MSC and SNV cannot be expressed as a bilinear expansion, which means that there are difficulties to orthogonalize against Y. Therefore, some valuable information related to Y may be removed from X, which resulted in lower estimation accuracy. In addition, MSC and SNV methods are mainly designed for baseline corrections, so their benefit is small if the baseline does not vary much. The derivatives emphasize noise in the data more distinctly than the other methods, so they should only be used when a very low noise level is ensured (Buddenbaum and Steffens, 2012). In this study, first and second spectral derivatives failed to achieve good modeling results. This may mainly be due to severe noise existing in the raw spectra which disturbed useful information to be extracted. 3.3. Estimation mechanism Many studies have reported that the mechanisms for estimating soil heavy metal by VNIRS were mainly based on the internal relations between heavy metals and spectrally active properties, such as soil Fe-oxides, organic matter, and clays (Kooistra et al., 2001; Wu et al., 2007). In order to clarify the major mechanisms in sewage irrigation areas, a correlation analysis was carried out between soil Cd, OM and Fe2O3, and results are shown in Table 3. In this study, soil Cd showed a significantly positive correlation

Table 4 The predicted results of different modeling methods. Calibration

BPNN-Original BPNN-OSC PLS-Original PLS-OSC

Validation

RMSEc

R2c

RMSEp

R2p

RPD

0.6173 0.6296 0.7842 0.6597

0.7644 0.7517 0.6064 0.7214

0.4700 0.4068 0.6388 0.5518

0.7563 0.8240 0.5684 0.7186

2.0579 2.3775 1.5141 1.7529

BPNN-Original and PLS-Original were conducted based on no pre-treatment spectra. BPNN-OSC and PLS-OSC were carried out based on OSC transformed spectra.

with OM (r ¼ 0.335**, p ¼ 0.01) and Fe2O3 (r ¼ 0.275*, p ¼ 0.05), which indicated that these two soil properties cannot be neglected during the modeling process. Meanwhile, the regression coefficients in the PLSR model over the wavelength range (400e1000 nm) are shown in Fig. 2. As can be seen in this figure, some important wavelengths, 410, 527, 554, 581e626, 670e690, and 932 nm, were extracted, suggesting their importance for Cd estimation. Viscarra Rossel et al. (2006) even reported that the key wavebands for predicting soil organic carbon were at 410, 570 and 660 nm, which corresponded to the important wavelengths 410, 581e626 and 670e690 nm in this study. In addition, our early work also proved that the source of Cd in this study was mainly from sewage irrigation (Chen et al., 2014). Long-term sewage irrigation not only solved the shortage of agricultural water, but also provided lots of organic nutrition and pollutants for the soil. After about 50 years' sewage irrigation, in this study area soil Cd, along with OM, clearly increased. Due to the strong adsorption ability, it is reasonable to infer that closer correlation with OM is a major predictive mechanism in the studied area. In addition, the important wavelengths 527 and 554 nm also exactly corresponded to the spectral response wavebands of hematite, whereas 932 nm was the spectral response location of goethite (Grove et al., 1992; Viscarra Rossel and Behrens, 2010). Therefore, the internal relation between Fe-oxides and Cd may be another important prediction mechanism. However, by comparing the regression coefficients in Fig. 2, it is not hard to find that the values at 410, 581e626 and 670e690 nm are greater than those at 527, 554 and 932 nm, indicating the importance of OM for estimation higher than Feoxides. In this study, the internal relation with OM should be the major estimation mechanism for Cd. This is different from the findings of Wu et al. (2005) that the correlation with total Fe (including active and residual Fe) is the major estimation mechanism for farmland soil Cd in Nanjing. In a sewage irrigation area, associations with soil OM may be the main form of Cd binding in soils. Based on this internal relation, chemometric models can be indirectly developed for soil samples in order to screen the Cd concentration. 3.4. Estimation accuracy based on different modeling methods Soil spectral values (at selected wavelengths) and Cd concentrations in the calibration set were used as input and output layer data, respectively, to build a BPNN model. The prediction performance of the BPNN was validated using 20 independent samples in the validation set. The predicted results are shown in Table 4. The prediction accuracies obtained by BPNN were all higher than those acquired by PLSR, whether they were based on the raw spectra or the optimal OSC transformed spectra. The highest R2 and RPD was obtained by BPNN using OSC transformed spectra (R2 p ¼ 0.8240 and RPD ¼ 2.3775). The lowest R2 and RPD was obtained by PLSR based on raw spectra (R2 p ¼ 0.5684 and RPD ¼ 1.5141). By comparing the RPD values, the prediction accuracy followed the order: BPNN-OSC > BPNN-Original > PLSR-OSC > PLSR-Original, of which the RMSEp gradually increased (0.4068, 0.4700, 0.5518 and 0.6388, respectively). Compared to PLSR modeling, BPNN can achieve a more accurate estimation. This may be because the internal relations between Cd and soil spectrum-active components are not always linear but non-linear relationships may be more common. Although PLSR can efficiently solve the collinearity problem, there are still many deficiencies in dealing with non-linear relationships, which happens to be the advantage of BPNN. The higher estimated accuracy means the predicted values were closer to the true values, i.e. the scatter points of predicted Cd vs. measured Cd were closer to the 1:1 line. Scatter plots for the samples in validation dataset are shown in Fig. 3. From the figure it

T. Chen et al. / Environmental Pollution 206 (2015) 217e226

223

Fig. 3. Scatter plots of predicted values vs. measured values based on different pre-treatments and modeling methods.

can be seen that most scatter points are close to the 1:1 line. However, in Fig. 3A and B, low Cd concentrations were overestimated, with some scatter points far above the 1:1 line. Only the estimation points obtained by BPNN-Original and BPNN-OSC were more evenly distributed along the 1:1 line. Therefore, from the scatter plots it can also be seen that more accurate Cd concentrations were estimated by the BPNN calibration method. In practical applications, measurement of soil heavy metal concentration is used to judge whether soil is polluted, to identify its pollution level, and further to provide proposals for future remediation. Based on the Soil Environment Quality Standard of China (0.6 mg kg1, pH > 7.5), soil Cd pollution for farmland is often divided into 4 classes: clean (Cd  0.6 mg kg1), low pollution (0.6 < Cd  1.2 mg kg1), moderate pollution (1.2
2.4 mg kg1). In this study, soil Cd pollution levels included all 4 classes ranging from clean to serious pollution. Fig. 3 shows the boundaries of different pollution levels by dotted lines and 4 classes of pollution are outlined (named I, II, III, and IV, respectively). When the scatter points fall within the boundaries of a certain class, they are correctly classified. From Fig. 3 it can be seen that more scatter points obtained by PLSR-Original and PLSR-OSC fell outside the classification boundaries, indicating that PLSR modeling had lower classification accuracies than BPNN. According to the numbers of correctly classified samples, the successful classification rates were 75% for BPNN-OSC (15 samples in validation dataset were correctly classified), 70% for BPNN-Original (14 samples), 65% for PLSR-OSC (13 samples), and 50% for PLSR-Original (10 samples). In misclassified

224

T. Chen et al. / Environmental Pollution 206 (2015) 217e226

Fig. 4. Filled contours maps of soil Cd: A was produced by ordinary kriging based on measured data; and B was produced by ordinary kriging based on the estimated Cd values by BPNN-OSC.

samples, the Cd pollution level of most soil samples was overestimated, which may lead to stricter remediation measures being adopted in the future. 3.5. Spatial distribution and risk assessment Soil is a spatially heterogeneous system. For developing various control measures and adjusting crop cultivation, it is necessary to quickly obtain information on the spatial distribution of soil heavy metals. In this study, both the measured Cd concentration and predicted Cd concentration as an example were used to verify the feasibility of rapidly identifying soil Cd pollution hotspots at regional scale. Fig. 4 presents the spatial patterns of soil Cd, which were obtained by ordinary kriging based on the measured Cd concentrations derived from chemical analysis and predicted Cd values derived from the BPNN-OSC estimation. The obtained semivariogram model parameters are shown in Table 5. As shown in this table, exponential models were fitted for the measured Cd and predicted Cd. In this study, the anisotropy ratios of both variables were 1.50, indicating weak directional variations. The nugget/sill ratios were 43.90% and 14.51%, respectively, meaning stronger spatial variations in the measured Cd dataset. The spatial patterns

Fig. 5. The estimated probability maps of soil Cd: A was produced by indicator kriging based on measured data; B was produced by indicator kriging based on estimated Cd values by BPNN-OSC (threshold is 1.2 mg kg1).

of the measured Cd and predicted Cd are presented in Fig. 4. Their corresponding root mean square errors (RMSE) were 0.9330 and 0.9153, respectively, being close to 1. As can be seen from Fig. 4, the spatial distribution of soil Cd showed very similar geographical trends between Fig. 4A and B, with low concentrations in the west area and high concentrations in the east area. Considering the distribution of wastewater irrigation channels, it is not difficult to deduce that an increased Cd concentration had a close relationship with long-term sewage irrigation. In addition, because the spatial distribution of soil Cd in Fig. 4B was derived from estimates based on BPNN-OSC models and then followed by ordinary kriging interpolation, whereas Cd distribution in Fig. 4A was obtained by straightforward interpolation from measured Cd concentrations, the former one had a higher estimation error. Even so, the soil Cd pollution pattern was still revealed clearly, which could provide significant information for further remediation. To understand soil Cd pollution risk for farmlands, the estimated probability of exceeding a certain threshold (1.2 mg kg1 was chosen) was assessed by indicator kriging and results are given in Fig. 5. From the figure, it can be seen that the probability distribution of Cd exceeding 1.2 mg kg1 demonstrated a very similar spatial pattern as presented in Fig. 4, with high pollution risk in the

Table 5 Semi-variogram model parameters for Cd concentration. Variables

Model

Nugget (C0)

Sill (C0þC)

Nugget/Sill (%)

Major range (m)

Minor range (m)

Anisotropy ratio

RMSE

Measured Cd Predicted Cd

E E

0.6765 0.1479

1.5410 1.0196

43.90 14.51

1276.30 902.66

850.84 601.77

1.50 1.50

0.9330 0.9153

E is the Exponential model.

T. Chen et al. / Environmental Pollution 206 (2015) 217e226

east and low risk in the west. Meanwhile, the probability maps in Fig. 5A and B, which were derived from estimations based on measured Cd values and BPNN-OSC predicted Cd values, respectively, resembled each other in terms of identifying hotspots (high risk areas). For example, in this study the probability (soil Cd exceeding 1.2 mg kg1) greater than 80% (i.e. U > 80%) is considered as high risk hotspot areas. Based on this premise, there was 2.04 km2 (14.31% of the study area) and 1.93 km2 (13.53%) farmland identified as high risk hotspots from Fig. 5A and B, respectively, both areas being almost the same. It can be seen that the hotspots of Cd pollution risk could be roughly determined by interpolating the predicted Cd values obtained form BPNN-OSC estimates. Overall, in this study the spatial pattern of soil Cd and its pollution risk hotspots could be identified by interpolating the predicted Cd values obtained from BPNN-OSC estimates. Although there were still many uncertainties compared with the conventional method, it could rapidly reveal the distribution of high risk areas for soil Cd. Considering its simplicity, high efficiency and low cost, the estimation results were considered acceptable and reasonable. In this study, results have shown that Cd was obviously accumulated in the topsoil after long-term sewage irrigation. Similar phenomena occurred in many city suburbs of China, in which the famous sewage irrigation areas included the Tonghuihe River and Liangshuihe River irrigation area of Beijing (Liu et al., 2005), the Zhangshi irrigation area of Shenyang (Sun et al., 2006), the Dagu and Beitang irrigation area of Tianjing (Zhu et al., 2012), the Muye irrigation area of Xinxiang (Wang et al., 2009), and so on. In these irrigation areas, long-term sewage irrigation not only led to a significant increase of total Cd, but also greatly promoted the activation of Cd that raised its pollution risk (Chen et al., 2014). Therefore, once a prediction model based on NIRA is successfully developed, it can provide a convenient, time-saving and effort-saving choice for further predictions in these irrigation areas. Especially for applications that do not overemphasize the measuring accuracy, such as land use adjustment or ecological assessments, the saving of time, labor and money is very attractive. Meanwhile, this research approach also can be used for reference in studies on other pollutants. However, it is also worth noting that the estimation of soil heavy metal concentration based on VNIRS is only a beneficial supplement, and not an alternative to traditional methods.

4. Conclusion Cd was significantly accumulated in the topsoil of our study area after long-term sewage irrigation. VNIRS exhibited good potential for detecting soil Cd pollution. Compared with other pretreatments, OSC can effectively remove irrelevant information to improve prediction accuracy. Meanwhile, BPNN could more effectively estimate soil Cd concentrations compared to PLSR modeling. In this sewage irrigation area, the predictive mechanism for estimating soil Cd by VNRIS was mainly based on the close relationship between Cd and OM. In addition, the spatial pattern of soil Cd and its pollution risk hotspots could be identified by interpolating the predicted Cd values obtained from spectral estimates, which provided a rapid and convenient solution for periodically monitoring soil Cd changes in the same study area or in similar areas.

Acknowledgement The research was sponsored by the National High Technology Research and Development Program of China (2013AA102401) and the National Study Abroad Funding.

225

Appendix A. Supplementary data Supplementary data related to this article can be found at http:// dx.doi.org/10.1016/j.envpol.2015.07.009. References Agricultural Chemistry Committee of China, 1983. Conventional Methods of Soil and Agricultural Chemistry Analysis. Science Press, Beijing (in Chinese). Al-Asadi, R.A., Mouazen, A.M., 2014. Combining frequency domain reflectometry and visible and near infrared spectroscopy for assessment of soil bulk density. Soil Tillage Res. 135, 60e70. Ben-Dor, E., Banin, A., 1995. Near-infrared analysis as a rapid method to simultaneously evaluate several soil properties. Soil Sci. Soc. Am. J. 59, 364e372. Ben-Dor, E., Inbar, Y., Chen, Y., 1997. The reflectance spectra of organic matter in the visible near-infrared and short wave infrared region (400e2500 nm) during a controlled decomposition process. Remote Sens. Environ. 61, 1e15. Buddenbaum, H., Steffens, M., 2012. The effects of spectral pretreatments on chemometric analyses of soil profiles using laboratory imaging spectroscopy. Appl. Environ. Soil Sci. 2012, 12. Chang, C.W., Laird, D.A., Mausbach, M.J., Hurburgh, C.R., 2001. Near-infrared reflectance spectroscopy-principal components regression analyses of soil properties. Soil Sci. Soc. Am. J. 65, 480e490. Chen, T., Chang, Q., Liu, J., 2014. Fractions and bioavailability spatial distribution of soil Cd under long-term sewage irrigation. J. Agro-Environ. Sci. 33, 1322e1327 (in Chinese). Chen, T., Liu, X., Li, X., Zhao, K., Zhang, J., Xu, J., Shi, J., Dahlgren, R.A., 2009. Heavy metal sources identification and sampling uncertainty analysis in a field-scale vegetable soil of Hangzhou, China. Environ. Pollut. 157, 1003e1010. Chiang, Y.-M., Chang, L.-C., Chang, F.-J., 2004. Comparison of static-feedforward and dynamic-feedback neural networks for rainfallerunoff modeling. J. Hydrol. 290, 297e311. Choe, E., van der Meer, F., van Ruitenbeek, F., van der Werff, H., de Smeth, B., Kim, K.W., 2008. Mapping of heavy metal pollution in stream sediments using combined geochemistry, field spectroscopy, and hyperspectral remote sensing: a case study of the Rodalquilar mining area, SE Spain. Remote Sens. Environ. 112, 3222e3233. Dalal, R.C., Henry, R.J., 1986. Simultaneous determination of moisture, organic carbon, and total nitrogen by near infrared reflectance spectrophotometry. Soil Sci. Soc. Am. J. 50, 120e123. Daniel, K.W., Tripathi, N.K., Honda, K., 2003. Artificial neural network analysis of laboratory and in situ spectra for the estimation of macronutrients in soils of Lop Buri (Thailand). Soil Res. 41, 47e59. Das, P., Samantaray, S., Rout, G.R., 1997. Studies on cadmium toxicity in plants: a review. Environ. Pollut. 98, 29e36. Dhanoa, M.S., Lister, S.J., Sanderson, R., Barnes, R.J., 1994. The link between Multiplicative Scatter Correction (MSC) and Standard Normal Variate (SNV) transformations of NIR spectra. J. Near Infrared Spectrosc. 2, 43e47. Duckworth, J., 2004. Mathematical data preprocessing. In: Roberts, C.A., Workman Jr., J., Reeves Iii, J.B. (Eds.), Near-Infrared Spectroscopy in Agriculture. American Society of Agronomy, Crop Science Society of America, Soil Science Society of America, Madison, WI, pp. 115e132. Farifteh, J., Van der Meer, F., Atzberger, C., Carranza, E.J.M., 2007. Quantitative analysis of salt-affected soil reflectance spectra: a comparison of two adaptive methods (PLSR and ANN). Remote Sens. Environ. 110, 59e78. Gholizadeh, A., Bor uvka, L., Vas at, R., Saberioon, M., Klement, A., Kratina, J., Tejnecký, V., Dr abek, O., 2015. Estimation of potentially toxic elements contamination in anthropogenic soils on a brown coal mining dumpsite by reflectance spectroscopy: a case study. PLoS One 10, e0117457. Goldshleger, N., Chudnovsky, A., Ben-Binyamin, R., 2013. Predicting salinity in tomato using soil reflectance spectra. Int. J. Remote Sens. 34, 6079e6093. Goldshleger, N., Chudnovsky, A., Ben-Dor, E., 2012. Using reflectance spectroscopy and artificial neural network to assess water infiltration rate into the soil profile. Appl. Environ. Soil Sci. 2012, 9. Goovaerts, P., 1997. Geostatistics for Natural Resources Evaluation. Oxford University Press. Gorry, P.A., 1990. General least-squares smoothing and differentiation by the convolution (Savitzky-Golay) method. Anal. Chem. 62, 570e573. Grove, C.I., Hook, S.J., Paylor, E.D., Laboratory, J.P., Aeronautics, U.S.N., Administration, S., 1992. Laboratory Reflectance Spectra of 160 Minerals, 0.4 to 2.5 Micrometers. NASA [and] Jet Propulsion Laboratory, California Institute of Technology. Gussarsson, M., Asp, H., Adalsteinsson, S., Jensen, P., 1996. Enhancement of cadmium effects on growth and nutrient composition of birch (Betula pendula) by buthionine sulphoximine (BSO). J. Exp. Bot. 47, 211e215. Haenlein, M., Kaplan, A.M., 2004. A beginner's guide to partial least squares analysis. Underst. Stat. 3, 283e297. Haghiri, F., 1974. Plant uptake of cadmium as influenced by cation exchange capacity, organic matter, zinc, and soil temperature. J. Environ. Qual. 3, 180e183. Haykin, S.S., 1994. Neural Networks: a Comprehensive Foundation. Macmillan. Hively, W.D., McCarty, G.W., Reeves, J.B., Lang, M.W., Oesterling, R.A., Delwiche, S.R., 2011. Use of airborne hyperspectral imagery to map soil properties in tilled agricultural Fields. Appl. Environ. Soil Sci. 2011, 13.

226

T. Chen et al. / Environmental Pollution 206 (2015) 217e226

Hornik, K., Stinchcombe, M., White, H., 1989. Multilayer feedforward networks are universal approximators. Neural Netw. 2, 359e366. Huang, W., Foo, S., 2002. Neural network modeling of salinity variation in Apalachicola River. Water Res. 36, 356e362. Jarup, L., 2003. Hazards of heavy metal contamination. Br. Med. Bull. 68, 167e182. Kemper, T., Sommer, S., 2002. Estimate of heavy metal contamination in soils after a mining accident using reflectance spectroscopy. Environ. Sci. Technol. 36, 2742e2747. Kennard, R.W., Stone, L.A., 1969. Computer aided design of experiments. Technometrics 11, 137e148. Kim, K., Lee, J.-M., Lee, I.-B., 2005. A novel multivariate regression approach based on kernel partial least squares with orthogonal signal correction. Chemom. Intell. Lab. Syst. 79, 22e30. Kooistra, L., Wehrens, R., Leuven, R.S.E.W., Buydens, L.M.C., 2001. Possibilities of visibleenear-infrared spectroscopy for the assessment of soil contamination in river floodplains. Anal. Chim. Acta 446, 97e105. Li, H.D., Xu, Q.S., Liang, Y.Z., 2014. LibPLS: an integrated library for partial least squares regression and discriminant analysis. PeerJ Prepr. 2, e190v191. http:// dx.doi.org/110.7287/peerj.preprints.7190v7281. € derlund, L., Liu, G., 2005. Impacts of sewage irrigation Liu, W., Zhao, J., Ouyang, Z., So on heavy metal distribution and contamination in Beijing, China. Environ. Int. 31, 805e812. Liu, X., Liu, J., 2013. Using short wave visible-near infrared reflectance spectroscopy to predict soil properties and content. Spectrosc. Lett. 47, 729e739. Malley, D.F., Williams, P.C., 1997. Use of near-infrared reflectance spectroscopy in prediction of heavy metals in freshwater sediment by their association with organic matter. Environ. Sci. Technol. 31, 3461e3467. Mamat, Z., Yimit, H., Ji, R.Z.A., Eziz, M., 2014. Source identification and hazardous risk delineation of heavy metal contamination in Yanqi basin, northwest China. Sci. Total Environ. 493, 1098e1111. Mclaughlin, M.J., Singh, B.R., 1999. Cadmium in Soils and Plants. Kluwer Academic Publishers. Moros, J., Vallejuelo, S.F.-O.d, Gredilla, A., Diego, A.d, Madariaga, J.M., Garrigues, S., Guardia, M.d.l, 2009. Use of reflectance infrared spectroscopy for monitoring the metal content of the estuarine sediments of the Nerbioi-Ibaizabal River (Metropolitan Bilbao, Bay of Biscay, Basque Country). Environ. Sci. Technol. 43, 9314e9320. Nicolaï, B.M., Beullens, K., Bobelyn, E., Peirs, A., Saeys, W., Theron, K.I., Lammertyn, J., 2007. Nondestructive measurement of fruit and vegetable quality by means of NIR spectroscopy: a review. Postharvest Biol. Technol. 46, 99e118. Pandit, C.M., Filippelli, G.M., Li, L., 2010. Estimation of heavy-metal contamination in soil using reflectance spectroscopy and partial least-squares regression. Int. J. Remote Sens. 31, 4111e4123. Peralta-Videa, J.R., Lopez, M.L., Narayan, M., Saupe, G., Gardea-Torresdey, J., 2009. The biochemistry of environmental heavy metal uptake by plants: implications for the food chain. Int. J. Biochem. Cell. Biol. 41, 1665e1677. Ren, H.-Y., Zhuang, D.-F., Singh, A.N., Pan, J.-J., Qiu, D.-S., Shi, R.-H., 2009. Estimation of As and Cu contamination in agricultural soils around a mining area by reflectance spectroscopy: a case study. Pedosphere 19, 719e726. Rinnan, Å., Berg, F.v.d, Engelsen, S.B., 2009. Review of the most common pre-

processing techniques for near-infrared spectra. TrAC Trends Anal. Chem. 28, 1201e1222. Satarug, S., Garrett, S.H., Sens, M.A., Sens, D.A., 2010. Cadmium, environmental exposure, and health outcomes. Environ. Health Perspect. 118, 182e190. Shi, T., Chen, Y., Liu, Y., Wu, G., 2014. Visible and near-infrared reflectance spectroscopydAn alternative for monitoring soil contamination by heavy metals. J. Hazard. Mater. 265, 166e176. Siebielec, G., McCarty, G.W., Stuczynski, T.I., Reeves 3rd, J.B., 2004. Near- and midinfrared diffuse reflectance spectroscopy for measuring soil metal content. J. Environ. Qual. 33, 2056e2069. Stenberg, B., Rossel, R.A.V., 2010. Diffuse reflectance spectroscopy for highresolution soil sensing. In: Viscarra Rossel, R.A., McBratney, A.B., Minasny, B. (Eds.), Proximal Soil Sensing. Springer, Netherlands, pp. 29e47. Sun, L., Zhang, Y., Sun, T., Gong, Z., Lin, X., Li, H., 2006. Temporal-spatial distribution and variability of cadmium contamination in soils in Shenyang Zhangshi irrigation area, China. J. Environ. Sci. 18, 1241e1246. Tchounwou, P., Yedjou, C., Patlolla, A., Sutton, D., 2012. Heavy metal toxicity and the environment. In: Luch, A. (Ed.), Molecular, Clinical and Environmental Toxicology. Springer Basel, pp. 133e164. Viscarra Rossel, R.A., Walvoort, D.J.J., McBratney, A.B., Janik, L.J., Skjemstad, J.O., 2006. Visible, near infrared, mid infrared or combined diffuse reflectance spectroscopy for simultaneous assessment of various soil properties. Geoderma 131, 59e75. Viscarra Rossel, R.A.V., Behrens, T., 2010. Using data mining to model and interpret soil diffuse reflectance spectra. Geoderma 158, 46e54. Volkan Bilgili, A., van Es, H.M., Akbas, F., Durak, A., Hively, W.D., 2010. Visible-near infrared reflectance spectroscopy for assessment of soil properties in a semiarid area of Turkey. J. Arid Environ. 74, 229e238. Wang, J., Cui, L., Gao, W., Shi, T., Chen, Y., Gao, Y., 2014. Prediction of low heavy metal concentrations in agricultural soils using visible and near-infrared reflectance spectroscopy. Geoderma 216, 1e9. Wang, J.L., Zhang, C.Y., Wang, X.F., Zhu, G.F., Chen, D.J., 2009. Effect of long-term batteries wastewater irrigation on the heavy metal speciation in cultivated soils. Chin. J. Soil Sci. 40, 1181e1184 (in Chinese). Webster, R., Oliver, M.A., 2007. Geostatistics for Environmental Scientists. Wiley. € Wold, S., Antti, H., Lindgren, F., Ohman, J., 1998. Orthogonal signal correction of near-infrared spectra. Chemom. Intell. Lab. Syst. 44, 175e185. Wold, S., Ruhe, A., Wold, H., Dunn, I.,W., 1984. The collinearity problem in linear regression: the partial least squares (PLS) approach to generalized inverses. SIAM J. Sci. Stat. Comput. 5, 735e743. Wu, Y., Chen, J., Ji, J., Gong, P., Liao, Q., Tian, Q., Ma, H., 2007. A mechanism study of reflectance spectroscopy for investigating heavy metals in soils. Soil Sci. Soc. Am. J. 71, 918e926. Wu, Y.Z., Chen, J., Ji, J.F., Tian, Q.J., Wu, X.M., 2005. Feasibility of reflectance spectroscopy for the assessment of soil mercury contamination. Environ. Sci. Technol. 39, 873e878. Zhu, Z., Wang, Z., Li, J., Li, Y., Zhang, Z., Zhang, P., 2012. Distribution of rare earth elements in sewage-irrigated soil profiles in Tianjin, China. J. Rare Earths 30, 609e613.

Rapid identification of soil cadmium pollution risk at regional scale based on visible and near-infrared spectroscopy.

Soil heavy metal pollution due to long-term sewage irrigation is a serious environmental problem in many irrigation areas in northern China. Quickly i...
2MB Sizes 0 Downloads 10 Views