Journal of Hazardous Materials 265 (2014) 166–176
Journal of Hazardous Materials journal homepage: www.elsevier.com/locate/jhazmat
Review
Visible and near-infrared reflectance spectroscopy: An alternative for monitoring soil contamination by heavy metals
Tiezhu Shi a, Yiyun Chen a, Yaolin Liu a, Guofeng Wu b,a,∗
a School of Resource and Environmental Science & Key Laboratory of Geographic Information System of the Ministry of Education, Wuhan University, 430079 Wuhan, China
b Key Laboratory for Geo-Environment Monitoring of Coastal Zone, National Administration of Surveying, Mapping and GeoInformation & Shenzhen Key Laboratory of Spatial Smart Sensing and Services & College of Life Sciences, Shenzhen University, 518060 Shenzhen, China
Highlights
• Visible and near-infrared reflectance spectroscopy can monitor soil heavy metals.
• Summary on mechanisms for estimating heavy metal concentrations.
• Discussions on three types of spectra and their roles in monitoring heavy metals.
• Comprehensive review on methods for monitoring heavy metals.
• Discussions on challenges in mapping soil contaminations with heavy metals.
Article info
Article history: Received 25 September 2013; received in revised form 28 November 2013; accepted 29 November 2013; available online 7 December 2013
Keywords: Pre-processing method; Spectral index; Variable selection method; Modeling strategy; Calibration method; Hyperspectral image
Abstract
Soil contamination by heavy metals is an increasingly important problem worldwide. Quick and reliable access to heavy metal concentration data is crucial for soil monitoring and remediation. Visible and near-infrared reflectance spectroscopy, which is known as a noninvasive, cost-effective, and environmentally friendly technique, has potential for the simultaneous estimation of the various heavy metal concentrations in soil. Moreover, it provides a valid alternative method for the estimation of heavy metal concentrations over large areas and long periods of time. This paper reviews the state of the art and presents the mechanisms, data, and methods for the estimation of heavy metal concentrations by the use of visible and near-infrared reflectance spectroscopy. The challenges facing the application of hyperspectral images in mapping soil contamination over large areas are also discussed. © 2014 Elsevier B.V. All rights reserved.
Contents
1. Introduction
2. Mechanisms for the estimation of heavy metal concentrations in soils by the use of visible and near-infrared reflectance spectroscopy
3. Spectral data acquisition
4. Methods
4.1. Pre-processing methods
4.2. Spectral indices
4.3. Variable selection methods
4.4. Modeling strategies
4.5. Calibration methods
5. Future perspectives and summary
Acknowledgment
References

∗ Corresponding author at: Key Laboratory for Geo-Environment Monitoring of Coastal Zone, National Administration of Surveying, Mapping and GeoInformation & Shenzhen Key Laboratory of Spatial Smart Sensing and Services & College of Life Sciences, Shenzhen University, 518060 Shenzhen, China. Tel.: +86 13517128444. E-mail address: [email protected] (G. Wu).
0304-3894/$ – see front matter © 2014 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.jhazmat.2013.11.059
1. Introduction
Soil is a dynamic, open, and complex system occurring in the upper few meters of the earth’s surface, at the interface of atmosphere, biosphere, hydrosphere, and geosphere [1]. A major function of soil is to provide fundamental natural resources for the survival of plants, animals, and the human race [2]. The functions of soil depend on the balances of its structure and composition, as well as its chemical, biological, and physical properties [3]. These balances are, however, being disrupted by the heavy accumulation of heavy metals in soils [4], due to anthropogenic sources such as industrial pollutants, pesticides, livestock wastewater, mine drainage, and petroleum contamination [5,6]. The chemical, biological, and physical imbalance caused by soil contamination by heavy metals may be detrimental to plant, animal, and human health. For example, reduced root growth, reduced seed sprouting, seedling stunting, necrosis, and chlorosis appear in susceptible plants growing in soils contaminated with heavy metals [7,8]. Agricultural crops (fruits, grains, and vegetables) for livestock or human consumption that grow on contaminated soil can take up and accumulate heavy metals in their edible parts, and may be hazardous to animal and human health through the food chain [5,9,10]. A reliable and environmentally friendly method is therefore needed to rapidly detect and survey the spatial distribution of soil heavy metals, to diagnose suspected contaminated areas, and to control the rehabilitation processes [11]. The conventional method of obtaining the spatial distribution of heavy metals is based on regular field sampling and subsequent chemical analyses in the laboratory (e.g. wet chemistry), followed by geo-statistical interpolation [12,13]. However, this method may be costly and time-consuming as a result of the intensive soil sampling in the field and the analyses in the laboratory.
Moreover, such investigations can only provide limited information at specific locations and moments in time, and they cannot describe the spatial and temporal dynamics of heavy metal concentrations over large areas [14]. Visible and near-infrared reflectance spectroscopy (VNIRS, 350–2500 nm) provides a valid alternative to the conventional method for the estimation of heavy metal concentrations in soils, and it applies the spectral information of soils to estimate the soil properties (including heavy metal concentrations). Recent studies have suggested that VNIRS can provide estimations of the soil physical, chemical, and biological properties [15,16]. Compared with the conventional analytical methods, the practical advantages of the VNIRS technique include: (i) the technique is non-destructive and cost-effective; (ii) no or less hazardous chemical reagents are required; (iii) the measurement is fast and repeatable; (iv) several soil properties can be estimated from a single scan; (v) the technique can be used both in the laboratory and in situ; and (vi) this technique has better spatial and temporal continuities [15,17]. The first study of the accurate estimation of soil heavy metal concentrations by the use of VNIRS was published in 1997 [18]; however, the application of this technique only really began to take off much more recently [19–35]. These recent studies have explored the mechanism and have perfected the technical approaches (see Fig. 1) in the field of using VNIRS to estimate soil heavy metal concentrations. In this paper we: (i) review the mechanisms, data, and methods for the estimation of soil heavy metal concentrations with VNIRS;
(ii) discuss the usefulness and challenges of hyperspectral images for mapping heavy metal contamination over large areas; and (iii) describe the pre-processing methods, spectral indices, variable selection methods, modeling strategies, and calibration methods. The purpose of this review is to promote the research and application of VNIRS in the monitoring of soil contamination by heavy metals.
2. Mechanisms for the estimation of heavy metal concentrations in soils by the use of visible and near-infrared reflectance spectroscopy
The absorptions of soils over the visible/near-infrared spectral regions (350–2500 nm) are primarily associated with Fe-oxides, clay minerals, water, and organic matter, as a consequence of the vibrational energy transitions of these dominant molecular bonds [36]. Most Fe-oxides in soils, e.g. goethite (α-FeOOH), have absorptions over the visible (350–780 nm) and short-wave near-infrared (780–1100 nm) spectral regions [37]. Clay minerals hold spectral features in the long-wave near-infrared (1100–2500 nm) regions, due to OH, H2O, and CO3 overtones and combination vibrations [38]. Water has strong absorption features over the visible/near-infrared regions, most visibly near 1400 and 1900 nm, while there are weaker overtone bands elsewhere [36]. Soil organic matter has distinct absorption features over the visible/near-infrared regions, due to various chemical bonds such as C–H, C–C, C=C, C–N, and O–H [39]. Some transition elements (such as Ni, Cu, Co, and Cr) can also exhibit absorption features in the visible/near-infrared regions under two particular conditions: (i) the elements are present at very high concentrations (>4000 mg kg⁻¹); and (ii) they have an unfilled d shell [30]. The reason for this is that when the atom of a transition element is located in a crystal field, the energy levels of the d orbitals split, so that an electron can move from a lower level to a higher one; this electronic transition absorbs electromagnetic energy [38]. Wu et al. [30] noted that Cr and Cu show spectral features (610 and 830 nm) at concentrations >4000 mg kg⁻¹. Thus, when the heavy metals hold absorption features, they can be estimated based on their intimate and direct relationships with the spectral features [19]. However, such high soil contamination may only occur in mining or industrial areas.
In most areas, the heavy metal concentrations in soils are usually found at trace or ultra-trace level. When heavy metals are present in only small amounts in the soil, they do not have spectral features in the visible/near-infrared regions, which means that it is difficult to directly estimate the heavy metal concentrations in the soil by the use of the soil spectral features [40]. Although soil heavy metals with low or moderate concentrations are spectrally featureless, they can be easily bound to Fe-oxides, clays, and organic matter [30]. The adsorption of metal cations (M²⁺) onto hydroxylated surface sites (ROH, in which R can be Al, Fe, Mn, Si, etc. on mineral surfaces) is generally described as follows:

$$\mathrm{ROH} + \mathrm{M}^{2+} = \mathrm{RO{-}M}^{+} + \mathrm{H}^{+}$$

Consequently, an increase in the cations of heavy metals results in a decrease in ROH and an increase in RO (e.g. Fe-oxides) on the surfaces of the clay and oxide minerals [19,40]. Moreover, heavy metals are bound to soil organic matter due to metal complexation (Fig. 2, [41]), and the depletion of soil organic matter (e.g. the decomposition of soil organic matter) can strongly influence the behavior of soil heavy metals [42]. Therefore, the concentrations of soil heavy metals are related to those of clays, Fe-oxides, and soil organic matter (the relationship between heavy metals and soil organic matter is well described in [43]).

Fig. 1. Technical approaches to estimating heavy metal concentrations in soils using visible and near-infrared reflectance spectroscopy.
Fig. 2. Different interaction forms of the complexation between metal cations (M²⁺) and soil organic matter (humic substances).
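Because heavy metals co-vary with spectrally active constituents such as organic matter, a concentration estimate can in principle be chained through such an intermediate property. The following is only a minimal sketch of that idea and not a procedure taken from any of the cited studies: the data are synthetic, the single spectral feature (an absorption depth) is an assumption, and the helper names `fit_line`, `predict`, and `estimate_metal` are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predict(model, x):
    """Evaluate a fitted line (a, b) at x."""
    intercept, slope = model
    return intercept + slope * x


# Synthetic calibration samples (hypothetical values, for illustration only).
absorption_depth = [0.02, 0.04, 0.06, 0.08, 0.10]  # spectral feature
organic_matter = [1.0, 2.0, 3.0, 4.0, 5.0]         # % soil organic matter
metal = [15.0, 25.0, 35.0, 45.0, 55.0]             # mg/kg

# First model: spectral feature -> soil organic matter.
model_1 = fit_line(absorption_depth, organic_matter)
# Second model: soil organic matter -> heavy metal concentration.
model_2 = fit_line(organic_matter, metal)


def estimate_metal(depth):
    """Chain the two models: spectrum -> organic matter -> metal."""
    return predict(model_2, predict(model_1, depth))


print(estimate_metal(0.05))  # about 30 mg/kg for these synthetic data
```

In practice each stage would carry its own error, so the uncertainty of the chained estimate is larger than that of either model alone; real studies typically use multivariate regression on many wavelengths rather than a single feature.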
As a result of the relationships between the heavy metals and other soil properties with spectral features, the featureless heavy metals can be estimated using the reflectance spectra of soils [25,31,44]. This indirect mechanism for heavy metal concentration estimation was well illustrated by Liu et al. [14]. In their study, spectral data were first used to estimate the soil organic matter as Model 1. Model 2 between the heavy metal concentrations and soil organic matter was then established. Finally, the combination of Model 1 and Model 2 could then be employed to estimate heavy metal concentrations from spectral data. As an alternative to using the spectral features of soil heavy metals to directly estimate the concentrations, more and more studies have employed the relationships between soil heavy metals and organic matter [14,18,25,34], Fe-oxides [22,23,30,32], or clays [22,25,32,34] to estimate heavy metal concentrations in soils. However, among the organic matter, Fe-oxides, and clays, which one plays the role of a bridge linking the heavy metals and reflectance spectra is location-based and varies from one area to another. This means that the mechanisms for heavy metal concentration estimation may be different for different areas. The reflectance of plants is influenced by three main factors: chlorophyll content, cellular structure, and water content [45]. Susceptible plants growing in heavy metal polluted areas show signs of
stress, including an increase in chlorophyll hydrolysis, a decrease in water content, and destruction of cellular structure [46,47]. These modifications may change the visible and near-infrared reflectance spectra of plants, which, therefore, provide a foundation for estimating soil heavy metal concentrations indirectly through plant spectra. For example, Kooistra et al. [26] found that the visible and near-infrared reflectance spectra of ryegrass (Lolium perenne) could be used to estimate elevated Zn contamination levels in the soil of the floodplain of the rivers Rhine and Meuse (the Netherlands); Kooistra et al. [27] also found satisfactory relationships between the Ni, Cd, Cu, Zn, and Pb concentrations in river floodplain soil and the reflectance spectra of grass species (Poa annua and Lolium perenne) in the spectral region of 400–1350 nm. However, some metal-tolerant plant species (e.g. dayflower, bristlegrass, simon poplar) are able to accumulate fairly large amounts of heavy metals without showing signs of stress [48]. This indicates that not all the reflectance spectra of plants can be used to estimate heavy metal concentrations in soils. Nonetheless, plant VNIRS provides an alternative for the monitoring of soil contamination by heavy metals for areas covered by plants.

3. Spectral data acquisition
VNIRS techniques involve three types of spectral data acquired at different spatial scales and in different environments: (i) laboratory spectra; (ii) field spectra; and (iii) remotely sensed spectra. Laboratory and field spectra rely on ground-based sensors (usually point spectrometry), while remotely sensed spectra are based on air- or space-borne sensors (usually image spectrometry) [49]. For laboratory spectra, all the stones and litter debris are removed from the soil, and the soil is generally crushed to pass a 2 mm sieve before spectral measurement.
The laboratory conditions ensure a stable indoor environment in terms of stable temperature, illumination, and air humidity. The geometric conditions of the illumination and instrumentation are illustrated in Fig. 3. ASD FieldSpec® 3 portable spectroradiometers (Analytical Spectral Devices Inc., USA) and Foss NIRSystems 5000 spectrophotometer (Silver Spring, MD, USA) are the most commonly used spectroradiometers. Laboratory spectra provide controlled data to explore the feasibility, extract the spectral features, and investigate the mechanism for estimating heavy metal concentrations in soils [14,18,20,28,31]. Many studies have proved that laboratory VNIRS is a reliable, fast, and economical method for the estimation of heavy metal concentrations in soils, when compared with the conventional methods [14,18,23,28,50]. Laboratory VNIRS can therefore serve as an alternative for the estimation of heavy metal concentrations in soils. Field spectra are generally acquired in the field by the use of a portable spectroradiometer. Field spectra can be of great benefit as
Fig. 3. Geometric factors of equipment set-up and an illustration of diffuse reflectance [51]. The dashed lines represent the viewing field of the probe.
they can be acquired quickly and almost continuously [52]. The use of field spectra, however, requires some constraints to be taken into account, since the natural soil surface (vegetation, moisture, roughness, stoniness), atmospheric, and illumination conditions can have major influences on the field spectra [52,53]. For example, the reflectance decreases steadily as the chunk size and moisture of soil increases [54,55], which indicates that the soil roughness and moisture can alter the soil reflectance. This change in the reflectance will further influence the estimation accuracies of soil properties by the use of empirical regression models. The application of the remotely sensed spectra of soils faces the same constraints. Therefore, field spectra provide inexpensive and available data for exploring the solutions to these constraints. In addition, the models derived from field spectra can be extended to air- or space-borne hyperspectral imaging spectrometers [19,56]. To date, the studies of field VNIRS have mainly focused on establishing the foundations for the application of remotely sensed spectra in the mapping of soil contamination by heavy metals. Remotely sensed spectra (hyperspectral images) are obtained with air- or space-borne sensors, which can quickly acquire the spectral information of large-scale target objects. Therefore, remotely sensed spectra play a vital role in mapping soil heavy metal concentrations over large areas. For example, Kemper and Sommer [24] used air-borne hyperspectral data obtained by the HyMap sensor (http://www.intspec.com/) to map Pb and As contamination in the Guadiamar floodplain, Andalusia; Choe et al. [19] employed HyMap data to map regions affected by heavy metals; and Wu et al. [33] indicated that the performances of simulated HyMap, Landsat TM, and QuickBird bands gave satisfactory results in the estimation of heavy metal concentrations. 
However, the practical application of remotely sensed spectra is still only just beginning. The main reason limiting the continued growth is the current high cost and low availability of remotely sensed spectra [57]. However, with the launch of satellites with high spectral resolution sensors, remotely sensed spectra will soon become available and cost-effective. Furthermore, the monitoring of soil contamination by heavy metals will be possible over large areas.
4. Methods
4.1. Pre-processing methods
The variations in the structural properties of the soil samples and the working status and condition of the spectroradiometers may cause non-linearities between the spectra and the component concentrations, resulting in random noise, baseline drift, and a multiple scattering effect in the spectra, which may affect the robustness of the calibration models. Thus, spectral pre-processing techniques are commonly employed to reduce these effects. To eliminate random noise and increase the spectral data quality, three strategies are commonly adopted: (i) Savitzky–Golay smoothing [58]; (ii) removal of the noisy regions [35,52]; and (iii) spectral resampling [23,51]. Savitzky–Golay smoothing is an effective spectral pre-processing method with a wide scope of applications in VNIRS. Selecting a reasonable number of smoothing points is a key step. If the number is set too low, noise remains in the spectra and introduces new errors into the model. If the number is set too high, the sample information in the spectra will be smoothed away and lost [59]. Spectral resampling is applied to eliminate the data redundancy in hyperspectral data by the use of a Gaussian model, which takes the band center and full width at half maximum into account [23,32,35]. Moreover, spectral resampling increases the calculation speed and reduces over-fitting of the calibration models [60]. Derivative transformation can remove the interferences of background, resolve overlapping spectra, and minimize the baseline
drift of raw soil spectra that is caused by differences in grinding and optical setups [61]. First and second derivatives are frequently calculated according to the Savitzky–Golay algorithm [62]. Compared with first derivatives, the second derivative of reflectance can better eliminate the baseline effect and enhance minor absorption features. However, second derivatives are more sensitive to noise, and reduce the spectral data quality [23]. Derivative transformations are often applied in conjunction with a smoothing algorithm, because they tend to amplify noise. Multiplicative scatter correction applies a linear transformation to each spectrum in order to match the average spectrum of the whole spectrum set [63], and it removes the multiplicative interferences of scattering and particle size. Standard normal variate pretreatment serves the same purpose and has a similar effect to the multiplicative scatter correction. However, the difference between the two methods is that the standard normal variate is applied to an individual spectrum, while multiplicative scatter correction uses a reference spectrum, such as the mean spectrum of the calibration set [25]. Multiplicative scatter correction and standard normal variate pretreatment both require a good linear relationship between the spectra and the component concentrations. Thus, the measured reflectance ($R$) spectrum is generally transformed to $\log(1/R)$ or Kubelka–Munk units, $(1 - R)^2 / 2R$, in order to realize the linearization between spectrum and concentration [62]. Continuum removal analysis, as proposed by Clark and Roush [64], is a standard transformation of a spectrum in VNIRS, in which the continuum, a convex hull of straight-line segments, is fitted over a reflectance spectrum and subsequently removed by division or ratioing relative to the complete reflectance spectrum [19].
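The contrast between the two scatter corrections, and the convex-hull construction behind continuum removal, can be sketched in a few lines. This is an illustrative sketch only, written in plain Python so that it runs without dependencies: the function names `snv`, `msc`, and `continuum_removed` are hypothetical, the sample standard deviation is assumed for the standard normal variate, the mean spectrum is assumed as the reference for multiplicative scatter correction, and wavelengths are assumed to be strictly increasing.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: scale ONE spectrum by its own mean and std."""
    m, s = mean(spectrum), stdev(spectrum)
    return [(v - m) / s for v in spectrum]


def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum of the set.

    Each spectrum is regressed on the reference (s = offset + slope * ref)
    and then corrected as (s - offset) / slope.
    """
    n_bands = len(spectra[0])
    ref = [mean(s[j] for s in spectra) for j in range(n_bands)]
    ref_mean = mean(ref)
    sxx = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = mean(s)
        slope = sum((r - ref_mean) * (v - s_mean) for r, v in zip(ref, s)) / sxx
        offset = s_mean - slope * ref_mean
        corrected.append([(v - offset) / slope for v in s])
    return corrected


def _cross(o, a, b):
    """2-D cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def continuum_removed(wavelengths, reflectance):
    """Divide a spectrum by its upper convex hull (the continuum)."""
    points = list(zip(wavelengths, reflectance))
    hull = []
    for p in points:
        # Drop points lying on or below the line joining their neighbours.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    result, k = [], 0
    for x, y in points:
        while hull[k + 1][0] < x:
            k += 1
        (x0, y0), (x1, y1) = hull[k], hull[k + 1]
        continuum = y0 + (y1 - y0) * (x - x0) / (x1 - x0)
        result.append(y / continuum)
    return result
```

For a symmetric absorption such as `continuum_removed([1, 2, 3, 4, 5], [0.5, 0.4, 0.25, 0.4, 0.5])`, the continuum is the flat line at 0.5 and the band centre drops to 0.5 after removal, which corresponds to an absorption depth of 0.5.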
The continuum removal approach aims to quantify the absorption of materials at a specific wavelength, assuming that no other material has strong absorption features around this specific wavelength [64]. Moreover, various data enhancement algorithms (e.g. mean centering, normalization) are employed to simplify the calibration models, eliminate redundant information, and highlight the diversities of the spectra [62]. Mean centering (i.e. the average calibration spectrum is subtracted from each spectrum and the average calibration concentration is subtracted from each concentration) can decrease the complexity of the calibration models by reducing the number of partial least squares factors [65]. Normalization of spectral data is accomplished by dividing each absorbance by a constant, which is used to remove systematic variation [66] and enhance absorption features and curve shape [23]. At present, there is no single pre-processing method or combination of methods that works well with all data sets from different study areas. In other words, the type and number of the required pre-processing methods are location-dependent and data-specific [16]. The reason for this could be that the spectra face different problems in different study areas. Furthermore, for different soil properties, the spectral features for their estimation may be different.

4.2. Spectral indices
A spectral index is a numerical value calculated from two or more wavebands by the use of a mathematical method. Due to the overlapping absorptions of the soil constituents, soil reflectance spectra are largely non-specific for the estimation of a given soil property [16]. The spectral indices derived from soil reflectance spectra can enhance the relationship between spectral features and component concentrations, and can eliminate the effects of irrelevant wavelengths. Spectral indices can be applied to air- or space-borne images with a lower spectral resolution (e.g. ASTER, Landsat TM) [19,67].
Moreover, the models using spectral indices are less complicated than those employing full spectra [19,67].
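Table 1
Spectral vegetation indices and their formulas [68].

Vegetation index | Formula (a) | Reference
Difference vegetation index (DVI) | $\mathrm{NIR} - \mathrm{Red}$ | [71]
Ratio vegetation index (RVI) | $\mathrm{NIR} / \mathrm{Red}$ | [72]
Normalized difference vegetation index (NDVI) | $(\mathrm{NIR} - \mathrm{Red}) / (\mathrm{NIR} + \mathrm{Red})$ | [73]
Ratio difference vegetation index (RDVI) | $(\mathrm{NIR} - \mathrm{Red}) / \sqrt{\mathrm{NIR} + \mathrm{Red}}$ | [74]
Soil-adjusted vegetation index (SAVI) | $(\mathrm{NIR} - \mathrm{Red})(1 + L) / (\mathrm{NIR} + \mathrm{Red} + L)$, where $L = 0.5$ | [75]
Second modified soil-adjusted vegetation index (MSAVI2) | $\left[ 2\,\mathrm{NIR} + 1 - \sqrt{(2\,\mathrm{NIR} + 1)^2 - 8(\mathrm{NIR} - \mathrm{Red})} \right] / 2$ | [76]
Infrared percentage vegetation index (IPVI) | $\mathrm{NIR} / (\mathrm{NIR} + \mathrm{Red})$ | [77]
Modified simple ratio (MSR) | $\left[ (\mathrm{NIR}/\mathrm{Red}) - 1 \right] / \sqrt{(\mathrm{NIR}/\mathrm{Red}) + 1}$ | [78]

(a) Red and NIR are the reflectances in the red and near-infrared wavelengths, respectively.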
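Several of the indices in Table 1 reduce to one-line expressions of the two band reflectances. The snippet below is a minimal sketch assuming scalar reflectance values between 0 and 1; the function name `vegetation_indices` and the input values are hypothetical, and only the indices whose formulas are unambiguous in Table 1 are included.

```python
from math import sqrt


def vegetation_indices(nir, red, L=0.5):
    """Compute selected Table 1 indices from NIR and red reflectance."""
    return {
        "DVI": nir - red,
        "RVI": nir / red,
        "NDVI": (nir - red) / (nir + red),
        "SAVI": (nir - red) * (1 + L) / (nir + red + L),
        "MSAVI2": (2 * nir + 1
                   - sqrt((2 * nir + 1) ** 2 - 8 * (nir - red))) / 2,
        "IPVI": nir / (nir + red),
    }


# Hypothetical reflectances of a vegetated target.
indices = vegetation_indices(nir=0.5, red=0.1)
for name, value in indices.items():
    print(f"{name}: {value:.4f}")
```

Note that IPVI is a rescaling of NDVI, since IPVI = (NDVI + 1) / 2, so the two indices carry the same information for regression purposes.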
To date, several spectral indices of soil spectra, including the absorption depths at 500 and 2200 nm [20,30], the ratios of reflectance at 610 to 500 nm, 1344 to 778 nm, and 624 to 564 nm [14,19,20], and the absorption areas at 500 and 2200 nm [19,20], have been employed to accurately estimate heavy metal concentrations in different study areas. As mentioned in Section 2, the visible and near-infrared reflectance spectra of some plants (Poa annua and Lolium perenne) have potential for the estimation of heavy metal concentrations in soils [26,27]. One of the greatest challenges in applying field and remotely sensed spectra of plants to the quantitative analysis of heavy metal concentrations in soils is the “mixed pixel” problem. This problem arises where a pixel contains a mixture of components (e.g. vegetation, soil, and shadow) rather than green vegetation alone, and it often makes the discrimination of vegetation difficult. Therefore, in the application of plant spectra, the “mixed pixel” problem needs a solution. Vegetation indices (see Table 1) are intended to enhance the vegetative signal while minimizing the background and atmospheric effects [68,69]. Several studies have shown that good correlations exist between vegetation indices and heavy metal concentrations (e.g. Ni, Cd, Zn, Pb), with a Pearson correlation coefficient greater than 0.8 [26,27]. Red edge position is defined as the wavelength of the maximum slope (maximum first derivative) of the reflectance spectrum in the red edge region (690–740 nm) [70]. This position may shift under stress conditions because of decreases in leaf chlorophyll concentration and leaf area index [21]. Heavy metals in soils can decrease the leaf chlorophyll concentration and the leaf area index of plants. Therefore, red edge position is believed to have the potential to estimate heavy metal concentrations in soils, and this hypothesis was supported by Clevers et al.
[21], who found that the red edge position of grassland spectra had a significant negative correlation (correlation coefficient r > 0.8; p < 0.05) with the heavy metal concentrations in the soils of floodplains.

4.3. Variable selection methods
One problem in modeling hyperspectral data is redundant variables. In general, several hundreds or even thousands of variables (wavelengths) are measured in a spectrum. Some of the variables may be irrelevant to the studied heavy metal concentrations. Zou et al. [79] suggested that the predictive ability could be increased and the model complexity might be reduced by a judicious pre-selection of wavelengths. Therefore, in order to construct a high-quality calibration model, methods of extracting sample-specific or component-specific information have been developed
[80], such as the genetic algorithm [81], uninformative variable elimination [82], and the successive projection algorithm [83]. The genetic algorithm is a popular heuristic optimization technique, and it employs a probabilistic, non-local search process inspired by Darwin’s theory of natural selection [81]. The flexible search strategy of genetic algorithms is to randomly select an initial set of spectral variables and to optimize this set by considering many combinations of features and their interactions [83]. Genetic algorithms have been proven to be an effective method for variable selection, and further for soil property estimation [84,85]; however, they are complex and time-consuming, and the selected bands may not be reproducible [83]. Uninformative variable elimination is a method of detecting uninformative variables based on a stability analysis of the regression coefficients (b-coefficient) [86]. Based on the definition of noise, uninformative variable elimination can eliminate uninformative variables. Employing the variables selected by uninformative variable elimination for modeling can avoid model over-fitting and can usually improve the predictive ability [79]. However, latent variables still need to be employed for modeling because the number of variables selected by the uninformative variable elimination is still high. Therefore, the selected variables could be further selected by the successive projection algorithm [87]. The successive projection algorithm is a forward feature selection technique designed by Araujo et al. [83] to minimize the collinearity problem in spectral data. It employs a simple projection operation in a vector space to obtain a subset of variables with minimal collinearity [79]. The principle of feature selection by the successive projection algorithm is that the newly selected variables are those having the maximum projection value on the orthogonal subspace of the previously selected variables. 
The successive projection algorithm is simple and can avoid over-fitting problems, as a result of eliminating the collinearity effects of the independent variables. However, the number of selected wavelengths, which cannot be larger than the number of calibration samples, is a limitation for this method [83]. If more spectral variables are needed to estimate the soil components, a large number of soil samples should be collected to perform model calibration, which may hamper the application of the successive projection algorithm in studies with very limited samples.
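The projection step of the successive projection algorithm can be illustrated with a short sketch. It is a simplified version under stated assumptions and not the reference implementation of [83]: the starting variable is supplied by the caller, no validation step chooses the final subset size, each column is assumed to be non-zero, and the names `spa` and `dot` are hypothetical. Spectra are passed as a list of columns, one per wavelength.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def spa(columns, start, n_select):
    """Select n_select column indices with minimal collinearity.

    columns  : list of p vectors, each holding one wavelength's values
               across the n calibration samples
    start    : index of the first selected wavelength
    n_select : number of wavelengths to select
    """
    n_samples = len(columns[0])
    if n_select > n_samples:
        raise ValueError("cannot select more wavelengths than samples")
    work = [list(c) for c in columns]
    selected = [start]
    while len(selected) < n_select:
        ref = work[selected[-1]]
        ref_sq = dot(ref, ref)
        best, best_norm = None, -1.0
        for j, col in enumerate(work):
            if j in selected:
                continue
            # Project onto the subspace orthogonal to the last selection.
            coef = dot(col, ref) / ref_sq
            work[j] = [c - coef * r for c, r in zip(col, ref)]
            norm = dot(work[j], work[j])
            if norm > best_norm:
                best, best_norm = j, norm
        selected.append(best)
    return selected


# Hypothetical data: 3 samples, 4 wavelengths; column 1 nearly duplicates
# column 0, and column 3 lies in the span of columns 0 and 2.
cols = [[1.0, 0.0, 0.0], [2.0, 0.0, 0.1], [0.0, 3.0, 0.0], [0.5, 1.0, 0.0]]
print(spa(cols, start=0, n_select=3))  # [0, 2, 1]
```

The guard on `n_select` mirrors the limitation noted above: with n calibration samples, the projected columns run out of independent directions, so no more than n wavelengths can be meaningfully selected.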
4.4. Modeling strategies
In general, there are two modeling strategies for estimating the soil properties of new target sites with VNIRS: (i) modeling based on local soil libraries of new target sites (at field scale) [15,88]; and (ii) models built from national soil libraries (at a country scale) [89,90]. To date, in the field of estimation of soil heavy metal concentrations, the majority of research has been based on the first modeling strategy [18,19,23,63]. In these studies, local soil libraries (usually