
IEEE TRANSACTIONS ON CYBERNETICS, VOL. 44, NO. 9, SEPTEMBER 2014

Hyperspectral Image Classification Using Functional Data Analysis

Hong Li, Member, IEEE, Guangrun Xiao, Tian Xia, Y. Y. Tang, Fellow, IEEE, and Luoqing Li

Abstract—The large number of spectral bands acquired by hyperspectral imaging sensors allows us to better distinguish many subtle objects and materials. Unlike classical hyperspectral image classification methods in the multivariate analysis framework, in this paper a novel method using functional data analysis (FDA) for accurate classification of hyperspectral images is proposed. The central idea of FDA is to treat multivariate data as continuous functions. From this perspective, the spectral curve of each pixel in a hyperspectral image is naturally viewed as a function. This is beneficial for making full use of the abundant spectral information, and the relevance between adjacent elements of each pixel's spectral vector can also be exploited in a reasonable way. Functional principal component analysis is applied to solve the classification problem for these functions. Experimental results on three hyperspectral images show that the proposed method achieves higher classification accuracies than some state-of-the-art hyperspectral image classification methods.

Index Terms—Functional data analysis (FDA), functional data representation, functional principal component analysis (FPCA), hyperspectral image classification, support vector machines (SVM).

Manuscript received May 11, 2013; revised September 24, 2013 and October 12, 2013; accepted October 24, 2013. Date of publication November 22, 2013; date of current version August 14, 2014. This work was supported in part by the National Natural Science Foundation of China under Grants 61075116, 11371007, 91330118, and 61273244; in part by the Natural Science Foundation of Hubei Province under Grant 2010CDA008; in part by the Multi-Year Research Grants of the University of Macau under Grants MYRG205(Y1-L4)-FST11-TYY and MYRG187(Y1-L3)-FST11-TYY; in part by the Start-Up Research Grant of the University of Macau under Grant SRG010-FST11-TYY; and in part by the Science and Technology Development Fund (FDCT) of Macau under Grant FDCT-100-2012-A3. This paper was recommended by Associate Editor S. Zafeiriou.
H. Li is with the School of Mathematics and Statistics, Huazhong University of Science and Technology, Wuhan 430074, China (e-mail: [email protected]).
G. Xiao is with the School of Automation, Huazhong University of Science and Technology, Wuhan 430074, China (e-mail: [email protected]).
T. Xia and Y. Y. Tang are with the Faculty of Science and Technology, University of Macau, Macau, China (e-mail: [email protected]; [email protected]).
L. Li is with the Faculty of Mathematics and Computer Science, Hubei University, Wuhan 430062, China (e-mail: [email protected]).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TCYB.2013.2289331

I. Introduction

Hyperspectral imaging has become a fast-growing technique in the field of remote sensing due to recent advances in hyperspectral imaging technology. It makes use of hundreds of contiguous spectral bands to expand the capability of multispectral sensors that use tens of discrete spectral bands [1]. A very important application of hyperspectral imaging is image classification. Many classification analyses have been performed for hyperspectral data [2]. Maximum-likelihood or Bayesian estimation methods [3], decision trees [4], neural networks [5]–[7], sparse representation [8], [9], and kernel-based techniques [10] have been investigated. Among these approaches, machine learning methods, such as the support vector machine (SVM) [11] and the relevance vector machine [12], are usually superior to the others and have been successfully applied to hyperspectral image classification [13]–[18].

However, as the inputs are high-dimensional vectors whose coordinates are highly correlated, the direct use of classical models for hyperspectral images faces several difficulties. It may give rise to the Hughes phenomenon: with a fixed number of training samples, hyperspectral image classification accuracy first increases as the dimensionality of the spectral features increases, but decays once the dimensionality exceeds some optimum value [19]. (If the number of training samples is large enough, the Hughes phenomenon may not occur.) To deal with this problem, various feature extraction methods [20]–[28] have been proposed.

The aforementioned feature extraction and classification methods belong to the general framework of multivariate analysis, i.e., hyperspectral data are treated as vectors of discrete samples. In this case, every band has a unique ID (a wavelength-related band number), which is important when mixing or comparing pixels, at least before band reduction. Since the one-to-one relationship between band number and wavelength among the pixels of a data set is lost after band reduction, the wavelength information of the spectral curves is ignored in these analyses. A more reasonable way to treat such data is to incorporate the information inherent in the wavelength order and in the smoothness of processes over wavelength. The tools for this purpose are provided by the recently developed methodology of functional data analysis (FDA).

FDA was first used by Ramsay and Dalzell [29], and later appeared in the titles of Ramsay and Silverman [30], [31], as well as Ferraty and Vieu [32]. The basic philosophy of FDA is to consider observed data functions as single entities, rather than merely as sequences of individual observations. The term functional in reference to observed data refers to the intrinsic structure of the data rather than to their explicit form. FDA explores samples of data where each observation arises from a curve or function varying over a continuum. The continuum is often time, but may also be spatial location, wavelength (as in hyperspectral data), probability, etc.


We require the data to provide enough information to estimate the curves and the properties of them that we wish to use, and this requirement is satisfied thanks to the abundant spectral information in hyperspectral data. Moreover, FDA can easily make use of the information in the slopes and curvatures of curves, as reflected in their first and second derivatives. The derivatives may naturally reveal important aspects of the processes generating the data. The possibility of using derivative information greatly extends the power of these methods, and also leads to functional models defined by differential equations, dynamic systems, or other types of functional equations. To a certain extent, models for functional data and methods for FDA are similar to those for conventional multivariate data, including linear and nonlinear regression models, principal component analysis (PCA), cluster analysis, and others.

By now, several functional data classification algorithms [33]–[37] have been presented. In [36], a projection approach was developed for the functional SVM and its consistency was proved. In [37], a differentiation-based approach was proposed for the functional SVM when the functional data are assumed to belong to a Sobolev space. In particular, in the context of spectral classification, many approaches based on functional data models have been proposed, e.g., functional relevance learning approaches and functional vector quantization techniques [38]–[40] and feature selection by alternative functional projection approaches [41]. In [40], a functional approach to relevance learning for high-dimensional functional data was proposed to resolve instabilities in learning, with an inherent regularization taking place.

In this paper, we propose a classification algorithm for hyperspectral images utilizing FDA. In the hyperspectral data case, each spectral curve is a function that maps the wavelengths of the illuminating light to the corresponding reflectance or radiance of the studied sample. We view the observed hyperspectral data as independent realizations of a smooth stochastic process. In this way, the high dimensionality and strong correlation that usually lead to the Hughes phenomenon in classical multivariate analysis can be fully exploited to construct functional data. Then, functional principal component analysis (FPCA) in conjunction with SVM is applied to classify these functional data. We demonstrate the usefulness of this approach on the Indian Pines image, the Kennedy Space Center data, and the University of Pavia image.

The remainder of this paper is organized as follows. In the next section, we briefly describe the fundamental theory of FDA, FPCA, and SVM, and then present the proposed classification scheme for hyperspectral images. In Section III, we describe the experimental hyperspectral image data sets and the experimental setup used to validate the proposed approach, comparing the classification performance with four state-of-the-art classification methods on three popular hyperspectral data sets. Finally, conclusions are drawn and future prospects are considered in Section IV.

II. Models and Methods

A. Functional Data Representation for Hyperspectral Image

Functional data refer to data that consist of observed functions or curves evaluated at a finite subset of some interval.


Due to this, modeling functional data requires considering function spaces such as Hilbert spaces, and each functional observation is viewed as a realization generated by a random mechanism in such a space; this is the essential advantage of FDA. What distinguishes FDA from conventional statistics is the atom of the data: in conventional statistics the atom is a number or a vector, whereas in FDA it is a curve or an image. Many multivariate statistical analysis techniques can be directly applied to functional data. More details can be found in [30] and [31].

Suppose that the hyperspectral image consists of $N$ pixels, and that the $i$th ($i = 1, \ldots, N$) pixel has reflectance (or radiance) values $y_{ij}$, $j = 1, \ldots, n$, where $n$ is the number of spectral bands. We assume that a smooth function $x_i$ underlies the spectral curve of the $i$th pixel, and that the reflectance value $y_{ij}$ is associated with the value of the function $x_i(t)$ at point $t_j$, where $t$ is wavelength. The first step of FDA is turning the raw discrete data into smooth functions. Generally, FDA assumes that the curves being estimated are smooth; in practice, smoothness means that the first or higher derivatives of the curves can be estimated. Although several smoothing techniques are available, the basis function approach is usually used to represent functional data. Let $\{\phi_k\}$ be a set of basis functions; then each spectral curve $x_i(t)$ can be approximated by the linear expansion

$$x_i(t) = \sum_{k=1}^{K} c_{i,k}\, \phi_k(t) \qquad (1)$$

where $\{c_{i,k}\}$ are the corresponding coefficients and $\{\phi_k\}$ can be arbitrary basis functions. Two basis function systems are most widely used: the Fourier basis and the B-spline basis. The former is often used for periodic data, the latter for nonperiodic data. Several other types of basis systems, such as the wavelet basis and the power basis, can also be considered in certain situations.

For hyperspectral images, let $T$ be a set of $n$ nondecreasing numbers, $t_1 \le t_2 \le \ldots \le t_n$, called the knot vector, where the $\{t_j, j = 1, 2, \ldots, n\}$ corresponding to the wavelengths are called knots. The $k$th B-spline basis function of degree $p$, written $N_{k,p}(t)$, is defined recursively by the Cox–de Boor recursion formula

$$N_{k,0}(t) = \begin{cases} 1 & \text{if } t_k \le t < t_{k+1} \\ 0 & \text{otherwise} \end{cases} \qquad k = 1, \ldots, n-1$$

$$N_{k,p}(t) = \frac{t - t_k}{t_{k+p} - t_k}\, N_{k,p-1}(t) + \frac{t_{k+p+1} - t}{t_{k+p+1} - t_{k+1}}\, N_{k+1,p-1}(t), \qquad k = 1, \ldots, n-p-1. \qquad (2)$$

B-spline basis functions are smooth and well behaved when $p > 1$. An important property of B-spline basis functions is the compact support property: each B-spline basis function is nonzero only on a few adjacent subintervals. Based on this, the computation of a B-spline function can be organized so as to grow only linearly with the number of B-spline basis functions. Thus, the B-spline basis system shares the computational advantages of orthogonal basis systems such as the Fourier and wavelet bases.
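To make the recursion (2) concrete, the following Python sketch evaluates the B-spline basis functions directly from the Cox–de Boor formula; the knot vector and evaluation point are illustrative assumptions, not values from the paper.

```python
import numpy as np

def bspline_basis(k, p, t, knots):
    """Evaluate the k-th B-spline basis function of degree p at t via the
    Cox-de Boor recursion of (2); k is 1-based, as in the paper."""
    if p == 0:
        return 1.0 if knots[k - 1] <= t < knots[k] else 0.0
    left = 0.0
    denom = knots[k + p - 1] - knots[k - 1]          # t_{k+p} - t_k
    if denom > 0:
        left = (t - knots[k - 1]) / denom * bspline_basis(k, p - 1, t, knots)
    right = 0.0
    denom = knots[k + p] - knots[k]                  # t_{k+p+1} - t_{k+1}
    if denom > 0:
        right = (knots[k + p] - t) / denom * bspline_basis(k + 1, p - 1, t, knots)
    return left + right

# Illustrative knot vector (e.g., scaled wavelengths) and the cubic case p = 3.
knots = np.linspace(0.0, 1.0, 11)                    # t_1 <= ... <= t_n, here n = 11
vals = [bspline_basis(k, 3, 0.37, knots) for k in range(1, len(knots) - 3)]
print(np.round(vals, 4))   # nonzero only for a few adjacent k: compact support
```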


Algorithm 1 Functional data representation of the hyperspectral image

Require: $\{t_j, j = 1, \ldots, n\}$, $\{y_{ij}, i = 1, \ldots, N, j = 1, \ldots, n\}$, and $\{N_{k,3}(t), k = 1, \ldots, K\}$.
Ensure: The optimal functional data representation $\hat{x}_i(t)$ of the original hyperspectral image.
1: $x_i(t) = \sum_{k=1}^{K} c_{i,k}\, N_{k,3}(t)$.
2: for $\log(\lambda) = a_1 : a_2$ do
3:   for $i = 1 : N$ do
4:     $\mathrm{sse}_\lambda(i) = \sum_{j=1}^{n} [y_{ij} - x_i(t_j)]^2 + \lambda \int [D^m x_i(s)]^2\, ds$.
5:   end for
6:   $\mathrm{sse}_\lambda = \sum_i \mathrm{sse}_\lambda(i)$.
7: end for
8: $\hat{\lambda} = \arg\min_\lambda \{\mathrm{sse}_\lambda\}$.
9: $\hat{x}_i(t) = \arg\min_{x_i} \left\{ \sum_{j=1}^{n} [y_{ij} - x_i(t_j)]^2 + \hat{\lambda} \int [D^m x_i(s)]^2\, ds \right\}$.

In the following experiments, as we only make use of the smoothness of the B-spline basis functions themselves, the cubic B-spline basis system is sufficient to construct an efficient functional data representation of the hyperspectral images.

The next step is to estimate the coefficients in (1). The roughness penalty approach [31] is an efficient smoothing method, minimizing the penalized residual sum of squares

$$\mathrm{sse}_\lambda(i) = \sum_{j=1}^{n} [y_{ij} - x_i(t_j)]^2 + \lambda \int [D^m x_i(s)]^2\, ds \qquad (3)$$

where $\int [D^m x_i(s)]^2\, ds$ represents roughness and $D^m$ denotes the derivative of order $m$ (usually $m = 2$). The smoothing parameter $\lambda$ measures the rate of exchange between fit to the data, as measured by the residual sum of squares in the first term, and variability of the function $x_i$, as quantified by $\int [D^m x_i(s)]^2\, ds$ in the second term. A validation approach is used to select the smoothing parameter $\lambda$. Algorithm 1 gives the pseudocode of the functional data representation method for the hyperspectral image, where the lower bound $a_1$ and upper bound $a_2$ of $\log(\lambda)$ are determined according to the specific experimental situation.
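Minimizing (3) over the basis coefficients gives the penalized least-squares solution $\hat{c}_i = (\Phi'\Phi + \lambda R)^{-1}\Phi' y_i$, where $\Phi_{jk} = N_{k,3}(t_j)$ and $R_{kl} = \int D^2 N_{k,3}(s)\, D^2 N_{l,3}(s)\, ds$. The following Python sketch implements this fit with a grid search over $\log(\lambda)$ as in Algorithm 1; the knot placement and grid bounds are illustrative assumptions, and $\lambda$ is scored here by generalized cross-validation, one standard choice for the validation approach the text mentions, rather than by the penalized error itself.

```python
import numpy as np
from scipy.interpolate import BSpline

def bspline_design(tgrid, knots, K, deriv=0):
    """Columns are the cubic B-splines N_{k,3} (or their deriv-th derivatives)."""
    tk = np.concatenate(([knots[0]] * 3, knots, [knots[-1]] * 3))
    cols = []
    for k in range(K):
        b = BSpline(tk, np.eye(K)[k], 3)
        cols.append(b.derivative(deriv)(tgrid) if deriv else b(tgrid))
    return np.column_stack(cols)

def smooth_spectra(Y, tgrid, K=40, log_lams=np.arange(-6, 6, 0.5)):
    """Penalized least-squares fit of (3) for all N spectra (rows of Y)."""
    knots = np.linspace(tgrid[0], tgrid[-1], K - 2)   # knot placement: an assumption
    s = np.linspace(tgrid[0], tgrid[-1], 400)         # quadrature grid for R
    Phi = bspline_design(tgrid, knots, K)
    D2 = bspline_design(s, knots, K, deriv=2)
    R = D2.T @ (D2 * np.gradient(s)[:, None])         # R_kl = int D2Nk D2Nl ds
    n = len(tgrid)
    best = (np.inf, None)
    for ll in log_lams:                               # grid over log(lambda)
        lam = 10.0 ** ll
        A = np.linalg.solve(Phi.T @ Phi + lam * R, Phi.T)   # K x n smoother factor
        df = np.trace(Phi @ A)                        # effective degrees of freedom
        C = (A @ Y.T).T                               # N x K coefficient matrix
        sse = ((Y - C @ Phi.T) ** 2).sum()
        gcv = n * sse / (n - df) ** 2                 # generalized cross-validation
        if gcv < best[0]:
            best = (gcv, C)
    return best[1]     # coefficients c-hat of the smoothed curves x-hat_i(t)
```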

B. Functional Data Feature Extraction

Once the functional data representation of the hyperspectral image has been obtained, we face a functional data classification problem in a Hilbert space. To solve this intrinsically infinite-dimensional problem, a feature extraction method for functional data should be applied before the final classification step. PCA is one of the most popular linear dimensionality reduction methods in classical multivariate analysis and is usually used to find the dominant modes of variation in the data. Analogously, the FPCA described in this section aims to obtain the first few orthogonal functions, referred to as functional principal components (FPCs), that most effectively explain the variation in the data.

FPCA proceeds in a manner analogous to conventional PCA. For simplicity, we model the sample spectral curves as a zero-mean stochastic process $\{X(t), t \in T\}$ with covariance function $v(s,t) = E[X(s)X(t)]$. Hence, each spectral curve $x_i(t)$, $1 \le i \le N$, is a realization of the stochastic process observed on the same wavelength interval. The basic idea of FPCA is then as follows. First, find the principal component weight function $\varepsilon_1(t)$ whose FPC scores

$$f_{i,1} = \int \varepsilon_1(t)\, x_i(t)\, dt \qquad (4)$$

maximize $\sum_i f_{i,1}^2$ subject to $\|\varepsilon_1\|^2 = \int \varepsilon_1^2(t)\, dt = 1$, the continuous analogue of the unit sum-of-squares constraint. Next, carry out the second and subsequent steps: on the $m$th step, compute the weight function $\varepsilon_m(t)$ and FPC scores

$$f_{i,m} = \int \varepsilon_m(t)\, x_i(t)\, dt \qquad (5)$$

maximizing $\sum_i f_{i,m}^2$, subject to the constraint $\|\varepsilon_m\| = 1$ and the $m-1$ additional orthogonality constraints

$$\int \varepsilon_k(t)\, \varepsilon_m(t)\, dt = 0, \quad \forall\, k < m. \qquad (6)$$

The main objective is now to calculate the principal component weight functions $\{\varepsilon_m(t)\}$. As in PCA, this problem can be expressed as an eigenanalysis of the covariance function $v(s,t)$: the eigenvalue–eigenfunction pairs $\{\rho_m, \varepsilon_m\}$ satisfy

$$\int v(s,t)\, \varepsilon_m(t)\, dt = \rho_m\, \varepsilon_m(s), \qquad \int \varepsilon_k \varepsilon_m = \delta_{k,m} \quad \forall\, k, m \qquad (7)$$

where $\delta$ is the Kronecker delta. The covariance function $v(s,t)$ is typically estimated from the spectral curves $x_i(t)$, and the $\rho_m$ and $\varepsilon_m$ are computed from the estimated covariance function. According to Algorithm 1, each spectral curve can be represented as

$$\hat{x}_i(t) = \sum_{k=1}^{K} \hat{c}_{i,k}\, N_{k,3}(t) \qquad (8)$$

and the corresponding sample covariance function is given by

$$\hat{v}(s,t) = N^{-1} \sum_{i=1}^{N} \hat{x}_i(s)\, \hat{x}_i(t) = N^{-1} \sum_{i=1}^{N} \sum_{k=1}^{K} \sum_{p=1}^{K} \hat{c}_{i,k} N_{k,3}(s)\, \hat{c}_{i,p} N_{p,3}(t). \qquad (9)$$

Suppose that the eigenfunctions can be written as

$$\varepsilon(t) = \sum_{q=1}^{K} b_q\, N_{q,3}(t). \qquad (10)$$

Then we have

$$\int \hat{v}(s,t)\, \varepsilon(t)\, dt = N^{-1} \sum_{i=1}^{N} \sum_{k=1}^{K} \sum_{p=1}^{K} \hat{c}_{i,k} N_{k,3}(s)\, \hat{c}_{i,p} \int N_{p,3}(t) \sum_{q=1}^{K} b_q N_{q,3}(t)\, dt = N^{-1} \sum_{i=1}^{N} \sum_{k=1}^{K} \sum_{p=1}^{K} \sum_{q=1}^{K} \hat{c}_{i,k}\, \hat{c}_{i,p}\, b_q\, N_{k,3}(s) \int N_{p,3}(t)\, N_{q,3}(t)\, dt. \qquad (11)$$
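The inner products $\int N_{p,3}(t)\, N_{q,3}(t)\, dt$ appearing in (11) form a Gram matrix that is banded thanks to the compact support of the B-splines. A minimal sketch, assuming SciPy's BSpline and approximating the integrals by quadrature on a fixed grid:

```python
import numpy as np
from scipy.interpolate import BSpline

def gram_matrix(knots, K, n_quad=2000):
    """Psi[p, q] ~= int N_{p,3}(t) N_{q,3}(t) dt on [knots[0], knots[-1]],
    approximated by trapezoid-like quadrature; banded by compact support."""
    tk = np.concatenate(([knots[0]] * 3, knots, [knots[-1]] * 3))
    t = np.linspace(knots[0], knots[-1], n_quad)
    B = np.column_stack([BSpline(tk, np.eye(K)[k], 3)(t) for k in range(K)])
    w = np.gradient(t)                   # quadrature weights
    return B.T @ (B * w[:, None])        # K x K Gram matrix psi
```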


Algorithm 2 Extract the functional data features $\{f_i\}$

Require: $\{\hat{x}_i(t), i = 1, \ldots, N\}$, $\{N_{k,3}(t), k = 1, \ldots, K\}$.
Ensure: The functional data features $\{f_{i,m}\}$.
1: for $i = 1 : N$ do
2:   Calculate $\{\hat{c}_{i,k}\}$ in (8) according to Algorithm 1.
3:   for $m = 1 : M$ do
4:     Calculate the $m$th eigenvector $b_m$ of $N^{-1} C \psi$.
5:     $\varepsilon_m = \phi' b_m$.
6:     $f_{i,m} = \int \varepsilon_m\, \hat{x}_i\, dt$.
7:   end for
8: end for

Fig. 1. Flow chart of our algorithm.

Hence, the eigenequation (7) can be expressed as

$$N^{-1} \sum_{i=1}^{N} \sum_{k=1}^{K} \sum_{p=1}^{K} \sum_{q=1}^{K} \hat{c}_{i,k}\, \hat{c}_{i,p}\, b_q\, N_{k,3}(s) \int N_{p,3}(t)\, N_{q,3}(t)\, dt = \rho \sum_{q=1}^{K} b_q\, N_{q,3}(s). \qquad (12)$$

Define $\phi = [N_{1,3}(t), \ldots, N_{K,3}(t)]'$, $C = \left[\sum_{i=1}^{N} \hat{c}_{i,k}\, \hat{c}_{i,p}\right]_{K \times K}$, $b = [b_1, b_2, \ldots, b_K]'$, and $\psi = \left[\int N_{p,3}(t)\, N_{q,3}(t)\, dt\right]_{K \times K}$. Then the matrix form of (12) is

$$N^{-1} \phi' C \psi b = \rho\, \phi' b. \qquad (13)$$

Finally, the eigenanalysis of the covariance function is reduced to the solution of the matrix equation

$$N^{-1} C \psi b = \rho b. \qquad (14)$$

As a result, $\varepsilon_m = \phi' b_m$ is the $m$th eigenfunction of $v(s,t)$, where $b_m$ is the $m$th eigenvector of $N^{-1} C \psi$. The $m$th FPC score of $\hat{x}_i(t)$ is then $f_{i,m} = \int \varepsilon_m\, \hat{x}_i\, dt$. Ultimately, the FPC scores (i.e., the functional data features) $\{f_i = [f_{i,1}, f_{i,2}, \ldots, f_{i,M}]',\ i = 1, \ldots, N\}$ are used to classify the original hyperspectral image, where $M$ is the number of FPC scores required in practice. Algorithm 2 summarizes the calculation of the functional data features.
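To make (14) and Algorithm 2 concrete, here is a minimal Python sketch, assuming the coefficient matrix $\hat{C} = [\hat{c}_{i,k}]$ from Algorithm 1 and the Gram matrix $\psi$ (e.g., from the gram_matrix sketch above) are available. A Cholesky factorization of $\psi$ turns the non-symmetric problem $N^{-1} C \psi b = \rho b$ into an equivalent symmetric one.

```python
import numpy as np

def fpc_scores(C_hat, Psi, M):
    """FPC scores via (14): solve N^{-1} C Psi b = rho b.
    C_hat: N x K coefficient matrix from Algorithm 1.
    Psi:   K x K Gram matrix int N_{p,3} N_{q,3} dt.
    Returns (scores, B): scores is N x M, B holds the M leading eigenvectors."""
    N = C_hat.shape[0]
    C = C_hat.T @ C_hat                     # C = [sum_i c_ik c_ip], K x K
    L = np.linalg.cholesky(Psi)             # Psi = L L'
    # N^{-1} C Psi b = rho b  <=>  (L' N^{-1} C L) u = rho u, with b = L^{-T} u
    S = L.T @ (C / N) @ L
    rho, U = np.linalg.eigh((S + S.T) / 2)  # symmetric eigenproblem, ascending
    idx = np.argsort(rho)[::-1][:M]         # keep the M leading FPCs
    B = np.linalg.solve(L.T, U[:, idx])     # eigenvectors b_m of N^{-1} C Psi
    # f_im = int eps_m x_i dt = c_i' Psi b_m, since eps_m = phi' b_m, x_i = phi' c_i
    scores = C_hat @ Psi @ B
    return scores, B
```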

C. Classification of Hyperspectral Image via FDA

To make full use of the high spectral resolution of hyperspectral images, we focus on the global property of each pixel from the point of view of functions rather than high-dimensional discrete vectors. The pixels of the image to be processed are treated as spectral curves, and hence as functional data. As the central idea of FDA is to treat multivariate data as continuous functions, the FDA method can study important sources of pattern and variation among the data. Since the processed objects are now functions, FDA can also reasonably explain variation in an outcome or dependent variable (reflectance or radiance) using information in the input or independent variable (wavelength). At the same time, as the dimensionality and the correlation between adjacent spectral bands increase, the functional data constructed from the original discrete data become more accurate, which benefits the subsequent processing. As a result, the original high dimensionality and strong correlation are no longer the major difficulty for hyperspectral image classification in our method.

To classify the functional data features extracted by FPCA, SVM is used in our method. The SVM algorithm is based on statistical learning theory and the Vapnik–Chervonenkis dimension introduced by Vapnik and Chervonenkis [11]. SVM constructs a hyperplane in a high-dimensional or infinite-dimensional space and has been successfully applied to various classification problems, especially binary classification. It often happens, however, that the data sets to be discriminated are not linearly separable in the original finite-dimensional space. For this reason, the original finite-dimensional space can be mapped into a much higher-dimensional feature space, in which dot products are computed easily in terms of the variables in the original space by means of a kernel function. Thus, although the classifier may be nonlinear in the original input space, it is a hyperplane in the high-dimensional feature space. Various kernels, such as the homogeneous, inhomogeneous, Gaussian, and sigmoid kernels, have been used successfully in practice. The Gaussian kernel

$$K(f_i, f_j) = \exp\left(-\frac{\|f_i - f_j\|^2}{2\sigma^2}\right) \qquad (15)$$

is the most widely used; the Gaussian kernel parameter $\sigma$ is usually determined by cross validation.

Although hyperspectral image classification is often a multiclass classification problem, it can easily be transformed into a collection of binary classification problems. Generally, one-versus-one (OVO) or one-versus-all binary classifiers are constructed to separate one class from another or one class from all other classes, respectively; these binary classifiers are then combined for multiclass prediction. OVO is adopted in this paper.

As mentioned above, the proposed method is decomposed into the following steps, and the flow chart of our algorithm is shown in Fig. 1.
1) According to Algorithm 1, represent each pixel in the hyperspectral image as a function $\hat{x}_i(t)$.
2) Use FPCA to obtain the functional data features $\{f_i\}$ according to Algorithm 2, with the number $M$ of FPCs determined by cross validation.
3) Perform multiclass OVO SVM on the functional data features $\{f_i\}$; the Gaussian kernel is used in the following experiments. A sketch of this stage is given after this list.
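As an illustration of step 3, the sketch below uses scikit-learn's SVC, whose RBF kernel matches (15) with gamma = 1/(2σ²) and which handles multiclass problems with OVO internally; the parameter grids are placeholders, to be tuned by cross validation as described above, not the paper's values.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def classify_fpc_scores(F_train, y_train, F_test, sigma_grid=(0.5, 1.0, 2.0, 4.0)):
    """F_train/F_test: FPC-score feature matrices (rows = pixels)."""
    # SVC uses K(f, f') = exp(-gamma ||f - f'||^2), so gamma = 1/(2 sigma^2) as in (15).
    params = {"gamma": [1.0 / (2 * s**2) for s in sigma_grid], "C": [1, 10, 100]}
    clf = GridSearchCV(SVC(kernel="rbf", decision_function_shape="ovo"),
                       params, cv=10)   # ten-fold cross validation, as in the paper
    clf.fit(F_train, y_train)
    return clf.predict(F_test)
```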

III. Experimental Results and Discussion

We demonstrate the proposed method (FDA+SVM) against four state-of-the-art methods on three popular hyperspectral data sets. These four methods are: 1) standard SVM; 2) SVM following standard PCA (PCA+SVM); 3) functional SVM with projection preprocessing (PFSVM) [36]; and 4) generalized functional relevance learning vector quantization (GFRLVQ) [40]. The experiments were implemented in MATLAB R2012b on a personal computer with an Intel Core 2 Duo E7500 CPU and 4 GB of DDR2 RAM.

A. Classification of the Indian Pines Image

The Indian Pines image was gathered by the AVIRIS sensor over the Indian Pines test site in northwestern Indiana and consists of 145 × 145 pixels and 220 spectral bands in the wavelength range 0.4–2.5 μm. The scene contains two-thirds agriculture and one-third forest or other natural perennial vegetation. There are two major dual-lane highways, a rail line, as well as some low-density housing, other built structures, and smaller roads. For this data set, 20 bands dominated by noise due to atmospheric water absorption (bands 104–108, 150–163, and 220) were removed first. Sixteen classes of interest, representing mostly different types of crops, are considered. A sample band of this image and the related ground truth data are shown in Fig. 2. There are 10 249 samples in total, excluding background. We randomly chose 10% of the samples of each class from the reference data as training samples and the remaining 90% as testing samples. The resulting training and testing sets are given in Table I.

TABLE I: Information Classes and Training-Test Samples for Indian Pines Image

Fig. 2. Indian Pines image and related ground truth data. (a) Indian Pines image. (b) Ground truth data.

In this experiment, the radiance values were normalized to [0, 1], and the wavelength values were scaled to facilitate the calculation of the functional data representations. All training and testing samples were transformed into functional data together, using the cubic B-spline basis system. For the roughness penalty approach, we used a natural measure of a function's roughness, the integrated squared second derivative, i.e., m = 2 in (3). Fig. 3 shows the functional data representations of one sample from each class.

Fig. 3. Functional data representations of one sample from each class of the Indian Pines image.

The number M of FPCs was chosen by five-fold cross validation. Fig. 4 shows the overall classification error rate as a function of M: when the first 34 FPCs are used, the classification error rate attains its lowest value of 14.67%, with a computation time of 481 s. If time complexity matters more, fewer FPCs can be used: with only the first nine FPCs, the computation time is 300 s, much less than that of SVM (1861 s), with comparable accuracy. Consequently, we can choose the number of FPCs to balance classification accuracy against computation time.

Fig. 4. Choosing M, the number of FPCs, for the Indian Pines image data. Five-fold cross validation was used to estimate the classification error rates in dependency on M.
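A sketch of the model selection shown in Fig. 4 (an illustration, not the authors' code): five-fold cross validation of an RBF-kernel SVM over increasing numbers of leading FPC scores, reusing features such as those produced by the hypothetical fpc_scores helper above.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def choose_M(scores_all, y, M_grid=range(1, 41), gamma=1.0):
    """scores_all: N x M_max matrix of FPC scores. Returns the M that minimizes
    the five-fold cross-validated error rate, as in Fig. 4, plus the error curve."""
    err = []
    for M in M_grid:
        acc = cross_val_score(SVC(kernel="rbf", gamma=gamma),
                              scores_all[:, :M], y, cv=5).mean()
        err.append(1.0 - acc)
    return list(M_grid)[int(np.argmin(err))], err
```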

The first three FPCs for the Indian Pines image data are shown in Fig. 5.

Fig. 5. First three FPCs for the Indian Pines image data. These three FPCs account for 98.1% of the total variation, with (a) the first accounting for 91.8%, (b) the second for 3.3%, and (c) the third for 3%.

Fig. 6. Scatterplot of pairwise FPC scores for eight classes of the Indian Pines image data. (a) Second versus first FPC score. (b) Third versus second FPC score.

To display the differences better, we randomly selected ten samples from each of eight classes and then plotted the second versus the first FPC score [Fig. 6(a)] and the third versus the second FPC score [Fig. 6(b)]. From Fig. 6(a), we can see that most of the classes are distinctly discriminated. Although the differences in the first FPC between some similar classes, such as corn-notill, corn-mintill, and corn, are not so clear, samples from the same class clearly cluster together in Fig. 6(a). Similar plots can be produced for each pair of the first M FPC scores.

Five methods, including the proposed one, were applied to the original Indian Pines image data. The Gaussian kernel was used in all SVM algorithms. For SVM, the Gaussian kernel parameter σ was determined by ten-fold cross validation, which gave σ = 0.35. For PCA+SVM, the number of principal components was identified as 37 by five-fold cross validation, and the Gaussian kernel parameter determined by ten-fold cross validation was σ = 0.71. For PFSVM, 21 Fourier basis functions were used, and the Gaussian kernel parameter was identified as 128 by ten-fold cross validation. For GFRLVQ, ten Gaussian basis functions were used, and the number of prototypes per class was 10. For the proposed method, the Gaussian kernel parameter was identified as 256 by ten-fold cross validation, and M = 34 was chosen to obtain the highest classification accuracy.

TABLE II: Overall Classification Accuracies in Percentage for Three Hyperspectral Images

TABLE III: Computation Time of SVM and Our Method for the Three Hyperspectral Images (for Our Method, the Time at Which the Highest Classification Accuracy Is Achieved)

Table II summarizes the overall classification accuracies for the Indian Pines image data using the five classifiers. As the number of training samples in some classes is much smaller than the dimensionality of this hyperspectral image, Table II shows that the SVM and PCA+SVM methods are affected most. Meanwhile, the other three functional methods obtain better results, and our method achieves the highest classification accuracy. The advantage of processing hyperspectral image data from the functional point of view is clearly embodied in this experiment. Moreover, the computation times of SVM and our method are shown in Table III; our method also improves the computational efficiency compared with SVM.

B. Classification of Kennedy Space Center Data

The Kennedy Space Center (KSC) data were acquired by the NASA AVIRIS instrument over the Kennedy Space Center, FL, USA, on March 23, 1996. The KSC data have a spatial resolution of 18 m. One hundred seventy-six bands were left after removing the water-absorption and low signal-to-noise bands.

For classification purposes, 13 classes representing the various land cover types that occur in this environment are considered. Classes 4 and 6 are mixed classes. The false color composite of bands 31, 21, and 11 and the ground truth data are shown in Fig. 7. There are 5211 samples in this data set, excluding background. We again randomly chose 10% of the samples of each class as training samples, and the remaining samples composed the test set. The resulting training and testing sets are given in Table IV.

TABLE IV: Information Classes and Training-Test Samples for the KSC Data

Fig. 7. KSC data set. (a) RGB composite image of three bands. (b) Ground truth data.

Similar to the first experiment, after analogous preprocessing, the cubic B-spline basis system was used to obtain the functional data representation of the KSC data, with m = 2 in (3). The functional representations of one sample from each class are shown in Fig. 8. The number M of FPCs for this image was again chosen by five-fold cross validation. Fig. 9 shows the overall classification error rate as a function of M: when the first 26 FPCs are used, the classification error rate attains its lowest value of 6.32%, with a computation time of 90 s. In fact, Fig. 9 shows that the gain in classification accuracy is marginal for M > 5, so we can use only the first five FPCs to obtain high accuracy with a computation time of only 55 s.

Fig. 8. Functional representations of one sample from each class of the KSC data.

Fig. 9. Choosing M, the number of FPCs for the KSC data. Five-fold cross validation was used to estimate the classification error rates.

Fig. 10 shows the first three FPCs for the KSC data. These three FPCs account for 96.9% of the total variation, but the first FPC does not account for as much variation as it does in the first experiment. We therefore had to combine two or more of the FPCs to observe the discrimination ability. Fig. 11 shows the FPC scores for the second versus the first FPC score [Fig. 11(a)] and the third versus the second FPC score [Fig. 11(b)]. Ten samples from each of eight classes were randomly selected. As classes 4 and 6 are mixed classes, their FPC scores are often mixed together in both panels.

Fig. 10. First three FPCs for the KSC data. These three FPCs account for 96.9% of the total variation, with (a) the first accounting for 55.6%, (b) the second for 31.2%, and (c) the third for 10.1%.

Fig. 11. Scatterplot of pairwise FPC scores for eight classes of the KSC data. (a) Second versus first FPC score. (b) Third versus second FPC score.

All five methods mentioned above were demonstrated on the original KSC data. The Gaussian kernel was again used in all SVM algorithms, and the kernel parameters were all determined by ten-fold cross validation. For SVM, the kernel parameter σ was 1.41. For PCA+SVM, the first 29 principal components were preserved by five-fold cross validation, and the Gaussian kernel parameter σ was 2. For PFSVM, 17 Fourier basis functions were used, and the Gaussian kernel parameter was identified as 4. For GFRLVQ, ten Gaussian basis functions were used, and the number of prototypes per class was 10. For our method, the Gaussian kernel parameter was 5.66, and we chose M = 26 to obtain the highest classification accuracy.

Table II summarizes the overall classification accuracies for the KSC data. Compared with the other two functional analysis methods, FPCA reveals the intrinsic properties of this hyperspectral image better; thus, our method provides the highest classification accuracy. Moreover, the computation times of SVM and our method are shown in Table III; our method has a clear advantage in computational efficiency.

C. Classification of the University of Pavia Image

The University of Pavia scene was acquired by the ROSIS sensor during a flight campaign over Pavia, northern Italy. The geometric resolution is 1.3 m. The University of Pavia image consists of 610 × 340 pixels and 103 spectral bands, but some of the samples in the image contain no information and have to be discarded before the analysis. Nine classes of interest are considered. A sample band of this image and the related ground truth data are shown in Fig. 12. There are 42 776 samples in this data set, excluding background. We randomly chose only 5% of the samples of each class as training samples, and the remaining samples composed the test set. The resulting training and testing sets are given in Table V.

Here, we also used the cubic B-spline basis system to obtain the functional data representation of the University of Pavia image, with m = 2 in (3).

Fig. 12. University of Pavia image and related ground truth data. (a) Center of Pavia. (b) Ground truth data.

TABLE V: Information Classes and Training-Test Samples for University of Pavia Image

The functional representations of one sample from each class are shown in Fig. 13. The number M of FPCs for this image was again chosen by five-fold cross validation. Fig. 14 shows the overall classification error rate as a function of M: when the first 37 FPCs are used, the classification error rate attains its lowest value of 5.83%, with a computation time of 720 s. However, the increase in classification accuracy is slow for M > 10, so we can use only the first ten FPCs to obtain high accuracy with a computation time of only 408 s.

Fig. 13. Functional representations of one sample from each class of the University of Pavia image.

Fig. 14. Choosing M, the number of FPCs for the University of Pavia image. Five-fold cross validation was used to estimate the classification error rates.

Fig. 15 shows the first three FPCs for the University of Pavia image. These three FPCs account for 95.8% of the total variation. To observe the discrimination ability better, we combined two or more of the FPCs. Fig. 16 shows the FPC scores for the second versus the first FPC score [Fig. 16(a)] and the third versus the second FPC score [Fig. 16(b)]. Ten samples from each class were randomly selected.

Fig. 15. First three FPCs for the University of Pavia image. These three FPCs account for 95.8% of the total variation, with (a) the first accounting for 65.7%, (b) the second for 27.0%, and (c) the third for 3.1%.

Fig. 16. Scatterplot of pairwise FPC scores for all classes of the University of Pavia image. (a) Second versus first FPC score. (b) Third versus second FPC score.

All five methods mentioned above were demonstrated on this image. The Gaussian kernel was used in all SVM algorithms, and the kernel parameters were all determined by ten-fold cross validation. For SVM, the kernel parameter σ was 1.41. For PCA+SVM, the first 31 principal components were preserved by five-fold cross validation, and the Gaussian kernel parameter σ was 8. For PFSVM, 29 Fourier basis functions were used, and the Gaussian kernel parameter was identified as 64. For GFRLVQ, ten Gaussian basis functions were used, and the number of prototypes per class was 10. For our method, the Gaussian kernel parameter was identified as 5.66 by ten-fold cross validation, and we chose M = 37 to obtain the highest classification accuracy.

Table II summarizes the overall classification accuracies for this image. Although the advantage of our method is not as pronounced as in the first two experiments, owing to the properties of this image, our method still provides the highest classification accuracy. Finally, the computation times of SVM and our method are shown in Table III; our method again has a great advantage in computational efficiency compared with SVM.

IV. Conclusion

We presented a novel method for accurate classification of hyperspectral images based on FDA. Considering the high dimensionality and strong correlation of hyperspectral image data, we view the observed data as independent realizations of a smooth stochastic process. Compared with classical hyperspectral image classification methods in the multivariate analysis framework, our method takes full advantage of the abundant spectral information and thus efficiently exploits the high dimensionality and strong correlation. We tested our method and four other state-of-the-art methods on three popular hyperspectral data sets, and the experimental results show that our method consistently performs better than the others.

In the multivariate analysis framework, some other feature extraction methods have been proposed that are superior to PCA in some complex settings. We will therefore consider introducing one or more of these methods into the FDA framework to solve the hyperspectral image classification problem better. Moreover, our future research will focus on applying the FDA method to other hyperspectral image processing problems (e.g., target detection).

References

[1] C. I. Chang, Hyperspectral Data Exploitation: Theory and Applications. Hoboken, NJ, USA: Wiley-Interscience, 2007.
[2] A. Plaza, J. A. Benediktsson, J. W. Boardman, J. Brazile, L. Bruzzone, G. Camps-Valls, et al., "Recent advances in techniques for hyperspectral image processing," Remote Sens. Environ., vol. 113, no. 1, pp. 110–122, Sep. 2009.
[3] D. A. Landgrebe, Signal Theory Methods in Multispectral Remote Sensing (Wiley Series in Remote Sensing). New York, NY, USA: Wiley, 2003.
[4] P. K. Goel, S. O. Prasher, R. M. Patel, J. A. Landry, R. B. Bonnell, and A. A. Viau, "Classification of hyperspectral data by decision trees and artificial neural networks to identify weed stress and nitrogen status of corn," Comput. Electron. Agric., vol. 39, no. 2, pp. 67–93, Aug. 2003.
[5] S. Subramanian, N. Gat, M. Sheffield, J. Barhen, and N. Toomarian, "Methodology for hyperspectral image classification using novel neural network," in Proc. SPIE, vol. 3071, Apr. 1997, pp. 128–137.
[6] H. Yang, F. V. D. Meer, W. Bakker, and Z. J. Tan, "A back-propagation neural network for mineralogical mapping from AVIRIS data," Int. J. Remote Sens., vol. 20, no. 1, pp. 97–110, Jan. 1999.
[7] C. Hernández-Espinosa, M. Fernández-Redondo, and J. Torres-Sospedra, "Some experiments with ensembles of neural networks for classification of hyperspectral images," in Proc. ISNN, vol. 1, Aug. 2004, pp. 912–917.


[8] Y. Chen, N. M. Nasrabadi, and T. D. Tran, "Hyperspectral image classification using dictionary-based sparse representation," IEEE Trans. Geosci. Remote Sens., vol. 49, no. 10, pp. 3973–3985, Oct. 2011.
[9] Y. Chen, N. M. Nasrabadi, and T. D. Tran, "Hyperspectral image classification via kernel sparse representation," in Proc. ICIP, Sep. 2011, pp. 1233–1236.
[10] G. Camps-Valls and L. Bruzzone, "Kernel-based methods for hyperspectral image classification," IEEE Trans. Geosci. Remote Sens., vol. 43, no. 6, pp. 1351–1362, Jun. 2005.
[11] V. N. Vapnik, The Nature of Statistical Learning Theory, 2nd ed. New York, NY, USA: Springer-Verlag, 2000.
[12] M. E. Tipping, "Sparse Bayesian learning and the relevance vector machine," J. Mach. Learn. Res., vol. 1, pp. 211–244, Sep. 2001.
[13] J. A. Gualtieri and R. F. Cromp, "Support vector machines for hyperspectral remote sensing classification," in Proc. SPIE, vol. 3584, Oct. 1998, pp. 221–232.
[14] F. Melgani and L. Bruzzone, "Classification of hyperspectral remote sensing images with support vector machines," IEEE Trans. Geosci. Remote Sens., vol. 42, no. 8, pp. 1778–1790, Aug. 2004.
[15] L. Bruzzone, M. Chi, and M. Marconcini, "A novel transductive SVM for the semisupervised classification of remote-sensing images," IEEE Trans. Geosci. Remote Sens., vol. 44, no. 11, pp. 3363–3373, Nov. 2006.
[16] Y. Tarabalka, J. A. Benediktsson, and J. Chanussot, "Spectral-spatial classification of hyperspectral imagery based on partitional clustering techniques," IEEE Trans. Geosci. Remote Sens., vol. 47, no. 8, pp. 2973–2987, Aug. 2009.
[17] G. Camps-Valls, L. Gomez-Chova, J. Muñoz-Marí, J. Vila-Francés, and J. Calpe-Maravilla, "Composite kernels for hyperspectral image classification," IEEE Geosci. Remote Sens. Lett., vol. 3, no. 1, pp. 93–97, Jan. 2006.
[18] F. A. Mianji and Y. Zhang, "Robust hyperspectral classification using relevance vector machine," IEEE Trans. Geosci. Remote Sens., vol. 49, no. 6, pp. 2100–2112, Jun. 2011.
[19] G. F. Hughes, "On the mean accuracy of statistical pattern recognizers," IEEE Trans. Inf. Theory, vol. 14, no. 1, pp. 55–63, Jan. 1968.
[20] B.-C. Kuo, C.-H. Li, and J.-M. Yang, "Kernel nonparametric weighted feature extraction for hyperspectral image classification," IEEE Trans. Geosci. Remote Sens., vol. 47, no. 4, pp. 1139–1155, Apr. 2009.
[21] T. V. Bandos, L. Bruzzone, and G. Camps-Valls, "Classification of hyperspectral images with regularized linear discriminant analysis," IEEE Trans. Geosci. Remote Sens., vol. 47, no. 3, pp. 862–873, Mar. 2009.
[22] C. L. P. Chen, H. Li, Y. Wei, T. Xia, and Y. Y. Tang, "A local contrast method for small infrared target detection," IEEE Trans. Geosci. Remote Sens., in press, 2013.
[23] H. Li, Y. Wei, L. Li, and C. L. P. Chen, "Hierarchical feature extraction with local neural response for image recognition," IEEE Trans. Cybern., vol. 43, no. 2, pp. 412–424, Apr. 2013.
[24] M. Pal and G. M. Foody, "Feature selection for classification of hyperspectral data by SVM," IEEE Trans. Geosci. Remote Sens., vol. 48, no. 5, pp. 2297–2307, May 2010.
[25] L. Ma, M. M. Crawford, and J. Tian, "Local manifold learning-based k-nearest-neighbor for hyperspectral image classification," IEEE Trans. Geosci. Remote Sens., vol. 48, no. 11, pp. 4099–4109, Nov. 2010.
[26] W. Li, S. Prasad, J. E. Fowler, and L. M. Bruce, "Locality-preserving discriminant analysis in kernel-induced feature spaces for hyperspectral image classification," IEEE Geosci. Remote Sens. Lett., vol. 8, no. 5, pp. 894–898, Sep. 2011.
[27] W. Liao, A. Pižurica, P. Scheunders, W. Philips, and Y. Pi, "Semisupervised local discriminant analysis for feature extraction in hyperspectral images," IEEE Trans. Geosci. Remote Sens., vol. 51, no. 1, pp. 184–198, Jan. 2013.
[28] W. Li, S. Prasad, J. E. Fowler, and L. M. Bruce, "Locality-preserving dimensionality reduction and classification for hyperspectral image analysis," IEEE Trans. Geosci. Remote Sens., vol. 50, no. 4, pp. 1185–1198, Apr. 2012.
[29] J. O. Ramsay and C. J. Dalzell, "Some tools for functional data analysis," J. Roy. Statist. Soc. B, vol. 53, no. 3, pp. 539–572, 1991.
[30] J. O. Ramsay and B. W. Silverman, Applied Functional Data Analysis: Methods and Case Studies. New York, NY, USA: Springer, 2002.
[31] J. O. Ramsay and B. W. Silverman, Functional Data Analysis, 2nd ed. New York, NY, USA: Springer, 2005.
[32] F. Ferraty and P. Vieu, Nonparametric Functional Data Analysis: Theory and Practice. New York, NY, USA: Springer, 2006.
[33] G. Biau, F. Bunea, and M. H. Wegkamp, "Functional classification in Hilbert spaces," IEEE Trans. Inf. Theory, vol. 51, no. 6, pp. 2163–2172, Jun. 2005.


[34] F. Rossi, N. Delannay, B. Conan-Guez, and M. Verleysen, "Representation of functional data in neural networks," Neurocomput., vol. 64, pp. 183–210, Mar. 2005.
[35] F. Rossi and N. Villa, "Classification in Hilbert spaces with support vector machines," in Proc. ASMDA, vol. 3071, May 2005, pp. 635–642.
[36] F. Rossi and N. Villa, "Support vector machine for functional data classification," Neurocomput., vol. 69, nos. 7–9, pp. 730–742, Mar. 2006.
[37] N. Villa and F. Rossi, "Recent advances in the use of SVM for functional data classification," in Proc. IWFOS, Jun. 2008, pp. 273–280.
[38] T. Villmann and F.-M. Schleif, "Functional vector quantization by neural maps," in Proc. WHISPERS, Aug. 2009, pp. 1–4.
[39] M. Kästner and T. Villmann, "Functional relevance learning in learning vector quantization for hyperspectral data," in Proc. WHISPERS, Jun. 2011, pp. 1–4.
[40] M. Kästner, B. Hammer, M. Biehl, and T. Villmann, "Functional relevance learning in generalized learning vector quantization," Neurocomput., vol. 90, pp. 85–95, Aug. 2012.
[41] C. Krier, F. Rossi, D. François, and M. Verleysen, "A data-driven functional projection approach for the selection of feature ranges in spectra with ICA or cluster analysis," Chemometr. Intell. Lab. Syst., vol. 91, no. 1, pp. 43–53, Mar. 2008.

Hong Li (M’07) received the M.Sc. degree in mathematics and the Ph.D. degree in pattern recognition and intelligence control from the Huazhong University of Science and Technology, Wuhan, China, in 1986 and 1999, respectively. She is currently a Professor with the School of Mathematics and Statistics, Huazhong University of Science and Technology. Her current research interests include approximation theory, wavelet analysis, learning theory, functional data analysis, hyperspectral image processing, and pattern recognition.

Guangrun Xiao received the B.Sc. degree in mathematics from the Huazhong University of Science and Technology, Wuhan, China, in 2008. He is currently pursuing the Ph.D. degree in control science and engineering at the School of Automation, Institute for Pattern Recognition and Artificial Intelligence, Huazhong University of Science and Technology. His current research interests include hyperspectral image processing, functional data analysis, and pattern recognition.

Tian Xia received the Honors degree in computer science and technology from the University of Adelaide, Adelaide, Australia, in 2009. He is currently pursuing the Ph.D. degree at the Faculty of Science and Technology, University of Macau, Macau, China. His current research interests include computer vision and pattern recognition.


Y. Y. Tang (F'04) is currently a Chair Professor with the Faculty of Science and Technology, University of Macau, Macau, China, and a Professor/Adjunct Professor/Honorary Professor with several institutes, including Chongqing University, Chongqing, China; Concordia University, Montreal, QC, Canada; and Hong Kong Baptist University, Kowloon, Hong Kong. He has published more than 400 academic papers and has authored or co-authored over 25 monographs, books, and book chapters. His current research interests include wavelets, pattern recognition, image processing, and artificial intelligence. Dr. Tang is the Founder and Editor-in-Chief of the International Journal on Wavelets, Multiresolution, and Information Processing and an Associate Editor of several international journals. He is the Founder and Chair of the Pattern Recognition Committee in the IEEE Systems, Man, and Cybernetics Society. He has served as General Chair, Program Chair, and committee member for many international conferences. He is the Founder and General Chair of the series of International Conferences on Wavelet Analysis and Pattern Recognition, and the Founder and Chair of the Macau Branch of the International Association for Pattern Recognition (IAPR). He is a Fellow of the IAPR.


Luoqing Li received the B.Sc. degree from Hubei University, Hubei, China, the M.Sc. degree from Wuhan University, Hubei, and the Ph.D. degree from Beijing Normal University, Beijing, China. He is currently with the Key Laboratory of Applied Mathematics, Hubei, and with the Faculty of Mathematics and Computer Science, Hubei University, where he became a Full Professor in 1994. His current research interests include approximation theory, wavelet analysis, learning theory, neural networks, signal processing, and pattern recognition. Dr. Li is the Managing Editor of the International Journal of Wavelets, Multiresolution, and Information Processing.
