Feature weighting algorithms for classification of hyperspectral images using a support vector machine

Bin Qi,1 Chunhui Zhao,2,* and Guisheng Yin1

1College of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China
2College of Information and Communication Engineering, Harbin Engineering University, Harbin 150001, China
*Corresponding author: [email protected]

Received 18 December 2013; revised 3 March 2014; accepted 31 March 2014; posted 1 April 2014 (Doc. ID 203271); published 25 April 2014
The support vector machine (SVM) is a widely used approach for high-dimensional data classification. Traditionally, SVMs use features from the spectral bands of hyperspectral images with each feature contributing equally to the classification. In practice, however, even bands degraded by noise can still make slight contributions to the classification. Thus, compared with feature reduction or the equal assignment of weights to all features, feature weighting is a trade-off choice. In this study, we examined two approaches to assigning weights to SVM features to increase the overall classification accuracy: (1) "CSC-SVM," a support vector machine with a compactness and separation coefficient feature weighting algorithm, and (2) "SE-SVM," a support vector machine with a similarity entropy feature weighting algorithm. Analyses were conducted on a public data set with nine selected land-cover classes. In comparison with traditional SVMs and other classical feature weighting algorithms, the proposed weighting algorithms increase the overall classification accuracy, and the improvement is even larger when few training samples are available. © 2014 Optical Society of America

OCIS codes: (100.0100) Image processing; (280.0280) Remote sensing and sensors; (110.2960) Image analysis.
http://dx.doi.org/10.1364/AO.53.002839
1. Introduction
With the wide use of hyperspectral remote-sensing technology in environmental survey, precision agriculture, geological investigation, and military applications, there is a continuous demand for improving the accuracy of classification algorithms [1–4]. Images captured by hyperspectral sensors provide hundreds of narrow and approximately contiguous spectral bands, so that detailed information about the objects can be derived [5]. A large number of spectral bands implies high-dimensional data, which presents significant challenges to the classification algorithms [6].
The support vector machine (SVM) was proposed by Vapnik and his colleagues as a classification approach in the fields of pattern recognition and machine learning based on the structural risk minimization principle [7,8]. It is one of the most widely used high-dimensional data classification algorithms, and it aims at providing a trade-off between hypothesis space complexity and the quality of fitting the training data. In recent years, SVMs have attracted increasing attention in the remote-sensing hyperspectral community [1,9–11]. Previous publications have shown competitive performance when applying SVMs to hyperspectral image classification [12,13]. However, due to the curse of dimensionality, the sample size required for training a specific high-dimensional classifier grows exponentially with the number of
dimensions [14]. That is, the number of spectral bands is too large relative to the size of the training set. As a consequence of the curse of dimensionality, the overall classification accuracy progressively increases with the addition of features, reaches a maximum, and subsequently declines. In other words, the overall classification accuracy will not increase further with additional spectral information. An effective way to overcome this problem is to reduce the dimensionality [14–16]. A realistic problem we then face is the determination of the optimal reduced dimensionality. In addition, although affected by noise, some deteriorated bands still provide slight contributions. Thus, compared with directly reducing the dimensionality, it might be advantageous to assign higher weights to the features with larger contributions to the classification and lower weights to those associated with heavy noise. In traditional SVMs, each feature is treated with equal weight, even though the features are unlikely to contribute equally to the classification [17].

In this study, two feature weighting algorithms are proposed: the compactness and separation coefficient (CSC) feature weighting algorithm and the similarity entropy (SE) feature weighting algorithm. The design idea of CSC feature weighting is based on the assumption that important bands have smaller within-class differences and larger between-class differences, whereas noisy bands behave in the opposite way. The spectral bands with higher discrimination are therefore given comparatively higher weights. In SE feature weighting, an ideal feature vector for each class is given initially. Then the similarity matrix composed of similarity values between each sample and the ideal feature vectors is calculated. To measure the uncertainty of each spectral band, we incorporate fuzzy entropy into the determination of the feature weights from the similarity matrix. Thus, the weighting algorithm may increase the separability of SVMs by giving comparatively higher weights to the bands with lower uncertainty. For convenient citation, we use the following abbreviations: "SVM" refers to the original support vector machine, "CSC-SVM" refers to the support vector machine with the compactness and separation coefficient feature weighting algorithm, and "SE-SVM" refers to the support vector machine with the similarity entropy feature weighting algorithm.

The remainder of this paper is organized as follows. In Section 2, we propose the two feature weighting algorithms and the way to incorporate the weights into classification with SVMs. In Section 3, we introduce the data set and the experimental design and discuss the parameter settings for each classifier. Experiments assessing the performance of each proposed weighting method in comparison with traditional SVMs and other classical feature weighting algorithms are presented in Section 4. Finally, the paper ends with the conclusions in Section 5.
2. Methods and Concepts

A. Compactness and Separation Coefficient Feature Weighting Algorithm
In a reflectance data set, $X = \{x_1, x_2, \ldots, x_N\}$, $x_i \in \mathbb{R}^d$, is the training data set of $M$ classes and $N$ pixels, each pixel having $d$ features, and $w$ is a $1 \times d$ vector that represents the weights of the feature vector. All the training samples are divided into $M$ classes, $X = \{\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_M\}$. $D_W$ is an $N \times d$ matrix, referred to as the within-class diversity matrix, which represents the difference of each pixel from the other pixels in the same class:

$$D_W(i,k) = \frac{1}{\alpha_m - 1} \sum_{j=1}^{\alpha_m} \big[x_i(k) - \hat{x}^m_j(k)\big]^2 = \frac{1}{\alpha_m - 1} \sum_{j=1}^{\alpha_m} \big[\hat{x}^m_t(k) - \hat{x}^m_j(k)\big]^2, \qquad i = 1, \ldots, N, \; k = 1, \ldots, d. \tag{1}$$
Here $i$ refers to the $i$th sample in the data set $X$, $k$ refers to the $k$th feature, $\alpha_m$ is the total number of pixels in class $m$, and $x_i$ is supposed to be the $t$th pixel in the $m$th class (i.e., $\hat{x}^m_t$, $1 \le t \le \alpha_m$). The within-class diversity can be measured as

$$\mathrm{div}_W(k) = \frac{1}{N}\sum_{i=1}^{N} D_W(i,k) = \frac{1}{M}\sum_{m=1}^{M}\frac{1}{\alpha_m}\sum_{t=1}^{\alpha_m}\frac{1}{\alpha_m - 1}\sum_{j=1}^{\alpha_m}\big[\hat{x}^m_t(k)-\hat{x}^m_j(k)\big]^2 = \frac{1}{M}\sum_{m=1}^{M}\frac{1}{\alpha_m(\alpha_m - 1)}\sum_{t=1}^{\alpha_m}\sum_{j=1}^{\alpha_m}\big[\hat{x}^m_t(k)-\hat{x}^m_j(k)\big]^2. \tag{2}$$

$D_B$ is an $N \times d$ matrix, referred to as the between-class diversity matrix, which represents the difference of each pixel from the pixels in the other classes:

$$D_B(i,k) = \frac{1}{M-1}\sum_{\substack{n=1\\ n\ne m}}^{M}\frac{1}{\alpha_n}\sum_{j=1}^{\alpha_n}\big[x_i(k)-\hat{x}^n_j(k)\big]^2 = \frac{1}{M-1}\sum_{\substack{n=1\\ n\ne m}}^{M}\frac{1}{\alpha_n}\sum_{j=1}^{\alpha_n}\big[\hat{x}^m_t(k)-\hat{x}^n_j(k)\big]^2. \tag{3}$$
Again, $x_i$ is supposed to be the $t$th pixel in the $m$th class, and $\hat{x}^n_j$ refers to a pixel from the $n$th class ($n \ne m$). The between-class diversity can be measured as

$$\mathrm{div}_B(k) = \frac{1}{N}\sum_{i=1}^{N} D_B(i,k) = \frac{1}{M}\sum_{m=1}^{M}\frac{1}{\alpha_m}\sum_{t=1}^{\alpha_m}\frac{1}{M-1}\sum_{\substack{n=1\\ n\ne m}}^{M}\frac{1}{\alpha_n}\sum_{j=1}^{\alpha_n}\big[\hat{x}^m_t(k)-\hat{x}^n_j(k)\big]^2 = \frac{1}{M(M-1)}\sum_{m=1}^{M}\sum_{\substack{n=1\\ n\ne m}}^{M}\frac{1}{\alpha_m \alpha_n}\sum_{t=1}^{\alpha_m}\sum_{j=1}^{\alpha_n}\big[\hat{x}^m_t(k)-\hat{x}^n_j(k)\big]^2. \tag{4}$$
The weight for the $k$th feature is then given by

$$w(k) = \mathrm{div}_B(k)/\mathrm{div}_W(k), \qquad k = 1, 2, \ldots, d. \tag{5}$$
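As an illustration, the CSC weights of Eqs. (2)–(5) can be computed with a short NumPy routine such as the following sketch. The array names X (an N × d matrix of training pixels) and y (integer class labels), as well as the function name csc_weights, are illustrative assumptions rather than notation from the paper, and every class is assumed to contain at least two training pixels.

```python
import numpy as np

def csc_weights(X, y):
    """Compactness and separation coefficient (CSC) feature weights, Eqs. (2)-(5).

    X : (N, d) array of training pixels (one spectral feature per column).
    y : (N,) array of integer class labels (each class must have >= 2 pixels).
    Returns a (d,) array of weights w(k) = div_B(k) / div_W(k).
    """
    classes = np.unique(y)
    M, d = len(classes), X.shape[1]
    sums = {m: X[y == m].sum(axis=0) for m in classes}            # per-class feature sums
    sqsums = {m: (X[y == m] ** 2).sum(axis=0) for m in classes}   # per-class sums of squares
    counts = {m: int((y == m).sum()) for m in classes}

    div_w = np.zeros(d)
    div_b = np.zeros(d)
    for m in classes:
        am, sm, qm = counts[m], sums[m], sqsums[m]
        # sum over all pairs (t, j) inside class m of (x_t - x_j)^2, per feature
        pair_sq_w = 2.0 * am * qm - 2.0 * sm ** 2
        div_w += pair_sq_w / (am * (am - 1))
        for n in classes:
            if n == m:
                continue
            an, sn, qn = counts[n], sums[n], sqsums[n]
            # sum over pairs (t in class m, j in class n) of (x_t - x_j)^2, per feature
            pair_sq_b = an * qm + am * qn - 2.0 * sm * sn
            div_b += pair_sq_b / (am * an)
    div_w /= M              # Eq. (2)
    div_b /= M * (M - 1)    # Eq. (4)
    return div_b / div_w    # Eq. (5): more discriminative bands receive higher weights
```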
B. Similarity Entropy Feature Weighting Algorithm

Assume all the samples are divided into $M$ classes, $X = \{\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_M\}$, and let the vector $v_m$ be the centroid of each class (in this study, the centroid refers to the average vector of the samples from the same class):

$$v_m = \frac{1}{\alpha_m}\sum_{j=1}^{\alpha_m} \hat{x}^m_j, \qquad m = 1, 2, \ldots, M. \tag{6}$$
The similarity between sample $x_i$ and the ideal vector $v_m$ is given by

$$\phi_m(i,k) = \left\{ \frac{1}{M-1}\sum_{\substack{j=1\\ j\ne m}}^{M}\Big[\,|x_i(k)-v_j(k)|^2 - |x_i(k)-v_m(k)|^2\,\Big] \right\}^{1/2}. \tag{7}$$

With $N$ training samples, an $N \times M \times d$ similarity matrix can be obtained, which is given by

$$\Phi = (\Phi_1, \Phi_2, \ldots, \Phi_N)^{\mathrm{T}}. \tag{8}$$
$\Phi_i$ is defined by the following expression:

$$\Phi_i = \begin{pmatrix} \phi_1(i,1) & \phi_1(i,2) & \cdots & \phi_1(i,d) \\ \phi_2(i,1) & \phi_2(i,2) & \cdots & \phi_2(i,d) \\ \vdots & \vdots & & \vdots \\ \phi_M(i,1) & \phi_M(i,2) & \cdots & \phi_M(i,d) \end{pmatrix}_{M \times d}. \tag{9}$$

Then the fuzzy entropy of each feature can be obtained from the similarity matrix $\Phi$ as

$$H(k) = -\sum_{l=1}^{N \times M}\Big[\Phi(l,k)\log\Phi(l,k) + \big(1-\Phi(l,k)\big)\log\big(1-\Phi(l,k)\big)\Big]. \tag{10}$$

The entropies of all $d$ features form the vector

$$H = \big(H(1), H(2), \ldots, H(d)\big). \tag{11}$$

In information theory, entropy is a measure of the uncertainty of a random variable; it reports how much information there is in an event. A feature with higher inconsistency (i.e., uncertainty) within each class results in higher fuzzy entropy. Such features contribute little to the classification and are assigned smaller weights. The weight for the $k$th feature is given by

$$w(k) = \frac{1/H(k)}{\max(1/H)}, \qquad k = 1, 2, \ldots, d, \tag{12}$$

where $1/H = \big(1/H(1), 1/H(2), \ldots, 1/H(d)\big)$ and $\max(1/H)$ is the maximum element of $1/H$.
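Similarly, a minimal sketch of the SE weighting, following Eqs. (6)–(12) as written above, is given below. It reuses the hypothetical arrays X and y from the previous sketch and assumes the reflectance values are scaled so that the similarity values behave as fuzzy memberships in [0, 1]; the clipping constants are purely illustrative safeguards, not part of the original algorithm.

```python
import numpy as np

def se_weights(X, y, eps=1e-6):
    """Similarity entropy (SE) feature weights, Eqs. (6)-(12).

    X : (N, d) array of training pixels, assumed scaled to [0, 1].
    y : (N,) array of integer class labels.
    Returns a (d,) array of weights in (0, 1], the largest weight being 1.
    """
    classes = np.unique(y)
    N, d = X.shape
    M = len(classes)
    # Eq. (6): class centroids (ideal feature vectors)
    V = np.stack([X[y == m].mean(axis=0) for m in classes])          # (M, d)

    # Eq. (7): similarity of every sample to every class centroid, per feature
    diff = (X[:, None, :] - V[None, :, :]) ** 2                      # (N, M, d)
    sum_all = diff.sum(axis=1, keepdims=True)                        # sum over all classes j
    inner = (sum_all - diff) / (M - 1) - diff                        # mean over j != m, minus the own-class term
    phi = np.sqrt(np.clip(inner, 0.0, None))                         # clipping assumed to keep the root real

    # Eqs. (8)-(10): flatten to the (N*M) x d similarity matrix and compute the fuzzy entropy
    Phi = np.clip(phi.reshape(N * M, d), eps, 1.0 - eps)             # keep log() finite (assumption)
    H = -(Phi * np.log(Phi) + (1.0 - Phi) * np.log(1.0 - Phi)).sum(axis=0)

    # Eqs. (11)-(12): low-entropy (low-uncertainty) bands receive the highest weights
    inv_H = 1.0 / H
    return inv_H / inv_H.max()
```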
C. Spectrally Weighted SVM Kernels
The basic classification method used in this study is the SVM, which has already shown competitive performance in many machine-learning applications and exhibits excellent performance in dealing with high-dimensional features. A full introduction to SVMs can be found in [10,11,18]. Assume the training set consists of $N$ vectors from the $d$-dimensional feature space, $x_i \in \mathbb{R}^d$ ($i = 1, 2, \ldots, N$), and that a label $y_i \in \{-1, 1\}$ is associated with each vector $x_i$. The SVM classifier can be represented as

$$f(x) = \mathrm{sgn}\left( \sum_{i=1}^{N} y_i \beta_i K(x_i, x) + b \right). \tag{13}$$
Here $\beta_1, \beta_2, \ldots, \beta_N$ are Lagrange multipliers and $b \in \mathbb{R}$ is a bias. $K(x, x') = \Phi(x)\cdot\Phi(x')$ is the kernel function, i.e., the inner product of the nonlinear transformation $\Phi(\cdot)$. Commonly used kernel functions are the polynomial and Gaussian radial basis function kernels:

$$K(x, y) = (x^{\mathrm{T}} y + 1)^{e}, \tag{14}$$

$$K(x, y) = \exp(-\gamma \|x - y\|^2), \tag{15}$$

where $e$ is the order of the polynomial and $\gamma$ controls the width of the Gaussian radial basis function.
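For reference, the two kernels of Eqs. (14) and (15) can be written directly in NumPy; this is a generic sketch, not code taken from the paper.

```python
import numpy as np

def polynomial_kernel(x, y, e=2):
    """Polynomial kernel of Eq. (14): K(x, y) = (x^T y + 1)^e."""
    return (np.dot(x, y) + 1.0) ** e

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian RBF kernel of Eq. (15): K(x, y) = exp(-gamma * ||x - y||^2)."""
    return np.exp(-gamma * np.sum((x - y) ** 2))
```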
For a sample $x_i = \big(x_i(1), x_i(2), \ldots, x_i(d)\big)$, the element $x_i(j)$ corresponds to the reflectance value of $x_i$ in the $j$th spectral band. In general kernel functions, each element is treated with equal weight when it is projected into the feature space. In this study, we prefer to assign higher weights to the important features and to decrease the weights of those that contribute little to the classification. To reflect this consideration, a weight vector $w = \big(w(1), w(2), \ldots, w(d)\big)$ is used to scale each element $x_i(j)$ before mapping it into the feature space. To simplify notation, we introduce the diagonal weighting matrix $W = \mathrm{diag}(w)$. With this weighting matrix, the weighted polynomial kernel function and Gaussian radial basis kernel function can be written as

$$K_W(x, y) = (x^{\mathrm{T}} W^{\mathrm{T}} W y + 1)^{e}, \tag{16}$$

$$K_W(x, y) = \exp\big(-\gamma \|W(x - y)\|^2\big), \tag{17}$$

where $W^{\mathrm{T}}W = \mathrm{diag}\big(w^2(1), w^2(2), \ldots, w^2(d)\big)$. In this procedure, a simple diagonal matrix can be used to assign a weight to each element of the feature vector.
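One convenient way to experiment with the weighted RBF kernel of Eq. (17) is to pre-scale every feature by its weight and train an ordinary RBF SVM, since ‖W(x − y)‖² = ‖Wx − Wy‖². The sketch below does this with scikit-learn; csc_weights is the hypothetical helper from the earlier sketch, and the γ and C values are placeholders rather than the settings selected in Section 3.

```python
import numpy as np
from sklearn.svm import SVC

def train_weighted_svm(X_train, y_train, weights, gamma=1.0, C=4.0):
    """Train an RBF-kernel SVM on feature-weighted data, as in Eq. (17).

    Scaling every feature by w(k) before training is equivalent to using the
    weighted kernel K_W(x, y) = exp(-gamma * ||W(x - y)||^2).
    """
    Xw = X_train * weights            # apply the diagonal weighting matrix W = diag(w)
    clf = SVC(kernel="rbf", gamma=gamma, C=C, decision_function_shape="ovo")
    clf.fit(Xw, y_train)
    return clf

def predict_weighted_svm(clf, X_test, weights):
    """Classify new pixels after applying the same feature weights."""
    return clf.predict(X_test * weights)

# Example usage (assuming X_train, y_train, X_test exist):
# w = csc_weights(X_train, y_train)
# clf = train_weighted_svm(X_train, y_train, w)
# y_pred = predict_weighted_svm(clf, X_test, w)
```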
Table 1. Number of Training and Testing Pixels in Each Class

Class                 Training   Testing
C1-Corn-notill             717      1434
C2-Corn-mintill            417       834
C3-Grass-pasture           248       497
C4-Grass-trees             373       747
C5-Hay-windrowed           244       489
C6-Soybean-notill          484       968
C7-Soybean-mintill        1234      2468
C8-Soybean-clean           307       614
C9-Woods                   647      1294
Total                     4671      9345
3. Experimental Design

A. AVIRIS Data Set
Public vegetation reflectance data from northwest Indiana's Indian Pines site (AVIRIS sensor, 12 June 1992; ftp://ftp.rcn.purdue.edu/biehl/MultiSpec/92AV3C) were used in this study (Fig. 1); this data set has been used in multiple publications [7,17,19]. The hyperspectral image consists of a scene of 145 × 145 pixels, with a spatial resolution of 20 m/pixel and 200 spectral bands. From the 16 land-cover classes available in the original ground-truth data, nine classes were selected to judge the effectiveness of the different classifiers. Among the nine classes, "corn-notill" and "corn-mintill" are the same species of corn with different types, and "soybean-notill," "soybean-mintill," and "soybean-cleantill" are the same species of soybean with different types.

B. Training and Testing Data Sets
The experimental analysis was organized into three parts. The first aims at analyzing the effectiveness of the proposed classifiers (CSC-SVM and SE-SVM). A comparison with traditional SVMs, the support vector machine with maximum noise fraction transform (MNF-SVM) [19], the support vector machine with discriminant analysis feature extraction (DAFE-SVM) [20], and the support vector machine with decision boundary feature extraction (DBFE-SVM) [20] was provided as an assessment. For convenient citation, we refer to MNF-SVM, DAFE-SVM, and DBFE-SVM as the comparison weighting algorithms. The number of training and testing pixels in each class is shown in Table 1. In the second experiment, a comparison
between traditional SVMs and the weighted SVMs was conducted with different training sample sizes. To evaluate the robustness of the feature weighting algorithms, training data sets of different sizes were generated. All the samples were randomly partitioned into 10 subgroups of approximately equal size. Initially, the training data size ratio was set to 0.1; that is, of the 10 subgroups, one subgroup was retained as training data and the remaining nine subgroups were used as testing data. The ratio was then set to 0.2, so that two subgroups were retained as training data and the remaining eight were used as testing data. In a similar manner, different training data size ratios were chosen until the ratio reached 0.6. For each training data size ratio, the process was repeated 10 times, with different combinations of subgroups used as training data and the remaining subgroups used as testing data. In this manner, we can evaluate the performance of the different classifiers with respect to the selection of training samples. Finally, to demonstrate that the proposed weighting algorithms provide a significant improvement over the other weighting algorithms, a Neyman–Pearson test on the overall classification accuracy with different training data size ratios was performed in the third part.
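For illustration, the partitioning protocol described above can be sketched as follows; the function name, random seed, and the reshuffling of the subgroups on each repetition are illustrative assumptions.

```python
import numpy as np

def training_splits(n_samples, ratios=(0.1, 0.2, 0.3, 0.4, 0.5, 0.6),
                    n_subgroups=10, n_repeats=10, seed=0):
    """Yield (ratio, train_idx, test_idx) splits following the protocol in the text.

    The samples are shuffled into n_subgroups roughly equal subgroups; for a ratio r,
    r * n_subgroups of them form the training set and the rest form the testing set.
    """
    rng = np.random.default_rng(seed)
    for ratio in ratios:
        n_train_groups = int(round(ratio * n_subgroups))
        for _ in range(n_repeats):
            subgroups = np.array_split(rng.permutation(n_samples), n_subgroups)
            train_idx = np.concatenate(subgroups[:n_train_groups])
            test_idx = np.concatenate(subgroups[n_train_groups:])
            yield ratio, train_idx, test_idx

# Example usage (assuming X and y hold all labeled pixels):
# for ratio, tr, te in training_splits(len(y)):
#     accuracy = evaluate_classifier(X[tr], y[tr], X[te], y[te])  # hypothetical helper
```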
C. SVM and Parameter Settings
Similar to [21], we used the "one-against-one" SVM classification strategy without feature weighting as the baseline SVM method. The kernel function used here is the Gaussian RBF,

$$K(x, y) = \exp(-\gamma \|x - y\|^2). \tag{18}$$
Fig. 1. AVIRIS data set: (a) sample band of the AVIRIS data set (band 120) and (b) corresponding average reflectance profiles of the nine classes (average relative reflectance versus wavelength in nm).
The parameter $\gamma$ determines the width of the kernel and tunes the smoothing of the discriminant function. The penalty $C$ is another important factor in the SVM classifier; it controls the trade-off between the margin and the size of the slack variables. Consequently, to reliably optimize $\gamma$ and $C$, a cross-validation framework was applied with $\gamma$ ranging from $2^{-1}$ to $2^{7}$ in steps of $2^{0.5}$ and $C$ ranging from $2^{-4}$ to $2^{4}$ in steps of $2^{0.5}$ (Fig. 2). Based on the evaluation of the parameters on the training data set, $\gamma = 2^{0.5}$ and $C = 2^{2}$ were selected for the traditional SVM [Fig. 2(a)], $\gamma = 2^{-2}$ and $C = 2^{1}$ for MNF-SVM [Fig. 2(b)], $\gamma = 2^{4.5}$ and $C = 2^{2.5}$ for DAFE-SVM [Fig. 2(c)], $\gamma = 2^{1}$ and $C = 2^{2}$ for DBFE-SVM [Fig. 2(d)], $\gamma = 2^{-0.5}$ and $C = 2^{2}$ for CSC-SVM [Fig. 2(e)], and $\gamma = 2^{2.5}$ and $C = 2^{2}$ for SE-SVM [Fig. 2(f)].

Fig. 2. Parameter selection for different classifiers. (a) SVM, (b) MNF-SVM, (c) DAFE-SVM, (d) DBFE-SVM, (e) CSC-SVM, and (f) SE-SVM.
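A grid search over these ranges can be reproduced with scikit-learn as sketched below; the number of cross-validation folds is not stated in the text and is assumed to be five here, and the training arrays are the hypothetical ones from the earlier sketches.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def select_gamma_and_C(X_train, y_train, cv=5):
    """Cross-validated grid search over gamma and C on the ranges given in the text."""
    param_grid = {
        "gamma": 2.0 ** np.arange(-1.0, 7.0 + 0.5, 0.5),   # 2^-1 ... 2^7 in multiplicative steps of 2^0.5
        "C": 2.0 ** np.arange(-4.0, 4.0 + 0.5, 0.5),        # 2^-4 ... 2^4 in multiplicative steps of 2^0.5
    }
    search = GridSearchCV(SVC(kernel="rbf", decision_function_shape="ovo"),
                          param_grid, cv=cv, scoring="accuracy")
    search.fit(X_train, y_train)
    return search.best_params_["gamma"], search.best_params_["C"]

# Example usage (assuming the weighted training data Xw = X_train * w):
# gamma, C = select_gamma_and_C(Xw, y_train)
```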
4. Results and Discussion

A. Classification Accuracy
Following the algorithms discussed in Section 2, the feature weighting based SVMs were implemented to evaluate the proposed feature weighting scheme. The performance of the proposed algorithms was compared with that of traditional SVMs with no spectral weighting, as adopted in [21–24], and with the comparison weighting algorithms on the AVIRIS data set. The overall classification accuracies (i.e., the percentage of correctly classified pixels among all the test pixels considered) show that the proposed weighting algorithms outperform both no feature weighting and the comparison feature weighting algorithms, with SE-SVM exhibiting a slightly better overall classification accuracy than CSC-SVM (Table 2). Compared with traditional SVMs, CSC-SVM and SE-SVM increased the overall classification accuracy by 1.41% and 1.47%, respectively. The class accuracies also show that traditional SVMs perform well on the distinctive classes (such as C3, C4, C5, and C9); however, when SVMs are used to classify pixels of the same species with different types (such as C1, C2, C6, C7, and C8), the classification accuracy decreases considerably. The proposed weighted SVMs, in contrast, both achieve comparatively higher classification accuracies on these classes. Figure 3 shows the ground reference map and the classification results obtained with the different classifiers.

Table 2. Comparison of Cohen Kappa Coefficients, Overall Classification Accuracies (%), and Classification Accuracies (%) Obtained by the SVM, MNF-SVM, DAFE-SVM, DBFE-SVM, CSC-SVM, and SE-SVM Algorithms on the AVIRIS Data Set

                                   SVM    MNF-SVM  DAFE-SVM  DBFE-SVM  CSC-SVM   SE-SVM
Cohen Kappa coefficient          0.9472   0.9536    0.9493    0.9573    0.9638   0.9646
Overall classification accuracy   95.51    96.05     95.67     96.36     96.92    96.98
C1-Corn-notill                    92.40    94.42     92.82     93.38     94.42    94.28
C2-Corn-mintill                   91.49    93.17     94.48     94.12     95.92    96.04
C3-Grass-pasture                  97.99    98.59     99.20     98.79     98.59    98.79
C4-Grass-trees                    99.73    99.73     99.20     99.73     99.60    99.73
C5-Hay-windrowed                  99.80   100.00     99.59     99.80     99.80    99.80
C6-Soybean-notill                 92.36    91.43     93.60     94.32     95.45    95.04
C7-Soybean-mintill                95.06    95.75     93.68     95.91     96.07    96.31
C8-Soybean-clean                  95.77    94.95     96.91     95.60     97.07    97.56
C9-Woods                          99.61    99.69     99.54     99.69     99.69    99.69

Fig. 3. Ground reference map and classification results. (a) Ground reference map, (b) classification result of SVM, (c) classification result of MNF-SVM, (d) classification result of DAFE-SVM, (e) classification result of DBFE-SVM, (f) classification result of CSC-SVM, and (g) classification result of SE-SVM.

B. Classification Accuracy with Different Training Data Ratios
To assess the robustness of feature weighting with respect to the overall classification accuracy, each classifier was tested with different training sample size ratios and different selections of training data. For each training sample size ratio, the process was repeated 10 times. All the overall classification results are shown in Fig. 4. The proposed feature weighting algorithms clearly performed better than traditional SVMs and the comparison weighting algorithms and showed strong robustness to the selection of training samples. To summarize the classification results, the average overall classification accuracies for the different training sample size ratios are given in Fig. 5. For every training sample size ratio, the proposed weighting algorithms outperform traditional SVMs and the comparison weighting algorithms, with CSC-SVM exhibiting comparatively better performance. Careful evaluation reveals that the proposed weighting algorithms show an even larger advantage when the training sample size ratio is small. Compared with traditional SVMs, CSC-SVM and SE-SVM increased the average overall accuracy by 1.11% and 0.81%, respectively, when 60% of the whole data set was used for training, but by 1.60% and 1.14%, respectively, when only 10% of the data was used for training. The results highlight the adverse effects of the Hughes phenomenon when a small training data set is used, and it can be concluded that CSC-SVM and SE-SVM reduce these negative effects.

Fig. 4. Overall classification accuracy with different training sample size ratios. (a) Training sample size ratio 0.1, (b) training sample size ratio 0.2, (c) training sample size ratio 0.3, (d) training sample size ratio 0.4, (e) training sample size ratio 0.5, and (f) training sample size ratio 0.6.

Fig. 5. Average overall classification accuracy with different training sample size ratios.

C. Neyman–Pearson Test
In this study, we use the Neyman–Pearson test to evaluate the performance of the proposed weighting algorithms. Suppose the overall classification accuracy $Z = (Z_1, \ldots, Z_n)$ is a random variable from a normal distribution. We wish to test the hypotheses, at a significance level $\alpha = 0.05$ and with a sample size of $n = 10$,

$$H_0: \mu = \mu_0, \qquad H_1: \mu > \mu_0, \tag{19}$$

where $\mu$ is the average accuracy of the proposed algorithms and $\mu_0$ is the average accuracy of the comparison weighting algorithms. Assume the variance $\sigma^2$ of the proposed algorithms and the comparison weighting algorithms is the same. The parameters are set as $\theta_0 = (\mu_0, \sigma^2)$ and $\theta = (\mu, \sigma^2)$. The likelihood of the variable $Z$ with parameter $\theta$ is given by

$$L(\theta) = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{n} \exp\!\left(-\sum_{i=1}^{n}\frac{(z_i-\mu)^2}{2\sigma^2}\right). \tag{20}$$

Correspondingly, $L(\theta_0)$ is given by

$$L(\theta_0) = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{n} \exp\!\left(-\sum_{i=1}^{n}\frac{(z_i-\mu_0)^2}{2\sigma^2}\right). \tag{21}$$
The likelihood ratio is

$$\Lambda = \frac{L(\theta_0)}{L(\theta)} = \exp\!\left(-\frac{1}{2\sigma^2}\left[\sum_{i=1}^{n}(z_i-\mu_0)^2 - \sum_{i=1}^{n}(z_i-\mu)^2\right]\right) = \exp\!\left(-\frac{n}{2\sigma^2}(\mu-\mu_0)^2\right). \tag{22}$$

In our problem, $\mu$ and $\sigma^2$ are unknown. We use the unbiased estimate of the mean, $\bar{Z}$, and the unbiased estimate of the variance, $S^2$, computed from the sample to replace $\mu$ and $\sigma^2$. Then the likelihood ratio is given as

$$\Lambda = \frac{L(\theta_0)}{L(\theta)} = \exp\!\left(-\frac{1}{2}\left(\frac{\bar{Z}-\mu_0}{S/\sqrt{n}}\right)^{2}\right). \tag{23}$$

The likelihood ratio $\Lambda$ will be small if $|(\bar{Z}-\mu_0)/(S/\sqrt{n})|$ is very large. In such a situation, we reject $H_0$ and accept $H_1$; that is, the proposed algorithms provide a significant improvement over the comparison weighting algorithms. The region in which $H_0$ is rejected is

$$\frac{\bar{Z}-\mu_0}{S/\sqrt{n}} \ge k, \tag{24}$$

where $k$ is a value determined by the significance level. Since $(\bar{Z}-\mu_0)/(S/\sqrt{n}) \sim t(n-1)$, the probability of rejecting $H_0$ when $H_0$ is true is

$$P\!\left\{\frac{\bar{Z}-\mu_0}{S/\sqrt{n}} \ge k\right\} = \alpha. \tag{25}$$

We thus obtain $k = t_{\alpha/2}(n-1) = t_{0.025}(9) = 2.2622$. The observation values of $|(\bar{Z}-\mu_0)/(S/\sqrt{n})|$ for the proposed algorithms and the comparison weighting algorithms are shown in Table 3. All the observation values fall in the rejection region, so we accept $H_1$: the proposed algorithms provide a significant improvement over the selected comparison weighting algorithms.
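The test statistic of Eqs. (23)–(25) is simple to compute; the sketch below assumes acc_proposed holds the n = 10 overall accuracies of one proposed algorithm and mu0 is the average accuracy of one comparison algorithm, both placeholders rather than the values behind Table 3.

```python
import numpy as np
from scipy import stats

def neyman_pearson_statistic(acc_proposed, mu0, alpha=0.05):
    """Return the observed |(Z_bar - mu0) / (S / sqrt(n))| and the critical value used in the text."""
    z = np.asarray(acc_proposed, dtype=float)
    n = z.size
    z_bar = z.mean()
    s = z.std(ddof=1)                              # unbiased estimate of the standard deviation
    t_obs = abs(z_bar - mu0) / (s / np.sqrt(n))
    k = stats.t.ppf(1.0 - alpha / 2.0, df=n - 1)   # t_{alpha/2}(n - 1), as in the text
    return t_obs, k

# Example usage with made-up numbers (not the paper's data):
# t_obs, k = neyman_pearson_statistic(
#     [96.1, 95.8, 96.4, 96.0, 95.9, 96.2, 96.3, 95.7, 96.1, 96.0], 94.9)
# reject_H0 = t_obs >= k
```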
Table 3. Observation Values of $|(\bar{Z}-\mu_0)/(S/\sqrt{n})|$ with Proposed Algorithms and Comparison Weighting Algorithms

                                   CSC-SVM                               SE-SVM
Training Sample     MNF-SVM   DAFE-SVM   DBFE-SVM      MNF-SVM   DAFE-SVM   DBFE-SVM
Size Ratio
0.1                 13.1711    10.1631    24.8382      13.4054    14.9378    10.3989
0.2                 11.7444    15.4651    47.2398      31.9379    46.8083    32.5928
0.3                 14.9509     9.3497    22.7294      12.5746    16.5433    11.7498
0.4                 15.1957    11.5423    11.6205       7.8081     6.9935     4.6612
0.5                 13.1050    18.4116    24.5648      20.5654    26.9968    23.8121
0.6                 17.8039    10.4884    10.4020       7.2362     8.0013     5.8268
5. Conclusion
In this study, two feature weighting algorithms were proposed. Compared with traditional SVMs and the comparison weighting algorithms, both feature weighting algorithms increased the overall classification accuracy of the hyperspectral image. The experimental results show that the accuracy gain comes mainly from pixels of the same species with different types. Moreover, experiments evaluating robustness were conducted: for different selections of training samples and training sample size ratios, CSC-SVM and SE-SVM consistently exhibited good performance. Finally, the Neyman–Pearson test confirmed the significant improvement of the proposed algorithms over the comparison weighting algorithms.

References
1. F. Melgani and L. Bruzzone, "Classification of hyperspectral remote sensing images with support vector machines," IEEE Trans. Geosci. Remote Sens. 42, 1778–1790 (2004).
2. W. Li, S. Prasad, and J. E. Fowler, "Noise-adjusted subspace discriminant analysis for hyperspectral imagery classification," IEEE Trans. Geosci. Remote Sens. 10, 1374–1378 (2013).
3. S. Prasad, M. Cui, W. Li, and J. E. Fowler, "Segmented mixture-of-Gaussian classification for hyperspectral image analysis," IEEE Trans. Geosci. Remote Sens. 11, 138–142 (2014).
4. P. Gurram and H. Kwon, "Sparse kernel-based ensemble learning with fully optimized kernel parameters for hyperspectral classification problems," IEEE Trans. Geosci. Remote Sens. 51, 787–802 (2013).
5. L. Guo, Y. Wu, L. Zhao, T. Cao, W. Yan, and X. Shen, "Classification of mental task from EEG signals using immune feature weighted support vector machines," IEEE Trans. Magn. 47, 866–869 (2011).
6. P. Hsu, "Feature extraction of hyperspectral images using wavelet and matching pursuit," ISPRS J. Photogramm. Remote Sens. 62, 78–92 (2007).
7. A. M. Filippi, R. Archibald, B. L. Bhaduri, and E. A. Bright, "Hyperspectral agricultural mapping using support vector machine-based endmember extraction (SVM-BEE)," Opt. Express 17, 23823–23842 (2009).
8. C. Camps-Valls and L. Bruzzone, "Kernel-based methods for hyperspectral image classification," IEEE Trans. Geosci. Remote Sens. 43, 1351–1362 (2005).
9. M. Marconcini, G. Camps-Valls, and L. Bruzzone, "A composite semisupervised SVM for classification of hyperspectral images," IEEE Trans. Geosci. Remote Sens. 6, 234–238 (2009).
10. Y. Bazi and F. Melgani, "Toward an optimal SVM classification system for hyperspectral remote sensing images," IEEE Trans. Geosci. Remote Sens. 44, 3374–3385 (2006).
11. M. Fauvel, J. A. Benediktsson, J. Chanussot, and J. R. Sveinsson, "Spectral and spatial classification of hyperspectral data using SVMs and morphological profiles," IEEE Trans. Geosci. Remote Sens. 46, 3804–3814 (2008).
12. F. A. Mianji and Y. Zhang, "SVM-based unmixing-to-classification conversion for hyperspectral abundance quantification," IEEE Trans. Geosci. Remote Sens. 49, 4318–4327 (2011).
13. Y. Gu and K. Feng, "Optimized Laplacian SVM with distance metric learning for hyperspectral image classification," IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 6, 1109–1117 (2013).
14. M. Pal and G. M. Foody, "Feature selection for classification of hyperspectral data by SVM," IEEE Trans. Geosci. Remote Sens. 48, 2297–2307 (2010).
15. W. Li, S. Prasad, J. E. Fowler, and L. M. Bruce, "Locality-preserving dimensionality reduction and classification for hyperspectral image analysis," IEEE Trans. Geosci. Remote Sens. 50, 1185–1198 (2012).
16. R. Huang and M. He, "Band selection based on feature weighting for classification of hyperspectral data," IEEE Trans. Geosci. Remote Sens. 2, 156–159 (2005).
17. B. Qi, C. Zhao, E. Youn, and C. Nansen, "Use of weighting algorithms to improve traditional support vector machine based classifications of reflectance data," Opt. Express 19, 26816–26826 (2011).
18. Y. Tarabalka, M. Fauvel, J. Chanussot, and J. A. Benediktsson, "SVM- and MRF-based method for accurate classification of hyperspectral images," IEEE Trans. Geosci. Remote Sens. 7, 736–740 (2010).
19. A. A. Green, M. Berman, P. Switzer, and M. D. Craig, "A transformation for ordering multispectral data in terms of image quality with implications for noise removal," IEEE Trans. Geosci. Remote Sens. 26, 65–74 (1988).
20. D. A. Landgrebe, Signal Theory Methods in Multispectral Remote Sensing (Wiley, 2003).
21. B. Guo, S. R. Gunn, R. I. Damper, and J. D. B. Nelson, "Customizing kernel functions for SVM-based hyperspectral image classification," IEEE Trans. Image Process. 17, 622–629 (2008).
22. C. Huang, L. S. Davis, and J. R. Townshend, "An assessment of support vector machines for land cover classification," Int. J. Remote Sens. 23, 725–749 (2002).
23. C. A. Shah, P. Watanachaturaporn, M. K. Arora, and P. K. Varshney, "Some recent results on hyperspectral image classification," in Proceedings of IEEE Workshop on Advances in Techniques for Analysis of Remotely Sensed Data (IEEE, 2003), pp. 346–353.
24. J. Gualtieri and R. Cromp, "Support vector machines for hyperspectral remote sensing classification," in Proceedings of 27th AIPR Workshop: Advances in Computer Assisted Recognition (SPIE, 1998), pp. 121–132.