
Robust block sparse discriminative classification framework

Yang Liu,1,* Chenyu Liu,1 Yufang Tang,1 Haixu Liu,1 Shuxin Ouyang,1 and Xueming Li1,2

1School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China
2Beijing Key Laboratory of Network System and Network Culture, Xitucheng Road No. 10, Haidian District, Beijing 100876, China
*Corresponding author: [email protected]

Received May 15, 2014; revised August 6, 2014; accepted October 13, 2014; posted October 24, 2014 (Doc. ID 211571); published November 25, 2014

In this paper, a block sparse discriminative classification framework (BSDC) is proposed under the assumption that a block or group structure exists in the sparse coefficients in classification. First, we propose a block discriminative dictionary-learning (BDDL) algorithm, which learns class-specific subdictionaries and forces the sparse coefficients to be block sparse. An efficient gradient-based optimization strategy for BDDL is also developed, and the block sparse constraint on the sparse coefficients leads to a least-squares solution for the nonzero entries in the sparse coding stage of dictionary learning. Second, to take advantage of the structure when a new test sample is given, conventional sparse coding algorithms are discarded and structured sparse coding methods are adopted. Experiments validate the effectiveness of the proposed framework in face recognition and texture classification. We also show that BSDC is robust to noise. © 2014 Optical Society of America

OCIS codes: (100.0100) Image processing; (100.5010) Pattern recognition; (150.0150) Machine vision; (150.1135) Algorithms.
http://dx.doi.org/10.1364/JOSAA.31.002806

1. INTRODUCTION
Recent studies show that sparsity is a ubiquitous property of many real-world and man-made signals, such as audio and images, and that sparsity acts as a strong prior for solving ill-posed inverse problems [1,2]. Thus, sparse representation, also known as sparse coding, has drawn considerable interest in recent years. The problem solved by sparse representation is to approximate a given signal by a sparse linear combination of elements, also called atoms, from a basis or an over-complete dictionary. In practice, additional structure generally resides in the sparse coefficients. A widely studied structure is block structure [3–5]; a number of structured sparse coding algorithms have been proposed to compute sparse coefficients with block structure, and the block structure of sparse coefficients has been successfully applied to image classification [6].

Sparse representation-based classification (SRC), proposed in [7], is a breakthrough in face recognition that achieves striking recognition performance despite severe occlusion or corruption. The main idea of SRC is to represent a given test sample as a sparse linear combination of all training samples and then classify the test sample by evaluating which class leads to the minimum residual. Unfortunately, SRC picks atoms directly from the training samples; therefore, to obtain excellent performance, a dictionary of huge size is essential, which makes the problem expensive to solve.

Another method for obtaining the atoms is through dictionary-learning algorithms. This approach uses machine-learning techniques to infer the dictionary from training samples. In this case, the dictionary is typically represented as an explicit matrix, and a training algorithm is employed to adapt the matrix coefficients to the samples. Several effective dictionary-learning algorithms have been proposed in recent years, including an iterative least-squares dictionary-learning algorithm [8], K-SVD [9], online dictionary learning (ODL) [10], and a recursive least-squares dictionary-learning algorithm [11]. Sparse representation and dictionary learning have been successfully applied to a variety of applications, such as audio source separation [12], image denoising [13], image demosaicing and image restoration [14], image classification [7,15,16], iris recognition [17], super resolution [18–20], and action recognition [21].

Conventional dictionary-learning algorithms focus more on reconstruction error. However, for applications such as image classification, it is more important that the representation be discriminative for the given classes than that it yield a small reconstruction error [22]. To overcome this, Zhang and Li proposed a discriminative K-SVD dictionary-learning algorithm (D-KSVD) [23] by introducing a discriminative term into the objective function. Jiang et al. proposed a label consistent K-SVD algorithm (LC-KSVD) [24] by augmenting K-SVD with a label consistent term. But both D-KSVD and LC-KSVD are based on K-SVD and learn a linear classifier, which makes it difficult to utilize the structure of the sparse coefficients. Mairal et al. proposed a discriminative dictionary-learning method in [25] by learning class-specific dictionaries and discriminative terms. Wang et al. proposed a class-specific dictionary-learning method for action recognition in [21] with dictionary coherence terms. These two class-specific dictionary-learning methods need to solve multiple sparse recovery problems, which increases the computational complexity of classification.


All the discriminative dictionary-learning algorithms mentioned above ignore the structure of the sparse coefficients. In this paper, a novel block sparse discriminative classification framework (BSDC) is proposed. BSDC learns class-specific subdictionaries and forces the sparse coefficients of the training samples to be block sparse in the training stage. We concatenate all class-specific subdictionaries to form an over-complete dictionary and adopt a block sparse coding algorithm to obtain the sparse representation of a given sample in the test stage.

A. Organization of This Paper
This paper is organized as follows. Related work in the literature is first reviewed in Section 2. Section 3 introduces the proposed framework in detail. Extensive experiments are conducted in Section 4. Finally, Section 5 concludes the paper.

2. RELATED WORK

A. Sparse Representation
The basic model of sparse representation is given by

    \arg\min_x \|x\|_0 \quad \text{s.t.} \quad y = Dx,   (1)

where y denotes the vector of the given signal, D is the dictionary, and x is the sparse coefficient to be found. \|x\|_0 represents the l_0 quasi-norm, which counts the number of nonzero entries in x. However, solving Eq. (1) is NP-hard [26]. It is proven in [27] that, if the solution x sought is sparse enough, the solution of the l_0 minimization problem of Eq. (1) is equal to the solution of the following l_1 minimization problem:

    \arg\min_x \|x\|_1 \quad \text{s.t.} \quad y = Dx.   (2)

Several algorithms have been proposed to solve the l_0 or l_1 minimization problem, such as orthogonal matching pursuit (OMP) [28], iterative hard thresholding [29], basis pursuit [30], the FOCal Underdetermined System Solver (FOCUSS) [31], the fast iterative shrinkage-thresholding algorithm [32], and sparse Bayesian learning (SBL) [33]. Further, sparse recovery algorithms that solve l_p minimization with 0 < p < 1 have been proposed [34].

In practice, x generally has additional structure. A widely studied structure is block structure, in which x can be divided into N blocks, i.e.,

    x = [\underbrace{x_1, \ldots, x_{b_1}}_{x_1^T}, \ldots, \underbrace{x_{b_{N-1}+1}, \ldots, x_{b_N}}_{x_N^T}]^T,   (3)

where x_i^T denotes the ith block of x. Among the N blocks, only k (k ≪ N) are nonzero, but their locations are unknown. A number of structured sparse coding algorithms have been proposed to solve this sparse representation problem with block structure. Typical algorithms include group lasso [3], block-OMP [4], model-CoSaMP [5], group basis pursuit [35], the mixed l_2/l_1 program [36], block sparse Bayesian learning (BSBL) [37], etc. BSBL not only captures the block structure of x, it also considers the intrablock correlation and improves the recovery performance. Thus, we choose BSBL to solve the structured sparse representation problem in this paper.

B. Dictionary Learning
The dictionary-learning problem can be formulated as follows:

    \arg\min_{D,X} \sum_i \|x_i\|_p + \alpha \|Y - DX\|_F^2 \quad \text{s.t.} \quad \|d\|_2 = 1, \; p \in \{0, 1\},   (4)

where D is the dictionary to learn, d is a column of D, Y is the matrix of training samples, X is the coefficient matrix, x_i denotes the ith column of X (the sparse coefficient of the ith training sample in Y), and α is the trade-off between the sparsity term and the reconstruction error term. This optimization is nonconvex jointly in D and X, but it is convex in one variable when the other is fixed (for p = 1). So the dictionary-learning algorithm can be divided into two steps and solved iteratively:

1. Keeping D fixed, find X.
2. Keeping X fixed, update D.

The first step is also called the sparse coding stage, and X can be found by the sparse coding algorithms mentioned above. The second step can be referred to as the dictionary update stage, which is the key part of the dictionary-learning algorithm. For simplicity, the details of the common update strategies are not presented here and can be found in [8–11].

C. Methods for Comparison
Zhang and Li proposed a discriminative K-SVD dictionary-learning algorithm (D-KSVD) [23]. D-KSVD introduced a discriminative term H into the objective function of Eq. (4) as follows:

    \arg\min_{D,W,X} \sum_i \|x_i\|_1 + \alpha \|Y - DX\|_F^2 + \beta \|H - WX\|_F^2 \quad \text{s.t.} \quad \|d\|_2 = 1,   (5)

where W is a learned linear classifier that maps sparse coefficients to labels and H contains the ground-truth labels. Each column of H is a vector h_i = [0, \ldots, 1, \ldots, 0]^T, where the position of the nonzero element indicates the class. D-KSVD learns a dictionary and a linear classifier simultaneously. The learned dictionary is made discriminative by adding the term \beta\|H - WX\|_F^2 to the objective function, which learns an ideal classifier W. The labels of the training samples are exploited in Eq. (5), making D-KSVD a supervised learning algorithm with better face recognition performance than plain K-SVD.

Jiang et al. proposed a label consistent K-SVD algorithm (LC-KSVD) [24]. LC-KSVD is based on the work of D-KSVD and augments it with a label consistent term Q. The objective function is as follows:

    \arg\min_{D,W,X} \sum_i \|x_i\|_1 + \alpha \|Y - DX\|_F^2 + \beta \|H - WX\|_F^2 + \gamma \|Q - AX\|_F^2 \quad \text{s.t.} \quad \|d\|_2 = 1,   (6)

where Q contains the discriminative sparse codes of Y. Each column of Q is a vector q_i = [0, 0, \ldots, 1, 1, \ldots, 0, 0]^T, and the kth component of q_i is nonzero if y_i and d_k share the same label.


The term \|Q - AX\|_F^2 represents the discriminative sparse coding error, which enforces signals from the same class to have similar sparse representations, i.e., label consistency. Both D-KSVD and LC-KSVD utilize the information contained in the labels of the training samples to improve classification performance. Further, LC-KSVD makes the assumption that the sparse codes should be label consistent. However, it is proven in [38] that, if the dictionary is well learned, the sparse codes of a signal should be block sparse. In this paper, the proposed BSDC obtains better results by assuming that the sparse codes of a signal are block sparse in classification.
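To make the two-step alternation of Eq. (4) concrete, the following minimal numpy sketch alternates a sparse coding step and a dictionary update step. The ISTA-style l_1 coding step and the plain gradient update with column normalization are illustrative stand-ins of our own choosing, not the K-SVD, ODL, or other update rules cited above, and all names are assumptions.

```python
# A minimal sketch (not the authors' implementation) of the generic two-step
# dictionary-learning alternation in Eq. (4).
import numpy as np

def ista_sparse_code(Y, D, lam=0.1, n_iter=100):
    """Solve min_X 0.5*||Y - D X||_F^2 + lam*||X||_1 via ISTA."""
    L = np.linalg.norm(D, 2) ** 2           # Lipschitz constant of the gradient
    X = np.zeros((D.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        G = X - (D.T @ (D @ X - Y)) / L     # gradient step on the fidelity term
        X = np.sign(G) * np.maximum(np.abs(G) - lam / L, 0.0)   # soft threshold
    return X

def update_dictionary(Y, D, X, step=0.1):
    """One gradient step on 0.5*||Y - D X||_F^2, then column normalization."""
    D = D - step * (D @ X - Y) @ X.T
    return D / np.maximum(np.linalg.norm(D, axis=0, keepdims=True), 1e-12)

def learn_dictionary(Y, n_atoms, n_outer=30, seed=0):
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((Y.shape[0], n_atoms))
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    for _ in range(n_outer):
        X = ista_sparse_code(Y, D)          # step 1: D fixed, find X
        D = update_dictionary(Y, D, X)      # step 2: X fixed, update D
    return D, X
```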

D. Our Contributions
The main contributions of this work are as follows.

1. We propose a block discriminative dictionary-learning (BDDL) algorithm that learns discriminative class-specific subdictionaries and concatenates them to form an over-complete dictionary for classification, avoiding the need to solve multiple sparse recovery problems. The block structure of the sparse coefficients is exploited during learning.
2. An efficient gradient-based dictionary updating strategy for BDDL is developed.
3. We replace conventional sparse coding algorithms with structured sparse coding methods to be consistent with BDDL.
4. The proposed framework is robust to noise.

3. BLOCK SPARSE DISCRIMINATIVE CLASSIFICATION FRAMEWORK
In this section, the proposed BSDC framework is introduced in detail. BSDC consists of three parts: the proposed BDDL algorithm, a structured sparse coding algorithm, and a classifier. The diagram of BSDC is illustrated in Fig. 1.

Fig. 1. Diagram of the proposed block sparse discriminative classification framework.

A. Block Discriminative Dictionary Learning
Assume we have a training set Y = {Y^i}_{i=1}^N with N classes. The goal of dictionary-learning algorithms is to learn a dictionary D under which a given test sample y has a sparse representation x. Every column in D is associated with a label, so we divide D into N parts, D = [D_1, D_2, \ldots, D_N], where D_i is a class-specific subdictionary for the ith class. The kth class in the training set Y can be decomposed as

    Y^k = D X^k = \sum_{i=1}^N D_i X_i^k,   (7)

where X^k is the sparse representation of Y^k, and X_i^k denotes the rows of X^k associated with D_i. It is proven in [38] that, if D is well learned, the sparse representation x of a given test sample y from the ith class is block sparse, i.e., x_i \neq 0 and x_j = 0 for all j \neq i. Then Eq. (7) can be rewritten as

    Y^k = D_k X_k^k.   (8)

We capture the relationship among different subdictionaries with a dictionary incoherence term \|D_i^T D_j\|_F^2, and the proposed BDDL algorithm for the kth subdictionary is formulated as follows:

    \arg\min_{D_k, X^k} \frac{1}{2}\|Y^k - D_k X_k^k\|_F^2 + \frac{\alpha}{2}\sum_{i \neq k}\|D_i^T D_k\|_F^2 \quad \text{s.t.} \quad \|d\|_2 = 1,   (9)

where the first (fidelity) term \|Y^k - D_k X_k^k\|_F^2 minimizes the reconstruction error and forces X^k to be block sparse, i.e.,

    X_i^k = \begin{cases} \text{nonzero}, & i = k, \\ 0, & i \neq k. \end{cases}

We obtain a block diagonal matrix if we concatenate all X^i together:

    X = [X^1 \; X^2 \; \cdots \; X^N] = \begin{bmatrix} X_1^1 & & & \\ & X_2^2 & & \\ & & \ddots & \\ & & & X_N^N \end{bmatrix}.

The second (dictionary incoherence) term \|D_i^T D_k\|_F^2 reduces the correlation among subdictionaries to enlarge the interclass distance. The parameter α is a trade-off between the fidelity term and the dictionary incoherence term.

B. Optimization of BDDL
The objective function in Eq. (9) is not jointly convex, so we divide the optimization into a sparse coding stage and a dictionary update stage, similar to Section 2.B. Unlike conventional dictionary-learning algorithms, in the sparse coding stage we only need to solve a least-squares problem:

    \arg\min_{X^k} \|Y^k - D_k X_k^k\|_F^2.   (10)


The block sparse constraint is implicitly incorporated in the sparse coding stage, as analyzed in Section 3.A, and the solution of Eq. (10) is X_k^k = (D_k^T D_k)^{-1} D_k^T Y^k; the remaining components of X^k are all zero, i.e., X^k = [0, \ldots, (X_k^k)^T, \ldots]^T. In the dictionary update stage, we fix the sparse coefficients X^k, and the optimization problem is as follows:

    \arg\min_{D_k} \frac{1}{2}\|Y^k - D_k X_k^k\|_F^2 + \frac{\alpha}{2}\sum_{i \neq k}\|D_i^T D_k\|_F^2 \quad \text{s.t.} \quad \|d\|_2 = 1,   (11)

where L(D_k) denotes the objective function in Eq. (11), and the derivative of L(D_k) w.r.t. D_k is

    \nabla L = \alpha \sum_{i \neq k} D_i D_i^T D_k - (Y^k - D_k X_k^k)(X_k^k)^T.   (12)

There is no closed-form solution for D_k, so we solve it iteratively via gradient descent:

    D_k := D_k - \lambda \nabla L,   (13)

where λ is the step size of gradient descent, and the columns of D_k are normalized after each update to ensure \|d\|_2 = 1. The parameter λ can be determined by a line search strategy. We plug Eq. (13) into Eq. (11) and obtain a function of λ, f(λ) = L(D_k - λ∇L). We minimize f(λ) to obtain the optimal λ by setting ∇f = 0, which yields the closed form of λ:

    \lambda = -\frac{\mathrm{Tr}(A^T \nabla L\, X_k^k) - \alpha\sum_{i \neq k}\mathrm{Tr}(B \nabla L)}{\|\nabla L\, X_k^k\|_F^2 + \alpha\sum_{i \neq k}\|D_i^T \nabla L\|_F^2},   (14)

where

    A = Y^k - D_k X_k^k,   (15)

    B = D_k^T D_i D_i^T.   (16)

The details of the derivation of Eqs. (12) and (14) are given in Appendix A.
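To make the alternation in Eqs. (10)-(14) concrete, the following numpy sketch performs a few BDDL iterations for a single class k. It is a minimal illustration under our own naming assumptions (Ds, Yk, alpha, n_iter), not the authors' implementation, and it omits the initialization and stopping criteria of the full algorithm.

```python
# A minimal sketch of the per-class BDDL update following Eqs. (10)-(14).
import numpy as np

def bddl_update_class(Ds, Yk, k, alpha=1.0, n_iter=20):
    """Alternate the least-squares coding step of Eq. (10) and the gradient
    dictionary update of Eqs. (12)-(14) for the k-th subdictionary."""
    Dk = Ds[k].copy()
    others = [Di for i, Di in enumerate(Ds) if i != k]
    for _ in range(n_iter):
        # Sparse coding stage, Eq. (10): only the k-th block of X^k is nonzero,
        # so X_k^k is an ordinary least-squares solution.
        Xkk, *_ = np.linalg.lstsq(Dk, Yk, rcond=None)

        # Gradient w.r.t. D_k, Eq. (12); R is the residual A of Eq. (15).
        R = Yk - Dk @ Xkk
        grad = alpha * sum(Di @ Di.T for Di in others) @ Dk - R @ Xkk.T

        # Closed-form line-search step size, Eq. (14), with B as in Eq. (16).
        num = np.trace(R.T @ grad @ Xkk) - alpha * sum(
            np.trace(Dk.T @ Di @ Di.T @ grad) for Di in others)
        den = np.linalg.norm(grad @ Xkk, 'fro') ** 2 + alpha * sum(
            np.linalg.norm(Di.T @ grad, 'fro') ** 2 for Di in others)
        lam = -num / max(den, 1e-12)

        # Gradient descent update, Eq. (13), followed by column normalization.
        Dk = Dk - lam * grad
        Dk /= np.maximum(np.linalg.norm(Dk, axis=0, keepdims=True), 1e-12)
    return Dk, Xkk
```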

C. Structured Sparsity and Classification
After the N class-specific subdictionaries are learned, we concatenate them to form an over-complete dictionary, namely, D = [D_1, D_2, \ldots, D_N]. When a test sample y is given, its sparse coefficient x is found by the following optimization problem:

    \arg\min_x \|x\|_p \quad \text{s.t.} \quad y = Dx,   (17)

where p ∈ {0, 1}. This is a typical sparse representation problem whose general solution was introduced above. But there is more to the sparse coefficient x in a classification task. First, each subdictionary D_i is learned separately under the block sparse assumption. Second, it is proved in [38] that the correlation among the components of x, i.e., the block sparsity, exists. Conventional sparse coding algorithms, such as OMP, do not consider this correlation, so they are not the best choice here. To preserve the correlation among all components of x, we optimize Eq. (17) via BSBL, one of the structured sparse coding algorithms.

Note that the group sparsity is imposed in a different way in the two stages. In the training stage, the training data are enforced to be represented purely by the corresponding subdictionary, while a group sparse regularization is used in the test stage. The reason is that we have the ground-truth labels of the training data, and, by representing them purely by the corresponding subdictionary, we can take full advantage of this information to improve performance. Further, we add a dictionary incoherence term to enlarge the interclass distance and make test samples be approximately represented by their corresponding subdictionary. This strategy is widely used in sparse representation-based classification, for example by D-KSVD and LC-KSVD, the methods used for comparison in this paper: they learn dictionaries and find the sparse codes of the training samples with their proposed regularization, and, in the test stage, an OMP algorithm is used to solve the sparse coding problem.

The given test sample y is assigned the label that yields the minimal reconstruction error. We define \bar{x}_i to be a vector of the same size as x but with all entries set to zero except those associated with the ith class, i.e., x_i. The classification is formulated as follows:

    \arg\min_i \|y - D \bar{x}_i\|,   (18)

where the optimal i is the classification result for the test sample y.
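As an illustration of the decision rule in Eq. (18), the sketch below keeps, one class at a time, only the coefficients associated with that class and picks the class with the smallest reconstruction residual. The sparse coefficient x is assumed to have been computed beforehand (the paper uses BSBL for that step), and block_index is an assumed array mapping each dictionary column to its class.

```python
# Illustrative classification rule of Eq. (18); x and block_index are assumed given.
import numpy as np

def classify(y, D, x, block_index, n_classes):
    residuals = []
    for c in range(n_classes):
        x_bar = np.where(block_index == c, x, 0.0)   # zero all other blocks
        residuals.append(np.linalg.norm(y - D @ x_bar))
    return int(np.argmin(residuals))                 # class with smallest residual
```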

4. EXPERIMENTAL RESULTS
In this section, extensive experiments are conducted to test the performance of the proposed BSDC. Specifically, we test the effectiveness of the proposed framework on face recognition on the extended Yale B database and on texture classification on the Brodatz dataset; robustness is also verified on face recognition on the extended Yale B database [39,40]. We compare the proposed framework to SRC [7] with randomly selected training samples as the dictionary (the total number of training samples used in SRC is constrained by the dictionary size for a fair comparison), referred to as "RSRC"; RSRC with a structured sparse coding algorithm adopted in classification ("RSRCs"); the KSVD [9] learning framework and KSVD with structured sparse coding in classification ("KSVDs"); the D-KSVD [23] algorithm and D-KSVD with structured sparse coding in classification ("D-KSVDs"); and the LC-KSVD [24] algorithm and LC-KSVD with structured sparse coding in classification ("LC-KSVDs"). The structured sparse coding algorithm used in the following experiments is BSBL.

A. Parameters for BSDC
Two parameters are to be determined in BSDC: the subdictionary size per class d and the trade-off α between the fidelity term and the dictionary incoherence term. We perform fivefold cross validation to find the best parameter pair. For experiments on the extended Yale B, the tested values of d are {10, 13, 15, 17, 20} and the tested values of α are {0.01, 0.1, 1, 10, 100}. For experiments on the Brodatz dataset, the tested values of d are {50, 100, 150, 200, 250}, and the tested values of α are {0.01, 0.1, 1, 10, 100}.

B. Experiment on Face Recognition
In this subsection, we conduct experiments on face recognition on the extended Yale B database, which comprises 2414 face images of 38 people, with 64 images per person and 192 × 168 pixels per image. This database is challenging for its varying illumination conditions and expressions. We randomly select 1900 images, 50 per person, for training and use the rest for testing. Each image is projected onto a 504-dimensional vector with a randomly generated matrix drawn from a zero-mean normal distribution. The experimental results are summarized in Table 1.

As can be seen from Table 1, the accuracy of RSRC increases with more atoms per subdictionary, while the learning-based algorithms obtain their highest accuracy at 15 atoms per subdictionary. The reason is that RSRC does not learn from the training samples but randomly picks some of them to form dictionaries, so more atoms per subdictionary means more information; KSVD, D-KSVD, LC-KSVD, and the proposed method instead summarize all training samples to infer a subdictionary, and all of them indicate that 15 atoms are sufficient to represent the face images of one person. Under the same atom size, the proposed BSDC obtains the highest accuracy, owing to its discriminative learning strategy and its utilization of the structure of the sparse coefficients.

Another phenomenon is the improvement in accuracy of RSRC, KSVD, D-KSVD, and LC-KSVD when structured sparse coding algorithms are used. RSRC achieves a remarkable improvement because its dictionary is not learned but picked from the training samples, so the intrinsic structure is preserved and the structured sparse coding algorithm is consistent with that intrinsic structure. In contrast, KSVD, D-KSVD, and LC-KSVD do not force the coefficients to be block sparse, and their accuracy increases only a little.

Further, we conduct an experiment to test the performance of KSVD, D-KSVD, LC-KSVD, and the proposed BSDC with a varying number of training samples per person, with the atom size set to 15. The results are presented in Table 2. The accuracy gets higher with more training samples. We should not use too many samples for training, however, because the total number of images is limited; more training samples mean fewer test samples and make the results unstable. The comparison remains fair as long as each algorithm uses the same number of training samples.
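As an illustration of the feature extraction described above, the following sketch projects a vectorized 192 × 168 face image onto a 504-dimensional vector with a fixed zero-mean Gaussian random matrix; the variable names and the unit scale of the matrix entries are our assumptions.

```python
# A minimal sketch (an assumption, not the authors' code) of the random-projection
# features used in the face recognition experiment.
import numpy as np

rng = np.random.default_rng(0)
proj = rng.normal(loc=0.0, scale=1.0, size=(504, 192 * 168))  # one matrix for all images

def extract_feature(image):
    """image: array of shape (192, 168); returns a 504-dimensional feature vector."""
    return proj @ image.reshape(-1)
```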


Table 2. Performance of Different Methods with Varying Number of Training Samples on Extended Yale B (Acc, %)

                    Number of Training Samples
Method             30      40      50      55
KSVD [9]          92.9    93.1    93.3    93.3
D-KSVD [23]       94.1    94.4    94.6    94.7
LC-KSVD [24]      95.3    95.5    95.7    95.8
BSDC              96.6    96.9    97.2    97.3

C. Experiment of Robustness on Face Recognition
In this subsection, we carry out several experiments to verify the robustness of BSDC on the extended Yale B database. A percentage of randomly chosen pixels in each test image is corrupted. The corrupted pixels are chosen at random for each test image, and their locations are unknown to the algorithm. We vary the percentage of corrupted pixels from 10 to 50 percent. Figure 2 shows example test samples. The dictionary used here is the same as in Section 4.B; only the test samples are contaminated, with uniformly distributed noise. The experimental results are presented in Table 3.

It can be seen from Table 3 that the structured sparse coding algorithm significantly improves the classification performance when the test samples are corrupted by noise. The dual sparsity of the structured sparse coding algorithm explains this phenomenon: first, the solution x should be globally sparse; second, the block structure of x enforces sparsity of the nonzero blocks. This dual sparsity increases the probability that the nonzero entries of x fall at the correct locations.

D. Experiment on Texture Classification
In this subsection, we perform an experiment on texture classification on the Brodatz dataset, which comprises 111 texture images of size 640 × 640. We select three texture images (Fig. 3) for the experiment. Each image is divided into 1600 16 × 16 patches, and 1200 of these patches are randomly chosen for training and the remaining 400 for testing.
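For reference, the following sketch is our own assumed implementation (not the authors' code) of the texture data preparation described above: it splits a 640 × 640 image into 1600 16 × 16 patches, taken here as a non-overlapping tiling, and randomly selects 1200 of them for training and the remaining 400 for testing.

```python
# Assumed patch extraction and split for the texture classification experiment.
import numpy as np

def split_patches(image, size=16):
    """image: (640, 640) array -> (1600, 256) array of vectorized patches."""
    h, w = image.shape
    patches = image.reshape(h // size, size, w // size, size)
    return patches.transpose(0, 2, 1, 3).reshape(-1, size * size)

def train_test_split(patches, n_train=1200, seed=0):
    idx = np.random.default_rng(seed).permutation(len(patches))
    return patches[idx[:n_train]], patches[idx[n_train:]]
```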

Table 1. Accuracy of Face Recognition Using Different Methods on Extended Yale B (Acc, %)

                         Atom Size
Method             10      13      15      17      20
RSRC [7]          77.2    79.5    81.9    83.5    84.2
RSRCs [7]         89.1    91.1    92.8    94.2    95.3
KSVD [9]          90.4    92.2    93.3    92.7    91.9
KSVDs [9]         91.1    92.9    93.9    93.4    92.5
D-KSVD [23]       91.7    93.5    94.6    94.1    93.2
D-KSVDs [23]      92.3    94.1    95.1    94.7    93.9
LC-KSVD [24]      92.5    94.4    95.7    95.0    94.2
LC-KSVDs [24]     92.9    94.9    96.2    95.3    94.2
BSDC              94.2    95.8    97.2    96.4    95.4

Fig. 2. Corrupted test samples.


Table 3. Performance with Contaminated Test Samples (Acc, %)

Corruption   RSRC [7]   RSRCs [7]   KSVD [9]   KSVDs [9]   D-KSVD [23]   D-KSVDs [23]   LC-KSVD [24]   LC-KSVDs [24]   BSDC
10%            80.2       91.6        91.2       93.0         92.7           93.9           94.3            95.4       96.49
20%            75.0       90.2        86.8       92.5         88.1           92.9           90.1            93.2       95.33
30%            61.2       88.7        74.9       90.2         76.3           91.1           79.9            92.1       93.38
40%            41.9       82.8        61.8       84.4         65.7           86.8           70.1            87.6       88.32
50%            31.6       68.9        48.1       71.8         58.3           72.7           64.8            74.5       76.26

Fig. 3. Test images selected from Brodatz dataset.

Table 4. Accuracy of Texture Classification Using Different Methods (Acc, %)

                         Atom Size
Method             50     100     150     200     250
RSRC [7]          51.2    53.2    55.3    57.1    58.7
RSRCs [7]         66.7    68.8    70.8    72.5    73.6
KSVD [9]          54.1    55.1    56.7    55.8    54.6
KSVDs [9]         66.0    67.2    68.6    67.9    66.5
D-KSVD [23]       55.3    56.8    58.1    57.1    55.9
D-KSVDs [23]      67.8    69.1    70.3    69.5    68.4
LC-KSVD [24]      57.1    58.5    59.8    58.9    57.6
LC-KSVDs [24]     70.1    71.3    72.4    71.6    70.7
BSDC              72.5    73.7    74.9    74.1    72.9

The experimental results are presented in Table 4. They are similar to the results in Table 1: the accuracy of RSRC increases with more atoms per subdictionary, while the learning-based algorithms obtain their highest accuracy at 150 atoms per subdictionary. The reason is explained in Section 4.B. The difference is that the performance of all methods is greatly improved by structured sparse coding algorithms, since the interclass distance in this experiment is much larger than that in the extended Yale B.

Further, we conduct an experiment to test the performance of KSVD, D-KSVD, LC-KSVD, and the proposed BSDC with a varying number of training samples, with the atom size set to 150. The results are presented in Table 5. It can be seen from Table 5 that the accuracy of all algorithms gets higher with more training samples. Using too many training samples leads to unstable results, but a different number of training samples does not change the outcome of the performance comparison.

Table 5. Performance of Different Methods with Varying Number of Training Samples on Brodatz (Acc, %)

                    Number of Training Samples
Method            1200    1400    1600    1800
KSVD [9]          55.4    56.3    56.7    56.9
D-KSVD [23]       56.8    57.6    58.1    58.5
LC-KSVD [24]      58.7    59.4    59.8    60.1
BSDC              73.6    74.4    74.9    75.1

5. CONCLUSION
In this paper, a novel block sparse discriminative classification framework is proposed under the assumption that the sparse coefficients of test samples are block sparse in classification. We first train class-specific subdictionaries from the training samples via the proposed block discriminative dictionary-learning algorithm; then a structured sparse coding algorithm is adopted to find the sparse coefficients of the test samples; finally, a classifier based on the minimal reconstruction error is applied. An effective gradient-based optimization scheme with line search is developed to learn the class-specific subdictionaries. Both the discriminative power of the dictionary and the structure of the sparse coefficients are considered in BSDC. Experimental results show that the proposed framework outperforms SRC, KSVD, D-KSVD, and LC-KSVD in image classification. We also demonstrate the robustness introduced by structured sparse coding algorithms.


APPENDIX A

Derivation of Eq. (12):

    L(D_k) = \frac{1}{2}\|Y^k - D_k X_k^k\|_F^2 + \frac{\alpha}{2}\sum_{i \neq k}\|D_i^T D_k\|_F^2
           = \frac{1}{2}\mathrm{Tr}[(Y^k - D_k X_k^k)(Y^k - D_k X_k^k)^T] + \frac{\alpha}{2}\sum_{i \neq k}\mathrm{Tr}[D_i^T D_k (D_i^T D_k)^T].   (A1)

We take the derivative of L(D_k) w.r.t. D_k and obtain

    \nabla L = \alpha\sum_{i \neq k} D_i D_i^T D_k - (Y^k - D_k X_k^k)(X_k^k)^T.   (A2)

Derivation of Eq. (14):

    f(\lambda) = L(D_k - \lambda\nabla L) = \frac{1}{2}\|Y^k - D_k X_k^k + \lambda\nabla L\, X_k^k\|_F^2 + \frac{\alpha}{2}\sum_{i \neq k}\|D_i^T D_k - \lambda D_i^T\nabla L\|_F^2.   (A3)

For simplicity, we let

    A = Y^k - D_k X_k^k,   (A4)

and, replacing \|M\|_F^2 by \mathrm{Tr}(MM^T),

    f(\lambda) = \frac{1}{2}\mathrm{Tr}(AA^T) + \lambda\mathrm{Tr}(\nabla L\, X_k^k A^T) + \frac{\lambda^2}{2}\mathrm{Tr}[\nabla L\, X_k^k (X_k^k)^T \nabla L^T]
               + \frac{\alpha}{2}\sum_{i \neq k}\mathrm{Tr}(D_i^T D_k D_k^T D_i) - \alpha\lambda\sum_{i \neq k}\mathrm{Tr}(D_i^T\nabla L\, D_k^T D_i) + \frac{\alpha\lambda^2}{2}\sum_{i \neq k}\mathrm{Tr}(D_i^T\nabla L\nabla L^T D_i).   (A5)

We take the derivative of f(\lambda) w.r.t. \lambda and set \nabla f = 0:

    \nabla f = \mathrm{Tr}(A^T\nabla L\, X_k^k) - \alpha\sum_{i \neq k}\mathrm{Tr}(D_k^T D_i D_i^T\nabla L) + \lambda\mathrm{Tr}[\nabla L\, X_k^k (X_k^k)^T\nabla L^T] + \alpha\lambda\sum_{i \neq k}\mathrm{Tr}(D_i^T\nabla L\nabla L^T D_i) = 0.   (A6)

Again, we let

    B = D_k^T D_i D_i^T.   (A7)

Replacing \mathrm{Tr}(MM^T) by \|M\|_F^2, we get

    \lambda = -\frac{\mathrm{Tr}(A^T\nabla L\, X_k^k) - \alpha\sum_{i \neq k}\mathrm{Tr}(B\nabla L)}{\|\nabla L\, X_k^k\|_F^2 + \alpha\sum_{i \neq k}\|D_i^T\nabla L\|_F^2}.   (A8)
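As a sanity check on the derivation above (not part of the original paper), the short script below compares the analytic gradient of Eq. (12)/(A2) with a central finite-difference approximation on random data; the dimensions and the value of α are arbitrary illustrative choices.

```python
# Numerical check of the gradient in Eq. (12)/(A2) on random data.
import numpy as np

rng = np.random.default_rng(0)
m, d, n, alpha = 8, 4, 10, 0.5
Dk = rng.standard_normal((m, d))
Di_list = [rng.standard_normal((m, d)) for _ in range(3)]   # the other subdictionaries
Yk = rng.standard_normal((m, n))
Xkk = rng.standard_normal((d, n))

def L(Dk):
    fit = 0.5 * np.linalg.norm(Yk - Dk @ Xkk, 'fro') ** 2
    coh = 0.5 * alpha * sum(np.linalg.norm(Di.T @ Dk, 'fro') ** 2 for Di in Di_list)
    return fit + coh

grad = alpha * sum(Di @ Di.T for Di in Di_list) @ Dk - (Yk - Dk @ Xkk) @ Xkk.T  # Eq. (12)

num = np.zeros_like(Dk)
eps = 1e-6
for i in range(m):
    for j in range(d):
        E = np.zeros_like(Dk); E[i, j] = eps
        num[i, j] = (L(Dk + E) - L(Dk - E)) / (2 * eps)

print(np.max(np.abs(grad - num)))   # should be on the order of 1e-8 or smaller
```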

ACKNOWLEDGMENTS
This work was supported and completed at the Digital Media Technology Lab, Multimedia Information Technology Center, School of Information and Communication Engineering, Beijing University of Posts and Telecommunications.

REFERENCES

1. M. Elad, Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing (Springer, 2010). 2. M. Elad, M. A. Figueiredo, and Y. Ma, “On the role of sparse and redundant representations in image processing,” Proc. IEEE 98, 972–982 (2010). 3. M. Yuan and Y. Lin, “Model selection and estimation in regression with grouped variables,” J. R. Stat. Soc. Ser. B 68, 49–67 (2006). 4. Y. C. Eldar, P. Kuppinger, and H. Bolcskei, “Block-sparse signals: uncertainty relations and efficient recovery,” IEEE Trans. Signal Process. 58, 3042–3054 (2010). 5. R. G. Baraniuk, V. Cevher, M. F. Duarte, and C. Hegde, “Modelbased compressive sensing,” IEEE Trans. Inf. Theory 56, 1982–2001 (2010). 6. A. Majumdar and R. K. Ward, “Classification via group sparsity promoting regularization,” in IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP (IEEE, 2009), pp. 861–864. 7. J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, and Y. Ma, “Robust face recognition via sparse representation,” IEEE Trans. Pattern Anal. Mach. Intell. 31, 210–227 (2009). 8. K. Engan, K. Skretting, and J. H. Husøy, “Family of iterative LSbased dictionary learning algorithms, ILS-DLA, for sparse signal representation,” Digital Signal Process. 17, 32–49 (2007). 9. M. Aharon, M. Elad, and A. Bruckstein, “The K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation,” IEEE Trans. Signal Process. 54, 4311–4322 (2006). 10. J. Mairal, F. Bach, J. Ponce, and G. Sapiro, “Online dictionary learning for sparse coding,” in Proceedings of the 26th Annual International Conference on Machine Learning (ACM, 2009), pp. 689–696. 11. K. Skretting and K. Engan, “Recursive least squares dictionary learning algorithm,” IEEE Trans. Signal Process. 58, 2121–2130 (2010). 12. R. Gribonval, “Sparse decomposition of stereo signals with matching pursuit and application to blind separation of more than two sources from a stereo mixture,” in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (IEEE, 2002), Vol. 3, pp. 3057–3060. 13. M. Elad and M. Aharon, “Image denoising via sparse and redundant representations over learned dictionaries,” IEEE Trans. Image Process. 15, 3736–3745 (2006). 14. J. Mairal, M. Elad, and G. Sapiro, “Sparse representation for color image restoration,” IEEE Trans. Image Process. 17, 53–69 (2008). 15. E. Elhamifar and R. Vidal, “Robust classification using structured sparse representation,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (IEEE, 2011), pp. 1873–1879. 16. V. M. Patel, Y.-C. Chen, R. Chellappa, and P. J. Phillips, “Dictionaries for image and video-based face recognition,” J. Opt. Soc. Am. A 31, 1090–1103 (2014). 17. J. K. Pillai, V. M. Patel, R. Chellappa, and N. K. Ratha, “Secure and robust iris recognition using random projections and sparse representations,” IEEE Trans. Pattern Anal. Mach. Intell. 33, 1877–1893 (2011). 18. J. Yang, J. Wright, T. S. Huang, and Y. Ma, “Image superresolution via sparse representation,” IEEE Trans. Image Process. 19, 2861–2873 (2010). 19. X. Ma, H. Quang Luong, W. Philips, H. Song, and H. Cui, “Sparse representation and position prior based face hallucination upon classified over-complete dictionaries,” Signal Process. 92, 2066– 2074 (2012). 20. M. F. Tappen, B. C. Russell, and W. T. Freeman, “Exploiting the sparse derivative prior for super-resolution and image demosaicing,” in IEEE Workshop on Statistical and Computational Theories of Vision (IEEE, 2003).

Liu et al. 21. H. Wang, C. Yuan, W. Hu, and C. Sun, “Supervised class-specific dictionary learning for sparse modeling in action recognition,” Pattern Recogn. 45, 3902–3911 (2012). 22. K. Huang and S. Aviyente, “Sparse representation for signal classification,” in Proceedings of Advances in Neural Information Processing Systems (IEEE, 2006), pp. 609–616. 23. Q. Zhang and B. Li, “Discriminative K-SVD for dictionary learning in face recognition,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (IEEE, 2010), pp. 2691–2698. 24. Z. Jiang, Z. Lin, and L. Davis, “Label consistent k-svd: learning a discriminative dictionary for recognition,” IEEE Trans. Pattern Anal. Mach. Intell. 35, 2651–2664 (2013). 25. J. Mairal, F. Bach, J. Ponce, G. Sapiro, and A. Zisserman, “Discriminative learned dictionaries for local image analysis,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (IEEE, 2008), pp. 1–8. 26. E. Amaldi and V. Kann, “On the approximability of minimizing nonzero variables or unsatisfied relations in linear systems,” Theor. Comput. Sci. 209, 237–260 (1998). 27. D. L. Donoho, “For most large underdetermined systems of linear equations the minimal l1-norm solution is also the sparsest solution,” Commun. Pure Appl. Math. 59, 797–829 (2006). 28. Y. C. Pati, R. Rezaiifar, and P. Krishnaprasad, “Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition,” in Conference Record of The Twenty-Seventh Asilomar Conference on Signals, Systems and Computers (IEEE, 1993), pp. 40–44. 29. T. Blumensath and M. E. Davies, “Iterative hard thresholding for compressed sensing,” Appl. Comput. Harmon. Anal. 27, 265–274 (2009).


30. S. S. Chen, D. L. Donoho, and M. A. Saunders, “Atomic decomposition by basis pursuit,” SIAM J. Sci. Comput. 20, 33–61 (1998). 31. I. F. Gorodnitsky and B. D. Rao, “Sparse signal reconstruction from limited data using FOCUSS: a re-weighted minimum norm algorithm,” IEEE Trans. Signal Process. 45, 600–616 (1997). 32. A. Beck and M. Teboulle, “A fast iterative shrinkage-thresholding algorithm for linear inverse problems,” SIAM J. Imag. Sci. 2, 183–202 (2009). 33. M. E. Tipping, “Sparse Bayesian learning and the relevance vector machine,” J. Mach. Learn. Res. 1, 211–244 (2001). 34. C. B. Shaw and P. K. Yalavarthy, “Performance evaluation of typical approximation algorithms for nonconvex lp-minimization in diffuse optical tomography,” J. Opt. Soc. Am. A 31, 852–862 (2014). 35. E. Van Den Berg and M. P. Friedlander, “Probing the Pareto frontier for basis pursuit solutions,” SIAM J. Sci. Comput. 31, 890–912 (2008). 36. Y. C. Eldar and M. Mishali, “Robust recovery of signals from a structured union of subspaces,” IEEE Trans. Inf. Theory 55, 5302–5316 (2009). 37. Z. Zhang and B. Rao, “Extension of SBL algorithms for the recovery of block sparse signals with intra-block correlation,” IEEE Trans. Signal Process. 61, 2009–2015 (2013). 38. E. Elhamifar and R. Vidal, “Sparse subspace clustering: algorithm, theory, and applications,” IEEE Trans. Pattern Anal. Mach. Intell. 35, 2765–2781 (2013). 39. K.-C. Lee, J. Ho, and D. J. Kriegman, “Acquiring linear subspaces for face recognition under variable lighting,” IEEE Trans. Pattern Anal. Mach. Intell. 27, 684–698 (2005). 40. A. S. Georghiades, P. N. Belhumeur, and D. J. Kriegman, “From few to many: illumination cone models for face recognition under variable lighting and pose,” IEEE Trans. Pattern Anal. Mach. Intell. 23, 643–660 (2001).
