IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 23, NO. 6, JUNE 2014


Combining LBP Difference and Feature Correlation for Texture Description

Xiaopeng Hong, Member, IEEE, Guoying Zhao, Senior Member, IEEE, Matti Pietikäinen, Fellow, IEEE, and Xilin Chen, Senior Member, IEEE

Abstract— Effective characterization of texture images requires exploiting multiple visual cues from the image appearance. The local binary pattern (LBP) and its variants achieve great success in texture description. However, because the LBP(-like) feature is an index of discrete patterns rather than a numerical feature, it is difficult to combine the LBP(-like) feature with other discriminative ones in a compact descriptor. To overcome the problem derived from the non-numerical constraint of the LBP, this paper proposes a numerical variant accordingly, named the LBP difference (LBPD). The LBPD characterizes the extent to which one LBP varies from the average local structure of an image region of interest. It is simple, rotation invariant, and computationally efficient. To achieve enhanced performance, we combine the LBPD with other discriminative cues by a covariance matrix. The proposed descriptor, termed the covariance and LBPD descriptor (COV-LBPD), is able to capture the intrinsic correlation between the LBPD and other features in a compact manner. Experimental results show that the COV-LBPD achieves promising results on publicly available data sets.

Index Terms— Feature extraction, image descriptors, image texture analysis, covariance matrix, local binary pattern.

I. INTRODUCTION

EFFECTIVE general-purpose texture classification in images has been an important topic of interest in the past decades, since it is widely used in material surface inspection, medical imaging, object recognition, scene recognition, and so forth. Despite its importance, the main challenges are twofold: on one hand, large intra-class divergence in appearance, such as illumination, color, rotation, and scale, makes it extremely difficult to model the texture images of the same class; on the other hand, the wide range of texture classes increases the difficulty of distinguishing them. To deal with these challenges, a number of texture representations

Manuscript received November 19, 2013; revised February 18, 2014; accepted March 25, 2014. Date of publication April 10, 2014; date of current version May 7, 2014. This work was supported in part by the Academy of Finland and Infotech Oulu and in part by the FiDiPro Program of Tekes. The work of X. Chen was supported by the Natural Science Foundation of China under Contract 61025010 and Contract 61390511. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Xin Li. X. Hong, G. Zhao, and M. Pietikäinen are with the Department of Computer Science and Engineering, University of Oulu, Oulu 90570, Finland (e-mail: [email protected]; [email protected]; [email protected]). X. Chen is with the Key Laboratory of Intelligent Information Processing, Institute of Computing Technology Chinese Academy of Sciences, Beijing 100190, China, and also with the Department of Computer Science and Engineering, University of Oulu, Oulu 90570, Finland (e-mail: [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TIP.2014.2316640

[1]–[14] have been proposed. Comprehensive studies on existing descriptors are provided in [10] and [15]. The local binary pattern as well as its variants has been successfully applied to a wide variety of tasks, such as texture classification [16]–[22], face analysis [23], and visual speech recognition [20], [21]. The LBP encodes the co-occurrence of neighboring pixel comparisons within a local area. It is simple, computationally efficient, and robust against illumination changes. The frequency histogram [16], [17] of LBPs is usually adopted as the appearance description for an image region1 of interest. To achieve enhanced performance, researchers have developed approaches to fuse the LBP with other visual cues. One intuitive way is to estimate their joint probability distribution. For instance, Ojala et al. build multi-dimensional histograms to combine the LBP histogram and the local contrast [16]. Guo et al. jointly combine the three components of the completed LBP (CLBP) [17]. However, in this case, a considerable dimension, growing exponentially with the number of elementary features, is usually required to achieve an accurate estimation. To alleviate this problem, some approaches use the concatenation of separate histograms of single features. For example, Guo et al. develop a hybrid version of CLBP [17]. Similar cases can also be found in [16], [20], [21], and [24]. The major problem is that intrinsic connections such as the correlation between features are neglected, resulting in much information being lost. In spite of these existing approaches, it is still difficult to combine the LBP(-like) feature with other informative ones in a compact manner. The problem derives from the non-numerical constraint of the LBP: the LBP(-like) feature is an index of discrete patterns, rather than a numerical response. Therefore, it is not reasonable to aggregate the LBP(-like) feature and other numerical ones into a vector straightforwardly.
Moreover, commonly used distance metrics on Euclidean spaces fail to measure the distance between two non-numerical features precisely. Furthermore, popular dimension reduction techniques cannot be directly applied to the LBP family to obtain a final representation.

1 In this paper, the term 'local area' refers to a small image area (e.g. with a size of 3 × 3) including a central pixel and its neighboring pixels, while 'image region' stands for a patch of an image, which may consist of multiple local areas.

To overcome the problem coming from the non-numerical constraint, this paper proposes a numerical variant of the LBP, named the LBP Difference (LBPD). We define the mean LBP

1057-7149 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.


to represent the average pattern of a given image region, and then employ the proposed LBPD to reflect the difference between one specific local pattern and the average one. To achieve enhanced performance, we propose a general approach to fuse the LBPD with other features. Specifically, the LBPD features as well as other numerical features are computed at every pixel inside the image region. The pixel-wise feature responses are encapsulated straightforwardly by a tensor. Finally, we use a covariance matrix [25] to summarize the intrinsic correlation between those features in the tensor. The proposed descriptor, termed the COVariance and Local Binary Pattern Difference descriptor (COV-LBPD), combines the LBPD with additional cues in a compact manner. Experimental results show that the COV-LBPD achieves promising results on publicly available datasets.

The remaining sections are structured as follows: Section II reviews the relevant studies. Section III proposes the LBPD features. Afterwards, LBPDs are combined with other informative features to form the COV-LBPD descriptor in Section IV. Section V reports experimental results on public datasets. Sections VI and VII provide corresponding discussions and conclude the paper, respectively.

II. RELATED WORK

Given an image region of interest I, regarding multiple elementary features extracted at each pixel as random variables, the covariance matrix (CovM) descriptor characterizes their second-order statistics inside I. The usage of multiple informative features ensures the discriminative power of the CovM. For instance, Wu and Nevatia show that the CovM has the best discriminative power for human detection among the descriptors under comparison, including the histogram of oriented gradients and the Edgelet [26]. Generally, CovM-based descriptors have a main advantage over histogram-based ones: CovMs are usually more compact than histograms built from the same elementary features.
Unlike multi-dimensional histograms, which explicitly estimate the joint probability distribution at considerable dimension, the CovM is a symmetric second-order matrix whose element at position (i, j) is the covariance between the i-th and j-th random variables. The dimension of CovMs thus grows only quadratically with the number of features. For example, considering the symmetry, the intrinsic dimension needed to store a CovM covering the RGB color channels is only 3 × (3 + 1)/2 = 6. As another example, Tuzel et al. show that 5 × 5 CovMs, whose intrinsic dimension is 15, are capable of achieving promising results on the Brodatz texture dataset [25]. More importantly, the relatively moderate increase in dimensionality makes it feasible for the CovM to integrate multiple features. As a consequence, the covariance matrix has been widely applied to many applications, such as texture classification [25], object detection and tracking [27], human detection [26], [28], and medical image analysis [29]. More details of the framework of CovMs are provided in [28], [30], and [31].

There are several approaches related to the idea of combining the LBP and the CovM [32]–[34]. Some decode the LBP

into several measurements [32], [33] and quantize each one separately to calculate the CovM. However, multiple variables are required to fully reflect one single LBP. Others [34] encapsulate the LBP indices directly into the feature vector for CovMs. However, as the elementary features for CovMs are assumed to be numerical whereas the LBPs are not, this usually leads to unstable statistics. In contrast, as an extension of our preliminary work [35], this paper proposes the LBPD to reflect the variations of LBPs numerically. The proposed feature can be used for any central moment and is therefore inherently appropriate as the elementary feature for CovMs.

III. LBP DIFFERENCE ON EUCLIDEAN SPACE

A. Local Binary Patterns

The local binary pattern compares a pixel to its neighbors within a local area. Specifically, the neighbors whose feature responses exceed the central one's are labeled as '1', otherwise as '0'. The co-occurrence of the comparison results is recorded by a string of binary bits. Afterwards, weights drawn from a geometric sequence with a common ratio of 2 are assigned to the bits according to their indices in the string. The binary string with its bits weighted is consequently transformed into a decimal-valued index (i.e. the LBP feature response). Formally, given an image region I, the LBP of the pixel x_c with respect to the channel φ is calculated in its general form as follows [16]:

f_{LBP_{P,R}}(x_c, \phi) = \sum_{p=0}^{P-1} u(\phi(x_p) - \phi(x_c)) \, 2^p,   (1)

where x_p, for p = 0, ..., P−1, are the P neighboring pixels at a radius R surrounding x_c; the function φ can be any numerical map of images, such as the intensity [16]–[20], the color channels [24], and filter responses [36]; and u(x) is the step function with u(x) = 1 if x ≥ 0 and u(x) = 0 otherwise. One LBP calculated by Eq. (1) stands for a particular structure of the local area comprising the center x_c and its P neighbors.

B. Covariance Matrices

Covariance is a statistic indicating the extent to which two random variables co-vary. Multiple (say d) elementary features extracted from each pixel x within an image region I are usually arranged in a d-dimensional feature vector f(x). Afterwards, a covariance matrix summarizes the d^2 covariances between any two of the d features [25]:

M_{cov}(I) = c \sum_{x \in I} (f(x) - \mu)(f(x) - \mu)^T,   (2)

where μ is the mean of {f(x)}_{x∈I} and c is a normalization factor. The quadratic form on the right side ensures that the CovM is symmetric and positive semi-definite. In practice, the CovM is usually normalised to a correlation matrix (CorM) [28]:

M_{cor}(I) = \Delta^{-1/2} \, M_{cov}(I) \, \Delta^{-1/2},   (3)

where Δ is a diagonal matrix containing the diagonal entries of M_{cov}. Obviously the CorM is a particular kind of CovM whose diagonal elements are constantly equal to 1.
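As a concrete illustration of Eqs. (1)-(3), the following NumPy sketch computes an LBP code at one pixel and the covariance/correlation descriptors of a stack of per-pixel feature vectors. The function names are ours, not the authors'; neighbour positions are rounded to the pixel grid here, whereas the paper uses bilinear interpolation, and c = 1/(n − 1) is assumed for the normalization factor.

```python
import numpy as np

def lbp_code(img, yc, xc, P=8, R=1):
    """Eq. (1): threshold the P circular neighbours at radius R against
    the centre pixel and weight the resulting bits by 2^p. Neighbour
    positions are rounded to the grid (the paper interpolates)."""
    code = 0
    for p in range(P):
        xp = xc + R * np.cos(2.0 * np.pi * p / P)
        yp = yc - R * np.sin(2.0 * np.pi * p / P)
        if img[int(round(yp)), int(round(xp))] >= img[yc, xc]:  # step u(.)
            code += 2 ** p
    return code

def covariance_descriptor(F):
    """Eq. (2): F is an (n, d) array of per-pixel feature vectors f(x);
    returns the d x d covariance matrix, assuming c = 1/(n - 1)."""
    D = F - F.mean(axis=0)
    return D.T @ D / (F.shape[0] - 1)

def correlation_descriptor(F):
    """Eq. (3): normalise the covariance by the diagonal standard
    deviations, yielding a correlation matrix with unit diagonal."""
    M = covariance_descriptor(F)
    inv_sd = np.diag(1.0 / np.sqrt(np.diag(M)))
    return inv_sd @ M @ inv_sd
```

By construction the correlation matrix is symmetric with ones on the diagonal, matching the CorM properties noted above.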

HONG et al.: COMBINING LBPD AND FEATURE CORRELATION

2559

Fig. 1. Neighbors having a unit distance to the LBP '128': (a) the local bit order used in this paper; (b) neighbors under the Euclidean distance; and (c) neighbors under the Hamming distance. The numbers inside the LBP circles are the LBP indices encoded as decimal numbers, and the strings nearby are the corresponding binary numbers. The numbers in orange in parentheses and the ones in green attached to the dashed lines denote the Euclidean distances and Hamming distances between two LBPs, respectively.

The calculation of CorMs by Eq. (3) is equivalent to the following procedure: first standardize each of the d elementary features within I to zero mean and unit standard deviation, and then calculate the covariance matrix by Eq. (2). It is reported that the CorM achieves enhanced robustness over the CovM, since the former disregards the standard deviations of features [28]. Moreover, the intrinsic dimension of the covariance matrix by Eq. (2) is d × (d + 1)/2, while that of the correlation matrix by Eq. (3) is further reduced to d × (d − 1)/2. As a result, unless otherwise specified, CovM hereinafter also refers to the correlation matrix.

The Euclidean distance cannot measure the proximity of CovMs precisely, as CovMs do not lie on a Euclidean space [25], [28]. To address this problem, Riemannian-manifold-based metrics are employed. A common choice is the affine-invariant metric [25]:

d_{Riem}(M_1, M_2) = \sqrt{ \sum_{i=1}^{d} \ln^2 \lambda_i(M_1, M_2) },   (4)

where {λ_i(M_1, M_2)}_{i=1,...,d} are the d generalized eigenvalues of the two positive definite matrices M_1 and M_2.

C. Problem of Direct Combination Between CovMs and LBPs

In general, any feature can be employed as an entry of the feature vectors used to calculate CovMs by Eqs. (2)-(3), as long as it takes values in the set of real numbers R. Unfortunately, the LBP(-like) feature defined by Eq. (1) is an index of patterns and does not live on a vector space, so it is not theoretically sound to use the LBP(-like) feature to calculate CovMs in a forthright manner. We analyze this problem in detail as follows.

Firstly, the Euclidean distance, which is implied by the arithmetic subtraction of two numerical values on the right side of Eq. (2), fails to capture the precise relation of patterns. Obviously, the maximum and minimum Euclidean distances between two different LBPs are 255 and 1 respectively. In the general case, a unit distance between two points usually indicates that they are semantically similar. However, as illustrated in Figure 1(b), this is not always the case for LBPs. On one side, considering two LBPs v_1 = (127)_{10} or

(01111111)_2 and v_2 = (128)_{10} or (10000000)_2: even though none of the corresponding bits of v_1 and v_2 agree, the Euclidean distance between v_1 and v_2 is 1. On the other side, the distance between v_2 and v_3 = (129)_{10} or (10000001)_2, which differ in only one bit, is 1 as well. It is thus unclear whether two LBPs are similar even when there is a unit Euclidean distance between them.

Secondly, the responses of LBPs rely on the subjectively assigned order of bits, i.e., the way in which the bits are assigned the power weights, from the most significant 2^{P−1} to the least significant 2^0 = 1. Statistics such as the covariance between the LBP and other features are inherently dependent on this order of bits. As a result, the covariance between the LBP and other features risks being affected by noise and abrupt changes, especially in the bits with large weights.

In order to overcome the aforementioned problems, the Hamming distance is adopted to compare two LBPs [37]–[41]. Instead of treating the inputs as numerical values, the Hamming distance regards them as binary strings and aggregates the bitwise differences. It is formally defined as the count of bits that differ between two binary strings a and b of the same length P:

d_H(a, b) = \sum_{p=0}^{P-1} \left( a(p) \oplus b(p) \right),   (5)

where a(p) and b(p) are the p-th bits of a and b respectively, for p = 0, ..., P−1, and ⊕ is the exclusive-or operator. As illustrated in Figure 1(c), there is consistently a one-bit difference between v_2 and any LBP that has a unit Hamming distance to v_2. Comparing the examples in Figure 1(b) and (c), it is obvious that the Hamming distance reflects the topology of the space of LBPs more precisely than the Euclidean distance. In addition, since the Hamming distance is calculated bit by bit, regardless of the bit weights in Eq. (1), only a fraction of the Hamming counts is affected by abrupt changes in images in most cases. Hence, measuring the proximity of LBPs by the Hamming distance is believed to be more robust to noise than by the Euclidean distance.
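The 127/128/129 example above and Eq. (5) can be checked in a few lines; `hamming` is our hypothetical helper, not the authors' code.

```python
def hamming(a, b):
    """Eq. (5): number of differing bits between two LBP codes,
    computed as an exclusive-or followed by a popcount."""
    return bin(a ^ b).count("1")

v1, v2, v3 = 0b01111111, 0b10000000, 0b10000001   # 127, 128, 129
# unit Euclidean distance, yet every one of the eight bits differs:
assert abs(v1 - v2) == 1 and hamming(v1, v2) == 8
# unit Euclidean distance and a genuine one-bit structural change:
assert abs(v2 - v3) == 1 and hamming(v2, v3) == 1
```

The two assertions make the section's point concrete: the same Euclidean distance of 1 can correspond to either a one-bit or an eight-bit change in local structure.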


Fig. 2. Illustration of the LBP mean: (a) three LBPs; (b) the integer LBP mean m̂_I; and (c) the floating LBP mean m̂_f.

The Hamming distance motivates us to avoid the weighting in Eq. (1) and preserve only the co-occurrence of local comparisons. As a result, in this paper the local pattern is considered as a binary vector f̂_LBP, each element of which corresponds to a particular bit of the ordinary LBP. Specifically, the p-th bit of f̂_LBP, for p = 0, ..., P−1, is written as:

\hat{f}_{LBP}(p) = u(\phi(x_p) - \phi(x_c)).   (6)

Afterwards, the Hamming distance between f̂_a and f̂_b can be expressed as follows:

d_H(\hat{f}_a, \hat{f}_b) = \sum_{p=0}^{P-1} \left( \hat{f}_a(p) \oplus \hat{f}_b(p) \right).   (7)

Eq. (7) implies the potential for extending the binary vectors to the more general case of floating-point values. This facilitates the proposal of our LBPD feature in the following subsections.

D. LBP Mean

It is obvious that the difference vectors between the feature vectors and their mean (abbreviated as difference vectors hereinafter) play a central role in calculating CovMs by Eqs. (2)-(3). This motivates us to design a new class of LBP-like features for CovMs in accordance with the difference vectors. To this end, there are two problems to confront. Problem 1: how to define the mean of LBPs (abbreviated as 'LBP mean'), which reflects the average local pattern within an image region? Problem 2: how to characterize the difference between an LBP and the LBP mean by a numerical feature?

For Problem 1, we employ the Karcher mean [42] to define the LBP mean. The Karcher mean of a set of points is the point that minimizes the summation of distances (e.g. Hamming distances in this case) to all given points. More precisely, given a set of N LBPs denoted by S = {f̂_1, ..., f̂_N}, the p-th element, for p = 0, ..., P−1, of its Karcher mean m̂_I is defined via the floor function ⌊·⌋:

m_I(p) = \left\lfloor \frac{\sum_{n=1}^{N} \hat{f}_n(p)}{N} + 0.5 \right\rfloor.   (8)

Obviously m̂_I belongs to the set of the 2^P local binary patterns. If the constraint that the LBP mean should itself be an LBP is relaxed, we obtain a more general definition of the LBP mean, denoted by m̂_f. More specifically, m̂_f, in the form of a floating-point vector, can be computed by dropping the floor function in Eq. (8):

\hat{m}_f = \frac{\sum_{n=1}^{N} \hat{f}_n}{N}.   (9)
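Eqs. (8)-(9) amount to a per-bit majority vote and a per-bit average, which can be sketched as follows (the function name is ours):

```python
import numpy as np

def lbp_means(S):
    """Eqs. (8)-(9): S is an (N, P) array of binarised LBP vectors
    (Eq. (6)). Returns the integer mean m_I (bitwise majority vote)
    and the floating mean m_f (per-bit fraction of '1' bits)."""
    m_f = S.mean(axis=0)                     # Eq. (9)
    m_i = np.floor(m_f + 0.5).astype(int)    # Eq. (8)
    return m_i, m_f
```

For example, three patterns whose first bit is always '1' give m_f(0) = 1 and m_I(0) = 1, while a bit set in only one of the three gives m_f = 1/3 and m_I = 0.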

Figures 2(b) and (c) illustrate m̂_I and m̂_f of the three LBPs provided in Figure 2(a), respectively. For the integer mean in Eq. (8), m_I(p) = 1 means that the majority of the LBPs under consideration have a value of '1' in the p-th bit, and vice versa. In the case of the floating mean m̂_f, m_f(p) = a, where 0 ≤ a ≤ 1, indicates that (100a)% of the LBPs are '1' in the corresponding bit. Obviously the elements of both m̂_I and m̂_f indicate the statistics of the bits of LBPs. m̂_I and m̂_f thereby represent the average local pattern within a given image region.

E. LBP Difference

Given the LBP mean m̂ defined in Eqs. (8)-(9), one can easily obtain the LBP difference vector through d̂ = (f̂ − m̂). For Problem 2, we have to encode d̂ on R^1 without losing its distinguishing power. First, the magnitude of the expected feature should reflect how different f̂ and m̂ are. To this end, motivated in part by the Hamming distance in Eq. (5), we use the norm of d̂ to aggregate the bitwise differences between f̂ and m̂. Specifically, let f̂ denote the vectorized LBP f̂_LBP(x, φ) of Eq. (6) for clarity. The magnitude of the expected feature is defined as follows:

f^u_{LBPD}(x, \phi) = \| \hat{f} - \hat{m} \|,   (10)

where ‖·‖ can be any type of norm defined on vector spaces, such as the L1 norm and the L2 norm. We name the feature defined above the unsigned LBP Difference (LBPD), since its response is non-negative in accordance with the positivity property of norms. In order to gain supplementary discriminative information, we introduce a sign part and define the expected feature as follows:

f^s_{LBPD}(x, \phi) = s(\|\hat{f}\| - \|\hat{m}\|) \, \| \hat{f} - \hat{m} \|,   (11)

where s(x) is the signum function with s(x) = 1 if x ≥ 0 and s(x) = −1 otherwise. s(x) determines a binary relation on LBPs and makes the set of LBPs totally ordered. When the average bit value of one LBP f̂_1 is above that of f̂_2, the sign from f̂_1 to f̂_2 is positive, and vice versa. Take the integer LBP mean of Eq. (8) as an example: the sign of the LBPD then indicates whether one local pattern contains more '1' bits than the mean does. As the feature defined by Eq. (11) is the product of the sign part and the unsigned LBPD, it is named the signed LBP Difference. Note that the norm used by both the magnitude and the sign prevents the responses of the LBPD from being affected by the permutation of bits. The LBPD is therefore rotation invariant.
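A minimal sketch of Eqs. (10)-(11), assuming the binarised LBP vector and LBP mean are already available as NumPy arrays; the function name and signature are ours.

```python
import numpy as np

def lbpd(f_hat, m_hat, signed=True, p=1):
    """Eqs. (10)-(11): LBP Difference between a binarised LBP vector
    f_hat (Eq. (6)) and the LBP mean m_hat (Eq. (8) or (9)).
    p selects the L1 or L2 norm; the sign compares the norms of
    f_hat and m_hat, i.e. whether the pattern has more '1' bits
    than the average pattern."""
    mag = np.linalg.norm(f_hat - m_hat, ord=p)   # unsigned LBPD, Eq. (10)
    if not signed:
        return mag
    s = 1.0 if np.linalg.norm(f_hat, ord=p) >= np.linalg.norm(m_hat, ord=p) else -1.0
    return s * mag                               # signed LBPD, Eq. (11)
```

Because both the norm of the difference and the norms in the sign are invariant to a permutation of the bits, rotating the neighbourhood does not change the response, matching the rotation-invariance claim above.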


Fig. 3. Different features extracted from the 'Lena' image: (a) the intensity; (b) the LBP in the intensity channel; (c) the unsigned LBP Difference; and (d) the signed LBP Difference. The last three are scaled for display.

Algorithm 1 Algorithm for LBPD Calculation
Require: An image region I(x); the number of neighboring pixels P at a radius R for LBPs and the feature channel φ as parameters.
Ensure: f_LBPD(x, φ)
1: Calculate the feature responses φ(x) from I(x)
2: Get the vectorized LBPs f̂_LBP_{P,R}(x, φ) by Eq. (6)
3: Obtain the LBP mean by Eq. (8) or (9)
4: return f_LBPD(x, φ) by Eq. (10) or (11)

Figure 3 illustrates the 'Lena' image and its feature images of the LBP, the unsigned LBPD, and the signed LBPD. It can be observed that the three feature images differ in appearance. Moreover, the correlation between the unsigned LBPD and the intensity over all pixels in the image is 0.056, while that between the signed LBPD and the intensity is −0.097. This suggests that both the unsigned and the signed LBPDs provide information complementary to the feature channel φ (intensity in this case) in which they are calculated. In summary, as two variants of LBPs, both the unsigned and the signed LBPDs reflect how an LBP differs from the LBP mean. More importantly, both of them are numerical, so they can be directly used to calculate the CovM. Algorithm 1 summarizes the calculation of the LBPD.

IV. COVARIANCE MATRIX WITH LBPD

The good discriminative power of CovMs for characterizing regions of interest mainly derives from the multiple informative features used [25], [28]. It is reported that CovMs achieve promising results on the Brodatz dataset [25]. Only five elementary features, namely the intensity and the absolute values of the derivatives (with respect to both x and y) up to the second order, are used to form the 5 × 5 covariance matrix descriptors in that case. However, it is not realistic to expect all required information to be covered by these five features, especially on challenging datasets such as the KTH-TIPS2-a dataset [43]. It is therefore necessary to provide more discriminative features for CovMs to achieve competitive performance on challenging datasets. Like the LBP defined by Eq. (1), the LBPD can be calculated in any numerical feature channel φ. The set of LBPDs from multiple channels forms the 'LBP Difference Set' as illustrated in Figure 4. It together with the set of regular

Fig. 4. Candidate feature sets of COV-LBPD. ‘Regular Features’ can be any elementary features used for CovMs in the literature and ‘LBP Difference Set’ refers to those calculated via Algorithm 1.

features {φ} provides an extended feature bank for CovMs. In this paper, the correlations between the LBPDs and other features are combined into a unified descriptor, termed the COVariance and Local Binary Pattern Difference descriptor (COV-LBPD). The introduction of LBPDs gives freedom to experiment with appropriate features. According to the source of the features, the existing elementary features for CovMs can be basically categorized into three classes: 1. physical quantities, such as the intensity [25]; 2. responses of digital filter convolutions, such as the gradients [25]; and 3. elementary functions of the former two, such as the magnitude and the orientation [26], [27]. Obviously the LBPD provides information not covered by the existing features, as shown in Figure 3. It is thus not surprising that the combination of the LBPD and CovMs leads to a performance improvement.

V. EXPERIMENTS

We conduct experiments to evaluate the LBPD feature and the COV-LBPD descriptor on the task of texture classification. In this section, unless specified otherwise, each texture image is represented by a normalised COV-LBPD descriptor calculated by Eq. (3). In practice, there may exist correlation matrices that are not positive definite. In that case, we add a small positive number (namely 10^{-4}) to the diagonal entries of the correlation matrix. In the experiments of this section, this violation never occurred. For both the LBP and the LBPD, circularly symmetric neighboring pixels that are not located exactly at pixel positions are obtained by bilinear interpolation [16]. The parameter-free Nearest Neighbor classifier (NN) [43] is used


Fig. 5. Evaluation of components for the LBPD on Outex: (a-b) the signed LBPDs with the L1 and L2 norm; (c-d) the unsigned LBPDs with the L1 and L2 norm. The vertical and horizontal axes stand for accuracy (%) and different combinations of radius and number of neighboring points, respectively; (e) multiple feature channel analysis using accuracy (%) as the measurement. Both feature sets contain the LBPDs from three channels, where 'f1' for Set 1 and Set 2 denotes the signed LBPDs with parameters (φ = M_ri, P = 8, R = 1) and the signed LBPDs with parameters (M_ri, 16, 2) respectively; 'f2' for Set 1 and Set 2 denotes the signed LBPDs with parameters (M_ri^(2), 16, 2) and the unsigned ones with parameters (M_ri^(2), 16, 2) respectively.

unless otherwise noted, since our focus is on representation. For the COV-LBPD and CovM-based representations, the NN searches for nearest neighbors according to the affine-invariant metric of Eq. (4). For the histogram-based ones, the chi-squared distance [16] is used as the proximity measurement.
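The NN rule under the affine-invariant metric of Eq. (4) can be sketched with plain NumPy: for symmetric positive definite descriptors, the generalized eigenvalues of (M1, M2) are the (real, positive) eigenvalues of M2^{-1} M1. The function names are our assumptions, not the authors' code.

```python
import numpy as np

def affine_invariant_distance(M1, M2):
    """Eq. (4): square root of the summed squared log generalized
    eigenvalues of the SPD matrices M1 and M2, computed here as the
    eigenvalues of M2^{-1} M1."""
    lam = np.linalg.eigvals(np.linalg.solve(M2, M1)).real
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))

def nn_classify(query, gallery, labels):
    """1-NN classification of a descriptor under the metric of Eq. (4)."""
    d = [affine_invariant_distance(query, M) for M in gallery]
    return labels[int(np.argmin(d))]
```

Note the metric vanishes only when the two matrices are equal and is invariant to a joint congruence transform of both descriptors, which is why it is preferred over the Euclidean distance for CovMs.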

TABLE I
ACCURACY OF DIFFERENT CHANNELS UNDER DIFFERENT RESOLUTIONS ON THE OUTEX DATASET. THE ABBREVIATIONS s1, s2, AND s3 STAND FOR (8,1), (16,2), AND (24,3) RESPECTIVELY

A. Key Factors of LBPDs and COV-LBPD

In this subsection, we analyze the different components of the proposed LBPD and evaluate the contribution of the LBPD to the COV-LBPD descriptor. Experiments are carried out on the Outex_TC_00012 dataset [16] (abbreviated as 'Outex' hereinafter), which is widely used for rotation- and illumination-invariant texture classification. It contains 9120 images of 24 texture classes under various rotation and illumination conditions. According to the protocol [16], 20 of the 380 images per texture class are selected for training, and the remainder are used for testing. The correlations in the CorM measure how much two random variables change together. Hence the three features, namely the intensity I and the two rotation-invariant magnitudes M_ri and M_ri^(2), are provided to form the regular set supporting the evaluation. Specifically,

M_ri = \sqrt{ I'_x{}^2 + I'_y{}^2 + I'_{dl}{}^2 + I'_{al}{}^2 }  and  M_ri^(2) = \sqrt{ I''_x{}^2 + I''_y{}^2 + I''_{dl}{}^2 + I''_{al}{}^2 }.

The differential operators I' and I'' stand for the first and second derivatives respectively. Moreover, the subscripts x, y, dl, and al denote the directions of the horizontal line, the vertical line, the diagonal line y = x, and the anti-diagonal line y = −x respectively. We comprehensively assess the choices of the LBP mean and the norms used in the magnitude part and the sign part. In addition, we evaluate different parameters of the LBPD, including the feature channel φ, the number of neighbors P, and the radius R. To examine the influence of these factors, only one LBPD is selected each time. The three regular features together with the selected LBPD are used to form a 4 × 4 COV-LBPD descriptor. Accuracy is used as the performance measurement.

The performance of the LBPDs utilizing the floating mean m̂_f and the integer mean m̂_I is illustrated by solid lines and

dashed lines respectively in Figure 5(a) to (d). It can be observed that the LBPDs using m̂_f (with an average accuracy of 63.5%) generally outperform those using m̂_I (with an average accuracy of 61.5%). This is mainly because m̂_f reflects the average local structures more precisely than m̂_I: m̂_f indicates the 0-1 distributions of the bits, rather than only the majority as m̂_I does. Thereby the Hamming distance between one specific pattern f̂ and m̂_f yields a finer divergence than the one between f̂ and m̂_I. In the remaining part of this section, we focus on m̂_f.

The influence of the norm in the magnitude part can be observed by comparing Figures 5(a) and (b) in the case of the signed LBPD, as well as (c) and (d) for the unsigned LBPD. One can find no substantial difference in performance between the LBPDs with the L1 and the L2 norm. The L1 norm is preferred because of its efficiency. Moreover, it slightly outperforms the L2 norm in most cases. Comparing the results in Figures 5(a) and (c), one can conclude that whether the signed or the unsigned LBPD is better depends on the channel selected. For the LBPDs in the channels of intensity and M_ri, the signed ones are better, whereas in the channel of M_ri^(2) the unsigned LBPD is better. A similar pattern can be observed by comparing Figures 5(b) and (d).

1) Multiresolution Analysis and Multiple Feature Fusion: As illustrated in Figure 5(a) to (d), the selection of the channel φ, the number of neighbors P, and the radius R is important for performance. By altering φ, P, and R, we can realize features for different channels, local structures, and spatial resolutions.
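The rotation-invariant magnitude channels M_ri and M_ri^(2) used in these experiments can be roughly sketched as follows, under the assumption that simple central differences (np.gradient) are acceptable stand-ins for whatever derivative filters the authors used; the directional second derivatives in particular are only approximations.

```python
import numpy as np

def magnitude_channels(I):
    """Sketch of M_ri and M_ri^(2): root-sum-square of the first and
    second derivatives along the horizontal, vertical, diagonal (y = x)
    and anti-diagonal (y = -x) directions."""
    Iy, Ix = np.gradient(I)                 # first derivatives
    Idl = (Ix + Iy) / np.sqrt(2.0)          # directional derivative, y = x
    Ial = (Ix - Iy) / np.sqrt(2.0)          # directional derivative, y = -x
    M1 = np.sqrt(Ix**2 + Iy**2 + Idl**2 + Ial**2)
    # second derivatives along the same four directions (approximate)
    Ixx = np.gradient(Ix, axis=1)
    Iyy = np.gradient(Iy, axis=0)
    Idl2 = (np.gradient(Idl, axis=1) + np.gradient(Idl, axis=0)) / np.sqrt(2.0)
    Ial2 = (np.gradient(Ial, axis=1) - np.gradient(Ial, axis=0)) / np.sqrt(2.0)
    M2 = np.sqrt(Ixx**2 + Iyy**2 + Idl2**2 + Ial2**2)
    return M1, M2
```

Each channel is a non-negative magnitude image of the same size as the input, ready to be stacked with the intensity and the LBPDs for the per-pixel feature vectors of Eq. (2).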


Fig. 6. Illustration of the effectiveness of combining the LBPD and other features on the Outex dataset: (a-c) Parts of the confusion matrices of the CovM3 , LBPriu2 8,1 , and COV-LBPD4 respectively; (d) the input test image #787; and (e-g) the nearest neighbours (NN) found from the ‘known’ samples by the three descriptors respectively.

Multi-resolution analysis can be accomplished by combining the responses provided by multiple features of different (P, R). Experimental results are listed in Table I. It can be found that the accuracy of the COV-LBPDs with the signed LBPDs in I and M_ri is generally improved by using LBPDs of multiple resolutions. Moreover, both the signed and the unsigned LBPDs in M_ri^(2) are evaluated, as the single unsigned LBPD achieves better accuracy than the signed one in the channel M_ri^(2), as shown in Figure 5. From Table I, one can also observe that all COV-LBPDs with multi-resolution LBPDs in M_ri^(2) outperform the corresponding single-resolution ones, independently of the sign part.

Fusion of the LBPDs from multiple feature channels is vital for the COV-LBPD. As illustrated at the bottom of Figure 5(e), we examine two LBPD sets, each of which consists of the LBPDs on the three basic features with different parameters. More specifically, Set 1 uses the default settings of φ = {I, M_ri, M_ri^(2)}, P = 8, and R = 1, whereas Set 2 uses the parameter settings that perform best in each corresponding channel. It can be observed in Figure 5(e) that both combinations improve the accuracy. This strongly supports the feasibility of the proposed COV-LBPD.

2) Effectiveness of LBPD: To assess the effectiveness of the COV-LBPD descriptor, we compare the 3 × 3 CovM (CovM3) using the three regular rotation-invariant features, namely I, M_ri, and M_ri^(2), with the 4 × 4 COV-LBPD using the signed LBPD with the setting (I, 8, 1), denoted by COV-LBPD4. The CovM3 obtains an accuracy of only 46.0%, while the COV-LBPD4 achieves an accuracy of 75.9%, an increase of about 30 percentage points. Figures 6(a) to (c) provide the confusion matrices of the CovM3, the histogram of the single-scale, rotation-invariant and uniform LBP (LBPriu2_{8,1}), and the COV-LBPD4 respectively.
Moreover, given the test image shown in Figure 6(d), Figures 6(e) to (g) illustrate the most similar samples found by the three descriptors, respectively.

The CovM3 fails to identify the different local structures of the image #43 and the test image. As a result, the image #43 is incorrectly considered the most similar sample to the test image. The LBP_{8,1}^{riu2} cannot distinguish the global arrangement of local structures between the image #305 and the test image and thus also misclassifies the test image. In contrast, despite an apparent change in orientation between the images, the proposed COV-LBPD4 correctly finds the image with the same class label as the nearest neighbour of the test image. The combination of the LBPD and the three regular features captures not only the local patterns but also their global arrangement. Hence the COV-LBPD4 is able to achieve enhanced performance. In addition, we compare the COV-LBPD4 with the approach that regards the LBP as a numerical feature for CovMs as in [34] (denoted by CovM4), though the latter is not believed to be a reasonable combination. It is not surprising that our COV-LBPD4 also outperforms this 4 × 4 CovM4, which only obtains an accuracy of 63.11%. Though the typical LBP is capable of characterizing the appearance of a local area, the Euclidean distance embodied in Eqs. (2)-(3) fails to reflect the diversity of LBPs precisely. It is therefore difficult to obtain stable statistics for CovM4. In contrast, the proposed numerical feature is appropriate for the calculation of statistics. Consequently, the combination of the LBPD and the other regular features in the COV-LBPD improves the performance of the CovM-based approaches substantially.

3) Robustness to Noise: We test the robustness of the COV-LBPD to noise. We compare the COV-LBPD4, the CovM3, and the LBP_{8,1}^{riu2} with added white Gaussian noise. The results are illustrated in Figure 7. Here, the x axis is the signal-to-noise ratio (SNR), calculated as SNR_dB = 10 log10(var(image)/var(noise)).
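The SNR definition used above can be sketched as follows. This is a minimal illustration of the noise protocol, not code from the paper; the function names are ours:

```python
import numpy as np

def snr_db(image, noise):
    """SNR in dB as defined in the text: 10 * log10(var(image) / var(noise))."""
    return 10.0 * np.log10(np.var(image) / np.var(noise))

def add_noise_at_snr(image, target_db, rng=None):
    """Add white Gaussian noise scaled so the nominal SNR equals target_db."""
    rng = rng or np.random.default_rng(0)
    # Choose the noise variance from the target SNR: var_n = var(img) / 10^(dB/10)
    noise_var = np.var(image) / (10.0 ** (target_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_var), image.shape)
    return image + noise, noise
```

For example, at an SNR of −2 dB the noise variance is var(image) · 10^{0.2}, i.e., the signal energy is 10^{−0.2} ≈ 63% of the noise energy, matching the description of Figure 7.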
As shown in Figure 7, the accuracy of the LBP_{8,1}^{riu2} decreases substantially as the SNR is reduced, while the CovM3 is robust to abrupt noise: there is only a 2% decrease in performance even when the SNR is −2, i.e., when the energy of the signal is only 63% of that of the noise. Our



TABLE II. PERFORMANCE COMPARISON ON OUTEX

Fig. 7. Performance with white Gaussian noise on Outex.

COV-LBPD4 inherits the robustness of the CovM and the discriminative power of both the CovM and the LBPD, and thus outperforms the CovM3 and the LBP_{8,1}^{riu2} at all SNRs. The accuracy of the COV-LBPD4 at an SNR of −5 is still 49.7%, much higher than that of the CovM3 and the LBP_{8,1}^{riu2}. Note that an SNR of −5 indicates a noise-corrupted image where the energy of the signal is less than one third of that of the noise.

B. Comparative Evaluation in Rotation Invariant Texture Classification

Outex is a texture classification dataset designed specifically for evaluating rotation invariance. Hence features that are not robust to rotation changes are not meaningful on this dataset. To achieve competitive performance, we assess the rotation invariant features used in the previous subsection and those in the literature [6], [7], [44]. According to their accuracy, we fix a set of 17 features for building our COV-LBPD descriptor. More specifically, it contains: 1-4) I, M_ri, M_ri^{(2)}, and f_{n-vs-c}, namely the neighbour-versus-center filter responses [2] with a filter support (SP) of 3; 5-7) the signed LBPD with parameters (φ = I, (P, R) ∈ {(8, 1), (16, 2), (24, 3)}); 8-9) the signed LBPD with the parameter setting (M_ri, 16, 2) and the unsigned one with (M_ri^{(2)}, 16, 2); 10-11) similar to 8-9) but with the rotation invariant magnitudes calculated with an SP of 5; 12) the default LBPD² on f_b, a binary variable indicating whether the pixel intensity is above the intensity mean of the image or not; 13-14) the default LBPDs on f_{n-vs-c} with SPs of 3 and 5, respectively; 15) the default LBPD on the Laplacian filter response with an SP of 3; 16) the default LBPD on the Gaussian filter response with an SP of 7 and a sigma of 1; and 17) the Schmid filter response [44] with (σ, τ) = (2, 1).
Table II lists the results of our approach and of the state of the art on Outex, including the rotation invariant VZ-MR8 descriptor [9], the VZ-joint descriptor with an 11 × 11 patch [8], the three-scale LBP descriptor [16], the three-scale rotation invariant and uniform LBP (LBP^{riu2}) [16], the three-scale dominant LBP (DLBP) [19], the LBP variance (LBPV) [18], the three-scale rotation invariant completed LBP (CLBP^{riu2}) [17], and the multi-scale LBP histogram Fourier feature (LBP-HF) descriptor [22]. The accuracies of the other approaches are taken from the original publications. It is observed that the COV-LBPD achieves an accuracy of 96.7%, which is the best among all approaches under consideration in Table II.

² 'The default LBPD' refers to the signed LBPD with (P, R) = (8, 1).

C. Comparative Evaluation in Material Categorization

1) Data Set: We evaluate the COV-LBPD descriptor on two material categorization databases. First, the widely used KTH-TIPS database [45] contains images of 10 materials at nine scales. The images at each scale are captured under three poses and three illumination conditions, resulting in 9 images per scale and hence 81 images per material. We follow the experimental setup of Zhang et al. [10], in which 40 images of each material are randomly selected for training while the 41 remaining images are used for testing. Second, the challenging KTH-TIPS2-a dataset [43] contains four samples of eleven different materials, each at nine different scales and under twelve different lightings and poses, totalling 4608 images. We follow the setup of Caputo et al. [43]: three of the four samples of each material are used for training, and the images of the remaining sample are used for testing. To obtain credible results, the random experiments are carried out one hundred times and the average accuracy is reported as the final result for each dataset.

2) Feature Set: Direction is a very important attribute for distinguishing several of the classes on KTH-TIPS and KTH-TIPS2-a.
Thus we empirically evaluate combinations of hundreds of elementary features that have been successfully applied in material categorization [9], [10], [25], and choose a set of 21 features as follows: 1-3) the RGB color channels; 4-9) the absolute values of the 1st derivatives in the x and y directions with filter supports of 3, 5, and 7, respectively; 10-15) the absolute values of the 2nd derivatives in the x and y directions with filter supports of 3, 5, and 7, respectively; 16-18) the magnitudes of the 1st derivatives with filter supports of 3, 5, and 7, respectively; 19) the Laplacian filter responses with a filter support of 3; 20) f_b as in the previous subsection; and 21) the signed LBPD with P = 8 and R = 1 calculated in the intensity channel.

3) Comparative Results: Different texture classification approaches are used for the comparative study. Table III lists the accuracies of the state-of-the-art approaches. Besides VZ-MR8 [9], VZ-joint [8], LBP^{u2} [16], LBP^{riu2} [16], and CLBP^{riu2} [17], which were introduced in the experiments on Outex, we also compare our approach with the three-scale Weber local descriptor (WLD) [2], the three-scale basic image features-columns histogram (BIF) [3], the three-scale uniform local ternary pattern (LTP^{u2}) and the three-scale rotation invariant and uniform local ternary pattern (LTP^{riu2}) [23],


TABLE III. PERFORMANCE ON KTH-TIPS AND KTH-TIPS2-a


TABLE IV. PERFORMANCE COMPARISON ON 2D-HeLa

TABLE V. PERFORMANCE COMPARISON ON PAP-SMEAR 2005

the pyramid local energy pattern (LEP) [36], and the covariance matrix of the five elementary features (CovM5) as used in [25]. We carried out the experiments with CovM5 [25] ourselves, while the accuracies of the other approaches are taken either from the original publications or from the comparative evaluations [10], [36]. To the best of our knowledge, the current best result on the KTH-TIPS dataset is 99.29%, achieved by the approach of sorted random projections (SRP) [46]. However, this result is achieved only with a support vector machine; when the NN classifier is used, as in this paper, the accuracy of SRP is 97.71%. It is observed that the COV-LBPD achieves an accuracy of 98.00%, which outperforms most of the approaches under consideration and is within a marginal distance of the best one using the NN classifier (98.50%). We also evaluate our approach on the KTH-TIPS2-a database. It can be seen that the COV-LBPD achieves the best accuracy among the methods compared in Table III, 74.86%. In addition, compared with the conventional covariance matrix CovM5 [25], the COV-LBPD descriptor obtains a substantial increase in accuracy, by about 17%. As these two datasets, especially KTH-TIPS2-a, are challenging, the promising performance of the COV-LBPD demonstrates the superiority of our method. It is worth noting that no learning stage is required to build the proposed COV-LBPD descriptor. In contrast, most of the state-of-the-art approaches, such as LEP and SRP, rely on an extra learning stage to form the description from the training images.

D. Comparative Evaluation in Medical Image Analysis

To show that the COV-LBPD is a general appearance-based image representation, we apply it to medical image analysis, a task highly relevant to texture classification. The 2D-HeLa [47] and Pap-smear [47] databases, which are benchmarks for biomedical diagnosis, are used for evaluation.
For a fair comparison, we utilize contrast-limited adaptive histogram equalization and adopt the radial basis function kernel SVM as used in [48] and [49]. The parameters C and γ are determined by grid search. The feature set for the COV-LBPD is the same as that in Subsection V-C. All the

experimental results are obtained using 5-fold cross validation: for each dataset, the images of each class are randomly split into five parts, with four used for training and the remaining one for testing.

1) Protein Cellular Classification on the 2D-HeLa Dataset: The aim of 2D-HeLa is to automatically identify sub-cellular organelles in fluorescence microscope images. It contains ten classes of HeLa cells stained with various organelle-specific fluorescent dyes. Each class contains between 70 and 98 images, resulting in 862 images in total. All images in 2D-HeLa have a resolution of 382 × 382. As the movements of HeLa cells are usually non-rigid, there are large variations in the appearance of the images. The proposed approach is compared with the state-of-the-art approaches on this dataset, including the local phase quantization (LPQ) descriptor [48], the EQP descriptor [48], the dominant LBP (DLBP) and dominant LTP (DLTP) approaches [49], the multi-resolution decomposition (MRD) approach [50], and the discriminative completed LBP (disCLBP) [51]. We also evaluate the original CovM5 [25]. The average accuracy over the 5-fold cross validation is used for evaluation. The results of the different approaches are listed in Table IV. It can be observed that the proposed approach achieves the highest classification accuracy, 95.8%, among all approaches under comparison.

2) Abnormal Smear Cell Verification on the Pap-Smear Dataset: The Pap-smear 2005 dataset is designed for automatically classifying smear cells into two categories: normal and abnormal. The dataset contains 917 images belonging to seven different classes, which are further categorized into the two super-classes (that is, normal and abnormal). Each cell image was manually labeled by two experts, with difficult samples inspected by a specialist. The performance is evaluated by the area under the receiver operating characteristic curve (AUC).
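The randomized per-class 5-fold protocol described at the start of this subsection can be sketched as follows. The helper name and the use of NumPy are ours; this only illustrates the split, not the classifiers:

```python
import numpy as np

def per_class_5fold(labels, seed=0):
    """Split sample indices into 5 folds, shuffling within each class.

    Each class's images are randomly divided into five parts; in each
    cross-validation round, four parts train and the remaining one tests.
    Returns a list of 5 index arrays (the test fold of each round).
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    folds = [[] for _ in range(5)]
    for c in np.unique(labels):
        # Shuffle this class's indices, then deal them into the 5 folds
        idx = rng.permutation(np.flatnonzero(labels == c))
        for f, part in enumerate(np.array_split(idx, 5)):
            folds[f].extend(part.tolist())
    return [np.sort(np.array(f)) for f in folds]
```

Splitting within each class keeps the class proportions roughly equal across folds, which matters here because the 2D-HeLa classes have unequal sizes (70 to 98 images).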
Table V reports the average AUC over the 5-fold cross validation achieved by the different approaches. As observed, the proposed approach achieves an average AUC of 94.6%, higher than most of the approaches under comparison. The accuracy achieved by disCLBP [51] is higher than that of the COV-LBPD. However, disCLBP uses an extra



learning stage to generate the most discriminative local patterns, while the proposed COV-LBPD is free of such a learning stage.

VI. DISCUSSIONS

A. Characteristics of LBPD

The proposed LBPD feature encodes the extent to which one LBP deviates from the average local pattern, i.e., the LBP mean within a certain image region.

The LBPD possesses good robustness. The LBPD is defined using the signs of comparisons between neighbouring pixels, as in Eq. (6), rather than the intensities of pixels. It is therefore robust to monotonic gray-scale changes caused, for example, by illumination variations. In addition, the permutation of the bits is obviously affected by rotation of the image. In the case of the LBP, weighting the bits by the predetermined powers of two in Eq. (1) results in an index of codes that is totally different from the one before rotation. Hence additional effort is required to achieve rotation invariance, as in the design of the rotation invariant LBP and the rotation invariant and uniform one [16]. In contrast, the proposed LBPD is inherently rotation invariant, since the norm used in Eqs. (10)-(11) ensures that the responses are independent of the permutation of the bits.

The LBPD is computationally efficient. Only arithmetic operations in vector spaces are required for the LBP mean and the LBPD. Specifically, given the LBPs of N pixels within an image region by Eq. (1), the time complexity of computing all LBPDs within the region is O(PN), i.e., proportional to N with a factor of P, the number of neighbouring pixels.

The main difference between the proposed LBPD and the conventional LBP lies in the fact that the LBPD reflects the diversity of the local co-occurrences, rather than representing them directly as the LBP does. Moreover, the LBPD is numerically valued, whereas the LBP is an index of discrete patterns. Obviously, the numerical attribute makes the LBPD attractive.
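As we read the description above, the LBPD of a pixel is the norm of the deviation of its vectorized LBP from the region's LBP mean. A minimal sketch for P = 8, R = 1, using unsigned {0, 1} comparisons and no interpolation (the helper names are ours, and we approximate the paper's Eqs. (6), (8), and (10)-(11) rather than reproduce them):

```python
import numpy as np

def lbp_vectors(image, P=8):
    """Vectorized LBPs s(g_p - g_c) for each interior pixel (P = 8, R = 1)."""
    H, W = image.shape
    # The 8 neighbours on the unit ring, exact for R = 1 (no interpolation)
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]
    center = image[1:-1, 1:-1]
    vecs = np.stack([image[1 + dy:H - 1 + dy, 1 + dx:W - 1 + dx] >= center
                     for dy, dx in offs], axis=-1)
    return vecs.reshape(-1, P).astype(float)

def lbpd(vecs):
    """LBPD: distance of each LBP vector from the region's LBP mean."""
    mu = vecs.mean(axis=0)                  # the LBP mean (vector average)
    return np.linalg.norm(vecs - mu, axis=1)
```

Because the Euclidean norm is invariant to any common permutation of the bit positions, rotating the neighbourhood order leaves the response unchanged, which is the rotation-invariance argument; the mean and the norms together cost O(PN) as stated. A signed variant would encode the comparisons as {−1, +1} instead of {0, 1}.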
It is not only appropriate for the CovM, as shown in this paper, but also theoretically suitable for any other statistical image descriptor, such as the region moments [52] or histograms. In fact, our LBPD is a general framework which covers most of the existing variants of the LBP. As long as the features are organized in the same way as the vectorized LBPs in Eq. (6), they can be substituted into Eqs. (8) or (9) to obtain the LBP mean and then calculate the LBPD. More specifically, this principle directly holds for those variants of the LBP which vary the quantization level of the pixel comparisons, the local neighbourhood, or the number of pixel pairs, such as the multi-resolution LBP [16] and the LTP [23]. For the LBP-like features using particular mappings, such as the uniform LBP [16], the rotation invariant LBP [16], and the dominant LBP [19], our LBPD can be computed on the mapped features accordingly. For other variants which concatenate the LBP histograms of several feature channels, e.g., the CLBP [17], the LBPD can be computed in the corresponding feature channels separately, and multiple-channel fusion can be accomplished by Eqs. (2)-(3).

Section V-A.1 provides promising examples of calculating LBPDs in the cases of multi-resolution and multiple-channel fusion. Nevertheless, considering that the CovM provides an efficient way of integrating multiple cues, the optional elementary features for the COV-LBPD do not have to be limited to those used in the variants of the LBP.

B. Characteristics of COV-LBPD

The proposed COV-LBPD descriptor has high discriminative power through blending multiple features. Moreover, it has good robustness, which derives from the following aspects: firstly, it is robust to abrupt noise when the number of pixels in an image region is large enough to form stable statistics; secondly, the robustness can be enhanced if all selected elementary features are consistently robust (for instance, when only features insensitive to rotation are selected, as in Section V-A, the resulting COV-LBPD descriptor is robust to rotation changes); thirdly, the normalization by Eq. (3) usually improves robustness by eliminating the effects of the different variances of the features.

In addition, the COV-LBPD is a compact descriptor. The low dimensionality comes from the correlation matrices. Specifically, the dimension of the COV-LBPD by Eq. (3) is d × (d − 1)/2. In the case of the COV-LBPD using the 21 features as in the experiments, the intrinsic dimension of the descriptor is only 210. Low dimensionality generally brings two benefits: low storage requirements and reduced computational complexity. In contrast, commonly used histogram-based descriptors have much higher dimensions, such as 857 for the three-scale uniform LBP descriptor [16], 2200 for CLBP^{riu2} [17], 2880 for the three-scale WLD [2], and 1296 for the multi-scale BIF-columns histogram [3]. Although the three-scale rotation invariant and uniform LBP descriptor LBP^{riu2} [16] has a lower dimension of 54, its performance is much inferior to ours.
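The compactness argument can be made concrete with a small sketch. We assume here that the Eq. (3) normalization turns the covariance into a correlation matrix (unit diagonal), whose strict upper triangle then gives the d(d − 1)/2 distinct entries; the function name is ours:

```python
import numpy as np

def cov_lbpd_vector(F):
    """Compact descriptor from a d x N matrix of per-pixel feature vectors.

    Covariance -> correlation (unit diagonal, so the diagonal carries no
    information) -> the d*(d-1)/2 strict upper-triangular entries.
    With d = 21 features this yields a 210-dimensional descriptor.
    """
    C = np.cov(F)                       # d x d covariance matrix
    s = np.sqrt(np.diag(C))
    R = C / np.outer(s, s)              # correlation: C_ij / (s_i * s_j)
    iu = np.triu_indices_from(R, k=1)   # indices above the diagonal
    return R[iu]
```

Dividing out the per-feature standard deviations is what removes the influence of the differing variances of the features mentioned above, and it bounds every entry in [−1, 1].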
The dimension of the COV-LBPD can be reduced further if feature reduction algorithms as in [46] or feature selection approaches as in [51] are used.

Finally, the COV-LBPD is also desirable from the viewpoint of computational cost. Firstly, leveraging the integral representation [25], [28] leads to efficient computation for rectangular image regions (e.g., the texture images in this paper). Secondly, fast approximations [53], [54] of the Riemannian metric can be used to speed up the computation of the distance between two COV-LBPDs. In practice, we evaluated the running time in the experiments on the KTH-TIPS2-a dataset, using only the regular calculation and none of the aforementioned fast computation techniques. For the 4608 texture images, the computation of the COV-LBPD descriptors based on the 21 selected elementary features takes about 420 seconds, corresponding to 11 frames per second. Moreover, the running time for the roughly 10 million comparisons between any two of the COV-LBPD descriptors is about 2,200 seconds, corresponding to 0.2 milliseconds per comparison. As all programs are implemented in Matlab v7.12 on a common laptop with an Intel i7 CPU, the efficiency is expected to improve further with a C-based implementation.
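For reference, the exact (non-approximated) affine-invariant Riemannian distance is one common choice of metric for comparing symmetric positive-definite covariance descriptors; we sketch it below under that assumption, without claiming it is the paper's exact formulation:

```python
import numpy as np

def spd_distance(A, B):
    """Affine-invariant Riemannian distance between SPD matrices.

    d(A, B) = sqrt(sum_i ln^2 lambda_i), where lambda_i are the
    generalized eigenvalues of (A, B), computed here by whitening B
    with its Cholesky factor.
    """
    L = np.linalg.cholesky(B)
    Li = np.linalg.inv(L)
    # Generalized eigenvalues of (A, B) = eigenvalues of L^-1 A L^-T
    lam = np.linalg.eigvalsh(Li @ A @ Li.T)
    return np.sqrt(np.sum(np.log(lam) ** 2))
```

The distance is zero exactly when A = B, and the logarithm makes it invariant to a common invertible linear change of the feature basis, which is why this family of metrics is preferred over the plain Euclidean distance between matrices.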


The proposed approach is therefore appropriate for real-time applications.

VII. CONCLUSION

This paper addresses the 'non-numerical' problem of the LBP by proposing a numerical variant, i.e., the local binary pattern difference (LBPD). To obtain enhanced performance, we propose the COV-LBPD descriptor, which combines the LBPD with other features within a covariance matrix, and apply it to the task of texture classification. Experimental results show that the COV-LBPD achieves highly competitive results on five publicly available datasets. In future work we plan to apply the COV-LBPD to other challenging computer vision applications, such as object detection and object tracking, for which the traditional LBP is reported to be less suitable [24].

REFERENCES

[1] F. Bianconi, A. Fernández, E. González, D. Caride, and A. Calviño, "Rotation-invariant colour texture classification through multilayer CCR," Pattern Recognit. Lett., vol. 30, no. 8, pp. 765–773, 2009. [2] J. Chen et al., "WLD: A robust local image descriptor," IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 9, pp. 1705–1720, Sep. 2010. [3] M. Crosier and L. D. Griffin, "Using basic image features for texture classification," Int. J. Comput. Vis., vol. 88, no. 3, pp. 447–460, 2010. [4] R. Haralick, "Statistical and structural approaches to texture," Proc. IEEE, vol. 67, no. 5, pp. 786–804, May 1979. [5] H. Ji, X. Yang, H. Ling, and Y. Xu, "Wavelet domain multifractal analysis for static and dynamic texture classification," IEEE Trans. Image Process., vol. 22, no. 1, pp. 286–299, Jan. 2013. [6] S. Lazebnik, C. Schmid, and J. Ponce, "A sparse texture representation using local affine regions," IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 8, pp. 1265–1278, Aug. 2005. [7] T. Leung, "Representing and recognizing the visual appearance of materials using three-dimensional textons," Int. J. Comput. Vis., vol. 43, no. 1, pp. 29–44, 2001. [8] M. Varma and A.
Zisserman, “A statistical approach to material classification using image patch exemplars,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 11, pp. 2032–2047, Nov. 2009. [9] M. Varma and A. Zisserman, “A statistical approach to texture classification from single images,” Int. J. Comput. Vis., vol. 62, nos. 1–2, pp. 61–81, 2005. [10] J. Zhang, S. Lazebnik, and C. Schmid, “Local features and kernels for classification of texture and object categories: A comprehensive study,” Int. J. Comput. Vis., vol. 73, no. 2, pp. 213–238, 2007. [11] D. Lowe, “Distinctive image features from scale-invariant keypoints,” Int. J. Comput. Vis., vol. 60, no. 2, pp. 91–110, 2004. [12] G. Liu, Z. Lin, and Y. Yu, “Radon representation-based feature descriptor for texture classification,” IEEE Trans. Image Process., vol. 18, no. 5, pp. 921–928, May 2009. [13] L. Li, C. Tong, and S. Choy, “Texture classification using refined histogram,” IEEE Trans. Image Process., vol. 19, no. 5, pp. 1371–1378, May 2010. [14] H. Lategahn, S. Gross, T. Stehle, and T. Aach, “Texture classification by modeling joint distributions of local patterns with Gaussian mixtures,” IEEE Trans. Image Process., vol. 19, no. 6, pp. 1548–1557, Jun. 2010. [15] U. Kandaswamy, S. Schuckers, and D. Adjeroh, “Comparison of texture analysis schemes under nonideal conditions,” IEEE Trans. Image Process., vol. 20, no. 8, pp. 2260–2275, Aug. 2011. [16] T. Ojala, M. Pietikäinen, and T. Mäenpää, “Multiresolution gray-scale and rotation invariant texture classification with local binary patterns,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 7, pp. 971–987, Jul. 2002. [17] Z. Guo, L. Zhang, and D. Zhang, “A completed modeling of local binary pattern operator for texture classification,” IEEE Trans. Image Process., vol. 19, no. 6, pp. 1657–1663, Jun. 2010. [18] Z. Guo, L. Zhang, and D. Zhang, “Rotation invariant texture classification using LBP variance (LBPV) with global matching,” Pattern Recognit., vol. 43, no. 3, pp. 
706–719, 2010.


[19] S. Liao, M. Law, and A. Chung, “Dominant local binary patterns for texture classification,” IEEE Trans. Image Process., vol. 18, no. 5, pp. 1107–1118, May 2009. [20] G. Zhao and M. Pietikäinen, “Dynamic texture recognition using local binary patterns with an application to facial expressions,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 6, pp. 915–928, Jun. 2007. [21] G. Zhao, M. Barnard, and M. Pietikäinen, “Lipreading with local spatiotemporal descriptors,” IEEE Trans. Multimedia, vol. 11, no. 7, pp. 1254–1265, Nov. 2009. [22] G. Zhao, T. Ahonen, J. Matas, and M. Pietikäinen, “Rotation-invariant image and video description with local binary pattern features,” IEEE Trans. Image Process., vol. 21, no. 4, pp. 1465–1477, Apr. 2012. [23] X. Tan and B. Triggs, “Enhanced local texture feature sets for face recognition under difficult lighting conditions,” IEEE Trans. Image Process., vol. 19, no. 6, pp. 1635–1650, Jun. 2010. [24] X. Wang, T. Han, and S. Yan, “An HOG-LBP human detector with partial occlusion handling,” in Proc. IEEE 12th ICCV, Oct. 2009, pp. 32–39. [25] O. Tuzel, F. Porikli, and P. Meer, “Region covariance: A fast descriptor for detection and classification,” in Proc. Eur. Conf. Comput. Vis., 2006, pp. 589–600. [26] B. Wu and R. Nevatia, “Optimizing discrimination-efficiency tradeoff in integrating heterogeneous local features for object detection,” in Proc. IEEE Conf. CVPR, Jun. 2008, pp. 1–8. [27] O. Tuzel, F. Porikli, and P. Meer, “Learning on lie groups for invariant detection and tracking,” in Proc. IEEE Conf. CVPR, Jun. 2008, pp. 1–8. [28] O. Tuzel, F. Porikli, and P. Meer, “Pedestrian detection via classification on Riemannian manifolds,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, no. 10, pp. 1713–1727, Oct. 2008. [29] V. Arsigny, P. Fillard, X. Pennec, and N. Ayache, “Geometric means in a novel vector space structure on symmetric positive-definite matrices,” SIAM J. Matrix Anal. Appl., vol. 29, no. 1, pp. 328–347, 2007. 
[30] Y. Lui, “Advances in matrix manifolds for computer vision,” Image Vis. Comput., vol. 30, nos. 6–7, pp. 380–388, 2012. [31] X. Pennec, P. Fillard, and N. Ayache, “A Riemannian framework for tensor computing,” Int. J. Comput. Vis., vol. 66, no. 1, pp. 41–66, 2006. [32] S. Guo and Q. Ruan, “Facial expression recognition using local binary covariance matrices,” in Proc. 4th IET ICWMMN, Nov. 2011, pp. 237–242. [33] A. Romero, M. Gouiffès, and L. Lacassagne, “Enhanced local binary covariance matrices (ELBCM) for texture analysis and object tracking,” in Proc. Int. Conf. Comput. Vis./Comput. Graph. Collaborat. Tech. Appl., 2013, p. 10. [34] Y. Zhang and S. Li, “Gabor-LBP based region covariance descriptor for person re-identification,” in Proc. ICIG, Aug. 2011, pp. 368–371. [35] X. Hong, G. Zhao, M. Pietikäinen, and X. Chen, “Combining local and global correlation for texture description,” in Proc. 21st ICPR, Nov. 2012, pp. 2756–2759. [36] J. Zhang, J. Liang, and H. Zhao, “Local energy pattern for texture classification using self-adaptive quantization thresholds,” IEEE Trans. Image Process., vol. 22, no. 1, pp. 31–42, Jan. 2013. [37] C. Kang, S. Liao, S. Xiang, and C. Pan, “Kernel sparse representation with local patterns for face recognition,” in Proc. 18th IEEE ICIP, Sep. 2011, pp. 3009–3012. [38] S. Liao, Z. Lei, S. Li, X. Yuan, and R. He, “Structured ordinal features for appearance-based object representation,” in Analysis and Modeling of Faces and Gestures (Lecture Notes in Computer Science). Berlin, Germany: Springer-Verlag, 2007, pp. 183–192. [39] H. Yang and Y. Wang, “A LBP-based face recognition method with Hamming distance constraint,” in Proc. 4th ICIG, Aug. 2007, pp. 645–649. [40] B. Yao, H. Ai, and S. Lao, “Matching texture units for face recognition,” in Proc. 15th IEEE ICIP, Oct. 2008, pp. 1920–1923. [41] S. Zhao and Y. Gao, “Establishing point correspondence using multidirectional binary pattern for face recognition,” in Proc. 19th ICPR, Dec. 
2008, pp. 1–4. [42] H. Karcher, “Riemannian center of mass and mollifier smoothing,” Commun. Pure Appl. Math., vol. 30, no. 5, pp. 509–541, 1977. [43] B. Caputo, E. Hayman, and P. Mallikarjuna, “Class-specific material categorisation,” in Proc. 10th IEEE ICCV, Oct. 2005, pp. 1597–1604. [44] C. Schmid, “Constructing models for content-based image retrieval,” in Proc. IEEE Conf. CVPR, Dec. 2001, pp. II-39–II-45. [45] B. Caputo, E. Hayman, M. Fritz, and J.-O. Eklundh, “Classifying materials in the real world,” Image Vis. Comput., vol. 28, no. 1, pp. 150–163, 2010.


[46] L. Liu, P. Fieguth, D. Clausi, and G. Kuang, “Sorted random projections for robust rotation-invariant texture classification,” Pattern Recognit., vol. 45, no. 6, pp. 2405–2418, Jun. 2012. [47] M. Boland and R. Murphy, “A neural network classifier capable of recognizing the patterns of all major subcellular structures in fluorescence microscope images of HeLa cells,” Bioinformatics, vol. 17, no. 12, pp. 1213–1223, 2001. [48] L. Nanni, A. Lumini, and S. Brahnam, “Local binary patterns variants as texture descriptors for medical image analysis,” Artif. Intell. Med., vol. 49, no. 2, pp. 117–125, 2010. [49] L. Nanni, S. Brahnam, and A. Lumini, “Selecting the best performing rotation invariant patterns in local binary/ternary patterns,” in Proc. Int. Conf. IP, Comput. Vis., Pattern Recognit., 2010. [50] A. Chebira et al., “A multiresolution approach to automated classification of protein subcellular location images,” Bioinformatics, vol. 8, no. 1, p. 210, 2007. [51] Y. Guo, G. Zhao, and M. Pietikäinen, “Discriminative features for texture description,” Pattern Recognit., vol. 45, no. 10, pp. 3834–3843, 2012. [52] G. Doretto and Y. Yao, “Region moments: Fast invariant descriptors for detecting small image structures,” in Proc. IEEE Conf. CVPR, Jun. 2010, pp. 3019–3026. [53] X. Hong, H. Chang, S. Shan, X. Chen, and W. Gao, “Sigma set: A small second order statistical region descriptor,” in Proc. IEEE Conf. CVPR, Jun. 2009, pp. 1802–1809. [54] S. Kluckner, T. Mauthner, P. Roth, and H. Bischof, “Semantic classification in aerial imagery by integrating appearance and height information,” in Proc. 9th ACCV, 2009, pp. 477–488.

Xiaopeng Hong (M’13) received the B.Eng., M.Eng., and Ph.D. degrees in computer application from the Harbin Institute of Technology, Harbin, China, in 2004, 2007, and 2010, respectively. He has been a Scientist Researcher with the Center for Machine Vision Research, University of Oulu, since 2011. He has authored and co-authored more than 10 peer-reviewed articles in journals and conferences, and has served as a reviewer for several journals and conferences. His current research interests include pose and gaze estimation, texture classification, object detection and tracking, and visual speech recognition.

Guoying Zhao (SM’12) received the Ph.D. degree in computer science from the Chinese Academy of Sciences, Beijing, China, in 2005. She is currently an Associate Professor with the Center for Machine Vision Research, University of Oulu, Finland, where she has been a Researcher since 2005. In 2011, she was an Academy Research Fellow. She has authored and edited three books and two special issues, and has authored and co-authored more than 100 papers in journals and conferences. She is a Co-Organizer of ECCV 2014 Workshop on Spontaneous Facial Behavior Analysis: Long Term Continuous Analysis of Facial Expressions and Microexpressions and INTERSPEECH 2014 Special Session on Visual Speech Decoding. She is a Program Committee Member for many conferences. Her current research interests include gait analysis, dynamic-texture recognition, facial-expression recognition, human motion analysis, and person identification.


Matti Pietikäinen received the D.Sc. (Tech.) degree from the University of Oulu, Finland. He is currently a Professor and Scientific Director of Infotech Oulu, and the Director of the Center for Machine Vision Research at the University of Oulu. From 1980 to 1981 and from 1984 to 1985, he visited the Computer Vision Laboratory at the University of Maryland. He has made pioneering, highly cited contributions, e.g., to local binary pattern methodology, texture-based image and video analysis, and facial image analysis. He has authored about 300 refereed papers in international journals, books, and conferences. He was an Associate Editor of the IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE and Pattern Recognition journals, and currently serves as an Associate Editor of the Image and Vision Computing journal. He was the President of the Pattern Recognition Society of Finland from 1989 to 1992. From 1989 to 2007, he served as a Governing Board Member of the International Association for Pattern Recognition (IAPR), and became a Founding Fellow of the IAPR in 1994.

Xilin Chen received the B.S., M.S., and Ph.D. degrees in computer science from the Harbin Institute of Technology, Harbin, China, in 1988, 1991, and 1994, respectively, where he was a Professor from 1999 to 2005. He was a Visiting Scholar with Carnegie Mellon University, Pittsburgh, PA, USA, from 2001 to 2004. He has been a Professor with the Institute of Computing Technology, Chinese Academy of Sciences (CAS), since 2004. He is the Director of the Key Laboratory of Intelligent Information Processing, CAS. He has authored one book and over 200 papers in refereed journals and proceedings in the areas of computer vision, pattern recognition, image processing, and multimodal interfaces. He is an Associate Editor of the IEEE TRANSACTIONS ON IMAGE PROCESSING, a Leading Editor of the Journal of Computer Science and Technology, and an Associate Editor-in-Chief of the Chinese Journal of Computers. He served as an Organizing Committee/Program Committee Member for more than 40 conferences. He was a recipient of several awards, including China's State Scientific and Technological Progress Award in 2000, 2003, 2005, and 2012 for his research work. He is a fellow of the China Computer Federation.
