IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 23, NO. 4, APRIL 2014


Progressive Image Denoising Through Hybrid Graph Laplacian Regularization: A Unified Framework

Xianming Liu, Member, IEEE, Deming Zhai, Member, IEEE, Debin Zhao, Member, IEEE, Guangtao Zhai, Member, IEEE, and Wen Gao, Fellow, IEEE

Abstract— Recovering images from corrupted observations is necessary for many real-world applications. In this paper, we propose a unified framework to perform progressive image recovery based on hybrid graph Laplacian regularized regression. We first construct a multiscale representation of the target image by the Laplacian pyramid, then progressively recover the degraded image in scale space from coarse to fine, so that sharp edges and texture can eventually be recovered. On one hand, within each scale, a graph Laplacian regularization model represented by an implicit kernel is learned, which simultaneously minimizes the least squares error on the measured samples and preserves the geometrical structure of the image data space. In this procedure, the intrinsic manifold structure is explicitly considered using both measured and unmeasured samples, and the non-local self-similarity property is utilized as a fruitful resource for abstracting a priori knowledge of the images. On the other hand, between two successive scales, the proposed model is extended to a projected high-dimensional feature space through explicit kernel mapping to describe the inter-scale correlation, in which the local structure regularity is learned and propagated from coarser to finer scales. In this way, the proposed algorithm gradually recovers more and more image details and edges, which could not be recovered at the previous scale. We test our algorithm on one typical image recovery task: impulse noise removal. Experimental results on benchmark test images demonstrate that the proposed method achieves better performance than state-of-the-art algorithms.

Index Terms— Image denoising, graph Laplacian, kernel theory, local smoothness, non-local self-similarity.

I. INTRODUCTION

THE problem of recovering patterns and structures in images from corrupted observations is encountered in many engineering and science applications, ranging from computer vision and consumer electronics to medical imaging. In

Manuscript received May 6, 2013; revised October 1, 2013 and November 18, 2013; accepted January 21, 2014. Date of publication January 29, 2014; date of current version February 18, 2014. This work was supported by the National Science Foundation of China under Grants 61300110, 61272386, and 61371146. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Damon M. Chandler. (Corresponding author: D. Zhai.) X. Liu, D. Zhai, and D. Zhao are with the School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China (e-mail: [email protected]; [email protected]; [email protected]). G. Zhai is with the Institute of Image Communication and Information Processing, Shanghai Jiao Tong University, Shanghai 200240, China (e-mail: [email protected]). W. Gao is with the School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China, and also with the National Engineering Laboratory for Video Technology, and Key Laboratory of Machine Perception, School of Electrical Engineering and Computer Science, Peking University, Beijing 100871, China (e-mail: [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TIP.2014.2303638

many practical image processing problems, observed images often contain noise that should be removed beforehand, both to improve visual quality and to increase the reliability of subsequent image analysis tasks. Images may be contaminated by various types of noise. Among them, impulse noise is one of the most frequently encountered; it may be introduced into images during acquisition and transmission, for example by malfunctioning pixels in camera sensors, faulty memory locations in hardware, or transmission over a noisy channel [1]. In this paper, we focus on the task of impulse noise removal. Removing impulse noise from images is a challenging image processing problem, because edges, which can also be modeled as abrupt intensity jumps along a scan line, are highly salient features for visual attention. Therefore, besides removing impulse noise, another important requirement for an image denoising procedure is that it should preserve important image structures, such as edges and major texture features. A vast variety of impulse noise removal methods are available in the literature, touching different fields of signal processing, mathematics, and statistics. From a signal processing perspective, impulse noise removal poses a fundamental challenge for conventional linear methods. They typically achieve noise removal by low-pass filtering, i.e., by removing the high-frequency components of images. This is effective for smooth regions in images, but for texture and detail regions, low-pass filtering typically introduces large, spurious oscillations near edges, known as the Gibbs phenomenon [2], [3]. Accordingly, nonlinear filtering techniques are invoked to achieve effective performance.
One of the most popular and robust families of nonlinear filters is the so-called decision-based filters, which first employ an impulse noise detector to determine which pixels should be filtered, and then replace them using the median filter or its variants while leaving all other pixels unchanged. Representative methods include the adaptive median filter (AMF) [4] and the adaptive center-weighted median filter (ACWMF) [5]. Besides, many successful frameworks for impulse noise removal can be derived from the energy method. In this framework, image denoising is considered a variational problem in which a restored image is computed by minimizing some energy function. Typically, such functions consist of a fidelity term, such as a norm of the difference between the recovered image and the noisy image, and a regularization term which penalizes high-frequency noise. For example, Chan et al. [6] propose a powerful two-stage scheme, in which noise candidates are selectively restored using an objective function with an

1057-7149 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.


Fig. 1. Intra-scale and inter-scale correlation.

ℓ1 data-fidelity term and an edge-preserving regularization term. Under a similar scheme, Cai et al. [7] propose an enhanced algorithm for deblurring and denoising, achieving excellent objective and subjective performance. Different from Chan's and Cai's work, Li et al. [8] formulate the problem with a new variational functional, in which a content-dependent fidelity term combines the strengths of fidelity terms measured by the ℓ1 and ℓ2 norms, and the regularizer is formed by the ℓ1 norm of the tight framelet coefficients of the underlying image. From a statistical perspective, recovering images from degraded observations is inherently an ill-posed inverse problem. It can often be formulated as an energy minimization problem in which either the optimal or the most probable configuration is the goal. The performance of an image recovery algorithm largely depends on how well it employs regularization conditions, or priors, when numerically solving the problem, because useful prior statistical knowledge can regularize the estimated pixels. Therefore, image modeling lies at the core of image denoising. One common prior assumption for natural images is intensity consistency, which means: (1) nearby pixels are likely to have the same or similar intensity values; and (2) pixels on the same structure are likely to have the same or similar intensity values. The first assumption says that images are locally smooth, and the second says that images have the property of non-local self-similarity. Accordingly, the choice of statistical models that thoroughly exploit these two priors directly determines the performance of image recovery algorithms. Another important characteristic of natural images is that they are composed of structures at different scales. Through multi-scale decomposition, the structures of images at different scales become better exposed, and hence more easily predicted.
At the same time, the availability of multi-scale structures can significantly reduce the dimensionality of the problem and hence make the ill-posed problem better posed [9], [10]. Early heuristic observations about the local smoothness of the image intensity field have been quantified by several linear parametric models, such as the piecewise autoregressive (PAR) image model [11], [12]. Moreover, the study of natural image statistics reveals that the second-order statistics of natural images tend to be invariant across different scales, as illustrated in Fig. 1 (denoted by inter-scale correlation). Those

scale-invariant features are shown to be crucial for human visual perception [13], [14]. This observation inspires us to learn and propagate statistical features across different scales to preserve the local smoothness of images. On the other hand, the idea of exploiting the non-local self-similarity of images has attracted increasing attention in the field of image processing [15], [16]. Referring to Fig. 1 (denoted by intra-scale correlation), non-local self-similarity is based on the observation that image patches tend to repeat themselves across the whole image plane, which in fact reflects the intra-scale correlation. All these findings tell us that local/non-local redundancy and intra-/inter-scale correlation can be thought of as two sides of the same coin. The multiscale framework therefore provides a natural way to efficiently combine the principles of local smoothness and non-local similarity for image recovery. Moreover, recent progress in semi-supervised learning gives us additional inspiration for addressing the problem of image recovery. Semi-supervised learning is motivated by considerable interest in the problem of learning from both labeled (measured) and unlabeled (unmeasured) points [17]. Specifically, geometry-based semi-supervised learning methods show that natural images do not fill up the ambient Euclidean space; rather, they reside on or close to an underlying submanifold [18], [19]. In this paper, we try to extract this kind of low-dimensional structure and use it as prior knowledge to regularize the process of image denoising. In other words, in the algorithm design we explicitly take the intrinsic manifold structure into account by making use of both labeled and unlabeled data points.
Motivated by the above observations, the well-known theory of kernels [20], and work on graph-based signal processing [24]–[26], in this paper we propose a powerful algorithm to perform progressive image recovery based on hybrid graph Laplacian regularized regression. Part of our previous work has been reported in [21]. In our method, a multi-scale representation of the target image is constructed by the Laplacian pyramid, through which we effectively combine local smoothness and non-local self-similarity. On one hand, within each scale, a graph Laplacian regularization model represented by an implicit kernel is learned, which simultaneously minimizes the least squares error on the measured samples and preserves the geometrical structure of the image data space by exploring non-local self-similarity. In this procedure, the intrinsic manifold structure is considered by using both measured and unmeasured samples. On the other hand, between two scales, the proposed model is extended in a parametric manner through explicit kernel mapping to model the inter-scale correlation, in which the local structure regularity is learned and propagated from coarser to finer scales. It is worth noting that the proposed method is a general framework for image recovery; we choose one typical image recovery task, impulse noise removal, to validate its performance, but the framework is not limited to this task. Moreover, in our method the objective functions are formulated in the same form for intra-scale and inter-scale processing, but with different solutions obtained in different feature spaces: the solution in the


original feature space by an implicit kernel is used for intra-scale prediction, and the solution in a higher-dimensional feature space mapped by an explicit kernel is used for inter-scale prediction. Therefore, the proposed image recovery algorithm casts the consistency of local and global correlation through the multi-scale scheme into a unified framework.

The rest of the paper is organized as follows. In Section II, we introduce the proposed graph Laplacian regularized model and its kernel-based optimization solutions. Section III details the proposed multi-scale image recovery framework. Section IV presents experimental results and comparative studies. Section V concludes the paper.

II. IMAGE RECOVERY VIA GRAPH LAPLACIAN REGULARIZED REGRESSION

A. Problem Description

Given a degraded image X with n pixels, each pixel can be described by its feature vector x_i = [u_i, b_i] \in \mathbb{R}^{m+2}, where u_i = (h, w) is the coordinate and b_i \in \mathbb{R}^m is a certain context of x_i, defined differently for different tasks. All pixels in the image constitute the sample set \chi = \{x_1, x_2, \ldots, x_n\}. We call the grayscale value y_i the label of x_i. For impulse noise removal, when image measurements are noise-dominated, the performance of image recovery can be improved by proceeding in two steps [6]. The first step classifies noisy and clean samples using the adaptive median filter or one of its variants, depending on the noise type; noisy samples are then treated as unlabeled ones whose intensity values are to be re-estimated, while the remaining clean samples are treated as labeled ones whose intensity values are kept unchanged. The second step fits a model to the labeled measurements and uses the fitted model to estimate the unlabeled samples. From a machine learning viewpoint, this task can be addressed as a problem of semi-supervised regression.

B. Graph Laplacian Regularized Regression (GLRR)

What we want to derive is the prediction function f, which gives the re-estimated values of the noisy samples. Given labeled samples X_l = \{(x_1, y_1), \ldots, (x_l, y_l)\} as the training data, one direct approach to learning the prediction function f is to minimize the prediction error on the set of labeled samples, formulated as follows:

\[
\arg\min_{f \in \mathcal{H}_\kappa} J(f) = \arg\min_{f \in \mathcal{H}_\kappa} \sum_{i=1}^{l} \big(y_i - f(x_i)\big)^2 + \lambda \|f\|^2, \tag{1}
\]
where \mathcal{H}_\kappa is the Reproducing Kernel Hilbert Space (RKHS) associated with the kernel \kappa. \mathcal{H}_\kappa is the completion of the linear span given by \kappa(x_i, \cdot) for all x_i \in \chi, i.e., \mathcal{H}_\kappa = \overline{\operatorname{span}}\{\kappa(x_i, \cdot) \mid x_i \in \chi\}. The above regression model makes use of only the labeled samples to carry out inference. When the noise level is heavy, which means there are few labeled samples, it is hard to achieve a robust recovery of the noisy image. Moreover, it fails to take into account the intrinsic geometrical structure of the image data. Note that we also have a set of unlabeled samples \{x_{l+1}, \ldots, x_n\} at hand. In the field of machine


learning, the success of semi-supervised learning [20]–[25] is plausibly due to effective utilization of large amounts of unlabeled data to extract information that is useful for generalization. Therefore, it is reasonable to leverage both labeled and unlabeled data to achieve better predictions. In order to make use of the unlabeled data, we follow the well-known manifold assumption, which is implemented by a graph structure. Specifically, the whole image sample set is modeled as an undirected graph, in which the vertices are all the data points and the edges represent the relationships between vertices. Each edge is assigned a weight to reflect the similarity between the connected vertices. As stated by the manifold assumption [18], data points connected in the graph with larger affinity weights should have similar values. Meanwhile, with the above definition, the intrinsic geometrical structure of the data space can be described by the graph Laplacian. Through graph Laplacian regularization, the manifold structure can be incorporated into the objective function. Mathematically, the manifold assumption can be implemented by minimizing the following term:

\[
R(f) = \frac{1}{2} \sum_{i,j=1}^{n} \big(f(x_i) - f(x_j)\big)^2 W_{ij}, \tag{2}
\]

where W_{ij}, which is in inverse proportion to d^2(x_i, x_j), is defined as the edge weight in the data adjacency graph and reflects the affinity between the two vertices x_i and x_j. In graph construction, edge weights play a crucial role. In this paper, we combine the edge-preserving property of the bilateral filter [16] and the robustness of the non-local means weight [15] to design the edge weights, defined as follows:

\[
W_{ij} = \frac{1}{C} \exp\left(-\frac{\|u_j - u_i\|^2}{\sigma^2}\right) \exp\left(-\frac{\|b_j - b_i\|^2}{\varepsilon^2}\right), \quad \sigma > 0, \ \varepsilon > 0, \tag{3}
\]

where b_i (b_j) is the local patch centered on u_i (u_j) and C is a normalization constant. The first exponential term accounts for geometric proximity, and the second for structural similarity.
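As an illustration, the weight in Eq. (3) can be computed per pixel pair as follows (a minimal sketch, not the authors' implementation; the patch size and the normalization constant C = 1 are assumptions):

```python
import numpy as np

def edge_weight(u_i, u_j, b_i, b_j, sigma2=0.5, eps2=0.5, C=1.0):
    """Eq. (3): spatial-proximity term times patch-similarity term."""
    spatial = np.exp(-np.sum((u_i - u_j) ** 2) / sigma2)
    structural = np.exp(-np.sum((b_i - b_j) ** 2) / eps2)
    return spatial * structural / C

# Toy usage: two neighboring pixels with identical 3x3 patches.
u1, u2 = np.array([10.0, 10.0]), np.array([11.0, 10.0])
b1 = b2 = np.full(9, 0.5)
w = edge_weight(u1, u2, b1, b2)  # structural term is 1, spatial term is exp(-2)
```

The values sigma2 = eps2 = 0.5 mirror the parameter settings reported in Section IV.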
Let D be a diagonal matrix whose diagonal elements are the row sums of W, i.e., D(i, i) = \sum_j W_{ij}. We define L = D - W \in \mathbb{R}^{n \times n} as the graph Laplacian. With the above definitions, Eq. (2) can be further written as:

\[
R(f) = \sum_{i=1}^{n} f(x_i)^2 \sum_{j=1}^{n} W_{ij} - \sum_{i,j} W_{ij} f(x_i) f(x_j) = \mathbf{f}^T D \mathbf{f} - \mathbf{f}^T W \mathbf{f} = \mathbf{f}^T L \mathbf{f}, \tag{4}
\]

where \mathbf{f} = [f(x_1), \ldots, f(x_n)]^T. Combining this regularization term with Eq. (1), we obtain the objective function of Laplacian regularized least squares (LapRLS):

\[
\arg\min_{f \in \mathcal{H}_\kappa} \big\{ J(f) = \|\mathbf{y}_L - \mathbf{f}_L\|^2 + \lambda \|f\|^2 + \gamma \mathbf{f}^T L \mathbf{f} \big\}, \tag{5}
\]

where \mathbf{y}_L = [y_1, \ldots, y_l]^T and \mathbf{f}_L = [f(x_1), \ldots, f(x_l)]^T.

C. Optimization by Implicit Kernel

In order to obtain the optimal solution of the above objective function, we exploit a useful property of the RKHS, the so-called representer theorem. It states that the minimizer of any


optimization task in a Hilbert space \mathcal{H} has a finite representation in \mathcal{H}.

Theorem 1 (Representer Theorem [18]). Let \chi be a nonempty set and \kappa a positive-definite real-valued kernel on \chi \times \chi with corresponding reproducing kernel Hilbert space \mathcal{H}. Given a set of labeled samples \{(x_1, y_1), \ldots, (x_l, y_l)\} \in \chi \times \mathbb{R}, a strictly monotonically increasing function \Omega: [0, \infty) \to \mathbb{R}, and a loss function c: (\chi \times \mathbb{R}^2)^l \to \mathbb{R} \cup \{\infty\}, each minimizer f \in \mathcal{H} of the regularized risk functional

\[
c\big((y_1, f(x_1)), \ldots, (y_l, f(x_l))\big) + \Omega(\|f\|_{\mathcal{H}}) \tag{6}
\]

admits a representation of the form

\[
f(x) = \sum_{i=1}^{l} \alpha_i \kappa(x_i, x). \tag{7}
\]

In this paper, the loss function c is specified as the quadratic loss, and the regularization term takes the form \Omega(\|f\|_{\mathcal{H}}) = \lambda \|f\|^2 = \lambda f^T f. The Representer Theorem above allows us to express the solution of the regularized least squares (RLS) problem formulated in Eq. (1) directly in terms of the labeled data and the kernels. As stated in the above subsection, we extend RLS to LapRLS, formulated in Eq. (5), to utilize both labeled and unlabeled information in regression. The following extended version of the Representer Theorem shows that the minimizer has an expansion in terms of both labeled and unlabeled samples; it is a key to our algorithm.

Theorem 2 (Representer Theorem for Semi-supervised Learning [18]). Given a training sample set including both labeled and unlabeled samples \{(x_1, y_1), \ldots, (x_n, y_n)\} \in \chi \times \mathbb{R}, the minimizer of the optimization problem formulated in

\[
c\big((y_1, f(x_1)), \ldots, (y_l, f(x_l))\big) + \Omega(\|f\|_{\mathcal{H}}) + \gamma \mathbf{f}^T L \mathbf{f} \tag{8}
\]

admits an expansion

\[
f(x) = \sum_{i=1}^{n} \alpha_i \kappa(x_i, x) \tag{9}
\]

in terms of both labeled and unlabeled samples.

With the above extended Representer Theorem, we define the solution f of Eq. (5) as:

\[
\mathbf{f} = \begin{bmatrix} f(x_1) \\ \vdots \\ f(x_n) \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^{n} \alpha_i \kappa(x_i, x_1) \\ \vdots \\ \sum_{i=1}^{n} \alpha_i \kappa(x_i, x_n) \end{bmatrix} = K \mathbf{a}, \tag{10}
\]

where K is the kernel Gram matrix with K_{ij} = \kappa(x_i, x_j) and \mathbf{a} = [\alpha_1, \ldots, \alpha_n]^T. Denoting by K_L the submatrix consisting of the rows of K corresponding to the labeled samples in the set X_L, we have \mathbf{f}_L = K_L \mathbf{a}. Mathematically, the kernel function is defined as \kappa(x_i, x) = \langle \Phi(x_i), \Phi(x) \rangle = \Phi(x_i)^T \Phi(x), where \langle \cdot, \cdot \rangle is the inner product operator and \Phi: \chi \to \mathcal{H} is the kernel-induced feature mapping, which maps the points in \chi to a higher-dimensional RKHS \mathcal{H}. According to the above definition, Eq. (9) can be further written as:

\[
f(x) = \Big( \sum_{i=1}^{n} \alpha_i \Phi(x_i)^T \Big) \Phi(x). \tag{11}
\]

The f in the second term of Eq. (5) is referred to as the coefficient vector of the mapped feature \Phi(x) in the RKHS, which is derived as

\[
f = \sum_{i=1}^{n} \alpha_i \Phi(x_i) = [\Phi(x_1), \ldots, \Phi(x_n)] \mathbf{a}. \tag{12}
\]

As a result, we have

\[
\|f\|^2 = \mathbf{a}^T [\Phi(x_1), \ldots, \Phi(x_n)]^T [\Phi(x_1), \ldots, \Phi(x_n)] \mathbf{a} = \mathbf{a}^T K \mathbf{a}. \tag{13}
\]

With the above representations, the objective function defined in Eq. (5) can be rewritten as:

\[
\arg\min_{\mathbf{a} \in \mathbb{R}^n} \big\{ J(\mathbf{a}) = \|\mathbf{y}_L - K_L \mathbf{a}\|^2 + \lambda \mathbf{a}^T K \mathbf{a} + \gamma \mathbf{a}^T K L K \mathbf{a} \big\}. \tag{14}
\]

Therefore, the original minimization problem is converted into a quadratic problem with respect to the representation coefficient vector \mathbf{a}. Setting \partial J(\mathbf{a})/\partial \mathbf{a} = -2 K_L^T (\mathbf{y}_L - K_L \mathbf{a}) + 2\lambda K \mathbf{a} + 2\gamma K L K \mathbf{a} = 0, we derive the closed-form solution:

\[
\mathbf{a}^* = \big( K_L^T K_L + \lambda K + \gamma K L K \big)^{-1} K_L^T \mathbf{y}_L. \tag{15}
\]

Substituting this optimal coefficient vector into Eq. (10), we obtain the optimal predicted values for all samples. How to define K_{ij} = \kappa(x_i, x_j) will be elaborated in the next section. We call the above model implicit kernel GLRR (IK-GLRR), since the form of the kernel function \kappa(x_i, x_j) is given directly (as formulated in Eq. (21) and Eq. (22)), without relying on any concrete definition of \Phi. This implicit-kernel framework addresses the problem of nonlinear estimation in a nonparametric manner, relying on the data itself to dictate the structure of the model. It provides an effective approach to explore the intra-scale similarity, and can handle image recovery from very sparse data.

D. Optimization by Explicit Kernel

We further consider explicitly mapping the samples to a high-dimensional feature space in order to reformulate the proposed graph Laplacian regularized model in a linear manner in that space, which is equivalent to solving a nonlinear problem in the original space. This brings additional insight into the current ill-posed problem. More specifically, after the feature mapping we obtain \tilde{x}_i = \Phi(x_i), a projected point in the high-dimensional space. We then define a linear kernel function in this higher-dimensional feature space as \tilde{\kappa}(\tilde{x}_i, \tilde{x}) = \tilde{x}_i^T \tilde{x}. With this definition, the representer theorem formulated in Eq. (9) can be written as:

\[
f(\tilde{x}) = \sum_{i=1}^{n} \alpha_i \tilde{\kappa}(\tilde{x}_i, \tilde{x}) = \sum_{i=1}^{n} \alpha_i \tilde{x}_i^T \tilde{x}. \tag{16}
\]

Denoting \mathbf{w}^T = \sum_{i=1}^{n} \alpha_i \tilde{x}_i^T, f(\tilde{x}) becomes a linear function with coefficient vector \mathbf{w}, and \tilde{\mathbf{f}} can be written as:

\[
\tilde{\mathbf{f}} = [f(\tilde{x}_1), \ldots, f(\tilde{x}_n)]^T = [\mathbf{w}^T \tilde{x}_1, \ldots, \mathbf{w}^T \tilde{x}_n]^T = \tilde{X}^T \mathbf{w}, \tag{17}
\]

where \tilde{X} = [\tilde{x}_1, \ldots, \tilde{x}_n] is the feature matrix in the mapped high-dimensional space. Since f in Eq. (5) denotes the coefficient vector of the mapped feature \Phi(x) in the RKHS, we have f = \mathbf{w} for the explicit kernel optimization; thus \Omega(\|f\|_{\mathcal{H}}) = \lambda \|f\|^2 = \lambda \mathbf{w}^T \mathbf{w}.
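To summarize Section II-C before moving on, the IK-GLRR estimate (Eqs. (10) and (15)) can be sketched on a toy 1-D signal. The Gaussian kernel on coordinates anticipates Eq. (21), and the kernel matrix is reused as the graph affinity W; both choices, and all parameter values, are illustrative assumptions rather than the paper's exact construction:

```python
import numpy as np

rng = np.random.default_rng(0)
n, l = 30, 20                            # total samples, labeled (clean) samples
u = np.arange(n, dtype=float)            # 1-D pixel coordinates
y = np.sin(u / 4.0)                      # underlying clean signal
labeled = np.sort(rng.choice(n, l, replace=False))

sigma2, lam, gamma = 4.0, 0.5, 0.01      # illustrative parameter values

# Gaussian kernel on coordinates (cf. Eq. (21)); reused here as affinity W.
K = np.exp(-((u[:, None] - u[None, :]) ** 2) / sigma2)
W = K.copy()
np.fill_diagonal(W, 0.0)
L = np.diag(W.sum(axis=1)) - W           # graph Laplacian, Eq. (4)

K_L = K[labeled, :]                      # rows of K for labeled samples
y_L = y[labeled]

# Closed-form coefficients, Eq. (15), then predictions via Eq. (10).
A = K_L.T @ K_L + lam * K + gamma * K @ L @ K
a = np.linalg.solve(A, K_L.T @ y_L)
f = K @ a                                # re-estimated values for all n samples
```

Since every term of Eq. (14) is convex, the stationary point solved for above is the global minimizer, so its data-fidelity error can never exceed that of the all-zero predictor.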


Substituting Eq. (17) into Eq. (5), the objective function can be formulated as a parametric quadratic form with respect to \mathbf{w}:

\[
\arg\min_{\mathbf{w}} \big\{ J(\mathbf{w}) = \|\mathbf{y}_L - \tilde{X}_L^T \mathbf{w}\|^2 + \lambda \mathbf{w}^T \mathbf{w} + \gamma \mathbf{w}^T \tilde{X} L \tilde{X}^T \mathbf{w} \big\}, \tag{18}
\]

where \tilde{X}_L = [\tilde{x}_1, \ldots, \tilde{x}_l]. Setting \partial J(\mathbf{w})/\partial \mathbf{w} = -2 \tilde{X}_L (\mathbf{y}_L - \tilde{X}_L^T \mathbf{w}) + 2\lambda \mathbf{w} + 2\gamma \tilde{X} L \tilde{X}^T \mathbf{w} = 0, we derive the closed-form solution:

\[
\mathbf{w}^* = \big( \tilde{X}_L \tilde{X}_L^T + \lambda I + \gamma \tilde{X} L \tilde{X}^T \big)^{-1} \tilde{X}_L \mathbf{y}_L, \tag{19}
\]

where I is the identity matrix. We call the above model explicit kernel GLRR (EK-GLRR), since it relies on an explicitly defined feature mapping \Phi, detailed as follows. The kernel trick described above maps the nonlinear features into a higher-dimensional linear feature space, in which we can obtain a linear relationship between the mapped features. In the field of image processing, linear models such as the autoregressive (AR) model have been widely used to quantify the local smoothness of the image intensity field [11], [12], which suggests that the EK-GLRR model can serve a similar purpose. Moreover, the study of natural image statistics reveals that the second-order statistics of natural images tend to be invariant across different scales. Based on this observation, as shown in Fig. 2, we first estimate the EK-GLRR model from the recovered image at a coarse scale and then use the model to adapt the reconstruction at the finer scale, based on the geometric duality across scales. In this way, we can learn and propagate statistical features across scales to preserve the local smoothness of images. In the practical design, the explicit projection \Phi: \mathbb{R} \to \mathbb{R}^4 maps one pixel to the 4-dimensional vector consisting of its four 8-connected neighbors along the two diagonal directions.

Fig. 2. Model duality across different scales. Black dots represent samples at the coarser scale; black dots and blank dots together represent samples at the finer scale. The second-order statistics are scale invariant and can therefore be propagated from the coarser to the finer scale.

E. Discussion

The IK-GLRR and EK-GLRR models described above supply complementary views of the regularity in natural images. The former model, operated in the original feature space, is derived from the perspective of global geometry consistency: it explores the property of non-local self-similarity among image samples and connects the labeled and unlabeled samples through the graph Laplacian regularization. The latter model is derived from the perspective of local structure preservation: it explicitly maps the image data points to a high-dimensional feature space and establishes a direct connection between the inter-scale correlation and linear regression in the mapped feature space.

III. PROGRESSIVE HYBRID GRAPH LAPLACIAN REGULARIZATION

As stated in the above section, the IK-GLRR and EK-GLRR models provide two complementary views of the current image recovery task. A natural question is how to combine them in an elegant framework. In this paper, we propose a simple multi-scale framework for this purpose, for at least the following reasons. First, an important characteristic of natural images is that they are composed of structures at different scales. Through multi-scale decomposition, the structures of images at different scales become better exposed and hence more easily predicted. Second, a multi-scale scheme gives a more compact representation of imagery data because it encodes low-frequency and high-frequency parts separately. As is well known, the second-order statistics of natural images tend to be invariant across different scales [13], [14]; therefore, the low-frequency parts can be extracted from a much smaller downsampled image. Third, stronger correlations among adjacent image blocks are captured in the downsampled images, because every four image blocks are merged into one block in the downsampled image. Consequently, we propose an effective approach to recover noisy imagery data by combining the hybrid models with the multi-scale paradigm.

We now introduce the multiscale implementation of the proposed method. To give a clear illustration, we summarize the proposed multi-scale scheme in Fig. 3, where 90% of the samples in the test image Peppers are corrupted. We use the subscript l to indicate the level in the pyramid of downsampled images. The finest level (the original image) is indicated by l = 0; the larger l is, the coarser the downsampled image, and we denote the highest level by l = L. First, the level-l image I_l passes through a low-pass filter F, which is implemented in our method by averaging the existing pixels in a 2 × 2 neighborhood at the higher resolution.
The filtered image is then downsampled by 2 to get a coarser image I_{l+1}:

\[
I_{l+1} = F(I_l) \downarrow 2, \quad l = 0, \ldots, L-1. \tag{20}
\]

In this way, we construct a Laplacian pyramid; in the practical implementation, we use a three-level pyramid. At the beginning, we have the image I_2 at scale 2 at hand, which is defined on the coarsest grid of pixels G_2. This initial image lacks a fraction of its samples. We start off by recovering the missing samples using the proposed IK-GLRR model, detailed in Section II-C, to get a more complete grid \tilde{I}_2. This procedure can be performed iteratively by feeding the processing result \tilde{I}_2 back to the GLRR model as a prior for computing the kernel distance \kappa. In the


Fig. 3. Diagram of the proposed method.

practical experiments, two iterations were found to be effective in improving the processing results for this type of operation. Specifically, in the first iteration, since only the coarsest degraded image I_2 is at hand, we construct the kernel distance with a Gaussian kernel:

\[
\kappa(x_i, x_j) = \exp\left(-\|u_i - u_j\|^2 / \sigma^2\right), \tag{21}
\]

where u_i and u_j are the location coordinates of x_i and x_j, respectively. For the remaining iterations, and for the iterations at the higher levels, we use the non-local means kernel to explore the global correlation among all image samples:

\[
\kappa(x_i, x_j) = \exp\left(-\|b_i - b_j\|^2 / \sigma^2\right), \tag{22}
\]

where b_i and b_j are the local patches centered on x_i and x_j, respectively, which represent the context information around x_i and x_j. The recovered image \tilde{I}_2 is then interpolated to a finer grid G_1 using the proposed EK-GLRR model, detailed in Section II-D. The upsampled image \tilde{I}_1 can be used as a prior estimate by the IK-GLRR model to obtain a refined estimate \tilde{I}_1^*. Then \tilde{I}_1^* is upconverted to \tilde{I}_0 on the original-resolution grid G_0 by the EK-GLRR model, and the refined estimate \tilde{I}_0 is combined with I_0 in another IK-GLRR recovery procedure to produce the final result \tilde{I}_0^*. Using this progressive recovery based on intra-scale and inter-scale correlation, we gradually recover an image with few artifacts. Note that the basic scheme we use is closely related to the work in [27]; however, we replace most of the components it uses with the application-specific ones described in the above section. The first contribution is that we explicitly take the intrinsic manifold structure into account by making use of both labeled and unlabeled data points. This is especially useful for the current impulse noise removal task, in which the

number of clean samples is usually not enough to train a robust predictor when the noise level is heavy; the large amount of unlabeled data can be effectively utilized to extract information that is useful for generalization. The second contribution is that we use the model induced by the explicit kernel to exploit the scale-invariance property of natural images, which is shown to be essential for visual perception. This model can effectively learn and propagate statistical features across scales to preserve the local smoothness of images.

IV. EXPERIMENTAL RESULTS AND ANALYSIS

In this section, extensive experimental results are presented to demonstrate the superiority of the proposed algorithm on the task of impulse noise removal. In the experiments, we test two cases, denoising only, and joint denoising and deblurring, to show the power of our method in handling different distortions. For both cases, we test two kinds of impulse noise: salt-and-pepper noise and random-valued noise. For thoroughness of the comparison study, we select seven images widely used in the literature as test images, as illustrated in Fig. 4. All images are of size 512 × 512. A few parameters are involved in the proposed algorithm: σ² and ε² are fixed to 0.5, and λ and γ are set to 0.5 and 0.01, respectively. For comprehensive comparison, the proposed algorithm is compared with state-of-the-art work in the literature. More specifically, four approaches are included in our comparative study: (1) kernel regression (KR) based methods [22]; (2) the two-phase method proposed by Cai et al. [7]; (3) the iterative framelet-based method (IFASDA) proposed by Li et al. [8]; and (4) our method. The source code of our method is available at http://homepage.hit.edu.cn/pages/xmliu.

LIU et al.: PROGRESSIVE IMAGE DENOISING THROUGH HYBRID GRAPH LAPLACIAN REGULARIZATION

Fig. 4. Seven sample images in the test set.

TABLE I

OBJECTIVE QUALITY COMPARISON OF FOUR ALGORITHMS (IN dB) FOR SALT-AND-PEPPER NOISE REMOVAL

For the first stage, we use the adaptive median filter (AMF) for salt-and-pepper noise detection, and the adaptive center-weighted median filter (ACWMF) for random-valued noise detection. Let f be a noisy image contaminated by impulse noise, let y be the result of filtering f with the median-type filter, and let A be the image plane. The candidate noisy pixels contaminated by impulse noise are determined as follows:

(1) For salt-and-pepper noise:

N = {(i, j) ∈ A : yij ≠ fij and fij ∈ {dmin, dmax}},   (23)

(2) For random-valued impulse noise:

N = {(i, j) ∈ A : yij ≠ fij}.   (24)

Then the detected noisy pixels are treated as missing pixels (filled with 0), and the remaining clean pixels keep their original values to form the reference image. For more information about these noise detection methods, we refer interested readers to [4] and [5].

A. Salt-and-Pepper Noise Removal

We first examine the performance comparison on restoring images contaminated by salt-and-pepper noise only. The test images are corrupted by salt-and-pepper noise with high noise rates: 80%, 85%, and 90%. For detecting salt-and-pepper noise, we use the AMF [4] with a maximum window size of 19. We quantify the objective performance of all methods by PSNR. Table I tabulates the objective performance of the compared methods. It is clear that for all images our method gives the highest PSNR values among the compared methods. The average PSNR gain is up to 0.53 dB over the second-best performing algorithm. For Lena, the performance gain is 1.01 dB when the noise level is 85%.
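The candidate noisy-pixel sets of Eqs. (23) and (24), and the construction of the reference image, can be sketched as follows. The AMF/ACWMF filtering itself is assumed to be given (its output is the argument `y`), and the function names are illustrative, not from the paper.

```python
import numpy as np

def candidate_noisy_pixels(f, y, d_min=0, d_max=255, salt_and_pepper=True):
    """Candidate noisy-pixel mask following Eqs. (23)-(24).

    f : noisy input image; y : output of the median-type filter
    (AMF for salt-and-pepper, ACWMF for random-valued noise).
    Salt-and-pepper: a pixel is suspect if the filter changed it AND its
    value is an extreme (d_min or d_max).  Random-valued: any changed pixel.
    """
    changed = (y != f)
    if salt_and_pepper:
        extreme = (f == d_min) | (f == d_max)
        return changed & extreme
    return changed

def make_reference_image(f, mask):
    """Treat detected noisy pixels as missing (fill with 0); keep the rest."""
    ref = f.copy()
    ref[mask] = 0
    return ref
```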

It is worth mentioning that KR can be regarded as a special case of our method that only performs single-scale estimation. KR fails to handle high noise levels, such as 90%. At noise levels of 80% and 85%, our method works better than KR for all images except Barbara. In Barbara, there are many globally repeated textures with regular directions. Since such regions are not piecewise stationary, the geometric duality across different scales does not exist. Therefore, the multiscale framework does not work well in this case; in contrast, the single-scale, patch-based kernel regression works better. For images with repetitive structures like Barbara, we can degenerate the proposed scheme to a single scale to get better results. Given that the human visual system (HVS) is the ultimate receiver of the restored images, we also show subjective comparison results. The recovered results for Lena, Peppers, and Boat are illustrated in Fig. 5, corresponding to 90% salt-and-pepper noise. The contents of these noisy images are almost invisible. From the results, we find that, under high noise levels, the kernel regression method generates spurious high-frequency artifacts; Cai's method overblurs the results and cannot preserve the edge structure well; and the IFASDA approach causes irregular outliers along edges and textures. It can be clearly observed that the proposed algorithm achieves the best overall visual quality by combining the intra-scale and inter-scale correlations: the image is sharper due to the local smoothness preserved by the inter-scale correlation, and the edges are more consistent due to the non-local self-similarity exploited by the intra-scale correlation.

B. Random-Valued Impulse Noise Removal

We now consider the case in which the test images are corrupted by random-valued impulse noise only. The random noise


Fig. 5. Subjective quality comparison on salt-and-pepper noise removal with noise level 90%. Column 1: the noisy images; Column 2: the results of KR; Column 3: Cai's results; Column 4: the results of IFASDA; Column 5: our results.

TABLE II
OBJECTIVE QUALITY COMPARISON OF FOUR ALGORITHMS (IN dB) FOR RANDOM-VALUED IMPULSE NOISE REMOVAL

values are identically and uniformly distributed in [dmin, dmax]; random-valued impulse noise is therefore clearly more difficult to detect than salt-and-pepper noise, and the task of random-valued noise removal is expected to be harder than salt-and-pepper noise removal. Accordingly, for random-valued impulse noise removal, we test three medium

noise levels: 40%, 50%, and 60%. In our experiments, the noise is detected by the ACWMF [14], which is successively performed four times with different parameters for each image. The parameters are chosen to be the same as those in [23]. In Table II, we show the PSNR values for restoring the images corrupted by random-valued impulse noise. As


Fig. 6. Subjective quality comparison on random-valued noise removal with noise level 50%. Column 1: the noisy image; Column 2: the results of KR; Column 3: Cai's results; Column 4: the results of IFASDA; Column 5: our results.

TABLE III
OBJECTIVE QUALITY COMPARISON OF FOUR ALGORITHMS (IN dB) FOR SALT-AND-PEPPER NOISE REMOVAL AND DEBLURRING SIMULTANEOUSLY

depicted in this table, our method also achieves the best objective performance among the compared methods, attaining the highest average PSNR value in all cases. The average PSNR gain is up to 0.55 dB over the second-best performing algorithm. The recovered results for Lena, Peppers, and Boat are illustrated in Fig. 6, corresponding to

50% random-valued impulse noise. From the results, we find that after noise removal, Cai's method and IFASDA still leave some noisy regions, such as the face region of Lena, the region between the two peppers in Peppers, and the mast region of Boat. Our method produces clearer results than the other methods.
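The random-valued impulse degradation described above, and the PSNR measure used in Tables I-IV, can be sketched as follows. The helper names, the noise rate, and the seed are illustrative assumptions, not details from the paper.

```python
import numpy as np

def add_random_valued_impulse(img, rate, d_min=0, d_max=255, seed=0):
    """Corrupt a fraction `rate` of pixels with values drawn uniformly
    from [d_min, d_max]; the remaining pixels are left untouched.
    Returns the noisy image and the corruption mask."""
    rng = np.random.default_rng(seed)
    noisy = img.astype(np.float64).copy()
    mask = rng.random(img.shape) < rate
    noisy[mask] = rng.uniform(d_min, d_max, size=mask.sum())
    return noisy, mask

def psnr(ref, rec, peak=255.0):
    """PSNR in dB between a reference and a recovered image."""
    mse = np.mean((ref.astype(np.float64) - rec.astype(np.float64)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```

Salt-and-pepper noise is the special case in which the corrupted pixels take only the extreme values dmin or dmax instead of a uniform draw.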


Fig. 7. Subjective quality comparison on deblurring and denoising simultaneously for salt-and-pepper noise with noise level 90%. Column 1: the noisy image; Column 2: the result of KR; Column 3: Cai's result; Column 4: the results of IFASDA; Column 5: our results.

TABLE IV
OBJECTIVE QUALITY COMPARISON OF FOUR ALGORITHMS (IN dB) FOR RANDOM-VALUED NOISE REMOVAL AND DEBLURRING SIMULTANEOUSLY

C. Salt-and-Pepper Noise Removal and Deblurring

Next, let us examine the performance of the compared methods under mixed distortion, that is, performing denoising and deblurring simultaneously, where the test images are first blurred and then corrupted by impulse noise. The blurring operator is a Gaussian blur with a window size of 7 × 7 and a standard deviation of 1. We first test salt-and-pepper noise. The added

impulse noise is again at three heavy levels: 80%, 85%, and 90%. Table III presents the quantitative comparison on these test images. It can be observed that the proposed algorithm achieves the highest PSNR values for all test images. The average PSNR gain is up to 1.46 dB over the second-best performing algorithm. We also perform a subjective quality comparison. The recovered results for Lena, Peppers, and Boat are illustrated in Fig. 7,


Fig. 8. Subjective quality comparison on deblurring and denoising simultaneously for random-valued impulse noise with noise level 50%. Column 1: the noisy image; Column 2: the result of KR; Column 3: Cai's result; Column 4: the results of IFASDA; Column 5: our results.

TABLE V
OBJECTIVE QUALITY VERSUS AVERAGE PROCESSING TIME (dB/SECONDS) RESULTS

corresponding to 90% salt-and-pepper noise and blurring. Our method produces the most visually pleasant results among all compared methods. Even under blur and impulse noise simultaneously, the proposed algorithm is still capable of restoring the major edges and repetitive textures of the images. It is noticed that the proposed method more accurately recovers global object contours, such as the edge along the shoulder in Lena, the edge along the pepper in Peppers, and the edges along the mast in Boat. It is easy to find that edges crossing regions with heavy noise cannot be well recovered by the other methods. This further demonstrates the power of the proposed multi-scale impulse noise removal algorithm. The strength of the proposed progressive recovery approach comes from its full utilization of the intra-scale and inter-scale correlations, which are neglected by currently available single-scale methods. Many large-scale structures can be well recovered based upon the progressively computed low-level results, which is impossible for traditional single-level impulse noise removal algorithms.

D. Random-Valued Noise Removal and Deblurring

We now consider the case in which blurred images are corrupted by random-valued impulse noise. We again test three medium noise levels: 40%, 50%, and 60%. Table IV tabulates the objective quality comparison with respect to PSNR for the four tested methods. From this table, we find that at lower noise levels, such as 40%, for the test images Wheel and Man, the proposed method performs slightly worse than IFASDA, but its average PSNR is still higher than that of IFASDA. At


higher noise levels, such as 50% and 60%, the proposed method works best among the compared methods for all test images. The average gain is up to 0.65 dB. This demonstrates that our method can handle more difficult cases. Fig. 8 illustrates the subjective quality comparison for Lena, Peppers, and Boat when the noise level is 50%. It is noticed that the proposed method recovers the images more accurately. Both the superior subjective and objective quality of the proposed algorithm convincingly demonstrate the potential of the proposed hybrid graph Laplacian regularized regression for impulse noise removal.

E. Running Time vs. Performance Comparison

Another major issue to consider in image denoising is computational complexity. Here we use random-valued noise removal and deblurring as an example to show the practical processing time and performance comparison among the compared methods. Table V gives the PSNR versus average processing time results on a typical computer (2.5 GHz Intel dual-core CPU, 3 GB memory). All of the compared algorithms are run in Matlab R2012a. As depicted in Table V, the computational complexity of the proposed method is higher than that of KR but lower than the two other state-of-the-art impulse noise removal algorithms, while it achieves much better quality with respect to PSNR than the other methods.

V. CONCLUSION

In this paper, we present an effective and efficient image impulse noise removal algorithm based on hybrid graph Laplacian regularized regression. We utilize the input space and the mapped high-dimensional feature space as two complementary views to address this ill-posed inverse problem. The framework we explore is a multi-scale Laplacian pyramid, where the intra-scale relationship is modeled with the implicit-kernel graph Laplacian regularization model in the input space, while the inter-scale dependency is learned and propagated with the explicit-kernel extension model in the mapped feature space.
In this way, both local and nonlocal regularity constraints are exploited to improve the accuracy of noisy image recovery. Experimental results demonstrate that our method outperforms state-of-the-art methods in both objective and subjective quality. Moreover, the proposed framework is powerful and general, and can be extended to deal with other ill-posed image restoration tasks.

ACKNOWLEDGMENT

The authors would like to thank Dr. J.-F. Cai of the University of Iowa and Dr. Y.-R. Li of Shenzhen University for sharing their source codes for comparison. The authors would also like to thank the anonymous reviewers for their constructive suggestions, which helped improve the manuscript.

REFERENCES

[1] K. N. Plataniotis and A. N. Venetsanopoulos, Color Image Processing and Applications. Berlin, Germany: Springer-Verlag, 2000. [2] S. G. Mallat, A Wavelet Tour of Signal Processing: The Sparse Way, 3rd ed. Amsterdam, The Netherlands: Elsevier, 2008.

[3] M. A. Little and N. S. Jones, “Generalized methods and solvers for noise removal from piecewise constant signals,” Proc. R. Soc., Math., vol. 467, no. 2135, pp. 3088–3114, 2011. [4] H. Hwang and R. A. Haddad, “Adaptive median filters: New algorithms and results,” IEEE Trans. Image Process., vol. 4, no. 4, pp. 499–502, Apr. 1995. [5] S.-J. Ko and Y. H. Lee, “Center weighted median filters and their applications to image enhancement,” IEEE Trans. Circuits Syst., vol. 38, no. 9, pp. 984–993, Sep. 1991. [6] R. H. Chan, Y. Dong, and M. Hintermüller, “An efficient two-phase L1-TV method for restoring blurred images with impulse noise,” IEEE Trans. Image Process., vol. 19, no. 7, pp. 1731–1739, Jul. 2010. [7] J. Cai, R. H. Chan, and M. Nikolova, “Fast two-phase image deblurring under impulse noise,” J. Math. Imaging Vis., vol. 36, no. 1, pp. 46–53, 2010. [8] Y. Li, L. Shen, D. Dai, and B. W. Suter, “Framelet algorithms for deblurring images corrupted by impulse plus Gaussian noise,” IEEE Trans. Image Process., vol. 20, no. 7, pp. 1822–1837, Jul. 2011. [9] W. Hong, J. Wright, K. Huang, and Y. Ma, “Multiscale hybrid linear models for lossy image representation,” IEEE Trans. Image Process., vol. 15, no. 12, pp. 3655–3671, Dec. 2006. [10] G. Zhai and X. Yang, “Image reconstruction from random samples with multiscale hybrid parametric and nonparametric modeling,” IEEE Trans. Circuits Syst., vol. 22, no. 11, pp. 1554–1563, Nov. 2012. [11] X. Liu, D. Zhao, R. Xiong, S. Ma, W. Gao, and H. Sun, “Image interpolation via regularized local linear regression,” IEEE Trans. Image Process., vol. 20, no. 12, pp. 3455–3469, Dec. 2011. [12] X. Zhang and X. Wu, “Image interpolation by adaptive 2-D autoregressive modeling and soft-decision estimation,” IEEE Trans. Image Process., vol. 17, no. 6, pp. 887–896, Jun. 2008. [13] D. L. Ruderman and W. Bialek, “Statistics of natural images: Scaling in the woods,” Phys. Rev. Lett., vol. 73, pp. 814–817, Aug. 1994. [14] A. Srivastava, A. B. 
Lee, E. P. Simoncelli, and S.-C. Zhu, “On advances in statistical modeling of natural images,” J. Math. Imaging Vis., vol. 18, no. 1, pp. 17–33, 2003. [15] A. Buades, B. Coll, and J. Morel, “Nonlocal image and movie denoising,” Int. J. Comput. Vis., vol. 76, no. 2, pp. 123–139, 2008. [16] C. Tomasi and R. Manduchi, “Bilateral filtering for gray and color images,” in Proc. IEEE 6th Int. Conf. Comput. Vis., Jan. 1998, pp. 839–846. [17] X. Zhu, “Semi-supervised learning literature survey,” Dept. Comput. Sci., Univ. Wisconsin, Madison, WI, USA, Tech. Rep. 1530, 2005. [18] M. Belkin, P. Niyogi, and V. Sindhwani, “Manifold regularization: A geometric framework for learning from examples,” J. Mach. Learn. Res., vol. 7, pp. 2399–2434, Nov. 2006. [19] D. Cai, X. Wang, and X. He, “Probabilistic dyadic data analysis with local and global consistency,” in Proc. 26th ICML, Jun. 2009, pp. 105–112. [20] B. Scholkopf, S. Mika, C. J. C. Burges, P. Knirsch, K.-R. Muller, G. Ratsch, et al., “Input space versus feature space in kernel-based methods,” IEEE Trans. Neural Netw., vol. 10, no. 5, pp. 1000–1017, May 1999. [21] D. Zhai, X. Liu, D. Zhao, H. Chang, and W. Gao, “Progressive image restoration through hybrid graph Laplacian regularization,” in Proc. IEEE Data Compress. Conf., Snowbird, UT, USA, Mar. 2013, pp. 103–112. [22] H. Takeda, S. Farsiu, and P. Milanfar, “Kernel regression for image processing and reconstruction,” IEEE. Trans. Image Process., vol. 16, no. 2, pp. 349–366, Feb. 2007. [23] R. H. Chan, C. Hu, and M. Nikolova, “An iterative procedure for removing random-valued impulse noise,” IEEE Signal Process. Lett., vol. 11, no. 12, pp. 921–924, Dec. 2004. [24] P. Milanfar, “A tour of modern image filtering,” IEEE Signal Process. Mag., vol. 30, no. 1, pp. 106–128, Jan. 2013. [25] D. Shuman, S. Narang, P. Frossard, A. Ortega, and P. 
Vandergheynst, “The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains,” IEEE Signal Process. Mag., vol. 30, no. 3, pp. 83–98, May 2013. [26] X. Liu, D. Zhao, J. Zhou, W. Gao, and H. Sun, “Image interpolation via graph-based Bayesian label propagation,” IEEE Trans. Image Process., vol. 23, no. 3, pp. 1084–1096, Mar. 2014. [27] L. Yuan, J. Sun, L. Quan, and H.-Y. Shum, “Progressive inter-scale and intra-scale non-blind image deconvolution,” in Proc. SIGGRAPH, 2008, pp. 1–15.


Xianming Liu is currently an Assistant Professor with the Department of Computer Science, Harbin Institute of Technology (HIT), Harbin, China. He received the B.S., M.S., and Ph.D. degrees in computer science from HIT in 2006, 2008, and 2012, respectively. From 2009 to 2012, he was with the National Engineering Laboratory for Video Technology, Peking University, Beijing, as a Research Assistant. In 2011, he was a Visiting Student with the Department of Electrical and Computer Engineering, McMaster University, Hamilton, ON, Canada, where he was a Post-Doctoral Fellow from 2012 to 2013. Since 2014, he has been a Post-Doctoral Fellow with the National Institute of Informatics, Tokyo, Japan. His research interests include image/video coding, image/video processing, and machine learning.

Deming Zhai is currently an Assistant Professor with the Department of Computer Science, Harbin Institute of Technology (HIT), Harbin, China. She received the B.S., M.S., and Ph.D. (Hons.) degrees in computer science from HIT in 2007, 2009, and 2014, respectively. In 2008, she joined the Joint Research and Development Laboratory, Chinese Academy of Sciences, Beijing, as a Research Assistant. In 2011, she was with the Hong Kong University of Science and Technology as a Visiting Student. In 2012, she was with the GRASP Laboratory, University of Pennsylvania, PA, USA, as a Visiting Scholar. Her research interests include machine learning and its application in computer vision.

Debin Zhao received the B.S., M.S., and Ph.D. degrees in computer science from the Harbin Institute of Technology, China in 1985, 1988, and 1998, respectively. He is currently a Professor with the Department of Computer Science, Harbin Institute of Technology. He has published over 200 technical articles in refereed journals and conference proceedings in the areas of image and video coding, video processing, video streaming and transmission, and pattern recognition.


Guangtao Zhai (M’10) received the B.E. and M.E. degrees from Shandong University, Shandong, China, in 2001 and 2004, respectively, and the Ph.D. degree from Shanghai Jiao Tong University, Shanghai, China, in 2009, where he is currently a Research Professor with the Institute of Image Communication and Information Processing. From 2006 to 2007, he was a Student Intern with the Institute for Infocomm Research, Singapore. From 2007 to 2008, he was a Visiting Student with the School of Computer Engineering, Nanyang Technological University, Singapore. From 2008 to 2009, he was a Visiting Student with the Department of Electrical and Computer Engineering, McMaster University, Hamilton, ON, Canada, where he was a Post-Doctoral Fellow from 2010 to 2012. From 2012 to 2013, he was a Humboldt Research Fellow with the Institute of Multimedia Communication and Signal Processing, Friedrich Alexander University of Erlangen-Nuremberg, Germany. He received the Award of National Excellent Ph.D. Thesis from the Ministry of Education of China in 2012. His research interests include multimedia signal processing and perceptual signal processing.

Wen Gao (M’92–SM’05–F’09) received the Ph.D. degree in electronics engineering from the University of Tokyo, Japan, in 1991. He is a Professor of computer science with Peking University. Before joining Peking University, he was a Professor with the Harbin Institute of Technology from 1991 to 1995, and a Professor with the Institute of Computing Technology, Chinese Academy of Sciences, from 1996 to 2006. He has published five books and over 600 technical articles in refereed journals and conference proceedings in the areas of image processing, video coding and communication, computer vision, multimedia retrieval, multimodal interface, and bioinformatics. Prof. Gao has served on the editorial boards of several journals, such as the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, the IEEE TRANSACTIONS ON MULTIMEDIA, the IEEE TRANSACTIONS ON AUTONOMOUS MENTAL DEVELOPMENT, the EURASIP Journal of Image Communications, and the Journal of Visual Communication and Image Representation. He chaired a number of prestigious international conferences on multimedia and video signal processing, such as IEEE ICME 2007, ACM Multimedia 2009, and IEEE ISCAS 2013, and served on the advisory and technical committees of numerous professional organizations. He is a member of the Chinese Academy of Engineering, and a fellow of ACM.
