
Single Image Super-Resolution via Directional Group Sparsity and Directional Features

Xiaoyan Li, Hongjie He, Ruxin Wang, and Dacheng Tao, Fellow, IEEE

Abstract—Single image super-resolution (SR) aims to construct a high-resolution (HR) version from a single low-resolution (LR) image. SR reconstruction is challenging because of the missing details in the given LR image, so it is critical to explore and exploit effective prior knowledge to boost reconstruction performance. In this paper, we propose a novel SR method that exploits both the directional group sparsity of the image gradients and directional features in similarity weight estimation. The proposed SR approach is based on two observations: 1) most of the sharp edges are oriented in a limited number of directions, and 2) an image pixel can be estimated by the weighted averaging of its neighbors. In view of these observations, we apply the curvelet transform to extract directional features, which are then used for region selection and weight estimation. A combined total variation (CTV) regularizer is presented, built on the assumption that the gradients in natural images have a straightforward group sparsity structure. In addition, a directional non-local means (D-NLM) regularization term takes pixel values and directional information into account to suppress unwanted artifacts. By assembling the designed regularization terms, we solve the SR problem by minimizing an energy function for the reconstruction error using the framework of templates for first-order conic solvers (TFOCS). Thorough quantitative and qualitative results in terms of PSNR, SSIM, IFC, and preference matrix demonstrate that the proposed approach achieves higher-quality SR reconstruction than state-of-the-art algorithms.

This work was supported in part by the National Natural Science Foundation of China under Grant 61373180 and Grant 61461047, and in part by the Australian Research Council Projects under Grant DP-140102164, FT-130101457, and LP140100569. X. Li is with the Sichuan Key Laboratory of Signal and Information Processing, Southwest Jiaotong University, Chengdu 610031, China, and with the Centre for Quantum Computation & Intelligent Systems and the Faculty of Engineering & Information Technology, University of Technology, Sydney, Ultimo, NSW 2007, Australia (e-mail: [email protected]). H. He is with the Sichuan Key Laboratory of Signal and Information Processing, Southwest Jiaotong University, Chengdu 610031, China (e-mail: [email protected]). R. Wang and D. Tao are with the Centre for Quantum Computation & Intelligent Systems and the Faculty of Engineering & Information Technology, University of Technology, Sydney, Ultimo, NSW 2007, Australia (e-mail: [email protected]; [email protected]).

Index Terms—Image super-resolution, reconstruction-based, directional group sparsity, directional features.

I. INTRODUCTION

The aim of image super-resolution (SR) is to construct a high-resolution (HR) image from one or more low-resolution (LR) images using software techniques that exceed the limitations of existing acquisition systems; such techniques can be effectively applied to remote surveillance, military reconnaissance, medical imaging, and video supervision. Image SR reconstruction has therefore attracted great attention over the last three decades [1]. Numerous SR reconstruction algorithms have been proposed, and they generally fall into the following three categories.

1) Interpolation-based methods (e.g., [2] and [3]) estimate the missing pixels in the HR grid from their known neighbors to upscale the input LR image. Although the complexity of these methods is low, they often produce noticeable blurring artifacts along edges, which makes them insufficient for high-quality SR reconstruction.

2) Learning-based methods (e.g., [4] and [5]) typically estimate an HR image by learning the mapping between LR and HR image patch pairs from an external training dataset. Freeman et al. [6] first introduced a Markov random field model to learn the relationship between local regions of images and their underlying scenes. Representative methods of this category include: (i) sparse coding (SC)-based methods (e.g., [7], [8], and [9]): Yang et al. [7] estimated high-frequency details from an over-complete dictionary using sparse coding theory, and Zhang et al. [9] integrated local and non-local priors into a unified energy minimization model by constructing a multi-scale dictionary; (ii) regression-based methods (e.g., [10], [11], [12], [13], and [14]): Ni et al. [10] introduced the support vector regression (SVR) model to learn the relationship between LR image patches and the corresponding HR image patches, and Wu et al. [13] used the kernel partial least squares (KPLS) regression model to estimate high-frequency information from a training database; (iii) neighbor embedding (NE)-based methods (e.g., [15], [16], and [17]): based on histograms of oriented gradients (HOG) features, a sparse neighbor selection scheme [16] was presented to achieve better weight estimation, and Timofte et al. [17] presented an anchored neighborhood regression (ANR) method


to anchor the neighborhood embedding of each training LR patch to the nearest dictionary atom using correlation rather than Euclidean distance. Methods of this type, however, rely primarily on the quality of the external training dataset and may produce unwanted artifacts when the difference between the input image and the training images is large.

3) Reconstruction-based methods (e.g., [18]) focus on exploiting prior knowledge on the super-resolved image together with the reconstruction constraint, namely that the estimated HR image, degraded by blurring and downsampling, should be close to the input LR image. The image priors can be divided into the following subsets: (i) global priors (e.g., [19], [20], [21], and [22]): within the framework of templates for first-order conic solvers (TFOCS) [23], Fernandez-Granda et al. [24] applied transform-invariant (TI) and directional total variation (DTV) regularization to super-resolved planar regions in urban scenes, so that global consistency constraints between sharp edges in the horizontal and vertical directions could be enforced; (ii) local priors (e.g., [25] and [26]): Takeda et al. [27] used steering kernel regression (SKR) for image SR recovery, which adapts effectively to local structures and is robust to noise; (iii) self-similarity priors: Protter et al. [28] proposed a non-local means (NLM)-based image SR reconstruction algorithm to capture similar structures in the input image, and Zhang et al. [29] presented a multi-scale similarity learning scheme with an NLM regularization term to estimate the missing details and sharpen edges; (iv) hybrid priors: Dong et al. [30] proposed an adaptive sparse domain selection (ASDS) framework that incorporates learned autoregressive models and NLM regularization, Zhang et al. [31] integrated NLM and SKR regularization terms into a maximum a posteriori (MAP) framework by learning non-local and local priors from the given LR image, and Yu et al. [32] combined learning- and reconstruction-based methods for single image SR recovery, in which a dictionary is learnt from the LR input rather than from an external training dataset, since the latter may introduce unwanted details. The major advantage of this category is that these methods preserve edges and suppress artifacts in the resultant images; in addition, external exemplar images are not required.

In this paper, we present a novel reconstruction-based single image SR algorithm by exploiting directional group sparsity in natural images and their gradients. The proposed SR method is based upon the observation that the sharp edges in natural images are often oriented in a limited number of directions; consequently, the non-zero elements in image gradients are sparsely grouped along different directions.


Jaggy artifacts, which may be introduced by strong constraints, often appear in the resultant images and significantly affect reconstruction quality. To suppress jaggy artifacts along sharp edges, each target pixel can be estimated by a weighted average of its neighbors. In the weight estimation, we use the pixel values of a local image patch as well as the directional features at the center position. In contrast to [27], we exploit the directional group sparsity of image gradients to extract the main direction of the sharp edges in the desired HR image, which yields reliable SR quality and outperforms the total variation (TV) method [21] and the DTV method [24]. The group sparsity structure of the image gradients allows us to design an effective regularization term that enforces global consistency between sharp edges in a small number of directions. In addition, using only the pixel values in the neighbor search may lead to poor matches; we therefore utilize both the pixel values and the directional features to compute the similarity weights. We propose a novel reconstruction-based SR framework in which we assemble these two effective priors with the original SR reconstruction constraint. To optimize the total energy function, we obtain an optimal SR solution by exploiting the adaptive accelerated first-order method, iteratively minimizing the reconstruction error until the maximum number of iterations is reached.

In summary, the proposed method makes three main contributions: 1) to fully exploit the group sparsity property of image gradients in key directions, a combined total variation (CTV) regularization term is proposed and effectively adapted to preserve sharp edges in 16 different directions; 2) a directional non-local means (D-NLM) regularization term is proposed to effectively select neighbors with similar pixel values as well as similar directional information; and 3) an SR framework incorporates the reconstruction constraint and the CTV and D-NLM regularization terms, and an optimal solution to this SR reconstruction problem is obtained by applying the TFOCS framework.

The rest of this paper is organized as follows. We explain the proposed method in detail in Section II. Experimental results of the proposed method and comparisons with existing state-of-the-art SR methods are presented in Section III. Section IV concludes the paper.

II. PROPOSED METHOD

To address the problem of super-resolving an image of a natural scene, we propose a novel reconstruction-based SR framework which incorporates two prior regularization terms (i.e., CTV and D-NLM) and the global reconstruction constraint, ensuring that the SR reconstruction is well-posed. To effectively extract the directional information of a natural image, we apply the curvelet transform to


Fig. 1. The flow chart of the proposed method for single image super-resolution, where the initial HR image X^0 is obtained by bicubic interpolation with magnification factor s on the luminance component (in the YCbCr color space) of the input LR image.

obtain a certain number of directional features, which are then used to construct the directional vectors for each pixel and the weight matrices in different directions. Conventional TV regularization produces significant artifacts in the resultant image, while the CTV regularization term preserves sharp edges in sixteen directions (i.e., from 0° to 168.75° in steps of 11.25°) and suppresses more artifacts than TV. The D-NLM regularization term measures the similarity between two small image patches in terms of both the pixel values and the directional information of the pixel at the center of the corresponding patch. Blurring, downsampling, and noise arise in the image acquisition process, so the global reconstruction constraint imposes a penalty on such errors in the SR reconstruction process. Fig. 1 shows the flow chart of the proposed method for single image SR.

A. Curvelet-Based Direction Extraction

The curvelet transform is an efficient representation for preserving edges, since it has very high directional sensitivity and is highly anisotropic [33]. The curvelet coefficients of a given image X can be obtained


Fig. 2. Curvelet coefficient matrices at different scales and partition of 16 different direction subsets.

by
$$Q = \Gamma(X), \qquad (1)$$
where Γ(·) represents the curvelet transform function, and Q is a set of curvelet coefficients expressed as $\{Q_{j,l} \mid j = 1,\cdots,J;\ l = 1,\cdots,L_j\}$, in which J and L_j are the total number of scales and the number of directions at

the jth scale, respectively. Fig. 2 shows the curvelet coefficient matrices at five scales of a coarse-to-fine hierarchy. At the first scale, there is only one curvelet coefficient matrix, which contains the smooth details of the input image. From the second to the fifth scale, there are 16, 32, 32, and 64 curvelet coefficient matrices, respectively. The coefficient matrices are numbered from 45° in clockwise order and contain more directional information at finer scales. Because of directional symmetry, we partition the curvelet coefficient matrices at the finest scale into 16 direction subsets, denoted as $\{Z_k\}_{k=1}^{16}$, which are marked with different colors in Fig. 2. To effectively capture the directional information, the directional features of the input image in 16 different directions can be defined as

$$A_k = \mathcal{P}\left(\Gamma^{-1}(H_k Q)\right), \quad k = 1,\cdots,16, \qquad (2)$$
where $\mathcal{P}(\cdot) = \mathrm{abs}(\mathrm{Re}(\cdot))$, in which Re(·) is the real-part function and abs(·) is the absolute-value measure. $\Gamma^{-1}$ stands for the inverse curvelet transform, and H_k is the operation of extracting the curvelet coefficient matrices related to the kth direction subset:

$$H_k Q = \{H_{k,j,l} \cdot Q_{j,l} \mid j = 1,\cdots,J;\ l = 1,\cdots,L_j\}, \qquad (3)$$
$$H_{k,j,l} = \begin{cases} 1, & \text{if } (j,l) \in Z_k, \\ 0, & \text{otherwise.} \end{cases} \qquad (4)$$

Every pixel thus has a directional feature value, large or small; notably, the values in non-edge regions are much smaller than those in edge regions. To effectively measure the directional sparsity of the edges, we locate the positions with strong edges by applying a predefined threshold δ to the directional features; that is, when the directional feature value is larger than the threshold, the corresponding position belongs to an edge region. The localization matrices can be represented as

$$B_k(i, j) = \begin{cases} 1, & \text{if } A_k(i, j) > \delta, \\ 0, & \text{otherwise.} \end{cases} \qquad (5)$$

When the threshold is too large, the number of non-zero elements is small and the directional sparsity is high, which leads to poor recovery of some edge details. Conversely, unwanted artifacts may be produced if the threshold is too small.
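To make the thresholding step concrete, the following is a minimal sketch in Python/NumPy, assuming the 16 directional feature maps of Eq. (2) have already been produced by a curvelet front end (the function `extract_directional_features` is a hypothetical stand-in for such a package, not part of the paper):

```python
import numpy as np

def localization_matrices(A, delta=0.5):
    """Eq. (5): threshold the directional feature maps.

    A     : array of shape (16, m, n) holding the A_k maps from Eq. (2)
    delta : edge threshold (Sec. III-A uses delta = 0.5)
    Returns B of shape (16, m, n) with B[k, i, j] = 1 where A[k, i, j] > delta.
    """
    return (A > delta).astype(np.float64)

# Usage (hypothetical curvelet front end):
#   A = extract_directional_features(X)   # would implement Eqs. (1)-(2)
#   B = localization_matrices(A, delta=0.5)
```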

B. Combined Total Variation

To convert a given LR image to its HR version, a common choice, used in [21], is to penalize the 1-norm of the image gradients, i.e., the TV value of the input image. The main advantage of TV regularization is that it effectively suppresses the ringing artifacts caused by added noise. However, minimizing the total variation alone often fails to recover sharp edges even in the horizontal and vertical directions, mostly because the TV regularizer ignores the direction of the edges. The DTV regularizer, as shown in [24], additionally penalizes the 1-norm of the gradients along the horizontal and


Fig. 3. Non-blind deblurring of a character “N” using TV, DTV, and CTV minimization, respectively. The blurring kernel is a 31 × 31 Gaussian kernel with standard deviation 9.

vertical directions. This method achieves better SR results when the sharp edges are directed horizontally and vertically, but it still produces jagged artifacts along other directions, even when transform-invariant methods are used. Observation of natural images indicates that the sharp edges in a natural image are oriented principally in a small number of directions, which induces the group sparsity property on image gradients, since the non-zero values are generally grouped along the main directions. Based on these considerations, we present a combined total variation (CTV) regularization term that incorporates the multi-directional total variation and the conventional total variation of an HR estimation X of size m × n:
$$\mathrm{CTV}(X) = \sum_{k=0}^{7}\left(\frac{G_x\left(R_k(B_{5+k} \cdot X)\right)}{m\sqrt{n-1}} + \frac{G_y\left(R_k(B_{\mathrm{sgn}(13+k)} \cdot X)\right)}{n\sqrt{m-1}}\right) + \frac{\mathrm{TV}(X)}{\sqrt{2(m-1)(n-1)}}, \qquad (6)$$
where

$$G_x(X) = \sum_{j=1}^{n-1}\sqrt{\sum_{i=1}^{m}(X_{i,j+1} - X_{i,j})^2}, \qquad (7)$$
$$G_y(X) = \sum_{j=1}^{n}\sqrt{\sum_{i=1}^{m-1}(X_{i+1,j} - X_{i,j})^2}, \qquad (8)$$
$$\mathrm{TV}(X) = \sum_{i=1}^{m-1}\sum_{j=1}^{n-1}\sqrt{(X_{i,j+1} - X_{i,j})^2 + (X_{i+1,j} - X_{i,j})^2}, \qquad (9)$$
$$R_k(X_{i,j}) = X_{u,v}, \qquad (10)$$
$$[u, v] = [i, j]\begin{bmatrix}\cos(kb) & -\sin(kb)\\ \sin(kb) & \cos(kb)\end{bmatrix}, \quad b = \frac{\pi}{16}, \qquad (11)$$


Fig. 4. Description of a sharp edge in its main direction. A local image patch is extracted from the Butterfly image. The local magnification in the black rectangle is the corresponding luminance component. The patches in the brace are the directional features in 16 different directions, each with its own average energy ε. The subfigure with a thick dashed line demonstrates that the edge of the local patch is mostly oriented in the 168.75° direction.

$$\mathrm{sgn}(x) = \begin{cases} x, & \text{if } 16 - x > 0, \\ x - 16, & \text{otherwise.} \end{cases} \qquad (12)$$
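Putting Eqs. (6)-(12) together, the CTV value can be evaluated as in the sketch below; this is a simplified illustration, not the authors' implementation. The rotation R_k is approximated by `scipy.ndimage.rotate`, the array B is assumed to hold the 0-indexed localization matrices from Eq. (5), and boundary handling is deliberately naive:

```python
import numpy as np
from scipy.ndimage import rotate

def g_x(X):
    # Eq. (7): l2-norm of each column of horizontal differences, then summed
    d = np.diff(X, axis=1)                       # shape (m, n-1)
    return np.sqrt((d ** 2).sum(axis=0)).sum()

def g_y(X):
    # Eq. (8): l2-norm of each column of vertical differences, then summed
    d = np.diff(X, axis=0)                       # shape (m-1, n)
    return np.sqrt((d ** 2).sum(axis=0)).sum()

def tv(X):
    # Eq. (9): isotropic total variation over the interior grid
    dx = X[:-1, 1:] - X[:-1, :-1]
    dy = X[1:, :-1] - X[:-1, :-1]
    return np.sqrt(dx ** 2 + dy ** 2).sum()

def ctv(X, B, step_deg=11.25):
    # Eq. (6) with the rotation of Eqs. (10)-(11); with 0-based indexing,
    # (13 + k) % 16 reproduces the index map sgn(13 + k) of Eq. (12).
    m, n = X.shape
    val = 0.0
    for k in range(8):
        Xk = rotate(B[5 + k] * X, k * step_deg, reshape=False, order=1)
        Yk = rotate(B[(13 + k) % 16] * X, k * step_deg, reshape=False, order=1)
        val += g_x(Xk) / (m * np.sqrt(n - 1)) + g_y(Yk) / (n * np.sqrt(m - 1))
    return val + tv(X) / np.sqrt(2.0 * (m - 1) * (n - 1))
```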

Eqs. (7) and (8) represent the horizontal and vertical total variation, respectively. Eq. (9) stands for the conventional total variation. Eqs. (10) and (11) represent the operation of image rotation, and Eq. (12) selects the corresponding localization matrix from $\{B_k\}_{k=1}^{16}$. In summary, Eq. (6) equals the sum of the 2-norms of the differences between nearby rows and columns in the main directional regions after image rotation, plus the 1-norm of all the image gradients. By taking the group sparsity in different directions into account, the non-parametric CTV regularizer recovers sharp edges that are mainly oriented in the 16 directions while penalizing the gradients of the HR estimation. Fig. 3 gives an example of TV, DTV, and CTV cost-function minimization for the non-blind deblurring of the character "N". TV minimization produces significant artifacts along sharp edges. DTV minimization recovers the edges that are aligned horizontally and vertically, but it also produces jagged artifacts along the diagonal line. In contrast to TV and DTV, CTV minimization preserves sharp edges in their main directions while suppressing noticeable artifacts.

C. Directional Non-Local Means

The conventional NLM regularization term [28] is a powerful prior for image denoising, which estimates a target pixel by measuring its similarity with the image patches centered at the neighboring pixels in a large


search window. However, we argue that similarity weight measurement based only on the pixel values of the patches may lead to inaccurate estimation because of noise and significantly-changed local structures. Fortunately, as stated previously, the patches in a natural image exhibit strong regularities in a limited number of directions, which can be used to stabilize the similarity weight estimation. Fig. 4 shows the main direction of an image patch with a sharp edge. The average energy is denoted as $\varepsilon = \sum_{i=1}^{m_l}\sum_{j=1}^{n_l} A_k(i,j)/(m_l \times n_l)$, where m_l and n_l are the height and width of the local image patch, respectively. Here, we exploit the directional information of each pixel position to develop the directional NLM (D-NLM) regularization term for better weight estimation. For simplicity, we convert each directional feature map into a column vector, i.e., $A_k = \{A_{k,i}\}_{i=1}^{M}$ (M = m × n, 1 ≤ k ≤ 16). For each pixel position, we obtain its directional feature vector, denoted as $V_i = [A_{1,i}, A_{2,i}, \cdots, A_{16,i}]^T$, where T represents the transpose operation. Mathematically, the D-NLM

similarity weights can be computed by
$$w_{ij} = \exp\left(-\frac{\|C_i X - C_j X\|^2}{\rho_N^2} - \frac{\|V_i - V_j\|^2}{\rho_V^2}\right), \qquad (13)$$

where C_i is the operator extracting the local image patch of size z × z centered on pixel x_i in the given image X of size m × n, ‖·‖ stands for the 2-norm, and ρN and ρV are kernel parameters that balance the confidence of the pixel values and the directional features in the weight w_ij. D-NLM assigns larger weights to those pixels whose patches are not only close to the image patch C_i X but also have directional information similar to V_i. We find the neighbors with the k largest weights in a large search window of size r × r. The m × n image X is converted into a column vector, denoted as $C(X) = X_c = \{x_i\}_{i=1}^{M}$, where C is the function converting a matrix into a column vector. To precisely estimate a target pixel at position i from its neighbors, the target pixel can be updated by the following weighted summation:

$$\hat{x}_i = \frac{\sum_{j \in \Omega(x_i)} w_{ij}\, x_j}{\sum_{j \in \Omega(x_i)} w_{ij}}, \qquad (14)$$

where Ω(x_i) denotes the set of the k neighbors with the largest weights related to x_i, and x̂_i is the estimated pixel obtained by the proposed D-NLM method. Following [29], the D-NLM prior regularization term, which exploits the self-similarity property using both pixel values and directional information, can be formulated as


$$E_{\text{D-NLM}}(X) = \sum_{i=1}^{M}\|x_i - W_i X_c\|^2 = \|(I - W)X_c\|^2, \qquad (15)$$

where W_i is the ith row vector of size 1 × M, whose non-zero elements are the corresponding D-NLM weights related to x_i, and I is the M × M identity matrix. The full D-NLM weight matrix $W = \{W_{i,j}\}_{i,j=1}^{M}$ is defined as

$$W_{i,j} = \begin{cases} w_{ij}, & \text{if } j \in \Omega(x_i), \\ 0, & \text{otherwise.} \end{cases} \qquad (16)$$

Clearly, by combining the pixel similarity and the directional similarity in W, D-NLM is robust to noise and to structural variation in local regions; the D-NLM regularization term thus extracts the neighbors with the highest similarity and improves the precision of the estimation.
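A brute-force sketch of the weight construction in Eqs. (13), (14), and (16) is given below, assuming the directional feature vectors V_i are stacked in an (m, n, 16) array; it is an unoptimized illustration under the paper's settings (z = 5, r = 9, k = 10, ρN = 50, ρV = 30), not the authors' code:

```python
import numpy as np

def dnlm_weights(X, V, z=5, r=9, k=10, rho_n=50.0, rho_v=30.0):
    """Eqs. (13) and (16): sparse, row-normalized D-NLM weights.

    X : (m, n) image; V : (m, n, 16) per-pixel directional feature vectors.
    Returns a dict mapping a pixel's row index i to a list of (j, weight)
    pairs over its k best neighbors, normalized as in Eq. (14).
    """
    m, n = X.shape
    zh, rh = z // 2, r // 2
    Xp = np.pad(X, zh, mode='reflect')
    W = {}
    for i in range(m):
        for j in range(n):
            p0 = Xp[i:i + z, j:j + z]              # patch C_i X (padded index)
            cands = []
            for u in range(max(0, i - rh), min(m, i + rh + 1)):
                for v in range(max(0, j - rh), min(n, j + rh + 1)):
                    if (u, v) == (i, j):
                        continue
                    p1 = Xp[u:u + z, v:v + z]      # patch C_j X
                    w = np.exp(-np.sum((p0 - p1) ** 2) / rho_n ** 2
                               - np.sum((V[i, j] - V[u, v]) ** 2) / rho_v ** 2)
                    cands.append((w, u * n + v))
            cands.sort(reverse=True)               # keep the k largest weights
            top = cands[:k]
            total = sum(w for w, _ in top)
            W[i * n + j] = [(col, w / total) for w, col in top]
    return W
```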

D. Energy Function and Optimization

For single image SR recovery, the reconstruction error should be minimized to preserve high fidelity throughout the reconstruction process. Taking blurring and downsampling into account, the global reconstruction constraint can be formulated as

$$E_{\text{data}}(X) = \|Y - D(K \otimes X)\|^2, \qquad (17)$$

where D represents the downsampling operator, K stands for the blurring kernel, and ⊗ is the convolution operator. X and Y are the original HR image and the degraded LR image, respectively; Y has size (m/s) × (n/s), where s is the magnification factor, a positive value (e.g., s = 1.5 or 3). For natural images, we use a non-blind blurring kernel, generally defined as a b × b Gaussian

kernel with standard deviation σ. In solving the inverse SR problem, it is extremely difficult to estimate the underlying HR image X using only the reconstruction constraint; the problem is ill-posed because of the information loss caused by blurring and downsampling. Hence, it is necessary to employ regularization terms in the reconstruction process to make SR well-posed. In this paper, we incorporate the CTV and D-NLM regularization terms and estimate the desired HR image by solving


$$\hat{X} = \arg\min_X \left\{E_{\text{data}}(X) + \alpha\,\mathrm{CTV}(X) + \beta\,E_{\text{D-NLM}}(X)\right\}, \qquad (18)$$

where α and β are two trade-off parameters to control the contributions of the data fidelity term, the CTV regularization term, and the D-NLM regularization term. ˆ favors group sparsity in the main directions From Eq. (18), we can see that the HR estimation X

and self-similarity in terms of pixel values and directional information under the global reconstruction constraint. To solve the above SR reconstruction problem, we apply the TFOCS technique [23], which allows us to minimize an energy function for an arbitrary linear operator and efficiently compute a smoothed version of its dual problem. Since the designed energy function is a convex quadratic problem, it can be efficiently solved by using the adaptive accelerated first-order method [34]. Therefore, the optimization of Eq. (18) can be formulated into Eq. (19). DT , KT , CT , RTk , GTx and GTy are the inverse transformations of D, K, C, Rk , Gx and Gy , respectively. In addition, ∇ is the gradient operator. The optimization process takes a certain number of iterations to reach convergence. Denote T as the maximum number of iterations. The proposed single image SR algorithm is summarized in Algorithm 1.

E. Computational Complexity

Considering the main steps in Algorithm 1, the proposed reconstruction-based SR method comprises three parts: directional feature extraction, D-NLM weight computation, and energy function minimization. Suppose that the height and width of the target HR image are m and n, respectively. The computational complexity of directional feature extraction with the curvelet transform is O(mn(log m + log n)). The


Algorithm 1: Proposed Framework for Single Image Super-Resolution

Input: Low-resolution image Y, magnification factor s, threshold δ, search window length r, kernel function K, and maximum number of iterations T.
Output: Final HR estimation X̂.

Initialization:
1) t = 0; upscale Y by bicubic interpolation with factor s to obtain the initial HR estimation X^0;
2) Extract the directional features of X^0 using Eqs. (1) and (2) to construct {A_k}, k = 1, ..., 16;
3) Compute the localization matrices {B_k}, k = 1, ..., 16, with the predefined threshold δ.

Weight Computation:
4) for i = 1, 2, ..., M do
   a) For the target pixel x_i, compute the similarity weights using Eq. (13) within its r × r search window;
   b) Choose the k neighbors with the largest weights to construct the row vector W_i;
   end
   Obtain the similarity weight matrix W.

Reconstruction:
for t = 1, 2, ..., T do
   5) X^t = X^{t-1}; calculate the data fidelity term on X^t using Eq. (17) with the non-blind kernel K;
   6) Calculate the CTV regularization term using Eq. (6);
   7) Calculate the D-NLM regularization term using Eq. (15);
   8) Update the HR estimation X^t by solving Eq. (19); t = t + 1;
end

Result: Obtain the final HR estimation X̂ = X^T.
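Tying the previous sketches together, a compact driver for Algorithm 1 might look as follows; every helper here refers to one of the hypothetical sketches above, and `extract_directional_features`, `grad_ctv`, and `apply_dnlm_matrix` remain assumed stand-ins:

```python
import numpy as np
from scipy.ndimage import zoom

def super_resolve(Y, s=3, delta=0.5, T=200):
    """Skeleton of Algorithm 1 built from the hypothetical sketches above."""
    X = zoom(Y, s, order=3)                     # step 1: bicubic upscaling
    A = extract_directional_features(X)         # step 2: Eqs. (1)-(2)
    B = localization_matrices(A, delta)         # step 3: Eq. (5)
    V = np.moveaxis(A, 0, -1)                   # (m, n, 16) feature vectors
    W = dnlm_weights(X, V)                      # step 4: Eqs. (13), (16)
    dnlm = lambda Z: apply_dnlm_matrix(W, Z)    # assumed (I-W)^T(I-W) operator
    for _ in range(T):                          # steps 5-8: iterate Eq. (19)
        X = sr_step(X, Y, lambda Z: grad_ctv(Z, B), dnlm, s=s)
    return X
```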

computation of the D-NLM weight matrix depends on the search window size r × r, the image patch size z × z, the 16-dimensional directional features, and the number of selected neighbors k; it therefore takes O(mnr²(z² + 16)k) to evaluate the similarity weight matrix for all target pixels. From Eqs. (6), (15), (17), and (18), each iteration requires ten matrix multiplications and five matrix additions, which takes at most O(10m²n² + 5mn). Over T iterations, the computational complexity of the proposed method is O(mn(5T(2mn + 1) + log(mn) + r²(z² + 16)k)).


TABLE I
INFORMATION ON LR IMAGES AND RUNTIME (SECONDS) OBTAINED BY THE CTV-DNLM METHOD

Simulated low-resolution images
Name       Size       Time    | Name       Size       Time
Monarch    153 × 120  110.11  | Seeds      80 × 120   46.14
Parrots    85 × 85    35.18   | Tiger      120 × 80   46.24
Wall       66 × 105   35.20   | Grapes     96 × 137   61.57
Girl       86 × 85    36.84   | Oldman     88 × 66    28.31
Bike       85 × 85    38.51   | Discovery  78 × 118   47.09
Flower     85 × 85    36.81   | Lena       100 × 100  49.71

Actual low-resolution images
Name       Size       Time    | Name       Size       Time
Tower      191 × 166  205.88  | Text       107 × 191  120.96

III. EXPERIMENTAL RESULTS AND ANALYSIS

To evaluate the performance of the proposed method against other methods, we conduct experiments on a group of color images. We first convert the RGB channels of the color images to YCbCr channels and then carry out the SR reconstruction on the luminance component (i.e., the Y channel) alone; the remaining Cb and Cr components are simply upsampled by bicubic interpolation with magnification factor s. From among the various SR approaches, we select bicubic interpolation and six representative SR methods as comparison baselines: SKR [27], SC [7], ASDS [30], NLM-SKR [31], DTV [24], and ANR [17].¹ To comprehensively validate the effectiveness of the proposed method, we evaluate three variants of the proposed approach: 1) a method with only the CTV regularization term (denoted CTV); 2) a method with only the D-NLM regularization term (denoted DNLM); and 3) a method combining the CTV and D-NLM regularization terms (denoted CTV-DNLM). To compare against the conventional NLM, we add a fourth variant that incorporates the CTV and NLM regularization terms (denoted CTV-NLM).

¹ Note that the SC and ANR methods use 61 training images to construct a dictionary of 512 atoms; ASDS refers to ASDS-AR-NL-TD2; DTV is implemented without transform invariance; and the total number of iterations in the ASDS and NLM-SKR methods is set to 800.


TABLE II
PSNR (dB), SSIM, AND IFC RESULTS OF TWELVE SIMULATED LR IMAGES FOR 3× MAGNIFICATION WITH NOISE LEVEL η = 0
(for each image, the three rows give PSNR, SSIM, and IFC, respectively)

Images     Bicubic  SKR[27]  SC[7]    ASDS[30] NLM-SKR[31] DTV[24]  ANR[17]  CTV      DNLM     CTV-NLM  CTV-DNLM
Monarch    21.802   23.234   24.239   24.638   24.423      24.972   25.012   24.698   25.284   25.260   25.708
           0.8035   0.8284   0.8417   0.8710   0.8705      0.8859   0.8799   0.8713   0.8938   0.8954   0.8994
           2.011    2.814    2.897    3.050    3.261       3.435    3.419    3.366    3.492    3.531    3.624
Seeds      23.489   25.584   26.652   26.688   26.668      27.359   27.465   26.646   27.926   28.018   28.245
           0.6910   0.7277   0.7913   0.7884   0.7976      0.8217   0.8256   0.7823   0.8401   0.8418   0.8438
           1.445    2.298    2.753    2.808    2.891       3.083    3.097    2.854    3.247    3.296    3.328
Parrots    25.644   27.082   27.658   28.036   27.971      28.123   28.383   27.893   28.977   28.608   29.176
           0.8381   0.8571   0.8681   0.8742   0.8836      0.8920   0.8931   0.8788   0.9022   0.9005   0.9045
           1.674    2.397    2.475    2.492    2.786       2.951    2.922    2.803    3.038    3.054    3.116
Tiger      22.884   24.225   25.144   25.222   25.000      25.395   25.554   24.828   25.521   25.653   25.717
           0.6642   0.6818   0.7466   0.7403   0.7352      0.7637   0.7715   0.7273   0.7707   0.7762   0.7763
           1.457    2.182    2.634    2.603    2.618       2.812    2.903    2.697    2.878    2.958    2.965
Wall       27.316   28.596   28.762   28.799   28.536      29.043   29.246   28.972   28.823   29.221   29.196
           0.7108   0.7322   0.7527   0.7427   0.7452      0.7714   0.7806   0.7558   0.7720   0.7812   0.7789
           1.249    1.618    1.831    1.871    1.794       2.174    2.122    2.086    2.046    2.144    2.130
Grapes     25.690   27.241   28.505   28.338   28.572      28.984   29.336   28.566   29.444   29.533   29.710
           0.7861   0.7962   0.8396   0.8442   0.8574      0.8682   0.8740   0.8494   0.8805   0.8834   0.8845
           1.755    2.428    3.025    3.054    3.233       3.392    3.459    3.161    3.558    3.623    3.628
Girl       29.792   31.688   32.344   31.634   32.106      32.641   32.819   32.170   33.015   33.039   33.086
           0.7422   0.7594   0.7931   0.7655   0.7789      0.8039   0.8095   0.7839   0.8126   0.8132   0.8143
           1.293    1.945    2.311    2.073    2.263       2.516    2.562    2.361    2.628    2.654    2.652
Oldman     26.021   28.326   28.818   28.949   28.935      29.459   29.573   28.639   29.813   29.789   29.890
           0.7324   0.7666   0.7910   0.7973   0.7985      0.8224   0.8235   0.7955   0.8324   0.8314   0.8330
           1.540    2.353    2.575    2.716    2.827       2.937    2.971    2.734    3.116    3.130    3.141
Bike       20.882   22.099   22.577   23.017   23.007      23.178   23.286   22.585   23.717   23.465   23.800
           0.6207   0.6463   0.6983   0.7185   0.7254      0.7528   0.7471   0.6961   0.7735   0.7715   0.7776
           1.415    2.134    2.287    2.511    2.628       2.673    2.693    2.475    2.873    2.830    2.949
Discovery  27.612   29.485   30.323   30.080   30.505      30.825   30.956   30.339   31.339   31.255   31.399
           0.8253   0.8479   0.8696   0.8653   0.8770      0.8899   0.8916   0.8725   0.8997   0.8992   0.9001
           1.579    2.309    2.606    2.406    2.783       2.952    3.010    2.757    3.123    3.140    3.164
Flower     24.748   26.383   27.154   27.276   27.264      27.589   27.824   27.149   28.146   28.108   28.189
           0.7040   0.7378   0.7765   0.7834   0.7860      0.8122   0.8146   0.7792   0.8296   0.8290   0.8296
           1.274    1.987    2.244    2.367    2.372       2.586    2.605    2.379    2.732    2.735    2.752
Lena       26.118   29.636   29.945   30.105   29.104      30.603   30.937   30.206   31.544   31.508   31.639
           0.7736   0.8347   0.8406   0.8435   0.8470      0.8662   0.8715   0.8496   0.8825   0.8828   0.8829
           1.468    2.461    2.537    2.563    2.752       3.014    3.061    2.851    3.270    3.299    3.324
Average    25.167   26.965   27.677   27.732   27.674      28.181   28.366   27.724   28.629   28.622   28.813
           0.7410   0.7680   0.8007   0.8029   0.8085      0.8292   0.8319   0.8035   0.8408   0.8421   0.8437
           1.513    2.244    2.515    2.543    2.684       2.877    2.902    2.710    3.000    3.033    3.065


Fig. 5. Comparison of SR results (3× magnification) of noiseless (η = 0) Monarch image. The local magnification of ROI in (a) is presented in the top-right corner of each resultant image. (a) Original HR image. (b) SKR. (c) SC. (d) ASDS. (e) NLM-SKR. (f) DTV. (g) ANR. (h) Proposed CTV-DNLM method.

A. Experimental Settings

We set the magnification factor in all experiments to 3 (i.e., s = 3). To simulate the degradation process, we apply a non-blind 7 × 7 Gaussian kernel with standard deviation 1 (i.e., b = 7 and σ = 1), followed by additive white Gaussian noise with standard deviation η. For a fair comparison, all the baselines are run under exactly the same degradation model (i.e., Gaussian blurring followed by downsampling). Table I lists the twelve simulated images used in [31] and [35] and two actual images from the SUN database [36]. The runtime of each example in Table I is the time the proposed method takes to reconstruct the final HR image on an Intel Core 2 Duo 3.0-GHz CPU with 4 GB of memory in the MATLAB environment.

In the process of directional feature extraction, the chosen threshold δ is set to 0.5. For the computation


Fig. 6. Comparison of SR results (3× magnification) of noiseless (η = 0) Lena image. The local magnification of ROI in (a) is presented in the bottom-right corner of each resultant image. (a) Original HR image. (b) SKR. (c) SC. (d) ASDS. (e) NLM-SKR. (f) DTV. (g) ANR. (h) Proposed CTV-DNLM method.

of the D-NLM weight matrix, we set the search window and the image patch size to 9 × 9 and 5 × 5 (i.e., r = 9 and z = 5), respectively. The two kernel parameters are set as ρN = 50 and ρV = 30. In addition,

the number of neighbors is set to k = 10 within the search windows. For the energy function optimization, the regularization parameters α and β are set to 0.8 and 2.3, respectively, and the number of iterations is set to T = 200 to obtain the optimal solution of Eq. (18).

Quantitatively, the most widely used metrics are the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) [37]. In [38], single image SR methods are systematically benchmarked on many quantitative metrics; the benchmark shows that the information fidelity criterion (IFC) [39] is more effective at evaluating the information loss of the resultant images because it is sensitive to high-frequency details. The higher the IFC value, the better the SR quality. To avoid image border effects, we only compare the central part of the resultant images, i.e., X̂(v + 1 : m − v, v + 1 : n − v), where v = 3. The empirical study of the crucial parameters is presented in Sec. III-F.
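For reproducibility, the degradation model and the cropped-PSNR evaluation just described can be sketched as follows; this is a simplified illustration of the stated settings, with an assumed luminance peak of 255:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def degrade(X, s=3, sigma=1.0, eta=0.0, b=7, seed=0):
    """Blur with a b x b Gaussian kernel (std sigma), decimate by s, add noise."""
    blurred = gaussian_filter(X, sigma, truncate=(b // 2) / sigma)  # ~b x b support
    Y = blurred[::s, ::s]
    rng = np.random.default_rng(seed)
    return Y + eta * rng.standard_normal(Y.shape)

def psnr_cropped(X_hat, X_true, v=3, peak=255.0):
    """PSNR on the central part X(v+1 : m-v, v+1 : n-v), as in the text."""
    d = X_hat[v:-v, v:-v].astype(float) - X_true[v:-v, v:-v].astype(float)
    return 10.0 * np.log10(peak ** 2 / np.mean(d ** 2))
```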


Fig. 7. SR results (3× magnification) of the noiseless (η = 0) Bike image with the alternative approaches. The local magnification of the ROIs is shown at the bottom of each resultant image. (a) CTV (PSNR=22.585dB, SSIM=0.6961, IFC=2.475). (b) DNLM (PSNR=23.717dB, SSIM=0.7735, IFC=2.873). (c) CTV-DNLM (PSNR=23.800dB, SSIM=0.7776, IFC=2.949).

B. Results on Noiseless Images

Twelve simulated noiseless images (i.e., noise level η = 0) are used to compare the proposed approaches with several state-of-the-art SR methods. The objective assessments in terms of PSNR, SSIM, and IFC are presented in Table II. The proposed method achieves better SR reconstruction quality than the others on all images except Wall and Girl, and it outperforms all the comparison baselines in terms of average values. To further demonstrate visual quality, the SR results with a magnification factor of 3 on the Monarch and Lena images are presented in Figs. 5 and 6, respectively. The region of interest (ROI) in each resultant image is magnified by bicubic interpolation with a factor of 2 and shown in the corners to illustrate the high-frequency details in the different images.

Overall, bicubic interpolation gives the worst SR performance despite being very fast. Although the SKR method [27] performs better than bicubic interpolation, it is still clearly inferior to the proposed method because SKR often generates blurred artifacts along the edges. With the help of the sparse dictionary, the SC method estimates more high-frequency information than bicubic interpolation and SKR; however, it also introduces unwanted artifacts when the input image is not well represented by the training examples. The ASDS method improves visual quality and effectively suppresses noise, but the high-frequency details in the resultant images are somewhat smoothed, especially


Fig. 8. Comparison of runtime (seconds) with ASDS, NLM-SKR, and the proposed CTV-DNLM method.

along the edges. The NLM-SKR method performs well both in reproducing natural-looking details and in preserving sharp edges, but blurred artifacts still appear along the edges. The DTV method resolves images with sharp edges in the horizontal and vertical directions, yet it is prone to producing noticeably jaggy artifacts in the textural regions. Given its sparse learned dictionaries, ANR achieves quality similar to that of DTV, although it still fails to recover sharper edges. In addition, Fig. 7 demonstrates the SR quality of the Bike image using the different variants of our approach. As shown in the local magnification of the ROIs in each resultant image, the proposed method incorporating the CTV and D-NLM regularization terms achieves competitive results in super-resolving plausible details in the textural regions and in suppressing artifacts along the sharp edges. Compared with the baselines, the quantitative and qualitative results obtained by the proposed CTV-DNLM method are better, largely because the directional group sparsity of sharp edges is preserved and the weights are better estimated by the D-NLM regularization term.

To validate the computational complexity, we compare the runtime of the proposed method with two representative reconstruction-based SR methods, ASDS and NLM-SKR, as shown in Fig. 8. The proposed approach consistently requires less runtime than the other two methods in all cases.

C. Results on Noisy Images

In this subsection, we repeat the experiments on simulated noisy images to validate the robustness of the proposed method against noise. We set the noise level η to 5 to mimic real image noise.


TABLE III
PSNR (dB), SSIM, AND IFC RESULTS OF TWELVE SIMULATED LR IMAGES FOR 3× MAGNIFICATION WITH NOISE LEVEL η = 5
(for each image, the three rows give PSNR, SSIM, and IFC, respectively)

Images     Bicubic  SKR[27]  SC[7]    ASDS[30] NLM-SKR[31] DTV[24]  ANR[17]  CTV      DNLM     CTV-NLM  CTV-DNLM
Monarch    21.631   23.143   23.844   24.456   23.881      24.548   24.530   24.464   24.736   24.714   25.149
           0.7405   0.8114   0.7567   0.8267   0.8167      0.8173   0.7925   0.8329   0.8052   0.8044   0.8135
           1.211    2.007    1.754    2.101    1.935       2.015    1.943    2.098    2.053    2.033    2.117
Seeds      23.250   25.443   25.944   26.378   26.308      26.620   26.664   26.331   27.034   27.144   27.323
           0.6595   0.7172   0.7430   0.7741   0.7649      0.7776   0.7772   0.7612   0.7887   0.7925   0.7964
           1.266    2.048    2.134    2.364    2.266       2.354    2.347    2.360    2.494    2.493    2.535
Parrots    25.265   26.850   26.869   27.769   27.259      27.443   27.458   27.634   27.941   27.700   28.205
           0.7591   0.8343   0.7598   0.8258   0.8223      0.8129   0.7848   0.8367   0.7912   0.7976   0.8243
           1.023    1.670    1.381    1.732    1.568       1.562    1.527    1.653    1.607    1.574    1.651
Tiger      22.667   24.093   24.667   24.994   24.632      24.959   25.049   24.870   25.009   25.125   25.211
           0.6270   0.6707   0.6914   0.7227   0.7024      0.7209   0.7172   0.7110   0.7153   0.7215   0.7248
           1.175    1.833    1.940    2.143    1.928       2.073    2.097    2.074    2.079    2.113    2.140
Wall       26.739   28.310   27.755   28.506   27.716      28.199   28.127   28.479   27.962   28.271   28.337
           0.6558   0.7122   0.6726   0.7230   0.6998      0.7117   0.6990   0.7220   0.6927   0.7081   0.7123
           0.866    1.231    1.204    1.396    1.169       1.348    1.293    1.361    1.286    1.343    1.354
Grapes     25.321   26.985   27.526   27.892   27.860      28.036   28.170   27.665   28.240   28.307   28.454
           0.7453   0.7830   0.7812   0.8244   0.8174      0.8223   0.8129   0.8166   0.8182   0.8207   0.8246
           1.513    2.134    2.234    2.491    2.395       2.475    2.441    2.481    2.513    2.526    2.557
Girl       28.845   31.106   30.289   31.090   30.880      30.865   30.606   31.231   30.657   30.805   31.039
           0.6816   0.7426   0.7038   0.7415   0.7354      0.7361   0.7218   0.7491   0.7213   0.7279   0.7394
           0.847    1.365    1.260    1.459    1.309       1.363    1.347    1.421    1.378    1.392    1.418
Oldman     25.639   28.031   27.803   28.521   28.158      28.432   28.425   28.391   28.582   28.675   28.766
           0.6900   0.7528   0.7324   0.7723   0.7601      0.7703   0.7636   0.7699   0.7681   0.7729   0.7782
           1.208    2.005    1.836    2.139    1.972       2.029    2.010    2.080    2.083    2.101    2.147
Bike       20.747   22.013   22.317   22.833   22.687      22.882   22.985   22.647   23.301   23.115   23.424
           0.5956   0.6365   0.6608   0.7046   0.6975      0.7160   0.7082   0.6863   0.7304   0.7273   0.7355
           1.275    1.967    1.955    2.220    2.177       2.218    2.231    2.193    2.375    2.314    2.411
Discovery  26.987   29.115   28.932   29.610   29.388      29.445   29.377   29.588   29.550   29.654   29.794
           0.7497   0.8279   0.7639   0.8296   0.8168      0.8026   0.7875   0.8276   0.7906   0.7973   0.8185
           0.983    1.648    1.458    1.723    1.568       1.581    1.580    1.649    1.626    1.642    1.665
Flower     24.484   26.174   26.438   26.965   26.678      26.865   26.996   26.640   27.279   27.233   27.350
           0.6585   0.7228   0.7096   0.7618   0.7469      0.7561   0.7479   0.7449   0.7564   0.7593   0.7633
           1.115    1.787    1.700    1.972    2.332       2.446    2.439    2.386    2.545    2.563    2.595
Lena       25.717   29.198   28.648   29.641   28.461      29.283   29.400   29.458   29.733   29.749   30.008
           0.7108   0.8141   0.7497   0.8193   0.7980      0.7920   0.7809   0.8105   0.7858   0.7898   0.8023
           0.967    1.834    1.536    1.862    1.691       1.706    1.712    1.781    1.815    1.810    1.855
Average    24.774   26.705   26.753   27.388   26.992      27.298   27.315   27.283   27.502   27.541   27.755
           0.6895   0.7521   0.7271   0.7772   0.7648      0.7697   0.7578   0.7724   0.7637   0.7683   0.7780
           1.115    1.787    1.700    1.963    1.812       1.881    1.865    1.915    1.938    1.940    1.985


Fig. 9. Comparison of SR results (3× magnification) of noisy (η = 5) Flower image. The local magnification of ROI in (a) is presented in the bottom-right corner of each resultant image. (a) Original HR image. (b) SKR. (c) SC. (d) ASDS. (e) NLM-SKR. (f) DTV. (g) ANR. (h) Proposed CTV-DNLM method.

Table III shows the quantitative assessments in terms of PSNR, SSIM, and IFC obtained by the different methods. In the noisy setting, the proposed method is again superior to the other SR approaches in terms of average metric values. Figs. 9 and 10 compare the existing methods on the noisy Flower and Seeds images, respectively. Although SKR is insensitive to noise, its resultant images are over-smoothed and fail to recover high-frequency details. SC estimates the high-frequency details from external examples, but it fails to suppress noise. Even though ASDS lowers the noise level, it unfortunately produces blurred artifacts along the edges. In contrast, most of the edges are preserved by NLM-SKR, but the details in the textural regions are lost. Due to the nature of DTV, jaggy artifacts can be observed in directions that are neither horizontal nor vertical. ANR achieves good quality in super-resolving high-frequency details but, like SC, is sensitive to noise. In contrast, the proposed CTV-DNLM method is robust against noise and is capable of both preserving sharp edges and recovering textural details, partly because the CTV and D-NLM regularization terms strike a balance between suppressing noise and recovering plausible details.


Fig. 10. Comparison of SR results (3× magnification) of noisy (η = 5) Seeds image. The local magnification of ROI in (a) is presented in the top-left corner of each resultant image. (a) Original HR image. (b) SKR. (c) SC. (d) ASDS. (e) NLM-SKR. (f) DTV. (g) ANR. (h) Proposed CTV-DNLM method.

D. Results on Actual Images

The simulated LR images in the previous experiments cannot fully validate the effectiveness of the proposed method, since the blur model used in simulation is difficult to match to a real degradation process. In view of this, we repeat the SR experiments on actual LR images to illustrate the robustness and feasibility of the proposed method in a real acquisition environment. Two ROIs selected from the Tower and Text images are presented in Fig. 11, and Figs. 12 and 13 show the visual SR quality of the comparison baselines and the proposed method with a magnification factor of 3. In this experiment, we focus mainly on qualitative comparisons, because there is no appropriate metric for quantifying SR quality when the true HR images are unavailable. Hence, we adopt the paired-comparison user study of [40] to evaluate the SR performance of the different methods. Twenty-two participants, aged 23 to 31, are each presented with a pair of super-resolved images for each ROI, produced by two different methods, and asked to select the image in each pair that has finer textural details and sharper edges. To avoid bias, the image pairs are randomly ordered. For u methods under comparison, the total number of pairs is u × (u − 1)/2; thus, there are 21 paired comparisons for each ROI in this paper.

The preference matrices, shown in subfigure (h) of Figs. 12 and 13, summarize the participants' preferences. For instance, the value in the row "SC" and the column


Fig. 11. Two ROIs selected from the Tower and Text images.

Fig. 12. Comparison of SR results (3× magnification) on the Tower image. The local magnification of the ROIs is presented in the bottom-left corner of each resultant image. (a) SKR. (b) SC. (c) ASDS. (d) NLM-SKR. (e) DTV. (f) ANR. (g) Proposed CTV-DNLM method. (h) Preference matrix.

Fig. 13. Comparison of SR results (3× magnification) on the Text image. The local magnification of the ROI is presented at the bottom of each resultant image. (a) SKR. (b) SC. (c) ASDS. (d) NLM-SKR. (e) DTV. (f) ANR. (g) Proposed CTV-DNLM method. (h) Preference matrix.

For instance, the value in the row “SC” and the column “SKR” is 18, which means that 18 participants preferred the image obtained by SC, whereas only four considered that SKR performed better. Since each pair contains two different approaches, the diagonal of the preference matrix is marked with “−”. The “Score” column is the sum of the entries in the corresponding row, and a higher score indicates better perceived SR reconstruction quality. In Fig. 12(h), the score of ASDS is the lowest, partly because it cannot estimate the fine details in the textural region. Overall, the scores of the different methods on the Tower and Text images suggest that the participants prefer our results, owing to the finer textural details and the fewer noticeable artifacts along the sharp edges.
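The scoring rule can be made concrete with a short sketch; here P[i, j] counts the participants who preferred method i over method j (so P[i, j] + P[j, i] = 22), and all entries except the SC-versus-SKR example are illustrative placeholders.

# A sketch of scoring a preference matrix. Only the SC-vs-SKR entry from
# the text is real; the remaining entries would be filled from the study.
import numpy as np

methods = ["SKR", "SC", "ASDS", "NLM-SKR", "DTV", "ANR", "CTV-DNLM"]
P = np.zeros((7, 7), dtype=int)    # diagonal stays 0 ("−" in the paper)
P[1, 0] = 18                       # 18 participants preferred SC over SKR
P[0, 1] = 22 - 18                  # so only 4 preferred SKR over SC

scores = P.sum(axis=1)             # row sums: total wins of each method
for name, s in zip(methods, scores):
    print(f"{name}: {s}")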

E. Statistical Results on a 200-Image Dataset

To thoroughly test the applicability of the proposed SR method, we perform a statistical experiment on 200 natural images selected from the BSDS500 image database [41]. From Sec. III-B, we observe that, among the state-of-the-art methods, DTV and ANR achieve the second and third best results on the simulated noiseless LR images; we therefore select these two methods as comparison baselines. Fig. 14 shows the probability distributions of the PSNR and IFC gains of the proposed method over the two competing methods at noise level η = 0. The statistical results demonstrate that the proposed method consistently outperforms the DTV and ANR methods. Specifically, the gains over DTV are positive with total probability 95.5% and 82.5% in terms of the PSNR and IFC values, respectively, while the corresponding probabilities for the gains over ANR are 98.5% and 96.0%. This further indicates the superiority of the proposed method over the competing methods.

Fig. 14. Probability distributions of the PSNR and IFC gains on the 200-image dataset at noise level η = 0. (a) PSNR values. (b) IFC values.
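A sketch of how the per-image gains behind Fig. 14 can be tabulated is given below; psnr_ours and psnr_baseline stand for hypothetical arrays of per-image scores over the 200 images, and the binning is illustrative.

# A sketch of tabulating per-image gains over a baseline and the fraction
# of images with a positive gain; the input arrays are hypothetical.
import numpy as np

def gain_statistics(psnr_ours, psnr_baseline, bins=20):
    gains = np.asarray(psnr_ours) - np.asarray(psnr_baseline)
    hist, edges = np.histogram(gains, bins=bins)
    prob = hist / gains.size            # probability distribution of gains
    frac_positive = np.mean(gains > 0)  # e.g., 0.955 vs. DTV in PSNR
    return prob, edges, frac_positive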



F. Empirical Study on Parameters

In this subsection, we discuss how to choose appropriate parameters for the proposed SR framework. First, we conduct parametric experiments on the BSDS500 database by varying the two regularization parameters, α from 0.8 to 2 and β from 1.5 to 3, both in steps of 0.1. Fig. 15(a) shows the surfaces of the average PSNR on the twelve simulated noiseless images. Based on this empirical study, we set α = 0.8 and β = 2.3, where the average PSNR reaches its peak. The construction of the D-NLM weight matrix involves two filter parameters, ρN and ρV. Fig. 15(b) shows the average PSNR values obtained by tuning ρN from 1 to 90 and ρV from 20 to 70, both in steps of 10, while the other parameter values are kept unchanged (i.e., at the values given in Sec. III-A). The average PSNR is highest at ρN = 50 and ρV = 30.

Since the search window size and the local patch size may affect the estimation of the similarity weights, we repeat the experiments to validate the choice of these two side lengths. Fig. 16(a) shows the curves of the average PSNR obtained by varying r (the search window size) and z (the local patch size) from 5 to 17 in steps of 2. The results suggest that r = 9 and z = 5 is a good choice.

In the TFOCS framework, the number of iterations T significantly affects the runtime of the SR process. When T is too small, the optimization procedure may fail to converge and thus yields poor SR performance; when T is too large, the extra iterations cost more runtime while bringing little or no improvement in SR quality. To balance SR quality against reconstruction time, we analyze the relationship between the average PSNR and T, and between the average SSIM and T, with T varied from 20 to 300 in steps of 20. As illustrated in Fig. 16(b), once the number of iterations exceeds 200 there is little improvement in either metric. Therefore, we set T = 200 in the experiments for a reliable and stable solution. Sketches of the parameter grid search and the iteration study are given below.

Fig. 15. Effect of the regularization parameters and filter parameters on the SR quality in terms of average PSNR. (a) Varying α from 0.8 to 2 with step 0.1 and β from 1.5 to 3 with step 0.1. (b) Varying ρN from 1 to 90 with step 10 and ρV from 20 to 70 with step 10.

Fig. 16. Effect of the search window size and patch size on weight estimation, and of the maximum number of iterations on the SR performance. (a) Varying r and z from 5 to 17 with step 2, in terms of average PSNR. (b) Varying T from 20 to 300 with step 20, in terms of average PSNR and average SSIM.
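First, the parameter search described above can be summarized by the following sketch; reconstruct and psnr are hypothetical placeholders for the CTV-DNLM solver and the evaluation metric, and only the (α, β) grid is shown, the (ρN, ρV) grid being analogous.

# A sketch of the regularization-parameter grid search; `reconstruct`
# and `psnr` are hypothetical stand-ins for the solver and the metric.
import numpy as np

alphas = np.arange(0.8, 2.0 + 1e-9, 0.1)
betas  = np.arange(1.5, 3.0 + 1e-9, 0.1)

def grid_search(lr_images, hr_images, reconstruct, psnr):
    best = (None, None, -np.inf)
    for a in alphas:
        for b in betas:
            avg = np.mean([psnr(reconstruct(lr, alpha=a, beta=b), hr)
                           for lr, hr in zip(lr_images, hr_images)])
            if avg > best[2]:
                best = (a, b, avg)
    return best   # reported optimum: alpha = 0.8, beta = 2.3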

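Similarly, the iteration study behind Fig. 16(b) can be sketched as follows; solve_sr, psnr, and ssim are hypothetical placeholders for the TFOCS-based reconstruction and the quality metrics.

# A sketch of the iteration study: run the solver with increasing T,
# record the average PSNR/SSIM, and stop increasing T once extra
# iterations no longer improve the metrics.
import numpy as np

def iteration_study(lr_images, hr_images, solve_sr, psnr, ssim):
    curves = []
    for T in range(20, 301, 20):
        outs = [solve_sr(lr, max_iters=T) for lr in lr_images]
        curves.append((T,
                       np.mean([psnr(o, hr) for o, hr in zip(outs, hr_images)]),
                       np.mean([ssim(o, hr) for o, hr in zip(outs, hr_images)])))
    return curves   # gains flatten beyond T = 200, hence T = 200 is used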

IV. CONCLUSION

In this paper, we propose a novel reconstruction-based method for single-image super-resolution that exploits both the directional group sparsity of the image gradients and self-similarity measured with directional features. By exploiting the directional information along 16 different directions, the CTV regularizer preserves sharp edges. Both the pixel values of local image patches and the directional features at each pixel position are used to estimate the similarity weights and to select the neighbors that are most similar to the target pixel; as a result, the D-NLM regularization term reduces aliasing and jaggy artifacts in the final images. The proposed method incorporates the CTV and D-NLM regularization terms, together with the global reconstruction constraint, into an iterative TFOCS framework to solve the ill-posed SR problem and obtain the optimal solution. Comprehensive quantitative and qualitative experimental results indicate the effectiveness of the proposed method over state-of-the-art methods. In the future, the proposed method will be extended in the following directions: 1) designing a more effective feature extraction scheme for preserving sharp edges, such as deep features [42] and opponent features [43]; 2) combining other powerful learning algorithms (e.g., [44], [45], [46], and [47]) to estimate more plausible details; and 3) accelerating the SR reconstruction for real-time applications using the methods of [48] and [49].



REFERENCES

[1] S. C. Park, M. K. Park, and M. G. Kang, "Super-resolution image reconstruction: A technical overview," IEEE Signal Process. Mag., vol. 20, no. 3, pp. 21–36, May 2003.
[2] X. Li and M. T. Orchard, "New edge-directed interpolation," IEEE Trans. Image Process., vol. 10, no. 10, pp. 1521–1527, Oct. 2001.
[3] X. Zhang and X. Wu, "Image interpolation by adaptive 2-D autoregressive modeling and soft-decision estimation," IEEE Trans. Image Process., vol. 17, no. 6, pp. 887–896, Jun. 2008.
[4] D. Tao, L. Jin, Y. Wang, and X. Li, "Person reidentification by minimum classification error-based KISS metric learning," IEEE Trans. Cybern., vol. 45, no. 2, pp. 242–252, Feb. 2015.
[5] C. Xu, D. Tao, and C. Xu, "Multi-view intact space learning," IEEE Trans. Pattern Anal. Mach. Intell., vol. PP, no. 99, 2015.
[6] W. Freeman, T. Jones, and E. Pasztor, "Example-based super-resolution," IEEE Comput. Graph. Appl., vol. 22, no. 2, pp. 56–65, Mar. 2002.



[7] J. Yang, J. Wright, T. Huang, and Y. Ma, "Image super-resolution via sparse representation," IEEE Trans. Image Process., vol. 19, no. 11, pp. 2861–2873, Nov. 2010.
[8] M. Song, C. Chen, J. Bu, and T. Sha, "Image-based facial sketch-to-photo synthesis via online coupled dictionary learning," Inf. Sci., vol. 193, pp. 233–246, 2012.
[9] K. Zhang, X. Gao, D. Tao, and X. Li, "Multi-scale dictionary for single image super-resolution," in Proc. IEEE Conf. Comput. Vision Pattern Recog., 2012, pp. 1114–1121.
[10] K. Ni and T. Nguyen, "Image superresolution using support vector regression," IEEE Trans. Image Process., vol. 16, no. 6, pp. 1596–1610, Jun. 2007.
[11] K. I. Kim and Y. Kwon, "Single-image super-resolution using sparse regression and natural image prior," IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 6, pp. 1127–1133, Jun. 2010.
[12] D. Tao, L. Jin, W. Liu, and X. Li, "Hessian regularized support vector machines for mobile image annotation on the cloud," IEEE Trans. Multimedia, vol. 15, no. 4, pp. 833–844, Jun. 2013.
[13] W. Wu, Z. Liu, and X. He, "Learning-based super resolution using kernel partial least squares," Image Vision Comput., vol. 29, no. 6, pp. 394–406, 2011.



[14] D. Tao, X. Lin, L. Jin, and X. Li, "Principal component 2-D long short-term memory for font recognition on single Chinese characters," IEEE Trans. Cybern., vol. PP, no. 99, 2015.
[15] K. Zhang, X. Gao, X. Li, and D. Tao, "Partially supervised neighbor embedding for example-based image super-resolution," IEEE J. Sel. Topics Signal Process., vol. 5, no. 2, pp. 230–239, Apr. 2011.
[16] X. Gao, K. Zhang, D. Tao, and X. Li, "Image super-resolution with sparse neighbor embedding," IEEE Trans. Image Process., vol. 21, no. 7, pp. 3194–3205, 2012.



[17] R. Timofte, V. De Smet, and L. Van Gool, "Anchored neighborhood regression for fast example-based super-resolution," in Proc. IEEE Int. Conf. Comput. Vision, 2013, pp. 1920–1927.
[18] C. Xu, D. Tao, and C. Xu, "Large-margin multi-view information bottleneck," IEEE Trans. Pattern Anal. Mach. Intell., vol. 36, no. 8, pp. 1559–1572, Aug. 2014.
[19] Y.-W. Tai, S. Liu, M. Brown, and S. Lin, "Super resolution using edge prior and single image detail synthesis," in Proc. IEEE Conf. Comput. Vision Pattern Recog., Jun. 2010, pp. 2400–2407.
[20] K. Zhang, G. Mu, Y. Yuan, X. Gao, and D. Tao, "Video super-resolution with 3D adaptive normalized convolution," Neurocomputing, vol. 94, pp. 140–151, 2012.
[21] A. Marquina and S. J. Osher, "Image super-resolution by TV-regularization and Bregman iteration," J. Sci. Comput., vol. 37, no. 3, pp. 367–382, Dec. 2008.


[22] T. Michaeli and M. Irani, "Nonparametric blind super-resolution," in Proc. IEEE Int. Conf. Comput. Vision, Dec. 2013, pp. 945–952.
[23] S. Becker, E. J. Candès, and M. C. Grant, "Templates for convex cone problems with applications to sparse signal recovery," Math. Program. Comput., vol. 3, no. 3, pp. 165–218, 2011.
[24] C. Fernandez-Granda and E. J. Candès, "Super-resolution via transform-invariant group-sparse regularization," in Proc. IEEE Int. Conf. Comput. Vision, Dec. 2013, pp. 3336–3343.
[25] D. Tao, L. Jin, Y. Wang, and X. Li, "Rank preserving discriminant analysis for human behavior recognition on wireless sensor networks," IEEE Trans. Ind. Informat., vol. 10, no. 1, pp. 813–823, Feb. 2014.
[26] D. Tao, J. Cheng, X. Lin, and J. Yu, "Local structure preserving discriminative projections for RGB-D sensor-based scene classification," Inf. Sci., 2015.
[27] H. Takeda, S. Farsiu, and P. Milanfar, "Kernel regression for image processing and reconstruction," IEEE Trans. Image Process., vol. 16, no. 2, pp. 349–366, Feb. 2007.
[28] M. Protter, M. Elad, H. Takeda, and P. Milanfar, "Generalizing the nonlocal-means to super-resolution reconstruction," IEEE Trans. Image Process., vol. 18, no. 1, pp. 36–51, Jan. 2009.
[29] K. Zhang, X. Gao, D. Tao, and X. Li, "Single image super-resolution with multiscale similarity learning," IEEE Trans. Neural Netw. Learn. Syst., vol. 24, no. 10, pp. 1648–1659, Oct. 2013.
[30] W. Dong, D. Zhang, G. Shi, and X. Wu, "Image deblurring and super-resolution by adaptive sparse domain selection and adaptive regularization," IEEE Trans. Image Process., vol. 20, no. 7, pp. 1838–1857, Jul. 2011.
[31] K. Zhang, X. Gao, D. Tao, and X. Li, "Single image super-resolution with non-local means and steering kernel regression," IEEE Trans. Image Process., vol. 21, no. 11, pp. 4544–4556, Nov. 2012.
[32] J. Yu, X. Gao, D. Tao, X. Li, and K. Zhang, "A unified learning framework for single image super-resolution," IEEE Trans. Neural Netw. Learn. Syst., vol. 25, no. 4, pp. 780–792, Apr. 2014.
[33] E. J. Candès and D. L. Donoho, "Curvelets: A surprisingly effective nonadaptive representation of objects with edges," in Curve and Surface Fitting, A. Cohen, C. Rabut, and L. L. Schumaker, Eds. Nashville, TN: Vanderbilt University Press, 2000, pp. 105–120.
[34] G. Lan, Z. Lu, and R. D. C. Monteiro, "Primal-dual first-order methods with O(1/ε) iteration-complexity for cone programming," Math. Program., vol. 126, no. 1, pp. 1–29, Jan. 2011.
[35] Y. Zhu, Y. Zhang, and A. L. Yuille, "Single image super-resolution using deformable patches," in Proc. IEEE Conf. Comput. Vision Pattern Recog., Jun. 2014, pp. 2917–2924.
[36] J. Xiao, J. Hays, K. Ehinger, A. Oliva, and A. Torralba, "SUN database: Large-scale scene recognition from abbey to zoo," in Proc. IEEE Conf. Comput. Vision Pattern Recog., Jun. 2010, pp. 3485–3492.
[37] X. Gao, W. Lu, D. Tao, and X. Li, "Image quality assessment based on multiscale geometric analysis," IEEE Trans. Image Process., vol. 18, no. 7, pp. 1409–1423, Jul. 2009.
[38] C.-Y. Yang, C. Ma, and M.-H. Yang, "Single-image super-resolution: A benchmark," in Computer Vision – ECCV 2014, ser. Lecture Notes in Computer Science, 2014, vol. 8692, pp. 372–386.
[39] H. Sheikh, A. Bovik, and G. de Veciana, "An information fidelity criterion for image quality assessment using natural scene statistics," IEEE Trans. Image Process., vol. 14, no. 12, pp. 2117–2128, Dec. 2005.
[40] M. Song, D. Tao, C. Chen, X. Li, and C. W. Chen, "Color to gray: Visual cue preservation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 9, pp. 1537–1552, Sep. 2010.
[41] D. Martin, C. Fowlkes, D. Tal, and J. Malik, "A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics," in Proc. IEEE Int. Conf. Comput. Vision, vol. 2, Jul. 2001, pp. 416–423.


[42] M. Jaderberg, A. Vedaldi, and A. Zisserman, "Deep features for text spotting," in Computer Vision – ECCV 2014, ser. Lecture Notes in Computer Science, 2014, vol. 8692, pp. 512–528.
[43] I. Alexiou and A. Bharath, "Spatio-chromatic opponent features," in Computer Vision – ECCV 2014, ser. Lecture Notes in Computer Science, 2014, vol. 8693, pp. 81–95.
[44] B. Geng, D. Tao, C. Xu, L. Yang, and X.-S. Hua, "Ensemble manifold regularization," IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 6, pp. 1227–1233, Jun. 2012.
[45] S. Cai, W. Zuo, L. Zhang, X. Feng, and P. Wang, "Support vector guided dictionary learning," in Computer Vision – ECCV 2014, ser. Lecture Notes in Computer Science, 2014, vol. 8692, pp. 624–639.
[46] C. Bao, Y. Quan, and H. Ji, "A convergent incoherent dictionary learning algorithm for sparse coding," in Computer Vision – ECCV 2014, ser. Lecture Notes in Computer Science, 2014, vol. 8694, pp. 302–316.
[47] X. Chen, D. Zou, J. Li, X. Cao, Q. Zhao, and H. Zhang, "Sparse dictionary learning for edit propagation of high-resolution images," in Proc. IEEE Conf. Comput. Vision Pattern Recog., Jun. 2014, pp. 2854–2861.
[48] E. Strekalovskiy and D. Cremers, "Real-time minimization of the piecewise smooth Mumford-Shah functional," in Computer Vision – ECCV 2014, ser. Lecture Notes in Computer Science, 2014, vol. 8690, pp. 127–141.
[49] J. Herling and W. Broll, "High-quality real-time video inpainting with PixMix," IEEE Trans. Vis. Comput. Graphics, vol. 20, no. 6, pp. 866–879, Jun. 2014.


Xiaoyan Li received the B.Eng. degree in communication engineering from Southwest Jiaotong University (SWJTU), Chengdu, China, in 2009. She is currently pursuing the Ph.D. degree in communication and information systems at SWJTU. From 2012 to 2014, she was a visiting Ph.D. student with the University of Technology, Sydney (UTS), Australia. Her research interests include computer vision, pattern recognition, and image super-resolution reconstruction.

Hongjie He received the B.Sc. degree from Henan Normal University, Xinxiang, China, and the Ph.D. degree in signal and information processing from Southwest Jiaotong University (SWJTU), Chengdu, China. She is currently a Professor with the School of Information Science and Technology, SWJTU. Her research interests are in the areas of image processing and information hiding.

Ruxin Wang received the B.Eng. degree from Xidian University, China, in 2010, and the M.Sc. degree from Huazhong University of Science and Technology, China, in 2013. He is currently pursuing the Ph.D. degree at the University of Technology, Sydney (UTS), Australia. His research interests lie primarily in deep learning, image restoration, and computer vision.

Dacheng Tao (F'15) is a Professor of Computer Science with the Centre for Quantum Computation & Intelligent Systems and the Faculty of Engineering and Information Technology at the University of Technology, Sydney. He mainly applies statistics and mathematics to data analytics, and his research interests span computer vision, data science, image processing, machine learning, neural networks, and video surveillance. His research results are expounded in one monograph and more than 100 publications in prestigious journals and at prominent conferences, such as IEEE T-PAMI, T-NNLS, T-IP, JMLR, IJCV, NIPS, ICML, CVPR, ICCV, ECCV, AISTATS, ICDM, and ACM SIGKDD, with several best paper awards, including the best theory/algorithm paper runner-up award at IEEE ICDM'07, the best student paper award at IEEE ICDM'13, and the 2014 ICDM 10-Year Highest-Impact Paper Award.
