
Frame Rate Up Conversion Based on Variational Image Fusion

Won Hee Lee, Student Member, IEEE, Kyuha Choi, Student Member, IEEE, and Jong Beom Ra, Senior Member, IEEE

Manuscript received October 2, 2012; revised August 18, 2013; accepted October 18, 2013. Date of publication November 1, 2013; date of current version November 28, 2013. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Alessandro Foi. The authors are with the Department of Electrical Engineering, Korea Advanced Institute of Science and Technology, Daejeon 305-701, Korea (e-mail: [email protected]; [email protected]; [email protected]). Digital Object Identifier 10.1109/TIP.2013.2288139

Abstract—This paper presents a new framework for motion compensated frame rate up conversion (FRUC) based on variational image fusion. The proposed algorithm consists of two steps: 1) generation of multiple intermediate interpolated frames and 2) fusion of those intermediate frames. In the first step, we determine four different sets of the motion vector field using four neighboring frames. We then generate intermediate interpolated frames corresponding to the four determined sets of the motion vector field, respectively. Multiple sets of the motion vector field are used to solve the occlusion problem in motion estimation. In the second step, the four intermediate interpolated frames are fused into a single frame via a variational image fusion process. For effective fusion, we determine fusion weights for each intermediate interpolated frame by minimizing an energy that consists of a weighted-L1-norm based data energy and a gradient-driven smoothness energy. Experimental results demonstrate that the proposed algorithm improves the performance of FRUC compared with the existing algorithms.

Index Terms—Frame rate up conversion, motion compensation, optical flow, variational image fusion.

I. INTRODUCTION

Motion blur in hold-type displays is caused jointly by their hold-type characteristic and by the motion-pursuing integration of the human visual system [1], [2]. To systematically analyze this phenomenon, a general hold-type display motion blur model was developed based on sampling and reconstruction theory [3], and the frame rate up conversion (FRUC) technique was theoretically justified using this model. FRUC is a widely used signal processing approach for reducing the motion blur that appears in hold-type displays, such as liquid crystal displays. Recently commercialized liquid crystal displays are capable of operating at frame rates of 120-240 Hz for motion blur reduction. To utilize these high frame rate displays, the frame rate must be up-converted from that of the transmitted video frames.

Simple approaches to FRUC include black frame insertion, frame repetition, frame averaging, and gray frame insertion [2], [4].
However, black frame insertion causes decreased luminance, frame repetition causes motion jerkiness for non-stationary frames, frame averaging introduces blurring to moving objects, and gray frame insertion causes flickering. Meanwhile, impulsive driving techniques have been studied as a low cost solution, but they still suffer from a large area flicker problem [5]. To reduce these artifacts, various methods to compensate object motion have been developed [6]–[19]. Such methods are known as motion compensated FRUC. If the frame rate increases in a camera-captured video, the amount of motion blur is expected to decrease. Therefore, to obtain the best performance in FRUC, motion blur reduction may also be needed. However, in this paper, we do not consider motion blur, as in most existing FRUC algorithms.

Motion compensated FRUC is composed of two main procedures, motion estimation and motion compensated frame interpolation, and its performance accordingly depends on these two procedures. Block matching algorithms are commonly used for motion estimation in motion compensated FRUC because they are simple and easy to implement [6]–[16]. However, block-matching-algorithm-based methods do not provide adequate performance in correctly estimating all the motion in a frame. Reliable motion estimation is important for the performance of motion compensated FRUC, and several algorithms to refine motion vectors (MVs) obtained from the block matching algorithm have been introduced [7]–[12]. However, the estimated MVs are still not good enough for motion compensated frame interpolation, and may produce blocking artifacts in interpolated frames.

As a different approach in place of the block matching algorithm, optical flow estimation, a pixel-based motion estimation method first proposed by Lucas and Kanade [20], can be considered. Several attempts to solve the motion compensated FRUC problem based on optical flow estimation have been reported. Tang and Au examined the influence of optical flow estimation on the performance of motion compensated FRUC, and concluded that although an optical-flow-estimation-based method can avoid the blocking artifacts that appear with a block matching algorithm, it can also produce salt-and-pepper artifacts [17]. The latter artifacts may be reduced by applying a smoothness constraint to optical flow estimation. Accordingly, Krishnamurthy et al. proposed a motion compensated FRUC method using a smoothness-constrained optical flow estimation approach [18].



Meanwhile, Keller et al. solved an optical-flow-based motion compensated FRUC problem by considering motion estimation and motion compensated FRUC in a single variational framework [19]. Although the two methods introduced in [18] and [19] can provide more reliable MVs than a block matching algorithm, interpolated frames can still be degraded by erroneous MVs due to occlusion, or by over-smoothed MVs.

In addition to motion estimation, motion compensated frame interpolation has also been studied. A major issue in motion compensated frame interpolation is the interpolation problem caused by falsely estimated MVs in occlusion areas. To solve this occlusion problem, forward and backward MVs have been adaptively used for motion compensated frame interpolation [13]–[15], [18]. Ojo and Haan designed a cascaded median filter whose input consists of four pixels determined from two neighboring frames, namely, two non-motion-compensated pixels and two motion-compensated pixels obtained using forward and backward MVs [13]. The cascaded median filter then solves the occlusion problem by adaptively determining interpolated pixels. Wang et al. proposed a similar method to handle the occlusion problem [14]. In their work, an interpolated pixel value is determined as a weighted sum of two motion compensated pixels that are obtained by using forward and backward MVs, respectively. Here, weights are simply determined as the ratio of the matching errors of the two MVs. Dane and Nguyen suggested an interpolation scheme adaptive to MV reliability based on a theoretical analysis [15]. Their analysis shows that adaptive weights based on MV reliability can improve the performance of motion compensated frame interpolation. However, optimal selection of the weights is still necessary for further improvement of the performance. Along with the optimal selection of weights based on MV reliability, high spatial correlation between the weights is also important for successful motion compensated frame interpolation, because spatial discontinuity of the weights within a frame can cause undesirable artifacts. Krishnamurthy et al. dealt with the spatial smoothness of weights, but their method does not depend on MV reliability [18].

In this paper, we propose a novel motion compensated FRUC algorithm to effectively solve the occlusion problem. In the algorithm, rather than employing block-matching-algorithm-based estimation, we adopt optical flow estimation for MVs to maximize the performance of motion compensated FRUC. We can thereby handle non-rigid motions and slight brightness variations between frames, although reflections from transparent objects remain a problem. We estimate four sets of MV fields and the corresponding reliabilities for four adjacent frames, respectively. Using the estimated sets, we generate four intermediate interpolated frames, and then fuse them into a final interpolated frame based on a variational image fusion technique. To improve the performance of motion compensated frame interpolation, fusion weights are optimally determined by minimizing the proposed energy, which consists of data and smoothness terms. The data term is defined so as to determine appropriate fusion weights by minimizing the error of the fused image, which can be described using MV reliabilities. Meanwhile, the smoothness term is defined so as to alleviate undesired abrupt spatial deviations of the fusion weights, which may introduce spatial inconsistency into the fused image.


Fig. 1. Motion compensated frame interpolation based on (a) one inward MV field, (b) one inward MV field of opposite direction, and (c) two inward and two outward MV fields.

This paper is organized as follows. In Section II, the occlusion problem in motion compensated FRUC is discussed. The proposed motion compensated FRUC algorithm is described in Section III. Section IV presents experimental results. Finally, we conclude the paper in Section V.

II. MOTION COMPENSATED FRAME RATE UP CONVERSION IN OCCLUSION AREAS

Since optical flow estimation generally provides reliable MV fields in non-occlusion areas, a major problem in motion compensated FRUC is inaccurate motion estimation in occlusion areas, which is caused by a lack of information for matching. For instance, in Fig. 1(a), background pixels A and B in $F_{t-1}$ are occluded in $F_{t+1}$, and their inward MVs, which are estimated from $F_{t-1}$ to $F_{t+1}$, may point at an incorrect background pixel C. Such incorrect MVs can cause blurring or artifacts in the overlap region of the interpolated frame $F_t$. The same problem can occur at uncovered pixels D and E, onto which no inward MV is projected. To fill the corresponding hole region in $F_t$, we may use inward MVs of the opposite direction for pixels D and E, but these point at an incorrect background pixel F, as in Fig. 1(b). This can then also be considered an occlusion problem, as in the overlap region in Fig. 1(a).

To complement these unreliable inward MVs in occlusion areas, we utilize two additional MV fields obtained by inverting the outward MV fields, which are estimated from $F_{t-1}$ to $F_{t-3}$ and from $F_{t+1}$ to $F_{t+3}$, as shown in Fig. 1(c). This is based on the observation that occluded pixels, for instance, pixels A and B in $F_{t-1}$, are usually not occluded in the opposite direction toward $F_{t-3}$. Hence, by inverting the outward MVs, we can obtain additional reliable MVs for pixel interpolation in $F_t$. Note that, in Fig. 1(c), a pixel in $F_t$ can have at least one reliable MV for motion compensated frame interpolation among the two inward MV fields and the two inverted outward MV fields, for all regions including overlap and hole pixels.
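To make the hole and overlap notions concrete, the sketch below (ours, not part of the original formulation) counts how many inward MVs from $F_{t-1}$ land on each pixel of $F_t$ after moving half-way along their vectors; pixels hit zero times are holes, and pixels hit more than once are overlaps. Array conventions and rounding are illustrative assumptions.

```python
import numpy as np

def classify_projection(mv, shape):
    """Count how many MVs from F_{t-1} land on each pixel of F_t.

    mv: (H, W, 2) inward MV field from F_{t-1} to F_{t+1}, stored as
        (x, y) components; a pixel moves by half its MV to reach F_t.
    Returns a hit-count map: 0 = hole, 1 = uniquely covered, >1 = overlap.
    """
    h, w = shape
    hits = np.zeros((h, w), dtype=np.int32)
    ys, xs = np.mgrid[0:h, 0:w]
    # Project every source pixel half-way along its MV, round to the grid.
    xt = np.rint(xs + 0.5 * mv[..., 0]).astype(int)
    yt = np.rint(ys + 0.5 * mv[..., 1]).astype(int)
    valid = (xt >= 0) & (xt < w) & (yt >= 0) & (yt < h)
    np.add.at(hits, (yt[valid], xt[valid]), 1)
    return hits
```

The holes found this way are exactly the pixels for which the inverted outward MV fields of Fig. 1(c) supply the missing interpolation candidates.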



Fig. 2. Overall structure of the proposed algorithm.

The other issue in occlusion areas is the determination of weights for the four MV fields so as to utilize them effectively for motion compensated frame interpolation. Since several MVs may be simultaneously projected onto a pixel in $F_t$, we need to interpolate four pixel values. For the interpolation, we should determine weights that describe the degree of contribution of the pixel intensity corresponding to each MV. A high matching error and a high spatial variation of MVs are usually associated with occluded pixels having low MV reliability. Therefore, they can be utilized for determining weights of MVs that are closely related with their reliabilities [14], [15]. Since MV weights are independently determined for each interpolated pixel, they may introduce undesirable artifacts due to a lack of spatial consistency. These artifacts may be alleviated by applying a spatial smoothness constraint to the MV weights, as described in the following section.

III. PROPOSED MOTION COMPENSATED FRAME RATE UP CONVERSION

Fig. 2 shows the overall structure of the proposed algorithm. The proposed motion compensated FRUC mainly consists of motion estimation and motion compensated frame interpolation stages. In the motion estimation stage, four MV fields, $V_t^1$, $V_t^2$, $V_t^3$, and $V_t^4$, are estimated by using an optical flow estimation scheme. Meanwhile, the motion compensated frame interpolation stage is divided into two steps, temporal interpolation and image fusion. In the temporal interpolation step, four intermediate interpolated frames, $F_t^i$, are produced at time $t$ by using four consecutive input frames and the corresponding MV fields. During this interpolation process, we also obtain a pixel reliability map, $R_t^i$, for each intermediate interpolated frame. In the image fusion step, the final image, $F_t$, is obtained from the four intermediate interpolated frames and the corresponding pixel reliability maps via the proposed variational image fusion scheme.

Several existing FRUC algorithms also utilize multiple MV fields to solve the occlusion problem [21], [22]. Bellers et al. estimate a forward and a backward MV by using three successive frames [21]. Cho et al. recently proposed an approach that combines forward feature tracking and a backward block-based motion estimation technique [22]. Those algorithms select either forward or backward MVs to solve the occlusion problem. In contrast, the proposed algorithm first determines intermediate interpolated frames using multiple MVs, and then fuses them into the final image. This distinct structure makes it possible to maximize the image fusion efficiency by utilizing all the pixel reliabilities of the intermediate interpolated frames simultaneously.

Details of the three schemes, motion estimation, temporal interpolation, and image fusion, are provided in the following subsections.
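As a reading aid, the overall flow of Fig. 2 can be summarized by the hedged sketch below; `estimate_flow`, `temporal_interp`, and `fuse` are hypothetical stand-ins for the three stages detailed in the following subsections, and the argument naming is ours.

```python
def fruc_interpolate(f_tm3, f_tm1, f_tp1, f_tp3,
                     estimate_flow, temporal_interp, fuse):
    """Top-level structure of the proposed FRUC scheme (Fig. 2), as a sketch.

    f_tm3 ... f_tp3: input frames F_{t-3}, F_{t-1}, F_{t+1}, F_{t+3} as
    NumPy arrays. The three callables stand for Sections III-A to III-C.
    """
    # Two outward MV fields (V1, V4) and two inward ones (V2, V3).
    v1 = estimate_flow(f_tm1, f_tm3)   # F_{t-1} -> F_{t-3}
    v2 = estimate_flow(f_tm1, f_tp1)   # F_{t-1} -> F_{t+1}
    v3 = estimate_flow(f_tp1, f_tm1)   # F_{t+1} -> F_{t-1}
    v4 = estimate_flow(f_tp1, f_tp3)   # F_{t+1} -> F_{t+3}

    # Four intermediate interpolated frames with pixel reliability maps;
    # the outward fields are inverted before use (Section III-B3).
    frames, rels = zip(*[temporal_interp(f_tm1, -v1),
                         temporal_interp(f_tm1, v2),
                         temporal_interp(f_tp1, v3),
                         temporal_interp(f_tp1, -v4)])

    # Variational fusion of the four candidates into the final frame F_t.
    return fuse(frames, rels)
```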

A. Motion Estimation

In the motion estimation stage, we estimate the four MV fields shown in Fig. 2, namely, the two inward MV fields $V_t^2$ and $V_t^3$, and the two outward MV fields $V_t^1$ and $V_t^4$. In order to estimate an MV field between two frames, we adopt an existing optical flow estimation algorithm [23]. Without loss of generality, we describe the algorithm only for $V_t^2$. For two frames, $F_{t-1}$ and $F_{t+1}$, the algorithm estimates the MV field that minimizes the optical flow energy

$$E_{OF}\left(V_t^2\right) = E_{OF,D}\left(V_t^2\right) + \alpha \cdot E_{OF,S}\left(V_t^2\right), \qquad (1)$$

where the data term $E_{OF,D}$ and the smoothness term $E_{OF,S}$ are defined as

$$E_{OF,D}\left(V_t^2\right) = \int \Psi\left( \left| F_{t-1}(\mathbf{x}) - F_{t+1}\left(\mathbf{x}+V_t^2\right) \right|^2 + \gamma \left| \nabla F_{t-1}(\mathbf{x}) - \nabla F_{t+1}\left(\mathbf{x}+V_t^2\right) \right|^2 \right) d\mathbf{x}, \qquad (2)$$

$$E_{OF,S}\left(V_t^2\right) = \int \Psi\left( \left\| \nabla u_t^2 \right\|_2^2 + \left\| \nabla v_t^2 \right\|_2^2 \right) d\mathbf{x}. \qquad (3)$$

Here, $\nabla$ denotes the gradient operator, and $\alpha$ and $\gamma$ are weighting parameters. In addition, $V_t^2 = (u_t^2, v_t^2)$ and $\Psi(s) = \sqrt{s + \varepsilon}$, where $\varepsilon$ is a tiny constant. The data term in Eq. (2) preserves the intensity and gradient constancy of the data, while the smoothness term in Eq. (3) smoothes the optical flow field. Both terms are measured in a regularized L1-norm via $\Psi$; thus the data term improves robustness to outliers, and the smoothness term preserves motion boundaries. The energy is minimized by using a multi-level pyramidal scheme to cope with large displacements.

In the minimization of $E_{OF}$ at each pyramidal level, however, the optical flow obtained at the previous level is often over-smoothed near motion boundaries due to the smoothness term. To solve this problem, at each level we refine the initial optical flow near the motion boundaries prior to the optimization for optical flow estimation. For optical flow refinement, median filtering is often applied to the optical flow to remove outliers [24]. If the optical flow is over-smoothed, however, median filtering will not be effective for sharpening motion boundaries, because it selects an optical flow among neighboring ones without considering matching errors. Hence, we instead calculate five block matching errors at the current pixel by using its own optical flow and the optical flows of its four neighboring pixels, respectively. The optical flow of the current pixel is then refined by replacing it with the optical flow providing the minimum matching error among the five. This refinement is performed recursively, pixel by pixel. If the refinement is performed once along the raster (or anti-raster) scan direction, pixels at motion boundaries can be refined only when their left- (or right-) side pixel has a correct optical flow. To avoid this problem, we refine the optical flows twice, along the raster and then the anti-raster scan directions.
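A minimal sketch of one refinement pass is given below, assuming a per-pixel absolute-difference matching cost (the paper uses block matching errors; the block size is not reproduced here). Updating the field in place lets corrected flows propagate along the scan direction, which is what makes the two opposing passes effective.

```python
import numpy as np

def refine_flow_pass(flow, f0, f1, reverse=False):
    """One raster (or anti-raster) pass of the motion-boundary refinement.

    For each pixel, five candidate flows (its own and those of its four
    neighbors, already-refined ones included) are compared by matching
    error between f0 and f1, and the minimum-error flow is kept.
    """
    h, w = f0.shape
    out = flow.copy()
    offsets = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]

    def cost(y, x, v):
        xt, yt = int(round(x + v[0])), int(round(y + v[1]))
        if not (0 <= xt < w and 0 <= yt < h):
            return np.inf
        return abs(float(f0[y, x]) - float(f1[yt, xt]))

    rows = range(h - 1, -1, -1) if reverse else range(h)
    cols = range(w - 1, -1, -1) if reverse else range(w)
    for y in rows:
        for x in cols:
            cands = [out[min(max(y + dy, 0), h - 1),
                         min(max(x + dx, 0), w - 1)] for dy, dx in offsets]
            out[y, x] = min(cands, key=lambda v: cost(y, x, v))
    return out

# Two passes, raster then anti-raster, as described above:
# flow = refine_flow_pass(refine_flow_pass(flow, f0, f1), f0, f1, reverse=True)
```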



TABLE I. AVERAGE ENDPOINT ERROR (AEE) AND AVERAGE ANGULAR ERROR (AAE) VALUES FOR THE ADOPTED ORIGINAL AND ITS REFINED OPTICAL FLOW ESTIMATION (OFE)

Fig. 3. (a), (b) The 219th and 221st frames of the Flower sequence. (c) Color code for visualizing optical flow fields. Hue and saturation denote the direction and magnitude of a flow, respectively. (d), (e) Forward and backward optical flow fields obtained by using the adopted optical flow estimation and (f), (g) by using its modified one.

This refinement procedure tends to sharpen the motion boundaries of the initial optical flow that were smoothed at the previous pyramidal level. Thereby, we can achieve sharp motion boundaries in the optical flow estimated through the subsequent optimization. Fig. 3 shows optical flow estimation results that illustrate the performance improvement. The results in Figs. 3(d) and (e) are obtained from the original optical flow estimation [23], while those in Figs. 3(f) and (g) are from the modified version. It is noted that the forward MVs on the right side of the tree in Fig. 3(d) and the backward MVs on the left side of the tree in Fig. 3(e) appear to be incorrect. Using the modified optical flow estimation, however, the corresponding MVs are correctly estimated, as shown in Figs. 3(f) and (g). Even though the MVs on the other side of the tree, in the occlusion area, remain incorrect in Figs. 3(f) and (g), they can be complemented by using the inverted outward MVs.

To examine the objective performance improvement of the refined optical flow estimation, we perform an experiment on the Middlebury benchmark datasets with ground truth [39]. In the experiment, we adopt two error metrics, the average angular error (AAE) and the average endpoint error (AEE), defined as follows:

$$\mathrm{AAE} = \frac{1}{N} \sum \cos^{-1}\left( \frac{1 + u \cdot u_{GT} + v \cdot v_{GT}}{\sqrt{1+u^2+v^2}\,\sqrt{1+u_{GT}^2+v_{GT}^2}} \right) \qquad (4)$$

and

$$\mathrm{AEE} = \frac{1}{N} \sum \sqrt{(u-u_{GT})^2 + (v-v_{GT})^2}. \qquad (5)$$

Here, $(u, v)$ and $(u_{GT}, v_{GT})$ denote the estimated and the ground truth optical flow, respectively, and $N$ is the number of pixels. Table I shows the evaluation results. As seen in the table, the refined algorithm improves both metrics on all datasets.
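Both metrics are straightforward to compute; a sketch follows. The clipping guards arccos against rounding outside its domain, and the AAE is returned in radians (apply np.degrees if degrees are preferred).

```python
import numpy as np

def flow_errors(u, v, u_gt, v_gt):
    """Average angular error (AAE, Eq. (4)) and endpoint error (AEE, Eq. (5)).

    u, v, u_gt, v_gt: 2-D arrays of estimated and ground-truth flow components.
    """
    num = 1.0 + u * u_gt + v * v_gt
    den = np.sqrt(1.0 + u**2 + v**2) * np.sqrt(1.0 + u_gt**2 + v_gt**2)
    aae = np.mean(np.arccos(np.clip(num / den, -1.0, 1.0)))
    aee = np.mean(np.sqrt((u - u_gt)**2 + (v - v_gt)**2))
    return aae, aee
```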

B. Temporal Interpolation

In the temporal interpolation step of the motion compensated frame interpolation stage, we produce four intermediate interpolated frames by using the four different MV fields, respectively. Note that, while two intermediate interpolated frames are determined by using the inward MV fields, the other two are determined by using the inverted outward MV fields. In determining an intermediate interpolated frame from an MV field, we examine the reliability of each MV, because not all MVs in a field are reliable, especially in occlusion areas.

1) Motion Vector Reliability: Several measures of MV reliability have been proposed [15], [25]–[27]. Among them, we adopt a reliability measure based on the a posteriori probability of an MV [25], which is defined on the basis of temporal matching errors and the spatial variation of MVs. Namely,

$$R\left(V_t^2(\mathbf{x})\right) = T_{\tau_R}\left( \exp\left( -\frac{err\left(V_t^2(\mathbf{x})\right)}{\lambda_e} \right) \times \exp\left( -\frac{sv\left(V_t^2(\mathbf{x})\right)}{\lambda_v} \right) \right), \qquad (6)$$

where $err$ and $sv$ denote the temporal matching error and the spatial variation of $V_t^2$, respectively. The user-defined parameters $\lambda_e$ and $\lambda_v$ are related to the acceptable ranges of $err$ and $sv$, respectively. $T_{\tau_R}(r)$ denotes a hard threshold operator that removes outliers by setting $R$ to zero if $r$ is less than a small threshold value $\tau_R$. In Eq. (6), we define $err$ and $sv$ as

$$err\left(V_t^2(\mathbf{x})\right) = \frac{1}{N_w^2} \sum_{\mathbf{x}_j \in w(\mathbf{x})} \left| F_{t-1}(\mathbf{x}_j) - F_{t+1}\left(\mathbf{x}_j + V_t^2(\mathbf{x})\right) \right|_1, \qquad (7)$$

$$sv\left(V_t^2(\mathbf{x})\right) = \frac{1}{N_w^2} \sum_{\mathbf{x}_j \in w(\mathbf{x})} \left| \nabla V_t^2(\mathbf{x}_j) \right|_1. \qquad (8)$$

Here, $w$ and $N_w$ denote a window and its size, respectively. To improve the credibility of the MV reliability while avoiding undesirable MV inconsistency within $w(\mathbf{x})$, $N_w$ is set to three. Note here that $err$ and $sv$ are defined using the L1-norm, unlike in [25], because it is known to be more robust to outliers than the L2-norm [28].
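A compact sketch of Eq. (6) is shown below, assuming $err$ and $sv$ have already been computed over the window as in Eqs. (7) and (8); the default parameter values are the ones reported later in Section IV-A.

```python
import numpy as np

def mv_reliability(err, sv, lam_e=50.0, lam_v=1.0, tau_r=0.05):
    """Per-pixel MV reliability of Eq. (6).

    err, sv: maps of the temporal matching error (Eq. (7)) and the spatial
    variation (Eq. (8)) of the MV field. The hard threshold T_{tau_R}
    zeroes out reliabilities below tau_r.
    """
    r = np.exp(-err / lam_e) * np.exp(-sv / lam_v)
    return np.where(r < tau_r, 0.0, r)
```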



Fig. 4. An example of inward interpolation based on neighboring projected pixels.

2) Inward Interpolation: Temporal interpolation is a process that produces a new frame along motion trajectories. Fig. 4 illustrates an example of inward interpolation, or forward interpolation, to determine the value of pixel $\mathbf{x}$ in an intermediate interpolated frame, $F_t^2$. As seen in the figure, its neighboring pixels in $F_{t-1}$ are projected onto $F_t^2$ by using an inward MV field. Pixel $\mathbf{x}$, of integer coordinates on $F_t^2$, can then be determined as the weighted sum of the values of the neighboring projected pixels $\mathbf{x}_j$ ($j = 0, \ldots, 5$, in this example). Namely,

$$F_t^2(\mathbf{x}) = \frac{ \sum_{\mathbf{x}_j \in w(\mathbf{x})} R\left(V_t^2(\mathbf{x}_j')\right) g_d(\mathbf{x}, \mathbf{x}_j) \, F_{t-1}(\mathbf{x}_j') }{ \sum_{\mathbf{x}_j \in w(\mathbf{x})} R\left(V_t^2(\mathbf{x}_j')\right) g_d(\mathbf{x}, \mathbf{x}_j) }. \qquad (9)$$

Here, $\mathbf{x}_j = \mathbf{x}_j' + 0.5 \cdot V_t^2(\mathbf{x}_j')$, and $g_d(\mathbf{x}, \mathbf{x}_j) = \exp\left(-\left\| \mathbf{x} - \mathbf{x}_j \right\|_2^2 / (2\sigma_d^2)\right)$, where $\sigma_d$ is a user-defined parameter. If the size of the window $w(\mathbf{x})$ is too small, pixels that could be filled with correct pixel values may be wrongly assigned to a hole. Conversely, if the size of the window is too big, pixels that should be assigned to a hole may be filled with incorrect pixel values. Via experiment, the adequate size of window $w(\mathbf{x})$ is determined to be two. Note that, in Eq. (9), the weight of a projected pixel $\mathbf{x}_j$ reflects the corresponding MV reliability and its distance from $\mathbf{x}$. If $R(V_t^2(\mathbf{x}_j'))$ is zero for all $j$, we consider pixel $\mathbf{x}$ to be undetermined in this intermediate interpolated frame. $F_t^3$ is generated similarly, by replacing $V_t^2$ and $F_{t-1}$ in Eq. (9) with $V_t^3$ and $F_{t+1}$.

3) Outward Interpolation: Outward interpolation is performed similarly to inward interpolation, by using an inverted outward MV field. In other words, we generate $F_t^1$ and $F_t^4$ by replacing $V_t^2$ and $F_{t-1}$ in Eq. (9) with $-V_t^1$ and $F_{t-1}$, and with $-V_t^4$ and $F_{t+1}$, respectively. The interpolation is considered reliable only if an outward MV is collinear with the corresponding inward MV. Therefore, we need to examine the collinearity between an outward MV and the corresponding inward MV.
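Written as a forward-splatting loop, Eq. (9) accumulates, at every integer pixel near a projected position, the reliability- and distance-weighted contribution of the source pixel. The sketch below is our illustration; border handling and the window radius are simplifying choices.

```python
import numpy as np

def inward_interpolate(f_prev, mv, rel, sigma_d=0.25, win=2):
    """Reliability-weighted interpolation in the spirit of Eq. (9).

    f_prev: source frame F_{t-1}; mv: (H, W, 2) inward MV field from
    F_{t-1} to F_{t+1}; rel: per-MV reliability R from Eq. (6).
    Returns the intermediate frame and a mask of determined pixels.
    """
    h, w = f_prev.shape
    num = np.zeros((h, w))
    den = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            r = rel[y, x]
            if r == 0.0:                  # unreliable MVs contribute nothing
                continue
            px = x + 0.5 * mv[y, x, 0]    # projected position on F_t
            py = y + 0.5 * mv[y, x, 1]
            for yy in range(int(np.floor(py)) - win + 1,
                            int(np.ceil(py)) + win):
                for xx in range(int(np.floor(px)) - win + 1,
                                int(np.ceil(px)) + win):
                    if 0 <= xx < w and 0 <= yy < h:
                        g = np.exp(-((xx - px)**2 + (yy - py)**2)
                                   / (2.0 * sigma_d**2))
                        num[yy, xx] += r * g * f_prev[y, x]
                        den[yy, xx] += r * g
    # Pixels that accumulate no weight remain undetermined (holes).
    out = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    return out, den > 0
```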

Fig. 5. Collinearity test of a pixel (a) in a nonlinearly moving object and (b) in a static background, respectively.

As an example, let us consider a nonlinearly moving object on a static background, as shown in Fig. 5. Fig. 5(a) demonstrates that the outward and inward MVs of an object pixel $\mathbf{x}_o$, $V_t^1(\mathbf{x}_o)$ and $V_t^2(\mathbf{x}_o)$, are not collinear, because the object motion is nonlinear. Hence, $V_t^1(\mathbf{x}_o)$ is considered unreliable, and outward interpolation accordingly cannot be applied to it. However, the outward and inward MVs of a pixel $\mathbf{x}_{b1}$ in the static background, $V_t^1(\mathbf{x}_{b1})$ and $V_t^2(\mathbf{x}_{b1})$, appear to be collinear, as shown in Fig. 5(b), and outward interpolation can hence be applied. On the contrary, the MVs of a background pixel near the object, in the occlusion area, $V_t^1(\mathbf{x}_{b2})$ and $V_t^2(\mathbf{x}_{b2})$ in Fig. 5(b), appear not to be collinear. However, the corresponding outward interpolation is useful for determining the value of the occluded pixel, unlike for a pixel in a nonlinearly moving object. To discriminate this pixel from a pixel in a nonlinearly moving object, we test the collinearity of the outward MV at pixel $\mathbf{x}$, $V_t^1(\mathbf{x})$, with the backward inward MV, $V_t^3(\mathbf{x}')$, instead of with the forward inward MV, $V_t^2(\mathbf{x})$. Note here that $\mathbf{x}'$ in $V_t^3(\mathbf{x}')$ denotes the pixel pointed at by $V_t^2(\mathbf{x})$, namely, $\mathbf{x}' = \mathbf{x} + V_t^2(\mathbf{x})$. In Fig. 5(b), $V_t^1(\mathbf{x}_{b2})$ and $V_t^3(\mathbf{x}_{b2}')$ appear to be collinear. Therefore, we may consider $V_t^1(\mathbf{x}_{b2})$ to be reliable for outward interpolation. Meanwhile, since $V_t^1(\mathbf{x}_o)$ and $V_t^3(\mathbf{x}_o')$ are not collinear, as in Fig. 5(a), $V_t^1(\mathbf{x}_o)$ is considered unreliable for outward interpolation. If an outward MV, $V_t^1(\mathbf{x})$, is found to be unreliable via the collinearity test, we set its reliability to zero. Namely,

$$R\left(V_t^1(\mathbf{x})\right) = 0 \quad \text{if} \quad \left\| V_t^3\left(\mathbf{x} + V_t^2(\mathbf{x})\right) - V_t^1(\mathbf{x}) \right\| > \delta, \qquad (10)$$

where $\delta$ is a user-defined threshold.
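The test of Eq. (10) can be vectorized as below; the nearest-neighbor lookup of $V_t^3$ at the displaced position is our simplification.

```python
import numpy as np

def collinearity_mask(v1, v2, v3, delta=2.0):
    """Collinearity test of Eq. (10).

    Returns False wherever the backward inward MV V3(x + V2(x)) differs
    from the outward MV V1(x) by more than delta; at those pixels the
    reliability R(V1(x)) is set to zero.
    """
    h, w = v1.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    xd = np.clip(np.rint(xs + v2[..., 0]), 0, w - 1).astype(int)
    yd = np.clip(np.rint(ys + v2[..., 1]), 0, h - 1).astype(int)
    diff = np.linalg.norm(v3[yd, xd] - v1, axis=-1)   # |V3(x') - V1(x)|
    return diff <= delta
```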



C. Image Fusion

In the image fusion step of the motion compensated frame interpolation stage, the four intermediate interpolated frames are combined to estimate the original image, $F_t$. The intermediate interpolated frames differ from each other, because they are interpolated with different MV fields. To combine them into a single frame while minimizing the influence of erroneous MVs, we use reliability measures for the MVs corresponding to each pixel, because not all MVs are reliable in practice, especially in occlusion areas.

Figs. 6(a)–(d) show four intermediate interpolated frames obtained for the 220th frame of the Flower sequence. In the intermediate interpolated frames, we regard the pixels with zero MV reliability as undetermined pixels, and represent them with green pixels. It is interesting that green pixels appear on both sides of the tree trunk in the intermediate interpolated frames, corresponding to hole and overlap regions, because those regions usually have very low reliabilities. As seen in the figure, an undetermined (green) pixel in one intermediate interpolated frame can be recovered using the pixel(s) with high reliabilities at the same position in the other intermediate interpolated frame(s). Therefore, by properly fusing the four intermediate interpolated frames, we can effectively recover the pixel values in both hole and overlap regions.

For the fusion of the four intermediate interpolated frames, we introduce fusion weights. By assigning proper fusion weights to each intermediate interpolated frame according to the MV reliability, the fused image can be determined as follows:

$$F_t(\mathbf{x}) = \sum_{i=1}^{4} W^i(\mathbf{x}) F_t^i(\mathbf{x}). \qquad (11)$$

Here, $W^i$ ($0 \le W^i \le 1$ and $\sum_i W^i = 1$) denotes the fusion weight of $F_t^i$. Note that these continuous fusion weights can conceal artifacts by avoiding abrupt changes in transition areas, unlike discrete fusion weights [29], [30]. A scheme for determining appropriate fusion weights is described in the following sub-sections.

1) Pixel Reliability of an Intermediate Interpolated Frame: To determine fusion weights that yield a trustworthy pixel value in $F_t$, we utilize the reliabilities of the pixel values at the same position in $F_t^i$. Since a pixel in $F_t^i$ is interpolated by using several MVs, its reliability is related to the reliabilities of those MVs. Therefore, we determine the reliability of a pixel, $R_t^i$, by interpolating the reliabilities of the MVs that correspond to neighboring projected pixels, similarly to Eq. (9). Namely,

$$R_t^i(\mathbf{x}) = \frac{ \sum_{\mathbf{x}_j \in w(\mathbf{x})} R\left(V_t^i(\mathbf{x}_j')\right) g_d(\mathbf{x}, \mathbf{x}_j) }{ \sum_{\mathbf{x}_j \in w(\mathbf{x})} g_d(\mathbf{x}, \mathbf{x}_j) }. \qquad (12)$$

If a pixel has no neighboring projected pixel, we set its reliability to zero. Figs. 6(e)–(h) show the pixel reliability maps, $R_t^1$, $R_t^2$, $R_t^3$, and $R_t^4$, of the four intermediate interpolated frames.

Fig. 6. (a), (b), (c), (d) Four intermediate interpolated frames and (e), (f), (g), (h) their corresponding pixel reliability maps using MV fields $V_t^1$, $V_t^2$, $V_t^3$, and $V_t^4$, respectively.

2) Fusion Weight Determination Based on a Variational Framework: To determine the fusion weights for given pixel reliability maps, we consider the image fusion problem based on an observation model,

$$F_t^i = F_t + \eta^i. \qquad (13)$$

Here, $\eta^i$ represents the noise due to the errors of $F_t^i$. These errors mainly arise from incorrect MVs and imperfect interpolation. The fusion problem then becomes an inverse problem: estimating the original image from the four intermediate interpolated frames and the corresponding pixel reliability maps. The most probable estimate can be obtained by finding the maximum of the posterior distribution of the original image given the intermediate interpolated frames, which is called a maximum a posteriori (MAP) approach. According to Bayes' rule, the conditional posterior distribution can be represented as

$$P\left(F_t \mid F_t^1, F_t^2, F_t^3, F_t^4\right) \propto P\left(F_t^1, F_t^2, F_t^3, F_t^4 \mid F_t\right) P(F_t). \qquad (14)$$

In Eq. (14), the left-hand side represents the posterior distribution, and the right-hand side consists of the likelihood and the prior distribution. The likelihood distribution represents the consistency of intensity along the motion trajectory, and the prior distribution models the intrinsic characteristics of the original image.

a) Data energy: To define the data energy, we first examine the likelihood distribution. By assuming that the noise in Eq. (13) is independent among different observations, the likelihood distribution in Eq. (14) can be written as

$$P\left(F_t^1, F_t^2, F_t^3, F_t^4 \mid F_t\right) = \prod_{i=1}^{4} P\left(F_t^i \mid F_t\right). \qquad (15)$$

Given the observation model in Eq. (13), the conditional distribution $P(F_t^i \mid F_t)$ can be replaced with the noise distribution, $P(\eta^i)$. Meanwhile, the statistical characteristics of the noise are closely related to the reliability of the corresponding pixel. Hence, to examine the statistical characteristics of the noise, we conduct an experiment on six image sequences, Bus, Flower, Football, Foreman, Stefan, and Mobile, as illustrated in Fig. 7. After applying temporal interpolation, we construct normalized histograms of the pixel errors between an intermediate interpolated frame and the original image according to the range of pixel reliability values, as shown in Fig. 8.



Fig. 7. Experimental procedure to determine the relationship between the pixel reliability and the statistics of noise.

Fig. 8. Normalized histograms of noise according to the range of pixel reliability values.

Fig. 9. Curve fitting to determine λ as a function of reliability.

For example, the first normalized histogram in Fig. 8 is constructed for the pixels whose reliability is greater than or equal to 0.05 and less than 0.06. From the figure, it is noted that the noise for higher reliabilities tends to be concentrated around zero; this seems reasonable, given that the intensity of a more reliable pixel is closer to that of the original pixel. Hence, the statistical characteristics of the noise distribution may be described in terms of pixel reliability.

To utilize the relationship between pixel reliability and the statistical characteristics of the noise distributions in determining a data energy, we represent the noise distributions with a statistical model. Since all the distributions in Fig. 8 are heavy-tailed, they may be modeled as a zero-mean Laplacian distribution [31]. Hence, using a maximum-likelihood criterion, we fit the distributions to a zero-mean Laplacian distribution model,

$$P(x) \propto \exp(-\lambda |x|). \qquad (16)$$

The obtained values of $\lambda$ for each range of reliabilities are then plotted as red triangles according to the reliability, as shown in Fig. 9. The plot of $\lambda$ can be modeled as a function of $R$. Since the curve in Fig. 9 is similar to an exponential function, a two-term exponential model is selected for $\lambda(R)$, i.e.,

$$\lambda(R) = a \cdot \exp(b \cdot R) + c \cdot \exp(d \cdot R). \qquad (17)$$

Here, the coefficients $a$, $b$, $c$, and $d$ are determined as $5.00 \times 10^{-2}$, $1.44$, $1.77 \times 10^{-6}$, and $1.41 \times 10^{1}$, respectively, based on a weighted least squares regression. Using this function, we can infer the statistics of the noise from the obtained reliability, and thereby represent the likelihood distribution in Eq. (15) in terms of pixel reliability. Using Eqs. (11), (13), and (16), the likelihood distribution of Eq. (15) can then be rewritten as

$$P\left(F_t^1, F_t^2, F_t^3, F_t^4 \mid F_t\right) = \prod_{i=1}^{4} \frac{1}{c^i} \exp\left( - \left\| \lambda_t^i \otimes \left( F_t^i - F_t \right) \right\|_1 \right) = \prod_{i=1}^{4} \frac{1}{c^i} \exp\left( - \left\| \lambda_t^i \otimes \left( \left( \sum_{j=1}^{4} W^j \otimes F_t^j \right) - F_t^i \right) \right\|_1 \right), \qquad (18)$$

where $\lambda_t^i$ denotes a map of $\lambda$ corresponding to $R_t^i$, $c^i$ is a normalization factor for $F_t^i$, which can be determined as $\prod_{\mathbf{x}}^{N} 2/\lambda_t^i(\mathbf{x})$, and $\otimes$ denotes component-wise multiplication.
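For reference, Eq. (17) with the reported coefficients can be evaluated directly; the helper below is a sketch (the name is ours), and the resulting per-pixel λ map is what weights the residuals in Eq. (18).

```python
import numpy as np

def lambda_of_r(r):
    """Two-term exponential model of Eq. (17) with the fitted coefficients
    a = 5.00e-2, b = 1.44, c = 1.77e-6, d = 14.1 reported in the text."""
    return 5.00e-2 * np.exp(1.44 * r) + 1.77e-6 * np.exp(14.1 * r)
```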



A negative log version of the likelihood distribution $P$ in Eq. (18), which is called the data energy, can then be described as

$$E_D\left(W^1, W^2, W^3, W^4\right) = \sum_{i=1}^{4} \int \lambda_t^i(\mathbf{x}) \left| \left( \sum_{j=1}^{4} W^j(\mathbf{x}) F_t^j(\mathbf{x}) \right) - F_t^i(\mathbf{x}) \right| d\mathbf{x}. \qquad (19)$$

The likelihood distribution in Eq. (18) is maximized by minimizing $E_D$ with respect to the fusion weights, $W^i$. To avoid a singularity problem in the minimization, the L1-norm in Eq. (19) is modified using the convex function $\Psi(\cdot)$ [32]:

$$E_D\left(W^1, W^2, W^3, W^4\right) \cong \sum_{i=1}^{4} \int \lambda_t^i(\mathbf{x}) \, \Psi\left( \left( \left( \sum_{j=1}^{4} W^j(\mathbf{x}) F_t^j(\mathbf{x}) \right) - F_t^i(\mathbf{x}) \right)^2 \right) d\mathbf{x}. \qquad (20)$$

b) Smoothness energy: Since the data energy in Eq. (20) is minimized pixel-by-pixel, the obtained fusion weights are spatially independent. In addition, the pixel reliability has no spatial correlation among neighboring pixels. Therefore, the data energy minimization process can produce a large spatial deviation of fusion weights in the intermediate interpolated frames. However, since occlusion areas can have different image pixel values among the intermediate interpolated frames, as shown in Figs. 6(a)–(d), a large spatial deviation of fusion weights can cause undesired abrupt changes between neighboring pixels in $F_t$. Although these abrupt changes could be alleviated by imposing a smoothness constraint on the fused image $F_t$, such a constraint would blur $F_t$. To prevent blurring of $F_t$ as well as undesired abrupt changes, we instead impose a smoothness constraint on the fusion weights. We can thereby expect consistently high weights in occlusion areas of $F_t^1$ or $F_t^4$, which are generated by using outward MV fields and are considered to have high pixel reliabilities in their occlusion areas.

As a smoothness constraint, we adopt an L2-norm based regularization, which effectively suppresses large variations of the fusion weights. While the variation of fusion weights should be suppressed inside the same object, the discontinuity of fusion weights should be maintained at object boundaries. To keep the discontinuity, anisotropic diffusion driven by the image gradient may be used in the L2-norm based regularization [33], [34]. In our work, however, since the intermediate interpolated frames contain undetermined pixels, diffusion driven by the gradient of each intermediate interpolated frame may not be appropriate. Hence, using anisotropic diffusion driven by the gradient of the fused image instead, we define the smoothness energy as

$$E_S\left(W^1, W^2, W^3, W^4\right) = \sum_{i=1}^{4} \int g\left( \left\| \nabla \left( \sum_{j=1}^{4} W^j(\mathbf{x}) F_t^j(\mathbf{x}) \right) \right\|_2 \right) \left\| \nabla W^i(\mathbf{x}) \right\|_2^2 \, d\mathbf{x}. \qquad (21)$$

Here, $g(x) = 1/(1 + x^2/\kappa^2)$, and $\kappa$ is a parameter that controls the diffusivity of the fusion weights by changing the

shape of $g(x)$. The function $g(x)$ prevents diffusion across object boundaries, where the gradient magnitude is usually high. Note that in Eq. (21) the anisotropic diffusion is applied identically to the four fusion weights, so that the boundaries of the fusion weights coincide with the boundaries of the fused image.

The total energy for image fusion, $E_F$, is then defined by combining the data energy and the smoothness energy as follows:

$$E_F = E_D + \beta \cdot E_S, \quad \text{s.t. } 0 \le W^i(\mathbf{x}) \le 1 \text{ and } \sum_{i=1}^{4} W^i(\mathbf{x}) = 1, \qquad (22)$$

where the parameter $\beta$ controls the contribution of the smoothness energy to $E_F$. The fusion weights $W^i(\mathbf{x})$ are then obtained by minimizing $E_F$. Note that in Eq. (22) the $W^i(\mathbf{x})$ values are normalized such that their sum becomes one.

The energy defined in Eq. (22) can be compared with the one previously proposed by Keller et al. [19]. In their energy, the data and smoothness terms are based on the intensity constancy along MVs and the spatial regularity of the middle frame, respectively. However, the credibility of their data energy may be degraded in occlusion areas, where the intensity constancy is invalid. In contrast, the proposed data energy can reduce the influence of invalid pixels by assigning low fusion weights to them. In addition, a smoothness energy based on the spatial regularity of the middle frame may blur the image. By imposing the regularity on the fusion weights rather than on the image, the proposed smoothness energy prevents image blurring.

3) Numerical Implementation for Determining Fusion Weights: $E_F$ in Eq. (22) is minimized with respect to the fusion weights by using the Euler-Lagrange equation for each fusion weight, $W^k(\mathbf{x})$, pixel by pixel. Namely,

$$\frac{\partial E_F}{\partial W^k} = \sum_{i=1}^{4} \left[ \Psi'\left( \left( \sum_j W^j F_t^j - F_t^i \right)^2 \right) \left( \sum_j W^j F_t^j - F_t^i \right) F_t^k \right] - \beta \cdot \mathrm{div}\left( g\left( \left\| \nabla \sum_j W^j F_t^j \right\|_2 \right) \nabla W^k \right) = 0. \qquad (23)$$

For readability, we omit '$(\mathbf{x})$' from the equation. The nonlinearity of $\Psi'(\cdot)$ and $g(\cdot)$ can be relieved by applying a fixed point iteration [35], as follows:

$$\frac{\partial E_F}{\partial \left(W^k\right)^{n+1}} = \sum_{i=1}^{4} \left[ \Psi'\left( \left( \sum_j \left(W^j\right)^n F_t^j - F_t^i \right)^2 \right) \left( \sum_j \left(W^j\right)^{n+1} F_t^j - F_t^i \right) F_t^k \right] - \beta \cdot \mathrm{div}\left( g\left( \left\| \nabla \sum_j \left(W^j\right)^n F_t^j \right\|_2 \right) \nabla \left(W^k\right)^{n+1} \right) = 0, \qquad (24)$$




where $n$ denotes the index of the fixed point iteration. Based on a gradient descent method, we can write the evolution equation as

$$\left(W^k\right)^{n+1,\tau+1} = \left(W^k\right)^{n+1,\tau} - \mu \frac{\partial E_F}{\partial \left(W^k\right)^{n}}. \qquad (25)$$

Here, $\tau$ is the time variable and $\mu$ is the step size. After each iteration, the fusion weights are normalized so that the sum of the fusion weights at each pixel is one. Note that the normalization is performed by subtracting the average of the weight updates from each weight, namely,

$$\left(W^k\right)^{n+1,\tau+1} = \left(W^k\right)^{n+1,\tau+1} - \frac{1}{4} \sum_{j=1}^{4} \left[ \left(W^j\right)^{n+1,\tau+1} - \left(W^j\right)^{n+1,\tau} \right]. \qquad (26)$$

Unlike normalization via division, this subtraction-based normalization does not change the updating direction of the fusion weights. The fusion weights determined through the iteration of Eq. (26) are then applied to Eq. (11) to determine $F_t(\mathbf{x})$.
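A hedged sketch of the full update loop of Eqs. (23)-(26) is given below. The finite-difference gradient and divergence, the folding of the fixed-point level into a plain gradient step, the dropped constant factor in $\Psi'$, and the box-constraint clipping are our simplifications rather than the paper's exact scheme.

```python
import numpy as np

def update_weights(weights, frames, lam, beta=300.0, mu=1e-4,
                   kappa=70.0, n_iter=200, eps=1e-3):
    """Gradient-descent sketch of the fusion-weight updates, Eqs. (23)-(26).

    weights: list of four (H, W) maps W^i; frames: the four F_t^i;
    lam: the four lambda maps obtained from Eq. (17).
    """
    def grad(a):    # forward differences (one simple discretization)
        return (np.diff(a, axis=0, append=a[-1:, :]),
                np.diff(a, axis=1, append=a[:, -1:]))

    def div(py, px):    # backward differences (adjoint of grad)
        dy = py - np.roll(py, 1, axis=0); dy[0] = py[0]
        dx = px - np.roll(px, 1, axis=1); dx[:, 0] = px[:, 0]
        return dy + dx

    for _ in range(n_iter):
        fused = sum(w * f for w, f in zip(weights, frames))
        gy, gx = grad(fused)
        g = 1.0 / (1.0 + (gy**2 + gx**2) / kappa**2)  # g(|grad F_t|)
        updates = []
        for k in range(4):
            data = np.zeros_like(fused)
            for i in range(4):
                res = fused - frames[i]
                psi_p = 1.0 / (2.0 * np.sqrt(res**2 + eps**2))  # Psi'
                data += lam[i] * psi_p * res * frames[k]
            wy, wx = grad(weights[k])
            updates.append(-mu * (data - beta * div(g * wy, g * wx)))
        # Eq. (26): subtract the mean update so the weight sum is kept;
        # clipping then enforces the box constraint 0 <= W^i <= 1.
        mean_upd = sum(updates) / 4.0
        for k in range(4):
            weights[k] = np.clip(weights[k] + updates[k] - mean_upd, 0.0, 1.0)
    return weights
```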

IV. EXPERIMENTAL RESULTS

To examine the performance of the proposed algorithm, several experiments were conducted using various video sequences with an original frame rate of 30 frames per second. Even frames of the video sequences are obtained by interpolating the odd frames using several FRUC algorithms, including the proposed algorithm. The interpolated frames are then compared with the original even frames to examine the performance of the algorithms. For an objective evaluation, two metrics, the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) [36], are used.

The average processing time for the CIF format is 19.60 seconds on an Intel Core i7 960 at 3.2 GHz with a single core and four frame buffers, consisting of 15.09 seconds for motion estimation and 4.51 seconds for motion compensated frame interpolation. The computation time may be reduced further via code optimization and/or parallel processing.

Fig. 10. Average PSNR and SSIM values of the proposed algorithm as a function of parameter β.

A. Parameter Selection

The proposed algorithm uses eleven parameters: $\alpha$, $\gamma$, and $\varepsilon$ for optical flow estimation; $\tau_R$, $\lambda_e$, and $\lambda_v$ for MV reliability; $\sigma_d$ for inward interpolation; $\delta$ for the collinearity test; $\beta$ and $\kappa$ for the image fusion energy; and $\mu$ for optimization. All the parameters are empirically set to fixed values regardless of the video sequences.

The optical flow estimation parameters $\alpha$, $\gamma$, and $\varepsilon$ are known to be robust to parameter variations [23]. Based on the analysis in [23] and a corresponding experiment, we set $\alpha$ to 30 and $\gamma$ to 70, respectively. The parameter $\varepsilon$ is set to $10^{-3}$ as in [23], since it is a constant that hardly affects the performance.

Parameters $\tau_R$, $\lambda_e$, and $\lambda_v$ in Eq. (6) concern MV reliability. The hard threshold value $\tau_R$ is empirically set to 0.05, so that an MV is considered an outlier if either one of the two Laplacian functions in Eq. (6) is less than 0.05. The value of $\lambda_e$ affects the performance of temporal interpolation. If $\lambda_e$ is too low, an MV with a small matching error may be undesirably considered an outlier via thresholding. However, if $\lambda_e$ is too high, the reliability will depend entirely on the spatial variation of MVs, which is undesirable. Based on this observation, $\lambda_e$ is set to 50. Meanwhile, the value of $\lambda_v$ is related to the maximum allowed variation of neighboring MVs, or the maximum $sv$. Since we empirically set the maximum $sv$ to 3, the value of $\lambda_v$ is set to 1 in order to remove MVs having variation larger than 3.

For inward interpolation, the value of $\sigma_d$ in Eq. (9) must be determined. $\sigma_d$ affects the resolution of the interpolated image. If the value of $\sigma_d$ is too high, the weights $g_d(\mathbf{x}, \mathbf{x}_j)$ become similar; the interpolated pixel then becomes the average of the neighboring projected pixels, which reduces the resolution of the interpolated image. If the value of $\sigma_d$ is too low, the interpolated pixel value becomes that of the nearest pixel, and jagging artifacts appear. Based on this observation, we set the value of $\sigma_d$ to 0.25.

Parameter $\delta$ represents the maximum MV difference allowed in the collinearity test. Since $\delta$ corresponds to twice the pixel displacement between the two pixels that are projected onto $F_t^i$ using $V_t^1(\mathbf{x})$ and $V_t^3(\mathbf{x}+V_t^2(\mathbf{x}))$, we set $\delta$ to 2 so that the maximum displacement is less than or equal to one pixel.

Parameters $\beta$ and $\kappa$ in Eq. (21) relate to the image fusion energy. To empirically determine the value of $\beta$, we examine the average PSNR and SSIM of the proposed algorithm as a function of $\beta$, as shown in Fig. 10. We then set $\beta$ to 300, at which both the PSNR and SSIM graphs peak. The value of $\kappa$ is heuristically set to 70 so that diffusion is prevented at strong edges. Meanwhile, the value of $\mu$ in Eq. (25) is heuristically set to $10^{-4}$ so that the step size is small enough for convergence of the optimization process.

B. Comparison With Existing Algorithms

In order to examine the performance of the proposed algorithm, we tested it using fourteen video sequences. The proposed algorithm is compared with four existing algorithms.


Fig. 11. FRUC results of the 220th frame of the Flower sequence using (a) original frame, (b) Dane's (PSNR: 26.40dB, SSIM: 0.9308), (c) Wang's (PSNR: 25.93dB, SSIM: 0.9239), (d) Jacobson's (PSNR: 24.26dB, SSIM: 0.9312), (e) Keller's (PSNR: 24.69dB, SSIM: 0.9100), and (f) the proposed (PSNR: 28.23dB, SSIM: 0.9573).

To examine the performance of the proposed algorithm in occlusion areas, Dane's, Wang's, and Jacobson's algorithms [14]–[16] are used for comparison, because these algorithms are based on adaptive motion compensated frame interpolation. For a fair comparison, it is desirable to use the same MV field for all the algorithms. However, the three algorithms are based on block MVs, whereas the proposed algorithm is based on optical flows. Therefore, for the three algorithms, we use block MVs obtained by block-averaging the optical flows estimated in the proposed algorithm. In addition to the three algorithms above, Keller's algorithm is also compared, in order to determine how well the optical flow is used for motion compensated FRUC. For optical flow estimation, Keller's algorithm uses the same energy term as the proposed algorithm, and therefore we anticipate that the performance of motion compensated FRUC will mainly depend on how effectively the optical flows are utilized.

The visual quality is compared in Figs. 11–13. In Fig. 11, occlusion areas along the forward and backward directions are indicated with rectangles. As shown in the figure, Dane's, Wang's, and Jacobson's algorithms, which deal with the occlusion problem, tend to interpolate the window of the house in the right rectangle better than Keller's algorithm, which does not consider the occlusion problem (compare Figs. 11(b)–(d) with Fig. 11(e)). However, the interpolated images in the rectangles are considerably blurred, because inappropriate interpolation with forward and backward MVs leads to blending of wrong blocks. In contrast, the proposed algorithm provides a clear background image, as in Fig. 11(f), by giving appropriate weights to the intermediate interpolated frames that are interpolated with the four MV fields.

The problem is exacerbated when more than one occlusion area is located in close proximity. In Fig. 12, the rectangle includes two neighboring occlusion areas, which are caused by the bus and the lamppost, respectively.

Fig. 12. FRUC results of the 38th frame of the Bus sequence using (a) original frame, (b) Dane’s (PSNR: 26.49dB, SSIM: 0.9178), (c) Wang’s (PSNR: 27.00dB, SSIM: 0.9244), (d) Jacobson’s (PSNR: 26.33dB, SSIM: 0.9215), (e) Keller’s (PSNR: 21.72dB, SSIM: 0.8020), and (f) the proposed (PSNR: 27.92dB, SSIM: 0.9420).

Fig. 13. FRUC results of the 14th frame of the Foreman sequence using (a) original frame, (b) Dane’s (PSNR: 32.20dB, SSIM: 0.9361), (c) Wang’s (PSNR: 32.78dB, SSIM: 0.9399), (d) Jacobson’s (PSNR: 27.12dB, SSIM: 0.9237), (e) Keller’s (PSNR: 31.00dB, SSIM: 0.9297), and (f) the proposed (PSNR: 33.45dB, SSIM: 0.9432).

As shown in Figs. 12(b)–(e), the background between the bus and the column is blurred or falsely interpolated by the previous algorithms, while the proposed algorithm interpolates the background comparatively well. Meanwhile, the background blurring problem can also be observed in an occlusion area in the Foreman sequence in Fig. 13. The left eye behind the nose, enclosed in a solid rectangle in the figure, is interpolated more clearly by the proposed algorithm than by the previous algorithms. In addition to the artifacts in occlusion areas, blocking artifacts are also observed in the dashed rectangles in Figs. 13(b)–(c), because the corresponding algorithms are based on block MVs. Although Jacobson's algorithm is also based on block MVs, its MV refinement process noticeably alleviates blocking artifacts, as seen in Fig. 13(d). Keller's algorithm and the proposed algorithm are free from blocking artifacts, as shown in Figs. 13(e) and (f), because they are based on optical flow estimation rather than block matching.



TABLE II. AVERAGE PSNR VALUES FOR VARIOUS TEST SEQUENCES

TABLE III. AVERAGE SSIM VALUES FOR VARIOUS TEST SEQUENCES

TABLE IV. MEAN OPINION SCORE (MOS) FOR VARIOUS ALGORITHMS

For an objective evaluation, the average values of the PSNR and SSIM over the full frames of the test sequences are given in Tables II and III, respectively. As shown in Table II, the PSNR performance of the proposed algorithm is mostly better than that of the competing algorithms, except in the cases of the News and Soccer sequences. Those sequences contain many articulated objects, for which optical flows are often incorrectly estimated. In this case, the proposed algorithm incorrectly interpolates the middle frame by selecting an incorrect value as the majority among the four intermediate interpolated pixels.

In contrast, Dane's and Wang's algorithms interpolate the middle frame by averaging two pixels, and may thereby provide a slightly better FRUC result in terms of PSNR. Meanwhile, in terms of SSIM, the proposed algorithm outperforms the existing algorithms on all the test sequences, as presented in Table III. Since the SSIM index is known to effectively represent the structural similarity perceived by the human visual system, we can state that the proposed algorithm preserves structural information better than the existing algorithms.

For a subjective evaluation of video performance, we perform a mean opinion score test, which is widely used to obtain a numerical indication of perceived quality [37]. The mean opinion score is expressed in a range of 1 to 5, where 1 is the lowest quality and 5 is the highest. The test is performed by showing the FRUC results of the proposed and benchmark algorithms to ten evaluators in a random order. The evaluators are researchers working in the area of image processing. The mean opinion scores are presented in Table IV.



TABLE V. AVERAGE PSNR AND AVERAGE SSIM VALUES MEASURED IN OCCLUSION MASKS

Fig. 14. Average PSNR and SSIM of all the test sequences using the proposed algorithm with (a) λ(R) = 1 & β = 0, (b) λ(R) = R & β = 0, (c) λ(R) of the curve & β = 0, and (d) λ(R) of the curve & β = 300.

As shown in the table, the proposed algorithm provides better subjective quality than the benchmark algorithms.

In Tables II and III, the performance of the proposed algorithm is presented in terms of the PSNR and SSIM over whole frames. Since the proposed algorithm focuses on improving the performance of motion compensated FRUC especially in occlusion areas, it is also meaningful to compare the proposed algorithm with the others in occlusion areas only. We determine occlusion masks by differencing adjacent frames, thresholding the magnitudes of the differences, and dilating the obtained areas, as in [38]. Based on those masks, the average PSNR and SSIM of the test sequences are obtained, as given in Table V. As shown in the table, the values of the proposed algorithm in occlusion areas are noticeably higher than those of the existing algorithms, except for a small number of sequences, and even in those sequences the differences are not large. Therefore, we can state that the proposed algorithm improves the motion compensated FRUC performance especially in occlusion areas.
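A sketch of this mask construction follows; the threshold and dilation radius are our assumptions, since the exact values of [38] are not restated here.

```python
import numpy as np
from scipy import ndimage

def occlusion_mask(f_prev, f_next, thresh=20.0, dilate_iter=2):
    """Occlusion mask in the spirit of [38]: difference adjacent frames,
    threshold the difference magnitudes, and dilate the obtained areas."""
    diff = np.abs(f_prev.astype(float) - f_next.astype(float))
    return ndimage.binary_dilation(diff > thresh, iterations=dilate_iter)
```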

Fig. 15. Motion compensated FRUC results and the corresponding fusion weights of the 4th frame of the Foreman sequence using the proposed algorithm (a) without and (b) with the smoothness energy.

C. Effectiveness of the Data Energy and Smoothness Energy

The contribution of reliability to the data energy for image fusion, which is defined in Eq. (20), is modeled as a curve consisting of two exponential functions, as delineated in Eq. (17). To examine the effectiveness of this curve, we compare it with two other forms, namely, $\lambda(R) = 1$ and $\lambda(R) = R$. Here, the former corresponds to the L1-norm based data energy without weighting, while the latter implies the direct use of the reliability as the weight of the weighted-L1-norm. In this comparison, we exclude the smoothness energy and use only the data energy, so that the performance change depending on the adopted $\lambda(R)$ can be fairly examined. Figs. 14(a)–(c) show the average values of PSNR and SSIM for the different $\lambda(R)$ choices. As shown in Figs. 14(a) and (b), $\lambda(R) = 1$ provides better performance than $\lambda(R) = R$. This implies that the reliability is not linearly correlated with the error distribution, as we examined in Section III. Meanwhile, the $\lambda(R)$ of the adopted curve performs better than the others, as shown in Fig. 14(c). From this experiment, we can conclude that the data energy, which is based on an L1-norm properly weighted according to the reliability, contributes to the performance improvement of the proposed algorithm.

To examine the usefulness of the smoothness energy described in Eq. (21), the average PSNR and SSIM obtained using the proposed algorithm without and with the smoothness energy term are compared in Figs. 14(c) and (d), respectively.



TABLE VI. AVERAGE PSNR AND AVERAGE SSIM VALUES OF RESULTS OF THE PROPOSED ALGORITHM USING FOUR AND TWO MV FIELDS

According to the graphs, the use of the smoothness energy term clearly improves the algorithm performance. Fig. 15(a) shows an interpolated frame and the corresponding fusion weights produced by the proposed algorithm without the smoothness term. As expected, the fusion weights obtained without the smoothness term depend only on the reliabilities of the corresponding pixels, and their defects are directly related to the artifacts indicated by the arrows in the figure. In contrast, the fusion weights obtained with the smoothness term alleviate those artifacts, as shown in Fig. 15(b).

D. Effectiveness of the Use of Outward Motion Vectors

To examine the effectiveness of the use of outward MV fields in solving the occlusion problem, the proposed algorithm is tested using the three sequences presented in Fig. 16.

Fig. 16. FRUC results of the proposed algorithm by using (a), (b), (c) four MV fields and (d), (e), (f) two inward MV fields for (a), (d) the 220th frame of the Flower sequence, (b), (e) the 38th frame of the Bus sequence, and (c), (f) the 14th frame of the Foreman sequence, respectively.

Figs. 16(a), (b), and (c) show three images obtained by using four MV fields, that is, two inward and two outward MV fields, while Figs. 16(d), (e), and (f) show the images obtained by using only two inward MV fields. Compared to the former images, the latter do not provide desirable interpolation results in the occlusion areas. This means that two inward MV fields cannot provide correctly interpolated occlusion areas in their intermediate interpolated frames. For example, the intermediate interpolated frames of the two inward MV fields in Figs. 6(b) and (c) do not contain the window information, unlike the intermediate frame of an outward MV field in Fig. 6(d). For an objective evaluation, the average PSNR and SSIM values are compared between the two cases in Table VI. The table shows that the proposed algorithm using four MV fields mostly provides better objective performance than the algorithm using only two inward MV fields.

V. CONCLUSION

In this paper, we proposed a new motion compensated FRUC framework designed especially to alleviate the occlusion problem. In the framework, we first estimate four sets of MV fields using a modified optical flow estimation algorithm. We then construct four intermediate interpolated frames by using the estimated MV fields and the MV reliabilities. Finally, we combine the intermediate interpolated frames into a single interpolated frame by using a variational image fusion scheme. To define proper energy terms for the variational image fusion, we observed the statistical relationship between the error distributions of the intermediate interpolated frames and the corresponding pixel reliabilities. Based on these observations, we established a fitted curve function that associates the pixel reliability with a parameter of the error distribution, and defined the data energy term for image fusion based on it. A smoothness energy term is also defined as prior information for image fusion. Experimental results show that the proposed algorithm improves the performance of motion compensated FRUC in terms of both objective and subjective quality, especially in occlusion areas.

REFERENCES

[1] Y. Shimodaira, "Fundamental phenomena underlying artifacts induced by image motion and the solutions for decreasing the artifacts on FPDs," in Proc. SID Symp., 2003, pp. 1034–1037.
[2] T. Kurita, "Moving picture quality improvement for hold-type AMLCDs," in Proc. SID Symp., 2001, pp. 986–989.
[3] H. Pan, X. Feng, and S. Daly, "LCD motion blur analysis and modeling based on temporal PSF," in Proc. SID Symp., 2006, pp. 1704–1709.
[4] N. Kimura, T. Ishihara, H. Miyata, T. Kumakura, K. Tomizawa, A. Inoure, et al., "New technologies for large-sized high-quality LCD TV," in Proc. SID, 2005, pp. 1734–1737.
[5] T. Kim, B. Ye, C. P. Vu, N. Balram, and H. Steemer, "Motion-adaptive alternate gamma drive for flicker-free motion-blur reduction in 100/120-Hz LCD TV," J. Soc. Inf. Display, vol. 17, no. 3, pp. 203–212, 2009.
[6] R. Castagno, P. Haavisto, and G. Ramponi, "A method for motion adaptive frame rate up-conversion," IEEE Trans. Circuits Syst. Video Technol., vol. 6, no. 5, pp. 436–446, Oct. 1996.


[7] G. Dane and T. Q. Nguyen, "Smooth motion vector resampling for standard compatible video post-processing," in Proc. Asilomar Conf. Signals, Syst. Comput., vol. 2, Nov. 2004, pp. 1731–1735.
[8] B.-D. Choi, J.-W. Han, C.-S. Kim, and S.-J. Ko, "Motion-compensated frame interpolation using bilateral motion estimation and adaptive overlapped block motion compensation," IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 4, pp. 407–416, Apr. 2007.
[9] Y. Ling, J. Wang, Y. Liu, and W. Zhang, "Spatial and temporal correlation based frame rate up-conversion," in Proc. Int. Conf. Image Process., Oct. 2008, pp. 909–912.
[10] A.-M. Huang and T. Nguyen, "A multistage motion vector processing method for motion-compensated frame interpolation," IEEE Trans. Image Process., vol. 17, no. 5, pp. 694–708, May 2008.
[11] K. Hilman, H.-W. Park, and Y.-M. Kim, "Using motion-compensated frame-rate conversion for the correction of 3:2 pulldown artifacts in video sequences," IEEE Trans. Circuits Syst. Video Technol., vol. 10, no. 6, pp. 869–877, Sep. 2000.
[12] S. Hong, J. H. Park, and B. H. Berkeley, "Motion-interpolated FRC algorithm for 120Hz LCD," in Proc. SID Symp., 2005, pp. 1892–1895.
[13] O. A. Ojo and G. Haan, "Robust motion-compensated video upconversion," IEEE Trans. Consumer Electron., vol. 43, no. 4, pp. 1045–1056, Nov. 1997.
[14] D. Wang, A. Vincent, P. Blanchfield, and R. Klepko, "Motion-compensated frame rate up-conversion-Part II: New algorithms for frame interpolation," IEEE Trans. Broadcasting, vol. 56, no. 2, pp. 142–149, Jun. 2010.
[15] G. Dane and T. Q. Nguyen, "Optimal temporal interpolation filter for motion-compensated frame rate up conversion," IEEE Trans. Image Process., vol. 15, no. 4, pp. 978–991, Apr. 2006.
[16] N. Jacobson, Y. Lee, V. Mahadevan, N. Vasconcelos, and T. Q. Nguyen, "A novel approach to FRUC using discriminant saliency and frame segmentation," IEEE Trans. Image Process., vol. 19, no. 11, pp. 2924–2934, Nov. 2010.
[17] C. W. Tang and O. C. Au, "Comparison between block-based and pixel-based temporal interpolation for video coding," in Proc. IEEE Int. Symp. Circuits Syst., vol. 4, May 1998, pp. 122–125.
[18] R. Krishnamurthy, J. M. Woods, and P. Moulin, "Frame interpolation and bidirectional prediction of video using compactly encoded optical-flow fields and label fields," IEEE Trans. Circuits Syst. Video Technol., vol. 9, no. 5, pp. 713–726, Aug. 1999.
[19] S. H. Keller, F. Lauze, and M. Nielsen, "Temporal super resolution using variational methods," in High-Quality Visual Experience: Creation, Processing and Interactivity of High-Resolution and High-Dimensional Video Signals, M. Mrak, M. Grgic, and M. Kunt, Eds. New York, NY, USA: Springer-Verlag, 2010.
[20] B. Lucas and T. Kanade, "An iterative image registration technique with an application to stereo vision," in Proc. 7th Int. Joint Conf. Artif. Intell., Aug. 1981, pp. 674–679.
[21] E. B. Bellers, J. W. van Gurp, J. G. W. M. Janssen, J. R. Braspenning, and R. Wittebrood, "Solving occlusion in frame-rate up-conversion," in Proc. Int. Conf. Consumer Electron., Jan. 2007, pp. 1–2.
[22] Y. H. Cho, H. Y. Lee, and D. S. Park, "Temporal frame interpolation based on multi-frame feature trajectory," IEEE Trans. Circuits Syst. Video Technol., to be published.
[23] T. Brox, A. Bruhn, N. Papenberg, and J. Weickert, "High accuracy optical flow estimation based on a theory for warping," in Proc. 8th Eur. Conf. Comput. Vis., vol. 4, May 2004, pp. 25–36.
[24] D. Sun, S. Roth, and M. J. Black, "Secrets of optical flow estimation and their principles," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2010, pp. 2432–2439.
[25] D. Wang, A. Vincent, and P. Blanchfield, "Hybrid de-interlacing algorithm based on motion vector reliability," IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 8, pp. 1019–1025, Aug. 2005.
[26] S. Borman and R. L. Stevenson, "Simultaneous multi-frame MAP super-resolution video enhancement using spatio-temporal priors," in Proc. IEEE Int. Conf. Image Process., vol. 3, Oct. 1999, pp. 469–473.
[27] A. Bruhn and J. Weickert, "A confidence measure for variational optic flow methods," in Geometric Properties for Incomplete Data. New York, NY, USA: Springer-Verlag, 2006, pp. 283–298.
[28] N. Kwak, "Principal component analysis based on L1-norm maximization," IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, no. 9, pp. 1672–1680, Sep. 2008.
[29] Y. Xia and M. S. Kamel, "Cooperative learning algorithms for data fusion using novel L1 estimation," IEEE Trans. Signal Process., vol. 56, no. 3, pp. 1083–1095, Mar. 2008.

[30] O. Féron and A. Mohammad-Djafari, “Image fusion and unsupervised joint segmentation using HMM and MCMC algorithms,” J. Electron. Imaging, vol. 14, no. 2, pp. 023014–023019, May 2005.
[31] T. Aysal and K. Barner, “Quadratic weighted median filters for edge enhancement of noisy images,” IEEE Trans. Image Process., vol. 15, no. 11, pp. 3294–3310, Nov. 2006.
[32] L. Bar, N. Sochen, and N. Kiryati, “Image deblurring in the presence of salt-and-pepper noise,” in Proc. 5th Int. Conf. Scale Space PDE Methods Comput. Vis., 2005, pp. 107–118.
[33] B. Tang, G. Sapiro, and V. Caselles, “Diffusion of general data on non-flat manifolds via harmonic maps theory: The direction diffusion case,” Int. J. Comput. Vis., vol. 36, no. 2, pp. 149–161, 2000.
[34] S. Ince and J. Konrad, “Occlusion-aware optical flow estimation,” IEEE Trans. Image Process., vol. 17, no. 8, pp. 1443–1451, Aug. 2008.
[35] K. Soumyanath and V. Borkar, “An analog scheme for fixed-point computation-Part II: Applications,” IEEE Trans. Circuits Syst. I, Fundamental Theory Appl., vol. 46, no. 4, pp. 442–451, Apr. 1999.
[36] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: From error visibility to structural similarity,” IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, Apr. 2004.
[37] Subjective Video Quality Assessment Methods for Multimedia Applications, Int. Telecommun. Union, Geneva, Switzerland, Sep. 1999.
[38] S. Baker, D. Scharstein, J. P. Lewis, S. Roth, M. J. Black, and R. Szeliski, “A database and evaluation methodology for optical flow,” in Proc. IEEE Int. Conf. Comput. Vis., Oct. 2007, pp. 1–8.
[39] S. Baker, D. Scharstein, J. P. Lewis, S. Roth, M. J. Black, and R. Szeliski, “A database and evaluation methodology for optical flow,” Int. J. Comput. Vis., vol. 92, no. 1, pp. 1–31, 2011.

Won Hee Lee received the B.S. degree in electrical engineering from the Korea Advanced Institute of Science and Technology, Daejeon, Korea, in 2008. He is currently pursuing the Ph.D. degree. His research interests include frame rate up conversion, optical flow estimation, and stereo matching.

Kyuha Choi received the B.S. degree in electrical and electronic engineering from Yonsei University, Seoul, Korea, in 2006, and the Ph.D. degree in electrical engineering and computer science from the Korea Advanced Institute of Science and Technology, Daejeon, Korea, in 2013. He is currently with the Digital Media and Communications Research and Development Center, Samsung Electronics Company, Suwon, Korea. His research interests include super-resolution, denoising, and IR enhancement.

Jong Beom Ra received the B.S. degree in electronic engineering from Seoul National University, Seoul, Korea, in 1975, and the M.S. and Ph.D. degrees in electrical engineering from the Korea Advanced Institute of Science and Technology (KAIST), Daejeon, in 1977 and 1983, respectively. From 1983 to 1987, he was an Associate Research Scientist with the Department of Radiology, Columbia University, New York. Since 1987, he has been with the Department of Electrical Engineering, KAIST, where he is currently a Professor. His research interests are digital image processing, video signal processing, 3-D visualization, 3-D display systems, and medical imaging.
