

Multifeature-Based Surround Inhibition Improves Contour Detection in Natural Images

Kai-Fu Yang, Chao-Yi Li, and Yong-Jie Li

Abstract— To effectively perform visual tasks like detecting contours, the visual system normally needs to integrate multiple visual features. Extensive physiological studies have revealed that for a large number of neurons in the primary visual cortex (V1) of monkeys and cats, the responses elicited by stimuli placed within the classical receptive field (CRF) are substantially modulated, normally inhibited, when the CRF and its surround, namely the non-CRF, differ in various local features. The exquisite sensitivity of V1 neurons to the center-surround stimulus configuration is thought to serve important perceptual functions, including contour detection. In this paper, we propose a biologically motivated model to improve the performance of perceptually salient contour detection. The main contribution is a multifeature-based center-surround framework, in which the surround inhibition weights of individual features, including orientation, luminance, and luminance contrast, are combined according to a scale-guided strategy, and the combined weights are then used to modulate the final surround inhibition of the neurons. The performance was compared with that of single-cue-based models and other existing methods (especially other biologically motivated ones). The results show that combining multiple cues can substantially improve the performance of contour detection compared with models using a single cue. In general, luminance and luminance contrast contribute much more than orientation to the specific task of contour extraction, at least in gray-scale natural images.

Index Terms— Contour detection, non-classical receptive field, surround inhibition, cue combination.

I. INTRODUCTION

CONTOUR detection is one of the fundamental tasks in computer vision applications, such as image segmentation and shape-based object recognition [1].

Manuscript received August 23, 2013; revised February 9, 2014, June 16, 2014, and August 29, 2014; accepted September 22, 2014. Date of publication October 1, 2014; date of current version October 21, 2014. This work was supported in part by the Major State Basic Research Program under Grant 2013CB329401, in part by the Natural Science Foundations of China under Grant 61375115, Grant 61075109, Grant 91120013, and Grant 31300912, in part by the Outstanding Doctoral Support Program, University of Electronic Science and Technology of China, Chengdu, China, under Grant YBXSZC20131041, in part by the 111 Project of China under Grant B12027, and in part by the Program for Changjiang Scholars and innovative Research Team in University of China under Grant IRT0910. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Adrian G. Bors. (Corresponding author: Yong-Jie Li.) K.-F. Yang and Y.-J. Li are with the School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China (e-mail: [email protected]; [email protected]). C.-Y. Li is with the School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China, and also with the Center for Life Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China (e-mail: [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TIP.2014.2361210

Unlike an edge, which refers to any abrupt change in local image features such as luminance and color, a contour represents a change from object to background or from one surface to another [2]. A large number of methods have been proposed for edge or contour detection in recent decades. Typical examples include local differential operators [3], [4], phase congruency [5], [6], active contours [7], [8], and statistical inference [2], [9] (for more methods, see [1] for a review). Though considerable progress has been made regarding the accuracy and efficiency of contour detection, it is still challenging for computational models to perform as intelligently as the human visual system when extracting salient contours from complex scenes [1].

Contour perception is also one of the fundamental functions of the human visual system. It has been widely believed that the primary visual cortex (V1) contributes much to edge detection and contour perception [10]–[22]. The pioneering work of Hubel and Wiesel in the early 1960s [10] revealed that the majority of neurons in V1 are exquisitely sensitive to oriented bars or edges within the classical receptive field (CRF), a limited area of visual space. Subsequently, extensive neurophysiological findings on monkeys and cats clearly indicate that a peripheral region beyond the CRF, known as the non-classical receptive field (non-CRF), can modulate the spiking response of a V1 neuron to the stimuli placed within the CRF [11]–[22]. The modulation by the non-CRF is generally called contextual influence or center-surround interaction.

For most V1 neurons, the surround (non-CRF) exhibits an inhibitory influence, and the strength of inhibition is sensitive to various feature differences between the CRF and non-CRF, such as differences in orientation [23]–[25], spatial frequency [24], [26], luminance, contrast and color [24], relative moving speed [24], [27], and spatial phase [28]. Fig. 1 shows the surround modulation properties of V1 neurons responding to different feature patterns, including orientation, luminance, and luminance contrast. The neuronal responses are strongly inhibited when the stimuli within the CRF and non-CRF share similar features. The inhibition decreases or even disappears with increasing feature difference [24], [28]. These findings suggest that the surround inhibition of V1 neurons may be an important neural basis for the visual system to perceive objects in cue-invariant ways [24].

It has been generally accepted that contextual modulation plays an important role in various visual perception tasks, such as contour integration [29]–[32], pop-out [16], [33], and figure-ground discrimination [21], [33]. In particular, contours play an important role in advanced visual perception tasks, because they convey much of the important information about objects [1]. Although extracting contours from complex natural scenes




Fig. 1. The properties of V1 neurons responding to different feature patterns: orientation, luminance and luminance contrast. Higher similarity between the center and surround features corresponds to stronger surround inhibition, and hence lower neuronal response. Adapted from [24].

is difficult for existing computer vision systems, our visual system can accomplish it quickly and accurately, and understanding the underlying biological mechanisms has aroused great interest among researchers of different fields. Based on the physiological findings mentioned above, several biologically inspired models have recently been proposed for contour detection [34]–[44] (for more methods, see [1] for a review). These models are capable of enhancing perceptually salient contours in cluttered natural scenes more effectively than some traditional edge detectors [3], [45], [46]. However, most of these biologically inspired models are based on surround modulation and emphasize only the orientation tuning property of the surround; hence, it is difficult for them to extract contours defined by other cues, though some of them try to improve the capability of detecting contours by introducing a multi-scale strategy [36], [40], [43].

Integrating the resources of multiple scales is an efficient way to improve contour detection [36], [40], [43], [47], [48], because important contours are usually present at different scales with varying object sizes. In addition, another strategy of the visual system to cope with complex scenes is to integrate multiple visual features. In natural environments, contour detection is extremely difficult because of complex physical conditions, such as poor lighting, shadow, and haze. However, biological visual systems can overcome these problems and extract reliable boundaries by combining various local visual cues [49], [50]. The idea of combining multiple cues has been studied in various visual estimation problems, such as spatial feature detection [51], orientation discrimination [52], texture edge localization [53], [54], and contour detection [2], [55].

From the perspective of engineering, many existing algorithms exhibit excellent boundary detection capability by integrating the information of multiple scales or cues. Typically, Martin et al. [2] proposed the Pb method to extract and localize boundaries by taking into account multiple cues (including color, brightness, and texture). Subsequently, other Pb-based algorithms have been proposed to improve boundary detection by using multiple scales [47], [48] or global information [48], [56]. These algorithms usually

integrate the information of multiple scales or cues with a certain supervised learning method.

In this paper we are concerned with both contextual modulation and cue combination in the visual system, as well as the different contributions of various local features to contour detection in cluttered natural scenes. In our model, the weights of selective surround inhibition from the non-CRF are first computed for each individual cue, including orientation, luminance, and luminance contrast. Then a scale-guided combination strategy is designed to combine the inhibition weights of the different cues under the control of statistical information about CRF responses at two different scales. The combined inhibition weights are then used to modulate the strength of surround inhibition to obtain the final responses to salient edges. The source code is available at http://www.neuro.uestc.edu.cn/vccl/.

The remainder of this paper is organized as follows. In Section II, we describe the details of cue extraction, surround inhibition, weight combination of the various types of inhibition, and the full multiple-cue inhibition model. In Section III, we evaluate the performance of the proposed model on the RuG40 and BSDS300 datasets. Finally, we discuss our model and draw conclusions in Section IV.

II. CONTOUR DETECTION MODELS

Visual information processing begins with the photoreceptors in the retina, which sample local information of the external visual world by transforming optical signals into neural responses. In the early stages of vision, information processing is often conceived as coding the neural representation of natural scenes with a set of elementary features. Neurons in V1 can be conceived as a set of neural filters that extract specific local features in limited spatial regions [57]. Marr's computational theory [58] suggests that local information is abstracted into the symbolic representation of objects by a series of processing stages. However, how the visual system perceives global structures remains unclear [29], [30], [59], [60]. In this study, we propose a new model that executes the extraction and combination of local features with center-surround interaction for the specific task of contour detection.



Fig. 2. The general framework of the proposed model. The final response at each location is computed by subtracting the surround inhibition from the CRF response, where the surround inhibition is modulated by the weight that is computed using a scale-guided combination of three types of inhibitory weights derived from the corresponding local cues.

The general framework of the proposed model is shown in Fig. 2. We first compute the responses of V1 neurons to the stimuli placed within the CRF. The three types of local cues (orientation, luminance, and luminance contrast) are extracted from the original image, and then the corresponding inhibitory weights at each location are computed based on the center-surround difference. Then, a scale-guided combination strategy (see details in Fig. 4) is designed to combine the three types of inhibitory weights into a unified one, which is then used to modulate the surround inhibition when calculating the final response at each location by subtracting the surround inhibition from the CRF response.

A. CRF Response of V1 Neuron

For the orientation-selective V1 neurons modeled in this study, we use the derivative of a 2D Gaussian function to describe their responses to the stimuli within the CRF [44], [57], [61]. The derivative of the 2D Gaussian function can be expressed as

$$RF(x, y; \theta, \sigma) = \frac{\partial g(\tilde{x}, \tilde{y}; \theta, \sigma)}{\partial \tilde{x}} \tag{1}$$

$$g(\tilde{x}, \tilde{y}; \theta, \sigma) = \frac{1}{2\pi\sigma^2} \exp\left( -\frac{\tilde{x}^2 + \gamma^2 \tilde{y}^2}{2\sigma^2} \right) \tag{2}$$

where $(\tilde{x}, \tilde{y}) = (x\cos\theta + y\sin\theta,\; -x\sin\theta + y\cos\theta)$ and $\theta$ is the preferred orientation of a neuron. The spatial aspect ratio $\gamma$ and the standard deviation $\sigma$ determine the ellipticity and the size of the CRF, respectively. In this paper, we set $\gamma = 0.5$ on the basis of physiological findings [62]. At each location, a pool of neurons with $N_\theta$ different preferred orientations $\theta_i$ is employed to process the local stimuli:

$$\theta_i = \frac{(i - 1)\pi}{N_\theta}, \quad i = 1, 2, \ldots, N_\theta \tag{3}$$

For an input image $I(x, y)$, the CRF response of a V1 cell with preferred orientation $\theta_i$ is computed as

$$e_i(x, y; \theta_i, \sigma) = |I(x, y) \ast RF(x, y; \theta_i, \sigma)| \tag{4}$$

where $\ast$ denotes the convolution operation. Then, at each location, a winner-take-all strategy is performed, i.e., the maximum CRF response over all $N_\theta$ cells with different orientations is selected as the final CRF response:

$$E(x, y; \sigma) = \max\{ e_i(x, y; \theta_i, \sigma) \mid i = 1, 2, \ldots, N_\theta \} \tag{5}$$
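For illustration, this CRF stage can be sketched as follows; this is a minimal sketch rather than the released implementation, and the values of sigma, N_theta, and the kernel size are illustrative placeholders, not the settings of Table I.

```python
# Sketch of the CRF stage (eqs. (1)-(5)); sigma, n_theta and the kernel
# size are illustrative placeholders, not the settings of Table I.
import numpy as np
from scipy.ndimage import convolve

def rf_kernel(theta, sigma, gamma=0.5):
    """Derivative along x~ of the oriented 2D Gaussian, eqs. (1)-(2)."""
    r = int(np.ceil(4 * sigma))
    y, x = np.mgrid[-r:r + 1, -r:r + 1].astype(float)
    xt = x * np.cos(theta) + y * np.sin(theta)      # rotated coordinates
    yt = -x * np.sin(theta) + y * np.cos(theta)
    g = np.exp(-(xt**2 + (gamma * yt)**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)
    return -xt / sigma**2 * g                       # d g / d x~

def crf_response(image, sigma=1.5, n_theta=8):
    """Winner-take-all E(x, y; sigma) of eq. (5) and the per-orientation
    responses, which also form the orientation vector of eq. (6)."""
    thetas = [(i * np.pi) / n_theta for i in range(n_theta)]          # eq. (3)
    e = np.stack([np.abs(convolve(image, rf_kernel(t, sigma)))        # eq. (4)
                  for t in thetas])
    return e.max(axis=0), e
```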

B. Extraction of Local Cues

In this subsection, we describe how to extract the multiple local visual cues, i.e., the local orientation $\Theta(x, y)$, the local luminance $L(x, y)$, and the local luminance contrast $C(x, y)$.


We code the orientation information of the local stimuli around $(x, y)$ with a vector:

$$\Theta(x, y) = [e_1, e_2, \ldots, e_{N_\theta}]_{(x, y)} \tag{6}$$

This vector will be used to compute the orientation difference of the stimuli between the CRF and non-CRF. This vector-based definition of the orientation information of the local stimuli is quite different from that adopted in previous models [34]–[37], which normally take the preferred orientation of the neuron with the maximum CRF response over the $N_\theta$ neurons as the orientation of the local stimuli. Such a computation of local orientation is sensitive to noise, and an incorrect orientation estimate would result in ineffective surround inhibition. In contrast, coding the orientation information of the local stimuli as a vector most likely provides accurate orientation difference computation in a noise-insensitive way. This orientation coding strategy will be further demonstrated in the following subsection.

In addition, local luminance and luminance contrast are also important visual features for understanding natural scenes. Many studies have reported the statistics of local luminance and contrast in natural images, but with inconsistent conclusions. For example, Mante et al. (2005) and Frazor and Geisler (2006) claimed that local luminance and luminance contrast are independent of or only weakly dependent on each other in the early visual system and in natural scenes [63], [64]. However, Lindgren et al. (2008) revealed a strong spatial dependence between local luminance and luminance contrast in natural images [65]. In this work, we evaluate the contribution of local luminance and luminance contrast to contour detection with surround inhibition.

Following the methods in [63]–[65], we measure the luminance and luminance contrast in local patches formed by a raised-cosine weighting window

$$w(x_i, y_i) = 0.5\left( \cos\left( \frac{\pi \sqrt{(x_i - x)^2 + (y_i - y)^2}}{\delta} \right) + 1 \right) \tag{7}$$

where $\delta$ is the radius of the local square window $S_{xy}$ and $(x_i, y_i)$ is the location of the $i$th pixel in the patch centered at $(x, y)$. In this study, $\delta = 5$, i.e., $S_{xy}$ has a window size of $11 \times 11$ pixels. The local luminance and the root-mean-square (RMS) luminance contrast are calculated as

$$L(x, y) = \frac{1}{\mu} \sum_{(x_i, y_i) \in S_{xy}} w(x_i, y_i)\, I(x_i, y_i) \tag{8}$$

$$C(x, y) = \sqrt{ \frac{1}{\mu} \sum_{(x_i, y_i) \in S_{xy}} w(x_i, y_i)\, \frac{(I(x_i, y_i) - L(x, y))^2}{L(x, y)^2} } \tag{9}$$

where $\mu = \sum_{(x_i, y_i) \in S_{xy}} w(x_i, y_i)$. Note that the values of luminance and luminance contrast are linearly normalized to $[0, 1]$ (i.e., $L, C \in [0, 1]$) for convenience of computation.
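A minimal sketch of this cue extraction step is given below, assuming a grayscale image in [0, 1]; `eps` is a small numerical guard against division by zero, an implementation detail not specified in the text.

```python
# Sketch of the local luminance and RMS contrast cues (eqs. (7)-(9)) with
# delta = 5 (an 11 x 11 patch); eps is an assumed numerical guard.
import numpy as np
from scipy.ndimage import convolve

def raised_cosine(delta=5):
    y, x = np.mgrid[-delta:delta + 1, -delta:delta + 1].astype(float)
    d = np.sqrt(x**2 + y**2)
    w = 0.5 * (np.cos(np.pi * d / delta) + 1.0)     # eq. (7)
    w[d > delta] = 0.0        # the window vanishes outside radius delta
    return w

def luminance_contrast(image, delta=5, eps=1e-6):
    """Return L(x, y) and C(x, y), each linearly normalized to [0, 1]."""
    w = raised_cosine(delta)
    mu = w.sum()
    L = convolve(image, w) / mu                     # eq. (8)
    var = convolve(image**2, w) / mu - L**2         # weighted E[I^2] - L^2
    C = np.sqrt(np.maximum(var, 0.0)) / (L + eps)   # RMS contrast, eq. (9)
    norm = lambda m: (m - m.min()) / (m.max() - m.min() + eps)
    return norm(L), norm(C)
```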


C. Surround Inhibition of Individual Features

In this subsection, based on the maps of visual cues computed earlier, we describe how to compute the weights of surround inhibition for individual cues at each location, i.e., the orientation-selective surround inhibition weight $W_\Theta(x, y)$, the luminance-selective surround inhibition weight $W_L(x, y)$, and the luminance contrast-selective surround inhibition weight $W_C(x, y)$.

The strength of surround inhibition normally decreases with increasing distance from the CRF center [14], [15], [18]. In this study we adopt the distance-related weighting function used in [34], [35], [37], [43]:

$$W_d(x, y) = \frac{\mathrm{DoG}^{+}_{\sigma,\rho}(x, y)}{\left\| \mathrm{DoG}^{+}_{\sigma,\rho}(x, y) \right\|_1} \tag{10}$$

where $\|\cdot\|_1$ denotes the $L_1$ norm, and $\mathrm{DoG}^{+}_{\sigma,\rho}(x, y)$ is calculated as

$$\mathrm{DoG}^{+}_{\sigma,\rho}(x, y) = H\left( \frac{1}{2\pi(\rho\sigma)^2} \exp\left( -\frac{x^2 + y^2}{2(\rho\sigma)^2} \right) - \frac{1}{2\pi\sigma^2} \exp\left( -\frac{x^2 + y^2}{2\sigma^2} \right) \right) \tag{11}$$

with the half-wave rectification

$$H(z) = \begin{cases} 0, & z \le 0 \\ z, & z > 0 \end{cases} \tag{12}$$

The orientation difference $\Delta\Theta(x, y, x_i, y_i)$ between the orientation vectors $\Theta(x, y)$ and $\Theta(x_i, y_i)$ is first computed (13). The orientation-selective weight of surround inhibition from the neuron at $(x_i, y_i)$ in the non-CRF to the neuron at $(x, y)$ in the CRF is then given by

$$W_\theta(x, y, x_i, y_i; \sigma_\Theta) = \exp\left( -\frac{\Delta\Theta(x, y, x_i, y_i)^2}{2\sigma_\Theta^2} \right) \tag{14}$$

where the standard deviation $\sigma_\Theta$ establishes the sensitivity of the inhibitory strength to the orientation difference. We experimentally set $\sigma_\Theta = 0.2$ in this paper.

Similarly, for the features of luminance and luminance contrast, the weights of feature-selective surround inhibition from the neuron at $(x_i, y_i)$ in the non-CRF to the neuron at $(x, y)$ in the CRF are respectively given by

$$W_l(x, y, x_i, y_i; \sigma_l) = \exp\left( -\frac{\Delta L(x, y, x_i, y_i)^2}{2\sigma_l^2} \right), \quad W_c(x, y, x_i, y_i; \sigma_c) = \exp\left( -\frac{\Delta C(x, y, x_i, y_i)^2}{2\sigma_c^2} \right) \tag{15}$$

where the standard deviations $\sigma_l$ and $\sigma_c$ establish the sensitivities of the inhibitory strength to the feature differences of luminance and luminance contrast, respectively. In this study we experimentally set $\sigma_l = \sigma_c = 0.05$. $\Delta L(x, y, x_i, y_i)$ and $\Delta C(x, y, x_i, y_i)$ are respectively the feature differences of luminance and luminance contrast between the stimuli at $(x_i, y_i)$ in the non-CRF and at $(x, y)$ in the CRF:

$$\Delta L(x, y, x_i, y_i) = |L(x, y) - L(x_i, y_i)|, \quad \Delta C(x, y, x_i, y_i) = |C(x, y) - C(x_i, y_i)| \tag{16}$$

The per-location inhibition weights $W_\Theta(x, y)$, $W_L(x, y)$, and $W_C(x, y)$ are then obtained by pooling the corresponding pairwise feature-selective weights over the non-CRF with the distance-related weighting $W_d$ (17).
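For illustration, this weighting can be sketched as follows for the scalar cues $L$ and $C$ (the orientation cue is analogous, with $\Delta\Theta$ computed between orientation vectors). The $\rho$ value and the $W_d$-weighted summation used as the pooling step are assumptions of this sketch, not settings stated in the text.

```python
# Sketch of the distance kernel (eqs. (10)-(11)) and the feature-selective
# inhibition weights (eqs. (14)-(16)). The Wd-weighted summation used as the
# pooling step (eq. (17)) and rho = 4.0 are assumptions of this sketch.
import numpy as np

def dog_plus(sigma=1.5, rho=4.0):
    """Normalized DoG+ distance weighting Wd of eqs. (10)-(11)."""
    r = int(np.ceil(3 * rho * sigma))
    y, x = np.mgrid[-r:r + 1, -r:r + 1].astype(float)
    d2 = x**2 + y**2
    dog = (np.exp(-d2 / (2 * (rho * sigma)**2)) / (2 * np.pi * (rho * sigma)**2)
           - np.exp(-d2 / (2 * sigma**2)) / (2 * np.pi * sigma**2))
    dog = np.maximum(dog, 0.0)       # half-wave rectification H(.), eq. (12)
    return dog / dog.sum()           # L1 normalization, eq. (10)

def pooled_weight(feature, wd, sigma_f=0.05):
    """Pool pairwise weights exp(-df^2 / (2 sigma_f^2)) over the non-CRF."""
    h, w = feature.shape
    r = wd.shape[0] // 2             # wd is a square (2r+1) x (2r+1) kernel
    pad = np.pad(feature, r, mode='reflect')
    out = np.zeros_like(feature, dtype=float)
    for dy in range(-r, r + 1):      # plain loop for clarity, not for speed
        for dx in range(-r, r + 1):
            shifted = pad[r + dy:r + dy + h, r + dx:r + dx + w]
            df = np.abs(feature - shifted)                    # eq. (16)
            out += wd[r + dy, r + dx] * np.exp(-df**2 / (2 * sigma_f**2))
    return out                       # stays in [0, 1] since wd sums to one
```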



Fig. 4. The flowchart of the multiple-feature-based computation of surround inhibition weights. The difference $\Delta E(x, y)$ of the CRF responses at two different scales (i.e., $E(x, y; \sigma)$ and $E(x, y; 2\sigma)$) is used to guide the combination of the surround inhibition weights of the three individual features (i.e., $W_\Theta$, $W_L$, and $W_C$) to obtain the final surround inhibition weights of the neurons responding to the input scenes (i.e., $W_{com}(x, y)$).

D. Scale-Guided Combination of Inhibition Weights

The three feature-specific inhibition weights are combined under the guidance of the difference of CRF responses at two scales:

$$W_{com}(x, y) = \begin{cases} \max(W_\Theta, W_L, W_C)(x, y), & \Delta E(x, y) > 0 \\ \min(W_\Theta, W_L, W_C)(x, y), & \Delta E(x, y) \le 0 \end{cases} \tag{18}$$

$$\Delta E(x, y) = N(E(x, y; \sigma)) - N(E(x, y; 2\sigma)) \tag{19}$$

where $N(\cdot)$ is a linear normalization operator. For noise smoothing, Gaussian filtering is applied to the $\Delta E(x, y)$ map. Note that, since $\Delta E(x, y)$ is just the difference of two "blurred" responses, a simple Gaussian filter is sufficient to remove isolated points on the $\Delta E(x, y)$ map, although a good image denoiser, e.g., an edge-preserving image smoothing method [67]–[70], can provide valuable preprocessing for contour detection.

The weight combination strategy described by (18) and (19) is valid for the following reason. Generally, the gradient magnitude map at a large scale (i.e., the CRF responses $E(x, y; 2\sigma)$) includes reliable contours but misses detailed edges and may be inaccurate in localization. In contrast, the CRF responses at a fine scale (i.e., $E(x, y; \sigma)$) cover more details, including undesired edges in cluttered or textured regions. In the $\Delta E(x, y)$ map, which reflects the difference of (normalized) edge responses at the two scales, the pixels with $\Delta E(x, y) > 0$ more likely belong to undesired, non-meaningful edges than to perceptually salient contours; hence, the weights of surround inhibition should be stronger at these locations, which is realized by selecting the maximum of $W_\Theta$, $W_L$, and $W_C$, as done in (18). In contrast, the pixels with $\Delta E(x, y) \le 0$ more likely belong to perceptually salient contours than to unwanted detailed edges; therefore, the weights of surround inhibition should be weaker at such locations, which is realized by choosing the minimum of $W_\Theta$, $W_L$, and $W_C$, as done in (18). Taken together, this strategy may endow our model with the ability to extract more perceptually salient contours and to suppress more undesired trivial edge fragments than a model using only a single cue, considering the high complexity of natural scenes.

Note that the idea of our scale-guided combination described in Fig. 4 seems, to some extent, similar to that of the well-known "unsharp masking", since we also utilize information at two different scales. Our strategy, however, is distinguished as follows. We utilize the difference between two "images" at two different scales (i.e., $E(x, y; \sigma)$ and $E(x, y; 2\sigma)$), where these "images" are not versions of the input blurred by different Gaussian filters, but neuronal responses to edges at two different scales; that is, $\Delta E(x, y)$ denotes the difference of edge responses at two scales. In contrast, the difference between the original image (one scale) and its Gaussian-blurred version (another scale) in unsharp masking reflects the edges that need to be enhanced. In short, our $\Delta E(x, y)$ is quite different from unsharp masking in both its meaning and its role. Another point to be mentioned is that our model cannot be classified as a true multiresolution-based one, since we do not directly extract edges at multiple scales; we only use the two-scale information to determine the surround inhibition weights before extracting the final edges at one (fine) scale, as described in the following section.
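A sketch of this scale-guided combination follows; min-max normalization for $N(\cdot)$ and the smoothing sigma are assumptions, since the text only states that $N(\cdot)$ is linear and that Gaussian filtering is applied.

```python
# Sketch of the scale-guided combination (eqs. (18)-(19)). Min-max
# normalization for N(.) and smooth_sigma = 1.0 are assumptions of this
# sketch; the paper does not state these values.
import numpy as np
from scipy.ndimage import gaussian_filter

def normalize(m, eps=1e-6):
    return (m - m.min()) / (m.max() - m.min() + eps)    # linear operator N(.)

def combine_weights(e_fine, e_coarse, w_theta, w_l, w_c, smooth_sigma=1.0):
    delta_e = normalize(e_fine) - normalize(e_coarse)   # eq. (19)
    delta_e = gaussian_filter(delta_e, smooth_sigma)    # noise smoothing
    w_max = np.maximum(np.maximum(w_theta, w_l), w_c)
    w_min = np.minimum(np.minimum(w_theta, w_l), w_c)
    return np.where(delta_e > 0, w_max, w_min)          # eq. (18)
```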

E. Salient Contour Extraction

Considering the fact that the fine-scale edge information includes more details with more accurate localization, we first compute the non-specific (i.e., isotropic) surround inhibition by convolving $E(x, y; \sigma)$, the CRF response map at the fine scale, with $W_d(x, y)$, the distance-based weighting function:

$$Inh(x, y) = E(x, y; \sigma) \ast W_d(x, y) \tag{20}$$

Then, we construct the final neuronal responses to the salient contours by subtracting the surround inhibition, modulated by the combined weight $W_{com}(x, y)$, from the CRF responses at the fine scale:

$$r(x, y) = H\left( E(x, y; \sigma) - \alpha \cdot W_{com}(x, y) \cdot Inh(x, y) \right) \tag{21}$$

where $H(\cdot)$ is defined in (12). The factor $\alpha$ denotes the connection strength between the neurons within the CRF and its surrounding non-CRF.
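Putting the last two steps together, a minimal sketch of the final stage is given below; alpha = 2.0 is an arbitrary value within the range [1.0, 5.5] explored in Section III.

```python
# Sketch of the final response (eqs. (20)-(21)); wd is the normalized DoG+
# kernel of eq. (10), and alpha = 2.0 is an arbitrary value within the
# range [1.0, 5.5] explored in Section III.
import numpy as np
from scipy.ndimage import convolve

def contour_response(e_fine, w_com, wd, alpha=2.0):
    inh = convolve(e_fine, wd)             # isotropic inhibition, eq. (20)
    r = e_fine - alpha * w_com * inh       # modulated subtraction, eq. (21)
    return np.maximum(r, 0.0)              # half-wave rectification H(.)
```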

III. EXPERIMENTAL RESULTS

In this section, we first give experimental results on a natural image dataset (RuG40) [34] to demonstrate how the multifeature-based surround inhibition in our model offers advantages over single-feature-based surround inhibition. We tested our model on the whole RuG40 dataset of 40 natural images, and quantitative performance was compared with existing models based on the ground truth contour maps. We also tested our model's performance on the BSDS300 dataset [71]. Table I summarizes the meanings and settings of the parameters involved in our model; the settings are identical for the RuG40 and BSDS300 datasets except for $p$. In this study, we select the values of $\alpha$ within $[1.0, 5.5]$ (for both the RuG40 and BSDS300 datasets) and of $p$ within $[0.1, 1.0]$ (for the RuG40 dataset only) to test the robustness of the various models, because these two parameters have relatively strong influences on the overall performance.

A. Inhibitory Effects

When computing the gradient magnitude map, we find that many unwanted texture edges exist in the CRF response map (Fig. 5(b)).



TABLE I. THE SUMMARY OF PARAMETER MEANINGS AND SETTINGS

Fig. 5. Demonstration of inhibitory effects. (a) Input image, (b) gradient magnitude map at the fine scale, i.e., the neuronal responses to the stimuli within the CRF, (c) the isotropic surround inhibition (with distance-related weighting), (d) the inhibitory weight combined based on multiple cues, (e) the surround inhibition modulated by the combined weight, (f) the final responses to salient contours. For (b) and (f), higher grey levels correspond to stronger neuronal responses. For (c)∼(e), higher grey levels correspond to higher inhibitory weights or surround inhibitions.

In order to suppress textures, the isotropic inhibition model (ISO) [34] computes the inhibition term by considering the distance-related influence (Fig. 5(c)). In the absence of feature-selective modulation, the isotropic inhibition suppresses texture efficiently, but it also inhibits some perceptually salient contours, especially contours embedded in cluttered backgrounds. In our model, the final inhibition term (Fig. 5(e)) is modulated simultaneously by the distance-based (Fig. 5(c)) and cue-based weights (Fig. 5(d)). From Fig. 5(d), we can clearly see that at the locations of object contours (e.g., the contour of the rhinoceros), the inhibition strengths are generally weaker; in contrast, the texture regions (e.g., the grasses) receive relatively stronger inhibition. Consequently, our inhibition strategy is more efficient at texture suppression and contour protection. Fig. 5(e) shows that the locations of contours receive weaker inhibition than other locations. Hence, the proposed model responds strongly to perceptually salient contours but is relatively insensitive to textures (Fig. 5(f)).

We also evaluated the performance of our models with individual-cue-based inhibition. We obtained the responses of the models with individual-cue inhibition by replacing $W_{com}(x, y)$ in (21) with the individual-cue-based inhibitory weight in (14) or (17). Note that we use the symbol MCI to denote the operator with multiple-cue inhibition, and OI, LI, and CI to denote the operators with orientation, luminance, and luminance contrast inhibition, respectively. We also use Noinh to denote the operator without surround inhibition, i.e., the gradient magnitude operator at the fine scale.

B. Experiments on RuG40 Dataset

We tested our model on the RuG40 dataset [34], a publicly available contour detection dataset that includes 40 natural images and associated hand-drawn ground truth binary contour maps. The dataset can be downloaded from http://www.cs.rug.nl/~imaging/databases/contour_database/. This dataset has been widely used to evaluate the performance of contour detectors [34], [36], [38], [43], [57].

In order to quantitatively evaluate the performance of our method, the binary contour maps were constructed using the standard procedure of non-maxima suppression followed by hysteresis thresholding [3], [34]. We measured the similarity between the binary contour map detected by a model and the known ground truth (human-marked contours), according to [34]:

$$P = \frac{card(E)}{card(E) + card(E_{FP}) + card(E_{FN})} \tag{22}$$

where $card(S)$ denotes the number of elements of the set $S$. $E$ refers to the set of contour pixels that are correctly detected by an algorithm, and $E_{FP}$ and $E_{FN}$ refer to the sets of false positive and false negative pixels, respectively. Therefore, a higher measure $P$ reflects a better overall performance of contour detection. We compared our model with another surround-inhibition-based model (isotropic inhibition, ISO) [34] and with the operator without surround inhibition (Noinh). In this experiment, 100 parameter combinations (10 values of $\alpha$ within $[1.0, 5.5]$ and 10 values of $p$ within $[0.1, 1]$) were selected for further statistical analysis.
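This measure can be sketched as follows for binary maps; note that a strict pixel-wise match is assumed in this sketch, whereas the evaluation of [34] tolerates small localization offsets when pairing detected pixels with the ground truth.

```python
# Sketch of the performance measure P of eq. (22) on binary contour maps.
# A strict pixel-wise match is assumed; [34] additionally tolerates small
# localization offsets when matching detected pixels to the ground truth.
import numpy as np

def performance_p(detected, ground_truth):
    d = detected.astype(bool)
    g = ground_truth.astype(bool)
    tp = np.count_nonzero(d & g)       # card(E): correctly detected pixels
    fp = np.count_nonzero(d & ~g)      # card(E_FP): false positives
    fn = np.count_nonzero(~d & g)      # card(E_FN): missed contour pixels
    return tp / float(tp + fp + fn)
```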



Fig. 6. The mean performance of multiple models over all the images of the whole RuG40 dataset with various parameter combinations. Each image provides a performance measure P, and each bar in the figure represents the mean P over all the images for one parameter combination (α, p). The notation P@(α, p) below each panel gives the best mean P obtained with the parameter values α and p over all the images.

Fig. 7. Results of various models on two example images from the RuG40 dataset [34] using the optimal parameter setting derived on the whole dataset (see Fig. 6). The number on top of each image gives its performance (P value).

In Fig. 6, the heights of the bars give the mean values of P over the 40 images of the RuG40 dataset for the different operators. The optimal performance P and the corresponding parameters α and p over all images are listed at the bottom of each panel in the format P@(α, p). From Fig. 6, the surround-inhibition-based operators substantially improve the performance of contour detection compared to the Noinh operator. In addition, the methods with cue-selective inhibition (OI, LI, CI, and MCI) achieve higher P values than the model with isotropic surround inhibition (ISO). In particular, in contrast to previous reports [34], [37], [66], the model with OI outperforms the model with ISO because of the novel computation of orientation difference (see Section II). More importantly, the MCI model obtains the best performance.

Two example images are shown in Fig. 7. In general, the Noinh operator responds to abundant edges in texture regions. Surround inhibition can suppress the textures efficiently, while ISO still has a weak capability of removing texture edges. In contrast, the operators with cue-dependent surround inhibition (OI, LI, and CI) show different properties when detecting contours. For instance, CI misses some contours defined by low contrast, but suppresses more textures. Finally, MCI usually obtains the best contour detection results by combining multiple cues. The results suggest that the integration of multiple cues improves the performance and robustness of the contour detection system.

Fig. 8 illustrates more comparisons of our MCI operator with other recent biologically inspired models, including Butterfly-shaped inhibition [72] and Adaptive inhibition [43]. For Butterfly-shaped inhibition, we selected the parameter



TABLE II. QUANTITATIVE COMPARISON OF VARIOUS MODELS ON THE BSDS300 GRAYSCALE IMAGES WITH F-SCORE

Fig. 8. Quantitative comparison of various models on the whole RuG40 dataset. The blue and red bars give the mean P values over all images, respectively with the optimal parameter setting for the dataset as a whole and with the optimal parameter setting for each individual image in the dataset. Besides the models compared in Figs. 6 and 7, the existing models of Butterfly-shaped inhibition [72] and Adaptive inhibition [43] are also compared here.

as σ = 2.0, and the other parameters were the same as those listed in Table I of [72]. For Adaptive inhibition, we set the parameters the same as in [43]. For these two existing methods, we took 10 different values of p within [0.1, 1], as done for our model. Fig. 8 clearly shows that the results with image-specific optimal parameters (red bars) are substantially better than those with the optimal parameter setting for the whole dataset (blue bars). More importantly, in both situations, the MCI model shows an obvious improvement over the other models, except for the models of Butterfly-shaped inhibition [72] and Adaptive inhibition [43] with optimal parameters for the whole dataset. The good performance of the Butterfly-shaped and Adaptive inhibition models could be attributed to their division of the whole surrounding non-CRF into four subregions, which provides a more flexible way to adapt to local features. In addition, Fig. 9 shows more results on two example images, which indicate again that the proposed MCI model exhibits an excellent capability of suppressing textures and detecting perceptually salient contours.

C. Experiments on BSDS300 Dataset

We further tested our models on the Berkeley Segmentation Dataset (BSDS300) [71], which includes 300 images with 5∼10 human-marked segmentations for each image. Fig. 10 presents several examples of the probability of contours computed by our model (by (21)) followed by non-maxima suppression. We used the same parameter settings as on the RuG40 dataset except for p, because hysteresis thresholding was not used in this experiment. The first and second columns of Fig. 10 show three images from the BSDS300 dataset and their human-marked boundaries. The third to fifth columns show the results of three state-of-the-

art algorithms: Global Probability of Boundary (gPb) [48], Pb(BG+TG) [2], and Pb(BG) [2]. The last column shows the results of our MCI model, indicating that MCI responds to fewer texture edges. Note that some edges are considered boundaries in the BSDS300 ground truth. For example, the test image in the bottom row of Fig. 10 clearly shows that some subjects marked the edges of grasses as contours, and the ground truth was constructed by overlapping all the markings of the different subjects.

Quantitative comparison was also carried out on the test set of 100 images in BSDS300. We computed the F-Score, defined as F = 2PR/(P + R) [2], where P and R represent precision and recall, respectively. The F-Score signals the similarity between the contours identified by the algorithm and by human subjects. From Table II, our MCI operator obtains a score of 0.62, which is higher than the other surround-inhibition-based methods, including the methods proposed in this paper (OI, LI, and CI) and ISO [34]. Our MCI operator (F = 0.62) also scores higher than another recent biologically inspired model, PC/BC-V1+lateral+texture [57] (F = 0.61). The PC/BC-V1+lateral+texture model scores 0.61 under the condition that recurrent lateral excitatory connections and other special lateral connections are incorporated into the classical PC/BC model of V1 [57]. In contrast, our model only uses surround inhibition, which could be regarded as equivalent to the special lateral connections used in PC/BC-V1+lateral+texture; in other words, our model does not introduce a mechanism similar to recurrent lateral excitation. This suggests that our model could be extended by integrating facilitatory surround modulation, considering the special role of the collinear facilitation of V1 neurons in visual processing [20], [59], [73], [74] and its successful applications in computational modeling [37], [41], [42], [75], [76].

The scores of our methods are below those of some learning-based algorithms, such as gPb [48], mPb [47], Pb(BG+TG) [2], and BEL [77]. However, these methods need extra supervised learning, and they normally employ more cues or image scales. In fact, our proposed models use only image intensity as the source for contour detection. In our methods, the multiple cues and scales are used to guide the surround inhibition to remove textures



Fig. 9. Visual comparison of MCI with other biologically-inspired models, including Butterfly-shaped inhibition [72], Adaptive inhibition [43], and PC/BC-V1+lateral+texture [57]. The images at the top and bottom rows are from RuG40 dataset [34] and BSDS300 dataset [71], respectively.

Fig. 10. Results of various models on three example images from the BSDS300 dataset [71].

from the fine-scale gradient magnitude map, which only employs intensity edges at a single scale. Finally, the MCI model (F = 0.62) outperforms the Pb(BG) method (F = 0.60), which uses only the same intensity information. More comparisons are listed in Table II. Note that the value of the parameter α used in (21) is also listed for each model.

IV. DISCUSSION AND CONCLUSION

At the level of individual neurons, classical receptive fields (CRFs) respond to all the main edges, including both desired and undesired ones, and the surround inhibition from

the non-CRF contributes to the suppression of undesired edges. To this end, appropriately weighting the surround inhibition is crucial. Focusing on this central aspect, this work proposed a new framework for contour detection based on surround inhibition in multiple visual feature dimensions, with emphasis on how to determine the surround inhibition weights from multiple features. The main contributions of the work can be summarized as follows.

(a) A center-surround interaction based framework was designed to integrate various features for the task of contour detection. To our knowledge, this is the first attempt to build a unified model combining multiple features with the



center-surround mechanism of the visual system. Other visual features could easily be integrated into this unified framework in the future.

(b) A new strategy was designed to weight the surround inhibition by combining the surround inhibition of individual features. This strategy employs the difference of neuronal responses at two different scales, which provides an intuitive and effective way to determine where salient contours and undesired edges most probably exist.

(c) Our results show that anisotropic (feature-selective) surround inhibition works better than isotropic surround inhibition when extracting salient contours. Note that the work in [34] reached the opposite conclusion. We attribute the better performance of the anisotropic model to our new orientation coding scheme, which, to some extent, can code texture patterns.

(d) Our results also show that, compared to the widely used feature of orientation, luminance and luminance contrast contribute much more to contour detection, at least on gray-level images. This suggests that low-level cues play different roles in contour perception in the visual system. Luminance and luminance contrast perhaps play a major role in suppressing textures and detecting existing contours, while higher-order statistics (e.g., orientation) may convey additional information that potentially augments the first-order statistics.

In this work, we only employed the visual features of orientation, luminance, and luminance contrast. Though these three features have been widely used in the field of image processing, the contribution of the work does not lie in the introduction and extraction of these three specific features, but in how different visual features contribute cooperatively to the visual perception task of contour detection in a unified way through center-surround interaction. In addition, the implementation of computing the surround inhibition weights with (14)∼(17) faithfully simulates the physiological observations (no detailed analysis is given here, but the involved functions, such as the exponential and the DoG, have been widely used in the field of neurovision). As for the computational integration of the various surround inhibition weights, (18)∼(19) may seem somewhat heuristic, since their biological substrates are quite unclear. Even so, the implementation is still quite promising, considering that the Min and Max operations involved in (18)∼(19) have been widely accepted as canonical neural computations in the visual system [78].

Besides the three features employed, differences in other features between the CRF and non-CRF, such as spatial frequency and color [24] and spatial phase [28], also selectively activate V1 cells. This suggests another future extension of our framework: combining more visual features, an efficient approach that has been widely adopted [2], [50]–[56]. The main reason that color was not considered in this work is that most color information is processed by special color-opponent mechanisms (e.g., the various types of double-opponent neurons in V1). Our recent work [44] studied the possible function of a kind of double-opponent color neuron in detecting boundaries. To effectively utilize the cue of color, the current framework needs to be extended by adding another branch for processing that is not based on the center-surround mechanism.

Another important question is how to integrate the signals carried by various information sources. Compared to other models [2], [47], [48], ours is distinguished mainly by how the multiple cues are used and integrated: our model uses multiple cues to generate candidate surround inhibition weights, these cue-specific weights are combined using a Min- and Max-based strategy, and the combined weight is used to modulate the final neuronal response by weighting the surround inhibition. To our knowledge, this strategy of fusing multiple cues is the first of its kind, among both biologically and non-biologically inspired approaches. For example, some algorithms integrate multiple cues or scales by a linear sum with supervised learning [2], [47], [48], and generally, a preprocessing phase before the learning-based cue combination is required to manually tune the cue-specific parameters in order to obtain the best performance. In contrast, our model only needs a preprocessing phase to handle a short list of parameters (Table I). In general, different images may require different optimal parameter values (see Fig. 8). However, with one set of parameters statistically optimal at the level of the entire dataset (RuG40), we still obtained competitive performance in comparison to other models. Our model is also relatively robust, i.e., parameter values in a broad range result in similarly acceptable performance (see Fig. 6), which allows us to select proper parameter values effortlessly.

In summary, in this study we proposed a center-surround interaction based model that combines multiple local cues to improve contour detection in cluttered scenes. The contribution of the work to the field of image processing is founded on the fact that more and more research in contour detection has been focusing on understanding and modeling the biological mechanisms that humans use to perceive contours [1]. It is expected that the work is not only of interest to neurovision researchers who care about the underlying biological substrates, but also quite relevant to the field of image processing, where models of higher efficiency and efficacy are especially desired.

ACKNOWLEDGMENT

The authors would like to thank Prof. De-Wen Hu, Prof. Hong-Mei Yan, and Dr. Ling Wang for valuable conversations and discussions on the presented work. They would like to thank Dr. Ting Li and Kaiwen Cheng for comments on the manuscript. Thoughtful comments and suggestions from the anonymous reviewers for improving the manuscript are also very much appreciated. They would also like to thank Nicolai Petkov and his colleagues for their source code used for validating our models.

REFERENCES

[1] G. Papari and N. Petkov, “Edge and line oriented contour detection: State of the art,” Image Vis. Comput., vol. 29, nos. 2–3, pp. 79–103, 2011. [2] D. R. Martin, C. C. Fowlkes, and J. Malik, “Learning to detect natural image boundaries using local brightness, color, and texture cues,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 5, pp. 530–549, May 2004. [3] J. Canny, “A computational approach to edge detection,” IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-8, no. 6, pp. 679–698, Nov. 1986.


[4] D. Marr and E. Hildreth, “Theory of edge detection,” Proc. Roy. Soc. London B, Biol. Sci., vol. 207, no. 1167, pp. 187–217, 1980. [5] P. Kovesi, “Image features from phase congruency,” Videre, J. Comput. Vis. Res., vol. 1, no. 3, pp. 1–26, 1999. [6] P. Kovesi, “Phase congruency detects corners and edges,” in Proc. Austral. Pattern Recognit. Soc. Conf., DICTA, 2003, pp. 309–318. [7] M. Kass, A. Witkin, and D. Terzopoulos, “Snakes: Active contour models,” Int. J. Comput. Vis., vol. 1, no. 4, pp. 321–331, 1988. [8] V. Caselles, R. Kimmel, and G. Sapiro, “Geodesic active contours,” Int. J. Comput. Vis., vol. 22, no. 1, pp. 61–79, 1997. [9] S. Konishi, A. L. Yuille, J. M. Coughlan, and S. C. Zhu, “Statistical edge detection: Learning and evaluating edge cues,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 1, pp. 57–74, Jan. 2003. [10] D. H. Hubel and T. N. Wiesel, “Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex,” J. Physiol., vol. 160, no. 1, pp. 106–154, 1962. [11] J. Allman, F. Miezin, and E. McGuinness, “Stimulus specific responses from beyond the classical receptive field: Neurophysiological mechanisms for local-global comparisons in visual neurons,” Annu. Rev. Neurosci., vol. 8, no. 1, pp. 407–430, 1985. [12] D. Fitzpatrick et al., “Seeing beyond the receptive field in primary visual cortex,” Current Opinion Neurobiol., vol. 10, no. 4, pp. 438–443, 2000. [13] C. D. Gilbert and T. N. Wiesel, “The influence of contextual stimuli on the orientation selectivity of cells in primary visual cortex of the cat,” Vis. Res., vol. 30, no. 11, pp. 1689–1701, 1990. [14] H. E. Jones, K. L. Grieve, W. Wang, and A. M. Sillito, “Surround suppression in primate V1,” J. Neurophysiol., vol. 86, no. 4, pp. 2011–2028, 2001. [15] M. K. Kapadia, G. Westheimer, and C. D. Gilbert, “Spatial distribution of contextual interactions in primary visual cortex and in visual perception,” J. Neurophysiol., vol. 84, no. 4, pp. 2048–2062, 2000. [16] J. J. Knierim and D. C. van Essen, “Neuronal responses to static texture patterns in area V1 of the alert macaque monkey,” J. Neurophysiol., vol. 67, no. 4, pp. 961–980, 1992. [17] C.-Y. Li, “Integration fields beyond the classical receptive field: Organization and functional properties,” Physiology, vol. 11, no. 4, pp. 181–186, 1996. [18] L. Chao-Yi and L. Wu, “Extensive integration field beyond the classical receptive field of cat’s striate cortical neurons—Classification and tuning properties,” Vis. Res., vol. 34, no. 18, pp. 2337–2355, 1994. [19] F. Sengpiel, A. Sen, and C. Blakemore, “Characteristics of surround inhibition in cat area 17,” Experim. Brain Res., vol. 116, no. 2, pp. 216–228, 1997. [20] P. Seriès, J. Lorenceau, and Y. Frégnac, “The ‘silent’ surround of V1 receptive fields: Theory and experiments,” J. Physiol.-Paris, vol. 97, nos. 4–6, pp. 453–474, 2003. [21] G. A. Walker, I. Ohzawa, and R. D. Freeman, “Asymmetric suppression outside the classical receptive field of the visual cortex,” J. Neurosci., vol. 19, no. 23, pp. 10536–10553, 1999. [22] K. Zipser, V. A. F. Lamme, and P. H. Schiller, “Contextual modulation in primary visual cortex,” J. Neurosci., vol. 16, no. 22, pp. 7376–7389, 1996. [23] A. M. Slllito, K. L. Grieve, H. E. Jones, J. Cudeiro, and J. Davls, “Visual cortical mechanisms detecting focal orientation discontinuities,” Nature, vol. 378, pp. 492–496, Nov. 1995. [24] Z.-M. Shen, W.-F. Xu, and C.-Y. 
Li, “Cue-invariant detection of centre– surround discontinuity by V1 neurons in awake macaque monkey,” J. Physiol., vol. 583, no. 2, pp. 581–592, 2007. [25] A. Das and C. D. Gilbert, “Topography of contextual modulations mediated by short-range interactions in primary visual cortex,” Nature, vol. 399, no. 6737, pp. 655–661, 1999. [26] K. K. De Valois and R. Tootell, “Spatial-frequency-specific inhibition in cat striate cortex cells,” J. Physiol., vol. 336, no. 1, pp. 359–376, 1983. [27] C.-Y. Li, J.-J. Lei, and H.-S. Yao, “Shift in speed selectivity of visual cortical neurons: A neural basis of perceived motion contrast,” Proc. Nat. Acad. Sci. USA, vol. 96, no. 7, pp. 4052–4056, 1999. [28] W.-F. Xu, Z.-M. Shen, and C.-Y. Li, “Spatial phase sensitivity of V1 neurons in alert monkey,” Cerebral Cortex, vol. 15, no. 11, pp. 1697–1702, 2005. [29] S. C. Dakin and N. J. Baruch, “Context influences contour integration,” J. Vis., vol. 9, no. 2, 2009, Art. ID 13. [30] R. Hess and D. Field, “Integration of contours: New insights,” Trends Cognit. Sci., vol. 3, no. 12, pp. 480–486, 1999. [31] D. J. Field, A. Hayes, and R. F. Hess, “Contour integration by the human visual system: Evidence for a local ‘association field’,” Vis. Res., vol. 33, no. 2, pp. 173–193, 1993.


[32] W. Li and C. D. Gilbert, “Global contour saliency and local colinear interactions,” J. Neurophysiol., vol. 88, no. 5, pp. 2846–2856, 2002. [33] V. A. Lamme, “The neurophysiology of figure-ground segregation in primary visual cortex,” J. Neurosci., vol. 15, no. 2, pp. 1605–1615, 1995. [34] C. Grigorescu, N. Petkov, and M. A. Westenberg, “Contour detection based on nonclassical receptive field inhibition,” IEEE Trans. Image Process., vol. 12, no. 7, pp. 729–739, Jul. 2003. [35] N. Petkov and M. A. Westenberg, “Suppression of contour perception by band-limited noise and its relation to nonclassical receptive field inhibition,” Biol. Cybern., vol. 88, no. 3, pp. 236–246, 2003. [36] G. Papari, P. Campisi, N. Petkov, and A. Neri, “A biologically motivated multiresolution approach to contour detection,” EURASIP J. Adv. Signal Process., vol. 2007, no. 1, p. 119, 2007, Art. ID 071828. [37] Q. Tang, N. Sang, and T. Zhang, “Extraction of salient contours from cluttered scenes,” Pattern Recognit., vol. 40, no. 11, pp. 3100–3109, 2007. [38] Q. Tang, N. Sang, and T. Zhang, “Contour detection based on contextual influences,” Image Vis. Comput., vol. 25, no. 8, pp. 1282–1290, 2007. [39] L. Long and Y. Li, “Contour detection based on the property of orientation selective inhibition of non-classical receptive field,” in Proc. IEEE Conf. Cybern. Intell. Syst., Sep. 2008, pp. 1002–1006. [40] G.-E. La Cara and M. Ursino, “A model of contour extraction including multiple scales, flexible inhibition and attention,” Neural Netw., vol. 21, no. 5, pp. 759–773, 2008. [41] Z. Li, “A neural model of contour integration in the primary visual cortex,” Neur. Comput., vol. 10, no. 4, pp. 903–940, 1998. [42] M. Ursino and G. E. La Cara, “A model of contextual interactions and contour detection in primary visual cortex,” Neural Netw., vol. 17, nos. 5–6, pp. 719–735, 2004. [43] C. Zeng, Y. Li, and C. Li, “Center–surround interaction with adaptive inhibition: A computational model for contour detection,” NeuroImage, vol. 55, no. 1, pp. 49–66, 2011. [44] K. Yang, S. Gao, C. Li, and Y. Li, “Efficient color boundary detection with color-opponent mechanisms,” in Proc. IEEE Conf. CVPR, Jun. 2013, pp. 2810–2817. [45] D. Ziou and S. Tabbone, “Edge detection techniques—An overview,” Pattern Recognit. Image Anal. C/C Raspoznavaniye Obrazov I Anal. Izobrazhenii, vol. 8, pp. 537–559, 1998. [46] M. Basu, “Gaussian-based edge-detection methods—A survey,” IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 32, no. 3, pp. 252–260, Aug. 2002. [47] X. Ren, “Multi-scale improves boundary detection in natural images,” in Proc. 10th ECCV, 2008, pp. 533–545. [48] M. Maire, P. Arbeláez, C. Fowlkes, and J. Malik, “Using contours to detect and localize junctions in natural images,” in Proc. IEEE Conf. CVPR, Jun. 2008, pp. 1–8. [49] P. V. McGraw, D. Whitaker, D. R. Badcock, and J. Skillen, “Neither here nor there: Localizing conflicting visual attributes,” J. Vis., vol. 3, no. 4, pp. 265–273, 2003, Art. ID 2. [50] J. Rivest and P. Cabanagh, “Localizing contours defined by more than one attribute,” Vis. Res., vol. 36, no. 1, pp. 53–66, 1996. [51] F. S. Frome, S. L. Buck, and R. M. Boynton, “Visibility of borders: Separate and combined effects of color differences, luminance contrast, and luminance level,” J. Opt. Soc. Amer., vol. 71, no. 2, pp. 145–150, 1981. [52] J. Rivest, I. Boutet, and J. Intriligator, “Perceptual learning of orientation discrimination by more than one attribute,” Vis. Res., vol. 37, no. 3, pp. 273–281, 1997. [53] M. S. 
Landy, “Combining multiple cues for texture edge localization,” Proc. SPIE, Human Vis., Vis. Process., Digit. Display IV, vol. 1913, pp. 506–517, Sep. 1993. [54] M. S. Landy and H. Kojima, “Ideal cue combination for localizing texture-defined edges,” J. Opt. Soc. Amer. A, Opt., Image Sci., Vis., vol. 18, no. 9, pp. 2307–2320, 2001. [55] C. Zhou and B. W. Mel, “Cue combination and color edge detection in natural scenes,” J. Vis., vol. 8, no. 4, pp. 1–25, 2008, Art. ID 4. [56] P. Arbelaez, M. Maire, C. Fowlkes, and J. Malik, “Contour detection and hierarchical image segmentation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 5, pp. 898–916, May 2011. [57] M. W. Spratling, “Image segmentation using a sparse coding model of cortical area V1,” IEEE Trans. Image Process., vol. 22, no. 4, pp. 1631–1643, Apr. 2013. [58] D. Marr, Vision: A Computational Approach. San Francisco, CA, USA: Freeman, 1982. [59] G. Loffler, “Perception of contours and shapes: Low and intermediate stage mechanisms,” Vis. Res., vol. 48, no. 20, pp. 2106–2127, 2008.



[60] J. M. Wolfe, M. L.-H. Võ, K. K. Evans, and M. R. Greene, “Visual search in scenes involves selective and nonselective pathways,” Trends Cognit. Sci., vol. 15, no. 2, pp. 77–84, 2011. [61] R. A. Young and R. M. Lesperance, “The Gaussian derivative model for spatial-temporal vision: II. Cortical data,” Spatial Vis., vol. 14, no. 3, pp. 321–390, 2001. [62] J. P. Jones and L. A. Palmer, “An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex,” J. Neurophysiol., vol. 58, no. 6, pp. 1233–1258, 1987. [63] V. Mante, R. A. Frazor, V. Bonin, W. S. Geisler, and M. Carandini, “Independence of luminance and contrast in natural scenes and in the early visual system,” Nature Neurosci., vol. 8, no. 12, pp. 1690–1697, 2005. [64] R. A. Frazor and W. S. Geisler, “Local luminance and contrast in natural images,” Vis. Res., vol. 46, no. 10, pp. 1585–1598, 2006. [65] J. T. Lindgren, J. Hurri, and A. Hyvärinen, “Spatial dependencies between local luminance and contrast in natural images,” J. Vis., vol. 8, no. 12, pp. 1–13, 2008, Art. ID 6. [66] C. Grigorescu, N. Petkov, and M. A. Westenberg, “Contour and boundary detection improved by surround suppression of texture edges,” Image Vis. Comput., vol. 22, no. 8, pp. 609–622, 2004. [67] C. Tomasi and R. Manduchi, “Bilateral filtering for gray and color images,” in Proc. 6th ICCV, Jan. 1998, pp. 839–846. [68] Z. Farbman, R. Fattal, D. Lischinski, and R. Szeliski, “Edge-preserving decompositions for multi-scale tone and detail manipulation,” ACM Trans. Graph., vol. 27, no. 3, 2008, Art. ID 67. [69] K. Subr, C. Soler, and F. Durand, “Edge-preserving multiscale image decomposition based on local extrema,” ACM Trans. Graph., vol. 28, no. 5, 2009, Art. ID 147. [70] L. Xu, C. Lu, Y. Xu, and J. Jia, “Image smoothing via L 0 gradient minimization,” ACM Trans. Graph., vol. 30, no. 6, 2011, Art. ID 174. [71] D. Martin, C. Fowlkes, D. Tal, and J. Malik, “A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics,” in Proc. 8th IEEE ICCV, vol. 2. Jul. 2001, pp. 416–423. [72] C. Zeng, Y. Li, K. Yang, and C. Li, “Contour detection based on a non-classical receptive field model with butterfly-shaped inhibition subregions,” Neurocomputing, vol. 74, no. 10, pp. 1527–1534, 2011. [73] U. Polat and D. Sagi, “Lateral interactions between spatial channels: Suppression and facilitation revealed by lateral masking experiments,” Vis. Res., vol. 33, no. 7, pp. 993–999, 1993. [74] U. Polat and D. Sagi, “The architecture of perceptual spatial interactions,” Vis. Res., vol. 34, no. 1, pp. 73–78, 1994. [75] S.-C. Yen and L. H. Finkel, “Extraction of perceptually salient contours by striate cortical networks,” Vis. Res., vol. 38, no. 5, pp. 719–741, 1998. [76] W. Huang, L. Jiao, J. Jia, and H. Yu, “A neural contextual model for detecting perceptually salient contours,” Pattern Recognit. Lett., vol. 30, no. 11, pp. 985–993, 2009.

[77] P. Dollar, Z. Tu, and S. Belongie, “Supervised learning of edges and object boundaries,” in Proc. IEEE Comput. Soc. CVPR, vol. 2. Jun. 2006, pp. 1964–1971. [78] M. Carandini and D. J. Heeger, “Normalization as a canonical neural computation,” Nature Rev. Neurosci., vol. 13, no. 1, pp. 51–62, 2011.

Kai-Fu Yang received the B.Sc. and M.Sc. degrees in biomedical engineering from the University of Electronic Science and Technology of China, Chengdu, China, in 2009 and 2012, respectively, where he is currently pursuing the Ph.D. degree. His research interests include visual mechanism modeling and image processing.

Chao-Yi Li received degrees from the Chinese Medical University, Shenyang, China, in 1956, and from Fudan University, Shanghai, China, in 1961. He became an Academician of the Chinese Academy of Sciences, Beijing, China, in 1999. He is currently a Professor with the University of Electronic Science and Technology of China, Chengdu, China, and the Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences. He is also the Honorary Director of the Key Laboratory for Neuroinformation with the Ministry of Education, China. His research interests include visual neurophysiology and brain–computer interface.

Yong-Jie Li received the Ph.D. degree in biomedical engineering from the University of Electronic Science and Technology of China (UESTC), Chengdu, China, in 2004. He is currently a Professor with the Key Laboratory for Neuroinformation, Ministry of Education, School of Life Science and Technology, UESTC, China. His research interests include visual mechanism modeling, image processing, and intelligent computation.
