
IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 23, NO. 12, DECEMBER 2014

Hybrid Approaches for Interactive Image Segmentation Using the Live Markers Paradigm

Thiago Vallin Spina, Paulo A. V. de Miranda, and Alexandre Xavier Falcão

Abstract— Interactive image segmentation methods normally rely on cues about the foreground imposed by the user as region constraints (markers/brush strokes) or boundary constraints (anchor points). These paradigms often have complementary strengths and weaknesses, which can be addressed to improve the interactive experience by reducing the user's effort. We propose a novel hybrid paradigm based on a new form of interaction called live markers, where optimum boundary-tracking segments are turned into internal and external markers for region-based delineation to effectively extract the object. We present four techniques within this paradigm: 1) LiveMarkers; 2) RiverCut; 3) LiveCut; and 4) RiverMarkers. The homonym LiveMarkers couples boundary-tracking via live-wire-on-the-fly (LWOF) with optimum seed competition by the image foresting transform (IFT-SC). The IFT-SC can cope with complex object silhouettes, but presents a leaking problem on weaker parts of the boundary that is solved by the effective live markers produced by LWOF. Conversely, in RiverCut, the long boundary segments computed by Riverbed around complex shapes provide markers for Graph Cuts by the Min-Cut/Max-Flow algorithm (GCMF) to complete segmentation on poorly defined sections of the object's border. LiveCut and RiverMarkers further demonstrate that live markers can improve segmentation even when the combined approaches are not complementary (e.g., GCMF's shrinking bias is also dramatically prevented when using it with LWOF). Moreover, since delineation is always region based, our methodology subsumes both paradigms, representing a new way of extending boundary tracking to the 3D image domain, while speeding up the addition of markers close to the object's boundary, a necessary but time-consuming task when done manually. We justify our claims through an extensive experimental evaluation on natural and medical image data sets, using recently proposed robot users for boundary-tracking methods.

Manuscript received November 25, 2013; revised July 17, 2014; accepted October 14, 2014. Date of publication November 4, 2014; date of current version November 21, 2014. This work was supported in part by the São Paulo Research Foundation under Grant 2011/01434-9 and in part by the National Council for Scientific and Technological Development under Grant 303673/2010-9, Grant 305381/2012-1, Grant 479070/2013-0, and Grant 486083/2013-6. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Theo Gevers.

T. V. Spina and A. X. Falcão are with the Institute of Computing, University of Campinas, Campinas 13083-852, Brazil (e-mail: [email protected]; [email protected]).

P. A. V. de Miranda is with the Department of Computer Science, Institute of Mathematics and Statistics, University of São Paulo, São Paulo 05508-090, Brazil (e-mail: [email protected]).

This paper has supplementary downloadable material available at http://ieeexplore.ieee.org, provided by the authors. The supplementary files contain an introductory video for a demonstration of the software. Also included on the authors' website are videos that demonstrate the robot users segmenting a few images from the datasets. The total size of the videos is 15 MB. Contact [email protected] or [email protected] for further questions about this work.
Digital Object Identifier 10.1109/TIP.2014.2367319

Index Terms— Interactive image segmentation, hybrid user interaction paradigm, boundary-tracking, region-based delineation, Generalized Graph Cut, live wire.

I. INTRODUCTION

IMAGE segmentation requires recognition, to indicate the object's whereabouts in the image and make corrections, and delineation, to define the object's precise spatial extent. Humans usually outperform computers in recognition, while the opposite holds for delineation. In many applications, such as medical image analysis [1] and digital matting [2], [3], it is desirable to have interactive segmentation methods that combine the superior abilities of humans for recognition with the outperformance of computers for delineation in a synergistic way [4]–[13]. In this context, the challenges are to simultaneously (i) maximize accuracy, precision, and computational efficiency, (ii) minimize user involvement and time, and (iii) maximize the user's control over the segmentation process.

Several interactive segmentation methods exploit boundary constraints (e.g., anchor points) or region constraints (e.g., internal and external markers), and make direct/indirect use of image-graph concepts, such as the arc weight between pixels. The weight may represent different attribute functionals, such as similarity, speed function, affinity, cost, and distance, depending on the framework used, such as watershed, level sets, fuzzy connectedness, and graph cuts [9].

In boundary-tracking, the object may be defined by optimum-boundary segments that pass through the anchor points to close its contour. This idea was first formulated as a heuristic search problem in an image-graph by Martelli [14], but with no guarantee of success. This guarantee was only possible without any shape constraints in the 2D dynamic programming framework of live wire [4], [5]. However, the real-time response of live wire with respect to the user's actions strongly depended on the image size. This problem was circumvented later in live-wire-on-the-fly (LWOF), by exploiting key properties of Dijkstra's algorithm to determine optimum paths [15]. Several approaches further extended live wire to cope with multiple challenges [16], [17], and used its concepts to develop new boundary-tracking techniques (e.g., Riverbed [18]) and to achieve segmentation by exploiting methods with complementary properties [19].

Methods based on region constraints usually have the advantage of being more easily extended to 3D. Some popular approaches based on internal and external markers are



Fig. 1. (a) IFT-SC handles complex object shapes but suffers from the same drawbacks as FC and WS in the presence of weak boundary information, thus presenting the "blocking" effect around the wrist. (b) Hand segmentation using LWOF with fourteen anchor points. (c) Hand segmentation by LiveMarkers, which used a single LWOF segment marker on the wrist to delineate the hand. (d) As IFT-SC's dual, Riverbed requires 10 anchors on the wrist to overcome weak boundary information. (e) Graph cut by min-cut/max-flow shortcuts the wrist even though 13 markers were necessary to avoid shrinking. (f) RiverCut requires one Riverbed segment marker along the fingers to complete segmentation.

watershed [20]–[23] (WS), fuzzy connectedness [24]–[27] (FC), and the traditional graph cut segmentation by the min-cut/max-flow algorithm [6], [28] (GCMF). These methods can define the object as some optimum cut in the graph, according to the Generalized Graph Cut (GGC) segmentation framework [29], [30], and can produce similar results under certain conditions [23], [27], [31]. However, they may differ in computational efficiency, user involvement, time, and control, depending on the quality of the arc-weight assignment and the algorithm chosen for implementation. Aside from the tie-zones [32], WS and FC are more robust to the markers' position and perform better in the case of complex object silhouettes (e.g., Fig. 1a) than GCMF and LWOF [29], [31]. However, in the presence of poorly defined parts of the boundary (bad arc-weight assignment), they present a leaking problem where parts of the background (object) are conquered by object (background) markers. At the same time, LWOF and GCMF produce smoother borders and perform better on weaker sections of the boundary (Figs. 1b and 1e). Indeed, hybrid methods based on region constraints, such as Relative Fuzzy Connectedness (RFC) combined with GCMF, have been pursued in [33] to gain the advantages of both, i.e., boundary smoothness from GCMF coupled with the robustness to marker positioning from RFC. Similarly, United Snakes [34] couples live wire with active contours to exploit their complementary advantages to cope with noise. Li et al. [35] further investigated the usage of polygonal lines as boundary constraints for the refinement of region-based delineation via GCMF. These approaches motivate further investigation on hybrid techniques that can combine the strengths of boundary- and region-based delineation paradigms to eliminate their weaknesses.

We propose a novel hybrid paradigm for interactive image segmentation based on a new form of interaction called live markers, which combines boundary tracking with region-based delineation to leverage the advantages of both paradigms. In live markers, optimum-boundary segments computed between user-selected anchor points using a boundary-tracking method are turned into internal and external markers (seeds) for an underlying region-based algorithm to define the object's spatial extent (e.g., Figs. 1c and 1f). In this manner, we are able to speed up the addition of precise markers

near the object's boundary, which is a necessary but time-consuming task [36], while segmenting images in 2D and 3D. Moreover, because object definition is always region-based, the live markers paradigm subsumes both forms of interaction by seamlessly integrating brush strokes as seeds (e.g., Fig. 3).

We present four methods following the live markers paradigm: RiverCut, LiveCut, RiverMarkers, and the homonym LiveMarkers [19]. These approaches stem from combinations of boundary tracking via LWOF or Riverbed with region-based delineation using GCMF or optimum seed competition through the image foresting transform [37] (IFT-SC). These components view an image as a weighted graph, taking the pixels as the nodes and an adjacency relation between them to form the arcs. Then, boundary- and region-based delineation are reduced to some cut that separates the foreground from the background in the image-graph [6], [18], [29], [31]. LWOF and GCMF possess similar properties, as previously stated, precisely because they seek equivalent cuts in dual image-graphs [6]. Likewise, Riverbed and IFT-SC are proven to be duals for the same reason [18] (Figs. 1a and 1d).

LiveMarkers and RiverCut couple LWOF with IFT-SC and Riverbed with GCMF, respectively, in order to make use of their complementary advantages to eliminate their weaknesses. IFT-SC can be seen as a version of watershed that uses markers imposed by the user on the objects of interest and background. As a result, IFT-SC can simultaneously handle multiple objects with complex silhouettes in linear time [37], with further robustness to marker positioning [29], [31], although it presents the aforementioned leaking problem on weaker parts of the boundary (Fig. 1a). In this sense, live markers from LWOF form perfect barriers that solve the leaking problem with minimum user effort [19] (Fig. 1c). In RiverCut, the long boundary segments produced by Riverbed around complex shapes prevent GCMF's well-known shrinking bias, while the latter shortcuts poorly defined parts of the boundary (Fig. 1f). At the same time, LiveCut and RiverMarkers enable us to investigate the effect of combining different paradigms of interaction to complete segmentation using "the same" underlying delineation algorithm.

Live-wire-on-the-fly, Riverbed, and IFT-SC are created using a common framework for developing methods based


on optimum connectivity named the Image Foresting Transform [37] (IFT). According to the Generalized Graph Cut segmentation framework, the cut induced by the IFT-SC's partitioning of the image-graph makes the IFT and GCMF algorithms the only two required for solving the minimization of an entire range of energy functions [29]. We thus argue that developing methods that incorporate the advantages of the IFT and GCMF frameworks is paramount for effective segmentation, live markers being a hybrid paradigm that substantially reduces the user's effort. Our main contributions are threefold:

• The development of four hybrid techniques using the live markers paradigm: LiveMarkers, RiverCut, LiveCut, and RiverMarkers. While the former two approaches combine complementary advantages, the latter ones allow us to demonstrate that changing the form of interaction to live markers can also improve segmentation, especially to overcome GCMF's shrinking bias.

• An extensive experimental evaluation of all 8 methods (4 hybrid and their individual region and boundary components) using recently proposed robot users [36], [38], [39], including some we designed specifically for boundary-tracking methods [40]. Such evaluation involves measures of accuracy, user effort, and control over the segmentation process, obtained from datasets with natural and 2D medical images, showing improvement with statistical significance.

• Examples of using LiveMarkers to segment multiple objects simultaneously and 3D images of the brain. In the latter case, LWOF is executed for one or multiple slices, providing seed voxels in different plane orientations (coronal, sagittal, axial), while delineation by IFT-SC is always in 3D.

We present in Section II basic concepts on image-graphs used throughout this paper. Section III defines all the individual components of our hybrid techniques within the Generalized Graph Cut segmentation framework, including the IFT and GCMF algorithms. Section IV details the segmentation process using live markers, including the specifics of our four hybrid approaches. We then present our experimental validation based on robot users in Section V, using two natural image datasets and one with CT images of the liver. Lastly, we state our conclusions in Section VI.

II. BASIC CONCEPTS ON IMAGE-GRAPHS

An image Î is a pair (D_Î, I), where D_Î ⊂ Z^n corresponds to the image domain and I(t) assigns a set of m scalars I_b(t), b = 1, 2, ..., m, to each pixel t ∈ D_Î. We will consider the cases where n = 2 or 3 and m = 1 or 3. The subscript b is removed when m = 1. In this work, the triplet (I_1(t), I_2(t), I_3(t)) for natural images denotes the R, G, and B values of a pixel t. Feature extraction essentially transforms an image Î = (D_Î, I) into the pair F̂ = (D_Î, F), where F(t) = (F_1(t), F_2(t), ..., F_m(t)) is a feature vector assigned to t. We have found that converting RGB color images to the YCbCr colorspace and decreasing the Y channel's importance using a factor of 0.2 produces a feature map F̂ that better copes with illumination changes [38]. Similarly, for grayscale images we perform a 3-level multi-scale Gaussian filtering by gradually increasing the standard deviation to reduce noise. In both cases, feature extraction assigns to each pixel a vector with m = 3 features.
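The feature extraction step above can be summarized by a short sketch. This is a minimal illustration, not the authors' code: it assumes an OpenCV-style color conversion, and the Gaussian standard deviations for the grayscale case are illustrative values (only the 0.2 factor for the Y channel comes from the text).

```python
import numpy as np
import cv2  # assumed available; any RGB-to-YCbCr conversion would do


def color_features(rgb, y_factor=0.2):
    """Map an RGB image to m = 3 pixel features in YCbCr, attenuating the
    luminance channel to reduce sensitivity to illumination changes."""
    ycc = cv2.cvtColor(rgb, cv2.COLOR_RGB2YCrCb).astype(np.float32)
    ycc[..., 0] *= y_factor  # OpenCV orders the channels as (Y, Cr, Cb)
    return ycc


def grayscale_features(gray, sigmas=(1.0, 2.0, 4.0)):
    """Stack a 3-level multi-scale Gaussian filtering of a grayscale image
    (the increasing standard deviations here are illustrative)."""
    gray = gray.astype(np.float32)
    return np.dstack([cv2.GaussianBlur(gray, (0, 0), s) for s in sigmas])
```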

In our approach, a graph (N, A) may be defined by taking a set N ⊆ D_Î of pixels as nodes and an irreflexive and symmetric adjacency relation A between nodes of N to form the arcs. We use t ∈ A(s) or (s, t) ∈ A to indicate that a node t ∈ N is adjacent to a node s ∈ N. For 2D images or slices, A = A_8 is the standard 8-neighborhood in the spatial domain, while for 3D region-based delineation we have the 6-neighborhood A_6 over the scene's voxels.

The arcs (s, t) ∈ A are assigned fixed integer weights 0 ≤ w(s, t) ≤ K that linearly combine rich information computed from the local image features F̂ and object color models [11]. While the former captures discontinuities between regions with homogeneous color and texture, the latter aims to increase the difference between the expected foreground and background by assigning a soft segmentation value 0 ≤ O(t) ≤ K to each pixel, forming an object membership map O where pixels with higher resemblance to the object are given values closer to K (typically, K = 255). The soft segmentation is interactively performed in a previous step using supervised fuzzy pixel classification from F̂ and a set of brush strokes drawn by the user on the foreground and background (see [11] for greater details). The graphs for both region- and boundary-based delineation assign lower arc weights w(s, t) to arcs on the object's boundary [11]. In region-based delineation, these lower weights represent discontinuities between the foreground and background connected components, while for boundary tracking they belong to the path along the boundary that should be followed.

III. GRAPH-BASED IMAGE SEGMENTATION

Approaches for region-based graph-cut segmentation rely on objective functions that measure some global property of the object's boundary using the arc weights. The idea is to assign weights to the arcs such that the minimum of this objective function corresponds to the desired segmentation (i.e., a cut boundary whose arcs connect the nodes between object and background). The Generalized Graph Cut framework for interactive segmentation [29] focuses on solving delineation by considering the minimization of the following energy function:

$$E_1(\hat{L}) = \sqrt[\beta]{\sum_{\forall (s,t) \in A \,\mid\, L(s) \neq L(t)} w^{\beta}(s,t)}, \qquad (1)$$

where L̂ is a labeling assignment for pixels t ∈ D_L̂, such that L(t) ∈ {0, 1} in the binary case with 0 representing the background, and β is a parameter in [1, ∞). Ciesielski et al. [29] proved in the GGC context that, for solving the equation above for some β in the specified range, only two algorithms are essentially needed: the traditional graph cuts by min-cut/max-flow [6] and the IFT-SC [37]. The latter minimizes the cut measure defined by Eq. 2, among all


possible segmentation results satisfying the region constraints imposed by the user:

$$E_2(\hat{L}) = \max_{\forall (s,t) \in A \,\mid\, L(s) \neq L(t)} w(s,t) \qquad (2)$$

From [31] we have that, under certain conditions, β → ∞ in Eq. 1 makes the solutions computed by GCMF converge to the watershed from seed competition obtained by IFT-SC. This is clear since such an operation makes energies E_1 and E_2 match. Hence, finite values of β ask for delineation using the GCMF algorithm, while β = ∞ makes IFT-SC's linear-time implementation [37] more attractive [29]. In the following, we first briefly describe region-based segmentation using GCMF, given its omnipresence in image processing. We then focus on the IFT and its derivative operators for region- and boundary-based segmentation (i.e., IFT-SC, LWOF, and Riverbed).

A. Interactive Segmentation by Graph Cuts Using Min-Cut/Max-Flow

Interactive segmentation methods using the min-cut/max-flow algorithm [6], [41] traditionally extend the image-graph (N, A) by adding two terminal nodes, which represent object and background and are directly connected to all the pixels s ∈ N. The corresponding energy functions (see [6], [28]) incorporate a data term to deal with GCMF's bias towards small boundaries. In our setting, such a measure is undesirable for RiverCut and LiveCut since it can introduce segmented regions disconnected from markers. Of course, there is a vast literature dedicated to preventing GCMF's shrinking bias and dealing with deserted islands, including the embedding of shape priors [10]. However, we want to investigate the effects of using live markers to overcome the former.

Fortunately, energy function E_1 already encompasses a way to deal with GCMF's bias. As discussed in [31], parameter β in Eq. 1 provides an interesting adaptive procedure to improve GCMF, which penalizes arcs between pixels with high weights [42], thereby expanding the cut boundary as β approaches infinity. In our implementation, we simplify Eq. 1 to

$$E_3(\hat{L}) = \sum_{\forall (s,t) \in A \,\mid\, L(s) \neq L(t)} w^{\beta}(s,t). \qquad (3)$$

The minimization of the energy function above can be easily accomplished by following [6] and, as pointed out by Ciesielski et al. [29], leads to solutions equivalent to minimizing E_1. In this case, seed imposition involves assigning infinite weights to arcs connecting foreground seed pixels s ∈ S_o to the source node, and 0 to arcs connecting them to the sink node (the opposite being true for background seeds t ∈ S_b). The object is naturally composed of the pixels t ∈ D_Î connected to the source node after the optimum cut is computed on L̂, being assigned label L(t) = 1. We note that multi-object segmentation is an NP-hard problem using GCMF, being an issue outside the scope of this work.
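The s-t construction above can be sketched in a few lines. This is an illustrative sketch only, not the authors' implementation: it uses networkx's generic max-flow/min-cut routine instead of the optimized library of [6], and the pixel indexing, arc list, and weight dictionary are assumptions.

```python
import networkx as nx


def gcmf_segment(arcs, weights, fg_seeds, bg_seeds, beta=7.0):
    """Binary GCMF-style delineation: minimize E_3 (the sum of w^beta over the
    cut) subject to the seeds, via an s-t min cut.

    arcs: iterable of pixel pairs (s, t); weights[(s, t)]: integer w(s, t).
    Returns the set of pixels labeled as object (L(t) = 1).
    """
    g = nx.DiGraph()
    for (s, t) in arcs:
        cap = float(weights[(s, t)]) ** beta
        g.add_edge(s, t, capacity=cap)   # each undirected arc becomes
        g.add_edge(t, s, capacity=cap)   # two symmetric directed arcs
    src, snk = "obj-terminal", "bkg-terminal"
    for s in fg_seeds:
        g.add_edge(src, s)               # missing 'capacity' = infinite t-link
    for t in bg_seeds:
        g.add_edge(t, snk)
    _, (source_side, _) = nx.minimum_cut(g, src, snk)
    return source_side - {src}           # pixels that stay connected to the source
```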


B. Image Foresting Transform

The Image Foresting Transform introduces a generalization of Dijkstra's algorithm that works for multiple sources and smooth path-cost functions [37]. The IFT partitions the image-graph (N, A) into an optimum-path forest where each node is assigned to the optimum-path tree rooted at its most strongly connected seed pixel (or anchor point, depending on the delineation operator). In the IFT, a connectivity function is assigned to any path in the graph, including trivial paths formed by single nodes. Considering an initial connectivity map with only trivial paths, its maxima (minima) are taken as root nodes. These roots may offer better paths to their adjacent nodes, and the adjacent nodes may also propagate better paths, in such a way that an optimum-path propagation process transforms the image into an optimum-path forest. Different image operators are then reduced to a local processing on the attributes of the forest (optimum paths, root labels, optimum connectivity values). The IFT's partitioning of the graph induces an optimum cut with properties related to the operator, and connectivity function, being developed.

A path π_t = ⟨t_1, t_2, ..., t⟩ in a graph (N, A) is a sequence of adjacent nodes with terminus at a node t. A path π_t = π_s · ⟨s, t⟩ indicates the extension of a path π_s by an arc (s, t), and a path π_t = ⟨t⟩ is said to be trivial. A connectivity function f assigns to any path π_t a value f(π_t). A path π_t is optimum if f(π_t) ≤ f(τ_t) for any other path τ_t = ⟨t_1, t_2, ..., t⟩ in (N, A). Considering all possible paths with terminus at each node t ∈ N, an optimum connectivity map V(t) is created by

$$V(t) = \min_{\forall \pi_t \text{ in } (N,A)} \{ f(\pi_t) \}. \qquad (4)$$

The IFT solves the minimization problem above by computing an optimum-path forest: a function P which contains no cycles and assigns to each node t ∈ N either its predecessor node P(t) ∈ N in the optimum path or a distinctive marker P(t) = nil ∉ N, when the trivial path ⟨t⟩ is optimum (i.e., t is said to be a root of the forest). The root R(t) of each pixel t can be obtained by following its optimum path backwards in P. For greater efficiency, we propagate the roots on-the-fly, creating a root map R.

C. Interactive Segmentation by IFT-SC

Object extraction using the IFT-SC occurs on a regular 8-neighbor graph derived from the image, (D_Î, A_8). All the pixels in a set S are taken as seeds for optimum seed competition by the IFT. As opposed to the operators for boundary-based delineation, we will define IFT-SC considering the maximization of V(t) using the IFT, which can be straightforwardly obtained by using the complement of Eq. 4. We then expect that the optimum-path forest P computed on (D_Î, A_8) with the connectivity function f_min given by

$$f_{min}(\langle t \rangle) = \begin{cases} K + 1 & \text{if } t \in S, \\ -\infty & \text{otherwise,} \end{cases} \qquad
f_{min}(\pi_s \cdot \langle s,t \rangle) = \min\{ f_{min}(\pi_s), w(s,t) \} \qquad (5)$$

extracts the object as the union of trees rooted at the object pixels in S. Multi-label segmentation is readily obtained, since the IFT creates a label map L(t) = λ(R(t)), propagating the true object labels λ(t) ∈ {0, 1, ..., c} that are assigned to the seed pixels t ∈ S.
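A compact sketch of the seed competition with f_min is given below, assuming a generic graph interface (a neighbors callable and a weights dictionary). It is a didactic priority-queue illustration of the IFT, not the authors' linear-time implementation [37].

```python
import heapq


def ift_sc(neighbors, weights, seeds, K=255):
    """Watershed-style seed competition (IFT-SC) with f_min: each pixel is
    conquered by the seed offering the path whose minimum arc weight is
    largest.  seeds: dict pixel -> label (0 = background, 1..c = objects).
    Returns (V, L, R): connectivity, label, and root maps as dictionaries."""
    V, L, R = {}, {}, {}
    heap = []
    for s, lam in seeds.items():
        V[s], L[s], R[s] = K + 1, lam, s           # trivial value K + 1 for seeds
        heapq.heappush(heap, (-(K + 1), s))        # max-heap via negation
    while heap:
        neg_v, s = heapq.heappop(heap)
        if -neg_v < V.get(s, float("-inf")):
            continue                               # outdated queue entry
        for t in neighbors(s):
            # f_min(pi_s . <s, t>) = min{f_min(pi_s), w(s, t)}
            v = min(V[s], weights[(s, t)])
            if v > V.get(t, float("-inf")):        # better (higher) connectivity
                V[t], L[t], R[t] = v, L[s], R[s]
                heapq.heappush(heap, (-v, t))
    return V, L, R
```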


Fig. 2. Contour tracking with live-wire-on-the-fly (in blue) on a toy example to clarify explanation. (a) The user selects an initial point A on the object’s boundary and then moves the mouse in clockwise orientation. Q is the wavefront of optimum paths already computed by LWOF, which can be displayed at no extra cost (we dilate the contour, optimum-boundary segment, and wavefront for better visualization). (b) A second point B is selected on the boundary. (c) Final contour with 2 segments.

That is, seeds from all the c objects and the background compete with each other to conquer their most strongly connected pixels, hopefully with the same label, but segmentation errors may occur. By adding/removing seeds, the optimum-path forest can be updated only in the affected trees rooted at them. This results in the differential property of IFT-SC [7] (DIFT-SC), which allows corrections in sublinear time in practice. This strongly favors the use of the DIFT-SC for multidimensional interactive image segmentation [7].

D. Interactive Segmentation by Live-Wire-on-the-Fly

In order to segment an object with live wire [15], the user selects a starting point on the object's boundary (e.g., point A in Fig. 2a), and, for any subsequent position of the mouse cursor, the method computes an optimum path from A to that position in real time. As the user moves the cursor close to the boundary, the optimum segment snaps onto it. The user can quickly verify the longest segment, such as the one with terminus at point B in Fig. 2b, and deposit the mouse cursor at that position. The process is then repeated from B until the user decides to close the contour (Fig. 2c).

The sequence s_1, s_2, ..., s_N of anchor points (seeds) selected by the user on the object's boundary forces the closed contour to pass through them, in that order, starting from s_1 and ending in s_N, where s_1 = s_N. The selected curve that satisfies those constraints consists of N − 1 optimum segments π_{s_2}, π_{s_3}, ..., π_{s_N}, where each π_{s_i} is an optimum path connecting s_{i−1} to s_i. Therefore, we can solve this problem by N − 1 executions of the IFT, using the initial point s* = s_{i−1} as seed and the path-cost function f_ℓ of Eq. 6 for the minimization of Eq. 4. The object contour can be obtained from the predecessor map P after the last execution. Connectivity function f_ℓ exploits the orientedness property of live wire to favor segmentation in a single orientation. In this case, we consider oriented pixel edges to favor clockwise segmentation:

$$f_{\ell}(\langle t \rangle) = \begin{cases} 0 & \text{if } t = s^{*} \\ +\infty & \text{otherwise} \end{cases} \qquad
f_{\ell}(\pi_s \cdot \langle s,t \rangle) = \begin{cases} f_{\ell}(\pi_s) + w^{\gamma}(s,t) & \text{if } O(l) \leq O(r) \\ f_{\ell}(\pi_s) + K^{\gamma} & \text{otherwise,} \end{cases} \qquad (6)$$

where l and r are the spels at the left and right sides of arc ⟨s, t⟩, O is a reference map expected to be brighter inside the object, and γ ≥ 1 favors longer optimum paths. For our purpose, O is taken as the object membership map computed during arc weight estimation. Counterclockwise orientation may be obtained by inverting the reference map test to O(r) ≤ O(l) in Eq. 6.
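One live-wire segment under f_ℓ can be sketched as an ordinary Dijkstra search with early termination. This is a didactic sketch, not the incremental LWOF implementation of [15]: orient_ok stands in for the O(l) ≤ O(r) test on the pixels beside each arc, and the graph interface (neighbors, weights) is assumed.

```python
import heapq


def lwof_segment(neighbors, weights, orient_ok, anchor, target,
                 frozen=frozenset(), K=255, gamma=1.5):
    """One live-wire segment from 'anchor' towards 'target' under the oriented
    additive cost f_l: extending a path by arc (s, t) adds w(s, t)**gamma when
    the orientation test passes and K**gamma otherwise.  Nodes in 'frozen'
    belong to previously accepted segments and are never revisited."""
    V = {anchor: 0.0}
    P = {anchor: None}
    heap = [(0.0, anchor)]
    while heap:
        v, s = heapq.heappop(heap)
        if v > V[s]:
            continue                       # outdated entry
        if s == target:
            break                          # early termination
        for t in neighbors(s):
            if t in frozen:
                continue
            step = weights[(s, t)] ** gamma if orient_ok(s, t) else float(K) ** gamma
            if v + step < V.get(t, float("inf")):
                V[t] = v + step
                P[t] = s
                heapq.heappush(heap, (V[t], t))
    # Recover the optimum boundary segment by backtracking the predecessors.
    path, t = [], target
    while t is not None:
        path.append(t)
        t = P.get(t)
    return path[::-1]
```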

At each iteration i, all previous segments π_{s_2}, π_{s_3}, ..., π_{s_{i−1}} are kept unchanged during the algorithm, so their nodes cannot be revisited or reset. The only exception is when we compute the last segment: in this case we make V(s_1) = +∞ and reset s_1's status to let it be reconquered. Each IFT execution can further exploit Bellman's optimality principle [43] for early termination and incremental computation [15], thereby leading to live-wire-on-the-fly (LWOF).

E. Interactive Segmentation by Riverbed

The idea of this boundary-tracking approach is to simulate the behavior of water flowing through a riverbed. The water traverses the riverbed always seeking lower ground levels, snaking through the river bends instead of short-cutting the path. The prime moving force of the water is gravity; therefore, at any instant of the algorithm, its decision of where to go does not depend on the past history. The water will always tend to flow down the slope. This leads to the following connectivity function for a starting seed point s*:

$$f_{w}(\langle t \rangle) = \begin{cases} 0 & \text{if } t = s^{*}, \\ +\infty & \text{otherwise,} \end{cases} \qquad
f_{w}(\pi_s \cdot \langle s,t \rangle) = \begin{cases} w(s,t) & \text{if } O(l) \leq O(r), \\ K & \text{otherwise.} \end{cases} \qquad (7)$$

That is, at any moment the IFT algorithm with f_w will move through the arc with the lowest weight w(s, t). This algorithm, with all the features discussed in Section III-D (i.e., previous segments kept unchanged, early termination, and incremental computation), results in the Riverbed approach. Riverbed requires fewer anchor points to handle complex shapes; on the other hand, live wire favors shortest-distance jumps across regions where the boundary is not well defined [18]. As in LWOF, function f_w naturally favors segmentation in clockwise orientation. We note that, in the case of the 8-neighborhood used here, the non-planarity of the graph requires some tricks to avoid self-crossing (e.g., considering thicker frozen segments [18]).
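The only change with respect to the LWOF sketch above is the path-extension rule, which is memoryless. A minimal illustration (orient_ok again stands in for the O(l) ≤ O(r) test, an assumption of the sketch):

```python
# Riverbed plugs into the same priority-queue machinery sketched for LWOF,
# but the value offered when extending a path across arc (s, t) under f_w is
# just the arc weight itself (or K when the orientation test fails): the cost
# accumulated along the path so far is intentionally ignored.
def riverbed_extension(v_s, weights, orient_ok, s, t, K=255):
    del v_s  # f_w is memoryless: the past history of the path does not matter
    return float(weights[(s, t)]) if orient_ok(s, t) else float(K)
```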


Fig. 3. (a) Initial segmentation using region constraints (markers) for DIFT-SC. (b) Activation of LWOF with one anchor point, and optimum path (blue line) computed until the current cursor position (white cross). (c) Automatically generated live marker with internal and external seeds from the LWOF border segment and update of the DIFT-SC delineation.

IV. HYBRID SEGMENTATION USING LIVE MARKERS

The live markers methodology gives the user complete freedom to draw brush strokes on the objects/background for region-based segmentation, or to select anchor points for boundary-based delineation. Regardless, the object is always extracted using a region-based delineation operator (Fig. 3). The seeds in this case encompass the brush strokes and the automatically generated set of pixels surrounding the optimum-boundary segments computed by the underlying boundary-tracking method, the latter being referred to as live markers. Upon the acceptance of an optimum-boundary segment by the user, the underlying pixels and their corresponding neighbors (within the user-selected brush stroke radius ρ) become live markers (Figs. 3b–3c). The orientedness property of the boundary-tracking method [15], [18] is exploited to determine which true marker label λ(t) should be assigned to the adjacent pixels on each side of the segment (left or right). Live markers are so important that in many cases they can virtually replace brush strokes altogether, since the corresponding region-based method labels the rest of the image accordingly (e.g., Figs. 1c, 1f, and 4).

The addition of a new boundary segment causes the region-based delineation method to be instantaneously issued to update the result on-the-fly. Segmentation may continue by either prolonging the current boundary segment, restarting boundary tracking at another location with a new anchor point, or adding/removing markers manually. LiveMarkers, RiverCut, LiveCut, and RiverMarkers follow the above strategy. They will be detailed after we define how to compute live markers from optimum-boundary segments.

A. Computing Live Markers From Optimum-Boundary Segments

Let M be a set of brush marker pixels selected by the user and B be the set of pixels that belong to the optimum path π_{s_i} = ⟨t_1, t_2, ..., t_n⟩ rooted at anchor point s_{i−1}, such that t_1 = s_{i−1} and t_n = s_i. The pixels t ∈ B are always assigned the true object label λ(t) = l ∈ {1, 2, ..., c}, because they belong to the object's border. We then dilate B using the user-specified spatial radius ρ. For each arc ⟨t_{j−1}, t_j⟩ ∈ π_{s_i}, where j = 2, 3, ..., n, we insert the adjacent pixels within ρ that are to its left and right into two disjoint sets L and R, respectively.¹

¹ We only consider the arcs where t_{j−1} and t_{j+1} are the only 8-neighbors of t_j from π_{s_i} to avoid cases when the segment touches itself. We then select one pixel to the left and one to the right of arc ⟨t_{j−1}, t_j⟩ and propagate their labels using breadth-first search.

Such a definition ensures a tight marker label

Fig. 4. Left: Live markers selected on multiple objects using LWOF. Right: Segmentation result using DIFT-SC.

assignment around the segment that protects object and background from leaking paths. Furthermore, parameter ρ allows region-based delineation to overcome wide gradient borders and weak object boundary surfaces in 3D images, for instance. The brush stroke radius is typically ρ = 4 pixels.

The marker labels for pixels in sets L and R are given according to the current boundary-tracking orientation. Since clockwise orientation expects the object of interest l to be on the right side of the segment, for every s ∈ R, λ(s) = l and for every t ∈ L, λ(t) = r ∈ {0, 1, ..., c}, where r ≠ l is the label of a secondary object of interest. For the binary case r is always 0; otherwise, the user may choose another label to segment multiple objects simultaneously (e.g., Fig. 4). The opposite is valid for counterclockwise orientation. Finally, the seed set S = M ∪ B ∪ L ∪ R is the one used to extract the object. Note that the true label λ(t) for each t ∈ M depends on the id selected by the user upon drawing the marker, being always between 0 and c.
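The seed generation above can be sketched as follows. This is a simplified illustration: the paper propagates the side labels by breadth-first search (footnote 1), whereas the sketch just steps along the arc normals, and which normal corresponds to the "right" side depends on the image coordinate convention, an assumption here.

```python
def live_marker_seeds(segment, rho=4, obj_label=1, other_label=0):
    """Turn an ordered, clockwise boundary segment (list of (x, y) pixels) into
    live-marker seeds: set B (the segment itself) receives the object label,
    and the pixels up to rho steps away on each side receive the object and
    secondary labels, emulating sets R and L."""
    seeds = {p: obj_label for p in segment}                   # set B
    for (x0, y0), (x1, y1) in zip(segment[:-1], segment[1:]):
        dx, dy = x1 - x0, y1 - y0                             # arc <t_{j-1}, t_j>
        right, left = (dy, -dx), (-dy, dx)                    # perpendicular directions
        for k in range(1, rho + 1):
            seeds.setdefault((x1 + k * right[0], y1 + k * right[1]), obj_label)   # R
            seeds.setdefault((x1 + k * left[0], y1 + k * left[1]), other_label)   # L
    return seeds
```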


Fig. 5. Top: Segmentation of one hemisphere of the human brain using LiveMarkers to add two markers on 2D slices from different planes of the volume. Bottom: renditions of DIFT-SC’s 3D delineation result.

B. Segmentation by LiveMarkers

In LiveMarkers [19], DIFT-SC computes an optimum-path forest spanning from a set of selected marker pixels (seeds) to every node in a graph derived from the image (Section III-C). Seeds compete among themselves and each object is defined by the pixels more strongly connected to its internal seeds than to any other.

Arc-weight estimation aims at computing lower arc weights across the object's border than anywhere else, such that the object can be extracted using DIFT-SC with f_min from only two seeds, one inside and another outside it [9]. Nevertheless, perfect arc-weight assignment is often not possible, and the maximization of the path value V(t) in DIFT-SC makes optimum paths from object roots avoid, as much as possible, lower-weight arcs inside the object (usually in noisy regions) and meet optimum paths from external roots at the higher arc weights across the object's boundary. After that meeting, external paths are blocked and the noisy pixels inside the object tend to be conquered by object roots. Therefore, the selection of live markers as LWOF's optimum paths on weaker parts of the boundary ensures a more effective correction of the segmentation result. Furthermore, the additive path-cost function used by LWOF produces smoother border segments than those computed by DIFT-SC with f_min (compare Figs. 1a and 1b).

Multi-object segmentation is readily obtained using LiveMarkers by allowing the user to select any pair of true labels λ(s), λ(t) ∈ {0, 1, 2, ..., c} for pixels s ∈ B ∪ R and t ∈ L (Fig. 4). If Î is a 3D image, i.e., D_Î ⊂ Z^3, then we use 6-neighbor adjacency to form a volumetric graph (D_Î, A_6). The user navigates through the slices and adds 2D live markers to the seed set S using LWOF on the corresponding 2D image plane (top of Fig. 5). On the other hand, DIFT-SC propagates in 3D across the volumetric graph to produce a 3D delineation (bottom of Fig. 5). Hence, the user is able to prevent DIFT-SC's leaking problem in 3D with live markers, using LWOF's convenient 2D interface. To overcome wide 3D boundaries inside the object, the live marker brush radius ρ is often increased to 8.

C. Segmentation by RiverCut

Following the concepts of LiveMarkers, RiverCut's implementation requires the straightforward replacement of LWOF with Riverbed and DIFT-SC with region-based

Fig. 6. (a) GCMF’s shrinking problem with β = 2.0 in Eq. 3. (b) Final segmentation result using GCMF required 9 brush strokes. (c) Even with β = 2.0, RiverCut prevents the shrinking problem using only 4 live markers. (d) LiveCut also dramatically avoids GCMF’s bias, requiring only 5 live markers to complete segmentation.

delineation by GCMF. The creation of live markers is the same as in Section IV-A, although for RiverCut they are best placed on regions surrounding complex parts of the object's silhouette to prevent GCMF's well-known shrinking bias (Figs. 1f and 6c). GCMF's bias comes from the fact that, for lower values of β, the minimization of E_3 may prioritize shorter cuts surrounding brush strokes (Fig. 6a), since the summation of those arc weights yields lower values that solve Eq. 3. To circumvent this problem, users must place more markers further apart in the spatial domain, in error locations determined by visual inspection, forcing GCMF to look for longer cuts that coincide with the object's true boundary (Fig. 6b). In this sense, optimum-path segments computed by Riverbed naturally avoid short-cutting the boundary, thereby producing longer live markers (Fig. 6c).

Riverbed's features stem from its duality to IFT-SC. The cut boundaries obtained by IFT-SC are piecewise optimum. That is, the minimization of energy E_2 in Eq. 2 also applies recursively to all subparts of the boundary, as proven in [31]. In other words, any part of a cut boundary is chosen as one that minimizes the maximum weight w(s, t) over the graph. Similarly, the Riverbed segments, with the unoriented version of f_w on the dual graph, traverse arcs that recursively minimize the same energy corresponding to the cut measure that IFT-SC presents on the primal graph [18].²

² We refer to dual planar graphs obtained by transforming a pixel's vertices into nodes, connecting them with arcs that cross the edges between adjacent pixels on the primal graph, and assigning the same corresponding edge weights to the arcs [6], [18]. The corresponding properties are extensible to non-planar adjacencies such as A_8.

Their difference


lies in how the seeds are interpreted and in the dynamics of execution: IFT-SC finds a global optimum using the seeds as regional constraints, while Riverbed performs successive energy minimizations between the ordered pairs of anchor points, which act as boundary hard constraints (i.e., it computes a sequence of optimum segments).

D. Segmentation by LiveCut and RiverMarkers

The live markers paradigm for LiveCut and RiverMarkers provides closure for the theoretical relationships established between boundary- and region-based delineation. LiveCut and RiverMarkers essentially combine methods from extreme cases of energy function E_1 (Eq. 1). While LWOF and GCMF minimize E_1 (and E_3) for lower values of β [6], Riverbed and IFT-SC operate at the β = +∞ end of the spectrum, as described in Section IV-C. Therefore, LiveCut and RiverMarkers are end points of E_1 as well and can be seen as supplementary versions of LiveMarkers and RiverCut, respectively. The more we raise β in E_3, the more likely we are to observe LiveCut behaving as LiveMarkers, and RiverCut as RiverMarkers. Similarly, increasing γ → +∞ in connectivity function f_ℓ (Eq. 6) should lead LWOF to produce boundary segments equivalent to Riverbed with f_w [18]. Hence, LiveCut is the most adaptive of all live markers methods proposed here, since different values for its free parameters β and γ can theoretically turn LiveCut into any of the other three methods. In our experiments we demonstrate LiveCut's behavior for increasing values of β to support our claims.

LiveCut's flexibility makes it interesting to investigate the effects of using LWOF-based live markers to cope with GCMF's shrinking bias (e.g., Fig. 6d). At the same time, segmenting an image using RiverMarkers is somewhat equivalent to only using IFT-SC, the main difference being that it can be harder to add live markers on weaker parts of the object's boundary. Nevertheless, RiverMarkers' importance comes from a theoretical perspective, given that it is the upper limiting case for LiveMarkers, RiverCut, and LiveCut (i.e., the case when both γ → +∞ in f_ℓ and β → +∞ in E_3).

V. EXPERIMENTAL EVALUATION

We have compared LiveMarkers, RiverCut, LiveCut, and RiverMarkers with their individual boundary- and region-based components in the segmentation of two natural image datasets: GrabCut [28] (50 images) and Geodesic Star Convexity [10] (GeoStar, 151 images). To evaluate the methods' behavior in medical images, we segmented a dataset containing 40 CT images obtained from 2D slices of the liver from 10 different subjects. The ground truths of all datasets refer to foreground and background, being binarized to disregard uncertainty regions [10], [28]. We also added a 5-pixel frame around each image, since some objects touch the image's border, and closed label holes to facilitate the comparison of boundary-based methods.

For GCMF and DIFT-SC, the experiments were carried out by robot users proposed for region-based delineation in [36] and [38]. We evaluated the remaining approaches using adaptations of novel robots we designed for simulating the


user's behavior during interactive segmentation with boundary-tracking methods [40]. Robot users are algorithms that mimic the user's behavior with interactive image segmentation methods, aiming to remove the user's bias and improve reproducibility during the evaluation of these methods. In this case, we evaluated our hybrid approaches by only considering the effect of live markers, although it should be clear that they can naturally take advantage of brush strokes.

The fuzzy supervised classification during arc weight estimation was a common pre-processing step for all methods, which used the set of initial marker pixels provided with the GeoStar dataset (the GrabCut dataset also used these markers since it is subsumed by GeoStar [10]). We created a similar set of markers for the liver dataset, providing for every image one foreground and three background brush strokes.

We evaluated the methods on three grounds: segmentation accuracy, amount of user effort, and control over the segmentation process. We also made a careful evaluation of the important parameter β for GCMF-based approaches (Eq. 3) at the end of this section. All approaches that use live-wire-on-the-fly considered the oriented path-cost function f_ℓ from Eq. 6 with γ = 1.5 [15]. We used version 2.2 of the min-cut/max-flow library [6] in our non-optimized implementation that combines Python, Cython, and C++. All experiments were executed on an Intel Core i7-3770K CPU running at 3.50 GHz, with 32 GB of RAM, using 4 threads to speed up computation.

A. Robot Users for Region- and Boundary-Based Segmentation

The robot users proposed in [36] and [38] select circular brush strokes on the geodesic centers of the largest error components at each iteration. Since hybrid methods always provide one live marker containing both foreground and background seed pixels, we allow the selection of up to two brush strokes at each iteration of the algorithm, one with foreground and another with background seeds [38].

Simulating the user's behavior when using boundary-tracking methods involves performing the same kind of action using a virtual mouse pointer that moves around the ground truth's border (see [40] for details). The basic idea is to select an initial anchor point s_1 close to the ground truth's boundary using some criterion (e.g., near the highest image gradient value) and then to move the mouse on a fixed band of pixels around the ground truth's border with a given radius (typically 2.5 pixels to better mimic a real user, as verified in [40]). The virtual mouse follows the orientation considered by the underlying boundary-tracking method. For each mouse position, the robot evaluates an error measure that dictates how well the contour segment adheres to the ground truth's border. If the error is too high, a new anchor point is deposited on the last location where it was acceptable; otherwise, the robot seeks a longer optimum-boundary segment with minimum error.³ Afterwards, the robot repeats the process from the newly selected anchor point until closing the contour.

³ We use the novice robot user proposed in [40], which carefully verifies whether the boundary segment follows the contour in an orderly fashion.
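The virtual-mouse loop above can be condensed into a short sketch. This is a simplified, hedged illustration of the novice robot of [40], not its actual code: the error measure, the segment computation, and the error tolerance are placeholders, and the ground-truth contour pixels stand in for the band of radius ~2.5 pixels used in the paper.

```python
def boundary_robot(gt_contour, compute_segment, segment_error, max_error=2.0):
    """Simplified novice-style robot user for boundary tracking.

    gt_contour: ordered list of ground-truth border pixels (closed contour).
    compute_segment(anchor, pos) -> boundary pixels given by LWOF or Riverbed.
    segment_error(segment, gt_contour) -> scalar adherence error.
    All three, as well as max_error, are placeholders for the criteria of [40].
    """
    anchors = [gt_contour[0]]                       # initial anchor point s_1
    last_good = gt_contour[0]
    for pos in gt_contour[1:]:                      # move the virtual mouse forward
        seg = compute_segment(anchors[-1], pos)
        if segment_error(seg, gt_contour) > max_error:
            anchors.append(last_good)               # deposit an anchor where it was still fine
        else:
            last_good = pos                         # keep seeking a longer good segment
    anchors.append(gt_contour[0])                   # close the contour at s_N = s_1
    return anchors
```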


The process above naturally implements the robot user for LWOF and Riverbed. It is an interesting approach since it could help to locate anchor points for automatic segmentation methods, such as those based on Active Shape Models combined with live wire [17]. To deal with our hybrid techniques, we first run the boundary-tracking robot with either LWOF or Riverbed until closing the contour and store all of the corresponding optimum boundary segments between each pair of chosen anchor points s_i and s_{i+1}. Then, at each iteration, our adapted robot selects as one live marker the boundary segment with the greatest number of seeds located on the largest error component, and runs the corresponding region-based delineation method. This choice is similar to the one performed by the region-based robot users, being a fair way to compare the methods from different paradigms without biasing the result.⁴ As verified in [36], this choice further reflects the non-expert user's response to the problem.

⁴ Analogously to reducing the brush size close to the boundary [36], we downscale the live marker radius to 1.5 if the boundary segment nearly touches itself.

Since we compare our hybrid methods with boundary-tracking techniques, we let all robots segment each image until the accuracy (F-measure score) reaches the minimum obtained by either LWOF or Riverbed. To prevent loops, the robots also stop if all the pre-computed optimum boundary segments have been selected, for hybrid methods, or if they can no longer add brush strokes within a safe distance from the border [38] (2 pixels). This is in contrast to the works of [36], [38], and [39], which stop after a fixed number of iterations. Our criteria are key for comparing the actual amount of interaction (number of selected markers or anchor points) that is required to complete segmentation among the different paradigms, while also reflecting the behavior of novice human users who do not understand that minor mistakes can be solved with post-processing, as verified in [11] and [40]. Note that each live marker counts as one, although it has object and background pixels, while every foreground and background marker selected by the geodesic robot is counted individually. As advised in [10], we use a brush radius of 8 pixels for DIFT-SC and GCMF in our experiments, and 4 pixels for live markers because they are placed on the object's boundary.

B. Evaluation of Accuracy, User Effort, and Control Over Segmentation

We establish segmentation accuracy by taking into account both region- and boundary-based metrics, namely the F-measure score computed over the ground truths and the average Euclidean distance ED between the segmentation mask and ground truth boundaries [11]. The amount of user effort is related to the number of markers or anchor points that were required to complete segmentation, depending on the paradigm. User control over the process is given by the number of label changes that occurred between subsequent iterations i = 1, 2, ..., k, for each pixel t ∈ D_Î, stored in a map N(t) and normalized according to

$$CM = \frac{1}{|D_{\hat{I}}|} \sum_{t \in D_{\hat{I}}} \left( 1 - \frac{N(t)}{N_{max}} \right),$$

where N_max is the maximum number of changes that can occur [11]. We count a label change whenever a pixel t that was assigned its final label L_k(t) in iteration i − 1 changes to another L_i(t) ≠ L_k(t), for 1 < i < k.
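The CM measure follows directly from this definition, as in the minimal sketch below; keeping the whole sequence of label maps in memory and taking N_max simply as the number of iterations that could flip a pixel are assumptions of the sketch.

```python
import numpy as np


def control_measure(label_history):
    """Control over segmentation (CM): label_history is the list of label maps
    L_1, ..., L_k (integer arrays of equal shape) over the iterations.  N(t)
    counts how many times a pixel holding its final label L_k(t) at iteration
    i-1 changed to a different label at iteration i."""
    labels = np.stack(label_history)                 # shape (k, H, W)
    final = labels[-1]
    changes = np.zeros(final.shape, dtype=np.int32)  # the map N(t)
    for i in range(labels.shape[0] - 1):
        changes += ((labels[i] == final) & (labels[i + 1] != final)).astype(np.int32)
    n_max = max(labels.shape[0] - 1, 1)              # illustrative normalization
    return float(np.mean(1.0 - changes / n_max))
```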

TABLE I
EXPERIMENTAL RESULTS FOR ALL THREE DATASETS. COLOR CODES AND GROUPING REPRESENT THE PARADIGM OF EACH METHOD (BOUNDARY, REGION, AND HYBRID, RESPECTIVELY), WITH BOLD VALUES DENOTING THE BEST SCORES FOR EACH PARADIGM

Table I presents the overall scores of the aforementioned measures for all three datasets, using β = 7.0 in Eq. 3 for GCMF-based methods. The bold values denote the best scores for each paradigm and measure. ED values solve ties in F-measure, while the number of interactions resolves disputes for CM. The graphs in Fig. 7 show the average F-measure curve over the first 20 iterations of segmentation for all approaches. We computed the accuracy for boundary-based methods only once, when the contour is closed. Hence, in the graphs the F-measure scores for LWOF and Riverbed appear in iteration 1, being nearly superimposed and close to 1.0.

From the segmentation accuracy point of view, all methods yield equivalently high scores, since they were forced to achieve at least the F-measure result obtained by either LWOF or Riverbed before stopping. The same reasoning is valid for ED. With the exception of GCMF, all methods give the user tight control over the segmentation process according to CM. GCMF's bias towards small cuts destabilizes segmentation until enough markers are added by the robot, even though we considered a fairly high value for β.

The average amount of user effort required for segmentation sheds light on the real difference among all the methods. The number of user interactions required for hybrid approaches can be up to 53% less than the necessary amount for boundary- and region-based techniques, as seen in Table I.


Fig. 7. F-measure curve for the first 20 iterations of segmentation of the GrabCut, GeoStar, and Liver datasets using all 8 methods. Methods using live markers achieve higher accuracy much faster. Note that LWOF and Riverbed’s accuracies are computed only once when the contours are closed, being superimposed and close to 1.0.

Fig. 8. Nemenyi post-hoc test for the Friedman average ranking of the methods in each dataset, according to the increasing amount of required user interactions. The horizontal bars link segmentation methods with statistically equivalent results (at p ≤ 0.05), whose average ranks lie within the range given by the calculated critical distance CD. Methods with average rank closer to 1 required less user intervention, thereby performing better in the experiments.

The graphs in Fig. 7 also show generally faster convergence rates for live markers methods than for the other paradigms. Interestingly, we observe that LiveMarkers and LiveCut performed equally well, and better than the others, for all three datasets, both in the total number of required markers and in convergence speed. RiverCut's final scores are slightly better than those of its counterpart, RiverMarkers, but its convergence rate is slower in the first iterations, especially in the GeoStar dataset. The reason behind this is that GeoStar presents more challenging images to segment, making it difficult to obtain longer segments using Riverbed to provide large markers for GCMF. Therefore, lower contrast between foreground and background favors live markers methods using LWOF.

The large standard deviation for user effort in Table I makes live markers methods seem equivalent, and even overlapping with the other paradigms. To disprove this hypothesis and show that the proposed methods indeed require less user intervention with statistical significance, we have analyzed them using the Friedman test, representing its results through the Nemenyi post-hoc test. As pointed out by Demšar [44], when the assumptions that the data is drawn from a population with a normal distribution and sphericity (variance homogeneity) are violated, the aforementioned tests are more reliable than ANOVA.

Fig. 8 depicts the Nemenyi tests computed for each dataset. The methods are sorted from 1 to 8 according to the average rank computed by the Friedman test, in increasing order of required user interaction (values closer to 1 indicate better performance). The horizontal bars connect methods that are statistically equivalent at p ≤ 0.05. LiveMarkers and LiveCut are statistically equivalent and the overall winners for all three datasets. Aside from the GeoStar dataset, RiverCut closely follows both methods, being statistically equivalent to them according to the critical distances (CD) computed by the Nemenyi test for the GrabCut and Liver datasets. In GeoStar, RiverCut is only equivalent to LiveMarkers. GCMF, DIFT-SC,

Fig. 9. F-measure curve for 20 iterations of segmentation of the GeoStar dataset, with varying values of β for the min-cut/max-flow algorithm (Eq. 3). Note how the live markers methodology improves segmentation even for β = 1.0, while higher values (β > 7.0) lead GCMF-based delineation to achieve results similar to IFT-SC-based methods, as expected.

and Riverbed are equivalent for all three datasets, being at the bottom of the ranking. LWOF and RiverMarkers lie somewhere between these two groups rank-wise. Aside from the GrabCut dataset, there is an intersection between LWOF and RiverMarkers in the Nemenyi tests; however, the CD values for the GrabCut and Liver datasets put these methods in the range of either DIFT-SC and Riverbed (LWOF) or the other live markers approaches (RiverMarkers). Thus, we cannot state anything statistically about these two methods for the GeoStar and Liver datasets. Empirically, we have observed that RiverMarkers is less stable


Fig. 10. Final segmentation results obtained by the robot users using boundary- and region-based techniques (top rows) and the proposed hybrid approaches using only live markers (bottom rows). Anchor points are green dots on the object's boundary. Brush strokes for GCMF and DIFT-SC are colored circles with internal/external labels shown in blue/red or white/blue, depending on the image, for greater contrast. Live markers follow the same color scheme, being the boundary segment between consecutive anchor points. Notice how live markers-based approaches require fewer interactions to delineate images with weaker boundary information (statue) and complex shapes (flower), combining the strengths of the other four methods and paradigms.

than the other live markers approaches, because both Riverbed and DIFT-SC are more susceptible to weaker boundary information, which is the main challenge of all three datasets. For the same reason, we see better performance of LWOF over Riverbed and the region-based methods.

We may conclude that hybrid approaches using live markers improve the interactive experience in segmentation by reducing the required amount of user effort, while maintaining accuracy and control. Table I and Fig. 7 further indicate that such a type of interaction helps even when combining methods without complementary advantages, such as RiverMarkers and LiveCut. As observed in [18], Riverbed can be used to complete DIFT-SC's segmentation directly, by considering the duality between them. Hence, the RiverMarkers combination essentially tries to exploit this property to cope with

DIFT-SC's leaking problem using live markers. However, our results clearly show that LiveMarkers is superior in this aspect.

To further strengthen our claim towards the use of live markers, we have analyzed the behavior of parameter β of Eq. 3 for methods derived from the min-cut/max-flow algorithm. The graphs in Fig. 9 show the result of using LiveCut, RiverCut, and GCMF to segment all images in the GeoStar dataset, with a fixed number of 20 iterations and increasing values of β between 1.0 and 30.0. These graphs show, as expected, that higher values of β (e.g., over 15) make LiveCut and RiverCut behave like LiveMarkers and RiverMarkers, respectively. This behavior stems from GCMF becoming IFT-SC, although we leave out the same type of comparison with these two methods for clarity (similarly, LWOF and Riverbed are ignored because they do not use β). The interesting aspect of


the graphs in Fig. 9 is that even for lower values of β (e.g., less than 7.0) the live markers methodology improves the quality of segmentation by making the region-based delineation using min-cut/max-flow more stable. After only 10 iterations, LiveCut with β = 1.0 reaches LiveMarkers' accuracy and maintains it for the remaining ones. Therefore, we confirm β = 7.0 as a good trade-off between the boundary smoothness from GCMF and the robustness to marker positioning obtained from IFT-SC, when segmenting an image using either LiveCut or RiverCut. Our results further indicate that testing other values for γ in f_ℓ is unnecessary, given the overall proficiency of both LiveMarkers and LiveCut.

Fig. 10 depicts some examples of the final segmentation results obtained for all methods using the robot users. See our supplementary materials for video results, our datasets, and the code for all of our methods and robots (www.ic.unicamp.br/~tvspina/projects/livemarkers).

VI. CONCLUSIONS

We have presented the live markers methodology, a hybrid paradigm for the effective segmentation of natural and medical images in 2D and 3D, requiring minimum user intervention. In this approach, optimum boundary-tracking segments are turned into hard constraints (markers) for a region-based delineation method, which actually performs foreground extraction. We demonstrated this methodology through combinations of boundary- and region-based delineation methods inspired by the Generalized Graph Cut segmentation framework [29]. This framework suggests that the Image Foresting Transform [37] (IFT) and the min-cut/max-flow algorithm [6] are the two most suitable algorithms for solving an entire class of energy functions related to optimum graph cuts. Hence, we investigated four live markers approaches derived from these algorithms, which combine boundary-tracking via live-wire-on-the-fly (LWOF) or Riverbed with region-based delineation using DIFT-SC or GCMF.

LWOF can overcome weaker parts of the object's border to stop DIFT-SC's leaking problem, and their successful combination, LiveMarkers [19], motivated the study of other hybrid methods in this work. RiverCut turns the long boundary segments computed by Riverbed into markers for region-based GCMF to complete segmentation by short-cutting sections of the object's border with lower contrast. The remaining two combinations, LiveCut and RiverMarkers, do not take advantage of complementary properties. Instead, they demonstrated that, in the case of LiveCut, live markers dramatically decreased the required amount of user effort even when performing GCMF without penalizing arc weights with an increasing power function or the traditional data-driven energy term [6]. RiverMarkers mainly represented a way to exploit the duality between Riverbed and IFT-SC [18]. We conducted extensive experiments using novel robot users proposed for the evaluation of boundary-tracking techniques, confirming our claims with statistical reliability. Hence, we conclude that live markers is a powerful methodology that can be used as a new form of interaction in user-steered image segmentation. We have further shown examples of how to use LiveMarkers in the segmentation of 3D images of the brain


A. Challenges, Automation, and Future Work

The live markers paradigm represents a new form of user interaction that can be coupled with any scribble-based system for interactive image segmentation. It is on par with other approaches that try to minimize the user’s effort in indicating the whereabouts of the desired foreground in an image, such as the bounding-box selection interface proposed in GrabCut [28]. In other words, the live markers paradigm is only as good as the methods it combines in coping with different challenges. For images with low contrast or a certain amount of noise, for example, the LiveMarkers method takes advantage of both LWOF and DIFT-SC to facilitate segmentation. LWOF can overcome poorly defined sections of the foreground’s boundary under low contrast. At the same time, DIFT-SC’s path-cost function f_min makes the IFT’s optimum paths rooted at foreground and background seeds meet on the object’s boundary before absorbing noisy pixels inside those regions. Hence, as long as noise speckles do not cause severe gaps along the border, they are not a major issue. Similarly, LiveCut and RiverCut depend on GCMF to handle noisy areas. Of course, our feature extraction step and arc-weight estimation (Section II) help considerably with those challenges. See our supplementary material for some examples.

Future work involves testing the live markers methodology with other recently proposed approaches for region-based interactive segmentation (see [10], [13], [38]), developing new boundary-tracking approaches using other types of connectivity functions [12], and seeking extra boundary smoothness via post-processing [45]. Also, as we speculated in [40], it should be possible to automate the selection of both brush strokes and live markers using robot users. The expert robot users proposed in [38] and [40] can be used to learn the spatial distribution of commonly selected markers for the segmentation of an object-specific dataset (e.g., the brain). The system would then segment a test image by placing markers according to the statistics of the learned distribution and evaluating whether the resulting segmentation mask resembles those seen in the training dataset, similarly to constrained parametric min-cuts [46].

ACKNOWLEDGMENT

The authors would like to thank Prof. Dr. J. K. Udupa and the creators of the GrabCut, GeoStar, and Video Pose 2 [47] datasets for the images used in this paper.

REFERENCES

[1] S. D. Olabarriaga and A. W. M. Smeulders, “Interaction in the segmentation of medical images: A survey,” Med. Image Anal., vol. 5, no. 2, pp. 127–142, 2001.
[2] T. Porter and T. Duff, “Compositing digital images,” in Proc. SIGGRAPH, New York, NY, USA, 1984, pp. 253–259.
[3] J. Wang and M. Cohen, “Image and video matting: A survey,” Found. Trends Comput. Graph. Vis., vol. 3, no. 2, pp. 97–175, 2008.
[4] A. X. Falcão, J. K. Udupa, S. Samarasekera, S. Sharma, B. E. Hirsch, and R. A. Lotufo, “User-steered image segmentation paradigms: Live wire and live lane,” Graph. Models Image Process., vol. 60, no. 4, pp. 233–260, 1998.
[5] E. N. Mortensen and W. A. Barrett, “Interactive segmentation with intelligent scissors,” Graph. Models Image Process., vol. 60, no. 5, pp. 349–384, 1998.
[6] Y. Boykov and G. Funka-Lea, “Graph cuts and efficient N-D image segmentation,” Int. J. Comput. Vis., vol. 70, no. 2, pp. 109–131, Nov. 2006.
[7] A. X. Falcão and F. P. G. Bergo, “Interactive volume segmentation with differential image foresting transforms,” IEEE Trans. Med. Imag., vol. 23, no. 9, pp. 1100–1108, Sep. 2004.
[8] X. Bai and G. Sapiro, “Geodesic matting: A framework for fast interactive image and video segmentation and matting,” Int. J. Comput. Vis., vol. 82, no. 2, pp. 113–132, Apr. 2009.
[9] P. A. V. de Miranda, A. X. Falcão, and J. K. Udupa, “Synergistic arc-weight estimation for interactive image segmentation using graphs,” Comput. Vis. Image Understand., vol. 114, no. 1, pp. 85–99, 2010.
[10] V. Gulshan, C. Rother, A. Criminisi, A. Blake, and A. Zisserman, “Geodesic star convexity for interactive image segmentation,” in Proc. IEEE CVPR, San Francisco, CA, USA, Jun. 2010, pp. 3129–3136.
[11] T. V. Spina, P. A. V. de Miranda, and A. X. Falcão, “Intelligent understanding of user interaction in image segmentation,” Int. J. Pattern Recognit. Artif. Intell., vol. 26, no. 2, pp. 1265001-1–1265001-26, 2012.
[12] L. A. C. Mansilla, F. A. M. Cappabianco, and P. A. V. Miranda, “Image segmentation by image foresting transform with non-smooth connectivity functions,” in Proc. 26th SIBGRAPI, Arequipa, Peru, Aug. 2013, pp. 147–154.
[13] P. A. V. Miranda and L. A. C. Mansilla, “Oriented image foresting transform segmentation by seed competition,” IEEE Trans. Image Process., vol. 23, no. 1, pp. 389–398, Jan. 2014.
[14] A. Martelli, “Edge detection using heuristic search methods,” Comput. Graph. Image Process., vol. 1, no. 2, pp. 169–182, 1972.
[15] A. X. Falcão, J. K. Udupa, and F. K. Miyazawa, “An ultra-fast user-steered image segmentation paradigm: Live wire on the fly,” IEEE Trans. Med. Imag., vol. 19, no. 1, pp. 55–62, Jan. 2000.
[16] F. Malmberg, E. Vidholm, and I. Nyström, “A 3D live-wire segmentation method for volume images using haptic interaction,” in Proc. DGCI, vol. 4245, Szeged, Hungary, 2006, pp. 663–673.
[17] J. Liu and J. K. Udupa, “Oriented active shape models,” IEEE Trans. Med. Imag., vol. 28, no. 4, pp. 571–584, Apr. 2009.
[18] P. A. V. Miranda, A. X. Falcão, and T. V. Spina, “Riverbed: A novel user-steered image segmentation method based on optimum boundary tracking,” IEEE Trans. Image Process., vol. 21, no. 6, pp. 3042–3052, Jun. 2012.
[19] T. V. Spina, A. X. Falcão, and P. A. V. Miranda, “User-steered image segmentation using live markers,” in Proc. CAIP, vol. 6854, Seville, Spain, 2011, pp. 211–218.
[20] L. Vincent and P. Soille, “Watersheds in digital spaces: An efficient algorithm based on immersion simulations,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 13, no. 6, pp. 583–598, Jun. 1991.
[21] J. B. T. M. Roerdink and A. Meijster, “The watershed transform: Definitions, algorithms and parallelization strategies,” Fundam. Inf., vol. 41, pp. 187–228, Apr. 2000.
[22] R. Audigier and R. A. Lotufo, “Watershed by image foresting transform, tie-zone, and theoretical relationship with other watershed definitions,” in Proc. ISMM, Rio de Janeiro, Brazil, Oct. 2007, pp. 277–288.
[23] J. Cousty, G. Bertrand, L. Najman, and M. Couprie, “Watershed cuts: Thinnings, shortest path forests, and topological watersheds,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 5, pp. 925–939, May 2010.
[24] P. K. Saha and J. K. Udupa, “Relative fuzzy connectedness among multiple objects: Theory, algorithms, and applications in image segmentation,” Comput. Vis. Image Understand., vol. 82, no. 1, pp. 42–56, 2001.
[25] J. K. Udupa, P. K. Saha, and R. A. Lotufo, “Disclaimer: ‘Relative fuzzy connectedness and object definition: Theory, algorithms, and applications in image segmentation’,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 11, pp. 1485–1500, Nov. 2002.
[26] K. C. Ciesielski, J. K. Udupa, P. K. Saha, and Y. Zhuge, “Iterative relative fuzzy connectedness for multiple objects with multiple seeds,” Comput. Vis. Image Understand., vol. 107, no. 3, pp. 160–182, 2007.

[27] K. C. Ciesielski, J. K. Udupa, A. X. Falcão, and P. A. V. Miranda, “Comparison of fuzzy connectedness and graph cut segmentation algorithms,” Proc. SPIE Med. Imag., vol. 7962, pp. 7962031–79620312, Mar. 2011.
[28] C. Rother, V. Kolmogorov, and A. Blake, “‘GrabCut’: Interactive foreground extraction using iterated graph cuts,” ACM Trans. Graph., vol. 23, no. 3, pp. 309–314, 2004.
[29] K. C. Ciesielski, J. K. Udupa, A. X. Falcão, and P. A. V. Miranda, “Fuzzy connectedness image segmentation in graph cut formulation: A linear-time algorithm and a comparative analysis,” J. Math. Imag. Vis., vol. 44, no. 3, pp. 375–398, 2012.
[30] C. Couprie, L. Grady, L. Najman, and H. Talbot, “Power watershed: A unifying graph-based optimization framework,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 7, pp. 1384–1399, Jul. 2011.
[31] P. A. V. Miranda and A. X. Falcão, “Links between image segmentation based on optimum-path forest and minimum cut in graph,” J. Math. Imag. Vis., vol. 35, no. 2, pp. 128–142, Oct. 2009.
[32] R. Audigier, R. A. Lotufo, and M. Couprie, “The tie-zone watershed: Definition, algorithm and applications,” in Proc. IEEE ICIP, Genoa, Italy, Sep. 2005, pp. 654–657.
[33] K. C. Ciesielski, P. A. V. Miranda, A. X. Falcão, and J. K. Udupa, “Joint graph cut and relative fuzzy connectedness image segmentation algorithm,” Med. Image Anal., vol. 17, no. 8, pp. 1046–1057, 2013.
[34] J. Liang, T. McInerney, and D. Terzopoulos, “United snakes,” Med. Image Anal., vol. 10, no. 2, pp. 215–233, 2006.
[35] Y. Li, J. Sun, C.-K. Tang, and H.-Y. Shum, “Lazy snapping,” ACM Trans. Graph., vol. 23, no. 3, pp. 303–308, Aug. 2004.
[36] P. Kohli, H. Nickisch, C. Rother, and C. Rhemann, “User-centric learning and evaluation of interactive segmentation systems,” Int. J. Comput. Vis., vol. 100, no. 3, pp. 261–274, 2012.
[37] A. X. Falcão, J. Stolfi, and R. A. Lotufo, “The image foresting transform: Theory, algorithms, and applications,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 1, pp. 19–29, Jan. 2004.
[38] P. E. Rauber, T. V. Spina, P. J. de Rezende, and A. X. Falcão, “Interactive segmentation by image foresting transform on superpixel graphs,” in Proc. SIBGRAPI, Arequipa, Peru, Aug. 2013, pp. 131–138.
[39] K. McGuinness and N. E. O’Connor, “Toward automated evaluation of interactive segmentation,” Comput. Vis. Image Understand., vol. 115, no. 6, pp. 868–884, 2011.
[40] T. V. Spina and A. X. Falcão, “Robot users for the evaluation of boundary-tracking approaches in interactive image segmentation,” in Proc. ICIP, Paris, France, 2014.
[41] K. Li, X. Wu, D. Z. Chen, and M. Sonka, “Optimal surface segmentation in volumetric images—A graph-theoretic approach,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 1, pp. 119–134, Jan. 2006.
[42] L. Liang, K. Rehm, R. P. Woods, and D. A. Rottenberg, “Automatic segmentation of left and right cerebral hemispheres from MRI brain volumes using the graph cuts algorithm,” NeuroImage, vol. 34, no. 3, pp. 1160–1170, 2007.
[43] R. Bellman, Dynamic Programming. Princeton, NJ, USA: Princeton Univ. Press, 1957.
[44] J. Demšar, “Statistical comparisons of classifiers over multiple data sets,” J. Mach. Learn. Res., vol. 7, pp. 1–30, Dec. 2006.
[45] F. Malmberg, I. Nyström, A. Mehnert, C. Engstrom, and E. Bengtsson, “Relaxed image foresting transforms for interactive volume image segmentation,” Proc. SPIE, vol. 7623, p. 762340, Mar. 2010.
[46] J. Carreira and C. Sminchisescu, “CPMC: Automatic object segmentation using constrained parametric min-cuts,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 7, pp. 1312–1328, Jul. 2012.
[47] B. Sapp, D. Weiss, and B. Taskar, “Parsing human motion with stretchable models,” in Proc. IEEE CVPR, Colorado Springs, CO, USA, Jun. 2011, pp. 1281–1288.

Thiago Vallin Spina received the B.Sc. degree in computer science from the University of Campinas, Campinas, Brazil, in 2009, where he is currently pursuing the Ph.D. degree in computer science with Prof. A. X. Falcão, with a focus on interactive segmentation of natural images and videos. From 2011 to 2012, he was with the University of Minnesota, Minneapolis, MN, USA, where he worked with Prof. G. Sapiro on body pose estimation applied to the analysis of early behavioral markers of Autism in toddlers. His research primarily involves segmentation of images and videos using graph-based tools and fuzzy models of content knowledge. His areas of interest are image processing and analysis, computer vision, human pose estimation, and machine learning.

Paulo A. V. de Miranda is currently a Professor with the Institute of Mathematics and Statistics, University of São Paulo, São Paulo, Brazil. He received the B.Sc. degree in computer engineering and the M.Sc. degree in computer science from the University of Campinas (UNICAMP), Campinas, Brazil, in 2003 and 2006, respectively. He was a recipient of the Best Master’s Dissertation Award from the Institute of Computing, UNICAMP. From 2008 to 2009, he was with the University of Pennsylvania, Philadelphia, PA, USA, working on image segmentation for his doctorate. He received the Ph.D. degree in computer science from UNICAMP in 2009. After that, he was a Post-Doctoral Researcher on the Brain Image Analyzer project, conducted in conjunction with professors from the Department of Neurology, UNICAMP. He has been a Professor of Computer Science and Engineering with the University of São Paulo since 2011. His main research involves image segmentation and analysis, medical imaging applications, pattern recognition, and content-based image retrieval.

Alexandre Xavier Falcão is currently a Full Professor with the Institute of Computing, University of Campinas, Campinas, Brazil. He received the B.Sc. degree in electrical engineering from the Federal University of Pernambuco, Recife, Brazil, in 1988. He has worked on biomedical image processing, visualization, and analysis since 1991. He received the M.Sc. degree in electrical engineering from the University of Campinas in 1993. From 1994 to 1996, he was with the Medical Image Processing Group, Department of Radiology, University of Pennsylvania, Philadelphia, PA, USA, working on interactive image segmentation for his doctorate. He received the Ph.D. degree in electrical engineering from the University of Campinas in 1996. In 1997, he worked on a project for Globo TV at the research center CPqD-TELEBRAS, Campinas, developing methods for video quality assessment. He has been a Professor of Computer Science and Engineering with the University of Campinas since 1998. His main research interests include image/video processing, visualization, and analysis; image annotation, organization, and retrieval; machine learning and pattern recognition; and their applications in biology, medicine, biometrics, endodontics, geology, and agriculture.
