IEEE TRANSACTIONS ON MEDICAL IMAGING, VOL. 33, NO. 9, SEPTEMBER 2014


Hierarchical Lung Field Segmentation With Joint Shape and Appearance Sparse Learning

Yeqin Shao, Yaozong Gao, Yanrong Guo, Yonghong Shi, Xin Yang, and Dinggang Shen*

Abstract—Lung field segmentation in the posterior–anterior (PA) chest radiograph is important for pulmonary disease diagnosis and hemodialysis treatment. Due to high shape variation and boundary ambiguity, accurate lung field segmentation from chest radiographs remains a challenging task. To tackle these challenges, we propose a joint shape and appearance sparse learning method for robust and accurate lung field segmentation. The main contributions of this paper are: 1) a robust shape initialization method is designed to achieve an initial shape that is close to the lung boundary under segmentation; 2) a set of local sparse shape composition models is built based on local lung shape segments to overcome the high shape variations; 3) a set of local appearance models is similarly adopted by using sparse representation to capture the appearance characteristics in local lung boundary segments, thus effectively dealing with the lung boundary ambiguity; 4) a hierarchical deformable segmentation framework is proposed to integrate the scale-dependent shape and appearance information for robust and accurate segmentation. Our method is evaluated on 247 PA chest radiographs from a public dataset. The experimental results show that the proposed local shape and appearance models outperform the conventional shape and appearance models. Compared with state-of-the-art lung field segmentation methods, our method also shows higher accuracy, which is comparable to the inter-observer annotation variation.

Index Terms—Active shape model, chest radiograph, deformable segmentation, local appearance model, local sparse shape composition, sparse learning.

Manuscript received January 15, 2014; revised February 06, 2014; accepted February 06, 2014. Date of publication February 11, 2014; date of current version August 28, 2014. This work was supported in part by the National Institutes of Health (NIH) under Grant CA140413, in part by the National Basic Research Program of China 2010CB732506, and in part by the Shanghai Basic Research Program 12JC1406600. (Yeqin Shao and Yaozong Gao are co-first authors.) Asterisk indicates corresponding author.

*D. Shen is with the Department of Radiology and Biomedical Research Imaging Center (BRIC), Chapel Hill, NC 27599 USA and also with the Department of Brain and Cognitive Engineering, Korea University, Seoul 136701, Korea (e-mail: [email protected]).

Y. Shao is with the Institution of Image Processing & Pattern Recognition, Shanghai Jiao Tong University, Shanghai 200240, China, with the Department of Radiology and Biomedical Research Imaging Center, University of North Carolina, Chapel Hill, NC 27599 USA, and also with Nantong University, Jiangsu 226019, China (e-mail: [email protected]).

Y. Gao and Y. Guo are with the Department of Radiology and Biomedical Research Imaging Center (BRIC), Chapel Hill, NC 27599 USA (e-mail: gaoking132@gmail.com; [email protected]).

Y. Shi is with the Digital Medical Research Center, Fudan University, Shanghai 200032, China (e-mail: [email protected]).

X. Yang is with the Institution of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai 200240, China (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TMI.2014.2305691

Fig. 1. Illustration of the 2-D cardiothoracic ratio. The 2-D cardiothoracic ratio [3] is defined as the square root of the ratio between the cardiac area (dark gray) and the sum of the cardiac and lung field areas (light gray) in the chest radiograph.
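The definition in Fig. 1 is directly computable; the helper below is a hypothetical illustration, not part of the paper's pipeline:

```python
import math

def cardiothoracic_ratio_2d(cardiac_area, lung_field_area):
    """2-D cardiothoracic ratio [3]: square root of the cardiac area over
    the sum of the cardiac and lung field areas (same units, e.g., pixels)."""
    if cardiac_area < 0 or lung_field_area < 0:
        raise ValueError("areas must be non-negative")
    return math.sqrt(cardiac_area / (cardiac_area + lung_field_area))
```

For example, a cardiac area of 9000 px and a lung field area of 27000 px give a ratio of 0.5.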

I. INTRODUCTION

POSTERIOR–ANTERIOR (PA) chest radiograph is a widely adopted medical imaging modality for the screening and diagnosis of pulmonary diseases, such as lung cancer, tuberculosis, emphysema, and pneumoconiosis, due to its low radiation, low cost, and wide deployment [1], [2]. Lung field segmentation from the chest radiograph, as an initial step of pulmonary disease diagnosis, provides a lung mask for subsequent processing steps. Accurate lung field segmentation not only saves physicians' effort in manually identifying the lung anatomy but also decreases the false positives of computer-aided diagnosis (CAD) in detecting lung cancer [1]. Besides, lung field segmentation also plays an indispensable role in hemodialysis treatment, where the dry weight of a patient is often used as a standard to keep the fluid balance during a dialysis session. A misevaluation of dry weight could lead to a fatal illness, sometimes even death. Nowadays, the 2-D cardiothoracic ratio, which is calculated based on the lung field segmentation of the chest radiograph, is often used to measure the dry weight, as shown in Fig. 1. To precisely compute the 2-D cardiothoracic ratio, accurate lung field segmentation is thus clinically desired.

However, it is still a great challenge to accurately segment the lung field from a chest radiograph, for the following reasons. 1) Variation of lung shapes: Because of individual differences (e.g., gender, age, and physical condition), there are large shape variations across different patients. Additionally, the heart occludes the left lung in different ways, which causes large shape variations of the left lung across patients in 2-D radiographs (Fig. 2). 2) Ambiguity of lung boundary: Due to the overlap with other surrounding organs (e.g., clavicles and rib cages), the intensities of different parts of the lung are not consistent. Besides, the

0278-0062 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.


Fig. 2. Illustration of high shape variability and boundary ambiguity of the lung field. Blue curves denote the manually-delineated lung contours. Red arrows indicate the fuzzy boundary. All shown images are of original resolution, without being scaled.

complicated nearby structures along lung boundary (e.g., heart and trachea) can have similar intensity with lung, which leads to a fuzzy boundary (as indicated by red arrows in Fig. 2). To deal with these difficulties, many lung field segmentation methods have been proposed for PA chest radiograph. In general, they can be divided into three categories. 1) Rule-based segmentation methods: These methods employ pre-defined anatomical rules to segment the lung field. Brown et al. [4] proposed to generate an anatomical model of the lung boundary and match the extracted image edges to this model for the lung field segmentation. Duryea et al. [5] presented a method to detect the outline of diaphragm to delineate the lung field. Armato et al. [6] developed an approach which combines the global and local grey-level threshold analysis to automatically segment the lungs from digitized chest radiograph. 2) Pixel-based classification methods: These methods formulate lung field segmentation as a classification problem and thus learn a classifier to label each pixel as lung or nonlung. Ginneken et al. [7] proposed to use a K-NN classifier with a set of Gaussian derivative filters to classify each image pixel in a multi-resolution manner. McNitt–Gray et al. [8] employed the features selected by stepwise discriminant analysis and also two classifiers, a linear discriminator and a neural network, to classify the lung image. Ginneken et al. [9] integrated the rule-based anatomical knowledge into pixel-wise classification to combine the strengths of both methods. 3) Deformable model based methods: These methods adopt the internal force from object shape and the external force from image appearance to guide the lung segmentation. Dawoud [10] suggested to fuse shape prior, which is extracted from training samples, into an intensity-based iterative thresholding approach for lung field segmentation. Annangi et al. 
[11] integrated the lung edge and costophrenic angle features into the level set formulation to avoid the local minima incurred by the rib cage and clavicle. Sohn [12] located and brightened the nearby trachea to eliminate its influence on segmentation and then employed a Chan–Vese active contour model to segment the lung field. Shi et al. [13] proposed to combine both population-based and patient-specific shape statistics to constrain the deformable contour, achieving a robust and accurate


lung field segmentation result. Xu et al. [1] developed a segmentation algorithm that combines the edge and region forces for both model initialization and deformation. Under the ASM framework, Zhang et al. [14] presented a sparse shape composition (SSC) model to constrain the deformed shape by a sparse representation over the shapes of the training samples. Among these methods, active shape model (ASM) and active appearance model (AAM) [15]–[17] gain more popularity than other methods due to their capability to incorporate both low-level appearance cues and high-level shape prior into a unified framework.

In the conventional ASM [18], the shape model is represented by a set of manually annotated points. To construct a shape space, the shapes of all training images are affine-aligned onto a selected template space, and then PCA [19] is used to represent the shape space by keeping the major shape variations from all aligned shapes. To establish an appearance model, a Gaussian appearance model is built for each point. Specifically, in the training images, each point of the shape model is represented by a profile of pixel-wise appearance information (such as intensities and gradients) extracted along both inward and outward surface normal directions of this point. Then, for each point in the shape model, both the mean and covariance matrix of the profiles are computed from different training images to serve as the Gaussian appearance model for this point.
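The shape-space construction and the Gaussian profile model just described can be sketched as follows. The `var_ratio` cutoff and the plus/minus three standard-deviation clamp are common ASM conventions we assume, not values stated in the paper, and the ridge `eps` is our addition for numerical stability:

```python
import numpy as np

def build_pca_shape_model(shapes, var_ratio=0.98):
    """Build the PCA shape space from affine-aligned training shapes.

    shapes: (N, 2n) array, each row the concatenated coordinates of one
    aligned training shape. Keeps the leading eigenmodes that explain
    `var_ratio` of the total shape variance."""
    mean = shapes.mean(axis=0)
    _, s, vt = np.linalg.svd(shapes - mean, full_matrices=False)
    var = s ** 2
    k = int(np.searchsorted(np.cumsum(var) / var.sum(), var_ratio)) + 1
    eigvals = var[:k] / (len(shapes) - 1)
    return mean, vt[:k].T, eigvals  # mean shape, modes (2n, k), eigenvalues

def project_to_shape_space(shape, mean, modes, eigvals, n_std=3.0):
    """Refine a deformed shape by projecting it onto the PCA space and
    clamping each mode coefficient to +/- n_std standard deviations."""
    b = modes.T @ (shape - mean)
    b = np.clip(b, -n_std * np.sqrt(eigvals), n_std * np.sqrt(eigvals))
    return mean + modes @ b

def profile_cost(profile, mean_profile, cov, eps=1e-6):
    """Gaussian (Mahalanobis) matching cost of a candidate profile against
    one point's appearance model (mean profile and covariance)."""
    d = profile - mean_profile
    return float(d @ np.linalg.inv(cov + eps * np.eye(len(cov))) @ d)
```

During deformation, each model point would move to the candidate position whose profile minimizes `profile_cost`, and the resulting shape would be regularized with `project_to_shape_space`.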
During the ASM-based segmentation, the mean shape is first initialized onto the testing image, and then each point in the shape model is independently deformed along both inward and outward surface normal directions to a position with the minimal cost, as defined below:

$$c_i(x) = \bigl(g(x) - \bar{g}_i\bigr)^{\top} \Sigma_i^{-1} \bigl(g(x) - \bar{g}_i\bigr) \qquad (1)$$

where $v_i$ is the $i$th manually annotated point of the shape model, $x$ is a candidate position along either the inward or outward surface normal direction of point $v_i$, $g(x)$ is the profile computed at position $x$, and $\bar{g}_i$ and $\Sigma_i$ are the mean profile and covariance matrix of point $v_i$ computed from all training images, respectively. Subsequently, the deformed shape is refined by the PCA shape model. By alternating the model deformation and shape refinement, the shape model is gradually driven to the target boundary under the guidance of both appearance and shape models.

Despite some effectiveness exhibited in lung field segmentation, the conventional ASM suffers from three limitations. 1) In the conventional ASM, the mean shape of the training images is employed as the initial shape for the deformable model. Due to the large inter-patient shape variation, the mean shape can be far from the target shape of the patient, which results in inferior model initialization. Because ASM is sensitive to shape initialization, a bad initial shape can easily cause the deformation to get stuck in local minima [Fig. 3(a)]. A recently proposed method, namely marginal space learning (MSL) [20], can provide better initialization by aligning the mean shape with sequentially estimated position, orientation, and scaling parameters. However, it may not always be robust due to the "sequential" nature—a gross estimation error in one step would propagate to the next steps and eventually produce an



Fig. 3. Illustration of the major limitations of the conventional ASM. Image (a) demonstrates the limitation of using the mean shape as initialization. The green curve indicates the segmentation result using the mean shape as initialization, and the blue curve indicates the manual ground-truth shape; the segmentation result from the mean-shape initialization is distracted by the surrounding structures. Image (b) shows the limitations of both the PCA-based shape model and the global SSC-based shape model. Red and green curves indicate the segmentation results by the PCA-based shape model and the global SSC-based shape model, respectively, and the blue curve indicates the manual ground-truth shape. Image (c) shows the lung shape distribution along the two major shape variation modes, corresponding to the two largest eigenvalues; it can be observed that the shape distribution is not Gaussian. Image (d) shows the joint distribution of the first PCA components of the intensity- and gradient-based line features at one model point, where these line features are the intensities and gradients extracted along both inward and outward surface normal directions of the model point. As image (d) demonstrates, this distribution is not Gaussian either.

unreasonable initialization. As reported in [21], MSL tends to suffer from large shape variations. 2) The shape distribution of all training images may not be Gaussian. In conventional ASM, with Gaussian assumption, the training shapes are mathematically modeled by PCA with major shape variation modes. However, the lung shape does not follow Gaussian distribution [Fig. 3(c)], and hence major variation modes cannot appropriately describe various shapes. Therefore, the shape details, if statistically not significant, could be smoothed out (see and compare the red curve in Fig. 3(b) with the blue curve—the manual ground-truth). A recently developed method, sparse shape composition (SSC) [14], proposes a nonparametric shape model to preserve details of the deformable shape. However, due to the inhomogeneous shape variation modes along the lung boundary (i.e., small variations on the apex, and large variations on the base) (Fig. 2), the sparse representation of the global shape tends to fit only the overall shape, but neglects the local details, which are actually very important for accurate lung field segmentation (see and compare the green curve in Fig. 3(b) with the blue curve—the manual ground-truth). 3) The appearance statistics of lung boundaries might not follow the Gaussian distribution, either. In the conventional ASM, the

appearance model of a manually annotated point is assumed to fit a Gaussian distribution. However, this assumption does not hold in many applications [22]–[24]. As an effective solution, machine learning based methods have been adopted recently to overcome this limitation [22]–[24]. The same situation applies to the lung field radiograph. Due to the ambiguous boundary and large inter-patient variations, the lung boundaries do not follow the Gaussian assumption as shown in Fig. 3(d), which potentially limits the performance of deformable segmentation. In this paper, we propose a hierarchical lung field segmentation method with joint shape and appearance sparse learning to accurately segment the lung field. The main contributions of the proposed method are fourfold. 1) A robust shape initialization approach is designed to infer an initial shape based on the anatomically constrained landmark localization and sparse shape representation. Compared with the conventional mean shape initialization, the inferred initial shape is much closer to the true patient-specific shape, thus greatly reducing the chance of being stuck in the local minima. 2) We present a set of local sparse shape composition (local SSC) models to describe the shape in a segment-to-segment manner. To better represent the target shape, the whole lung shape is divided into several seg-



Fig. 4. Flowchart of the proposed method.

ments, each with consistent shape variation. Thus, each segment of a shape can be sparsely composed from the corresponding segments of the selected training samples, which provides more flexibility for shape composition and also preserves the local shape details. 3) Instead of assuming a Gaussian distribution on image appearance, we adopt a nonparametric sparse representation for appearance modeling. To deal with the inhomogeneous appearance along the lung boundary, we split and cluster the lung boundary into several segments of similar appearance characteristics. On each segment, we build a local sparse appearance model to better capture the appearance characteristics for effectively guiding the shape deformation. 4) Finally, a hierarchical segmentation strategy is proposed to integrate the scale-dependent shape and appearance information together for effective segmentation.

The remainder of the paper is organized as follows. In Section II, we introduce our hierarchical lung field segmentation framework, followed by the details of each component of our method. Extensive experiments are carried out in Section III. The paper concludes in Section IV.

II. METHODS

A. Overview

Our method aims to hierarchically segment lung fields in chest radiographs using both shape and appearance sparse learning. As with most learning-based methods, the proposed method consists of a training stage and a testing stage (as shown in Fig. 4). In the training stage, three models (i.e., shape initialization model, local shape model, and local appearance model)

are learned from training samples for use in the testing stage. 1) Shape Initialization Model: To construct the shape initialization model, 14 landmarks (e.g., white circles of "Training Data" in Fig. 4) on the lung boundary are selected. (In this paper, we refer to these 14 selected landmarks as "landmarks," and to the other manually annotated landmarks on the lung boundary as "points.") For each landmark, a specific landmark detector is learned via cascade learning [25]. In addition, the spatial relations across different landmarks are also modeled by regression models to improve the overall consistency of landmark detection. 2) Local Shape Model: To establish the local shape model, the whole lung shape is first split into segments according to the local shape variations and spatial positions. Then, for each shape segment, we learn a local sparse shape dictionary, which contains local shape priors. 3) Local Appearance Model: To build the local appearance model, the lung field boundary is similarly split into several segments according to the appearance cues and spatial distances. For each segment, we learn a discriminative sparse dictionary as a local appearance model, for capturing the local appearance characteristics.

In the testing stage, the learned landmark detectors of the shape initialization model are first adopted to detect the 14 landmarks in a testing lung image. Based on the detected landmarks, an initial shape is inferred by sparse shape representation. This initial shape is then deformed under the guidance of both local appearance models and local shape models in a segment-to-segment manner. The appearance-guided deformation and the shape-statistics-based refinement are performed iteratively until convergence. To ensure the robustness and accuracy of our method, a hierarchical deformable segmentation framework is exploited to drive the deformable model onto the boundary in a coarse-to-fine manner.



Fig. 5. Illustration of the cascade learning framework. $C_k$ represents the classifier at the $k$th cascade level, $P$ denotes the positive training samples, $N_1$ denotes the initial negative training samples, $N_k$ denotes the negative training samples misclassified by the previous cascade levels, and $\hat{N}_k$ denotes the negative training samples correctly classified by $C_k$.

B. Robust Shape Initialization

As is well known, a deformable model is sensitive to its initialization. For example, an initialization far from the desired boundary could lead to the failure of segmentation. To initialize the lung shape model, our method consists of three steps: 1) 14 landmarks are detected using a learning-based landmark detection procedure; 2) spatial regression models are learned to correct possibly wrongly-detected or missed landmarks; 3) sparse shape composition is used to infer an initial lung shape model based on the finally located landmarks.

1) Learning-Based Landmark Detection: Motivated by Viola's face detection [25], we adopt a learning-based approach for landmark detection. The idea is to formulate landmark detection as a landmark versus nonlandmark classification problem. For each landmark, pixels within a small neighborhood of the landmark are labeled as positives and all others are treated as negatives. The extended Haar features [26] are used as image features. Because the positive and negative training samples are highly imbalanced, it is impossible to train an Adaboost classifier with all negative training samples; therefore, we employ a cascade-learning framework [25], [27], as shown in Fig. 5. In the training stage, a sequence of Adaboost classifiers is trained to gradually eliminate the negatives from the positives. Specifically, in cascade level $k$, we use the previously trained classifiers to test which negatives are misclassified by the previous cascade levels. Those misclassified negatives are used as negative training samples, together with all original positives, to train an additional classifier in the current level. This process continues until the ratio of misclassified negatives to positives is less than 0.5. In the testing stage, the learned cascade classifiers are used to classify each pixel of the new lung image.
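The negative-mining loop of this cascade can be sketched as follows. The paper trains Adaboost classifiers on extended Haar features; for a self-contained illustration we substitute a trivial nearest-centroid stand-in, so only the cascade logic itself reflects the method:

```python
import numpy as np

class CentroidClassifier:
    """Stand-in weak classifier (nearest class centroid); a real system
    would use Adaboost over extended Haar features, as in the paper."""
    def fit(self, X, y):
        self.c0 = X[y == 0].mean(axis=0)
        self.c1 = X[y == 1].mean(axis=0)
        return self
    def predict(self, X):
        d0 = ((X - self.c0) ** 2).sum(axis=1)
        d1 = ((X - self.c1) ** 2).sum(axis=1)
        return (d1 < d0).astype(int)

def train_cascade(pos, neg, max_levels=10, stop_ratio=0.5):
    """Cascade learning: each level is trained on all positives plus the
    negatives still misclassified (accepted) by the previous levels;
    training stops once the surviving negatives fall below
    stop_ratio * #positives."""
    levels, hard_neg = [], neg
    for _ in range(max_levels):
        if len(hard_neg) < stop_ratio * len(pos):
            break
        X = np.vstack([pos, hard_neg])
        y = np.concatenate([np.ones(len(pos), int), np.zeros(len(hard_neg), int)])
        clf = CentroidClassifier().fit(X, y)
        levels.append(clf)
        # keep only the negatives this level still classifies as positive
        hard_neg = hard_neg[clf.predict(hard_neg) == 1]
    return levels

def cascade_predict(levels, X):
    """A sample is labeled positive only if every cascade level accepts it."""
    keep = np.ones(len(X), bool)
    for clf in levels:
        keep &= clf.predict(X) == 1
    return keep.astype(int)
```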
The pixel with the highest final classification score is regarded as the detected landmark. To increase the robustness and efficiency of our landmark detection, we adopt a multi-scale detection scheme. Specifically, in the training stage, we train one sequence of classifiers (cascade classifiers) for each scale. In total, we use three scales; thus, we have three sequences of classifiers for each landmark. In the testing stage, we first detect the landmark at the coarsest scale, and this detection is used as the initialization for the next finer levels. At the finer levels, landmark detection only needs to search over a local neighborhood around the initialization from the previous level. Therefore, the efficiency of landmark detection is largely increased under this multi-scale scheme. Fig. 5 illustrates the idea of cascade learning.
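The multi-scale search can be sketched as below; `score_fn`, the scale set, and the search radius are illustrative assumptions rather than values from the paper:

```python
def multiscale_detect(score_fn, shape, scales=(4, 2, 1), radius=8):
    """Coarse-to-fine landmark search: scan the whole image at the coarsest
    scale, then refine within a local window at each finer scale.
    score_fn(x, y, scale) returns the cascade classification score."""
    h, w = shape
    # exhaustive scan at the coarsest scale
    s0 = scales[0]
    best = max(((x, y) for y in range(0, h, s0) for x in range(0, w, s0)),
               key=lambda p: score_fn(p[0], p[1], s0))
    for s in scales[1:]:
        x0, y0 = best
        cand = [(x, y)
                for y in range(max(0, y0 - radius), min(h, y0 + radius + 1), s)
                for x in range(max(0, x0 - radius), min(w, x0 + radius + 1), s)]
        best = max(cand, key=lambda p: score_fn(p[0], p[1], s))
    return best
```

Only the coarsest level scans every grid position; each finer level evaluates a small window, which is where the efficiency gain comes from.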

Fig. 6. Illustration of landmark detection and outlier correction. Green crosses are the ground-truth landmarks manually annotated by expert. Red points are the detected landmarks by appearance features. Blue circles are the positions of landmarks after outlier correction. In image (a), the landmark in the yellow box is missed. After the correction of lung landmarks, the landmark in yellow box is recovered [as indicated by blue circle in image (b)].
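The outlier correction illustrated in Fig. 6 predicts each landmark from subsets of the others, as detailed next. A minimal sketch under our own conventions (an appended bias term and a small covariance ridge, both assumptions) might look like:

```python
import numpy as np

def fit_landmark_regressor(X_target, Y_subset):
    """Least-squares fit of a linear predictor for one landmark's (x, y)
    from the concatenated coordinates of a 3-landmark subset.

    X_target: (N, 2) target landmark over N training images.
    Y_subset: (N, 6) concatenated subset coordinates (a homogeneous term
    is appended so the model can carry a translation)."""
    Y = np.hstack([Y_subset, np.ones((len(Y_subset), 1))])
    M, *_ = np.linalg.lstsq(Y, X_target, rcond=None)
    resid = X_target - Y @ M
    cov = np.cov(resid.T) + 1e-6 * np.eye(2)  # training-error covariance
    return M, cov

def regression_confidence(x_det, y_subset, M, cov):
    """Confidence of a detected landmark under one regression model,
    from the conditional-normal density of the prediction residual."""
    y = np.append(y_subset, 1.0)
    d = x_det - y @ M
    return float(np.exp(-0.5 * d @ np.linalg.inv(cov) @ d))
```

A detected landmark whose maximum confidence over all subset models falls below a threshold would be flagged as an outlier and re-predicted.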

In order to infer a good initial shape from landmarks, we manually select 14 landmarks on the lung boundary for detection (as shown by green crosses in Fig. 6) and train an independent detector for each landmark. These 14 landmarks are chosen based on two criteria (although other advanced methods could also be used): 1) the image appearance of each landmark is distinctive within its neighborhood; 2) the combination of these landmarks is sufficient to represent the rough shape of the underlying lung field.

2) Spatial Regression Models for Outlier Correction: Due to the large inter-patient appearance variance (e.g., heart occlusion) and also the fuzzy lung boundary, the independently detected landmarks in the previous step might be wrongly detected or even missed. Therefore, we exploit the spatial correlations among these landmarks to handle the potential wrong detections. Suppose $\mathcal{L}$ is the entire landmark set. To verify whether landmark $l$ is correctly detected, we use subsets $s$ of the other landmarks (each with cardinality 3) to determine its correctness. In the training stage, for each landmark $l$, we train a set of linear regression models $\{M_{l,s}\}$, one for every landmark subset $s$ of the other landmarks. Each linear regression model is trained to predict the spatial location of landmark $l$ from three other landmarks, by minimizing the following least-squares problem:

$$\min_{M_{l,s}} \sum_{j=1}^{N} \bigl\| x_l^j - M_{l,s}\, y_s^j \bigr\|_2^2 \qquad (2)$$

where $M_{l,s}$ is the linear regression matrix that predicts landmark $l$ using the landmarks in subset $s$, $x_l^j$ is the coordinate of manually annotated landmark $l$ in the $j$th training image, $y_s^j$ is the concatenating vector of coordinates of the manually annotated landmarks belonging to subset $s$ in the $j$th training image, and $N$ is the number of training images. Along with each linear regression model, we also compute a covariance matrix $\Sigma_{l,s}$ across training images to measure the training error of linear prediction.

In the testing stage, the set of learned regression models is used to determine whether a detected landmark is an outlier. Specifically, we evaluate the confidence of detected landmark $l$ under each linear regression model as follows:

$$c_{l,s} = \exp\Bigl(-\tfrac{1}{2}\bigl(\tilde{x}_l - M_{l,s}\,\tilde{y}_s\bigr)^{\top} \Sigma_{l,s}^{-1} \bigl(\tilde{x}_l - M_{l,s}\,\tilde{y}_s\bigr)\Bigr) \qquad (3)$$

where $\tilde{x}_l$ is the coordinate of detected landmark $l$, and $\tilde{y}_s$ is the concatenating vector of coordinates of the detected landmarks in subset $s$. The final confidence of detected landmark $l$ is determined by the maximum confidence over all linear regression models, i.e., $c_l = \max_s c_{l,s}$. The reason for using the maximum confidence as the final confidence is that there might be wrongly detected landmarks (outliers) involved in the prediction. In such a case, a correctly detected landmark may have low confidence in linear regression models contaminated by outliers. However, as long as there is at least one subset of correct landmarks, the maximum operator will ensure that a correctly detected landmark is not labeled as an outlier. Besides, there is a chance for several outliers to happen to construct a reasonable spatial configuration. However, this situation is quite rare in practice, according to our empirical experience. Since the confidence value is computed using a conditional normal distribution, it represents the possibility of the detected landmark being correctly located. We choose a confidence threshold of 0.05. If the final confidence of a detected landmark is below 0.05, which indicates that an unlikely event has happened, we regard this detected landmark as an outlier. If a detected landmark is regarded as an outlier, we use the corresponding learned regression models to re-predict its location for outlier correction. Specifically, the confidence of any pixel in the image to be selected as the correct location of landmark $l$ is determined by the average of the confidences from all associated linear regression models. The pixel with the highest confidence is determined as the predicted location of landmark $l$.

3) Initial Shape Inference by Sparse Shape Composition: Up to this stage, robust landmark localization has been achieved according to the appearance and spatial correlations of landmarks. Based on these detected landmarks, an initial shape model can be inferred by sparse shape composition [14]. Specifically, the finally detected landmarks of a new lung image can be sparsely represented by their counterparts in the training images, as formulated below:

$$\hat{\alpha} = \arg\min_{\alpha} \bigl\| T\tilde{x} - D_L\,\alpha \bigr\|_2^2 + \lambda \|\alpha\|_1 \qquad (4)$$

where $\tilde{x}$ denotes the concatenation vector of the coordinates of the 14 detected landmarks (after outlier correction) in the new lung image, $T$ is an affine transformation matrix estimated from the detected landmark positions to the mean landmark shape, $D_L$ is the dictionary whose column vectors (one per training image) each represent the coordinates of the corresponding 14 landmarks in a training lung image, $\alpha$ is the sparse coding vector, $\lambda$ is a coefficient controlling the degree of sparsity, and $\|\cdot\|_1$ denotes the $\ell_1$-norm. Based on the estimated sparse code $\hat{\alpha}$, the initial global shape for the new lung image can be inferred as the linear combination of the corresponding training shape instances:

$$S_0 = T^{-1} D\,\hat{\alpha} \qquad (5)$$

where $S_0$ is the estimated initial global shape for the new lung image, and $D$ is a (global) shape dictionary with more explanations given in Section II-C1.

C. Sparse Learning for Shape Modeling

Shape priors are important to deformable segmentation, especially when low-level boundary information is weak and misleading. To capture the large lung shape variation, we propose a local SSC for the lung shape modeling. In this section, we first briefly analyze the limitation of the conventional SSC, and then introduce our local SSC.

1) Limitation of the Conventional SSC: Recently, SSC [14] has become popular by using the sparse representation in (6) of existing training shapes to globally constrain a new shape:

$$\hat{\alpha} = \arg\min_{\alpha} \bigl\| T S - D\,\alpha \bigr\|_2^2 + \lambda \|\alpha\|_1 \qquad (6)$$

where $S$ is a global shape to be represented, $T$ is the affine transformation matrix estimated from $S$ to the mean shape $\bar{S}$, $D$ denotes a dictionary containing global shapes [as also used in (5)], $\alpha$ is the sparse code of the target shape to be estimated, $\lambda$ is again a coefficient for controlling the sparsity of $\alpha$, and $\|\cdot\|_1$ is the $\ell_1$-norm ensuring that only a few dictionary elements are selected in the representation. Here, both $S$ and the elements of dictionary $D$ are global lung shapes. This representation successfully overcomes the parametric Gaussian assumption of the traditional PCA shape modeling, and has shown better capability to preserve shape details. However, the conventional SSC reconstructs the shape by selecting a few globally similar training shapes. In the lung field segmentation, the lung shape variation is inhomogeneous (i.e., the shape variation on the base of the lung is much larger than that on the apex, especially for the left lung, as shown in Fig. 2). Therefore, the conventional SSC, if applied to the whole lung shape, tends to focus on the shape parts with larger variation modes and thus ignore the parts with smaller but important variation modes (Fig. 7).
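The sparse compositions in (4) and (6) are l1-regularized least-squares problems. The paper does not specify a solver; a minimal ISTA sketch in NumPy (iteration count and `lam` are illustrative, and the affine alignment to the mean shape is omitted for brevity):

```python
import numpy as np

def sparse_code(x, D, lam=0.05, iters=500):
    """ISTA solver for min_a ||x - D a||_2^2 + lam * ||a||_1, the sparse
    representation used by SSC (columns of D are aligned training shapes)."""
    L = np.linalg.norm(D, 2) ** 2  # Lipschitz constant of the smooth part
    a = np.zeros(D.shape[1])
    for _ in range(iters):
        a = a - D.T @ (D @ a - x) / L          # gradient step
        a = np.sign(a) * np.maximum(np.abs(a) - lam / (2 * L), 0.0)  # soft-threshold
    return a

def ssc_refine(x, D, lam=0.05):
    """Constrain a deformed shape x by re-composing it from a few
    dictionary shapes selected by the sparse code."""
    a = sparse_code(x, D, lam)
    return D @ a, a
```

Because of the l1 penalty, only a handful of dictionary columns receive nonzero coefficients, which is what lets the composition pick a few similar training shapes.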


Fig. 7. Limitation of the conventional global sparse shape composition. Cyan, green, and blue points represent the deformed shape, the shape refined by the conventional global SSC, and the manual ground-truth, respectively. As shown in the red rectangle areas, the cyan deformed shape (left image) has been driven onto the boundary, while the global SSC refinement (green, right image) pulls it away from the boundary.

To overcome the limitation of the conventional SSC, we propose a local SSC to model the lung shape. The key idea is to decompose the shape into segments, each with homogeneous variations, and build a local shape dictionary on each of them separately. As each segment has more consistent shape variations, the local shape dictionaries can better capture the local shape statistics and hence better constrain the deformable model in the testing stage.

2) Division of Variation-Homogeneous Shape Segments: In order to better capture local shape priors, the lung shape should be divided into segments that have consistent local shape variations. Specifically, let $\{S^j\}_{j=1}^{N}$ denote the $N$ training shapes. Each training shape consists of $n$ manually annotated points, i.e., $S^j = \{p_i^j\}_{i=1}^{n}$, where $p_i^j$ denotes the column vector of the 2-D coordinates of point $i$ in the training shape $S^j$, and $n$ is the total number of points. We employ affinity propagation [28] to cluster the points of all training shapes, using the point similarity defined in (7) and (8) in Algorithm 1. It is worth noting that the point similarity is defined with both the consistency of position variations and the spatial distances between points. In this way, we can ensure that the model will be decomposed into spatially continuous segments (e.g., $K$ segments), each with consistent shape variation. The deformable shape is then decomposed into local shape


Fig. 8. Illustration of overlapping shape segments. Red, green, blue, and cyan points outline the shape segments generated by Algorithm 1. Dark blue curves indicate the overlapping shape segments. For simplicity, this figure takes four shape segments as an example.

segments, i.e., $v = [s_1; s_2; \ldots; s_M]$. Each shape segment $s_m$ consists of $K_m$ points, i.e., $s_m = [p_{m,1}; \ldots; p_{m,K_m}]$. The details are shown in Algorithm 1. Since the shape segments generated by Algorithm 1 are nonoverlapping and independently constrained by different local shape dictionaries (as indicated by the red, green, blue, and cyan points in Fig. 8), the direct combination of refined local shape segments is likely to generate unsmooth boundaries between nearby shape segments. To alleviate this problem, we make adjacent shape segments overlap with each other by extending each clustered shape segment to include several adjacent points (as indicated by the dark blue curves in Fig. 8). 3) Sparse Shape Composition on Local Shape Segments: Once the overlapping shape segments are generated, a local shape dictionary is learned on each segment. Specifically, in the training stage, we extract the same segment from all training shapes, followed by an alignment using Procrustes analysis [29], [30]. Then, the aligned shape segments, i.e., $\{s_{i,m}\}_{i=1}^{N}$, are column-wisely stacked to form a local shape dictionary $D_m$, i.e., $D_m = [s_{1,m}, s_{2,m}, \ldots, s_{N,m}]$. In the testing stage, given an intermediate deformed shape, we use the learned local shape models to refine it. Specifically, the shape refinement is performed in a segment-to-segment fashion. For segment $m$, the sparse representation (6) is employed to refine the deformed segment with a corresponding reconstructed segment from the local shape dictionary $D_m$. Then, all independently refined shape segments are combined into an entire


shape. For points belonging to multiple shape segments, their positions are determined by averaging over all overlapping shape segments. It should be noted that the number of shape segments $M$ in Algorithm 1 is not fixed and can be adaptively changed. If $M = 1$, the local SSC reduces to a special case, i.e., the conventional global SSC. By gradually increasing $M$, we model the shape in a hierarchical way, which facilitates a coarse-to-fine deformation strategy (see Section II-E for details).

D. Local Appearance Model With Sparse Representation

Similar to the local shape models, we also build appearance models on local lung boundary segments. In the training stage, we first use a clustering method to divide the lung boundary into appearance-consistent segments. Since each segment has relatively homogeneous appearance, it is easier to model its appearance characteristics. Specifically, a discriminative dictionary learning method is adopted to learn a local appearance model on each segment. Also, we employ a soft sparse representation classification method to address the ambiguous boundary. In the testing stage, the learned appearance models are used to effectively drive the deformable model to the lung boundary. It should be noted that the dictionary learning for the shape model and the appearance model is essentially different. The shape model aims to represent a new shape instance with samples of the same class (i.e., a one-class reconstruction problem), and hence needs a representative dictionary. In contrast, the appearance model aims to distinguish one class from the other (i.e., a two-class classification problem), and hence needs a discriminative dictionary. The discriminative dictionary learning will be introduced in Section II-D2, but the details of the local appearance model are first presented in the following Section II-D1.
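The segment recombination of Section II-C3, where a point shared by overlapping segments takes the average of its refined positions, can be sketched as follows (the index lists and coordinates are illustrative, not taken from the paper):

```python
import numpy as np

def combine_segments(n_points, segments):
    """Average the refined positions of points that belong to several
    overlapping segments. `segments` is a list of (point_indices, coords)
    pairs, with coords of shape (len(point_indices), 2)."""
    acc = np.zeros((n_points, 2))
    cnt = np.zeros(n_points)
    for idx, coords in segments:
        acc[idx] += coords
        cnt[idx] += 1
    return acc / cnt[:, None]   # each point: mean over all segments containing it

# Two overlapping segments on a 4-point shape; point 2 is shared.
seg_a = (np.array([0, 1, 2]), np.array([[0., 0.], [1., 0.], [2., 0.]]))
seg_b = (np.array([2, 3]),    np.array([[2., 2.], [3., 2.]]))
shape = combine_segments(4, [seg_a, seg_b])   # point 2 -> (2.0, 1.0)
```

Averaging the shared points is what smooths the seams that independently refined, nonoverlapping segments would otherwise leave.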
1) Division of Appearance-Consistent Boundary Segments: To divide the boundary into several spatially continuous segments with appearance consistency, we use affinity propagation to cluster points along the lung boundary (Algorithm 2). Specifically, histogram of oriented gradients (HOG) [31], extended


Haar wavelets (Haar) [26] and local binary pattern (LBP) [32] features are adopted to capture the local appearance characteristics around each point. The similarity between points is defined by both appearance characteristics and spatial distance (as in (9)–(10) of Algorithm 2). In this way, we can cluster the lung boundary into $Q$ spatially continuous segments, each of which has consistent appearance characteristics. By building a different local appearance model on each appearance-consistent segment, the appearance inhomogeneity problem is largely relieved. Hence, the learned local appearance models can better capture the local appearance characteristics than a global appearance model. It should be noted that the segments in the appearance model differ from those of the shape model (Section II-C) due to the different clustering criteria. This actually makes the proposed method more adaptive to the variations of shape and appearance. 2) Discriminative Dictionary Learning: To distinguish lung tissues from nonlung tissues near the lung boundary, discriminative dictionaries are learned to model the appearance characteristics. On each segment, we learn two dictionaries to model the appearances of lung and nonlung tissues, respectively. For better discrimination, the lung dictionary $D_l$ should represent pixels inside the lung better than those outside the lung. Similarly, the nonlung dictionary $D_n$ should have better representation power for pixels outside the lung than those inside the lung. To accomplish this, we learn the dictionaries in a supervised way. Specifically, we first sample pixels near the segment on each training image as training samples. These training samples are represented by HOG, Haar, and LBP features extracted from the patch centered at each sampled pixel. Then, we adopt the Fisher Separation Criterion (FSC) [33] to rank the features according to their discriminative ability.
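A two-class Fisher-type ranking can be sketched as below; the exact score used by FSC [33] may differ in normalization, so the formula here (squared mean difference over summed variances) is an assumption:

```python
import numpy as np

def fsc_scores(X_lung, X_nonlung):
    """Fisher-type score per feature: (mu1 - mu2)^2 / (var1 + var2).
    Higher means more discriminative between the two classes."""
    m1, m2 = X_lung.mean(0), X_nonlung.mean(0)
    v1, v2 = X_lung.var(0), X_nonlung.var(0)
    return (m1 - m2) ** 2 / (v1 + v2 + 1e-12)

# Toy data: feature 0 separates the classes, feature 1 is pure noise.
rng = np.random.default_rng(1)
lung = np.c_[rng.normal(5.0, 1.0, 200), rng.normal(0.0, 1.0, 200)]
nonlung = np.c_[rng.normal(0.0, 1.0, 200), rng.normal(0.0, 1.0, 200)]
scores = fsc_scores(lung, nonlung)
top = np.argsort(scores)[::-1]       # ranked feature indices, best first
```

Ranking by such a score and keeping the top entries is what produces the compact discriminative feature representation used below.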
Based on the ranking, we select the top $f$ features as the discriminative feature representation of each sample. Finally, to reduce the redundancy of the dictionary, we employ K-means dictionary learning [34] to learn a compact discriminative dictionary for each class based on the selected top discriminative


features. Compared with other advanced dictionary learning methods (e.g., K-SVD [35]), K-means dictionary learning is more capable of preserving the distances between training samples of different classes [34].

3) Soft Sparse Representation Based Classification: Once the lung and nonlung dictionaries $D_l$ and $D_n$ are learned, in order to obtain the likelihood of a pixel belonging to the lung, we extend the conventional sparse representation based classification (SRC) [36] to a soft version. For each testing pixel, we extract the same Haar, HOG, and LBP features in its neighborhood as the patch-based feature representation of this pixel. In the conventional SRC, the classification problem is formulated as a sparse representation problem:

\hat{\alpha} = \arg\min_{\alpha} \| y - D \alpha \|_2^2 + \lambda \| \alpha \|_1    (11)

where $D = [D_l, D_n]$ indicates the combination of the lung and nonlung appearance dictionaries, $y$ is the discriminative patch-based feature representation of a testing pixel after feature selection, $\alpha$ is the sparse code of the testing pixel, and $\lambda$ is the coefficient that controls sparsity. By solving the sparse representation, we obtain the sparse coefficients $\hat{\alpha} = [\hat{\alpha}_l; \hat{\alpha}_n]$, where $\hat{\alpha}_l$ and $\hat{\alpha}_n$ are the sparse coefficients for the lung and nonlung dictionaries, respectively. These two vectors of sparse coefficients can be used to reconstruct the testing pixel and to obtain the representation residues with respect to both classes:

r_l = \| y - D_l \hat{\alpha}_l \|_2, \quad r_n = \| y - D_n \hat{\alpha}_n \|_2    (12)

In the conventional SRC, the class with the lower representation residue (i.e., the smaller $\ell_2$-norm in (12)) is adopted as the class label of the testing pixel. However, in ambiguous areas (e.g., the bottom boundary of the left lung), such hard classification may produce large errors. To alleviate this problem, we adopt a logistic regression model [37] to predict the lung likelihood based on the combined residue features (i.e., $r = [r_l, r_n]$). The logistic regression model is learned by minimizing the following energy function:

\min_{w, b} \sum_{i=1}^{P} \log \left( 1 + \exp \left( -l_i \left( w^{\top} r(z_i) + b \right) \right) \right) + \eta \| w \|_2^2    (13)

where $w$ and $b$ are the linear coefficients of the logistic regression model, $r(z_i)$ is the combined residue of the $i$-th training pixel (where $z_i$ represents the coordinates of that pixel), $l_i$ is the label of the pixel (with $l_i = -1$ denoting a nonlung pixel and $l_i = +1$ denoting a lung pixel), $P$ is the total number of training pixels, $\eta$ is the regularization weight,


and $\| w \|_2^2$ is the regularization term that avoids overfitting of the regression. After learning the optimal parameters $\hat{w}$ and $\hat{b}$, we can use the logistic function to predict the likelihood of a testing pixel at $z$ belonging to the lung:

P(z) = \frac{1}{1 + \exp \left( - ( \hat{w}^{\top} r(z) + \hat{b} ) \right)}    (14)

where $r(z)$ is the combined residue of the testing pixel at $z$.

4) Local Appearance-Guided Deformation: Given a testing image and an initial lung shape, the local appearance models constructed in the training stage are used to guide the shape deformation. Specifically, to deform a point to the lung boundary, we search a local range along both the inward and outward normal directions of the point. The position whose likelihood profile pattern is most similar to the desired boundary profile pattern is chosen as the deformed position of this point. Mathematically, for point $j$ we search for the offset $\hat{d}_j$ that minimizes

\hat{d}_j = \arg\min_{d \in [-R, R]} \| L(p_j + d n_j) - B \|_2^2    (15)

where $p_j$ is the 2-D coordinate of point $j$, $n_j$ is the outward normal direction at point $j$, $d$ is the position offset to be optimized along either the inward or outward normal direction, $R$ is the search range, $L(\cdot)$ is the likelihood profile vector centered at the given position along both the inward and outward normal directions, whose entries are the lung likelihoods estimated by the soft SRC (14), $h$ is the half length of the likelihood profile, and $B$ is the desired boundary profile pattern with its first $h$ entries set to 1 and its last $h$ entries set to 0. Each point in the shape model is deformed independently. After the deformation of all points, the deformed shape is refined by the proposed local sparse shape models (Section II-C).

E. Hierarchical Deformable Segmentation Framework

To segment the lung field robustly and accurately, we propose a hierarchical deformable segmentation framework that integrates scale-dependent shape and appearance information into different segmentation stages. In the hierarchical framework, the segmentation is first carried out at the coarsest scale. Then, the segmentation result is used as the initialization for the next finer scale.
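Looking back at (11)–(14), the residue-to-likelihood pipeline can be sketched as follows; the dictionaries, sparse codes, and logistic weights here are toy stand-ins, and the sparse-coding step itself is omitted:

```python
import numpy as np

def class_residues(y, D_l, a_l, D_n, a_n):
    """Representation residues (12) w.r.t. the lung and nonlung dictionaries."""
    return np.linalg.norm(y - D_l @ a_l), np.linalg.norm(y - D_n @ a_n)

def lung_likelihood(r, w, b):
    """Logistic likelihood (14) from the combined residue vector r."""
    return 1.0 / (1.0 + np.exp(-(w @ r + b)))

# Toy setup: the pixel is well represented by the lung dictionary,
# so r_l is small, r_n large, and the likelihood should be high.
y = np.array([1.0, 0.0, 0.0])
D_l = np.eye(3); a_l = np.array([1.0, 0.0, 0.0])     # perfect lung reconstruction
D_n = np.eye(3); a_n = np.array([0.0, 1.0, 0.0])     # poor nonlung reconstruction
r = np.array(class_residues(y, D_l, a_l, D_n, a_n))
w = np.array([-4.0, 4.0])   # hypothetical learned weights: low r_l, high r_n -> lung
p = lung_likelihood(r, w, b=-2.0)
```

Unlike the hard SRC decision, the logistic output is a graded likelihood, which is what the profile matching in (15) consumes.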
The major principle behind the hierarchical framework is that the coarse-scale segmentation concentrates on robustness, while the fine-scale segmentation focuses on accuracy. To achieve this goal, we adaptively adjust the following three parameters at different deformation stages. • The patch size for appearance feature computation (Section II-D2) gradually decreases. The larger patch size in early stages ensures that rich contextual appearance information is captured, thus providing robustness in




III. EXPERIMENTS

Fig. 9. Basic Haar-like filters used in this paper. Red and blue squares indicate the 2-D rectangle functions with positive and negative polarities, respectively. Squares with dashed border represent the empty areas.

tissue differentiation, while the smaller patch size in the fine stages focuses on the local details of structures, thus providing accuracy in tissue differentiation. • The deformation search range (Section II-D4) gradually decreases. The longer search range in the early stages allows the model to deform to the lung boundary even if its initial position is far away. The smaller search range in the fine stages precisely adapts the deformable model to the lung boundary, which reduces the chance of being distracted by nearby noisy structures. • The number of shape segments (Section II-C2) gradually increases. Initially, only the global SSC is adopted to limit the refined shape to a small shape space in order to prevent segmentation failure. In the fine stages, as the deformation approaches the target lung boundary, we use more shape segments to better model the local shape details for shape refinement. With this variable number of shape segments, we establish a hierarchical SSC to model the lung shape robustly and accurately. The details of the hierarchical deformable segmentation framework are shown in Algorithm 3.
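The three schedules above can be collected into a per-stage configuration. The sketch below uses the initial values reported in Section III (patch size 27, search range 6, segments 1, 2, 4), while the per-stage decrements themselves are illustrative assumptions:

```python
# Hierarchical schedule: patch size and search range shrink,
# the number of shape segments grows (1 = global SSC).
def make_schedule(n_stages=3, patch0=27, range0=6, segments=(1, 2, 4)):
    stages = []
    for s in range(n_stages):
        stages.append({
            "patch_size": max(patch0 // (2 ** s), 3),   # coarse -> fine appearance
            "search_range": max(range0 - 2 * s, 1),     # long -> short search
            "n_segments": segments[min(s, len(segments) - 1)],
        })
    return stages

schedule = make_schedule()
# stage 0: {'patch_size': 27, 'search_range': 6, 'n_segments': 1}
# stage 2: {'patch_size': 6, 'search_range': 2, 'n_segments': 4}
```

Each stage's deformation then runs with its own configuration, so robustness-oriented settings precede accuracy-oriented ones.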

In this section, two sets of experiments are conducted to evaluate the proposed method on a public dataset from the Japanese Society of Radiological Technology (JSRT) [38]. One set of experiments validates the contribution of each component of our method. The other set compares our method with several state-of-the-art lung field segmentation methods. The JSRT dataset consists of 247 PA chest radiographs, with 154 normal samples and 93 abnormal samples evenly located in two folds. The original radiographs are of 2048 × 2048 pixels, 0.175 mm/pixel, and 12-bit gray levels. In total, 94 boundary points (50 on the left lung and 44 on the right lung) are manually annotated and connected to form the ground truth of each image. Like most of the literature [7], [13], [39], we down-sample the original radiographs to a size of 256 × 256 pixels for our experiments, and thus the pixel size becomes 1.4 mm. To quantitatively evaluate our proposed method, we employ six commonly used metrics. 1) Accuracy (Acc) is the proportion of correctly labeled pixels in the entire testing image:

\mathrm{Acc} = \frac{TP + TN}{TP + FP + TN + FN}    (18)

where TP is the number of correctly labeled lung pixels, FP is the number of falsely labeled lung pixels (i.e., background pixels labeled as lung pixels), TN is the number of correctly labeled background pixels, and FN is the number of



Fig. 10. Segmentation results on three different subsets of images. First, third, and fifth rows show the original images, and the second, fourth, and sixth rows show the segmentation results. Green points indicate the automatic segmentations of the proposed method and the blue points denote the manual ground-truth. Numbers in the second, fourth, and sixth rows are the overlap scores and ACD between automatic segmentations and manual ground-truths, respectively. The unit of ACD is mm.

falsely labeled background pixels (i.e., lung pixels labeled as background pixels). 2) Specificity (Spec) is the proportion of correctly labeled pixels in the background region:

\mathrm{Spec} = \frac{TN}{TN + FP}    (19)

3) Sensitivity (Sens) is the proportion of correctly labeled pixels in the lung fields:

\mathrm{Sens} = \frac{TP}{TP + FN}    (20)
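All of the region-based metrics in (18)–(22) reduce to the four confusion-matrix counts; a compact sketch:

```python
import numpy as np

def confusion_metrics(seg, gt):
    """Acc, Spec, Sens, overlap (Jaccard) and DSC from boolean lung masks."""
    seg, gt = seg.astype(bool), gt.astype(bool)
    tp = np.sum(seg & gt)    # correctly labeled lung pixels
    fp = np.sum(seg & ~gt)   # background labeled as lung
    tn = np.sum(~seg & ~gt)  # correctly labeled background pixels
    fn = np.sum(~seg & gt)   # lung labeled as background
    return {
        "Acc": (tp + tn) / (tp + fp + tn + fn),
        "Spec": tn / (tn + fp),
        "Sens": tp / (tp + fn),
        "Overlap": tp / (tp + fp + fn),
        "DSC": 2 * tp / (2 * tp + fp + fn),
    }

# 2x2 toy masks: one matching lung pixel, one false positive.
seg = np.array([[1, 1], [0, 0]])
gt = np.array([[1, 0], [0, 0]])
m = confusion_metrics(seg, gt)   # Acc 0.75, Sens 1.0, Overlap 0.5, DSC 2/3
```

Note that Overlap penalizes false positives and false negatives symmetrically, while Acc can look high simply because background dominates the image.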



Fig. 11. Comparison of the segmentation results between mean shape and inferred shape based initializations. Image (a) shows the initial positions of the mean shape (red curve) and inferred shape (green curve), respectively. Manual ground-truth is displayed as the blue curve. Image (b) shows segmentation results using the mean shape (red curve) and inferred shape (green curve) as initializations under ASM framework, respectively. Image (c) shows segmentation results using the mean shape (red curve) and inferred shape (green curve) as initializations under the proposed framework, respectively.

TABLE I QUANTITATIVE COMPARISON BETWEEN INITIALIZATIONS WITH THE MEAN SHAPE AND THE INFERRED SHAPE, UNDER ASM AND THE PROPOSED FRAMEWORK, USING THE OVERLAP SCORE. P-VALUES OF TWO-SAMPLE T-TESTS ON OVERLAP SCORES BETWEEN EACH OTHER METHOD AND “INFERRED SHAPE + OUR METHOD” SHOW THAT THE IMPROVEMENT BY “INFERRED SHAPE + OUR METHOD” IS STATISTICALLY SIGNIFICANT AT THE SIGNIFICANCE LEVEL OF 0.05

TABLE II QUANTITATIVE COMPARISON BETWEEN INITIALIZATIONS WITH THE MEAN SHAPE AND THE INFERRED SHAPE, UNDER ASM AND THE PROPOSED FRAMEWORK, USING ACD (UNIT: MM). P-VALUES OF TWO-SAMPLE T-TESTS ON ACD BETWEEN EACH OTHER METHOD AND “INFERRED SHAPE + OUR METHOD” SHOW THAT THE IMPROVEMENT BY “INFERRED SHAPE + OUR METHOD” IS STATISTICALLY SIGNIFICANT AT THE SIGNIFICANCE LEVEL OF 0.05

4) Overlap score is a similarity measure between the segmented lung fields and the manual ground truth:

\mathrm{Overlap} = \frac{TP}{TP + FP + FN}    (21)

5) Dice similarity coefficient (DSC) [40] is another overlap measure between the segmented lung fields and the manual ground truth:

\mathrm{DSC} = \frac{2\,TP}{2\,TP + FP + FN}    (22)

6) Average curve distance (ACD) is the average distance between the boundary of the segmented lung and that of the manual ground truth:

\mathrm{ACD} = \frac{1}{2} \left( \frac{1}{|S|} \sum_{a \in S} d(a, G) + \frac{1}{|G|} \sum_{b \in G} d(b, S) \right)    (23)

where $d(a, G)$ is the minimum distance of pixel $a$ of the segmented lung boundary $S$ to the pixels of the ground-truth boundary $G$, $d(b, S)$ is the minimum distance

of pixel $b$ of the ground-truth boundary $G$ to the pixels of the segmented lung boundary $S$, and $|\cdot|$ denotes the cardinality of a set. In the experiments, we use two-fold cross-validation to evaluate the performance of our method. Specifically, the entire dataset is evenly divided into two sets. In each round of cross-validation, one set is used for training and the other for testing; in total, two rounds are used. The results of the two sets are then averaged to obtain the final results. The degree of sparsity $\lambda$ is determined by four-fold cross-validation within each training set using grid search. Specifically, the search range is set from 10 to 100 with a step size of 5. The other parameters used in our experiments are listed as follows. • The number of landmarks for the left and right lungs is 8 and 6, respectively. • The number of selected features for discriminative dictionary learning is 200. • The number of variation-homogeneous shape segments increases gradually from 1 to 2 to 4. • The number of appearance-consistent boundary segments is set to 10.



Fig. 12. Box plots of the overlap scores for four different methods using single-resolution and multi-resolution strategies. Light blue indicates the results using single-resolution implementation, while dark blue indicates the results using multi-resolution implementation.

• The half length of the likelihood profile $h$ is set to 15 pixels. • The original patch size for feature extraction is 27. • The original search range $R$ is 6. • The number of scales is 3. • The maximum number of iterations in each scale is 20. To efficiently solve the sparse representation problems, we use SPAMS [41], a publicly available sparse modeling tool. The extended Haar features are generated by convolving the image with scaled versions of the basic Haar-like filters shown in Fig. 9. Each basic Haar-like filter consists of one or more 2-D rectangle functions with different polarities:

H(u) = \sum_{j=1}^{m} \rho_j R(u - c_j)    (24)

R(u) = \begin{cases} 1, & u \text{ inside the rectangle} \\ 0, & \text{otherwise} \end{cases}    (25)

where $m$ is the number of 2-D rectangle functions, which is set to 1 or 2 in this paper, $R(\cdot)$ represents a basic 2-D rectangle function, and $\rho_j$ and $c_j$ are the polarity and center of the $j$-th 2-D rectangle function, respectively. In this paper, the coefficients for scaling the basic Haar-like filters are set to 3 and 5, respectively. Besides, the number of histogram bins for the HOG features is set to 9, and the number of HOG cells is set to 1. Also, we use uniform and rotation-invariant LBP features with eight neighbors, and each feature is extracted at three resolution levels.

A. Qualitative Segmentation Results

To qualitatively evaluate the proposed method, we present several segmentation results in Fig. 10. In this figure, the first and second rows show images with large shape variations and their corresponding segmentation results, respectively. The third and fourth rows show images with ambiguous boundaries and their corresponding segmentation results, respectively. The fifth and

sixth rows show normal images and their corresponding segmentation results, respectively. It can be observed from Fig. 10 that the proposed method not only achieves good performance on normal cases (the last two rows), but also obtains reasonable segmentation results for images with large shape variations (the first two rows) and ambiguous boundaries (the middle two rows). These results qualitatively demonstrate the effectiveness of the proposed method. Note that there is one challenging case (the image in the second column of the middle two rows) in our dataset, where the appearance of the lung is ambiguous due to a large gas-filled stomach below the left lung; this leads to the only failure of the proposed method.

B. Initialization With Mean Shape Versus Inferred Shape

In this section, to evaluate the shape initialization component, we compare the conventional initialization with the mean shape of the training images (i.e., the mean shape) and our new initialization with the shape inferred from sparse shape composition in Section II-B (i.e., the inferred shape). Specifically, we employ the mean shape and the inferred shape as the initial shape of each testing image to segment the lung fields under both the ASM and the proposed framework. Note that all methods in this section are implemented at a single resolution. Initialization of the Mean Shape Model: To initialize the mean shape, we adopt an exhaustive search strategy to find an optimal transformation that brings the mean shape onto the testing image. The entire procedure can be divided into two steps: 1) generate candidate transformations, and 2) search for the optimal transformation with the best model matching degree. In the first step, we generate a set of candidate transformations by combining different translations, rotations, and scales. Specifically, to test different translations, we first determine a tight bounding box that covers all lung centers of the training images.
Then, by sampling any point in the bounding box, we can obtain a


candidate translation that can place the mean shape onto the testing image. For each translation, we allow the mean shape to rotate clockwise from −45° to 45° with a step size of 15°. Once both translation and rotation are sampled, the mean shape can also be scaled, i.e., from 0.5 to 1.5 with a step size of 0.25. By combining all these candidate translations, rotations, and scales, we obtain a set of candidate transformations. In the second step, we use the Gaussian appearance models learned in the training stage to evaluate the matching degree of each candidate transformation. Specifically, we align the mean shape onto the testing image using a candidate transformation. Then, for each model point, we compute its matching degree as the probability of its local profile (extracted from the testing image based on the transformed mean shape) under the respective Gaussian appearance model (1). The overall matching degree of the transformed mean shape is defined as the sum of the matching degrees of all model points. Finally, the transformation with the best matching degree is regarded as the optimal transformation and is used to initialize the mean shape on the testing image. Fig. 11 shows the initial shapes and their segmentation results, using the mean shape and the inferred shape as initializations, respectively. It can be observed from Fig. 11(a) that the initial position of the mean shape (red curve) is far away from the manual ground truth, while the initial position of the inferred shape (green curve) is much closer to the manual ground truth (blue curve). After deformable segmentation, we can see from both Fig.
11(b) and (c) that the segmentation results using the mean shape as initialization (red curves) under both the ASM and the proposed framework are inaccurate, since they are easily distracted by nearby structures, while the segmentation results using the inferred shape as initialization (green curves) overlap well with the manual ground truth (blue curves), especially at the bottom half of the left lung, which is usually the most challenging part to segment accurately. To quantitatively analyze the difference between segmentations using the mean shape and the inferred shape as initialization, both the overlap score and the ACD between automatic segmentation and manual ground truth are computed. As shown in Tables I and II, initialization with the inferred shape achieves a higher overlap score and a lower ACD than initialization with the mean shape under both the ASM and the proposed framework, indicating the effectiveness of using the inferred shape for shape initialization.

C. Single-Resolution Versus Multi-Resolution

As mentioned above, deformable models are sensitive to the initialization. Since the mean shape based initialization is usually far from the real lung boundary, the corresponding deformable segmentation tends to be trapped in local minima, thus resulting in low performance, as shown in Tables I and II of Section III-B. In the literature, a multi-resolution strategy is often adopted to alleviate this limitation [42]–[44]. To compare single-resolution and multi-resolution strategies in deformable segmentation, in this section we evaluate the performance of the conventional ASM and our pro-


Fig. 13. Comparison between PCA, global SSC and local SSC shape representations in deformable segmentation. Cyan, green, and blue curves represent the deformed shape, the refined shape, and the manual ground-truth. Images (a) and (b) are the finally deformed and refined shapes using PCA, respectively. Images (c) and (d) are the finally deformed and refined shapes using global SSC, respectively. Images (e) and (f) are the finally deformed and refined shapes using local SSC, respectively.

posed method using single-resolution and multi-resolution implementations, with both the mean shape and our inferred shape as initializations (similar to Section III-B). Specifically, for the single-resolution implementation, each algorithm runs only at the finest resolution level (256 × 256 pixels). For the multi-resolution implementation, each algorithm runs at three resolution levels sequentially. Here, two down-sampled resolution levels are employed, besides the finest resolution level. With these three resolution levels, we build an appearance model for each resolution but a single shape model shared by all resolutions. In the testing stage, each algorithm first uses the mean shape or our inferred shape as the initial shape at the lowest resolution level, and then employs the segmentation result of a lower resolution level as the initial shape of the next higher resolution level. At each resolution level, the ASM or our proposed method is employed to segment the testing image until either the maximum number of iterations is reached or the shape difference between two subsequent iterations is less than a threshold. Note that, except in this section, the proposed method and the ASM are implemented using only a single resolution.



TABLE III QUANTITATIVE COMPARISON BETWEEN PCA, GLOBAL SSC AND LOCAL SSC USING OVERLAP SCORE. P-VALUES OF TWO-SAMPLE T-TEST ON OVERLAP SCORES BETWEEN PCA AND LOCAL SSC, AND BETWEEN GLOBAL SSC AND LOCAL SSC, SHOW THAT THE IMPROVEMENT OF THE LOCAL SSC IS STATISTICALLY SIGNIFICANT AT THE SIGNIFICANCE LEVEL OF 0.05

TABLE IV QUANTITATIVE COMPARISON BETWEEN PCA, GLOBAL SSC AND LOCAL SSC USING ACD. UNIT IS MM. P-VALUES OF TWO-SAMPLE T-TEST ON ACD BETWEEN PCA AND LOCAL SSC, AND BETWEEN GLOBAL SSC AND LOCAL SSC, SHOW THAT THE IMPROVEMENT OF THE LOCAL SSC IS STATISTICALLY SIGNIFICANT AT THE SIGNIFICANCE LEVEL OF 0.05

Fig. 14. Comparison of local appearance model and Gaussian-based appearance model. Images (a) and (b) show the segmentation results of local appearance model and Gaussian-based appearance model, respectively. Blue and green curves denote the manual ground-truth and the segmentation result, respectively.

It is worth noting the difference between the multi-resolution strategy and our hierarchical strategy (Section II-E). In the multi-resolution strategy, each testing image is typically down-sampled into different resolutions for segmentation, with the segmentation result at the coarse resolution helping to initialize the segmentation at the next finer resolution. In our hierarchical strategy, instead of down-sampling the testing image, we adopt different patch sizes, search ranges, and numbers of shape segments for our model to segment the target object. Actually, these two strategies can be combined to achieve better segmentation. That is, within each resolution of the down-sampled testing image, our hierarchical strategy can be applied by using different patch sizes, search ranges, and numbers of shape segments. Fig. 12 shows the segmentation results of mean shape + ASM, inferred shape + ASM, mean shape + our method, and inferred shape + our method using single-resolution and multi-resolution strategies, respectively. It can be observed that the multi-resolution strategy achieves better performance than the single-resolution strategy for both the ASM and our proposed method. Besides, with the multi-resolution implementation, all deformable segmentation methods using the mean shape as

initialization obtain a large performance improvement. This is because the mean shape based initialization tends to result in lower segmentation performance, as discussed at the beginning of this section. The multi-resolution implementation can largely alleviate this limitation of the mean shape based initialization in a coarse-to-fine way, and thus achieve more accurate segmentation results. Compared with the mean shape based initialization, our inferred shape based initialization in the single-resolution implementation already achieves a relatively reasonable segmentation (Fig. 12). Therefore, its performance improvement with the multi-resolution implementation is much smaller. However, as shown in Fig. 12, with either the single-resolution or the multi-resolution strategy, our inferred shape based initialization still achieves better performance (along with less performance variation) than the mean shape based initialization. From Fig. 12, we can also see that, by combining the multi-resolution strategy and the hierarchical strategy, our method's performance can be further increased (see the right-most two columns). However, due to the limited performance improvement and the increased computational time of the multi-resolution strategy, we finally use only the hierarchical



TABLE V QUANTITATIVE COMPARISON BETWEEN GAUSSIAN-BASED APPEARANCE MODEL AND LOCAL APPEARANCE MODEL USING OVERLAP SCORE. P-VALUE OF TWO-SAMPLE T-TEST ON OVERLAP SCORES BETWEEN GAUSSIAN-BASED APPEARANCE MODEL AND LOCAL APPEARANCE MODEL SHOWS THAT THE IMPROVEMENT BY THE LOCAL APPEARANCE MODEL IS STATISTICALLY SIGNIFICANT AT THE SIGNIFICANCE LEVEL OF 0.05

TABLE VI QUANTITATIVE COMPARISON BETWEEN GAUSSIAN-BASED APPEARANCE MODEL AND LOCAL APPEARANCE MODEL USING ACD. UNIT IS MM. P-VALUE OF TWO-SAMPLE T-TEST ON ACD BETWEEN GAUSSIAN-BASED APPEARANCE MODEL AND LOCAL APPEARANCE MODEL SHOWS THAT THE IMPROVEMENT BY THE LOCAL APPEARANCE MODEL IS STATISTICALLY SIGNIFICANT AT THE SIGNIFICANCE LEVEL OF 0.05

Fig. 15. Comparison of hierarchical deformable segmentation framework and single-scale deformable segmentation framework. Blue and green points denote the manual ground-truth and automatic segmentation result, respectively. Image (a) shows the segmentation result by the hierarchical segmentation framework. Images (b) and (c) show the segmentation results by using the coarse-scale deformable segmentation framework and the fine-scale deformable segmentation framework, respectively.

strategy with the single-resolution implementation for our proposed method in the remaining experiments.

D. Comparison of Shape Representation With PCA, Global SSC, and Local SSC

To justify that local SSC outperforms both global SSC and PCA in refining the deformed shape, we perform a comparative experiment in this section. In the experiment, we change the shape representation of the proposed method from local SSC to either global SSC or PCA, with all other components remaining the same. In this way, the differences between local SSC and global SSC or PCA can be effectively evaluated. Specifically, PCA is used to capture 98% of the shape variations in the experiment. The global SSC applies sparse shape composition to the whole lung shape during segmentation, while the local SSC applies sparse shape composition to shape segments, with the number of shape segments gradually increasing in later segmentation stages. The total number of iterations used in the experiment is set to 20, which we observed to be sufficient for convergence of the algorithm.
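For the PCA baseline, retaining enough eigen-modes to cover 98% of the shape variance can be sketched as below; the two-mode toy data is constructed so that exactly two modes are needed (an illustration, not the experimental data):

```python
import numpy as np

def pca_shape_model(V, keep=0.98):
    """Fit a PCA shape model on stacked shape vectors V (n_shapes, dim) and
    keep the smallest number of modes explaining `keep` of the variance."""
    mean = V.mean(axis=0)
    _, s, Vt = np.linalg.svd(V - mean, full_matrices=False)
    ratio = np.cumsum(s ** 2) / np.sum(s ** 2)
    k = int(np.searchsorted(ratio, keep) + 1)
    return mean, Vt[:k]           # mean shape and principal variation modes

# Toy shapes (dim 10) with two orthogonal variation modes of different energy.
t = 2 * np.pi * np.arange(40) / 40
V = 5 * np.outer(np.cos(t), np.eye(10)[0]) + 3 * np.outer(np.sin(t), np.eye(10)[1])
mean, modes = pca_shape_model(V)   # both modes are needed to reach 98%
```

A shape is then constrained by projecting onto `modes`, which is exactly the parametric restriction that the SSC variants avoid.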

A visual comparison of the segmentation results of the same image by PCA, global SSC, and local SSC is given in Fig. 13. The cyan and green curves represent the final deformed and refined shapes, respectively. Fig. 13(b) and (d) shows that the refined shapes (green) by PCA and global SSC cannot fit the local details, i.e., a deep concavity on the right lung boundary as indicated by the red rectangle, although the deformed shapes (cyan) in Fig. 13(a) and (c) have been driven to the concavity by appearance cues. On the other hand, in Fig. 13(f), the refined shape (green) of local SSC fits the concavity well in a segment-by-segment manner and thus leads to more accurate segmentation. To quantitatively compare local SSC with global SSC and PCA, the overlap score and ACD metrics between the automatic segmentation and the manual ground-truth are computed. Table III shows that the overlap score of local SSC is higher than those of global SSC and PCA. Table IV illustrates that the ACD of local SSC is much smaller than those of global SSC and PCA. The higher overlap score of local SSC indicates the higher


TABLE VII QUANTITATIVE COMPARISON BETWEEN COARSE-SCALE, FINE-SCALE, AND HIERARCHICAL DEFORMABLE SEGMENTATIONS USING OVERLAP SCORE. P-VALUES OF TWO-SAMPLE T-TEST ON OVERLAP SCORES BETWEEN COARSE-SCALE OR FINE-SCALE AND HIERARCHICAL DEFORMABLE SEGMENTATION SHOW THAT THE IMPROVEMENT BY THE HIERARCHICAL DEFORMABLE SEGMENTATION IS STATISTICALLY SIGNIFICANT AT THE SIGNIFICANCE LEVEL OF 0.05
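The significance tests cited in these captions are standard two-sample t-tests on the per-image scores. A minimal pooled-variance version is sketched below (our own helper; the p-value is then read from the t distribution with n_a + n_b - 2 degrees of freedom, e.g., via a statistics package).

```python
import math
from statistics import mean, variance

def two_sample_t(a, b):
    """Pooled-variance two-sample t statistic for score lists a and b."""
    na, nb = len(a), len(b)
    # pooled sample variance (statistics.variance uses the n-1 denominator)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(sp2 * (1.0 / na + 1.0 / nb))
```

A large positive t (with a p-value below 0.05) indicates that the first method's scores are significantly higher, which is how the improvements in these tables are judged.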

similarity between the segmented lung field and the manual ground-truth. The lower ACD of local SSC also shows that the segmented lung field boundary is much closer to the manual ground-truth. Therefore, local SSC achieves higher segmentation performance than both global SSC and PCA when employed to refine the deformed shape.

E. Contribution of Local Appearance Model

In this section, to further validate the contribution of our local appearance model, we compare it with the Gaussian-based appearance model of the conventional ASM. Specifically, under our proposed framework, only the local appearance model based on the discriminative features is replaced by the Gaussian-based intensity appearance model of the conventional ASM. Fig. 14 shows the segmentation results of the same image by the local appearance model and the Gaussian-based appearance model, respectively. It can be observed that, under the same shape constraints, the Gaussian-based intensity appearance model drives only a few parts of the deformable shape onto the target boundary, while most parts still remain in the vicinity of the boundary. On the other hand, the local appearance model drives the deformable shape mostly onto the target boundary. The incorrect segmentation of the Gaussian-based intensity appearance model is mainly due to the fact that the local appearance does not actually follow a Gaussian distribution, whereas the strategy used to select the candidate boundary position (during the shape deformation) still depends on the Gaussian-based metric. In addition, some ambiguous parts of the boundary also contribute to the incorrect segmentation, e.g., by distracting the deformable model from the true boundary. In contrast, our local appearance model employs pixel-wise classification in a discriminative feature space in a nonparametric manner, thus effectively driving the deformable shape onto the target boundary.
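The pixel-wise classification just described follows the sparse-representation classification idea of [34], [36]: a pixel's feature vector is coded over class-specific dictionaries (boundary vs. background) and assigned to the class with the smaller reconstruction residual. A greedy orthogonal-matching-pursuit sketch is shown below; the function names and dictionary layout are our own illustration, not the paper's exact classifier.

```python
import numpy as np

def omp_residual(f, D, k=3):
    """Greedily code f with at most k atoms (columns) of D; return the
    norm of the reconstruction residual."""
    idx, r = [], np.asarray(f, dtype=float)
    for _ in range(k):
        # pick the atom most correlated with the current residual
        idx.append(int(np.argmax(np.abs(D.T @ r))))
        A = D[:, idx]
        # re-fit coefficients of all selected atoms by least squares
        coef, *_ = np.linalg.lstsq(A, f, rcond=None)
        r = f - A @ coef
    return float(np.linalg.norm(r))

def classify_pixel(f, D_boundary, D_background, k=2):
    """Label a pixel as boundary (1) if its features are better reconstructed
    by the boundary dictionary than by the background one, else 0."""
    return 1 if omp_residual(f, D_boundary, k) < omp_residual(f, D_background, k) else 0
```

Because the decision is driven by reconstruction residuals rather than a fitted Gaussian, no parametric assumption is placed on the appearance distribution, which is the nonparametric property emphasized above.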
To quantitatively analyze the difference between the local appearance model and the Gaussian-based intensity appearance model, we compute both the overlap score and the ACD between the segmentation result and the manual ground-truth. As Tables V and VI show, the local appearance model achieves an obvious improvement in both overlap score and ACD over the Gaussian-based appearance model. Note that the pixel size in our down-sampled image is 1.4 mm/pixel. As Table VI illustrates, the ACD values of the local appearance model and the Gaussian-based appearance model are 1.669 and 3.727 mm, respectively. Thus, the improvement in boundary localization by the local appearance model (3.727 mm - 1.669 mm = 2.058 mm > 1.4 mm) is more than one pixel on average. It is clear that

the local appearance model outperforms the Gaussian-based appearance model.

F. Importance of Hierarchical Deformable Segmentation Framework

To study the role of the hierarchical deformable segmentation framework, we compare it with the single-scale (coarse-scale or fine-scale) deformable segmentation framework. For a visual evaluation, Fig. 15 shows the segmentation results of the same image with the hierarchical and single-scale deformable segmentation frameworks, respectively. It can be observed that the coarse-scale framework achieves a robust but inaccurate segmentation result, which is similar to the manual ground-truth only in overall shape, not in local shape, as shown in the red rectangle of Fig. 15(b). The fine-scale framework achieves accurate segmentation in most parts, but very poor segmentation in some parts, as shown in the red rectangle of Fig. 15(c). In contrast, the hierarchical framework achieves a segmentation result that is both robust and accurate, and closer to the true boundary than the results of both single-scale frameworks, as shown in the red rectangle of Fig. 15(a). Tables VII and VIII show the quantitative segmentation results of the hierarchical, coarse-scale, and fine-scale segmentation frameworks on all patient images. From these two tables, we can see that the hierarchical deformable segmentation framework achieves better segmentation results than the others, by taking advantage of both the robustness of the coarse scale and the accuracy of the fine scale during the segmentation procedure.

G. Comparison With Other State-of-the-Art Methods

To justify the effectiveness of our proposed method for lung field segmentation, we compare it with state-of-the-art lung field segmentation methods [1], [2], [7], [10]-[14], [39], [45]. The performances of all methods are summarized in Tables IX and X. Since some methods evaluate the segmentation performance on both lungs, while others report only on the left or right lung, we report the results on both lungs in Table IX and the results on the left or right lung in Table X. Besides, since different methods report inconsistent metrics in their publications, for effective comparison we list only the main metrics reported for those methods in Tables IX and X. Also, the best performances


TABLE VIII QUANTITATIVE COMPARISON BETWEEN COARSE-SCALE, FINE-SCALE, AND HIERARCHICAL DEFORMABLE SEGMENTATIONS USING ACD. UNIT IS MM. P-VALUES OF TWO-SAMPLE T-TEST ON ACD BETWEEN COARSE-SCALE OR FINE-SCALE AND HIERARCHICAL DEFORMABLE SEGMENTATION SHOW THAT THE IMPROVEMENT OF THE HIERARCHICAL DEFORMABLE SEGMENTATION IS STATISTICALLY SIGNIFICANT AT THE SIGNIFICANCE LEVEL OF 0.05

TABLE IX QUANTITATIVE COMPARISON OF THE PROPOSED METHOD AND OTHER METHODS ON BOTH LEFT AND RIGHT LUNGS. NA INDICATES THAT THE RESPECTIVE METRIC IS NOT REPORTED IN THE PUBLICATION. UNIT OF ACD IS MM. UNIT OF COMPUTATION TIME IS SECOND. PC REPRESENTS PIXEL CLASSIFICATION

TABLE X QUANTITATIVE COMPARISON OF THE PROPOSED METHOD AND OTHER METHODS ON LEFT OR RIGHT LUNG

of [10] and [2], [45] are listed in Table IX. Note that all methods of [1], [2], [7], [10], [12], [13], [39], [45] are evaluated on the same dataset (JSRT) as our proposed method. It can be observed from Tables IX and X that the proposed method achieves segmentation results competitive with all automatic segmentation methods under comparison. Specifically, among these methods, Zhang et al. [14] achieve a segmentation accuracy on the right lung (0.970) similar to that of our method (0.975); however, on the left lung, their accuracy (0.925) is much lower than ours (0.969). This may be due to the fact that the shape variation of the left lung is larger and its boundary is more ambiguous than those of the right lung, making the segmentation of the left lung more challenging; this is also why segmentation of the left lung is usually less accurate. On the other hand, our proposed method still achieves high DSCs on both right and left lungs, which

reflects the robustness of our proposed method in lung field segmentation. It is worth noting that Ibragimov et al. [45] obtain a segmentation performance comparable to that of the proposed method, with a similar overlap score and a slightly better ACD. Also, the proposed method achieves better performance than most of the methods in van Ginneken et al. [7], and similar performance to the pixel-classification (PC) post-processed and hybrid voting methods in [7]. Compared with the inter-rater agreement reported in [7] and also provided in Table IX, the overlap score and ACD of our proposed method are very close to those of the human raters. This again demonstrates the accuracy of our proposed method for lung field segmentation. Besides, we also list the computation time of different methods in Table IX. Our algorithm is implemented in MATLAB and run on an Intel Xeon E5507 2.27-GHz CPU. For each testing image, the landmark detection and correction take 1.4 s on average, the initial shape inference takes 2.3 s on average, and the deformable segmentation takes 31.5 s on average. Thus, the total processing time for segmenting the lung fields in one radiograph is 35.2 s on average.

IV. CONCLUSION

Segmentation is widely employed in image processing [15], [16], [46]-[54] to locate objects and boundaries in the image. In this paper, we have presented a hierarchical lung field segmentation method with joint shape and appearance sparse learning to overcome the challenges caused by ambiguous boundaries and large shape variations. First, a learning-based landmark detection approach is adopted to detect several salient landmarks on the lung boundary. Based on the detected landmarks, an initial shape that is very close to the target boundary is inferred by SSC. Then, a set of local sparse appearance models is employed to effectively drive the shape deformation. To refine the deformed shape, a local SSC is further proposed to impose the segment-wise shape constraint. Finally, a hierarchical deformable model integrates the local appearance model and local SSC to achieve both robust and accurate segmentation results. Experiments on 247 PA chest radiographs of the JSRT dataset show that our proposed local shape and appearance models outperform the conventional shape and appearance models, respectively. Also, our method performs better than most of the state-of-the-art segmentation methods under comparison.

REFERENCES

[1] T. Xu, M. Mandal, R. Long, I. Cheng, and A. Basu, “An edge-region force guided active shape approach for automatic lung field detection in chest radiographs,” Comput. Med. Imag. Graph., vol. 36, pp. 452–463, 2012.
[2] G. Coppini, M. Miniati, S. Monti, M. Paterni, R. Favilla, and E. M. Ferdeghini, “A computer-aided diagnosis approach for emphysema recognition in chest radiography,” Med. Eng. Phys., vol. 35, pp. 63–73, 2013.
[3] Y. Shi, F. Qi, Z. Xue, K. Ito, H. Matsuo, and D.
Shen, “Segmenting lung fields in serial chest radiographs using both population and patient-specific shape statistics,” in Proc. MICCAI, pp. 83–91.
[4] M. S. Brown, L. S. Wilson, B. D. Doust, R. W. Gill, and C. Sun, “Knowledge-based method for segmentation and analysis of lung boundaries in chest X-ray images,” Comput. Med. Imag. Graph., vol. 22, pp. 463–477, 1998.
[5] J. Duryea and J. M. Boone, “A fully automated algorithm for the segmentation of lung fields on digital chest radiographic images,” Med. Phys., vol. 22, pp. 183–191, 1995.
[6] S. G. Armato, III, M. L. Giger, and H. MacMahon, “Automated lung segmentation in digitized posteroanterior chest radiographs,” Acad. Radiol., vol. 5, pp. 245–255, 1998.
[7] B. van Ginneken, M. B. Stegmann, and M. Loog, “Segmentation of anatomical structures in chest radiographs using supervised methods: A comparative study on a public database,” Med. Image Anal., vol. 10, pp. 19–40, 2006.
[8] M. F. McNitt-Gray, H. Huang, and J. W. Sayre, “Feature selection in the pattern classification problem of digital chest radiograph segmentation,” IEEE Trans. Med. Imag., vol. 14, no. 3, pp. 537–547, Sep. 1995.
[9] B. van Ginneken and B. M. ter Haar Romeny, “Automatic segmentation of lung fields in chest radiographs,” Med. Phys., vol. 27, pp. 2445–2455, 2000.
[10] A. Dawoud, “Fusing shape information in lung segmentation in chest radiographs,” in Image Analysis and Recognition. New York: Springer, 2010, pp. 70–78.


[11] P. Annangi, S. Thiruvenkadam, A. Raja, H. Xu, X. Sun, and L. Mao, “A region based active contour method for X-ray lung segmentation using prior shape and low level features,” in Proc. IEEE Int. Symp. Biomed. Imag.: From Nano to Macro, 2010, pp. 892–895.
[12] K. Sohn, “Segmentation of lung fields using Chan-Vese active contour model in chest radiographs,” Proc. SPIE, 2011, p. 796332.
[13] Y. Shi, F. Qi, Z. Xue, L. Chen, K. Ito, H. Matsuo, and D. Shen, “Segmenting lung fields in serial chest radiographs using both population-based and patient-specific shape statistics,” IEEE Trans. Med. Imag., vol. 27, no. 4, pp. 481–494, Apr. 2008.
[14] S. Zhang, Y. Zhan, M. Dewan, J. Huang, D. N. Metaxas, and X. S. Zhou, “Towards robust and effective shape modeling: Sparse shape composition,” Med. Image Anal., vol. 16, pp. 265–277, 2012.
[15] T. F. Cootes, C. J. Taylor, D. H. Cooper, and J. Graham, “Active shape models-their training and application,” Comput. Vis. Image Understand., vol. 61, pp. 38–59, 1995.
[16] T. F. Cootes, G. J. Edwards, and C. J. Taylor, “Active appearance models,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 6, pp. 681–685, Jun. 2001.
[17] M. B. Stegmann, B. K. Ersboll, and R. Larsen, “FAME: A flexible appearance modeling environment,” IEEE Trans. Med. Imag., vol. 22, no. 10, pp. 1319–1331, Oct. 2003.
[18] T. F. Cootes, A. Hill, C. J. Taylor, and J. Haslam, “Use of active shape models for locating structures in medical images,” Image Vis. Comput., vol. 12, pp. 355–365, 1994.
[19] I. Jolliffe, Principal Component Analysis. New York: Wiley, 2005.
[20] B. M. Kelm, S. K. Zhou, M. Suehling, Y. Zheng, M. Wels, and D. Comaniciu, “Detection of 3D spinal geometry using iterated marginal space learning,” in Medical Computer Vision. Recognition Techniques and Applications in Medical Imaging. New York: Springer, 2011, Lecture Notes in Computer Science, pp. 96–105.
[21] N. Lay, N. Birkbeck, J. Zhang, and S. K. Zhou, “Rapid multi-organ segmentation using context integration and discriminative models,” Inf. Process. Med. Imag., pp. 450–462, 2013.
[22] L. Wang, F. Shi, G. Li, Y. Gao, W. Lin, J. H. Gilmore, and D. Shen, “Segmentation of neonatal brain MR images using patch-driven level sets,” NeuroImage, vol. 84, pp. 141–158, 2014.
[23] D. Breitenreicher, M. Sofka, S. Britzen, and S. K. Zhou, “Hierarchical discriminative framework for detecting tubular structures in 3D images,” Inf. Process. Med. Imag., pp. 328–339, 2013.
[24] D. Mahapatra, “Graph cut based automatic prostate segmentation using learned semantic information,” in Proc. IEEE 10th Int. Symp. Biomed. Imag., 2013, pp. 1316–1319.
[25] P. Viola and M. J. Jones, “Robust real-time face detection,” Int. J. Comput. Vis., vol. 57, pp. 137–154, 2004.
[26] Y. Zhan, M. Dewan, M. Harder, A. Krishnan, and X. S. Zhou, “Robust automatic knee MR slice positioning through redundant and hierarchical anatomy detection,” IEEE Trans. Med. Imag., vol. 30, no. 12, pp. 2087–2100, Dec. 2011.
[27] Y. Zhan, X. S. Zhou, Z. Peng, and A. Krishnan, “Active scheduling of organ detection and segmentation in whole-body medical images,” in Proc. MICCAI, 2008, pp. 313–321.
[28] B. J. Frey and D. Dueck, “Clustering by passing messages between data points,” Science, vol. 315, pp. 972–976, 2007.
[29] C. Goodall, “Procrustes methods in the statistical analysis of shape,” J. R. Stat. Soc. Ser. B (Methodol.), pp. 285–339, 1991.
[30] D. Shen, W.-H. Wong, and H. H. Ip, “Affine-invariant image retrieval by correspondence matching of shapes,” Image Vis. Comput., vol. 17, pp. 489–499, 1999.
[31] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., 2005, pp. 886–893.
[32] T. Ojala, M. Pietikainen, and T. Maenpaa, “Multiresolution gray-scale and rotation invariant texture classification with local binary patterns,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 7, pp. 971–987, Jul. 2002.
[33] I. Guyon and A. Elisseeff, “An introduction to variable and feature selection,” J. Mach. Learn. Res., vol. 3, pp. 1157–1182, 2003.
[34] Y. Gao, S. Liao, and D. Shen, “Prostate segmentation by sparse representation based classification,” in Proc. MICCAI, 2012, pp. 451–458.
[35] M. Aharon, M. Elad, and A. Bruckstein, “K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation,” IEEE Trans. Signal Process., vol. 54, no. 11, pp. 4311–4322, Nov. 2006.


[36] J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, and Y. Ma, “Robust face recognition via sparse representation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 2, pp. 210–227, Feb. 2009.
[37] S. Menard, Applied Logistic Regression Analysis. Thousand Oaks, CA: Sage, 2002.
[38] J. Shiraishi, S. Katsuragawa, J. Ikezoe, T. Matsumoto, T. Kobayashi, K.-I. Komatsu, M. Matsui, H. Fujita, Y. Kodera, and K. Doi, “Development of a digital image database for chest radiographs with and without a lung nodule: Receiver operating characteristic analysis of radiologists’ detection of pulmonary nodules,” Am. J. Roentgenol., vol. 174, pp. 71–74, 2000.
[39] J. H. Tan, U. R. Acharya, C. M. Lim, and K. T. Abraham, “An interactive lung field segmentation scheme with automated capability,” Digital Signal Process., vol. 23, pp. 1022–1031, 2013.
[40] L. R. Dice, “Measures of the amount of ecologic association between species,” Ecology, vol. 26, pp. 297–302, 1945.
[41] J. Mairal, F. Bach, J. Ponce, and G. Sapiro, “Online dictionary learning for sparse coding,” in Proc. 26th Annu. Int. Conf. Mach. Learn., 2009, pp. 689–696.
[42] T. F. Cootes and C. J. Taylor, “Statistical models of appearance for computer vision,” Imag. Sci. Biomed. Eng., Univ. Manchester, Manchester, U.K., Tech. Rep., Mar. 2004.
[43] X. Wang, X. Zhao, L. Jia, and P. Yang, “Pupil localization for multiview eyeballs by ASM and eye gray distribution,” Procedia Eng., vol. 15, pp. 2993–2998, 2011.
[44] M. León and B. Escalante-Ramirez, “Segmentation of knee cartilage by using a hierarchical active shape model based on multi-resolution transforms in magnetic resonance images,” in Proc. IX Int. Seminar Med. Inf. Process. Anal., 2013, p. 892214.
[45] B. Ibragimov, B. Likar, F. Pernus, and T. Vrtovec, “A game-theoretic framework for landmark-based image segmentation,” IEEE Trans. Med. Imag., vol. 31, no. 9, pp. 1761–1776, Sep. 2012.


[46] J. A. Sethian, Level Set Methods and Fast Marching Methods: Evolving Interfaces in Computational Geometry, Fluid Mechanics, Computer Vision, and Materials Science. Cambridge, U.K.: Cambridge Univ. Press, 1999, vol. 3.
[47] M. Kass, A. Witkin, and D. Terzopoulos, “Snakes: Active contour models,” Int. J. Comput. Vis., vol. 1, pp. 321–331, 1988.
[48] Y. Wang, E. K. Teoh, and D. Shen, “Lane detection using B-snake,” in Proc. 1999 Int. Conf. Inf. Intell. Syst., 1999, pp. 438–443.
[49] D. Shen and H. H. Ip, “A Hopfield neural network for adaptive image segmentation: An active surface paradigm,” Pattern Recognit. Lett., vol. 18, pp. 37–48, 1997.
[50] Y. Zhan and D. Shen, “Automated segmentation of 3D US prostate images using statistical texture-based matching method,” in Proc. MICCAI, 2003, pp. 688–696.
[51] Q. Feng, M. Foskey, S. Tang, W. Chen, and D. Shen, “Segmenting CT prostate images using population and patient-specific statistics for radiotherapy,” in Proc. IEEE Int. Symp. Biomed. Imag.: From Nano to Macro, 2009, pp. 282–285.
[52] S. Liao and D. Shen, “A feature-based learning framework for accurate prostate localization in CT images,” IEEE Trans. Image Process., vol. 21, no. 8, pp. 3546–3559, Aug. 2012.
[53] W. Li, S. Liao, Q. Feng, W. Chen, and D. Shen, “Learning image context for segmentation of the prostate in CT-guided radiotherapy,” Phys. Med. Biol., vol. 57, pp. 1283–1283, 2012.
[54] S. Liao, Y. Gao, J. Lian, and D. Shen, “Sparse patch-based label propagation for accurate prostate localization in CT images,” IEEE Trans. Med. Imag., vol. 32, no. 2, pp. 419–434, Feb. 2013.
