Bio-Medical Materials and Engineering 24 (2014) 237–243 DOI 10.3233/BME-130804 IOS Press


Sparse Coding Induced Transfer Learning for HEp-2 Cell Classification

Anan Liu a, Zan Gao b, Tong Hao c,*, Yuting Su a, and Zhaoxuan Yang a

a School of Electronic Information Engineering, Tianjin University, Tianjin 300072, China. E-mail: {liuanan, ytsu, yangzhx}@tju.edu.cn
b Laboratory of Computer Vision and System, Tianjin University of Technology, Tianjin 300191, China. E-mail: [email protected]
c College of Life Sciences, Tianjin Normal University, Tianjin 300387, China. E-mail: [email protected]

Abstract. Automated human larynx carcinoma (HEp-2) cell classification is critical for medical diagnosis. In this paper, we propose a sparse coding-based unsupervised transfer learning method for HEp-2 cell classification. First, a low level image feature is extracted for visual representation. Second, a sparse coding scheme with an Elastic Net penalized convex objective function is proposed for unsupervised feature learning. Finally, a Support Vector Machine classifier is utilized for model learning and prediction. To our knowledge, this work is the first to transfer a human-crafted visual feature, which is sensitive to the variation of appearance and shape during cell movement, to a high level representation that directly denotes the correlation between one sample and the bases in the learnt dictionary. The proposed method can therefore overcome the difficulty of formulating discriminative features for different kinds of cells with irregular and changing visual patterns. Large scale comparison experiments demonstrate the superiority of this method.

Keywords: HEp-2 cell, cell classification, transfer learning, sparse coding, elastic net

1. Introduction

Automated human larynx carcinoma (HEp-2) cell classification is critical for effective medical diagnosis. The HEp-2 cell is usually utilized as the substrate for Indirect ImmunoFluorescence (IIF), which is commonly considered a powerful, sensitive, and comprehensive test for antinuclear autoantibody (ANA) analysis [1]. The IIF diagnostic procedure consists of four main steps, namely image acquisition, mitosis detection, fluorescence intensity classification, and staining pattern recognition. Among these steps, the last is the most challenging and important, since several different patterns may be observed which correspond to different autoimmune diseases. Staining patterns can be roughly classified into six groups [2], shown in Fig. 1: 1) homogeneous: diffuse staining of the interphase nuclei and staining

This work was supported in part by the National Natural Science Foundation of China (61100124, 21106095, 61170239, 61202168) and the Foundation of Introducing Talents to Tianjin Normal University (5RL123). * Corresponding author. E-mail: [email protected].

0959-2989/14/$27.50 © 2014 – IOS Press and the authors. All rights reserved


A. Liu et al. / Sparse coding induced transfer learning for HEp-2 cell classification

of the chromatin of mitotic cells; 2) fine speckled: a fine granular nuclear staining of the interphase cell nuclei; 3) coarse speckled: a coarse granular nuclear staining of the interphase cell nuclei; 4) nucleolar: large coarse speckled staining within the nucleus, fewer than six in number per cell; 5) cytoplasmic: fine fluorescent fibres running the length of the cell, frequently associated with other autoantibodies to give a mixed pattern; 6) centromere: several discrete speckles distributed throughout the interphase nuclei and characteristically found in the condensed nuclear chromatin during mitosis as a bar of closely associated speckles. However, humans are limited in their ability to detect and diagnose disease during image interpretation due to non-systematic search patterns and the presence of noise. In addition, the vast amount of image data generated makes the detection of potential disease burdensome and may cause oversight errors. Therefore, the need for rapid and accurate medical diagnosis has made automated HEp-2 cell classification mandatory.

Fig. 1. Samples of HEp-2 staining patterns. Each class contains three samples to show the drastically high intra-class variation and the relatively low inter-class difference in visual pattern and scale.

Automated cell classification methods usually involve two key steps: feature representation and model learning. From the viewpoint of feature representation, many kinds of low level image features have been utilized for this task. Region-based shape features are often straightforwardly implemented for cell classification [3]. However, their dependence on cell segmentation is a severe burden because cell segmentation per se is challenging. To overcome this difficulty, statistical characteristics of intensity, texture, and morphological features have been formulated for cell representation [4,5,6]. To improve the discrimination of the feature representation, the strategy of fusing multiple image features has been widely evaluated [2,7]. Very recently, the popular feature descriptor Scale Invariant Feature Transform (SIFT) [8] has been used for cell region representation because of its high discriminative capability and its robustness to image scale, rotation, illumination fluctuation, and minor changes of viewpoint [9]. From the viewpoint of model learning, with the rapid development of machine learning, powerful classifiers play important roles in cell classification: Multi-Layer Perceptrons, k-Nearest Neighbour, Support Vector Machines, and Random Forests, among others, have been extensively utilized for this task [4,2,7]. Although much work has been done on this task, one problem remains: state-of-the-art low level visual features, such as intensity, texture, and shape features, cannot discriminatively represent multiple kinds of HEp-2 cells with irregular appearance changes. To tackle this challenging task, we propose an HEp-2 cell representation and classification method via sparse coding. First, a low level image feature is extracted for visual representation. Second, a sparse coding scheme with an Elastic Net penalized convex objective function is proposed for unsupervised feature learning. Finally, a Support Vector Machine (SVM) classifier is utilized for model learning and prediction. Our main contribution



is that we transfer the low level visual feature into a high level feature which, via the sparse coding scheme, directly indicates the similarity between one sample and the bases. The proposed method can therefore overcome the difficulty of formulating discriminative features for different kinds of cells with irregular and changing visual patterns.

The rest of the paper is structured as follows. Section 2 briefly introduces the systematic workflow. The sparse coding scheme is illustrated in Section 3. The experimental method and results are detailed in Sections 4 and 5. Finally, conclusions are presented.

2. System Overview

The proposed method for HEp-2 cell classification proceeds through three consecutive steps.

1) Low Level Feature Extraction: This step converts each cell image into a feature vector $X_i \in \mathbb{R}^d$ which implicitly represents the visual characteristics of one cell. Given the visual characteristics shown in Fig. 1, GIST, which represents local texture and structural information [10], was adopted for low level feature extraction. The image is first divided into dense grids, and each grid is represented by a set of spatial envelope properties, including naturalness, openness, roughness, expansion, and ruggedness. The attributes of all grids are then concatenated to form the GIST descriptor.

2) Unsupervised Feature Learning via Sparse Coding: Given a training set with the low level representation $X = \{X_i\}_{i=1}^{N}$, sparse coding decomposes $X_i$ over a dictionary $\phi = \{\phi_i\}_{i=1}^{M}$ (each basis $\phi_i \in \mathbb{R}^d$ denotes a classic visual pattern) such that $X_i = \phi \cdot w_i + r_i$, where $w_i$ is a sparse vector and $r_i$ is the residual. $w_i$ explicitly represents the similarity between $X_i$ and each basis and can therefore be considered the high level feature. To achieve the optimal $W^* = \{w_i^*\}_{i=1}^{N}$ and $\phi^*$ simultaneously, we designed an objective function for dictionary learning and sparse decomposition, which will be detailed in Section 3.
With the learned $\phi^*$, a test sample $Y$ can be decomposed and represented by $w_y$.

3) Cell Classification: In our work, a Support Vector Machine (SVM) with an RBF kernel was trained on the high level feature set $W^* = \{w_i^*\}_{i=1}^{N}$ and then utilized to classify each test sample $w_y$. SVM relies on the structural risk minimization induction principle to minimize a bound on the generalization error, and the literature widely reports that SVM performs well with small amounts of training data.

3. Sparse Coding for Feature Transfer

3.1. Problem Formulation

Ideally, a sample $S \in \mathbb{R}^d$ should be faithfully represented by a linear combination of the members of the class it belongs to, and moreover the decomposed coefficient $w$ should be sparse. For this problem, we designed a convex objective function regularized by the Elastic Net penalty [11], $F(w, \gamma_1, \gamma_2)$, for high level feature representation:

$$F(w, \gamma_1, \gamma_2) = \|S - \phi \cdot w\|_2^2 + \gamma_1 \|w\|_1 + \gamma_2 \|w\|_2^2 \qquad (1)$$

where $\|\cdot\|_p$ denotes the $L_p$ norm and the relative importance of the three terms is controlled by the positive weights $\gamma_1$ and $\gamma_2$. The optimal decomposed coefficient is obtained by solving $w^* = \arg\min_w F(w, \gamma_1, \gamma_2)$. Since $w^*$ directly indicates the correlation between $S$ and each basis of $\phi$, $w^*$ can be considered the high level representation of the sample. The objective function in Eq. (1)



consists of two parts:

Fidelity: The first term penalizes the sum-of-squares difference between the reconstructed and original samples. Unlike the classical Nearest Neighbor classifier, which fits $S$ with each training sample, and the Nearest Subspace classifier, which fits $S$ with the training samples of one class, the fidelity term in this sparse coding scheme collectively exploits samples of all classes for reconstruction, avoiding unstable sparse coding solutions.

Elastic Net penalty: The Elastic Net penalty consists of two factors, the lasso penalty ($\|w\|_1$) and the ridge penalty ($\|w\|_2^2$), which fulfill two constraints. a) Sparsity: the lasso penalty is well known to impose sparsity [12]. Intuitively, a test sample should be reconstructed with both low residual and few bases; although a negative sample can also be reconstructed over the same dictionary with acceptable residual, it usually requires more bases for compensation and results in a dense coefficient. b) Consistency: if two bases are highly similar to each other, they should be assigned almost the same weights; the strictly convex ridge penalty is known to preserve this consistency in the decomposed coefficient [11]. Therefore, the Elastic Net penalty is used to formulate the objective function.

3.2. Dictionary Learning and Sparse Decomposition

Given a training set $X$, intuitively there exists a latent dictionary of bases, each characterizing a classic visual pattern, such that a new sample $S$ can be sparsely reconstructed with respect to this dictionary. We therefore explore the training set to automatically learn a set of sparse bases, $\phi = \{\phi_i\}_{i=1}^{M}$, so that any given image can be represented by a linear combination over the learned dictionary under the constraints of both sparsity and consistency.
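For intuition, the elastic-net decomposition of Eq. (1) for a single sample can be minimized with a few lines of proximal-gradient (ISTA) iterations. This is an illustrative NumPy sketch only, not the paper's actual solver (the authors adopt the online algorithm of [13]); the function names are ours:

```python
import numpy as np

def objective(S, phi, w, gamma1, gamma2):
    """F(w, gamma1, gamma2) of Eq. (1): fidelity + lasso + ridge terms."""
    r = S - phi @ w
    return r @ r + gamma1 * np.abs(w).sum() + gamma2 * (w @ w)

def elastic_net_code(S, phi, gamma1=1e-2, gamma2=1e-2, n_iter=200):
    """Minimize Eq. (1) by proximal gradient (ISTA): a gradient step on the
    smooth part (fidelity + ridge), then soft-thresholding for the lasso term."""
    w = np.zeros(phi.shape[1])
    # step = 1/L, where L = 2*(sigma_max(phi)^2 + gamma2) bounds the
    # Lipschitz constant of the smooth part's gradient
    step = 1.0 / (2.0 * (np.linalg.norm(phi, 2) ** 2 + gamma2))
    for _ in range(n_iter):
        grad = 2.0 * phi.T @ (phi @ w - S) + 2.0 * gamma2 * w
        z = w - step * grad
        # soft-thresholding: proximal operator of gamma1*||.||_1
        w = np.sign(z) * np.maximum(np.abs(z) - step * gamma1, 0.0)
    return w
```

With a step of $1/L$, each ISTA iteration never increases $F$, so the sketch converges toward the unique minimizer of the strictly convex objective.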
The proposed sparse coding scheme contains two consecutive steps.

Dictionary learning: The optimal dictionary $\phi^*$ and reconstruction coefficients $W^*$ for the training set can be obtained by:

$$\min_{\phi \in C,\; W \in \mathbb{R}^{M \times N}} \sum_{i=1}^{N} \left( \|X_i - \phi \cdot w_i\|_2^2 + \gamma_1 \|w_i\|_1 + \gamma_2 \|w_i\|_2^2 \right) \qquad (2)$$

where $W = \{w_i\}_{i=1}^{N}$ and the convex set $C = \{\phi \in \mathbb{R}^{d \times M} \;\text{s.t.}\; \forall i,\ \|\phi_i\|_2^2 \leq 1\}$. The optimization problem above is not jointly convex in $\phi$ and $W$. Motivated by the coordinate descent method, we optimize one while fixing the other and improve both iteratively:

a) Learning the reconstruction coefficients $W$ with fixed $\phi$: Given the fixed dictionary $\phi^k$ in the $k$th iteration, the coefficient $w_i^k$ for sample $X_i$ can be optimized by:

$$\min_{W^k \in \mathbb{R}^{M \times N}} \sum_{i=1}^{N} \left( \|X_i - \phi^k \cdot w_i^k\|_2^2 + \gamma_1 \|w_i^k\|_1 + \gamma_2 \|w_i^k\|_2^2 \right) \qquad (3)$$

For this objective function, we adopt the online learning algorithm [13], which achieves high accuracy.

b) Learning the dictionary $\phi$ with fixed $W$: Given the fixed $w_i^k$ for $X_i$ in the $k$th iteration, $\phi^k$ can be optimized by:

$$\min_{\phi \in \mathbb{R}^{d \times M}} \sum_{i=1}^{N} \|X_i - \phi \cdot w_i^k\|_2^2, \quad \text{s.t.}\; \forall i,\ \|\phi_i\|_2^2 \leq 1 \qquad (4)$$



The constraint above prevents the bases $\phi_i^k$ from becoming arbitrarily large, which would result in arbitrarily small values of $w_i^k$. Eq. (4) is a least squares estimation with quadratic constraints and can be solved using the Lagrange dual.

Sparse decomposition: Sparse decomposition represents a test sample $Y$ as a linear combination of the bases of the learned dictionary $\phi^*$. Given $\phi^*$, the optimal coefficient $w_y^*$ is achieved by optimizing:

$$\min_{w_y \in \mathbb{R}^{M}} \|Y - \phi^* \cdot w_y\|_2^2 + \gamma_1 \|w_y\|_1 + \gamma_2 \|w_y\|_2^2 \qquad (5)$$
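A minimal pure-NumPy sketch of the alternating scheme above, assuming a batch proximal-gradient w-step in place of the online solver of [13] and a projected least-squares φ-step in place of the Lagrange dual update (both substitutions are ours, for exposition only):

```python
import numpy as np

def soft(Z, t):
    """Soft-thresholding: proximal operator of t*||.||_1."""
    return np.sign(Z) * np.maximum(np.abs(Z) - t, 0.0)

def code_step(X, phi, gamma1, gamma2, n_inner=50):
    """w-step (Eq. 3): elastic-net coding of all samples (columns of X)
    with phi fixed, via proximal gradient started from zero."""
    step = 1.0 / (2.0 * (np.linalg.norm(phi, 2) ** 2 + gamma2))
    W = np.zeros((phi.shape[1], X.shape[1]))
    for _ in range(n_inner):
        G = 2.0 * phi.T @ (phi @ W - X) + 2.0 * gamma2 * W
        W = soft(W - step * G, step * gamma1)
    return W

def learn_dictionary(X, M, gamma1=1e-2, gamma2=1e-2, n_outer=5, seed=0):
    """Alternating minimization of Eq. (2). X is d x N, one sample per column."""
    rng = np.random.default_rng(seed)
    phi = rng.standard_normal((X.shape[0], M))
    phi /= np.maximum(np.linalg.norm(phi, axis=0), 1.0)  # keep ||phi_i||_2 <= 1
    for _ in range(n_outer):
        W = code_step(X, phi, gamma1, gamma2)            # Eq. (3)
        # phi-step (Eq. 4): unconstrained least squares for phi, then
        # project each basis back onto the unit ball.
        phi = np.linalg.lstsq(W.T, X.T, rcond=None)[0].T
        phi /= np.maximum(np.linalg.norm(phi, axis=0), 1.0)
    W = code_step(X, phi, gamma1, gamma2)  # final codes, as in Eq. (5)
    return phi, W
```

The final `code_step` call plays the role of Eq. (5): with the learned dictionary fixed, any (test) sample is decomposed into its high level representation.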

Eq. (5) is convex with respect to $w_y$ and can be solved in the same way as Eq. (3).

4. Experimental Method

The HEp-2 image dataset [2] was used for evaluation. To our knowledge, this is the largest public dataset for HEp-2 cell classification. The dataset consists of 721 samples, containing 208 Centromere, 109 Coarse Speckled, 58 Cytoplasmatic, 94 Fine Speckled, 150 Homogeneous, and 102 Nucleolar cases, as shown in Fig. 1. The experiment includes the following four aspects:

1) For dictionary learning, we need to discover the best combination of $\gamma_1$ and $\gamma_2$. The performance of each pair of $\gamma_1$ and $\gamma_2$ within $[10^{-4}, 10^{-1}]$ is compared. In the experiment, the number of bases in each class-wise dictionary was set equal to the number of training samples.

2) For classification, the samples of each class are randomly divided into two halves (one for dictionary learning and model training, the other for testing); this is repeated 10 times to obtain the average (Ave) and standard deviation (Std) of each evaluation metric. For each run, $\phi^*$ and $W^*$ are learned from the training set with the optimal pair of $\gamma_1$ and $\gamma_2$; then $w_y$ of each test sample $Y$ is obtained over $\phi^*$ for prediction.

3) To demonstrate the superiority of the sparse coding scheme, the proposed method (GIST+Sparse Coding+SVM) was compared with the same strategy without sparse coding (GIST+SVM).

4) To demonstrate the superiority of the proposed method in terms of both feature representation and classifier, it was compared with multiple configurations of different features and classifiers [14] under the same experimental setting for a fair comparison.

To evaluate the performance of cell classification, we count true positives (TP), false negatives (FN), false positives (FP), and true negatives (TN). Precision ($\frac{TP}{TP+FP}$), Recall ($\frac{TP}{TP+FN}$), and the F1 score ($\frac{2 \times Precision \times Recall}{Precision + Recall}$, representing the overall performance for each class of cell) are used as quantitative metrics. Accuracy ($\frac{TP+TN}{TP+FN+FP+TN}$) is utilized to evaluate the overall classification performance for both a single class and all six classes.

5. Experimental Results

5.1. Dictionary Learning

The performance of dictionary learning with different configurations of $\gamma_1$ and $\gamma_2$ is shown in Table 1, from which the best accuracy is achieved with $\gamma_1 = 10^{-2}$ and $\gamma_2 = 10^{-2}$. This implies that the proper sparsity and consistency effects of the Elastic Net penalty benefit the decomposed coefficients for model learning. Moreover, the Std over all accuracies in Table 1 is only 1.08%, which indicates that the proposed method is robust over a broad range of parameters.
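The quantitative metrics defined in Section 4 follow directly from the confusion counts; a minimal helper (our own naming, for illustration):

```python
def classification_metrics(tp, fn, fp, tn):
    """Precision, Recall, F1, and Accuracy as defined in Section 4."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return precision, recall, f1, accuracy
```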



5.2. Cell Classification

From Table 2, it is obvious that the proposed method consistently outperforms the baseline (GIST+SVM) in terms of Recall, Precision, and F1 score. The comparison shows that the sparse coding scheme benefits cell classification because it transforms the low level visual feature into a high level feature which explicitly reflects the correlation between one sample and the bases. This relationship can be obtained stably by solving the proposed objective function, which leads to high robustness against rotation, shape changes, and so on. Comparatively, the irregular appearance changes negatively influence GIST formulation and degrade its discriminative ability.

From Table 3, the proposed method outperforms the other feature combinations and classifiers in terms of overall accuracy for the classification of all six kinds of cells. Lines 1 to 3 show the performance of the SVM classifier with a single kind of feature; Lines 4 to 7 show the performance of different classifiers with the fusion of three features. The comparison shows that the SVM classifier works best given the same visual representation. The comparison between the proposed method and all others shows that it outperforms the other methods for the classification of HO, FS, NU, and CE cells. Since the CS cell is visually similar to the FS and HO cells, the proposed method tends to misclassify CS cells, and the class-wise accuracy is only 90.0%. The classification of CY cells achieves only 79.3% because there were only 29 training samples to learn a dictionary with 29 bases. Although the fusion of three kinds of features works better than the proposed method for CS and CY cell classification, the fused visual feature is 1562-dimensional and incurs a high computational cost.

Table 1
Accuracy of classification for six classes of cells with respect to different γ1 and γ2 (Ave ± Std %)

| γ2 \ γ1 | 10^-1      | 10^-2      | 10^-3      | 10^-4      |
|---------|------------|------------|------------|------------|
| 10^-1   | 94.2 ± 2.1 | 94.0 ± 4.5 | 93.9 ± 4.1 | 93.7 ± 4.0 |
| 10^-2   | 91.1 ± 4.0 | 94.2 ± 1.5 | 91.1 ± 5.6 | 93.7 ± 3.6 |
| 10^-3   | 93.1 ± 2.0 | 92.9 ± 2.8 | 92.3 ± 2.8 | 92.5 ± 3.5 |
| 10^-4   | 92.7 ± 3.1 | 91.5 ± 4.1 | 91.6 ± 2.8 | 92.7 ± 4.0 |
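The repeated 50/50 split protocol of Section 4 can be sketched as follows. The nearest-centroid classifier is a hypothetical stand-in that replaces the full dictionary-learning + SVM pipeline purely for illustration:

```python
import numpy as np

def repeated_split_eval(X, y, classify, n_runs=10, seed=0):
    """Section 4's protocol: per class, randomly split samples 50/50 into
    train/test, repeat n_runs times, and report Ave and Std of accuracy.
    `classify(X_tr, y_tr, X_te) -> y_pred` abstracts the pipeline."""
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(n_runs):
        tr, te = [], []
        for c in np.unique(y):
            idx = rng.permutation(np.flatnonzero(y == c))
            half = len(idx) // 2
            tr.extend(idx[:half])
            te.extend(idx[half:])
        y_pred = classify(X[tr], y[tr], X[te])
        accs.append(np.mean(y_pred == y[te]))
    return np.mean(accs), np.std(accs)

def nearest_centroid(X_tr, y_tr, X_te):
    """Toy stand-in classifier: assign each test sample to the nearest class mean."""
    classes = np.unique(y_tr)
    means = np.stack([X_tr[y_tr == c].mean(axis=0) for c in classes])
    d = ((X_te[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(d, axis=1)]
```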

Table 2
Comparison of cell classification (Ave ± Std %)

GIST+Sparse Coding+SVM:

| Class           | Recall (%)  | Precision (%) | F1 (%)     |
|-----------------|-------------|---------------|------------|
| Centromere      | 99.9 ± 0.3  | 89.4 ± 10.1   | 94.1 ± 6.1 |
| Coarse Speckled | 90.0 ± 13.5 | 99.1 ± 1.3    | 93.7 ± 8.4 |
| Cytoplasmatic   | 79.3 ± 11.4 | 95.8 ± 0.0    | 86.8 ± 8.6 |
| Fine Speckled   | 94.0 ± 13.9 | 98.3 ± 2.1    | 95.5 ± 8.7 |
| Homogeneous     | 97.2 ± 3.0  | 93.9 ± 3.4    | 95.4 ± 1.9 |
| Nucleolar       | 99.4 ± 0.9  | 100 ± 0.0     | 99.7 ± 0.5 |

GIST+SVM:

| Class           | Recall (%) | Precision (%) | F1 (%)     |
|-----------------|------------|---------------|------------|
| Centromere      | 88.0 ± 5.1 | 86.0 ± 6.6    | 86.7 ± 3.0 |
| Coarse Speckled | 82.7 ± 7.2 | 90.6 ± 6.4    | 86.1 ± 2.7 |
| Cytoplasmatic   | 74.8 ± 6.9 | 83.9 ± 7.4    | 79.0 ± 6.7 |
| Fine Speckled   | 76.0 ± 9.1 | 81.0 ± 7.0    | 77.8 ± 3.6 |
| Homogeneous     | 94.5 ± 4.3 | 86.6 ± 4.3    | 90.3 ± 2.2 |
| Nucleolar       | 88.8 ± 8.9 | 90.2 ± 4.0    | 89.2 ± 4.5 |


Table 3
Comparison of class-wise and overall accuracy by different methods [14] (HO: Homogeneous; FS: Fine Speckled; CS: Coarse Speckled; NU: Nucleolar; CY: Cytoplasmatic; CE: Centromere)

| Method                      | HO (%) | FS (%) | CS (%) | NU (%) | CY (%) | CE (%) | Overall (%) |
|-----------------------------|--------|--------|--------|--------|--------|--------|-------------|
| HOG+SVM                     | 82.7   | 64.9   | 93.6   | 70.6   | 82.8   | 89.4   | 82.3        |
| Texture+SVM                 | 87.3   | 68.1   | 96.3   | 86.3   | 69.0   | 93.8   | 86.4        |
| ROI+SVM                     | 87.3   | 0      | 89.0   | 58.8   | 86.2   | 70.2   | 67.1        |
| HOG+Texture+ROI+SVM         | 90.7   | 74.5   | 99.1   | 88.2   | 89.7   | 96.6   | 91.1        |
| HOG+Texture+ROI+ANN         | 91.3   | 75.5   | 93.6   | 94.1   | 86.2   | 94.7   | 90.6        |
| HOG+Texture+ROI+KNN(Cosine) | 93.3   | 53.2   | 96.3   | 88.2   | 93.1   | 83.7   | 82.9        |
| HOG+Texture+ROI+Naive-Bayes | 81.3   | 53.2   | 78.0   | 84.3   | 79.3   | 81.7   | 77.5        |
| Proposed                    | 97.2   | 94.0   | 90.0   | 99.4   | 79.3   | 99.9   | 95.4        |

6. Conclusion

In this paper, we propose an HEp-2 cell representation and classification method via transfer learning based on sparse coding. The method stably transforms the low level visual feature into a high level feature which indicates the similarity between one sample and the learned bases, and thus overcomes the difficulty of formulating discriminative features for different kinds of cells with irregular and changing visual patterns. Large scale comparison experiments show that the proposed method outperforms competing methods in terms of Precision, Recall, and F1 score.

References

[1] D.H. Solomon, A.J. Kavanaugh, et al., Evidence-based guidelines for the use of immunologic tests: Antinuclear antibody testing, Arthritis Care Res. 47 (2002), 434–444.
[2] P. Foggia, G. Percannella, et al., Early experiences in mitotic cells recognition on HEp-2 slides, IEEE Symposium on Computer-Based Medical Systems (2010).
[3] P. Strandmark, J. Ulen, and F. Kahl, HEp-2 staining pattern classification, International Conference on Pattern Recognition (2012).
[4] P. Soda and G. Iannello, Aggregation of classifiers for staining pattern recognition in antinuclear autoantibodies analysis, IEEE Trans. Inf. Technol. Biomed. 13 (2009), 322–329.
[5] G. Thibault and J. Angulo, Efficient statistical/morphological cell texture characterization and classification, International Conference on Pattern Recognition (2012).
[6] K. Li, J. Yin, et al., Multiclass boosting SVM using different texture features in HEp-2 cell staining pattern classification, International Conference on Pattern Recognition (2012).
[7] V. Snell, W. Christmas, and J. Kittler, Texture and shape in fluorescence pattern identification for auto-immune disease diagnosis, International Conference on Pattern Recognition (2012).
[8] D.G. Lowe, Distinctive image features from scale-invariant keypoints, Int. J. Computer Vision 60 (2004), 91–110.
[9] A. Liu, K. Li, and T. Kanade, A semi-Markov model for mitosis segmentation in time-lapse phase contrast microscopy image sequences of stem cell populations, IEEE Transactions on Medical Imaging 31 (2012), 359–369.
[10] A. Oliva and A. Torralba, Modeling the shape of the scene: a holistic representation of the spatial envelope, Int. J. Comput. Vision 42 (2001), 145–175.
[11] H. Zou and T. Hastie, Regularization and variable selection via the elastic net, Journal of the Royal Statistical Society B (2005), 301–320.
[12] R. Tibshirani, Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society B (1996), 267–288.
[13] J. Mairal, F. Bach, et al., Online dictionary learning for sparse coding, International Conference on Machine Learning (2009).
[14] S. Ghosh and V. Chaudhary, Feature analysis for automatic classification of HEp-2 fluorescence patterns: Computer-aided diagnosis of auto-immune diseases, International Conference on Pattern Recognition (2012).

