
A general framework for wireless capsule endoscopy study synopsis

Qian Zhao a,b,∗,1, Gerard E. Mullin b, Max Q.-H. Meng a, Themistocles Dassopoulos c, Rajesh Kumar b

a The Chinese University of Hong Kong, Shatin, Hong Kong
b Johns Hopkins University (JHU), Baltimore, MD 21218, USA
c Washington University School of Medicine, St. Louis, MO 63110, USA

Article history: Received 4 January 2014; received in revised form 2 April 2014; accepted 29 May 2014.

Keywords: Computer-aided diagnosis; Wireless capsule endoscopy; Hidden Markov models; Support vector machines; Supervised classification; Study synopsis

Abstract

We present a general framework for the analysis of wireless capsule endoscopy (CE) studies. Currently available workstations impose a time-consuming and labor-intensive workflow on clinicians, requiring inspection of the full-length video. The development of a computer-aided diagnosis (CAD) CE workstation has great potential to reduce diagnostic time and improve the accuracy of assessment. We propose a general framework based on hidden Markov models (HMMs) for study synopsis that forms the computational engine of our CAD workstation. Color, edge and texture features are first extracted and analyzed by support vector machine classifiers, and then encoded as the observations for the HMM, uniquely incorporating temporal information into the assessment. Experiments were performed on 13 full-length CE studies, instead of the selected images used in previously reported work. The results (e.g. 0.933 accuracy with 0.933 recall for detection of polyps) show that our framework achieves promising performance for multi-class classification. We also report patient-level CAD assessment of complete CE studies for multiple abnormalities, and the patient-level validation demonstrates the effectiveness and robustness of our methods.

1. Introduction

Wireless capsule endoscopy (CE) [10,22] was invented to screen the gastrointestinal (GI) tract, especially the small bowel (previously not accessible non-invasively), using a simple outpatient test. It has significantly impacted the diagnostic approach for many diseases, such as bleeding, Crohn's and Celiac diseases, tumors, polyps, and other lesions [22]. Introduced by Given Imaging Inc. in 2000, over 1,000,000 Pillcam small bowel (SB) capsules alone have been swallowed in the past 10 years since the device was first approved by the U.S. FDA. The CE device measures approximately 11 mm × 26 mm and consists of an imaging sensor, associated optics, and communication electronics. An outpatient examination typically produces more than

∗ Corresponding author at: Johns Hopkins University (JHU), Baltimore, MD 21218, USA. Tel.: +1 2407227351; fax: +1 2024761270. E-mail addresses: [email protected], [email protected] (Q. Zhao). 1 Qian Zhao was a visiting graduate student at the Visual Imaging and Surgical Robotics Laboratory at JHU during this work. This work was supported by Dr. Kumar’s discretionary funds and the HK ITF project #6902928 (PI: Max Q.-H. Meng).

50,000 images, which are then manually and tediously examined by an expert reader inspecting the full-length video. In addition to the time required, this manual procedure cannot guarantee that every abnormality is detected, given the varying sizes, positions and characteristics of abnormalities and the varying experience of clinicians. Development of a computer-aided diagnosis tool for CE assessment is therefore desirable and necessary. Fig. 1 shows some examples of CE images; the image resolution is 576 × 576. It can be seen that besides diagnostically relevant images (e.g. lesion (b) and polyp (c)), there are also a large number of images with normal lumen (f), bile (a), air bubbles (d) and extraneous matter (e). In this work, we propose a general framework to summarize CE videos into multiple classes. Fig. 2 shows the conceptual objective of CE study synopsis. A general hidden Markov model (HMM) is built on statistical classifiers integrating multiple image appearance attributes (color, edge, and texture). The outputs of the underlying support vector machine classifiers [5] are encoded as the binary observations of the HMM. The proposed method is a generative model rather than a discriminative one, and can therefore generate instances for clinical training and education. We have evaluated this composite framework by performing video synopsis on a database of complete CE videos, where study images were summarized into the six most commonly seen classes – normal images, lesions, polyps,




Fig. 1. Examples of CE images – (a) bile, (b) lesion, (c) polyps, (d) large air bubble, (e) extraneous matter, and (f) normal lumen.

air bubbles, bile, and other extraneous matter. In contrast to the prior art reviewed in Section 2, where image-based performance measures are reported, we describe patient-level validation of this general model to verify its effectiveness and robustness. Our main contributions are: (a) a general framework based on HMMs that integrates temporal information; (b) an investigation of multi-class study synopsis; (c) the development of a CAD workstation for automated CE assessment; and (d) validation on complete CE videos, providing great potential for direct clinical application.

2. Related work

Prior related work mainly focused on statistical analysis for the detection of various individual abnormalities, with bleeding [18,13,25] being the main focus. More recently, Yi et al. [25] introduced clinically viable software for automated GI tract bleeding detection and classification. Its major functional modules included a graph-based segmentation algorithm, specific feature selection and validation, and cascade classification. That method focused on a single abnormality, bleeding, whereas our framework is general for video synopsis with respect to multiple abnormalities. With the wide application of CE, lesions [3,11,12], polyps [1,27], and non-informative frame detection [2] have also been explored in detail. Our group has also previously investigated several statistical methods for CE image analysis [12,3,20], primarily for Crohn's disease and for patient-level disease severity assessment. However, all of the above prior research (a) focuses on specific abnormalities, diseases or anatomy and (b) ignores the temporal information between neighboring images. Some efforts have been made to analyze CE studies in the form of video streams instead of individual images, which mainly focused

on applications such as small intestinal motility assessment [23], video summarization [7,15], and topographical segmentation [6]. These efforts are all limited to individual research goals or simple abnormality detection. Mackiewicz et al. [15] reported a color image analysis scheme to discriminate between digestive organs. The method focused on detecting the boundary between the stomach and small intestine, and separately between the small bowel and colon. Color, texture and motion features were used for SVM classification, and a multivariate Gaussian classifier was built in an HMM framework. This topological segmentation using a simple left-to-right HMM is not sufficient to describe the complexity of abnormality sequences that may contain both normal and abnormal images. Our prior work investigated both supervised and unsupervised methods for CE video summarization [28,30,29]; all of these works were validated using video clips instead of full-length videos. More recently, Htwe et al. presented a CE video summarization method for bleeding detection [8]. Bleeding images were first detected based on color histograms and supervised classification. Image-level summarization was then performed using motion estimation and clustering, and a color bar map was finally generated to serve as the summary. The method was validated on 9 real-patient videos that were not described in detail. Furthermore, the method is currently designed specifically to detect bleeding, while our proposed method is applicable to multiple abnormalities. Iakovidis et al. attempted to reduce the number of frames in CE videos based on clustering [9]. The frames were represented by SURF keypoints, followed by clustering using homography estimation. The method was validated on short CE video clips, and the selection of the number of clusters was manual and empirical. Their recent work stitched multiple CE images to generate panoramic visual summaries of CE videos [21]. Liu et al. developed another study summarization method for CE videos,

Fig. 2. The conceptual objective of CE study synopsis: a CE video is summarized into multiple classes including abnormalities, extraneous matter and normal lumen.


Fig. 3. The information flow of our computational framework applied to image sequence analysis and study synopsis, including feature extraction, frame-based classification and HMM modeling.

which was based on camera motion estimation [14]. CE imaging motion was estimated in a hierarchical manner: coarse-level estimation was based on the Bee Algorithm and mutual information, while the fine level was achieved via SIFT flow. However, the dataset on which the experiments were performed was neither described nor accessible. Moreover, the reduction scheme preserved key images in the WCE video at scene changes, which may not always be valid, as some abnormalities are easily missed. For this work to translate into clinical practice, new methods must (a) be valid for all common uses of CE images and validated at the patient level, and (b) combine individual images in the same way a clinician manually fuses the information available in all relevant images containing the same anatomy. In our closest prior work, a framework based on hidden Markov models for image sequence analysis for polyps was proposed in [27,31]. That method integrated temporal information into CE image analysis and was validated on the detection of polyps, but the simple HMM model described previously used observations matching the states. The main differences between the prior work and this paper lie in the following aspects: (a) the proposed method allows integrating multiple appearance attributes instead of binary observations; (b) the proposed method was validated at the patient level with leave-one-subject-out validation using full-length videos instead of short-duration video clips; and (c) the proposed method is extended from image sequence classification to video synopsis.

3. Methods

The information flow of our framework is shown in Fig. 3. The supervised framework consists of model training and validation stages. For model training, we perform feature selection, frame-based classification, and build an HMM for each class. This extensible framework can be applied to both multi-class image sequence analysis and study synopsis, as described in detail in this section.

3.1. Frame-based supervised classification

As the focus of this work is the higher-level framework, we use features and classifiers previously validated in prior work [26]. CE images contain rich color information: for example, bleeding and some lesions appear reddish, bile appears greenish, and fecal matter appears brownish. The MPEG-7 dominant color descriptor (DCD) is employed as the color feature [16]. The MPEG-7 DCD clusters the colors in an image in the CIELUV color space with a generalized Lloyd algorithm to identify representative colors; it consists of the representative colors and their corresponding percentages. We adopt the DCD as a 24-dimensional histogram containing 6 colors and their percentages. An image with a polyp and its reconstruction from the 6 dominant colors computed using the DCD is shown in Fig. 4. It can be seen that the 6 dominant colors represent the major color information in CE images (e.g. the color of the polyp, intestinal villi and normal lumen).

Besides color, edge information is also useful for representing CE images; for example, polyps and air bubbles have complex contours that can be captured by shape features. The MPEG-7 edge histogram descriptor (EHD) is used to characterize the edge information in CE images. In our study, the edge histogram is computed for 5 types of edges (0°, 45°, 90°, 135°, and non-directional edges) with 80 dimensions [12]. Fig. 5 shows a CE image with a lesion and its corresponding edge features that depict the contour of the lesion.

Texture is another important descriptor for CE image analysis. We utilize the MPEG-7 homogeneous texture descriptor (HTD), which has been previously used in multiple studies and shown to be both computationally adequate and effective for several different abnormalities [12]. It is computed as the mean and standard deviation of the filtered outputs from a bank of Gabor filters containing 3 channel filters each in two radial scales. Along with the whole-image average and standard deviation, the final texture feature vector has 14 dimensions. Fig. 6 shows a CE image with bubbles and its six Gabor filter responses.
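To make the descriptor computations concrete, the sketch below approximates the color and texture features under stated assumptions: k-means stands in for the generalized Lloyd clustering of the MPEG-7 DCD, and a small Gabor bank (3 orientations × 2 scales) approximates the HTD; the EHD is omitted for brevity. Function names, frequencies and orientations are illustrative, not the exact MPEG-7 parameters.

```python
import numpy as np
from sklearn.cluster import KMeans
from skimage.color import rgb2luv, rgb2gray
from skimage.filters import gabor

def dominant_color_descriptor(rgb_image, n_colors=6):
    """DCD-like feature: 6 representative CIELUV colors plus their
    percentages (6 x 3 + 6 = 24 dimensions, as in the text)."""
    pixels = rgb2luv(rgb_image).reshape(-1, 3)
    km = KMeans(n_clusters=n_colors, n_init=10).fit(pixels)
    counts = np.bincount(km.labels_, minlength=n_colors)
    percentages = counts / counts.sum()
    return np.concatenate([km.cluster_centers_.ravel(), percentages])

def homogeneous_texture_descriptor(rgb_image):
    """HTD-like feature: mean/std of 6 Gabor responses (3 orientations x
    2 radial scales) plus the whole-image mean and std (14 dimensions)."""
    gray = rgb2gray(rgb_image)
    feats = []
    for frequency in (0.1, 0.3):                        # two radial scales (assumed)
        for theta in (0.0, np.pi / 3, 2 * np.pi / 3):   # three orientations (assumed)
            real, imag = gabor(gray, frequency=frequency, theta=theta)
            magnitude = np.hypot(real, imag)
            feats += [magnitude.mean(), magnitude.std()]
    feats += [gray.mean(), gray.std()]                  # whole-image statistics
    return np.asarray(feats)
```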


Fig. 4. A CE image with a polyp (a) and its reconstruction using only the 6 dominant colors computed by the DCD (b). The color information for the polyp, intestinal villi and normal lumen is all represented.

Fig. 5. An image with a lesion (a) and the corresponding edge features that depict the contour of the lesion (b).

Fig. 6. An image with bubbles (a) and the six Gabor filter responses (b).


Table 1
The summarization of the image sequence dataset consisting of six classes: bile, air bubbles, extraneous matter, lesions, normal lumen and polyps.

Class               # of sequences   # of images
Bile                107              530
Air bubbles         142              822
Extraneous matter   110              700
Lesions             184              748
Normal lumen        194              1657
Polyps              123              572

Fig. 7. The general HMM for CE image analysis containing two states and three observations.

Unlike conventional methods, where the features are combined and fed into a single classifier, in this study each feature is separately classified by a support vector machine (SVM) classifier using a radial basis function (RBF) kernel. An SVM classifier projects the original labeled training data into a high-dimensional feature space, where the data is separated into different classes by a hyperplane (\omega, b). Let X \in R_0 \subseteq R^n be the input vector, y \in \{-1, +1\} be the labels, and \Phi : R_0 \to F be the mapping from the input space to the feature space. The corresponding decision function is

f(X) = \operatorname{sign}(\langle \omega, \Phi(X) \rangle - b).    (1)

The RBF kernel is \kappa(x_i, x_j) = \exp(-\gamma \| x_i - x_j \|^2), where \gamma = 1/(2\sigma^2) is the kernel parameter, found using a grid search.
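As an illustration of the per-feature classification, the following sketch trains one RBF-kernel SVM per appearance attribute with a grid search over C and gamma, assuming scikit-learn; the feature matrices and variable names are illustrative.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def train_feature_svm(X, y):
    """Train one RBF-kernel SVM for a single feature type (color, edge or
    texture); gamma (and C) are found with a grid search, cf. Eq. (1)."""
    pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    grid = {"svc__C": [1, 10, 100], "svc__gamma": [1e-3, 1e-2, 1e-1, 1.0]}
    search = GridSearchCV(pipe, grid, cv=3)
    search.fit(X, y)                     # y in {-1, +1}
    return search.best_estimator_

# One classifier per appearance attribute; X_color etc. are the per-image
# feature matrices from Section 3.1 (illustrative names):
# svm_color = train_feature_svm(X_color, y)
# svm_edge = train_feature_svm(X_edge, y)
# svm_texture = train_feature_svm(X_texture, y)
```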

3.2. Sequence-based classification

We have previously presented a binary framework based on hidden Markov models (HMMs) [19] for image sequence classification [26]. In this study, we extend the binary model to a general model suitable for any number of appearance attributes, multiple classifiers, and multiple abnormalities. The key differences between [26] and the work below are: (a) the appearance attributes encoded in the model: the specification of the HMM is general, compared to the binary model in [26], where the observations are simply the result of an SVM classifier and are the same as the HMM states; and (b) the application of the framework: [26] only presented the framework for short image sequence analysis, whereas we perform the eventual clinical test, video synopsis of complete CE videos.

We specialize the HMM definition for CE study synopsis, shown in Fig. 7. The HMM elements are as follows:

1. Hidden states (alphabet set), denoted by S = {Positive, Negative} (N = 2), annotated by the ground truth and shown in Fig. 7:
• Positive (s1): a CE image labeled with the target class ("bile", "bubble", "extraneous", "lesion", "normal", or "polyp") in the ground truth.
• Negative (s2): a CE image not labeled as the positive case in the ground truth.
We denote a fixed state sequence of length T as Q = {q1, q2, ..., qT}.
2. Observations (alphabet set), denoted by V = {F_C, F_E, F_T} (M = 8), are the encoded features, shown in Fig. 8, where F_C, F_E, F_T represent the color, edge and texture features, respectively. For the three types of features, the number of observations is 2^3 = 8.

Each image is classified as exhibiting or lacking a specific attribute shown in Table 5, and encoded as shown in Fig. 8. The observation sequence corresponding to Q is O = {o1, o2, ..., oT}.
3. State transition matrix A = {a_ij, 1 ≤ i, j ≤ N}, a_ij = P(q_t = s_j | q_{t-1} = s_i), characterizes the temporal relationship between the hidden states of neighboring images.
4. Observation distribution matrix B = {b_j(k)}, b_j(k) = P(o_t = v_k | q_t = s_j), represents the probability of a specific observation being generated by a state. In our application, it is the probability of the features contained by a particular class.
5. Initial state distribution \pi = {\pi_i}, \pi_i = P(q_1 = s_i), is the probability of each hidden state occurring at the beginning of any sequence.

We use a first-order discrete HMM, which performs classification based on b_j(k) and applies the transition constraints a_ij. The process includes two stages, i.e., training and evaluation. Identical feature extraction and classification algorithms are applied to both training and testing. For model learning, six HMMs (one model per class) are trained using the current image sequence dataset. Color, edge and texture features are analyzed using SVM classifiers, and the classifier output vectors are then encoded to act as the input of the HMM. Fig. 8 summarizes the encoding of the features. Each HMM is trained using the standard Baum–Welch algorithm [4] (forward procedure), which determines the parameters \lambda = (A, B, \pi) that maximize the probability P(O | \lambda):

\lambda^{*} = \arg\max_{\lambda} P(O; A, B, \pi),    (2)

P(O; A, B, \pi) = \sum_{Q} P(O \mid Q; A, B, \pi) \, P(Q; A, B, \pi)
               = \sum_{Q} \Big( \prod_{t=1}^{T} P(o_t \mid q_t; B) \Big) \Big( \prod_{t=1}^{T} P(q_t \mid q_{t-1}; A) \Big)
               = \sum_{Q} \Big( \prod_{t=1}^{T} B_{q_t o_t} \Big) \Big( \prod_{t=1}^{T} A_{q_{t-1} q_t} \Big).    (3)
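A minimal training sketch for Eqs. (2)–(3), assuming the hmmlearn package (CategoricalHMM in hmmlearn ≥ 0.3; older versions expose the equivalent discrete model as MultinomialHMM). Each sequence holds the per-frame observation symbols (0–7) produced by the feature encoding.

```python
import numpy as np
from hmmlearn.hmm import CategoricalHMM  # assumes hmmlearn >= 0.3

def train_class_hmm(observation_sequences):
    """Fit a 2-state discrete HMM (Positive/Negative) over the 8 encoded
    observations with Baum-Welch, i.e. Eqs. (2)-(3)."""
    X = np.concatenate(observation_sequences).reshape(-1, 1)  # symbols in 0..7
    lengths = [len(seq) for seq in observation_sequences]
    model = CategoricalHMM(n_components=2, n_iter=100)
    model.fit(X, lengths)  # learns A (transmat_), B (emissionprob_), pi (startprob_)
    return model
```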

A generalized expectation maximization (EM) algorithm is then used to uncover the best structure of the hidden states. The standard Viterbi algorithm [24] is used to estimate the optimal sequence of states for a given observation sequence, which acts as the estimated class for each tested sequence:

Q^{*} = \arg\max_{Q} P(Q \mid O; A, B, \pi)
      = \arg\max_{Q} \frac{P(O, Q; A, B, \pi)}{\sum_{Q'} P(O, Q'; A, B, \pi)}
      = \arg\max_{Q} P(O, Q; A, B, \pi).    (4)
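For concreteness, a self-contained log-space Viterbi sketch of Eq. (4); the Positive/Negative state ordering is an assumption, and log-space arithmetic avoids underflow on long CE videos.

```python
import numpy as np

def viterbi(obs, A, B, pi):
    """Most likely state path Q* for an observation sequence (Eq. (4)).
    A: (N,N) transition matrix, B: (N,M) observation matrix, pi: (N,) prior."""
    logA, logB, logpi = np.log(A), np.log(B), np.log(pi)
    T, N = len(obs), len(pi)
    delta = np.zeros((T, N))           # best log-probability ending in each state
    psi = np.zeros((T, N), dtype=int)  # back-pointers
    delta[0] = logpi + logB[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + logA       # (from_state, to_state)
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + logB[:, obs[t]]
    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):                  # backtrack
        path[t] = psi[t + 1, path[t + 1]]
    return path  # e.g. 0 = Positive, 1 = Negative (assumed ordering)
```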

3.3. CE study synopsis

The sequence-based classification extends directly to study synopsis, as a video can be viewed as one long image sequence. For supervised study synopsis, multiple HMM models are first trained, one per class, using the image sequence dataset.


Fig. 8. The encoding scheme of the general HMM observations. Color, edge and texture features are encoded into 8 observations; 0 and 1 stand for absence and presence of a certain attribute.
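A one-line sketch of the Fig. 8 encoding, mapping the three binary per-feature SVM decisions to one of the 2^3 = 8 observation symbols; the bit order (color, edge, texture) is an assumption, and the actual assignment in Fig. 8 may differ.

```python
def encode_observation(color_pos, edge_pos, texture_pos):
    """Combine three binary attribute decisions (0 = absent, 1 = present)
    into one of the 8 HMM observation symbols (the Fig. 8 scheme)."""
    return (color_pos << 2) | (edge_pos << 1) | texture_pos
```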

The video (a long image sequence) is then classified into multiple labels using the trained models. The final result of CE study synopsis is shown in Fig. 2, which summarizes the CE video into multiple classes including abnormalities, extraneous matter and normal lumen. Algorithm 1 describes the entire procedure for CE study synopsis based on HMMs; a code sketch follows the listing.

Algorithm 1. CE study synopsis based on HMMs

Input: CE video O = {o1, o2, ..., oT}; multi-class HMMs
1: for i ← 1 to T do
2:     Extract color, edge and texture features {F_C^(i), F_E^(i), F_T^(i)} for the i-th image    ▷ Section 3.1
3:     Encode {F_C^(i), F_E^(i), F_T^(i)} into eight attributes as the observation of the HMMs
4: for k ← 1 to K do
5:     Infer the state sequence Q^(k) from the k-th HMM model using the Viterbi algorithm    ▷ Eq. (4)
6: Label the CE video with multiple classes
Output: CE video O with multiple labels Q
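The sketch below mirrors Algorithm 1, reusing encode_observation and viterbi from the earlier sketches. The extract_features helper, the SVM dictionary, the per-class parameter tuples (A, B, pi) (e.g. read off a trained model's transmat_, emissionprob_ and startprob_), and the assumption that state 0 is Positive are all illustrative.

```python
import numpy as np

def synopsis(video_images, extract_features, svms, class_hmms):
    """Algorithm 1 sketch: encode every frame, then let each per-class HMM
    label the whole video via Viterbi decoding. All names are illustrative."""
    symbols = []
    for img in video_images:
        f_color, f_edge, f_texture = extract_features(img)  # Section 3.1 features
        c = int(svms["color"].predict([f_color])[0] > 0)    # binary decisions
        e = int(svms["edge"].predict([f_edge])[0] > 0)
        t = int(svms["texture"].predict([f_texture])[0] > 0)
        symbols.append(encode_observation(c, e, t))
    symbols = np.asarray(symbols)

    labels = {}
    for name, (A, B, pi) in class_hmms.items():             # six trained HMMs
        path = viterbi(symbols, A, B, pi)
        labels[name] = path == 0       # frames decoded as the Positive state
    return labels                      # per-class boolean masks over the video
```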

4. Experiments

The current Johns Hopkins University CE archive contains 75 full-length CE studies (approximately 50,000 images each) and over 120,000 annotated images collected under a Johns Hopkins Medical Institutions (JHMI) Institutional Review Board (IRB) protocol. The selected images in these studies were reviewed and annotated by our expert clinical collaborators. The annotation procedure was extremely time-consuming and also required validation for internal consistency and rater bias. The image annotations include six classes, i.e. bile, air bubbles, extraneous matter, lesions, normal lumen and polyps.

Table 2
Complete CE studies – subject IDs for studies with manual expert annotations of the six classes.

Class               Count (subject IDs)
Bile                4 (25, 32, 46, 47)
Bubbles             2 (32, 47)
Extraneous matter   2 (37, 40)
Lesions             6 (25, 29, 36, 38, 40, 41)
Normal images       7 (32, 38–41, 44–47)
Polyps              1 (18)

The extraneous matter class includes stool, food, medicine and other foreign matter not of clinical interest. Ongoing work is adding bleeding, which will be evaluated in a follow-on study. The dataset used in this study, including image sequences and full-length CE studies, was extracted from this annotation archive. The image sequences were used to train the classifiers and HMMs, while the 13 CE studies were used for patient-level validation. The dataset includes 860 image sequences of neighboring images totaling 5029 images. The length of the image sequences varies from 2 to 10. Table 1 summarizes the image sequences and Fig. 9 shows some examples for each class. The 13 complete CE studies from different subjects were used for the patient-level evaluation (summarized in Table 2). For sequence classification and study synopsis, 3-fold cross validation was performed and the results were the average performance over three experiments. For patient-level validation, leave-one-subject-out validation was performed; a sketch of this protocol appears below. All experiments were performed with MATLAB (MathWorks Inc., Natick, MA) on a Windows workstation (quad-core 2.66 GHz, 4 GB RAM). The UBC HMM Toolbox (Murphy [17]) provided the HMM framework.
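A sketch of the leave-one-subject-out protocol, assuming scikit-learn's LeaveOneGroupOut; the fit and predict callables stand in for the training and inference steps described above.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut

def patient_level_validation(X, y, subject_ids, fit, predict):
    """Leave-one-subject-out protocol: every CE study (subject) is held out
    once while models are trained on the remaining studies."""
    logo = LeaveOneGroupOut()
    scores = []
    for train_idx, test_idx in logo.split(X, y, groups=subject_ids):
        model = fit(X[train_idx], y[train_idx])
        y_hat = predict(model, X[test_idx])
        scores.append(np.mean(y_hat == y[test_idx]))  # per-study accuracy, Eq. (5)
    return np.mean(scores)
```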

Fig. 9. Examples of CE image sequences with varying lengths – (a) bile, (b) extraneous matter, (c) large air bubble, (d) lesion, (e) normal lumen, and (f) polyps.


Table 3
Performance comparison between frame-based and sequence-based classification in terms of accuracy, precision and recall.

Class        Method           Accuracy   Precision   Recall
Bile         Frame-based      0.671      0.594       0.672
             Sequence-based   0.870      0.826       0.877
Bubble       Frame-based      0.769      0.752       0.781
             Sequence-based   0.903      0.903       0.917
Extraneous   Frame-based      0.678      0.703       0.710
             Sequence-based   0.870      0.860       0.873
Lesion       Frame-based      0.686      0.622       0.687
             Sequence-based   0.804      0.857       0.759
Normal       Frame-based      0.711      0.728       0.880
             Sequence-based   0.808      0.714       0.965
Polyp        Frame-based      0.767      0.735       0.744
             Sequence-based   0.938      0.932       0.956

Table 4
Average synopsis performance for the image sequence dataset.

Class        Accuracy   Precision   Recall
Bile         0.968      0.974       0.984
Bubbles      0.998      0.972       1.000
Extraneous   0.895      0.810       0.665
Lesions      0.685      0.582       0.549
Normal       0.854      0.838       0.980
Polyps       0.933      1.000       0.933

The performance of the framework was evaluated in terms of accuracy, precision, and recall. Only expert-labeled images were used for computing the total number of images. The number of positives was the number of images labeled with the target class, and tp and fn were also computed for these images alone. Patient-level accuracy was therefore measured using the total number of positives annotated by the expert and the positive or negative classification of images. Images that were not annotated by the expert were not used in the computation of these performance measures.

accuracy = \frac{tp + tn}{tp + tn + fp + fn},    (5)

precision = \frac{tp}{tp + fp},    (6)

recall = \frac{tp}{tp + fn},    (7)

where tp, tn, fp, fn denote true positive, true negative, false positive and false negative, respectively.

Sequence-based classification: Table 3 shows the performance comparison between frame-based and sequence-based multi-class classification. It can be seen that sequence-based classification outperformed frame-based classification for all classes. In particular, the best performance was achieved for polyps, with 0.938 accuracy, 0.932 precision and 0.956 recall. The accuracies increased by 22.4% and 20.1% for extraneous matter and polyps, respectively. Fig. 10 compares the


average performance of frame-based and sequence-based classification. The average computing time was 3.2 s per image, i.e., 44.4 h for a video containing 50,000 frames; however, the processing is fully automatic and needs no human inspection.

Study synopsis: The features related to each class are shown in Table 5; they represent the appearance attributes of each state for the HMMs. Table 4 shows the average performance of CE study synopsis. The bubbles class achieved the best performance, with 0.998 accuracy, 0.972 precision and 1.000 recall. Among the abnormalities, polyps achieved promising performance, while lesions need further improvement.

Patient-level validation: We performed a leave-one-subject-out validation for the patient-level evaluation. Table 6 shows the average performance for all six classes across the subjects. As our CE studies were only partially labeled by the expert, some performance measures, e.g. precision and recall, could not be computed for some classes. The computation of precision and recall requires positive classifications: for a given study, if there were no positive samples for a specific class, then the precision and recall for that class would be meaningless and were not computed. These invalid precision and recall values are marked as "Not Computed (NC)". As only one study contains polyps (subject ID 18), we did not perform the held-out validation for polyps. Table 6 shows that the method achieved the best performance for detecting bubbles, with 0.965 accuracy, 0.700 precision and 1.000 recall. Similar to the study synopsis results, normal images also achieved high recall (0.955). The poor performance on lesions may be due to the large variation in the lesion data.

5. Discussion

A framework for CE study analysis via video synopsis has been proposed. It can be applied to image sequence classification and video synopsis (labeling/annotating). In the experiments, the sequence-based classification outperformed the frame-based one in terms of accuracy and recall for each class. For normal lumen, the precision of frame-based classification

Fig. 10. The average performance comparison between frame-based and sequence-based classification in terms of accuracy (a), precision (b) and recall (c).


Table 5
The features and corresponding appearance attributes related to the six classes.

Feature   Bile       Bubbles     Extraneous matter   Lesions             Normal lumen     Polyps
Color     greenish   white-ish   varies              yellowish/reddish   normal villus    yellowish
Edges     few        circular    complex             localized           distributed      localized
Texture   plain      unique      disrupted villi     disrupted villi     smooth texture   unique

Table 6
Average patient-level (leave-one-subject-out) performance for the six classes.

Class        Accuracy   Precision   Recall
Bile         0.785      0.903       0.746
Bubbles      0.965      0.700       1.000
Extraneous   0.887      0.692       0.875
Lesions      0.648      0.546       0.389
Normal       0.852      0.838       0.955
Polyps       0.895      NC¹         NC¹

¹ Not computed.

was higher than that of sequence-based classification. This may be caused by the large number of normal lumen images compared to the other classes, resulting in an imbalanced dataset. For abnormalities (lesions and polyps), all three metrics improved with sequence-based classification, especially the precision. This indicates that the missing rate and the false alarm rate are both lower, which is preferable for medical applications.

For study synopsis (trained on the image sequence dataset), image sequences were first automatically classified, and classification performance was then assessed using the available ground truth. Below, we analyze the performance for each of the six classes:

1. Bile: the framework performs with an accuracy of 0.968 with high precision and recall. Given the prominent appearance characteristics of bile images, such good performance is expected and comparable to previously published methods for non-informative frames or bleeding.
2. Bubbles: synopsis for air bubbles provided the best performance; the three measurements all exceed 0.97. This performance again compares favorably with previously published methods.
3. Extraneous matter: the framework performance was promising, but may need further improvement. As the extraneous matter class includes various types, such as medicine, stool, reflections, etc., further specialization of this class (or additional training data) may be required to improve the currently acceptable (~0.9) accuracy.
4. Lesions: poor performance for the lesion class (accuracy 0.685, recall 0.549) may be due to the diversity of lesion images. Our data set contains mild, moderate and severe lesions all annotated with the same class, and appearance attributes vary significantly with severity. Further specialization of lesions may be needed to improve the performance.
5. Normal images: we achieved an accuracy >0.85 with high precision. In particular, the 0.980 recall indicates that most normal images were correctly recognized.
6. Polyps: we achieved an accuracy of 0.933 with high precision and recall, suggesting that the appearance attributes (color, contour, and texture) currently used are effective for identifying polyps.

The leave-one-subject-out validation achieved similar results. Bubble and polyp detection and annotation obtained promising results, but the performance for lesions needs further improvement. This may be caused by the varying sizes and positions of lesions, which make them difficult to detect.

There are general technical shortcomings that apply to the HMM-based video synopsis as well. First, the results of the HMMs depend on the output of the classifier in the frame-based

classification stage: poor classification performance generates poor video synopsis results, so the selection of frame-based classifiers is critical and not trivial. Secondly, accurate modeling of HMMs needs a large number of training samples; otherwise the intrinsic structure of the video cannot be revealed. Although we performed the patient-level evaluation on only 13 videos, the promising results demonstrate the feasibility of applying the framework in real clinical practice. Finally, for highly imbalanced data, the performance of both sequence-based classification and video synopsis degenerates; for such cases, HMMs may not be able to reveal the temporal pattern of the series data.

The proposed framework can be applied to a range of other video analysis tasks, for example video summarization/abstraction and time-series signal processing. The framework provides a video labeling/annotation fashion whose results could be further combined/fused at a higher level. It has great potential to dramatically reduce the review time for CE diagnosis and similar tasks.

6. Conclusions

We describe the general framework for the computational engine of our CAD workstation currently in development. It differs from previous work in both model building and application. Encoded color, edge and texture feature vectors are regarded as the HMM observations, which agrees with the manner in which human readers fuse multiple appearance cues. Experiments performed on complete CE videos achieved promising results. The CE videos were summarized into six classes, which include two abnormalities, i.e. lesions and polyps. We have also demonstrated the experimental results with leave-one-subject-out validation; the results show that our framework is robust. Compared to previous work, e.g. [27], which reported an accuracy of 0.917, this work achieves 0.933 accuracy and 0.933 recall for polyps. The framework still requires further refinement for some classes (e.g. accuracy 0.685 for lesions) on complete CE videos. These improvements will include specialization of some classes into sub-classes with more uniform appearance attributes. Our ultimate goal is to clinically deploy the CE CAD workstation, and we will continue investigating computational aspects, including other features and models. We are also evaluating our framework on a larger dataset with more classes, including bleeding, which was left out of this assessment.

References

[1] Alexandre L, Nobre N, Casteleiro J. Color and position versus texture features for endoscopic polyp detection. In: International conference on biomedical engineering and informatics (BMEI). 2008. p. 38–42.
[2] Bashar M, Mori K, Suenaga Y, Kitasaka T, Mekada Y. Detecting informative frames from wireless capsule endoscopic video using color and texture features. In: Medical image computing and computer-assisted intervention (MICCAI). 2008. p. 603–10.
[3] Bejakovic S, Kumar R, Dassopoulos T, Mullin G, Hager G. Analysis of Crohn's disease lesions in capsule endoscopy images. In: IEEE international conference on robotics and automation (ICRA). 2009. p. 2793–8.
[4] Boreczky J, Wilcox L. A hidden Markov model framework for video segmentation using audio and image features. In: Proceedings of the IEEE international conference on acoustics, speech and signal processing. 1998. p. 3741–4.
[5] Cristianini N, Shawe-Taylor J. An introduction to support vector machines and other kernel-based learning methods. 1st ed. Cambridge University Press; 2000.
[6] Cunha J, Coimbra M, Campos P, Soares J. Automated topographic segmentation and transit time estimation in endoscopic capsule exams. IEEE Trans Med Imaging 2008;27:19–27.


[7] Hai V, Echigo T, Sagawa R, Yagi K, Shiba M, Higuchi K, Arakawa T, Yagi Y. Adaptive control of video display for diagnostic assistance by analysis of capsule endoscopic images. In: 18th international conference on pattern recognition (ICPR 2006). 2006. p. 980–3.
[8] Htwe TM, Poh CK, Li L, Liu J, Ong EH, Ho KY. Vision-based techniques for efficient wireless capsule endoscopy examination. In: Defense science research conference and expo (DSR). 2011. p. 1–4, http://dx.doi.org/10.1109/DSR.2011.6026865.
[9] Iakovidis D, Spyrou E, Diamantis D. Efficient homography-based video visualization for wireless capsule endoscopy. In: IEEE 13th international conference on bioinformatics and bioengineering (BIBE). 2013. p. 1–4, http://dx.doi.org/10.1109/BIBE.2013.6701598.
[10] Iddan G, Meron G, Glukhovsky A, Swain P. Wireless capsule endoscopy. Nature 2000;405:417.
[11] Kumar R, Rajan P, Bejakovic S, Seshamani S, Mullin G, Dassopoulos T, Hager G. Learning disease severity for capsule endoscopy images. In: IEEE international symposium on biomedical imaging (ISBI): from nano to macro. 2009. p. 1314–7.
[12] Kumar R, Zhao Q, Seshamani S, Mullin G, Hager G, Dassopoulos T. Assessment of Crohn's disease lesions in wireless capsule endoscopy images. IEEE Trans Biomed Eng 2012;59:355–62.
[13] Lau P, Correia P. Detection of bleeding patterns in WCE video using multiple features. In: International conference of the IEEE engineering in medicine and biology society (EMBS). 2007. p. 5601–4.
[14] Liu H, Pan N, Lu H, Song E, Wang Q, Hung CC. Wireless capsule endoscopy video reduction based on camera motion estimation. J Digit Imaging 2013;26:287–301, http://dx.doi.org/10.1007/s10278-012-9519-x.
[15] Mackiewicz M, Berens J, Fisher M. Wireless capsule endoscopy color video segmentation. IEEE Trans Med Imaging 2008;27:1769–81.
[16] Manjunath B, Ohm J, Vasudevan V, Yamada A. Color and texture descriptors. IEEE Trans Circuits Syst Video Technol 2001;11:703–15.
[17] Murphy K. A hidden Markov model (HMM) toolbox for Matlab. Software; 1998, last accessed February 2012.
[18] Poh CK, Htwe TM, Li L, Shen W, Liu J, Lim JH, Chan KL, Tan PC. Multi-level local feature classification for bleeding detection in wireless capsule endoscopy images. In: IEEE conference on cybernetics and intelligent systems (CIS). 2010. p. 76–81.


[19] Rabiner L. A tutorial on hidden Markov models and selected applications in speech recognition. Proc IEEE 1989;77:257–86.
[20] Seshamani S, Kumar R, Rajan P, Mullin G, Dassopoulos T, Hager G. A meta method for image matching. IEEE Trans Med Imaging 2011, accepted.
[21] Spyrou E, Diamantis D, Iakovidis D. Panoramic visual summaries for efficient reading of capsule endoscopy videos. In: 2013 8th international workshop on semantic and social media adaptation and personalization (SMAP). 2013. p. 41–6, http://dx.doi.org/10.1109/SMAP.2013.21.
[22] Swain P. Wireless capsule endoscopy. Gut 2003;52:48–50.
[23] Vilarino F, Spyridonos P, DeIorio F, Vitria J, Azpiroz F, Radeva P. Intestinal motility assessment with video capsule endoscopy: automatic annotation of phasic intestinal contractions. IEEE Trans Med Imaging 2010;29:246–59.
[24] Viterbi A. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Trans Inf Theory 1967;13:260–9.
[25] Yi S, Jiao H, Xie J, Peter M, Leighton JA, Pasha S, Abedi M. A clinically viable capsule endoscopy video analysis platform for automatic bleeding detection. In: Proc. SPIE 8670, medical imaging 2013: computer aided diagnosis. 2013. p. 867028–8-6.
[26] Zhao Q, Dassopoulos T, Mullin G, Hager G, Meng MQ, Kumar R. Towards integrating temporal information in capsule endoscopy image analysis. In: International conference of the IEEE engineering in medicine and biology society (EMBC). 2011.
[27] Zhao Q, Dassopoulos T, Mullin G, Meng M, Kumar R. A decision fusion strategy for polyp detection in capsule endoscopy. Stud Health Technol Inform 2012;173:559–65.
[28] Zhao Q, Meng MQH. Novel detection strategy for abnormalities in WCE video clips. In: 2010 Annual international conference of the IEEE engineering in medicine and biology society (EMBC). 2010. p. 4084–7.
[29] Zhao Q, Meng MQH. A strategy to abstract WCE video clips based on LDA. In: 2011 IEEE international conference on robotics and automation (ICRA). 2011. p. 4145–50.
[30] Zhao Q, Meng MQH, Li B. WCE video clips segmentation based on abnormality. In: 2010 IEEE international conference on robotics and biomimetics (ROBIO). 2010. p. 442–7.
[31] Zhao Q, Mullin GE, Dassopoulos T, Kumar R. Towards using multiple images simultaneously for automated assessment in capsule endoscopy; 2012.

