Automatic Segmentation of the Rima Glottidis in 4D Laryngeal CT Scans in Parkinson's Disease

Sajini Hewavitharanage1, Jayavardhana Gubbi1, Dominic Thyagarajan2, Ken Lau2 and Marimuthu Palaniswami1

Abstract— Parkinson's disease (PD) is a progressive, incurable neurodegenerative disease. Symptoms appear when approximately 70% of mid-brain dopaminergic neurons have died. Temporal analysis of the calculated area of the rima glottidis may give an indication of vocal impairment. In this paper, we present an automatic algorithm to segment the rima glottidis from 4D CT images using texture features and support vector machines (SVM). Automatic two-dimensional region growing is then applied as a post-processing step to segment the area accurately. The proposed algorithm produced accurate segmentations, and we demonstrate a high correlation between the manually and automatically segmented areas.

I. INTRODUCTION

Parkinson's disease (PD) is a progressive neurodegenerative disorder in which there is striatal dopamine deficiency due to loss of mid-brain neurons in the substantia nigra. Options for routine diagnostic imaging are limited, and the diagnosis is usually based on history and clinical examination. Better diagnostic tools are needed. About 72%-89% of patients with PD suffer from voice-related disorders [1][2]. In PD, the voice is characterized perceptually by limited pitch and loudness variability, breathiness, harshness and reduced loudness [1]. These vocal abnormalities may therefore be useful as indicators of the onset and progression of the illness. The distance between the arytenoid cartilages during a vocalisation task (repeatedly uttering the monosyllable /i/), measured manually on computerised tomography (CT) images, has revealed vocal cord hypokinesis in early PD [3]. However, such manual measurements are laborious, and detection of the arytenoid cartilage in CT is a non-trivial task: the arytenoid's appearance may be masked by other cartilages and soft tissues in the CT image. The cricoid cartilage, the thyroid cartilage and the air-outlined rima glottidis are the three most prominent features clearly visible in CT images of the vocal tract region. This paper deals with the accurate detection of the air-filled rima glottidis in a 4D CT image using an automated algorithm.

Various methods have been published to segment the airways from medical images; however, segmentation of the glottis has not been addressed. Textural features have been used as successful discriminative features for organ and tissue identification in medical images for many years [4][5][6], with varying results. In statistical methods, feature descriptors are derived from gray-level relationship matrices such as the gray-level co-occurrence matrix (GLCM), the gray-level dependence matrix (GLDM) or the gray-level run length matrix (GLRLM) of the image [7][5]. Several papers have analyzed human lungs using texture features [8][9][10]. In 1992, Wu et al. [11] derived texture features such as GLDM, the Fourier spectrum and spatial gray-level dependence matrices (SGLDM) and classified ultrasound liver images with 90% classification accuracy. In 2004, Raicu et al. [12] investigated the most discriminative texture features for differentiating human organs: they analyzed Haralick and GLRLM texture descriptors in CT images of five different organs (heart, liver, kidney, spleen and backbone) with good results. Tesar et al. [6] provided a three-dimensional extension to the Haralick features, computed from the GLCM of volumetric abdominal CT data. In 2014, Torheim et al. [13] used second-order feature descriptors derived from the GLCM of MRI images to classify the status of cervical cancers with 70% accuracy.

In this paper, we use GLDM-based features to segment the glottis region from laryngeal CT images using support vector machines. The output of the SVM is used as a seed point for a region growing algorithm, which results in accurate delineation of the rima glottidis area. The automatically calculated areas in different temporal frames are compared with the manually segmented areas and appear to be concordant.

1 S. Hewavitharanage, J. Gubbi and M. Palaniswami are with the Department of Electrical and Electronic Engineering, The University of Melbourne, Vic 3010, Australia ([email protected], [email protected], [email protected]).
2 D. Thyagarajan and K. Lau are with the Monash Medical Centre, 246 Clayton Rd, Clayton, Vic 3168, Australia ([email protected], [email protected]).

978-1-4244-9270-1/15/$31.00 ©2015 IEEE

II. DATA

PD patients with a disease duration of less than 6 years and controls of similar age (50-90 years) were recruited from the Movement Disorder Clinic at Monash Medical Centre. Subjects were instructed to make 5 short, fast /i/ phonemes during a 5-second period. Every 100 milliseconds, images of the entire larynx were acquired using a 320-slice CT machine. The resulting images were 512×512 pixels with 12-bit gray levels. Images that were blurred or had low contrast were excluded, and the remaining 4D datasets of 20 subjects (patients and controls) were used in the experiment. Ground truth rima glottidis area values were provided for each subject by manual segmentation performed by two expert analysts blinded to the disease state of the subjects.


III. METHODOLOGY

The flow of the proposed methodology is shown in Fig. 1. The process involves accurate detection of the rima glottidis using texture features and a support vector machine classifier. Once the region is narrowed down, a seed point for region growing is automatically calculated and the rima glottidis area is grown from it. Amongst the several slices in each frame, a scheme for selecting the glottis plane is developed, and the area of the rima glottidis in the selected plane is calculated to generate the time series. The Ethics Committee of Monash Medical Centre approved all experimental procedures involving human CT images (Application No: 11230B).

Fig. 1. Rima glottidis area detection framework.

A. Feature Extraction

GLDM features [5] were used to represent the texture of the rima glottidis. In GLDM, first-order statistics are computed from the absolute gray-level difference between two pixels of the image. Let I(x, y) be the image; for a displacement δ = (∆x, ∆y), let Iδ(x, y) = |I(x, y) − I(x + ∆x, y + ∆y)|. If p(i | δ) is the estimated probability density function (PDF) of Iδ(x, y) and NG is the number of gray levels, each component of p(i | δ) gives the probability of the i-th gray-level difference. When the image has a relatively coarse texture, the displacement δ is small compared to the size of the texture elements, so Iδ(x, y) is small and the PDF is concentrated near the 0th gray level. For a finer texture, δ is large relative to the texture elements and the PDF has a broader range. To analyze a directional texture, the amount of spread in the PDF should be calculated for various directions of δ. In this work, five different displacements (δ = 1, 2, 3, 4, 5 pixels) in the 4 forms (0, δ), (−δ, δ), (δ, 0) and (−δ, −δ), giving the 4 directions 0°, 45°, 90° and 135°, were used. Five properties, Contrast, Entropy, Mean, Angular Second Moment and Inverse Difference Moment, were calculated to characterize the texture of the images:

    Contrast = \sum_{i=0}^{N_G - 1} i^2 p(i | δ)                                (1)

    Entropy = -\sum_{i=0}^{N_G - 1} p(i | δ) \log p(i | δ)                      (2)

    Mean = \sum_{i=0}^{N_G - 1} i p(i | δ)                                      (3)

    Angular Second Moment = \sum_{i=0}^{N_G - 1} p(i | δ)^2                     (4)

    Inverse Difference Moment = \sum_{i=0}^{N_G - 1} p(i | δ) / (i^2 + 1)       (5)

All displacements and angles together gave 20 different PDFs, and calculating the above descriptors for each PDF resulted in 100 texture feature descriptors per image window. To save time and computational cost, the number of gray levels was reduced to 256 before calculating the gray-level difference matrices. The sliding window technique was used to extract the feature vectors of the original image. A 96 × 96-pixel window was used as the sliding block over the 512 × 512-pixel image, and each block overlapped its neighbouring block by 8 pixels. The window size was chosen empirically so that the whole airway region fits inside a single 96 × 96-pixel block. The feature vectors of each block were normalized using z-scores.

B. Classification

Because of its good performance in high-dimensional feature spaces, which is typically the case in texture analysis, a support vector machine (SVM) classifier was chosen for automatic classification. The classifier was implemented in Matlab 2014b using the LibSVM library [14]. Feature vectors calculated on the training set were used to train the SVM. Ten-fold cross validation was performed with linear, polynomial and radial basis function (RBF) kernels to select the optimal kernel and kernel parameters. Results for the three kernels are listed in Table I. The RBF kernel with C = 2^5.5 and γ = 2^−6.5 gave the best performance. To reduce false-positive detections, only predictions with probabilities greater than 0.9 (chosen empirically) were considered meaningful detections.

C. Removal of False Positives

The output of the SVM contained a significant number of false positives. To decide the optimal detection window, a threshold-based technique with a threshold TS was applied to the detection windows. First, to remove partial detections, any window having boundary pixels with gray level smaller than TS was discarded. Then, the window having the maximum number of darker pixels (gray level smaller than TS)
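The GLDM feature computation of Section III-A can be sketched in a few lines of NumPy. This is our own minimal illustration, not the paper's Matlab code; it assumes the input block has already been quantised to 256 gray levels, and the function names are ours:

```python
import numpy as np

def difference_image(img, dx, dy):
    """Absolute gray-level difference I_delta(x, y) = |I(x, y) - I(x+dx, y+dy)|."""
    h, w = img.shape
    x0, x1 = max(0, -dx), min(w, w - dx)
    y0, y1 = max(0, -dy), min(h, h - dy)
    a = img[y0:y1, x0:x1].astype(np.int64)
    b = img[y0 + dy:y1 + dy, x0 + dx:x1 + dx].astype(np.int64)
    return np.abs(a - b)

def gldm_features(img, dx, dy, n_levels=256):
    """The five GLDM descriptors (Eqs. 1-5) for one displacement delta = (dx, dy)."""
    diff = difference_image(img, dx, dy)
    # Estimated PDF p(i | delta) of the absolute gray-level differences
    p = np.bincount(diff.ravel(), minlength=n_levels)[:n_levels] / diff.size
    i = np.arange(n_levels)
    nz = p > 0
    return np.array([
        np.sum(i ** 2 * p),              # Contrast, Eq. (1)
        -np.sum(p[nz] * np.log(p[nz])),  # Entropy, Eq. (2)
        np.sum(i * p),                   # Mean, Eq. (3)
        np.sum(p ** 2),                  # Angular Second Moment, Eq. (4)
        np.sum(p / (i ** 2 + 1)),        # Inverse Difference Moment, Eq. (5)
    ])

def feature_vector(block):
    """100-dimensional descriptor: 5 distances x 4 directions x 5 features."""
    feats = []
    for d in (1, 2, 3, 4, 5):
        for dx, dy in ((0, d), (-d, d), (d, 0), (-d, -d)):
            feats.append(gldm_features(block, dx, dy))
    return np.concatenate(feats)
```

Applied to each 96 × 96 sliding block, `feature_vector` yields the 100 descriptors described above (to be z-scored before classification).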


TABLE I
RESULTS OF CLASSIFICATION USING 10-FOLD CROSS VALIDATION

Kernel        C        γ         Precision (%)   Recall (%)   fScore   Accuracy (%)
Linear        2^10     2         99.28           82.30        0.90     99.87
Polynomial    2^20     2^−8      99.32           87.03        0.93     99.82
RBF           2^5.5    2^−6.5    99.44           87.45        0.93     99.86
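The classifier itself was trained in Matlab with LibSVM. Purely as an illustrative sketch, an equivalent setup in Python with scikit-learn (which also wraps LIBSVM) might look as follows; the feature matrix here is a random stand-in for the real 100-dimensional GLDM vectors, and only the kernel choice, the C and γ values, and the 0.9 probability cut-off come from the text and Table I:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Stand-in data: 200 windows x 100 GLDM features (z-scored), binary labels
X = rng.standard_normal((200, 100))
y = np.repeat([0, 1], 100)
X[y == 1] += 1.0  # shift the "glottis" class so it is learnable

# RBF kernel with the parameters selected by 10-fold cross validation
clf = SVC(kernel="rbf", C=2**5.5, gamma=2**-6.5, probability=True)
scores = cross_val_score(clf, X, y, cv=10)

# Keep only confident detections (posterior probability > 0.9)
clf.fit(X, y)
proba = clf.predict_proba(X)
confident = proba.max(axis=1) > 0.9
```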

was taken as the airway window for the respective image slice. When counting the darker pixels, background pixels (gray level equal to Tmin) were not taken into consideration. To calculate Tmin automatically, all border pixels of the original 512 × 512 image were scanned and the minimum gray value was taken as Tmin. For the dataset used in this experiment, the value of Tmin was −2048. To calculate TS, Otsu's optimal thresholding method [15][16] was used: the background was first removed from the image using the Tmin threshold, and optimal thresholding was then applied to the resulting image to find the threshold that separates air voxels from non-air voxels. This algorithm requires an initial threshold as input. According to Wu et al. [17], air in a CT image appears with a mean Hounsfield value of −1000 HU, airway and lung voxels have values between −910 HU and −500 HU, and bone, cartilage, blood and other body voxels have values well above −500 HU. The initial threshold was therefore calculated based on these values. For the dataset we used, any initial threshold between 0 HU and −1000 HU yielded a calculated TS of −436 HU.
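The two thresholds can be computed as a short NumPy sketch (our own function names; Otsu's between-class-variance criterion is written out explicitly rather than taken from a library):

```python
import numpy as np

def compute_t_min(ct_slice):
    """T_min: minimum gray value among the border pixels of the slice."""
    border = np.concatenate([ct_slice[0], ct_slice[-1],
                             ct_slice[:, 0], ct_slice[:, -1]])
    return border.min()

def compute_t_s(ct_slice, n_bins=256):
    """T_S via Otsu's method after removing the background (values <= T_min)."""
    t_min = compute_t_min(ct_slice)
    vals = ct_slice[ct_slice > t_min].astype(float)
    hist, edges = np.histogram(vals, bins=n_bins)
    p = hist / hist.sum()
    centers = 0.5 * (edges[:-1] + edges[1:])
    w0 = np.cumsum(p)            # class probability below the candidate threshold
    mu = np.cumsum(p * centers)  # cumulative first moment
    w1 = 1.0 - w0
    valid = (w0 > 0) & (w1 > 0)
    # Between-class variance; Otsu's threshold maximizes it
    sigma_b = np.zeros_like(w0)
    sigma_b[valid] = (mu[-1] * w0[valid] - mu[valid]) ** 2 / (w0[valid] * w1[valid])
    return centers[np.argmax(sigma_b)]
```

On a slice padded with −2048 and containing air near −1000 HU and tissue near 0 HU, the returned threshold falls between the two modes, consistent with the −436 HU reported above.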

Fig. 2. Rima glottidis delineation output (Subjects 1, 7 and 15).
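A simplified illustration of how these thresholds feed the subsequent steps, seed selection from the darkest non-background pixel, 4-connected 2D region growing, and area computation from the voxel spacing, is sketched below. This is our own minimal version, not the implementation adapted from [18], and the voxel spacing values in the usage are stand-ins for the NIfTI-1 header values:

```python
import numpy as np
from collections import deque

def grow_airway(ct_slice, t_min, t_s):
    """Grow the airway region from the darkest non-background pixel.

    Pixels with value <= t_min are background; pixels below t_s count as air.
    """
    fg = ct_slice > t_min
    # Seed: lowest gray level among foreground pixels
    masked = np.where(fg, ct_slice, np.inf)
    seed = np.unravel_index(np.argmin(masked), ct_slice.shape)
    grown = np.zeros(ct_slice.shape, dtype=bool)
    grown[seed] = True
    queue = deque([seed])
    h, w = ct_slice.shape
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if (0 <= ny < h and 0 <= nx < w and not grown[ny, nx]
                    and fg[ny, nx] and ct_slice[ny, nx] < t_s):
                grown[ny, nx] = True
                queue.append((ny, nx))
    return grown

def airway_area(mask, voxel_w, voxel_h):
    """Area = number of airway pixels x voxel width x voxel height."""
    return mask.sum() * voxel_w * voxel_h
```

For example, growing a synthetic 8 × 6-pixel air pocket with t_min = −2048 and t_s = −436 recovers exactly those 48 pixels, and the area follows from the (hypothetical) 0.5 mm voxel spacing.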

D. Region Growing

An automatic 2D region growing was used to extract the boundary of the airway in the image slice resulting from the false-positive removal step. The region growing algorithm was adapted from the algorithm proposed by van Rikxoort et al. [18]. First, all background pixels were excluded from the image slice using Tmin, and the seed point was selected as the pixel with the lowest gray level. The area was then grown using the threshold TS.

E. Calculation of Area and Generation of Temporal Maps

Once the region was accurately segmented, the areas were calculated using the voxel information in the original NIfTI-1 header: the area of the airway is the product of the number of pixels (the output of region growing), the voxel width and the voxel height. From the detected airway images, the rima glottidis plane was identified using known anatomical features of the human larynx. The rima glottidis is the narrowest part of the laryngeal cavity, so the slice with the minimum number of airway pixels can be assumed to be the rima glottidis plane. To minimize errors, however, a number of slices around the minimum-area slice were considered, and the average area of those slices (5 slices) was taken to generate the temporal maps. All values were normalized with respect to the subject before plotting the temporal maps.

IV. RESULTS AND DISCUSSION

The results of the delineation of the rima glottidis are shown in Fig. 2. The overall results were validated using percentage area overlap, and 85% of the subjects showed over 80% area overlap, as shown in Table II.

TABLE II
PERCENTAGE AREA OVERLAP FOR ALL THE SUBJECTS

Subject ID   Area Overlap %   Subject ID   Area Overlap %
1            81.32            11           98.06
2            99.25            12           96.20
3            99.28            13           64.23
4            99.21            14           93.31
5            93.72            15           99.26
6            65.76            16           97.61
7            98.92            17           96.73
8            98.60            18           98.23
9            97.30            19           25.90
10           98.26            20           88.55

Furthermore, temporal maps were drawn for all the controls and the patients, as shown in Figures 3 and 4. Four similarity indices, the correlation coefficient, mean squared error, Euclidean distance and dynamic time warping distance, were calculated between the generated curve and the ground truth curve for each subject; they are summarized in Table III.

TABLE III
SIMILARITY INDICES CALCULATED FOR ALL THE SUBJECTS

Subject ID   Correlation    Mean Squared    Euclidean         Dynamic Time Warping
             Coefficient    Error (mm^4)    Distance (mm^2)   Distance (mm^2)
1             0.90          0.03            1.26               2.62
2            -0.33          0.20            3.10               5.50
3             0.98          0.01            0.62               2.37
4             0.06          0.10            2.24               7.14
5             0.01          0.14            2.60               9.12
6             0.45          0.11            2.37              11.38
7             0.89          0.03            1.19               4.40
8             0.88          0.02            1.08               3.78
9             0.56          0.04            1.43               3.47
10            0.96          0.00            0.47               1.14
11            0.91          0.02            0.93               3.47
12            0.85          0.03            1.30               5.22
13           -0.13          0.16            2.81               8.98
14            0.64          0.06            1.61               3.91
15            0.73          0.06            1.69               3.03
16            0.91          0.03            1.07               3.32
17           -0.16          0.20            3.10               7.44
18            0.76          0.15            2.69              13.94
19           -0.47          0.20            3.13              17.24
20            0.74          0.02            1.08               3.85

55% of the subjects showed an excellent similarity to their ground truth area, while another 10% were moderately correlated with the ground truth. However, 35% of the subjects were found to be uncorrelated. The main reason was found to be the low quality of the images due to the noise

and blurring (subject 5). Furthermore, the appearance of laryngeal ventricles in some image slices (subjects 2, 4, 6, 9 and 17) resulted in failure. The results reflect the fact that, for about 65% of the patients, vocal fold plane detection, which is a non-trivial process, may not be required at all. In future work, we plan to detect the plane of the vocal folds, thereby improving the results of the proposed algorithm.

Fig. 3. Time-series of rima glottidis area for controls. Red indicates ground truth and blue indicates calculated values.

Fig. 4. Time-series of rima glottidis area for PD patients. Red indicates ground truth and blue indicates calculated values.

V. CONCLUSIONS

In this paper, an automatic algorithm based on texture features was proposed to segment the rima glottidis from laryngeal CT scans. Gray-level difference statistics in the CT images were used, and an automated algorithm based on support vector machines was developed to detect the airway. A 2D region growing was then used to find the area inside the detected airway. Using knowledge of human laryngeal anatomy, the vocal fold plane was identified and time-series curves of the rima glottidis area were generated. The algorithm reported results well correlated with the ground truth for 65% of the subjects. However, the algorithm did not perform well on scans that were blurred, misshaped or contained laryngeal ventricles. As this is the first work reported in this area, future work will focus on detecting the airway after adjusting for the vocal fold plane and on further improving the algorithm to achieve better results.

REFERENCES

[1] R. J. Holmes, J. M. Oates, D. J. Phyland, and A. J. Hughes, "Voice characteristics in the progression of Parkinson's disease," International Journal of Language and Communication Disorders, vol. 35, pp. 407–418, 2000.
[2] M. Liotti, L. O. Ramig, D. Vogel, P. New, C. I. Cook, R. J. Ingham, J. C. Ingham, and P. T. Fox, "Hypophonia in Parkinson's disease: neural correlates of voice treatment revealed by PET," Neurology, vol. 60, pp. 432–440, 2003.
[3] L. Perju-Dumbrava, K. Lau, D. Phyland, P. Finlay, R. Beare, P. Bardin, P. Stuckey, P. Kempster, and D. Thyagarajan, "Vocal cords are hypokinetic in Parkinson's disease," Departments of Neuroscience, Respiratory Medicine and Radiology, Monash Medical Centre, Clayton, Victoria, Australia, Tech. Rep., 2014.
[4] R. M. Haralick, K. Shanmugam, and I. Dinstein, "Textural features for image classification," IEEE Transactions on Systems, Man and Cybernetics, vol. SMC-3, no. 6, pp. 610–621, 1973.
[5] A. H. Mir, M. Hanmandlu, and S. N. Tandon, "Texture analysis of CT images," IEEE Engineering in Medicine and Biology Magazine, vol. 14, no. 6, pp. 781–786, 1995.
[6] L. Tesar, A. Shimizu, D. Smutek, H. Kobatake, and S. Nawano, "Medical image analysis of 3D CT images based on extension of Haralick texture features," Computerized Medical Imaging and Graphics, vol. 32, no. 6, pp. 513–520, 2008.
[7] G. Castellano, L. Bonilha, L. M. Li, and F. Cendes, "Texture analysis of medical images," Clinical Radiology, vol. 59, no. 12, pp. 1061–1069, 2004.
[8] Y. P. Chien and K.-S. Fu, "Recognition of X-ray picture patterns," IEEE Transactions on Systems, Man and Cybernetics, vol. SMC-4, no. 2, pp. 145–156, 1974.
[9] E. L. Hall, R. P. Kruger, and A. F. Turner, "An optical-digital system for automatic processing of chest X-rays," Optical Engineering, vol. 13, no. 3, 1974, doi: 10.1117/12.7971702.
[10] A. F. Turner, R. P. Kruger, and W. B. Thompson, "Automated computer screening of chest radiographs for pneumoconiosis," Investigative Radiology, vol. 11, no. 4, pp. 258–266, 1976.
[11] C.-M. Wu, Y.-C. Chen, and K.-S. Hsieh, "Texture features for classification of ultrasonic liver images," IEEE Transactions on Medical Imaging, vol. 11, no. 2, pp. 141–152, 1992.
[12] D. S. Raicu, J. D. Furst, D. Channin, D. Xu, A. Kurani, and S. Aioanei, "A texture dictionary for human organs tissues' classification," in Proceedings of the 8th World Multiconference on Systemics, Cybernetics and Informatics, 2004, pp. 18–21.
[13] T. Torheim, E. Malinen, K. Kvaal, H. Lyng, U. G. Indahl, E. K. F. Andersen, and C. M. Futsaether, "Classification of dynamic contrast enhanced MR images of cervical cancers using texture analysis and support vector machines," IEEE Transactions on Medical Imaging, vol. 33, no. 8, pp. 1648–1656, 2014.
[14] C.-C. Chang and C.-J. Lin, "LIBSVM: A library for support vector machines," ACM Transactions on Intelligent Systems and Technology, vol. 2, no. 3, 2011.
[15] N. Otsu, "A threshold selection method from gray-level histograms," IEEE Transactions on Systems, Man and Cybernetics, vol. 9, no. 1, pp. 62–66, 1979.
[16] S. Hu, E. A. Hoffman, and J. M. Reinhardt, "Automatic lung segmentation for accurate quantitation of volumetric X-ray CT images," IEEE Transactions on Medical Imaging, vol. 20, no. 6, pp. 490–498, 2001.
[17] M. T. Wu, J. M. Chang, A. A. Chiang, J. Y. Lu, H. K. Hsu, W. H. Hsu, and C. F. Yang, "Use of quantitative CT to predict postoperative lung function in patients with lung cancer," Radiology, vol. 191, pp. 257–262, 1994.
[18] E. M. van Rikxoort, B. de Hoop, S. van de Vorst, M. Prokop, and B. van Ginneken, "Automatic segmentation of pulmonary segments from volumetric chest CT scans," IEEE Transactions on Medical Imaging, vol. 28, no. 4, pp. 621–630, 2009.
