CME JOURNAL OF MAGNETIC RESONANCE IMAGING 00:00–00 (2013)

Original Research

Automatic Detection of Anatomical Landmarks on the Knee Joint Using MRI Data

Ning Xue, MSc,1,2* Michael Doellinger, PhD,1 Charles P. Ho, MD, PhD,3 Rachel K. Surowiec, MSc,3 and Raphael Schwarz, PhD2

Purpose: To propose a new automated learning-based scheme for locating anatomical landmarks on the knee joint using three-dimensional (3D) MR image data.

Materials and Methods: This method makes use of interest points as candidates for landmarks. All candidates are evaluated by a "coarse to fine" 3D feature descriptor computed from manually placed landmarks in training datasets. The results are refined using a multi-classifier boosting system. We demonstrate our method by the detection of 24 landmarks on the knee joint of 35 subjects. To verify the robustness, the test datasets differ in contrast, resolution, patient positioning, and health condition of the knee joint. The proposed method is evaluated by measuring the distance between manually placed landmarks and automatically detected landmarks and the computational cost for detecting one landmark in a 3D dataset.

Results: The results reveal that the method is capable of localizing landmarks with a reasonable accuracy (1.64 ± 1.03 mm [mean ± standard deviation]), sensitivity (97%), and run time efficiency (4.82 s).

Conclusion: This study suggests that the proposed method is an accurate and robust approach for automated landmark detection in various MR datasets. The proposed method can be used as the initialization or constraint in higher level medical image processing workflows such as kinematic description, segmentation, and registration.

Key Words: landmark detection; interest points; feature description; boosting

J. Magn. Reson. Imaging 2013;00:000–000. © 2013 Wiley Periodicals, Inc.

1Department of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Erlangen, Germany.
2Image and Therapy Division, Healthcare Sector, Siemens AG, Erlangen, Germany.
3Steadman Philippon Research Institute, Vail, Colorado, USA.
Contract grant sponsor: Healthcare Sector, Siemens AG.
*Address reprint requests to: N.X., Allee am Röthelheimpark 2, 91052 Erlangen, Germany. E-mail: [email protected]
Received May 25, 2013; Accepted October 23, 2013.
DOI 10.1002/jmri.24516
View this article online at wileyonlinelibrary.com.
© 2013 Wiley Periodicals, Inc.


Figure 1. Twenty-three landmarks defined on the surfaces of the three bones in the knee joint (one landmark, the posterior central patella, is not shown because of the limited views). The underlined landmarks are used to indicate the location of the bone-cartilage interface and the other landmarks are used to describe knee kinematics.

ANATOMICAL LANDMARKS ARE biologically meaningful points in the anatomy, which are used for establishing correspondences within the same species. Examples of anatomical landmarks are the fingertip, the mouth corner, the top of the skull, etc. Such landmarks are widely used as initialization or constraints in medical image analysis (1). Reliable landmark detection provides the essential groundwork for higher level image processing, e.g., segmentation and registration of anatomical structures (2,3), which can be used for the assessment of the morphology and function of human organs and structures in clinical routine. However, locating landmarks on anatomical structures manually is usually a challenging and time-consuming task that relies largely on expert knowledge. Thus, there is interest in developing automated methods to detect landmarks.

Anatomical landmarks can be defined on different tissue interfaces in medical images according to various clinical routines or studies. In this work we demonstrate the proposed landmark detection method on the knee joint in MR image data, because MR is one of the most frequently used clinical imaging modalities and musculoskeletal scans are among the most commonly requested routines in MR imaging (4). Knee landmarks defined on MR images can be used to define anatomical coordinate systems for the kinematic description of the knee joint and to outline tissues of clinical interest (such as the articular cartilage) for diagnosing knee pathologies (5). The goal of this work is to automatically localize anatomical landmarks of clinical interest on the knee joint in three-dimensional (3D) MR image data.

Several methods of automated landmark detection have been proposed for medical images. The methods described in (6–8) involve an optimization problem in detecting landmarks or morphological abnormalities. These methods offer good landmark detection accuracy (within five pixels). However, a manual adjustment of the optimization parameters is required for medical image data with different contrasts. The principal reason is that the optimization is a global algorithm based on the gray levels of the images, and gray levels in medical images differ because of varying contrasts. Other algorithms (9–11) solve specific landmark detection problems by carefully studying the appearance characteristics of shape priors of the landmarks. Although these methods display robustness to common image distortions and have a low computational complexity, they lack the potential to be extended to other studies using different landmarks, different joints, or different MR scans (12). Another widely used method for landmark detection is a boosting cascade of simple detectors (13–15). By combining the results of different detectors, boosting increases the accuracy and robustness of landmark detection (16). Our objective is to develop a new landmark detection scheme that is capable of dealing with MR images with various contrasts, patient positioning, and health conditions of the knee joint, and that also offers promising accuracy and run time efficiency. As a learning-based technique, the proposed method uses a set of manually placed landmarks to describe the appearance characteristics of predefined landmarks, and applies a boosting system to combine detection results obtained using different sets of training landmarks. Thus, the method has the potential to be extended to other studies using different landmarks and joints.

MATERIALS AND METHODS

Materials

This study was approved by the Institutional Review Board and all subjects provided informed consent before inclusion. The anatomical landmarks of clinical interest are predefined on a Statistical Shape Model (SSM), which was built from a set of 118 labeled training shapes by CSIRO Computational Informatics (17). The ideal landmarks should be capable of localizing bone and cartilage in MR images and of describing the tibiofemoral and patellofemoral kinematics among the knee bones. Thus, the landmarks should be defined on the surface of the knee bones, including the bone-cartilage interface (BCI) (5,18,19). In addition, because the proposed automated landmark detection

method requires training with manual landmark placement, the predefined landmarks should be characteristic enough to be manually and repeatedly placed in different MR datasets. Thus, the landmarks should be defined on the most lateral/medial/superior/posterior points or on eminences and notches of the bones. According to these clinical and technical requirements, we propose 8 landmarks for the femur, 10 landmarks for the tibia, and 6 landmarks for the patella (Fig. 1).

Training data used in the preparation of this article were obtained from the Osteoarthritis Initiative (OAI) database, which is available for public access at http://oai.epi-ucsf.org/datarelease/. The OAI establishes a natural history database for osteoarthritis (OA), including clinical evaluation data and MR images. The sequences commonly used in the OAI protocol for morphological imaging of the knee joint include sagittal T2*W 3D double-echo steady-state (DESS) with fat suppression, sagittal PD TSE with fat suppression, sagittal T2 map, etc. The participants of the OAI database are aged 45–79 years, and there are approximately equal numbers of men and women for all datasets in the OAI database. The OAI database is divided into three groups: (a) a progression group with symptomatic knee OA; (b) an incidence group at risk of developing symptomatic knee OA as defined by knee symptoms, age, overweight, injury, knee surgery, and family history; and (c) a reference group with healthy knee joints. Twelve subjects (four from each group) were selected as training data, so that the training datasets offer information on healthy, at-risk, and pathological knee joints (Table 1). The training data were acquired at 3.0 Tesla (T) (Magnetom Trio, Siemens, Erlangen, Germany) and consisted of a fat-suppressed sagittal 3D DESS dataset and a T2 map dataset for each subject. The imaging parameters for the DESS dataset were: repetition time (TR) = 16.3 ms; echo time (TE) = 4.7 ms; field of view (FOV) = 140 mm; flip angle = 25°; slice thickness = 0.7 mm; in-plane resolution = 0.36 mm; matrix size = 384 × 384.


Table 1. Health Conditions of the Training Subjects and the Test Subjects Obtained From the OAI Database

12 training subjects (DESS and T2 map):
  Healthy: KL* = 0 (4 subjects)
  At risk of disease: KL = 0 (2 subjects), KL = 1 (2 subjects)
  Pathological: KL = 2 (2 subjects), KL = 3 (1 subject), KL = 4 (1 subject)

31 test subjects (DESS and T2 map):
  Healthy: KL = 0 (10 subjects)
  At risk of disease: KL = 0 (6 subjects), KL = 1 (4 subjects)
  Pathological: KL = 2 (6 subjects), KL = 3 (4 subjects), KL = 4 (1 subject)

*The Kellgren-Lawrence (KL) grading system is a radiological classification of knee OA. It is based on X-rays and consists of Grades 0 to 4. Grade 0 demonstrates a normal knee joint with no signs of OA and Grade 4 denotes multiple large osteophytes, severe joint space narrowing, marked sclerosis, and definite deformation of the bone ends.

The imaging parameters for the T2 map were: TR = 2700 ms; 7 TEs = 10, 20, 30, 40, 50, 60, 70 ms; field of view (FOV) = 120 mm; flip angle = 180°; slice thickness = 3 mm with a 0.5 mm gap; in-plane resolution = 0.3 mm; matrix size = 384 × 384.

For the evaluation of the automated detection of anatomical landmarks, 35 test subjects were acquired using various MR sequences and differed in contrast, resolution, patient positioning, health condition of the knee joint, and image quality. Thirty-one subjects were from the OAI database, and the imaging parameters for their DESS and T2 map datasets were the same as those of the training data. The OAI test subjects included 10 healthy subjects, 10 subjects from the incidence group, and 11 subjects from the progression group, i.e., 21 test subjects already had degenerative cartilage or were at risk of developing it.

To obtain test datasets under different patient poses, the remaining four test subjects were acquired locally at 1.5T (Magnetom Avanto, Siemens, Erlangen, Germany) using a 15-channel knee coil. During the local acquisitions, the volunteer was asked to flex and twist the knee joint approximately 5° from the neutral alignment. The health condition of the volunteer's knee joint was unknown at the time of imaging. The imaging parameters for the DESS datasets were: TR = 19.4 ms; TE = 7 ms; FOV = 150 mm; flip angle = 25°; slice thickness = 0.6 mm; in-plane resolution = 0.3 mm; matrix size = 512 × 512. The T2 map sequences had TR = 1811 ms; 7 TEs = 10, 20, 30, 40, 50, 60, 70 ms; FOV = 150 mm; flip angle = 180°; slice thickness = 3.6 mm; in-plane resolution = 0.6 mm; matrix size = 256 × 256.

In addition, to evaluate the algorithm on other body parts, the landmark detection was also performed on thirteen fat-suppressed sagittal 3D SPACE datasets of the hip joint. The datasets were obtained by the Steadman Philippon Research Institute and the imaging parameters were: TR = 1500 ms; TE = 44 ms; FOV = 180 mm; flip angle = 120°; slice thickness = 0.9 mm; in-plane resolution = 0.7 mm; matrix size = 256 × 256. Three datasets were randomly selected as training datasets. Three anatomical landmarks defined on the BCI of the femoral head were manually placed in the training datasets, and the same three anatomical landmarks were subsequently searched for in the remaining ten test datasets.

Methods

A brief overview of the proposed scheme for detecting anatomical landmarks in MR datasets is given in Figure 2, with details in the following sections. For training the landmark detection, the

Figure 2. Pipeline of the proposed automatic landmark detection. A: In the training datasets, the predefined landmarks (blue points) are manually placed. B: As landmark candidates, interest points (red points) are detected in the target datasets using the DoG detector. C: Training landmarks and interest points are characterized with the same SIFT descriptor. The SIFT distances between training landmarks and interest points are used by the classifiers to select landmark candidates. D: The candidates (yellow points) chosen by the classifiers trained from different training datasets are combined by a boosting system into the final detected landmark (green point).


predefined landmarks were manually placed in the training datasets by experienced technicians (Fig. 2A). As a first step of the automated landmark detection, interest points with a high gradient magnitude of pixel intensity were searched for in the target datasets using a simplified Difference of Gaussian (DoG) detector (Fig. 2B) (20). Next, both the training landmarks and the interest points in the target datasets were characterized using a Scale Invariant Feature Transform (SIFT) descriptor (21). This SIFT descriptor was generated by computing the gradient magnitude and orientation at each image sample point in its surrounding region. Interest points with minimal SIFT distance to the training landmarks were chosen as candidates for landmarks (Fig. 2C). Finally, to achieve high localization accuracy and robustness, a multi-classifier boosting system (16) was implemented to combine the results derived from the different training datasets (Fig. 2D).

All image processing steps were performed with the Insight Segmentation and Registration Toolkit (ITK) (http://www.itk.org/) on an Intel(R) Core(TM) i5 2.4 GHz CPU. ITK is a free, open-source, cross-platform application development framework widely used for the development of image processing programs; the software is implemented in C++. The additional image analysis and the assessment of the results were performed in MeVisLab (http://www.mevislab.de/), a cross-platform application framework for medical image processing and scientific visualization.

Step 1: Interest Point Detection

The fundamental idea of our method is to search for landmark candidates by comparing the SIFT descriptors of image points in the target datasets with those of the training landmarks. Because the landmarks are predefined on the surface of the knee bones, it would be time-consuming to compute the SIFT descriptor of all image points in the target datasets. To reduce the computational cost, a detector was implemented to choose a small subset of interest points as inputs for the SIFT description. An interest point is an image point with a high image gradient, i.e., an image point located on an interface between different tissues. One commonly used interest point detector is the Difference of Gaussian (DoG) (20). The DoG algorithm detects high-frequency edges of an image and returns these edges as interest points. Because the target images may not be in the same scale space, a cascade filtering DoG approach is commonly implemented to detect scale-invariant features (20). In our case, the MR datasets provide image parameters such as resolution and slice thickness. Using these image parameters, a 3D linear interpolation was performed to normalize the target datasets so that all target datasets have the same resolution and slice thickness. After this normalization all target datasets were in the same scale space. In this way the cascade filtering approach is not necessary, and the DoG detector can be simplified to reduce the computational cost (see Algorithm 1).
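Algorithm 1 below formalizes this simplified detector. As a rough illustration only (not the authors' ITK/C++ implementation; the target spacing, library choice, and function names are our own assumptions, while the sigma values and thresholding rule follow Algorithm 1), the two steps could be sketched in Python as follows:

```python
# Minimal sketch: normalize a target dataset to a common scale space and run
# the simplified DoG interest-point detection described in Algorithm 1.
import numpy as np
from scipy.ndimage import gaussian_filter, zoom


def resample_to_common_spacing(volume, spacing, target_spacing=(0.36, 0.36, 0.7)):
    """3D linear interpolation so that all target datasets share one resolution
    and slice thickness. `spacing`/`target_spacing` are voxel sizes in mm per
    axis; the target values here are illustrative, not taken from the paper."""
    factors = [s / t for s, t in zip(spacing, target_spacing)]
    return zoom(volume, zoom=factors, order=1)  # order=1 -> trilinear interpolation


def detect_interest_points(volume, sigmas=(2.0, 1.5, 1.0)):
    """Simplified DoG detector: two DoG images, thresholded and AND-combined."""
    blurred = [gaussian_filter(volume.astype(np.float32), s) for s in sigmas]
    d1, d2 = blurred[0] - blurred[1], blurred[1] - blurred[2]
    binaries = []
    for d in (d1, d2):
        threshold = 0.5 * (d.mean() + d.max())  # average of mean and maximal intensity
        binaries.append(d > threshold)
    combined = np.logical_and(binaries[0], binaries[1])  # AND gate of the binary DoGs
    return np.argwhere(combined)  # voxel indices of the interest points
```

Only voxels surviving both thresholded DoG images are kept, which is what keeps the number of candidates small.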


Algorithm 1. Interest Point Detector
1. Input: image I; Gaussian filters G_σ1, G_σ2, G_σ3 (σ1 = 2, σ2 = 1.5, σ3 = 1).
2. Apply the Gaussian filters to I to generate the blurred images I_σ1, I_σ2, I_σ3.
3. Subtract the blurred images to generate the DoG images: D_1 = I_σ1 − I_σ2, D_2 = I_σ2 − I_σ3.
4. Produce binary DoG images from D_1 and D_2 by thresholding at the average of the mean and maximal pixel intensity of D_1 and D_2, respectively.
5. Produce a combined binary image D_combined by applying an AND gate to the binarized D_1 and D_2.
6. Return the pixels with non-zero value in D_combined as interest points.

Step 2: Feature Descriptor

The interest points detected by DoG are regarded as landmark candidates. A feature descriptor is required to evaluate which interest point is the best candidate. Among the large variety of feature descriptors, SIFT (21) is widely used because it captures a substantial amount of information on the spatial patterns of an image and is the most robust to displacements of objects between images (22). Thus, the SIFT descriptor was modified and implemented to evaluate the interest points in this work. To reduce the computational cost, a "coarse to fine" structure (Fig. 3) was built for the 2D feature descriptors using Algorithm 2.

Algorithm 2. 2D Feature Descriptor
1. Input: an image point P in the image I.
2. A region of interest R with 2n × 2n pixels is set in the neighborhood of P.
3. A gradient map G with gradient magnitude f(x, y) and orientation θ(x, y) is computed by means of

f(x, y) = \sqrt{(I(x+1, y) - I(x, y))^2 + (I(x, y+1) - I(x, y))^2}
\theta(x, y) = \arctan\frac{I(x, y+1) - I(x, y)}{I(x+1, y) - I(x, y)}   [1]

4. An orientation histogram H is computed from G. H has 16 bins covering 360° of orientation. The histogram H is created by adding the gradient magnitude of each pixel to the bins adjacent to the orientation of the gradient vector of this pixel. A 16-dimensional vector V_coarse is generated.
5. R is refined to a 2 × 2 array of n × n subregions. For each subregion an orientation histogram with 8 bins is computed. As a result a 32-dimensional vector V_fine is obtained.
6. Return V_coarse and V_fine as the feature descriptors of P.

The size of the descriptor array and the size of the subregions n can be defined arbitrarily by the user according to the problem to be solved. In our case, the landmarks were predefined on the surface of the bones, and the interface from bone to other tissues is approximately 4 mm (12 pixels) thick.
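The following is a rough Python sketch of Algorithm 2 under stated simplifications (nearest-bin voting instead of distributing each gradient over adjacent bins, and a forward-difference gradient as in Eq. [1]); it is an illustration, not the authors' implementation:

```python
import numpy as np


def descriptor_2d(image, point, n=8):
    """Coarse-to-fine 2D feature descriptor of an image point (Algorithm 2 sketch).
    Returns a 16-dim coarse vector and a 32-dim fine vector (2 x 2 subregions x 8 bins)."""
    y, x = point
    roi = image[y - n:y + n, x - n:x + n].astype(np.float32)  # 2n x 2n region of interest

    # Forward-difference gradient magnitude and orientation, cf. Eq. [1]
    gx = roi[:-1, 1:] - roi[:-1, :-1]
    gy = roi[1:, :-1] - roi[:-1, :-1]
    magnitude = np.sqrt(gx ** 2 + gy ** 2)
    orientation = np.arctan2(gy, gx)  # quadrant-aware arctan, in (-pi, pi]

    def orientation_histogram(mag, ori, bins):
        hist, _ = np.histogram(ori, bins=bins, range=(-np.pi, np.pi), weights=mag)
        return hist

    # Coarse descriptor: one 16-bin histogram over the whole region
    v_coarse = orientation_histogram(magnitude, orientation, 16)

    # Fine descriptor: 2 x 2 array of subregions, 8 bins each -> 32 dimensions
    h, w = magnitude.shape[0] // 2, magnitude.shape[1] // 2
    v_fine = np.concatenate([
        orientation_histogram(magnitude[i * h:(i + 1) * h, j * w:(j + 1) * w],
                              orientation[i * h:(i + 1) * h, j * w:(j + 1) * w], 8)
        for i in range(2) for j in range(2)])
    return v_coarse, v_fine
```

For 3D datasets the same routine would simply be applied twice per point, once in the x-y plane and once in the x-z plane, following the simplification described below.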


Figure 3. The “coarse to fine” feature descriptor. Red points indicate interest points. Black arrows annotate the gradient of each pixel. Red arrows are the bins of the orientation histogram. The yellow point in the upper middle is the preliminary candidate chosen by the coarse matching criterion and the white dots are the excluded interest points. The yellow point in the upper right is the final candidate chosen by the feature descriptor.

Thus, to fully describe the features in the neighborhood of the predefined landmarks and to increase the landmark detection accuracy, the descriptor array in Algorithm 2 was extended to a 4 × 4 array of 4 × 4 subregions in this work. For 3D datasets, we observed that many descriptors in a full 3D (e.g., 4 × 4 × 4) array provided only redundant information for solving the matching problem. To increase computing speed at a low cost in terms of lost accuracy, the 3D feature descriptor was simplified into two 2D feature descriptors: one in the x-y plane and the other in the x-z plane.

Because of the various contrasts, resolutions, patient positionings, and health conditions of the knee joint, the feature descriptors of the same landmark in different MR datasets are usually dissimilar. It is therefore problematic to use the landmarks of a single training dataset as the criterion for determining the best matching interest points in the target datasets. Thus, a multi-classifier boosting system was implemented to refine the detection results.

Step 3: Boosting System

The principle of boosting systems is that a strong classifier in a probably approximately correct (PAC) sense can be generated by combining multiple weak classifiers (16). In this work, a comparator in which the SIFT feature descriptors of a candidate landmark and a training landmark are compared was registered as a weak classifier. Each classifier selected the candidate with the minimum SIFT distance to the training landmark as the detected landmark. A weighted majority voting was implemented to combine the results of all weak classifiers (16):

p_d = \sum_{i=1}^{N} w_i p_i .   [2]

Here p_d is the final position of the detected landmark, p_i represents the best matching interest point chosen by weak classifier i, w_i is the weighting factor of each detected interest point, which is derived from its SIFT distance to the training landmarks (the sum of the w_i is normalized to 1), and N is the number of weak classifiers.
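A minimal sketch of this weighted voting is given below. The paper specifies only that the weights are derived from the SIFT distances and normalized to sum to 1; the inverse-distance weighting and the example values used here are assumptions for illustration.

```python
import numpy as np


def boost_landmark_position(candidate_positions, sift_distances, eps=1e-6):
    """Weighted majority voting of Eq. [2]: combine each weak classifier's best
    candidate into the final landmark position p_d = sum_i w_i * p_i."""
    positions = np.asarray(candidate_positions, dtype=float)  # shape (N, 3)
    distances = np.asarray(sift_distances, dtype=float)       # shape (N,)
    # Assumed weighting: smaller SIFT distance -> larger weight,
    # normalized so that the weights sum to 1 as required by Eq. [2].
    weights = 1.0 / (distances + eps)
    weights /= weights.sum()
    return weights @ positions


# Example: five weak classifiers, each proposing a voxel position and its SIFT distance.
p_d = boost_landmark_position(
    [(40, 55, 12), (41, 54, 12), (39, 56, 13), (40, 57, 12), (42, 55, 11)],
    [0.8, 1.1, 0.9, 1.5, 1.0])
```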

Too many classifiers may greatly increase the computational cost while improving performance only minimally. In this work the number of classifiers was set to five (detailed in the Results section): two classifiers were trained from pathological subjects, two from healthy subjects, and one from an at-risk subject. The weighted mean position of the candidates chosen by all five classifiers was defined as the detected landmark. In this way the boosting system works well for both healthy and pathological subjects at a reasonable cost.

Performance Evaluation Methodology

The proposed method was first tested on the 35 test subjects, each providing both a DESS and a T2 map dataset. The landmarks were manually placed in all 70 test datasets by an experienced technician. It takes approximately 20 min for an experienced technician to manually place all 24 predefined landmarks in a 3D dataset; an inexperienced technician requires approximately 1 h to learn how to place the landmarks manually. The performance of the proposed method was evaluated by comparing the manually placed landmarks with the automatically detected landmarks, and the assessment of the detection results was performed in MeVisLab. The accuracy of the detection results was evaluated according to two statistical criteria: the mean distance between manually placed and automatically detected landmarks and the standard deviation of this distance. The sensitivity of the proposed method was evaluated as the proportion of landmarks that were correctly detected.


Table 2. Accuracy, Sensitivity, and Computational Costs of Detecting Landmarks in Various MR Test Datasets
(mean distance / standard deviation / sensitivity / computational cost per landmark)

Contrast:
  DESS: 1.53 mm / 0.93 mm / 100% / 4.82 s
  T2 map: 1.74 mm / 1.16 mm / 94% / 1.43 s
In-plane resolution:
  0.3 mm: 1.41 mm / 0.91 mm / 99% / 2.31 s
  0.6 mm: 1.63 mm / 1.19 mm / 96% / 2.4 s
Health condition:
  Healthy: 1.43 mm / 0.91 mm / 99% / 2.39 s
  At-risk: 1.56 mm / 0.89 mm / 96% / 2.35 s
  Pathological: 1.68 mm / 1.23 mm / 95% / 2.36 s
Patient positioning:
  Rigid movements: 1.54 mm / 0.9 mm / 98% / 2.34 s
  Nonrigid movements: 1.69 mm / 1.07 mm / 95% / 2.37 s

A detected landmark was counted as false if it was clearly not located on the surface of the joint bones, e.g., inside the bone or cartilage or in other soft tissues (fat, muscle, and ligament), or if it was located more than 4 mm away from the manually placed landmark. The authors used 4 mm as the criterion because the landmarks were defined on the surface of the joint bones, and the thickness of the interface from the bones to other tissues (such as cartilage, ligament, etc.) amounts to approximately 3 mm to 4 mm in our case.

The efficiency of the proposed method depended on how many classifiers were used in the boosting system. To determine the appropriate number of classifiers, namely the size of the required training datasets, boosting systems with one to nine classifiers were each tested by detecting landmarks on five test subjects. The efficiency was evaluated by the computational cost of detecting one landmark in a 3D dataset.

The robustness of the proposed method was evaluated by comparing the detection results of test subjects with the following different features. The image contrasts of DESS and T2 map datasets are inherently different: in the T2 image, the bone has higher signal intensities than the cartilage at the BCI, whereas in the DESS image the cartilage has higher signal intensities. The range of signal intensities of bone and cartilage is approximately 200 to 1200 in T2 images and 0 to 400 in DESS images. To validate whether the automated landmark detection is contrast-dependent, the detection results of the DESS and T2 map datasets were compared. It is also of clinical interest to determine whether the automated detection is applicable to pathological subjects. The test subjects included 10 healthy subjects and 21 subjects either with known pathology of the knee or at risk of developing degenerative cartilage. The ability to detect landmarks on pathological subjects was evaluated by comparing the detection results of the groups with different joint health statuses. All test MR datasets had different patient positioning in comparison with the training datasets; different patient positioning causes various positions (rigid movements) of the knee joint in different MR datasets. To evaluate the detection performance for knee datasets acquired at different knee poses (nonrigid movements), measurements were obtained locally and the volunteer was asked to flex and twist the knee joint between serial measurements.
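As an illustration of these evaluation criteria, the following sketch computes the mean distance, standard deviation, and sensitivity from paired manual and automatic landmark coordinates given in millimeters; the additional anatomical check (whether a detection lies on the bone surface) requires segmentation information and is omitted here, and the function and array names are hypothetical.

```python
import numpy as np


def evaluate_detections(manual_points_mm, detected_points_mm, max_distance_mm=4.0):
    """Accuracy (mean and standard deviation of the landmark distance) and
    sensitivity (fraction of detections within the 4-mm distance criterion)."""
    manual = np.asarray(manual_points_mm, dtype=float)      # shape (M, 3)
    detected = np.asarray(detected_points_mm, dtype=float)  # shape (M, 3)
    distances = np.linalg.norm(manual - detected, axis=1)
    return {
        "mean_distance_mm": float(distances.mean()),
        "std_distance_mm": float(distances.std()),
        "sensitivity": float((distances <= max_distance_mm).mean()),
    }
```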


To evaluate the general potential of the proposed method to be extended to other body joints, the automated landmark detection was also validated by detecting three landmarks on the 10 MR hip test datasets. The parameters used for the hip landmark detection were the same as those used for the knee landmark detection, and the detection results of the hip datasets were compared with those of the knee datasets.

Because the proposed method is a learning-based technique, the influence of errors in the manual landmark placement on the performance of the automated landmark detection is a critical issue in evaluating the method. To assess the intra- and interobserver variability, the predefined landmarks were manually placed twice by each of two experienced technicians in the five training datasets. The detection results obtained using the different sets of training landmarks were compared by detecting landmarks in 10 knee test datasets.

RESULTS

The performance (accuracy, sensitivity, and efficiency) of the landmark detection in the different groups of test subjects (e.g., healthy, at-risk, and pathological) is shown in Table 2.

Accuracy and Sensitivity

The distance between the manually placed and automatically detected landmarks over all test datasets is 1.64 ± 1.03 mm (mean ± standard deviation). The mean sensitivity for all test datasets amounts to 97%. To display the accuracy of the detection results more intuitively, an example is given in the upper left of Figure 4. Two landmarks, the posterior medial femoral condyle and the posterior lateral femoral condyle, are indicated on the surface of a 3D model of the femur. The distances between the detected landmarks and the manually placed landmarks are 2.42 mm and 1.52 mm, respectively. The detection accuracy of the proposed method was also compared with the accuracies reported by others (Table 3).

Efficiency

The correlation between the detection performance and the number of classifiers used in the boosting system is shown in Figure 5.


Figure 4. The detection results viewed with the software MeVisLab. Each predefined landmark is defined as a single pixel and cannot be clearly indicated in the image; thus, each landmark is represented as a set of pixels to make it more visible. The asterisks and the squares indicate manually placed landmarks and the circles denote automatically detected landmarks. Upper left: A 3D model of the bones generated from automatic segmentation results (28). Upper middle: T2 image. Upper right and lower right: DESS images of a healthy case and a pathological case from the OAI, respectively. The white arrow indicates the cartilage beside the landmark; the cartilage of the pathological subject is largely degenerated in comparison with that of the healthy subject. Lower left and lower middle: Two DESS images of the same subject; the subject was asked to flex and twist the knee in the lower middle case.

As the number of classifiers increased, the mean distance between the manually placed and detected landmarks was reduced; the standard deviation of the distance showed the same trend. However, nine classifiers improved the detection accuracy by only 2.6% while increasing the computational cost by 80% in comparison with five classifiers. Thus, we set the number of classifiers to five in this work, reaching a compromise between computational cost and performance. The mean computational cost of detecting one landmark amounts to 4.82 s for DESS datasets and 1.43 s for T2 map datasets. In comparison, manual placement takes approximately 1 min per landmark. The computational cost was also

compared with that of automated landmark detection methods reported by others (Table 4).

Robustness

Some examples of detection results in MR datasets with different contrasts, different health conditions, and different patient positioning are shown in Figure 4.

Table 3. Comparisons of the Proposed Method With Alternative Landmark Detection Techniques

Proposed method:
  Materials: DESS and T2 map knee datasets (0.3 mm or 0.6 mm in-plane resolution)
  Mean distance: 1.64 mm (2.56 pixels); standard deviation: 1.03 mm (1.67 pixels)
Alternative technique in (23):
  Materials: 3D gradient echo knee datasets (1 mm isotropic resolution)
  Mean distance: 5.3 mm (5.3 pixels); standard deviation: 4.3 mm (4.3 pixels)
Alternative technique in (24):
  Materials: T1-weighted knee datasets
  Mean distance: 4.4 mm; standard deviation: 2.3 mm
Alternative technique in (25):
  Materials: T1rho-weighted knee datasets (0.55 mm in-plane resolution)
  Mean distance: 2.01 mm (3.7 pixels); standard deviation: 1.27 mm (2.3 pixels)
Alternative technique in (26):
  Materials: 3D DESS knee datasets (0.6 mm isotropic resolution)
  Mean distance: 4.48 mm (7.9 pixels); standard deviation: not reported


Figure 5. Detection performance versus the number of classifiers used in the boosting system. The mean value and the standard deviation of the distance between manually placed and automatically detected landmarks decrease as the number of classifiers increases. However, the improvement in detection accuracy gained by adding one more classifier is not significant once the number of classifiers exceeds five.

Image Contrast

The detection accuracy for the 3D DESS datasets is 1.53 ± 0.93 mm and for the T2 map datasets it is 1.74 ± 1.16 mm. The mean detection accuracy of the DESS datasets is thus improved by 12% compared with that of the T2 map datasets.

Health Condition

The mean detection accuracy for healthy subjects is improved by 12% in comparison with that for pathological/at-risk subjects.

Patient Positioning

The detection accuracy for subjects with nonrigid movements (flexion and twisting) is 1.69 ± 1.03 mm, which is 9% greater than that for subjects with rigid displacement.

Other Joint

The detection accuracy for the 3D SPACE hip datasets is 1.81 ± 1.44 mm, which differs from that for the knee datasets by 10%.

Intra- and Interobserver Variability

The mean distance between the different sets of training landmarks placed by the same experienced technician is 1.65 mm; the mean distance between training landmarks placed by different technicians is 2.73 mm. The detection results obtained using the different sets of training landmarks are shown in Table 5.

DISCUSSION

The test results reveal that the automated landmark detection is capable of localizing anatomical landmarks in various MR datasets with promising accuracy. No significant differences in detection accuracy were observed among the test groups with various features such as image contrast, health condition, and patient positioning. The accuracy of landmark detection by the proposed method was improved by 20% to 70% in comparison with the accuracies reported by others. The improved detection accuracy is achieved by using a 4 × 4 descriptor array of 4 × 4 subregions to fully describe the features in the neighborhood of the predefined landmarks and by applying a boosting system to reduce the influence of poorly performing classifiers on the final decision. The high landmark detection accuracy will improve the performance of high-level landmark-based image processing techniques. For example, Morton et al (5) found that a standard deviation of 2 mm in landmark placements resulted in variability of up to 6.5° in the tibiofemoral kinematics. The computational cost of the proposed method was reduced by 67% to 85% compared with those of alternative techniques processing similar tasks.

Table 4. Computational Cost of Detecting One Landmark With Various Landmark Detection Techniques

Proposed method, DESS knee datasets (512 × 512 × 160 voxels): 4.82 seconds
Proposed method, T2 map knee datasets (384 × 384 × 27 voxels): 1.43 seconds
Alternative technique in (23), gradient echo knee datasets (256 × 256 × 160 voxels): 18 seconds
Alternative technique in (24), T1-weighted knee datasets (300 × 300 × 240 voxels): 15 seconds
Alternative technique in (27), T1-weighted brain datasets (256 × 256 × 150 voxels): 30 seconds


Table 5. Intra- and Interobserver Variability Results

Technician 1, 1st manual placement: mean distance 1.62 mm, standard deviation 0.98 mm
Technician 1, 2nd manual placement: mean distance 1.68 mm, standard deviation 1.07 mm
Technician 2, 1st manual placement: mean distance 1.74 mm, standard deviation 1.13 mm
Technician 2, 2nd manual placement: mean distance 1.77 mm, standard deviation 1.19 mm

This reduction owes to the "coarse to fine" scheme used in the algorithm. Using the simplified DoG detector, only approximately 0.1% of the voxels in the original images are selected as interest points (landmark candidates). Only these interest points are processed by the SIFT feature descriptor, which is the most time-consuming part of the entire landmark detection algorithm. In addition, the SIFT feature description is divided into two steps: after the orientation histograms of the interest points are computed, only the most suitable candidates (10% of the interest points) are evaluated by the final, more elaborate descriptor array. The 4-s processing time for detecting one landmark would allow the method to be integrated into existing high-level image processing techniques and clinical workflows with little delay to the clinical diagnosis.

In MR imaging, many different sequences are designed to generate MR images with various contrasts, so contrast independence is an important requirement for automated landmark detection. In our method the algorithm itself does not need to be modified to detect landmarks in MR datasets with different contrasts; only different training datasets are required. The detection results of the DESS datasets are more accurate than those of the T2 map datasets. This is because DESS is a gradient echo based sequence that accentuates susceptibility changes and interfaces/edges such as the BCI, particularly with fat suppression, which gives high contrast changes at the BCI. Thus, the DESS sequence permits precise analysis of cartilage morphology in the knee joint (29). In comparison, a T2 map is sensitive to changes in biochemical composition (30). The deep zones of cartilage approaching the BCI have lower T2 signal, which decreases the bone/cartilage contrast at the BCI, "blurs" the BCI, and makes detection more difficult. Also, the T2 map sequence is not fat-suppressed and is susceptible to chemical shift artifact, which obscures the BCI, whereas the fat-suppressed DESS does not have this problem. All of these factors make the BCI more distinct in DESS datasets than in T2 map datasets, so the predefined landmarks located at the BCI can be localized more precisely. Despite this, the landmarks can still be efficiently detected in T2 map datasets. This demonstrates that our algorithm itself is contrast-independent and has the potential to be extended to MR datasets with different contrasts and to detect landmarks in different body parts, because the proposed method is a learning-based technique that carefully studies the appearance characteristics of the training landmarks. The only requirement for the algorithm to be directly extended to other studies is the addition of appropriate training landmarks. This estimation is further validated by the test results of detecting three landmarks on the 3D SPACE hip datasets.

The landmarks could be detected in the hip datasets with reasonable accuracy using the same parameters as used in the algorithm for the knee datasets, and the difference between the detection accuracies of the knee and hip datasets was not significant. However, because of the lack of test data, the evaluation of the proposed method for arbitrary tissue interfaces is still incomplete; this will be investigated in further studies.

Some pathological subjects have only degenerative cartilage at the BCI. This increases the difficulty of detecting landmarks on the BCI, because the image information in the neighborhood of a landmark of a pathological subject differs from that of healthy subjects. In this work, the proposed method was also trained with MR datasets of three pathological subjects with degenerative cartilage of varying degree. In this way, the automated detection is also capable of localizing landmarks in pathological or at-risk subjects with accuracies comparable to those of healthy subjects. This demonstrates that, for subjects with different health conditions regarding OA, the automated landmark detection can be made self-adapting by adding the corresponding training data.

MR datasets are commonly acquired with different patient positioning and knee poses, which causes a movement of the knee joint in the test MR datasets in comparison with the training datasets. Most movements can be seen as a rigid displacement of structures, including translation and rotation. In some special cases, such as flexion or twisting of the knee joint, the movements are nonrigid. The landmark detection is required to be invariant to these rigid/nonrigid movements. In this work, a 16-dimensional SIFT vector is computed for orientation assignment, and a coarse estimate of the position of each predefined landmark is established from all training datasets. In this way the automated landmark detection works well for subjects with different patient positioning.

As a learning-based method, the automated landmark detection requires manual intervention for training the feature descriptors. This is a time-consuming task; however, it is a one-off procedure that does not increase the computational cost of detecting landmarks in new MR datasets. Once the training is finished, landmarks can be detected in arbitrary new MR datasets acquired with the same sequence. As a common drawback of learning-based techniques, any manual errors occurring in the training process would weaken the performance of the automated detection. In the test of intra- and interobserver variability, the differences between the different sets of training landmarks caused no significant differences in the automated landmark detection results. This can be attributed to the boosting system implemented in the algorithm.


Not only one classifier trained from a single training dataset but a set of classifiers trained from various training datasets is used to determine the final position of the detected landmark. A classifier may provide a poor decision owing to errors in the placement of its manual training landmarks; however, combining the results provided by several different classifiers through weighted averaging reduces the risk of an unfortunate selection of a particularly poorly performing classifier and consequently reduces the intra- and interobserver variability in the detection results.

Several limitations are evident in this study. First, it involved only three MR sequences (DESS, T2 map, and SPACE) and two body parts (knee and hip) in the validation of the method. Further validation is needed to demonstrate that the proposed method can be applied to MR datasets of different body parts acquired with arbitrary MR sequences. Second, for some real-time applications requiring higher computational speeds, the efficiency of the proposed method still needs to be improved. One possible approach is a parallelized implementation, as all the results reported here were computed on a single CPU. Another limitation is that errors in the manual placement of the training landmarks still influence the detection accuracy to some extent, despite the boosting system used to reduce this influence.

In conclusion, a new and efficient scheme for automatically detecting landmarks on anatomical structures of the knee joint in MR datasets is presented. The method is a learning-based technique that uses a set of manually placed landmarks to describe the features of predefined landmarks. The computational cost is largely reduced by using interest points generated by a simplified DoG detector as candidates for landmarks. A "coarse to fine" cascade structure of the 3D feature descriptor improves the accuracy of the detection results, and the boosting system based on various training data increases the robustness of the landmark detection. Experimental results reveal that the method offers promising accuracy and run time efficiency in comparison with existing techniques and displays good robustness to MR datasets with various contrasts, patient positioning, and health conditions of the knee joint. Owing to its contrast independence, our method has the potential to be extended to other MR sequences and to MR images of other body parts. This will be investigated in further work.

REFERENCES

1. Rohr K. Landmark-based image analysis: using geometric and intensity models. Computational Imaging and Vision, vol. 21. Dordrecht, the Netherlands: Kluwer Academic Publishers; 2001. 318 p.
2. Kurkure U, Le YH, Paragios N, et al. Landmark/image-based deformable registration of gene expression data. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Providence, Rhode Island, 2011. p 1089–1096.
3. Besbes A, Paragios N. Landmark-based segmentation of lungs while handling partial correspondences using sparse graph-based priors. In: Proceedings of the 8th IEEE International Symposium on Biomedical Imaging, Chicago, 2011. p 989–995.
4. Cheng SG. Musculoskeletal MRI: contrast and non-contrast applications. Appl Radiol 2002;31:81–86.
5. Morton NA, Maletsky LP, Pal S, Laz PJ. Effect of variability in anatomical landmark location on knee kinematic description. J Orthop Res 2007;25:1221–1230.
6. Betke M, Hong H, Thomas D, et al. Landmark detection in the chest and registration of lung surfaces with an application to nodule registration. Med Image Anal 2003;7:265–281.

7. Suinesiaputra A, Frangi AF, Kaandorp TA, et al. Automated regional wall motion abnormality detection by combining rest and stress cardiac MRI. J Magn Reson Imaging 2011;34:270–278.
8. Izard C, Jedynak B, Stark CE. Spline-based probabilistic model for anatomical landmark detection. Med Image Comput Comput Assist Interv 2006;9:849–856.
9. Zheng Y, Lu X, Georgescu B, et al. Robust object detection using marginal space and ranking-based multi-detector aggregation: application to left ventricle detection in 2D MRI images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Miami, 2009. p 1343–1350.
10. Cheng Y, Jin Q, Zhao J, et al. An analysis algorithm for accurate determination of articular cartilage thickness of hip joint from MR images. J Magn Reson Imaging 2011;34:136–149.
11. Iskurt A, Becerikli Y, Mahmutyazicioglu K. Automatic identification of landmarks for standard slice positioning in brain MRI. J Magn Reson Imaging 2011;34:499–510.
12. Dikmen M, Zhan Y, Zhou XS. Joint detection and localization of multiple anatomical landmarks through learning. In: Proceedings of SPIE Medical Imaging, 2008; Vol. 6915, 691538.
13. Doan NT, Rooden S, Versluis MJ, et al. Combined magnitude and phase-based segmentation of the cerebral cortex in 7T MR images of the elderly. J Magn Reson Imaging 2012;36:99–109.
14. Liu A, Du Y, Wang T, et al. Fast facial landmark detection using cascade classifiers and a simple 3D model. In: Proceedings of the 18th IEEE International Conference on Image Processing, Brussels, 2011. p 845–848.
15. Farshbaf M, Pouyan AA. Landmark detection on cephalometric radiology images through combining classifiers. In: Proceedings of the 17th Iranian Conference of Biomedical Engineering, Isfahan, 2010. p 1–4.
16. Polikar R. Ensemble based systems in decision making. IEEE Circuits Syst Mag 2006;6:21–45.
17. Fripp J, Crozier S, Warfield SK, Ourselin S. Automatic segmentation and quantitative analysis of the articular cartilages from magnetic resonance images of the knee. IEEE Trans Med Imaging 2010;29:55–64.
18. Cobb JP, Dixon H, Dandachli W, Iranpour F. The anatomical tibial axis: reliable rotational orientation in knee replacement. J Bone Joint Surg Br 2008;90:1032–1038.
19. Browne K, Kurtz CA. How to perform a comprehensive examination of the knee. JAAPA 2009;22:20–25.
20. Lowe DG. Object recognition from local scale-invariant features. In: Proceedings of the 7th IEEE International Conference on Computer Vision, Kerkyra, 1999. p 1150–1157.
21. Lowe DG. Distinctive image features from scale-invariant keypoints. Int J Comput Vis 2004;60:91–110.
22. Moreels P, Perona P. Evaluation of features detectors and descriptors based on 3D objects. Int J Comput Vis 2005;73:800–807.
23. Bauer S, Ritacco LE, Boesch C, et al. Automatic scan planning for magnetic resonance imaging of the knee joint. Ann Biomed Eng 2012;40:2033–2042.
24. Bystrov D, Pekar V, Young S, et al. Automated planning of MRI scans of knee joints. Proc SPIE 2007;6509:65092Z.
25. Pedersen DR, Klocke NF, Thedens DR, et al. Integrating cartilage-specific T1rho MRI into knee clinic diagnostic imaging. Iowa Orthop J 2011;31:99–109.
26. Jin Q, Cheng Y, Guo C, Li G, Sato Y. Point to point registration based on MRI sequences. In: Proceedings of the 2009 WRI Global Congress on Intelligent Systems, Washington, 2009. Volume 03, p 381–384.
27. Lu HX, Nolte LP, Reyes M. Interest points localization for brain image using landmark-annotated atlas. Int J Imaging Syst Technol 2012;22:145–152.
28. Fripp J, Crozier S, Warfield SK, Ourselin S. Automatic segmentation of the bone and extraction of the bone-cartilage interface from magnetic resonance images of the knee. Phys Med Biol 2007;52:1617–1631.
29. Eckstein F, Hudelmaier M, Wirth W, et al. Double echo steady state magnetic resonance images of knee articular cartilage at 3 Tesla: a pilot study for the Osteoarthritis Initiative. Ann Rheum Dis 2006;65:433–441.
30. Joseph GB, Baum T, Carballido-Gamio J, et al. Texture analysis of cartilage T2 maps: individuals with risk factors for OA have higher and more heterogeneous knee cartilage MR T2 compared with normal controls. Arthritis Res Ther 2011;13:R153.
