Automated contouring error detection based on supervised geometric attribute distribution models for radiation therapy: A general strategy

Hsin-Chen Chen, Jun Tan, Steven Dolly, and James Kavanaugh Department of Radiation Oncology, Washington University, St. Louis, Missouri 63110

Mark A. Anastasio Department of Biomedical Engineering, Washington University, St. Louis, Missouri 63110

Daniel A. Low Department of Radiation Oncology, University of California Los Angeles, Los Angeles, California 90095

H. Harold Li, Michael Altman, Hiram Gay, Wade L. Thorstad, Sasa Mutic, and Hua Li^{a)} Department of Radiation Oncology, Washington University, St. Louis, Missouri 63110

(Received 18 June 2014; revised 24 November 2014; accepted for publication 2 January 2015; published 29 January 2015)

Purpose: One of the most critical steps in radiation therapy treatment is accurate tumor and critical organ-at-risk (OAR) contouring. Both manual and automated contouring processes are prone to errors and to a large degree of inter- and intraobserver variability. These are often due to the limitations of imaging techniques in visualizing human anatomy as well as to inherent anatomical variability among individuals. Physicians/physicists have to reverify all the radiation therapy contours of every patient before using them for treatment planning, which is tedious, laborious, and still not an error-free process. In this study, the authors developed a general strategy based on novel geometric attribute distribution (GAD) models to automatically detect radiation therapy OAR contouring errors and facilitate the current clinical workflow.

Methods: Considering the radiation therapy structures’ geometric attributes (centroid, volume, and shape), the spatial relationship of neighboring structures, as well as anatomical similarity of individual contours among patients, the authors established GAD models to characterize the interstructural centroid and volume variations, and the intrastructural shape variations of each individual structure. The GAD models are scalable and deformable, and constrained by their respective principal attribute variations calculated from training sets with verified OAR contours. A new iterative weighted GAD model-fitting algorithm was developed for contouring error detection. Receiver operating characteristic (ROC) analysis was employed in a unique way to optimize the model parameters to satisfy clinical requirements. A total of forty-four head-and-neck patient cases, each of which includes nine critical OAR contours, were utilized to demonstrate the proposed strategy. Twenty-nine out of these forty-four patient cases were utilized to train the inter- and intrastructural GAD models. These training data and the remaining fifteen testing data sets were separately employed to test the effectiveness of the proposed contouring error detection strategy.

Results: An evaluation tool was implemented to illustrate how the proposed strategy automatically detects the radiation therapy contouring errors for a given patient and provides 3D graphical visualization of error detection results as well. The contouring error detection results were achieved with an average sensitivity of 0.954/0.906 and an average specificity of 0.901/0.909 on the centroid/volume related contouring errors of all the tested samples. As for the detection results on structural shape related contouring errors, an average sensitivity of 0.816 and an average specificity of 0.94 on all the tested samples were obtained. The promising results indicated the feasibility of the proposed strategy for the detection of contouring errors with a low false detection rate.

Conclusions: The proposed strategy can reliably identify contouring errors based upon inter- and intrastructural constraints derived from clinically approved contours. It holds great potential for improving the radiation therapy workflow. ROC and box plot analyses allow analytical tuning of the system parameters to satisfy clinical requirements. Future work will focus on the improvement of strategy reliability by utilizing more training sets and additional geometric attribute constraints. © 2015 American Association of Physicists in Medicine. [http://dx.doi.org/10.1118/1.4906197]

Key words: contouring error detection, attribute distribution model, geometric attribute, principal component analysis, radiation therapy


Med. Phys. 42 (2), February 2015

0094-2405/2015/42(2)/1048/12/$30.00

© 2015 Am. Assoc. Phys. Med.


Chen et al.: Supervised contouring error detection with GAD models

1. INTRODUCTION

Accurate target and organ-at-risk (OAR) delineation from CT and/or MR simulation images is an essential step in assuring accurate radiation therapy planning.1–4 Due to the limitations of current imaging techniques in visualizing human anatomy (e.g., insufficient contrast, resolution, or both) as well as inherent anatomical variability among individuals, both manual and automated structure contouring processes are subject to errors. These errors include mislabeling, missed slices, overlapped structures, isolated voxels or small regions, extremely large or small slices, and structure volume and location distortions.5–7 The contouring errors are major patient safety issues in radiation therapy because they degrade dose distributions, dosimetry analyses, and contour-based visual guidance used in image-guided procedures.7–12 High rates of OAR contouring errors have recently been reported, strongly suggesting the need for effective and practical peer review.7,11–20 While peer review may be practical for larger clinics, smaller clinics will be challenged to develop such a review process due to limited staffing. Even in larger clinics, manual contouring review is the only available option. During the manual contouring review process, clinicians/physicists/dosimetrists must spend significant time checking all the organ/tumor contours slice by slice. Manually reviewing multiple anatomical volumes, which is required by most treatment planning, can be very time-consuming. Moreover, it requires the availability of independent reviewers, relies heavily on reviewers’ experience and alertness, and is itself prone to errors due to display technology limitations and a lack of quantitative evaluation tools.12 Hence, there is a perceived requirement for automated contouring error detection methods that can assist secondary manual peer review, decreasing the work burden and time cost of identifying contouring errors while increasing the robustness and reliability of the quality assurance process.

Geometric attributes of radiation therapy contours, such as the volume/centroid of an individual contour and the distances between neighboring contours’ centroids, have been considered useful information for identifying contouring errors.14 However, using simple statistical descriptions of geometric attributes (e.g., mean and/or standard deviation σ) is insufficient for separating correct and incorrect contours and might yield improper contouring error detection results under, but not limited to, the following situations. First, a single set of statistical criteria cannot handle the size variations among patients of different ages, races, genders, etc. For example, the geometric attributes of a single organ derived from male adults might lead to a high false positive rate when used directly to examine the contours of the same organ from a given female patient. Second, if a contour attribute yields large variations within the training population, the decision criteria based only on the simple statistical descriptions of each individual contour could be too loose and might yield a high false negative rate. Spatial relationships between neighboring contours (interstructural information) should be considered for such a situation. Third, if an incorrectly delineated contour has shape errors but yields


reasonable volume size and centroid position, the shape error might be missed if intrastructural shape variations are not considered. Based on our observations, both the intra- and interstructural geometrical information should be considered in order to handle these issues and provide reliable contouring error detection results.

In recent years, pattern classification techniques such as support vector machines,21,22 random forests,14 and backpropagation neural networks14,23 have been actively investigated and applied to computer-aided medical diagnosis,14,24,25 human face recognition,26,27 and other applications.24,28,29 These classifier-based approaches typically require balanced (approximately equal numbers of) positive and negative training samples to derive reliable decision rules for properly separating the sample space. For example, Silva and coauthors incorporated geometric features of lung nodules into a support vector machine for differentiating benign from malignant lung nodules.22 McIntosh and coauthors used a random forest classification technique to estimate delineated contour quality.14 Even though they took into account neighboring contour constraints in the hope of increasing the stability of contour classification, the conditional probability distributions used for the forest inference process in their method required training on a large number of error samples and were sensitive to the size and asymmetry of the correct and incorrect samples.30 Principal component analysis (PCA), which is a statistical modeling method widely utilized in image segmentation applications, can reconfigure the basis vectors of feature space into a set of orthogonal eigenvectors that captures the noticeable variations of a feature set.31–34 As such, we were inspired to use PCA to depict geometric attribute spaces specifically for correct contour samples and then use the resulting spaces for identifying incorrect contours that do not conform to the expected feature spaces of correct contours. In this way, the statistical attribute modeling process will not be sensitive to the number of incorrect samples.

In this study, we developed a new supervised evaluation strategy for automatically detecting radiation therapy contouring errors based on geometric attribute distribution (GAD) models, which yields affordable training complexity and promising accuracy. Note that the proposed error detection method is not designed to reduce interphysician contouring variability. A total of 44 head-and-neck cancer patients were used to demonstrate the effectiveness of the strategy in assessing the contouring accuracy of 9 high-priority radiation therapy contours: brain, brainstem, optic chiasm, right/left eyes, right/left optic nerves, and right/left parotids. We constructed three GAD models of centroid, volume, and shape to characterize the inter- and intrastructural geometric relationships of radiation therapy contours and developed a new iterative model-fitting algorithm for contouring error detection. The GAD models are scalable and deformable, and constrained by the respective principal attribute variations calculated from the training sets with clinically verified contours. Receiver operating characteristic (ROC) analysis was employed in a unique way to optimize the model performance for specific clinical requirements. An evaluation


tool that implements the supervised contouring error detection strategy, DICOM file input/output operations, user-friendly editing, and graphical visualization was developed as well.

2. METHOD AND MATERIALS

2.A. Strategy overview

The proposed contouring error detection strategy consists of model-training and error-detecting stages, as shown in Fig. 1. In the model-training stage, a number of patient cases with approved contours are utilized as the training sets to build the GAD models. First, three geometric attributes (centroid, volume, and shape) of each contour in the training sets are calculated. Then, interstructural GAD models are constructed to characterize the centroid and volume relationships between neighboring contours in a patient case, and intrastructural GAD models are constructed to describe the slice-by-slice shape variations of individual contours. These GAD models are subsequently employed in the error-detecting stage to detect contouring errors in a given to-be-examined patient case. In the error-detecting stage, the inter- and intrastructural contouring geometric attributes of the given patient case are first computed and then compared with those from the trained GAD models through an iterative model-fitting process. If any of the comparisons yields a contouring attribute difference larger than a predefined, training-data-adaptive threshold (explained in Sec. 2.D), the tested contour is flagged as incorrect and reported for further verification. As such, clinicians will not need to spend significant time checking all the OAR contours slice by slice and will verify only the contours reported as incorrect. Any patient cases with correct contours are added as new training sets to update the GAD training models and further improve the accuracy of contouring error detection.


2.B. GAD models

2.B.1. Interstructural GAD models of centroid and volume

The interstructural GAD models of centroid are constructed from the training data sets to characterize the centroid relationships between neighboring contours and are later employed to analyze a given patient case. Without loss of generality, we assume that the total number of training sets is $M$ and that each training set includes $N$ contours. For the $n$th contour in the $m$th training set, we denote its centroid as $g_{n,m} = (x, y, z)$, where $n = 1, 2, \ldots, N$ is the index of contours and $m = 1, 2, \ldots, M$ is the index of training sets. The centroid attributes are computed in 3D Euclidean space (in mm). The centroid attributes of the $N$ contours in the $m$th training set are arranged as $G_m = [g_{1,m}, g_{2,m}, \ldots, g_{N-1,m}, g_{N,m}]$. Procrustes analysis was used to minimize the pose differences (isotropic scale, orientations, and positions) among the training contour sets for accurate centroid attribute distribution analysis.35,36 The mean geometric attribute is computed by $\bar{G}_{cen} = \sum_{m=1}^{M} G_m / M$. The mean-offset covariance matrix COV, which describes the variance between each contour attribute $G_m$ and the corresponding mean $\bar{G}_{cen}$, is calculated by $(G_m - \bar{G})^{T}(G_m - \bar{G})$. Singular value decomposition (SVD) is used to decompose COV to obtain the corresponding eigenvalues $\lambda_i$, where $i = 1, \ldots, S$ and $S$ is the dimension of the covariance matrix.37 Given the eigenvalues $\lambda_i$, arranged in decreasing order, we determine $t$ as the minimum number satisfying $\sum_{i=1}^{t} \left(\lambda_i / \sum_{j=1}^{S} \lambda_j\right) \geq 0.95$. The first $t$ eigenvalues thus preserve at least 95% of the total variance of the attributes. The corresponding $t$ eigenvectors are represented by a matrix $E_{cen} = (e_1, e_2, \ldots, e_i, \ldots, e_{t-1}, e_t)$.
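As an illustration of the model-building steps above, the following minimal sketch computes a mean attribute vector, decomposes the covariance with SVD, and keeps the smallest number of eigenvectors that reaches the 95% variance criterion. It is not the authors' implementation: the function name `build_gad_model`, the flattening of each training set into one row vector, the division of the covariance by the number of training sets, and the assumption that Procrustes alignment has already been applied are all choices made here for illustration.

```python
import numpy as np

def build_gad_model(G, variance_kept=0.95):
    """Sketch of a PCA-based attribute model (hypothetical helper).

    G: (M, S) array with one flattened, already aligned attribute
       vector per training set (M training sets, S attribute values).
    Returns the mean vector, the retained eigenvectors (S, t) and
    the retained eigenvalues (t,).
    """
    G = np.asarray(G, dtype=float)
    mean = G.mean(axis=0)                      # mean attribute over training sets
    D = G - mean                               # mean-offset attributes
    cov = D.T @ D / G.shape[0]                 # S x S covariance (1/M scaling assumed)
    # For a symmetric positive semidefinite matrix, the singular values are
    # the eigenvalues (in decreasing order) and the columns of U are eigenvectors.
    U, s, _ = np.linalg.svd(cov)
    ratio = np.cumsum(s) / s.sum()             # cumulative explained variance
    t = int(np.searchsorted(ratio, variance_kept) + 1)  # smallest t reaching the target
    return mean, U[:, :t], s[:t]
```

Because the 95% criterion depends only on eigenvalue ratios, the assumed 1/M scaling of the covariance does not change which eigenvectors are retained.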
By assigning a weighting factor to each component of $E_{cen}$, the combination of $\bar{G}_{cen}$ and weighted $E_{cen}$ can represent a new centroid attribute as

$$G_{cen} = (g_1, g_2, \ldots, g_i, \ldots, g_{N-1}, g_N)^{T} \approx \bar{G}_{cen} + E_{cen}B_{cen}, \tag{1}$$

where $B_{cen} = (b_1, b_2, \ldots, b_i, \ldots, b_{t-1}, b_t)^{T}$ represents the set of weighting factors. In addition, following the same steps described above, we can build the interstructural GAD models of volume, which are represented as $G_{vol} \approx \bar{G}_{vol} + E_{vol}B_{vol}$, to characterize the volume relationships between neighboring contours. The volume attributes are computed in cubic millimeters (mm³). Here, $\bar{G}_{vol}$ represents the mean volume attribute of the $M$ training sets, $E_{vol}$ denotes the eigenvectors, and $B_{vol}$ are the weighting factors of $E_{vol}$. In the error-detecting stage (Sec. 2.C), $G_{cen}$ or $G_{vol}$ will be deformed to fit the attribute of a given contour by adjusting either $B_{cen}$ or $B_{vol}$. The fitting difference will then be used to detect whether the given contour has any errors due to incorrect centroid shifts or volume differences.

2.B.2. Intrastructural GAD models of shape

F. 1. Flowchart of the proposed contouring error detection strategy. Medical Physics, Vol. 42, No. 2, February 2015

The interstructural GAD models described above characterize the positional and volumetric relationships between neighboring contours. Here, the intrastructural GAD models


of shape are further proposed to detect contouring errors associated with incorrectly shaped contours, such as isolated points, missing slices, or unusually large/small slices. In order to construct a 3D surface point distribution model using conventional methods, a large number of corresponding landmarks should be selected from the training shapes.32 However, due to the lack of specific geometric properties and/or clear intensity characteristics, it is difficult to select a sufficient number of appropriate and consistent landmarks on some radiation therapy contours (e.g., brainstem in the head-and-neck site). To resolve this issue, we employed an implicit surface function, which can provide accurate continuous shape representation for landmark-less contours and high curvature boundaries, to construct the intrastructural shape GAD models.38,39 To effectively compute the shape attributes of individual contours from a radiation therapy simulation image, which can be large, a 3D local cuboid region that contains the contour is determined and utilized. The shape attribute of the $n$th contour of the $m$th training set can be represented by

$$\Psi_{n,m}(p_h) = \begin{cases} -\mathrm{dis}(p_h), & \text{if } p_h \text{ belongs to the background} \\ 0, & \text{if } p_h \text{ is on the surface of the structure} \\ \mathrm{dis}(p_h), & \text{if } p_h \text{ belongs to the structure} \end{cases} \tag{2}$$

where $p_h$ represents the 3D coordinate of the $h$th voxel in the cuboid region of the $n$th contour; dis is a signed-distance transform returning the shortest distance value of $p_h$ to the surface. The mean shape attribute of the $n$th contour for all $M$ training sets is obtained by $\bar{\Psi}_n = \sum_{m=1}^{M} \Psi_{n,m}/M$, with the $M$ aligned structure surfaces. After applying PCA to the mean-offset covariance matrix of the shape attribute, we can obtain the corresponding eigenvectors $U_n$. By assigning a weighting factor to each component of $U_n$, the combination of $\bar{\Psi}_n$ and weighted $U_n$ can represent a new shape attribute, defined as

$$\Psi_n = (\Psi_n(p_1), \Psi_n(p_2), \ldots, \Psi_n(p_h), \ldots, \Psi_n(p_{H-1}), \Psi_n(p_H))^{T} \approx \bar{\Psi}_n + U_nA_n, \tag{3}$$

where $A_n$ represents the weighting factors of $U_n$. $H$ is the dimension of the shape attribute model determined by the number of voxels in the aligned local cuboid region that contains the $n$th contour. As with the interstructural GAD models, $\Psi_n$ will be deformed to fit a given contour shape by adjusting $A_n$ in the error-detecting stage, and then the fitting difference will be used to detect the shape related contouring errors.

2.C. Supervised contouring error detection strategy

In the error detection stage, the centroid, volume, and shape contour attributes of a given contour set, denoted as $G^{e}_{cen}$, $G^{e}_{vol}$, and $\Psi^{e}_{n}$, are first computed. The centroid and volume related contouring errors are detected by measuring the differences between the given contour’s attribute distributions and those from the trained GAD models ($G_{cen}$, $G_{vol}$, and $\Psi_n$). Either the centroid or the volume difference $\varepsilon$ can be assessed by solving the following minimization problem:

$$\varepsilon = \min_{\Theta_k, B_k, W_k} \frac{1}{N}\left\| W_k\left(G^{e}_{k} - T(\Theta_k)\left(\bar{G}_k + E_kB_k\right)\right)\right\|_2, \tag{4}$$

where $T$ is an affine transformation; $\Theta_k$ are the pose parameters (isotropic scaling factor, three-axes orientations, and translation factors) for either the centroid attribute ($k$ = cen) or the volume attribute ($k$ = vol). $W_k$ is a diagonal matrix whose elements are binary values and are used to either include the related contouring attributes in the model-fitting process or exclude them from it. $G_k$ will be deformed to fit the attribute of a given contour by adjusting $B_k$. Specifically, $\Theta_k$ and $B_k$ are updated iteratively until the difference $\varepsilon$ is minimized.40 $\Theta_{k,t+1}$ in the $(t+1)$th iteration is estimated by using the Powell conjugate gradient method41 to minimize the following cost function:

$$\Theta_{k,t+1} = \underset{\Theta_{k,t}}{\arg\min}\ \frac{1}{N}\left\| W_k\left(T^{-1}(\Theta_{k,t})G^{e}_{k} - \left(\bar{G}_k + E_kB_{k,t}\right)\right)\right\|_2. \tag{5}$$

The residual $\Delta B$ is calculated as

$$\Delta B_{k,t+1} = E_k\left(T^{-1}(\Theta_{k,t+1})G^{e}_{k} - \left(\bar{G}_k + E_kB_{k,t}\right)\right), \tag{6}$$

and $B_{k,t+1}$ in the $(t+1)$th iteration is updated as

$$B_{k,t+1} = B_{k,t} + \Delta B_{k,t+1}. \tag{7}$$
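The weight update of Eqs. (6) and (7) can be sketched as follows, under several simplifying assumptions that are not taken from the paper: the pose transform $T$ is fixed to the identity (so no pose search is shown), the residual is projected with the transpose of the eigenvector matrix so that the dimensions agree, the optional binary mask `include` plays the role of the diagonal of $W_k$, and the ±2 bound on the weights that the text describes is realized by simple clipping. The function name `fit_weights` is hypothetical.

```python
import numpy as np

def fit_weights(G_e, mean, E, eigvals, include=None, n_iter=20, limit=2.0):
    """Sketch of the iterative weight update B <- B + dB (pose fixed to identity).

    G_e:     (S,) attribute vector of the contour set under test
    mean:    (S,) mean attribute vector of the trained model
    E:       (S, t) retained eigenvectors (orthonormal columns)
    eigvals: (t,) retained eigenvalues
    include: optional (S,) binary mask, assumed to act like diag(W_k)
    """
    G_e = np.asarray(G_e, dtype=float)
    w = np.ones_like(G_e) if include is None else np.asarray(include, dtype=float)
    bound = limit * np.sqrt(eigvals)           # +/- 2 square roots of the eigenvalues
    B = np.zeros(E.shape[1])
    for _ in range(n_iter):
        residual = w * (G_e - (mean + E @ B))  # masked difference to the current model
        dB = E.T @ residual                    # projection (transpose assumed here)
        B = np.clip(B + dB, -bound, bound)     # clipping stands in for the normalization
    fitted = mean + E @ B
    return B, fitted
```

The element-wise difference between `G_e` and `fitted` is then the quantity that would be compared against the detection thresholds; how the thresholds T1 and T2 enter the loop is specific to the authors' algorithm in Fig. 2 and is not reproduced here.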

$B_k$ are normalized whenever they exceed ±2 times the principal component variations (the square roots of the eigenvalues) of the attribute distributions, in order to avoid generating overly distorted distribution models.32 The iterative weighted interstructural GAD model-fitting procedure is schematically illustrated in Fig. 2, in which T1 and T2 are two system parameters that are determined based on ROC analysis (Sec. 2.D). Unlike the interstructural centroid and volume evaluation, the intrastructural shape model-fitting process is performed to detect incorrect slices in each individual anatomical contour by solving the following minimization problem:

$$\varepsilon = \min_{\Theta^{*}_{n}, A^{*}_{n}} \left\| \Psi^{e}_{n} - T(\Theta_n)\Psi_n \right\|_2 \equiv \min_{\Theta^{*}_{n}, A^{*}_{n}} \left\| \Psi^{e}_{n} - T(\Theta_n)\left(\bar{\Psi}_n + U_nA_n\right) \right\|_2. \tag{8}$$

As shown in Eq. (3), the $n$th contour shape is estimated as $\Psi_n \approx \bar{\Psi}_n + U_nA_n$. The estimation of $\Theta_n$ and $A_n$ is similar to that of $\Theta_k$ and $B_k$ shown in Eqs. (5)–(7). Then, the closest distance from each point on the given surface to the fitted GAD model surface is calculated, and the mean distance $E_{i,c}$ of all the surface points on the $c$th slice is obtained as

$$E_{i,c} = \frac{1}{Q}\sum_{r=1}^{Q} \min\{\|v_r - v_s\|,\ s = 1, 2, \ldots, P-1, P\}, \tag{9}$$

where $v_r$ represents the $r$th surface point of the total $Q$ points on the $c$th slice of the given contour; $v_s$ is the $s$th surface point of the total $P$ points on the fitted GAD model surface closest


F. 2. Schematic diagram of the proposed iterative weighted interstructural GAD model-fitting algorithm.

to $v_r$. If $E_{i,c}$ is smaller than a system-determined parameter T3, the contour on the $c$th slice is considered as correct; otherwise, it is reported as incorrect. A k-dimensional tree, which is based on a binary partitioning process to handle the problem of nearest neighbor search with a complexity of O(log N), was implemented to speed up the multidimensional search process in Eq. (8).42,43 As a single subject-dependent parameter, T3 can be determined based on a box-and-whisker plot for each case, as explained in Sec. 2.D.

2.D. System parameter determination with ROC analysis and box-and-whisker plot

The ROC curve is a plot of the true positive rate (sensitivity) against the false positive rate (1-specificity). It is employed to select optimal system parameters T1 and T2 in the iterative weighted model-fitting algorithm (Fig. 2) through the analysis of the contouring error detection results on a group of incorrect and correct contours (described in Sec. 2.E). The sensitivity and specificity are defined as

$$\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad \text{Specificity} = \frac{TN}{TN + FP}, \tag{10}$$

where TP is the number of true positives, TN is the number of true negatives, FP is the number of false positives, and FN is the number of false negatives. A true positive represents a correct contour that is detected as correct. Similar definitions were used for true negative, false positive, and false negative. With a group of known incorrect and correct contour samples (described in Sec. 2.E), we first set both T1 and T2 as 0 in the iterative GAD model fitting algorithm and applied the algorithm to detect these contouring samples. The numbers of true positives and false positives can be obtained by comparing the detection results to the ground truth. The pair of


(1-specificity, sensitivity) represents a point on the ROC curve and corresponds to the detection results using a specific set of T1 and T2.44,45 By fixing T1 as 0 and discretely increasing T2, we can create an ROC curve by repeating the detection process. The cut-off point, which is the closest point to the upper left corner of this specific ROC curve and represents the highest achievable sensitivity and specificity of the test, is used to determine the corresponding T2 value. Then, by fixing the obtained T2 and discretely increasing T1, we can obtain a new ROC curve by repeating the detection process. The updated cut-off point is then used to determine the corresponding T1. The determined T1 and T2 values are subsequently used for testing any new patient case.

Alternatively, in order to avoid the effects of shape fluctuations among different patient cases on contouring error detection accuracy, the parameter T3 in the intrastructural shape GAD models for each subject is determined by utilizing a nonparametric box-and-whisker plot. The box plot is not limited by assumptions regarding the underlying statistical distribution, and thus is popular for depicting numerical data groups. The degree of dispersion (spread) and the outliers can be indicated by different fences of the box.46 In this study, the value of T3 is adaptively determined subject-by-subject and is used for identifying the inliers (correctly contoured slices) and outliers (incorrectly contoured slices) of each contour. The mean distances $E_{i,c}$ defined in Eq. (9) could be either a correct or a noncorrect distribution. Considering that quartiles can preserve the center and the degree of dispersion of a specific distribution, we used the sum of the third quartile (Q3) and the weighted interquartile range (IQR), T3 = Q3 + w·IQR, to determine the outlier fence and thereby identify incorrectly contoured slices.
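The slice-wise shape check can be sketched as follows: a per-slice mean nearest-point distance in the sense of Eq. (9), followed by the fence T3 = Q3 + w·IQR. This is only an illustration under stated assumptions. The function names are hypothetical, a brute-force nearest-neighbor search with NumPy stands in for the k-d tree mentioned in the paper, the quartiles use NumPy's default linear interpolation (the paper does not state which quartile convention was used), and the default w = 0.25 is the value the authors report using in their experiments.

```python
import numpy as np

def mean_slice_distances(slices, model_points):
    """Per-slice mean of nearest distances to the fitted model surface.

    slices:       list of (Q_c, 3) arrays, surface points of each axial slice
    model_points: (P, 3) array of surface points on the fitted model
    """
    model_points = np.asarray(model_points, dtype=float)
    out = []
    for pts in slices:
        pts = np.asarray(pts, dtype=float)
        # (Q_c, P) matrix of pairwise Euclidean distances (brute force).
        d = np.linalg.norm(pts[:, None, :] - model_points[None, :, :], axis=2)
        out.append(d.min(axis=1).mean())       # nearest model point, then slice mean
    return np.array(out)

def flag_slices(distances, w=0.25):
    """Return the fence T3 = Q3 + w * IQR and a boolean mask of flagged slices."""
    q1, q3 = np.percentile(distances, [25, 75])
    t3 = q3 + w * (q3 - q1)
    # A slice is accepted only if its distance is smaller than T3.
    return t3, np.asarray(distances) >= t3
```

A smaller w lowers the fence and therefore flags more slices, which trades additional false alarms for fewer missed errors.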
In radiation therapy, contouring errors that decrease the accuracy of treatment planning should be detected as much as possible while certain false alarms are allowed as no serious negative impacts will be induced. Therefore, the weighting factor w of IQR was empirically assigned as 0.25 in our experiments to meet the expectation of this specific application, instead of using the value of 1.5 reported by Frigge et al.46

2.E. Training and testing data sets

Forty-four physician-approved head-and-neck patient cases were utilized to demonstrate the performance of the proposed contouring error detection strategy. For each patient case, contours of nine OARs including brain, brainstem, optic chiasm, right/left eyes, right/left optic nerves, and right/left parotids, were collected. The patient data were divided into two sets, of which 29 cases were used as the training set, and the remaining 15 cases were used as the testing sets. During the routine clinical workflow, contouring errors are corrected immediately when they are found. Due to the lack of enough incorrect clinical contours for testing the proposed strategy, we introduced deviations into the contours’ geometric attributes of these 44 correct cases to simulate clinically possible contouring errors. The quantities σc and σ v represent the standard deviations of the centroid and volume, respectively, which were estimated from the original


29 training sets. We first randomly selected L contours out of the nine as the correct samples. L is a computer generated random number between 0 and 4. Each of the L correct samples was given small random fluctuations (within 0.5 σc or 0.5 σ v ) on centroid and volume to simulate slightly different but still correct samples. The rest (9 − L) samples were modified with larger random deviations (between 3∼6 σc and 3∼6 σ v ) to simulate incorrect contouring samples. For each of the 29 training sets, such a procedure was repeated five times and thus 145 training sets were obtained. In total, 1305 training data-based attribute samples, which include 1091 correct and 214 incorrect attributes, were produced. These generated correct and incorrect samples were used to optimize the system parameters T1 and T2, and to verify the GAD models as well. The same procedure was also repeated 10 times to the 15 testing sets. Thus, a total of 1350 testing data-based attribute samples including 1123 correct and 227 incorrect attributes samples were produced. In addition, brainstem and left parotid contours were selected for simulating correct/incorrect shape attributes to verify the effectiveness of the intrastructural shape GAD models. These contours were specifically selected because they normally present very weak intensity contrast and fuzzy boundaries in CT simulation images. For each of the 44 patients’ brainstems and left parotids, we randomly selected L axial slices and kept them as the correct contoured slices. L was a random number between 0 and 4 as well. For the remaining axial slices, an enlargement/shrinking of the corresponding contour shapes was performed with a random scaling factor (i.e., shrinking in the interval of [0.1, 0.5] and enlarging in the interval of [1.5, 3.0]) to simulate shape-related contouring errors. This procedure was performed on all 44 data sets. 
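The centroid/volume perturbation scheme described above can be sketched as follows. The sketch only illustrates the two deviation ranges (within 0.5σ for samples kept as correct and between 3σ and 6σ for simulated errors); the function name `perturb_attributes`, the uniform sampling of the deviation magnitude, the random sign, and the use of one scalar attribute per contour are assumptions made here, and which contours are marked as errors is left to the caller.

```python
import numpy as np

def perturb_attributes(values, sigma, is_error, rng):
    """Apply small or large random deviations to per-contour attributes.

    values:   (N,) attribute values (e.g., one volume per contour)
    sigma:    (N,) standard deviations estimated from the training sets
    is_error: (N,) boolean mask, True where an error should be simulated
    rng:      numpy.random.Generator
    """
    values = np.asarray(values, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    is_error = np.asarray(is_error, dtype=bool)
    # Deviation magnitude in units of sigma (uniform sampling is an assumption).
    magnitude = np.where(is_error,
                         rng.uniform(3.0, 6.0, size=values.shape),
                         rng.uniform(0.0, 0.5, size=values.shape))
    sign = rng.choice([-1.0, 1.0], size=values.shape)   # random direction (assumed)
    return values + sign * magnitude * sigma
```

In practice a negative deviation of several σ could produce a nonphysical value (for example a negative volume) for small structures, so a real simulation would need an additional validity check.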
In total, we generated 561 axial brainstem slices (507 correct and 54 incorrect) and 525 left parotid axial slices (474 correct and 51 incorrect) from the original 29 training sets, and 295 axial brainstem slices (268 correct and 27 incorrect) and 275 left parotid axial slices (242 correct and 33 incorrect) from the original 15 testing sets. Moreover, we randomly selected 60 contours from the 44 testing data sets to simulate 20 missing slice errors, 20 mislabeling errors, and 20 incorrect overlap between structures, respectively. We also added isolated voxels to 20 randomly selected contours to simulate contouring errors with isolated


voxels. These data were applied to test the proposed strategy as well. All the employed random values were generated with a computational random number generation function.

3. RESULTS

3.A. Inter- and intrastructural GAD models conducted with training data sets

As the first step of the proposed strategy, interstructural centroid and volume GAD models were constructed based on the clinician-approved contours of the 29 head-and-neck cancer patient cases. Figure 3(a) illustrates the aligned centroids of all nine radiation therapy contours of the 29 training patients. The centroid of each individual contour is shown as the smallest sphere; the corresponding mean centroid is displayed as the middle-size sphere, while the largest sphere represents the maximum difference between each single centroid and the mean value. The constructed centroid GAD model can characterize the spatial relationship between the nine centroids. As for the volume attribute, Fig. 3(b) illustrates the volume distributions from 4 training patient cases. Figure 3(c) illustrates the deformed interstructural volume GAD model after adjusting the weighting factors to −2, 0, and +2. This example demonstrates that the GAD model can be deformed to approximate the volume attribute in any new contour set. If the difference between a contour in a given set and that from the deformed model is larger than a threshold, it would be detected as an incorrect contour. Figure 4 illustrates the procedure to determine the parameters T1 and T2 by using ROC analysis. We first employed the interstructural volume and centroid GAD models to detect the 1091 correct and 214 incorrect contour samples by setting T1 as 0 and discretely changing T2, thus forming the ROC curve shown in Fig. 4(a). The value of T2 related to the cut-off point was determined and fixed. Next, the value of T1 was changed and the error detection process was repeated to obtain a new ROC curve shown in Fig. 4(b). The T1 corresponding to the cut-off point of the ROC curve was then determined. The area under the ROC curve shown in Fig.
4(b) was 0.97 and far above 0.9, indicating an excellent parameter determination test.44,45 In this study, the optimized parameters T1 and T2 for the centroid and volume models were (0.8, 0.4) and (11, 25),

F. 3. Centroid and volume GAD models constructed from the 29 training patient data; (a) the interstructural centroid GAD model of nine contours; (b) the volume distributions from 4 training data sets; (c) by adjusting the weighting factors to −2, 0, and +2, the GAD model can be deformed to approximate the volume attribute in any new contour set. Medical Physics, Vol. 42, No. 2, February 2015

Chen et al.: Supervised contouring error detection with GAD models

F. 4. Determination of the system parameters T1 and T2 of the volume GAD model in the model-training stage; (a) the obtained ROC curve by altering T2 with a fixed T1 as 0; (b) the obtained ROC curve by altering T1 with the updated T2 determined based on ROC curve shown in (a).

respectively, which were utilized to detect contouring errors in the testing data sets.

The intrastructural shape GAD models were constructed from the 29 training data sets. As an example, Fig. 5(a) shows the volume subdivision results of brainstem contours. The shape attribute was computed only within the volume subdivision instead of the whole image space for efficient computation. Figure 5(b) shows examples of deformed intrastructural shape GAD models formed by adjusting An [see Eq. (3)]. The three rows show the deformed shape models formed by applying three different weighting factors (−2, 0, and +2) to the first three principal components of eigenvectors Un of the brainstem shape attribute. This demonstrates that the intrastructural GAD model is capable of deforming to approximate the shape attribute in a new contour set. If any contour in a given contour set cannot be fitted by this deformed model, it is detected as an incorrect contour.

3.B. Quantitative analysis of automated contouring error detection

A quantitative experiment was conducted to evaluate the accuracy of the proposed strategy on all 44 data sets by utilizing the measures of balanced accuracy (BA), recall, specificity, and F-score.47 BA quantifies a system’s ability to avoid false classification (including false positives and false negatives). Recall evaluates the effectiveness of a system in identifying positive labels. Specificity reflects how effectively a system identifies negative samples. F-score represents the harmonic mean of precision and recall and is considered a more complete measure of a test’s accuracy. These measures were defined as

$$\mathrm{BA} = \frac{1}{2}\left(\frac{TP}{TP+FN} + \frac{TN}{TN+FP}\right), \quad \mathrm{Recall} = \frac{TP}{TP+FN}, \quad \text{F-score} = \frac{2\,TP}{2\,TP+FN+FP}, \tag{11}$$

where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.

Table I lists the contour error detection results on all 1305 training set-based samples as well as 1350 testing set-based samples obtained with the interstructural centroid and volume GAD models. Table II lists the contour error detection results of the intrastructural shape GAD models on the 561 and 525 training set-based samples for brainstem and left parotid, respectively, as well as the 295 and 275 testing set-based samples for brainstem and left parotid, respectively. As described in Sec. 2.E, major radiation therapy contouring errors were simulated to test the proposed strategy. In both tables, the high recall values (ranging from 0.848 to 1) indicated that, on average, more than 90% of the correct contours were recognized correctly and required no further manual rechecking. The high F-scores (ranging from 0.901 to 0.997) indicated that the study achieved low FP and FN, as expected. In addition, the proposed strategy

F. 5. Intrastructural shape GAD models; (a) volume subdivision of each contour; (b) various brainstem shapes generated by different weighted combinations of the mean shape attribute and principal components of the eigenvectors of the brainstem shape attribute. Medical Physics, Vol. 42, No. 2, February 2015


T I. Quantitative evaluation of contouring error identification results using the interstructural centroid and volume GAD models.

Number of samples 1305 (training)

Number of correct samples

Number of incorrect samples

GAD model

1091

214

BA

Recall

Specificity

Sensitivity

F-score

Centroid Volume

0.989 0.972 0.981

1 0.963 0.982

0.977 0.981 0.979

1 0.963 0.982

0.997 0.979 0.988

Centroid Volume

0.911 0.842 0.877

0.997 0.848 0.923

0.824 0.837 0.831

0.908 0.848 0.878

0.981 0.901 0.941

Average test accuracy of the 1305 training set based samples 1350 (testing)

1123

227

Average test accuracy of the 1350 testing set based samples
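The measures of Eq. (11) are straightforward to compute from confusion-matrix counts, and the BA column of Table I can be cross-checked against the reported recall and specificity. The sketch below is illustrative only. The function name and the example counts are hypothetical (the paper does not report raw TP, FN, TN, and FP counts), and specificity is taken to be TN/(TN + FP), the second term inside BA.

```python
def detection_metrics(tp, fn, tn, fp):
    """Return BA, recall, specificity, and F-score from confusion-matrix counts."""
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "ba": 0.5 * (recall + specificity),
        "recall": recall,
        "specificity": specificity,
        "f_score": 2 * tp / (2 * tp + fn + fp),
    }

# Hypothetical counts, chosen only to exercise the formulas.
m = detection_metrics(tp=90, fn=10, tn=40, fp=10)
print(m)

# Cross-check of Table I: BA should equal the mean of recall and specificity
# up to the three-decimal rounding used in the table.
table_rows = [
    # (reported BA, recall, specificity)
    (0.989, 1.000, 0.977),   # training, centroid
    (0.972, 0.963, 0.981),   # training, volume
    (0.911, 0.997, 0.824),   # testing, centroid
    (0.842, 0.848, 0.837),   # testing, volume
]
ba_gaps = [abs(ba - 0.5 * (rec + spec)) for ba, rec, spec in table_rows]
print(max(ba_gaps))
```

All four rows agree to within rounding, which supports the reading of the flattened table columns used above.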

can reach 100% detection rates on the contouring errors of mislabeling, missing contours, missing slices, isolated voxels, and overlapped structures.

3.C. Contouring error detection tool design

The proposed error detection strategy was implemented along with graphical visualization techniques; the software interface is illustrated in Fig. 6. The pretrained centroid, volume, and shape GAD models were first constructed using the training sets as described in Sec. 2. The radiation therapy simulation image and contours of a given patient are the input of the software and can be imported together or separately. Users can observe and evaluate the radiation therapy contours by superposing them onto the corresponding simulation image; an example is shown on the left of Fig. 6. Then, the three geometric attributes of each contour are computed and compared to the trained GAD models through the model-fitting processes described in Sec. 2.C. If any of the comparisons yields a difference larger than T1, T2, or T3, the evaluated contour is classified as incorrect, and the contouring error detection result is reported.

In addition, the software provides functions to facilitate visual comparison. It supports radiation therapy contour interpolation and surface triangulation to render surface models with adjustable opacity, as seen on the right of Fig. 6. The pretrained GAD models can also be loaded as a default and displayed for visual comparison with the given data, as seen in the upper right of Fig. 6. The software can automatically parse the DICOM header and list the header information in the “InfoDisplay” tablet for clinicians to view. The 3D visualization of the critical radiation therapy contours can be rotated freely through 360°, with zoom-in and zoom-out accomplished by a simple mouse click and drag, so that users can easily verify visually the contouring errors that are reported in the “summary” tablet. The software also allows clinicians to directly modify the reported incorrect radiation therapy contours by moving their control points (i.e., the small empty circles shown on the left of Fig. 6). The verified correct testing set can be added to the training set to update the trained GAD models and improve the stability of the contour quality evaluation strategy. The corrected radiation therapy contours can also be exported in a format compatible with other radiation therapy planning systems, so that they can be imported back into the clinical planning system if necessary.

3.D. Visualization of contouring error detection results

With the designed tool, the contouring error detection results can be displayed and double-checked by simultaneously rendering the calculated contour attributes and the reference inter- and intrastructural GAD models. Figures 7(a) and 7(b) show incorrect centroid attributes of the nine contours from two different patient cases. The line-connected spheres represent the centroid attributes of the fitted GAD model and the cubic dots show the centroids of the given contours, while the incorrectly contoured structures, which clearly deviate from the GAD models, are indicated with arrows. Similarly, Figs. 7(c) and 7(d) show incorrect volume attributes of contours from two different patient cases. Using the proposed iterative model-fitting algorithm, the contours with incorrect volumes, which yielded larger-than-threshold differences, were easily distinguished, as indicated by the arrows. One of the advantages of the proposed strategy is that it can further locate the incorrect slices with the intrastructural

T II. Quantitative evaluation of contouring error identification results using the intrastructural shape GAD model. Number of incorrect samples

GAD model

561 (training, brainstem) 507 525 (training, left parotid) 474 Average test accuracy of the training set based samples

54 51

Shape

295 (testing, brainstem) 268 275 (testing, left parotid) 242 Average test accuracy of the testing set based samples

27 33

Shape

Number of samples

Number of correct samples

Medical Physics, Vol. 42, No. 2, February 2015

BA

Recall

Specificity

Sensitivity

F-score

0.918 0.856 0.887

0.929 0.929 0.929

0.907 0.784 0.846

0.929 0.928 0.929

0.958 0.951 0.955

0.881 0.856 0.869

0.948 0.955 0.952

0.815 0.757 0.786

0.947 0.955 0.951

0.964 0.960 0.962


F. 6. Prototype interface of the proposed automated contouring evaluation and visualization tool; (left) visualization modules of DICOM images and (right) radiation therapy structural models.

shape GAD models, instead of only detecting the existence of incorrect shape attributes. These incorrect slices can be easily visualized with the designed tool. Figure 8 shows 3D surfaces of the brainstem (upper row) and left parotid (bottom row) with the simulated incorrect slices from six different patient cases. The automatically identified incorrect slices are highlighted by the black contours. All 15 incorrect slices were detected and are indicated with arrows. Four false negatives (false alarms) were produced and are indicated by the dashed arrows.

4. DISCUSSION

Autoevaluation of contouring quality is a complex and challenging task because clinical contours exhibit considerable variability, which is caused by subject-specific anatomical variations and by the inconsistency of clinicians’ contouring experience and knowledge. In this study, we proposed an automated contouring error detection strategy based on the inter- and intrastructural GAD models. The experimental results have illustrated that using the eigenconstraints of correct contour attributes can effectively differentiate correct and incorrect contours. As shown in Figs. 7(a) and 7(b), the global geometric feature spaces of correct contours can be employed to detect incorrect attribute samples by performing the supervised iterative model-fitting process. The proposed strategy does not rely on incorrect contour samples to train the GAD models, thus avoiding biased detection results caused by imbalanced numbers of correct and incorrect contour samples. This merit considerably increases the practicality of the strategy and the related computer-aided tool. Other than that, our strategy also enables incorrect slice identification

F. 7. Visual demonstration of contour error detection results using the interstructural GAD models; (a) and (b) centroid error detection; (c) and (d) volume error detection. Medical Physics, Vol. 42, No. 2, February 2015


based on the individual shape GAD model-fitting process. The simulated contouring errors with discontinuous slices in Fig. 8 often occur when manually delineating soft tissues, such as the brainstem and parotid, which present weak intensity contrast to surrounding tissues. These incorrect contours can be picked out automatically by the proposed strategy for further verification by physicians/physicists.

The quantitative measures defined by Eq. (11) allow for a more comprehensive understanding of the contour error detection results. The measure of BA combines sensitivity and specificity. As shown in Table I, the interstructural GAD models of centroid and volume yielded BAs over 0.95 (0.989 and 0.972, respectively) on the training data-based samples while producing BAs of 0.911 and 0.842, respectively, on the testing data-based samples. In addition, the associated iterative fitting algorithm was able to cope with the sensitivity to outliers and eliminate the negative impact of incorrect contour attributes on the fitting result (explained in Sec. 2.C). Thus, promising contour assessments were achieved, with F-scores over 0.9. As for the shape error detection, the intrastructural GAD models reached BAs higher than 0.85 and F-scores above 0.95. As shown in Table II, the error detection results for the brainstem were better than those for the parotid. This was mainly because the variations of parotid shape are much larger than those of brainstem shape, which increases the numbers of false positives and false negatives. Overall, the proposed strategy achieves promising true positive and true negative detection while yielding relatively few false classifications.

The GAD models developed in this study were scalable and deformable.
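The outlier handling mentioned above can be pictured as an iteratively reweighted fit. Model coefficients are estimated, attributes whose residual exceeds a threshold are given zero weight, and the fit is repeated until the weights stop changing. The sketch below is only a generic illustration of that idea under stated assumptions. It is not the algorithm of Sec. 2.C, whose details are not reproduced in this section. The binary weighting scheme, the function name, the single-mode model, and the numbers are all hypothetical.

```python
import numpy as np

def reweighted_fit(x, mean, U, thresh, n_iter=10):
    """Fit x ~ mean + U @ b while zero-weighting attributes with large residuals."""
    w = np.ones_like(x, dtype=float)
    for _ in range(n_iter):
        sw = np.sqrt(w)
        # Weighted least squares: rows with weight 0 do not influence b.
        b, *_ = np.linalg.lstsq(sw[:, None] * U, sw * (x - mean), rcond=None)
        resid = np.abs(x - (mean + U @ b))
        new_w = (resid <= thresh).astype(float)
        if np.array_equal(new_w, w):
            break
        w = new_w
    return b, resid, w

# Hypothetical model with nine attributes and one unit-norm mode of variation.
mean = np.linspace(10.0, 18.0, 9)
U = np.ones((9, 1)) / 3.0

# A sample that follows the mode (coefficient 2) except for one corrupted attribute.
x = mean + 2.0 * U[:, 0]
x[4] += 9.0

b, resid, w = reweighted_fit(x, mean, U, thresh=2.0)
print(b, w, resid.round(3))
```

In this toy case the first pass is pulled toward the corrupted attribute. Once that attribute is excluded, the coefficient returns to the value that generated the remaining eight attributes, and the corrupted attribute is left with the largest residual.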
The deformation algorithm was implemented based on the active shape model deformation method,32 in which an iterative optimization approach is suggested to solve the global alignment and local deformation parameters of the model for recognizing target objects from images. In


this specific study of contouring error detection, we first established inter- and intrastructural GAD models to depict geometric attribute spaces specifically for correct contour samples. Then, the GAD models were deformed under the eigenconstraints of correct samples to predict a contour attribute that matches the given contour. Incorrect contouring can thus be identified if there is a greater-than-threshold discrepancy between the deformed GAD models and the contour attributes, based on either Eq. (4) or Eq. (9).

ROC analysis is normally regarded as an efficient method for cross-system performance evaluation. In this study, however, it was used to analytically select the system parameters T1 and T2 that maximize both the sensitivity and the specificity of error detection. The optimized thresholds were then utilized to obtain the results listed in Table I. This parameter selection (or optimization) strategy maintains the flexibility to balance the requirements for sensitivity and specificity by selecting the cut-off point on a ROC curve and to adapt them to specific clinical requirements. Regarding the intrastructural shape GAD model-fitting process, we set the threshold T3 to Q3 + 0.25 · IQR based on the box-and-whisker plot of each patient. With the simulation data used in the experiments, the fence was set strictly to achieve better-balanced measures, i.e., higher BA and F-score. As with the parameters selected in the interstructural GAD models, the outlier fence of the box-and-whisker plot can be adjusted to satisfy clinical requirements.

The designed contouring error detection tool was implemented as a stand-alone software package based on # and open-source programming libraries (e.g., Visualization Toolkit). It can be run directly on any Microsoft Windows-based computer without the support of other software packages.
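The box-plot fence used for T3 earlier in this discussion, Q3 + 0.25 · IQR, can be illustrated in a few lines. This is a sketch under assumptions. The quartiles are computed with NumPy's default linear-interpolation percentile, whereas the source does not state which boxplot convention was used. The per-slice fitting errors are invented for the example, and the function name is hypothetical.

```python
import numpy as np

def upper_fence(values, k=0.25):
    """Upper outlier fence Q3 + k * IQR of a box-and-whisker plot."""
    q1, q3 = np.percentile(values, [25, 75])
    return q3 + k * (q3 - q1)

# Hypothetical per-slice shape fitting errors for one patient.
slice_errors = np.array([1.0, 1.2, 0.9, 1.1, 1.0, 1.3, 0.8, 3.5])

t3_strict = upper_fence(slice_errors, k=0.25)   # fence used in the study
t3_tukey = upper_fence(slice_errors, k=1.5)     # conventional boxplot fence

flagged_strict = np.flatnonzero(slice_errors > t3_strict)
flagged_tukey = np.flatnonzero(slice_errors > t3_tukey)
print(t3_strict, flagged_strict, t3_tukey, flagged_tukey)
```

With these made-up numbers the strict fence flags two slices, one of which is only marginally above the others, while the conventional 1.5 · IQR fence flags only the clear outlier. This is the sensitivity versus false-alarm trade-off that adjusting the fence controls.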
Although the diagonal matrix searching process needs to perform the iterative model-fitting (Fig. 2), it is an efficient process because of the small size of Gcen and

F. 8. Visualization of automated detection results for local shape errors; (a)–(c) evaluation results of three brainstem contours; (d)–(f) evaluation results of three left parotid contours. The contours were triangulated and displayed with colored fitting-error maps. All the slices were rendered on the triangulated meshes. The incorrect slices detected by the automated algorithm were displayed in black color and pointed out by colored arrows. Medical Physics, Vol. 42, No. 2, February 2015


Gvol (27 × 1 and 9 × 1, respectively). The designed tool also provides convenient graphical visualization of the error detection results (shown in Figs. 7 and 8), and all the correct and incorrect slices can be visualized.

Regarding the computational cost of the proposed strategy, the time required for evaluating one patient case was less than 50 s. In the model-training stage, the interstructural GAD models were trained within 30 s and the intrastructural GAD models were built within about 5 min. Training the intrastructural GAD models took longer because the covariance matrix of the shape attributes is larger, so solving the eigensystem requires more computational time. The computation time was assessed on a computer with an Intel® Xeon(R) processor (3.07 GHz) and 4.0 GB of memory.

As a clinical contouring error detection strategy, robustness of error detection is a major requirement. The robustness of the proposed strategy can be improved with increased numbers of clinical correct and incorrect training data sets. Even though the model training process does not require the use of incorrect sets, we believe that if more clinical incorrect data sets are used to test the system, the system decision parameters can be more realistically determined and fine-tuned. Therefore, we will collect more clinical data in the future.

Although only centroid, volume, and shape features were utilized in the current study to establish the GAD models, the proposed general strategy allows relationships of other inter- and/or intrastructural contour features (such as mean intensity and/or textures) to be characterized via PCA-based distribution modeling. As long as the geometric distributions of these features present certain similarity among different patients, they can be incorporated into this strategy by modifying the inter- and intrastructural attributes defined in Sec. 2.B.
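To make the idea of PCA-based distribution modeling for an arbitrary feature concrete, the sketch below builds a model from a matrix of per-patient feature vectors, deforms it along its first mode, and fits it to a new sample with clamped mode weights. It is a minimal illustration, not the authors' formulation. The weights are assumed to be expressed in standard deviations of each mode, and the clamp to ±3 standard deviations is borrowed from common active shape model practice. The 95% variance cut, the function names, and the synthetic 29 × 9 training matrix are all hypothetical.

```python
import numpy as np

def train_model(X, var_kept=0.95):
    """PCA model from X of shape (n_patients, n_features), verified-correct samples only."""
    mean = X.mean(axis=0)
    # SVD of the centered data yields the eigenvectors of the covariance matrix.
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    eigvals = s ** 2 / (X.shape[0] - 1)
    cum = np.cumsum(eigvals) / eigvals.sum()
    m = int(np.searchsorted(cum, var_kept) + 1)   # number of modes retained
    return mean, vt[:m].T, eigvals[:m]

def deform(mean, U, eigvals, weights):
    """Model instance for mode weights given in standard deviations."""
    return mean + U @ (np.asarray(weights) * np.sqrt(eigvals))

def fit_model(x, mean, U, eigvals, k=3.0):
    """Project x onto the modes, clamp to +/- k standard deviations, return fit and residual."""
    b = U.T @ (x - mean)
    limit = k * np.sqrt(eigvals)
    x_hat = mean + U @ np.clip(b, -limit, limit)
    return x_hat, np.abs(x - x_hat)

# Synthetic training set: 29 patients, nine feature values (hypothetical numbers).
rng = np.random.default_rng(0)
base = np.array([30.0, 25.0, 25.0, 8.0, 8.0, 1.0, 12.0, 12.0, 0.5])
scale = 1.0 + 0.1 * rng.standard_normal((29, 1))
X = scale * base + 0.02 * base * rng.standard_normal((29, 9))

mean, U, eigvals = train_model(X)
first_mode = lambda w: deform(mean, U, eigvals, [w] + [0.0] * (len(eigvals) - 1))

x_in = first_mode(2.0)     # inside the allowed range: reproduced by the model
x_out = first_mode(10.0)   # far outside: the clamp leaves a residual
_, resid_in = fit_model(x_in, mean, U, eigvals)
_, resid_out = fit_model(x_out, mean, U, eigvals)
print(len(eigvals), resid_in.max(), resid_out.max())
```

A sample would then be flagged when its residual exceeds a chosen threshold. Swapping in another feature, such as mean intensity per structure, only changes what goes into the columns of `X`.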
One of our future works will focus on investigating other feature distribution models (e.g., intensity and texture properties) to refine and complete the automated evaluation criteria and further reduce false classifications.

In the proposed general strategy, the centroid and volume GAD models were built from clinically approved contours to characterize the interstructural relationship of neighboring contours. The centroid and/or volume errors of a given single structure were detected not only by considering the difference between the structure itself and its counterpart in the training model but also by utilizing the interstructural relationship between the structure and its neighboring structures to improve the detection accuracy. Therefore, when the incorrect contour of a small structure (e.g., optical chiasm) presented large deviations in centroid and volume with respect to the interstructural GAD models, it could be identified. Figure 6 shows examples of correct detection of optical chiasm contouring errors. We also observed that the inter- and intraobserver variability of contours of small organs is greater than that of larger organs. One of our ongoing research topics is to utilize additional reference landmarks (with image intensity related information) to reliably refine the detection of contouring errors of small organs, and to overcome the problems caused by the sensitivity of small structures to certain geometric attributes. We will also investigate how to


use organ-specific model parameters to refine the accuracy of contouring error detection, especially for small structures.

In this study, we used head-and-neck organs-at-risk contours to demonstrate the proposed general strategy. Tumor/target contours are expected to be includable in the automated contouring error detection strategy as well. However, due to the high biological and physical variability of tumors, specific domain knowledge (e.g., types of cancer and morphological characteristics) may be required for modeling the intensity and texture of tumors. One of our future works will focus on applying the proposed strategy to tumors that yield manageable properties.

5. CONCLUSION

In this paper, we have proposed a general strategy based upon inter- and intrastructural GAD models for automated detection of radiation therapy contouring errors. ROC analysis provides a promising way to analytically tune the system parameters based on specific clinical requirements. The scalable and constrained deformable GAD models and the iterative model-fitting algorithm make the strategy suitable for distinguishing contouring errors of a given patient based on geometric features. The stand-alone and compact software combining the automated contouring error detection strategy and the computer graphics visualization interface provides a clinically usable and convenient tool. The proposed automated contouring error detection strategy will serve as a supporting tool for manual peer review and help detect most contouring errors. As such, tumor and target contours will receive a greater level of manual review, and the total time of the manual review process will be reduced. The proposed strategy has great potential for facilitating the routine clinical workflow.

a)Author to whom correspondence should be addressed. Electronic mail: [email protected]; Telephone: 314-362-0129.

1. C. Moretones, D. Leon, A. Navarro, O. Santacruz, A. M. Boladeras, M. Macia, M. Cambray, V. Navarro, I. Modolell, and F. Guedea, “Interobserver variability in target volume delineation in postoperative radiochemotherapy for gastric cancer. A pilot prospective study,” Clin. Transl. Oncol. 14, 132–137 (2012).
2. J. Kalpathy-Cramer, S. D. Bedrick, K. Boccia, and C. D. Fuller, “A pilot prospective feasibility study of organ-at-risk definition using target contour testing/instructional computer software (TaCTICS), a training and evaluation platform for radiotherapy target delineation,” AMIA Annu. Symp. Proc. 2011, 654–663 (2011).
3. S. Dewas, J. E. Bibault, P. Blanchard, C. Vautravers-Dewas, Y. Pointreau, F. Denis, M. Brauner, and P. Giraud, “Delineation in thoracic oncology: A prospective study of the effect of training on contour variability and dosimetric consequences,” Radiat. Oncol. 6, 118 (9pp.) (2012).
4. A. N. Badouna, C. Veres, N. Haddy, F. Bidault, D. Lefkopoulos, J. Chavaudra, A. Bridier, F. de Vathaire, and I. Diallo, “Total heart volume as a function of clinical and anthropometric parameters in a population of external beam radiation therapy patients,” Phys. Med. Biol. 57, 473–484 (2012).
5. J.-F. Daisne and A. Blumhofer, “Atlas-based automatic segmentation of head and neck organs at risk and nodal target volumes: A clinical validation,” Radiat. Oncol. 8, 154 (11pp.) (2013).
6. C. Brouwer, R. Steenbakkers, E. van den Heuvel, J. Duppen, A. Navran, H. Bijl, O. Chouvalova, F. Burlage, H. Meertens, J. Langendijk, and A. van ’t Veld, “3D variation in delineation of head and neck organs at risk,” Radiat. Oncol. 7, 32 (9pp.) (2012).
7. L. Santanam, R. S. Brame, A. Lindsey, T. Dewees, J. Danieley, J. Labrash, P. Parikh, J. Bradley, I. Zoberi, J. Michalski, and S. Mutic, “Eliminating inconsistencies in simulation and treatment planning orders in radiation therapy,” Int. J. Radiat. Oncol., Biol., Phys. 85, 484–491 (2013).
8. W. R. Hendee and M. G. Herman, “Improving patient safety in radiation oncology,” Pract. Radiat. Oncol. 1, 16–21 (2011).
9. W. H. Hall, M. Guiou, N. Y. Lee, A. Dublin, S. Narayan, S. Vijayakumar, J. A. Purdy, and A. M. Chen, “Development and validation of a standardized method for contouring the brachial plexus: Preliminary dosimetric analysis among patients treated with IMRT for head-and-neck cancer,” Int. J. Radiat. Oncol., Biol., Phys. 72, 1362–1367 (2008).
10. M. Feng, C. Demiroz, K. A. Vineberg, A. Eisbruch, and J. M. Balter, “Normal tissue anatomy for oropharyngeal cancer: Contouring variability and its impact on optimization,” Int. J. Radiat. Oncol., Biol., Phys. 84, e245–e249 (2012).
11. J. Stanley, P. Dunscombe, H. Lau, P. Burns, G. Lim, H. W. Liu, R. Nordal, Y. Starreveld, B. Valev, J. P. Voroney, and D. P. Spencer, “The effect of contouring variability on dosimetric parameters for brain metastases treated with stereotactic radiosurgery,” Int. J. Radiat. Oncol., Biol., Phys. 87, 924–931 (2013).
12. A. C. Lo, M. Liu, E. Chan, C. Lund, P. T. Truong, S. Loewen, J. Cao, D. Schellenberg, H. Carolan, T. Berrang, J. Wu, E. Berthelet, and R. Olson, “The impact of peer review of volume delineation in stereotactic body radiation therapy planning for primary lung cancer: A multicenter quality assurance study,” J. Thorac. Oncol. 9, 527–533 (2014).
13. M. H. Nielsen, M. Berg, A. N. Pedersen, K. Andersen, V. Glavicic, E. H. Jakobsen, I. Jensen, M. Josipovic, E. L. Lorenzen, H. M. Nielsen, L. Stenbygaard, M. S. Thomsen, S. Vallentin, S. Zimmermann, and B. V. Offersen, “Delineation of target volumes and organs at risk in adjuvant radiotherapy of early breast cancer: National guidelines and contouring atlas by the Danish Breast Cancer Cooperative Group,” Acta Oncol. 52, 703–710 (2013).
14. C. McIntosh, I. Svistoun, and T. G. Purdie, “Groupwise conditional random forests for automatic shape classification and contour quality assessment in radiotherapy planning,” IEEE Trans. Med. Imaging 32, 1043–1057 (2013).
15. R. M. Macklis, T. Meier, and M. S. Weinhous, “Error rates in clinical radiotherapy,” J. Clin. Oncol. 16, 551–556 (1998).
16. J. Kavanaugh, H. Wooten, O. P. Green, T. Dewees, H. Li, S. Mutic, and M. Altman, “TH-A-116-07: Validation of patient contours for head and neck treatments using population-based metrics,” Med. Phys. 40, 530 (2013).
17. T. K. Yeung, K. Bortolotto, S. Cosby, M. Hoar, and E. Lederer, “Quality assurance in radiotherapy: Evaluation of errors and incidents recorded over a 10 year period,” Radiother. Oncol. 74, 283–291 (2005).
18. G. Huang, G. Medlam, J. Lee, S. Billingsley, J.-P. Bissonnette, J. Ringash, G. Kane, and D. C. Hodgson, “Error in the delivery of radiation therapy: Results of a quality assurance review,” Int. J. Radiat. Oncol., Biol., Phys. 61, 1590–1595 (2005).
19. S. Mutic, R. S. Brame, S. Oddiraju, P. Parikh, M. A. Westfall, M. L. Hopkins, A. D. Medina, J. C. Danieley, J. M. Michalski, I. M. El Naqa, D. A. Low, and B. Wu, “Event (error and near-miss) reporting and learning system for process improvement in radiation oncology,” Med. Phys. 37, 5027–5036 (2010).
20. Y. Sun, X.-L. Yu, W. Luo, A. W. M. Lee, J. T. S. Wee, N. Lee, G.-Q. Zhou, L.-L. Tang, C.-J. Tao, R. Guo, Y.-P. Mao, R. Zhang, Y. Guo, and J. Ma, “Recommendation for a contouring method and atlas of organs at risk in nasopharyngeal carcinoma patients receiving intensity-modulated radiotherapy,” Radiother. Oncol. 110, 390–397 (2014).
21. N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods (Cambridge University Press, Cambridge, England, 2000).
22. C. A. Silva, A. C. Silva, S. M. B. Netto, A. C. d. Paiva, G. B. Junior, and R. A. Nunes, “Lung nodules classification in CT images using Simpson’s index, geometrical measures and one-class SVM,” in Proceedings of the 6th International Conference on Machine Learning and Data Mining in Pattern Recognition (Springer-Verlag, Leipzig, Germany, 2009), pp. 810–822.
23. A. Miller and B. Blott, “Review of neural network applications in medical imaging and signal processing,” Med. Biol. Eng. Comput. 30, 449–464 (1992).
24. K. Leszczynski, S. Cosby, R. Bissett, D. Provost, S. Boyko, S. Loose, and E. Mvilongo, “Application of a fuzzy pattern classifier to decision making in portal verification of radiotherapy,” Phys. Med. Biol. 44, 253–269 (1999).
25. H. Seker, M. O. Odetayo, D. Petrovic, and R. N. Naguib, “A fuzzy logic based-method for prognostic decision making in breast and prostate cancers,” IEEE Trans. Inf. Technol. Biomed. 7, 114–122 (2003).
26. Z. Liu, J. Yang, and C. Liu, “Extracting multiple features in the CID color space for face recognition,” IEEE Trans. Image Process. 19, 2502–2509 (2010).
27. S. Liao, A. K. Jain, and S. Z. Li, “Partial face recognition: Alignment-free approach,” IEEE Trans. Pattern Anal. Mach. Intell. 35, 1193–1205 (2013).
28. M. Banning, “A review of clinical decision making: Models and current research,” J. Clin. Nurs. 17, 187–195 (2008).
29. K. J. Cho and J. H. Han, “Classification of complex patterns for surface inspection,” in Proceedings. 1991 IEEE International Conference on Robotics and Automation (IEEE, Sacramento, CA, 1991), Vol. 1802, pp. 1802–1807.
30. Y. Song, L. P. Morency, and R. Davis, “Distribution-sensitive learning for imbalanced datasets,” in 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG) (IEEE, Shanghai, China, 2013), pp. 1–6.
31. G. Langs, P. Peloschek, H. Bischof, and F. Kainberger, “Automatic quantification of joint space narrowing and erosions in rheumatoid arthritis,” IEEE Trans. Med. Imaging 28, 151–164 (2009).
32. T. F. Cootes, C. J. Taylor, D. H. Cooper, and J. Graham, “Active shape models-their training and application,” Comput. Vis. Image Understanding 61, 38–59 (1995).
33. K. Lekadir, R. Merrifield, and G. Z. Yang, “Outlier detection and handling for robust 3-D active shape models search,” IEEE Trans. Med. Imaging 26, 212–222 (2007).
34. Y. Li, L. Gu, and T. Kanade, “Robustly aligning a shape model and its application to car alignment of unknown pose,” IEEE Trans. Pattern Anal. Mach. Intell. 33, 1860–1876 (2011).
35. C. Goodall, “Procrustes methods in the statistical analysis of shape,” J. R. Stat. Soc. Ser. B 53, 285–339 (1991).
36. M. B. Stegmann and D. D. Gomez, “A brief introduction to statistical shape analysis,” in Informatics and Mathematical Modelling (Technical University of Denmark, DTU, Denmark, 2002), Vol. 15.
37. L. De Lathauwer, B. De Moor, and J. Vandewalle, “A multilinear singular value decomposition,” SIAM J. Matrix Anal. Appl. 21, 1253–1278 (2000).
38. A. Tsai, W. Wells, C. Tempany, E. Grimson, and A. Willsky, “Mutual information in coupled multi-shape model for medical image segmentation,” Med. Image Anal. 8, 429–445 (2004).
39. J. Yang, L. H. Staib, and J. S. Duncan, “Neighbor-constrained segmentation with level set based 3-D deformable models,” IEEE Trans. Med. Imaging 23, 940–948 (2004).
40. R. Baldock and J. Graham, Image Processing and Analysis (Oxford University Press, Oxford, UK, 2000).
41. W. H. Press, S. A. Teukolsky, W. T. Vetterling, B. P. Flannery, and M. Metcalf, Numerical Recipes in : The Art of Scientific Computing, 2 ed. (Cambridge University Press, New York, NY, 1992).
42. J. L. Bentley, “Multidimensional binary search trees used for associative searching,” Commun. ACM 18, 509–517 (1975).
43. V. Ramasubramanian and K. K. Paliwal, “Fast K-dimensional tree algorithms for nearest neighbor search with application to vector quantization encoding,” IEEE Trans. Signal Process. 40, 518–531 (1992).
44. M. H. Zweig, “ROC plots display test accuracy, but are still limited by the study design,” Clin. Chem. 39, 1345–1346 (1993).
45. M. H. Zweig and G. Campbell, “Receiver-operating characteristic (ROC) plots: A fundamental evaluation tool in clinical medicine,” Clin. Chem. 39, 561–577 (1993).
46. M. Frigge, D. C. Hoaglin, and B. Iglewicz, “Some implementations of the boxplot,” Am. Stat. 43, 50–54 (1989).
47. M. Sokolova and G. Lapalme, “A systematic analysis of performance measures for classification tasks,” Inf. Process. Manage. 45, 427–437 (2009).
