Eur Radiol (2014) 24:2719–2728 DOI 10.1007/s00330-014-3329-0

ONCOLOGY

Toward clinically usable CAD for lung cancer screening with computed tomography

Matthew S. Brown, Pechin Lo, Jonathan G. Goldin, Eran Barnoy, Grace Hyun J. Kim, Michael F. McNitt-Gray, Denise R. Aberle

Received: 14 January 2014 / Revised: 22 April 2014 / Accepted: 8 July 2014 / Published online: 24 July 2014
© European Society of Radiology 2014

Abstract
Objectives The purpose of this study was to define clinically appropriate computer-aided lung nodule detection (CAD) requirements and protocols based on recent screening trials. We describe a CAD evaluation methodology based on a publicly available, annotated computed tomography (CT) image data set, and demonstrate the evaluation of a new CAD system with the functionality and performance required for adoption in clinical practice.
Methods A new automated lung nodule detection and measurement system was developed that incorporates intensity thresholding, a Euclidean Distance Transformation, and segmentation based on watersheds. System performance was evaluated against the Lung Imaging Database Consortium (LIDC) CT reference data set.
Results The test set comprised thin-section CT scans from 108 LIDC subjects. The median (±IQR) sensitivity per subject was 100 % (±37.5) for nodules ≥4 mm and 100 % (±8.33) for nodules ≥8 mm. The corresponding false positive rates per subject were 0 (±2.0) and 0 (±1.0), respectively. The concordance correlation coefficient between the CAD nodule diameter and the LIDC reference was 0.91, and for volume it was 0.90.
Conclusions The new CAD system shows high nodule sensitivity with a low false positive rate. Automated volume measurements have strong agreement with the reference standard. Thus, the system provides comprehensive, clinically usable lung nodule detection and assessment functionality.

M. S. Brown (*), P. Lo, J. G. Goldin, E. Barnoy, G. H. J. Kim, M. F. McNitt-Gray, D. R. Aberle
Center for Computer Vision and Imaging Biomarkers, Department of Radiological Sciences, David Geffen School of Medicine at UCLA, 924 Westwood Blvd., Suite 615, Los Angeles, CA 90024, USA
e-mail: [email protected]

Key Points
• CAD requirements can be based on lung cancer screening trial results.
• CAD systems can be evaluated using publicly available annotated CT image databases.
• A new CAD system was developed with a low false positive rate.
• The CAD system has reliable measurement tools needed for clinical use.

Keywords Lung cancer · Multiple pulmonary nodules · Computer-assisted diagnosis · Early detection of cancer · X-ray computerized axial tomography

Introduction

Recent clinical trials of lung cancer screening with computed tomography (CT) have provided new data and diagnostic protocols upon which to base computer-aided diagnosis (CAD) targets, in particular the size of lung nodule that should be detected. The National Lung Screening Trial (NLST) enrolled 53,454 participants and showed a reduction in mortality of 20 % with CT screening compared to radiography [1]. In that trial, a non-calcified nodule with a diameter of 4 mm or more was considered a positive screen. The Nederlands-Leuvens Longkanker Screenings Onderzoek (NELSON) trial [2] used a volume-based criterion of a solid nodular component ≥500 mm³ (equivalent to a 9.8-mm spherical nodule) for a positive screen. A solid component ≥50 mm³ (equivalent to a 4.6-mm spherical nodule) or a ground-glass opacity (GGO) of mean diameter >8 mm was considered an indeterminate screen. Although these trials used a nodule size threshold of around 4 mm in their screening protocols, currently available results indicate that malignancies are typically larger. In the baseline NLST data, the positive predictive value (PPV) of a 4–6-mm nodule was only 0.5 %, but 7–10-mm nodules had a PPV of 1.7 % [3]. Furthermore, a recent retrospective analysis from the I-ELCAP trial found that using diameter thresholds of 6 mm, 7 mm and 8 mm could decrease further workup by 36, 56, and 68 %, respectively, while resulting in a maximum delay in lung cancer diagnosis of 9 months in 0, 5, and 5.9 % of cases, respectively [4]. These results suggest that clinical screening algorithms may ultimately utilize a higher nodule size threshold.

Automated CAD technology for lung nodule detection and measurement is needed if lung cancer screening is to become routinely available. Numerous research CAD systems for CT have been developed and published, and some have been commercialized [5]; however, they are rarely used in clinical practice. This is due primarily to inadequate performance levels and a lack of workflow integration and assessment tools [6]. In particular, CAD is not in widespread clinical use because of an inability to limit false positive detections, e.g., normal anatomy such as vascular or airway branches that are incorrectly detected by CAD as nodules. These false positives take additional time to rule out, and some studies suggest that radiologists can incorrectly accept false positives, which in practice would lead to unnecessary workups.

In addition, it has been difficult to establish the relative efficacy of CAD systems because of a lack of standardization in evaluation criteria and test data sets. The target nodule size and characteristics for CAD detection have not been well defined or consistently applied. The Lung Imaging Database Consortium (LIDC) now provides a public CT image data set with annotated nodules [7]. Applying the NLST/NELSON nodule size criteria to this data set provides a CAD gold standard with a clinically based rationale.
Since criteria for a positive screen involve nodule size measurements, automated volume and diameter measurement capabilities are needed, and their accuracy should be evaluated in CAD systems. Such tools are also important for nodule workup and comparison on follow-up scans.

The goals of this paper are threefold:
1) Based on the consideration of CAD technical factors and clinical trials, to specify clinically appropriate CAD protocols for lung cancer screening, including: (a) target lesion criteria, (b) lesion assessment capabilities, and (c) CT imaging protocol requirements
2) To describe a CAD evaluation methodology based on the public, annotated LIDC image data set, including nodule detection and measurement accuracy
3) To demonstrate the evaluation of a new CAD system with the performance and measurement functionality approaching that required for adoption in clinical practice for lung cancer screening.


Materials and methods

CAD detection protocol

Recent clinical trials, in particular the NELSON and NLST studies, have defined positive/indeterminate CT screens as nodules ranging in diameter from 4 to 10 mm. Therefore, there is now a basis in clinical trials for adopting a size threshold of 4 mm for CAD detection. This is significant because some systems, including those that we previously reported, were developed and evaluated on nodules as small as 1 mm in diameter [8]. Increasing the size threshold creates the possibility of improving performance, in particular by reducing the number of false positives. Further, as described in the Introduction, a higher size threshold in the range of 7–10 mm has been shown to increase the PPV of the screening test [3]. Therefore, we also evaluated the CAD system at a higher size threshold of 8 mm. A higher threshold may ultimately be more appropriate for CAD given the high false positive rates for lung cancer at the 4-mm level found in recent lung cancer screening trials, and also given that the clinical significance of 4-mm nodules remains in question [9].

CT imaging protocol

CAD performance is intimately linked to image quality and acquisition parameters. Both spatial resolution and noise levels have been shown to influence performance [10]. Thus, it is important to specify the imaging protocol required. CAD systems directly or indirectly apply intensity and shape filtering to distinguish pulmonary nodules from vessels. The smallest symmetric 3D filter kernel size is 3 × 3 × 3 voxels. Therefore, for a nodule to be detectable by CAD we require that it appear on at least three axial slices without partial voluming. The in-plane nodule dimensions have similar requirements; smaller nodules cannot be reliably distinguished from vessels that appear fragmented due to partial voluming as they branch and curve in and out of the scan-plane.
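The three-slice requirement can be checked with a little arithmetic. The sketch below is illustrative only and is not part of the CAD system: it assumes contiguous, non-overlapping slices of equal thickness and an object with a fixed longitudinal extent, and it counts how many slices lie entirely inside the object as the object shifts relative to the slice grid. The function names are hypothetical.

```python
import math


def fully_covered_slices(length_mm, slice_mm, offset_mm):
    """Count slices that lie entirely inside an object.

    The object occupies [offset_mm, offset_mm + length_mm] along the scan
    axis; slice k occupies [k * slice_mm, (k + 1) * slice_mm].
    """
    # Index of the first slice whose lower edge is inside the object.
    first = math.ceil(offset_mm / slice_mm)
    # Number of slice boundaries at or below the object's upper edge.
    last = math.floor((offset_mm + length_mm) / slice_mm)
    return max(0, last - first)


def worst_case_coverage(length_mm, slice_mm, steps=100):
    """Minimum count of fully covered slices over evenly spaced offsets.

    Offsets sweep one slice thickness in `steps` increments (an assumed
    sampling density, adequate for this illustration).
    """
    return min(
        fully_covered_slices(length_mm, slice_mm, i * slice_mm / steps)
        for i in range(steps)
    )
```

Under these assumptions, `worst_case_coverage(4.0, 1.0)` evaluates to 3, whereas `worst_case_coverage(3.0, 1.0)` evaluates to 2: an object spanning four slice thicknesses always covers at least three slices completely, while one spanning three slice thicknesses may cover only two.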
Therefore, we specify that the minimum detectable nodule size must have all three dimensions ≥4 × slice thickness for a given CT, i.e., the CT slice thickness must be ≤(0.25 × nodule dimension) to ensure adequate resolution (see schematic in Fig. 1). Thus, the detection of 4-mm nodules requires a slice thickness ≤1 mm, and a slice thickness ≤2 mm is required for 8-mm nodules. In addition, we recommend relatively smooth reconstruction kernels to avoid artificial edge enhancement of lung lesions and to reduce noise [11]. The NELSON CT imaging protocol utilized 1-mm slice thickness, and thus, CAD could be expected to reliably detect nodules with dimensions of 4 mm. The NLST imaging protocol required ≤2.5-mm slice thickness, which permits the detection of nodules of 10 mm, or smaller nodules for scans using thinner slices. Low-dose imaging was recommended, and the NLST provided imaging model-specific technique charts with effective mAs ranging from 20 to 80 [11].

Fig. 1 Schematic demonstrating that an object with longitudinal dimension of 4 × slice thickness will appear on at least three slices without partial voluming

CAD system

Based on these clinically appropriate target nodule and imaging protocol requirements, we have developed and evaluated a new, automated lung nodule detection and measurement system. The CAD system is an extension of a previously published algorithm [8, 12–14]. Following intensity thresholding, the new algorithm incorporates a Euclidean Distance Transformation (EDT) and segmentation based on watersheds and a connected component analysis. In the EDT image, the value of a voxel is computed as the minimum distance from that voxel to a voxel not included in the threshold. Nodule candidates are detected by watershed segmentation of the EDT image. Three-dimensional seeded region-growing is performed from the local EDT maximum of each nodule candidate to include all contiguous voxels within a percentage threshold of this maximum. This step prunes contacting vessels from nodule candidates while preserving the elongated shape if the seed is within a vessel. Rules pertaining to volume and shape (sphericity) of segmented regions are then applied to distinguish nodules from vessels. The system reports the following nodule measurements:

• Volume (V): total volume of the nodule ROI voxels in mm³;
• Volume-equivalent diameter (DV): diameter of a sphere with volume equal to V;
• Longest diameter (DL): maximum axial distance between any two boundary voxels of a nodule cross-section;
• Mean CT attenuation (M): mean attenuation of nodule ROI voxels in Hounsfield Units (HU).

The system also generates Response Evaluation Criteria in Solid Tumors (RECIST) reports of lesion burden using DL measurements.

CAD evaluation

The publicly available Lung Imaging Database Consortium (LIDC) data set was used for CAD system evaluation [7]. Sites contributing to the LIDC data set obtained informed consent, as required by their Institutional Review Boards, and submitted de-identified image data. CT scanner models and acquisition parameters vary by participating site. A maximum slice thickness of 5 mm was a scan inclusion criterion, while there were no requirements with respect to scanner pitch, exposure, tube voltage, or reconstruction algorithm. The minimum effective diameter for included nodules was 3 mm, with an upper limit of 30 mm. The contours of the included nodules were drawn manually and independently by a panel of four thoracic radiologists, using a combination of blinded and unblinded reviews. Nodules smaller than 3 mm that were suspicious for cancer were also identified, but these nodules were not contoured.

The CAD test set included 120 consecutive LIDC cases with IDs 0081–0200 (the first 80 cases were used for system development). Cases were excluded when the DICOM instance numbers were not contiguous, resulting in scans from 108 test subjects confirmed as having all cross-sectional image slices present. The CT slice thickness ranged from 0.6 to 3.0 mm. Nodules marked by a majority of the four radiologist readers were included. The annotations provided by the radiologists were combined at the voxel level, such that each voxel marked as a nodule by at least three of the four radiologists was included in a "majority nodule ROI". For each majority nodule ROI, V, DV, DL, and M were computed. The ROI was accepted as a reference nodule if: (1) DV ≥4 mm, and (2) all three of the nodule dimensions were ≥4 × slice thickness. As described in the CAD detection protocol, the system is operable at 4-mm and 8-mm nodule size thresholds, depending on the desired sensitivity/specificity for lung cancer.
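The construction of a reference nodule can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation. It assumes each reader's annotation is a set of (z, y, x) voxel indices and that voxel spacing is given as (dz, dy, dx) in mm with dz equal to the slice thickness. It measures nodule dimensions as bounding-box extents and DL between voxel centres within one axial slice. These representations and all function names are assumptions made for the example.

```python
import math
from itertools import combinations


def majority_roi(reader_masks, min_votes=3):
    """Voxels marked by at least `min_votes` readers (3 of 4 in the text)."""
    votes = {}
    for mask in reader_masks:
        for voxel in mask:
            votes[voxel] = votes.get(voxel, 0) + 1
    return {voxel for voxel, n in votes.items() if n >= min_votes}


def volume_mm3(roi, spacing):
    """V: voxel count times the volume of a single voxel."""
    dz, dy, dx = spacing
    return len(roi) * dz * dy * dx


def volume_equivalent_diameter(volume):
    """DV: diameter d of the sphere with V = pi * d**3 / 6."""
    return (6.0 * volume / math.pi) ** (1.0 / 3.0)


def longest_axial_diameter(roi, spacing):
    """DL: largest in-plane distance between two voxels on one slice.

    Assumption: voxel centres stand in for boundary voxels; the quadratic
    search is acceptable for small ROIs in a sketch.
    """
    _, dy, dx = spacing
    by_slice = {}
    for z, y, x in roi:
        by_slice.setdefault(z, []).append((y * dy, x * dx))
    best = 0.0
    for points in by_slice.values():
        for (y1, x1), (y2, x2) in combinations(points, 2):
            best = max(best, math.hypot(y1 - y2, x1 - x2))
    return best


def mean_attenuation(roi, hu):
    """M: mean HU over the ROI; `hu` maps voxel index to attenuation."""
    return sum(hu[voxel] for voxel in roi) / len(roi)


def extents_mm(roi, spacing):
    """Bounding-box extent along z, y and x (an assumed size measure)."""
    return tuple(
        (max(v[axis] for v in roi) - min(v[axis] for v in roi) + 1) * spacing[axis]
        for axis in range(3)
    )


def is_reference_nodule(roi, spacing, size_mm=4.0):
    """Apply both acceptance criteria: DV >= size_mm, extents >= 4 x slice thickness."""
    if not roi:
        return False
    dv = volume_equivalent_diameter(volume_mm3(roi, spacing))
    slice_thickness = spacing[0]
    return dv >= size_mm and all(
        extent >= 4.0 * slice_thickness for extent in extents_mm(roi, spacing)
    )
```

With the sphere relation used in `volume_equivalent_diameter`, a volume of 500 mm³ corresponds to a diameter of about 9.8 mm, and diameters of 4 mm and 8 mm correspond to volumes of about 33.5 mm³ and 268 mm³.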
Because the system can operate at either threshold, the detection performance assessment was stratified into (all) nodules ≥4 mm and the subset of nodules ≥8 mm. The following definitions were used to calculate system performance metrics:

• True positive: detected nodule that overlaps with a reference nodule
• False negative: reference nodule that does not overlap with a detected nodule
• False positive: detected nodule that does not overlap with a reference nodule
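These definitions translate directly into set operations. The sketch below is a hedged illustration rather than the evaluation code used in the study. It assumes every nodule is a set of voxel indices and that any shared voxel counts as overlap. It also takes sensitivity to be the share of reference nodules overlapped by at least one detection. The function names are hypothetical.

```python
from statistics import median


def count_outcomes(detections, references):
    """Return (true positives, false negatives, false positives) for one scan.

    `detections` and `references` are lists of voxel-index sets; two nodules
    overlap when their sets share at least one voxel.
    """
    true_pos = sum(1 for d in detections if any(d & r for r in references))
    false_pos = len(detections) - true_pos
    false_neg = sum(1 for r in references if not any(r & d for d in detections))
    return true_pos, false_neg, false_pos


def subject_sensitivity(detections, references):
    """Percent of reference nodules detected; None when there are none."""
    if not references:
        return None  # denominator would be zero
    _, false_neg, _ = count_outcomes(detections, references)
    return 100.0 * (len(references) - false_neg) / len(references)


def summarize(scans):
    """Median per-subject sensitivity and pooled sensitivity over all nodules.

    `scans` is a list of (detections, references) pairs, one per subject.
    """
    per_subject = [
        s for s in (subject_sensitivity(d, r) for d, r in scans) if s is not None
    ]
    total_refs = sum(len(r) for _, r in scans)
    total_missed = sum(count_outcomes(d, r)[1] for d, r in scans)
    pooled = 100.0 * (total_refs - total_missed) / total_refs if total_refs else None
    return median(per_subject) if per_subject else None, pooled
```

The per-subject and pooled figures generally differ, because pooling weights subjects by their number of reference nodules.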


Fig. 2 Flowchart showing derivation of 44 subjects with gold standard reference nodules

Statistical analysis

CAD detection performance was reported using summary statistics in terms of nodule sensitivity per subject examination and the number of false positive detections per image, with stratification into nodules ≥4 mm and the subset of nodules ≥8 mm. Although nodules within a subject are not independent, we also reported the mean sensitivities over the total number of nodules. This was done to provide a comparison with other systems that use this metric. Summary statistics comparing automated CAD diameter/volume against the LIDC majority nodule ROI reference were also reported with the same size stratification. The agreement between the automatically computed CAD volume (V) and the reference volume was evaluated using the concordance correlation coefficient (CCC) [15]. Similarly, CCCs were used to report agreement for DL and M. Bland-Altman plots were also used to show the limits of agreement between the CAD and reference values for these measurements [16]. All statistical analyses were performed using R (version 2.15.1).
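For readers unfamiliar with these agreement measures, the sketch below shows one common way to compute them. It is written in Python for illustration and is not the R code used in the study. It assumes the CCC is computed from population (1/n) moments and that the Bland-Altman limits are the mean difference ± 1.96 sample standard deviations. Both conventions and the function names are assumptions.

```python
from statistics import mean, stdev


def concordance_correlation(x, y):
    """Concordance correlation coefficient between paired measurements.

    Uses 2 * s_xy / (s_x^2 + s_y^2 + (mean_x - mean_y)^2) with 1/n moments,
    so both poor correlation and systematic offset reduce the value.
    """
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2.0 * sxy / (sxx + syy + (mx - my) ** 2)


def bland_altman_limits(x, y):
    """Return (lower limit, bias, upper limit) for the differences x - y."""
    diffs = [a - b for a, b in zip(x, y)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)  # sample standard deviation (n - 1)
    return bias - spread, bias, bias + spread
```

Unlike the Pearson correlation, the CCC penalizes a constant offset: two series that differ by a fixed amount are perfectly correlated but have a CCC below 1.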

Results

The CT scans of the 108 LIDC test subjects were acquired with both slice thickness and reconstruction intervals ranging from 0.5 to 3 mm (mean = 2.1, Std. dev. = 0.7), mAs ranging from 75 to 300, and kVp ranging from 120 to 140. The scanner manufacturers included GE, Siemens, and Toshiba. Kernels ranged from standard to sharp, but were predominantly standard, as defined in [17]. Scans from all 108 subjects were processed automatically by the CAD system. Forty-four subjects had one or more true nodules with DV ≥4 mm and an acceptable image slice thickness for detecting them with CAD (see Fig. 2 for a flowchart showing extraction of these subjects).

Table 1 Summary statistics of CAD sensitivity per subject

Sensitivity per subject    Nodules with DV ≥4 mm    Nodules with DV ≥8 mm
Number of subjects         44                       44
Total # true lesions       68                       58
Median (±IQR), %           100 (±37.5)              100 (±8.33)
Mean (±Std. dev.), %       79.8 (±34.7)             82.2 (±34.0)
Range, %                   [0, 100]                 [0, 100]

Table 2 Summary statistics of CAD false positive rate per subject

False positive (FP) rates per subject    Nodules with DV ≥4 mm (V ≥34 mm³)    Nodules with DV ≥8 mm (V ≥268 mm³)
Number of subjects                       108                                  108
Total # FP                               221                                  109
Median (±IQR)                            0 (±2.0)                             0 (±1.0)
Mean (±Std. dev.)                        2.05 (±5.32)                         1.01 (±2.40)
Range                                    [0, 42]                              [0, 20]

Table 3 Summary statistics of comparison between CAD measurements and LIDC reference. All values are mean (±Std. dev.)

Measure                  Size stratum    CAD measurements      Difference from LIDC reference    % Difference from LIDC reference
Volume (mm³)             DV ≥4 mm        1,560.5 (±1,605.7)    −42.9 (±740.5)                    −26.6 (±57.0)
Volume (mm³)             DV ≥8 mm        1,988.2 (±1,626.9)    −28.6 (±850.1)                    −14.6 (±38.3)
Longest diameter (mm)    DV ≥4 mm        15.0 (±6.1)           −0.1 (±2.9)                       −6.0 (±25.6)
Longest diameter (mm)    DV ≥8 mm        17.2 (±5.2)           0.3 (±2.9)                        −1.5 (±16.5)
Mean HU                  DV ≥4 mm        −125.2 (±119.1)       25.6 (±80.2)                      −22.2 (±216.8)
Mean HU                  DV ≥8 mm        −102.8 (±110.4)       6.6 (±73.1)                       −19.3 (±241.9)

Table 4 Concordance correlation coefficients of automated CAD measurements with LIDC reference (overall)

Measure             CCC     95 % CI for CCC
Volume              0.90    [0.84, 0.93]
Longest diameter    0.91    [0.87, 0.94]
Mean HU             0.77    [0.66, 0.85]

Sensitivity can only be computed for a subject with at least one true nodule, so that the denominator is non-zero; CAD sensitivity was therefore computed for these 44 subjects (see Table 1). Statistics on false positive rate per subject were computed for all 108 test cases (see Table 2). If nodules were treated as independent samples, the sensitivity for the 68 nodules with DV ≥4 mm was 75 %, and for the 58 nodules with DV ≥8 mm it was 79 %. Of the 10 nodules with DV ≥4 mm and DV
