Review
CT screening for lung cancer: countdown to implementation John K Field, David M Hansell, Stephen W Duffy, David R Baldwin
Implementation of lung cancer CT screening is currently the subject of a major policy decision within the USA. Findings of the US National Lung Screening Trial showed a 20% reduction in lung cancer mortality and a 6·7% decrease in all-cause mortality; subsequently, five US professional and clinical organisations and the US Preventive Services Task Force recommended that screening should be implemented. Should national health services in Europe follow suit? The European community awaits mortality and cost-effectiveness data from the NELSON trial in 2015–16 and pooled findings of European trials. In the intervening years, a demonstration trial in the UK is recommended. In this Review, we summarise the existing evidence and identify questions that remain to be answered before the implementation of international lung cancer screening programmes.
Introduction
Until recently, evidence of a mortality benefit from early detection of lung cancer was unclear.1,2 However, in 2011, findings of the US National Lung Screening Trial (NLST) showed a 20·0% decrease in mortality from lung cancer and a 6·7% reduction in all-cause mortality.3 NLST researchers compared low-dose CT at baseline, at 1 year, and at 2 years with chest radiography in the control arm. Participants were aged 55–74 years, had a smoking history of 30 or more pack-years, and had smoked within the previous 15 years. In the USA at least, screening is now recommended by several professional organisations for people who match NLST entry criteria, with some additions (table 1).4–8 Potentially, large numbers of individuals could be screened who will gain only minimal benefit, and some people at high risk might not be screened, with concomitant reductions in cost-effectiveness. Data for cost-effectiveness have not yet been published by NLST investigators, but estimates based on models of NLST data vary from US$19 000 to $126 000–169 000 per quality-adjusted life-year (QALY).9,10 Cost-effectiveness is a key issue for many countries, including the UK, and will be strongly influenced by the design of the screening programme. In Europe, seven randomised controlled trials of low-dose CT screening for lung cancer are underway. These trials have recruited substantially fewer individuals than NLST, and only one (NELSON, the Dutch–Belgian lung cancer screening trial) is powered at 80% to show a reduction in lung cancer mortality of at least 25% at 10 years after randomisation.11–13 Table 2 presents an overview of the European trials.11,14–19 Recruitment has been completed in all trials, although the UK Lung Screening (UKLS) trial has recruited into the pilot phase only (the full trial planned to recruit an additional 28 000 people but has not been funded at this time).
www.thelancet.com/oncology Vol 14 December 2013
Volumetric and two-dimensional (2D) analysis of nodules21 is done in UKLS,19 NELSON,11 the Danish Lung Cancer Screening Trial (DLCST),14 the German LUng cancer Screening Intervention (LUSI) study,16 and the Multi-centric Italian Lung Detection (MILD) trial.18 In NELSON and UKLS, prespecified algorithms are used to manage indeterminate nodules, rather than regarding all nodules of a specific size as positive. Because of its single-screen design, UKLS specifies the lowest nodule volume and diameter limit to prompt further imaging (15 mm³). Recruitment in UKLS also differs from that in other studies: a randomised, population-based postal approach was used to recruit people within the eligible age band, followed by individual risk stratification for lung cancer with a validated risk assessment method. By contrast, in NELSON,11 LUSI,16 and the Italian lung cancer CT screening trial (ITALUNG),17 participants were recruited via random samples from age bands of the population, followed by selection on the basis of smoking habit. Although individually small, the European trials will together contribute important information that could help us to design future screening programmes. Already, they have provided valuable data on smoking cessation,22 chronic obstructive pulmonary disease,23 coronary artery calcification,24 surgical resection methodology,25 and the value of biomarkers, the most notable being circulating DNA26,27 and microRNA plasma signatures.28 Findings of NELSON and MILD will enable some comparison of annual screening with screening every 2 years. The trials recruited people with different age distributions, and, as expected, findings already show higher detection rates in trials with an older average age (table 2). Table 3 presents early mortality data from three European trials and NLST. In DLCST and DANTE (the Detection And screening of early lung cancer by Novel
Lancet Oncol 2013; 14: e591–600 Roy Castle Lung Cancer Research Programme, University of Liverpool Cancer Research Centre, Liverpool, UK (Prof J K Field PhD); Department of Radiology, Royal Brompton Hospital, London, UK (Prof D M Hansell MD); Wolfson Institute of Preventive Medicine, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, UK (Prof S W Duffy PhD); and Respiratory Medicine Unit, David Evans Research Centre, Nottingham University Hospitals, Nottingham, UK (Prof D R Baldwin MD) Correspondence to: Prof John K Field, Roy Castle Lung Cancer Research Programme, University of Liverpool Cancer Research Centre, Liverpool L3 9TA, UK
Criterion | NCCN | ALA | AATS | ACCP and ACS | ASCO
Age (years) | 55–74 | 55–74 | 55–79 | 55–74 | 55–74
Smoking history (pack-years) | 30 | 30 | 30 | 30 | 30
Last smoked within (years) | 15 | 15 | NA | 15 | 15
Other recommendations | 20 pack-years, age >50 years, and one other risk factor | .. | 20 pack-years, age >50 years, and if 5% risk over 5 years | .. | ..
Interval | Annual | Annual | Annual | Annual | Annual
NCCN=National Comprehensive Cancer Network. ALA=American Lung Association. AATS=American Association for Thoracic Surgery. ACCP=American College of Chest Physicians. ASCO=American Society of Clinical Oncology. ACS=American Cancer Society. NA=not available.
Table 1: Recommendations for lung cancer CT screening
NELSON, van Klaveren DLCST, Pedersen et al (2007)11 et al (2009)14
LUSI, Becker et al (2012)16
DANTE, Infante et al (2007)15
ITALUNG, Lopes Pegna MILD, Pastorino et al et al (2009)17 (2012)18
UKLS, Baldwin (2011)19
Number of rounds
3
5
5
5
4
5
1
Number of screening sites
4
1
1
3
5
3
2
Vendor CT scanner
Siemens and Philips
Number of rows
Philips
16
Toshiba and Siemens
16
Philips
16 and 128
Siemens and General Electric
1 and 16
Yes
Yes
Yes
2nd reading
Yes
Yes
Yes
Quality control
Training set
Expert opinion
Expert opinion
1 year
1 year
Screen interval
1, 2, and 2·5 years
1 year
No
1 and 16
Volumetric software
Siemens and Philips 6–16 Yes
Yes
Yes
Yes
Yes
Yes
Training set
Training set
Training course
Training course
Randomisation to 1 or 2 years
One screen design
1 year
7515
2052
2029
1276
1613
1185 and 1182
Number in control arm
7907
2052
2023
1196
1593
1630
59 (6)
57 (5)
16
No
Number in screen arm Mean (SD) age at randomisation (years)
Siemens
58 (5)
2030* 2030*
65 (5)
61 (4)
59 (6)
NA
Current smokers at randomisation (%)
55%
76%
61%
55%
65%
63%
NA
Mean (SD) pack-years
42 (19)
36 (13)
36 (18)
47 (25)
43 (18)
43 (15)
NA
Women (%)
16%
45%
34%
35%
32%
NA
6
5
3
6
6
5
NA
90 655
23 248
4073
13 541
14 453
15 589
NA Yes
Follow-up since randomisation (years) Person-years of follow-up†
0%
Recruitment completed
Yes
Yes
Yes
Yes
Yes
No
Screening completed
No
Yes
No
Yes
Yes
No
Detection rate baseline (%)
0·9%
0·8%
1·1%
2·2%
1·5%
No 0·8%
NA
Adapted from Field et al,20 with permission of Wiley. NELSON=Dutch–Belgian lung cancer screening trial. DLCST=Danish Lung Cancer Screening Trial. LUSI=German LUng cancer Screening Intervention study. ITALUNG=ITAlian LUNG cancer CT screening trial. DANTE=Detection And screening of early lung cancer by Novel imaging TEchnology and molecular assays. MILD=Multi-centric Italian Lung Detection trial. UKLS=UK Lung Screening trial. NA=not available or applicable. Data are up to August, 2010, apart from *UKLS, 2013. †Cutoff date, Jan 1, 2011.
Table 2: Overview of European randomised CT screening trials
Trial | Patients in study/control groups (n) | Planned screening rounds in study group | Average follow-up (years) | Intervention regimen | Control regimen | Risk ratio (95% CI) for lung cancer mortality
DLCST, Saghir et al (2012)29 | 2052/2052 | Five | 4·8 | Annual CT | Usual care | 1·37 (0·62–2·99)
DANTE, Infante et al (2009)30 | 1276/1196 | Five | 3 | Annual CT | Baseline chest radiograph | 0·94 (0·50–1·79)
NLST, Aberle et al (2011)3 | 26 722/26 732 | Three | 6·2 | Annual CT | Annual chest radiograph | 0·80 (0·73–0·93)
MILD, Pastorino et al (2012)31 | 2376/1723 | Five every year or three every 2 years | 4·4 | CT every 1 or 2 years plus smoking cessation advice and spirometry | Smoking cessation advice and spirometry | 1·50 (0·62–3·60)
DLCST=Danish Lung Cancer Screening Trial. DANTE=Detection And screening of early lung cancer by Novel imaging TEchnology and molecular assays. NLST=National Lung Screening Trial. MILD=Multi-centric Italian Lung Detection trial.
Table 3: Low-dose CT trials reporting effects on lung cancer mortality
imaging TEchnology and molecular assays study),29,30 the intervention group was offered annual low-dose CT screening, and in MILD,31 two active intervention groups of CT screening every 1 or 2 years were included. The European trials are underpowered and have suboptimal follow-up periods; therefore, as expected, none reported a significant reduction in lung cancer mortality. However, a meta-analysis (table 3) of these trials together with NLST showed a pooled reduction in lung cancer mortality of 19% (risk ratio 0·81, 95% CI 0·70–0·92), very similar to the NLST result alone. Analysis of mortality data from NELSON could be possible in 2015–16, and pooling of data is intended if the trial designs are sufficiently similar.32 In the meantime, findings of NLST are the dominant driver for implementation of low-dose CT screening.
Selection of the population at risk
Screening based on individual risk estimation is likely to be cost-effective and will reduce harm to people at the lowest risk of lung cancer, who are therefore unlikely to benefit. Several models for individualised risk estimation have been developed.33–37 The Prostate, Lung, Colorectal, and Ovarian (PLCO) cancer screening trial lung cancer risk model37 was developed from the largest dataset known to date. When a revised version of this model was recently applied to the NLST dataset, it selected 81 additional people for screening who received a diagnosis of lung cancer during follow-up, which would have resulted in 12 fewer deaths.38 The Liverpool Lung Project (LLP) risk model was used to select participants for the UKLS trial.19,39 As far as we know, UKLS remains the only randomised controlled trial of low-dose CT to select participants on the basis of formal individual risk estimation. The LLP model has receiver operating characteristics similar to those of the PLCO model.34,35 Ideally, the lung cancer community would like to identify high-risk individuals for CT screening trials with biomarkers and, indeed, to use these biomarkers to assist in the management of indeterminate CT-detected nodules. However, although much work has been published on biomarkers for early diagnosis of lung cancer, none has been added to the criteria in the Early Detection Research Network (EDRN) guidelines and, thus, no biomarker is ready for integration into national screening programmes.40 Nevertheless, in expectation that such biomarkers will, in time, be validated, all current and planned CT screening trials (eg, NLST, NELSON, UKLS) must gather suitable biological material and specimens as part of the trial protocol.
Recruitment into screening programmes
The NLST investigators noted that, compared with US census figures for people aged 55–74 years, their participants were younger (91% were aged 55–69 years) and better educated.3 In the UKLS trial,41 very few of the people approached who were younger than 60 years were at high enough risk of lung cancer to be eligible for screening. Worldwide, lung cancer is more common with increasing deprivation and age, yet only the UKLS researchers have examined the effect of these variables on recruitment. Older people and those with greater levels of deprivation (the hard-to-reach group) are less likely to participate in trials but are more likely to fulfil risk criteria. These findings might predict what will happen in a national screening programme. To increase participation from the hard-to-reach community, innovative awareness programmes will be needed that integrate smoking cessation, symptom awareness, and screening participation to maximise cost-effectiveness. Some of these interventions could be cross-cutting and used for prevention and awareness of other cancers.
Technical considerations
The application of CT to screen for lung cancer only became possible with the advent of multidetector CT (MDCT), which allows quick acquisition (in one breath-hold) of thin overlapping slices. The resulting volumetric scan has near-isotropic resolution (that is, equal resolving capability in all three dimensions), so small nodules are not blurred in the craniocaudal axis and are therefore less likely to be missed. The ability to scan the lungs at a low radiation dose, which is especially important in a screening setting, predates the development of MDCT.42 For the specific task of nodule detection in the lungs, dose reduction can be quite extreme because, although image noise (graininess) increases, the high contrast between white nodules and the black background lung means that the conspicuity of solid nodules is not compromised. The disadvantage of dose reduction is that small semi-solid or pure ground-glass nodules might be less well visualised on very-low-dose CT examinations, but these types of nodules represent only a small proportion of all those identified to date in screening trials. Nevertheless, rigorous attempts to keep the radiation dose as low as possible while obtaining diagnostically adequate images remain a key tenet of any screening programme. The main determinant of how low the tube current, and therefore the radiation dose, can go is body habitus (ie, body type or physique), in particular weight. In the near future, developments in CT technology are unlikely to allow a further large step-down in radiation dose. With current MDCT, the effective radiation dose to an individual is below 1·6 mSv, roughly the background natural radiation that a person receives in 1 year, compared with up to 8 mSv from standard-dose CT of the thorax.43 Detection of small nodules among the structured noise of branching vessels and bronchi could be judged a task that requires considerable skill and is best left to expert chest radiologists.
However, non-radiological personnel can, with adequate training, read CT scans and identify nodules with a performance that approaches that of skilled (and expensive) radiologists.44 The idea of using technologists to read screening examinations is not new and has been applied successfully in mammographic screening programmes.45 However, can radiographers (for example) be used as sole readers, or do they need to be part of a dual-read approach, whereby they read the CT before a radiologist assesses it? Cost-effectiveness has been examined in the setting of mammographic screening46 but needs further investigation in lung cancer screening with CT. Another potential aid to increasing the yield (if not the accuracy) of nodule identification is computer-assisted detection software.47–52 Variations in software design, and in the way in which the effect of computer-assisted detection is measured, make generalisations about this approach unhelpful. Nevertheless, some conclusions can be drawn from the many studies of computer-assisted detection. First, the sensitivity of computer-assisted detection increases with nodule size. Second, this technique is most effective when used after, rather than before, reading of a CT scan. Finally, the incremental effect of computer-assisted detection applies to both expert and inexperienced (or less skilled) readers.53 It is noteworthy that no completed or ongoing lung cancer screening trial uses computer-assisted detection, even though this method has been available at various levels for many years.
For EDRN guidelines see http://edrn.nci.nih.gov/
[Figure panels A–D: Siemens LungCare readouts for nodules 17, 1, 3, and 4, with measured volumes of 9·59, 32·90, 163·73, and 789·18 mm³, respectively]
Figure: Examples of nodules detected visually and characterised volumetrically. Siemens LungCare software was used for characterisation. Images are from patients enrolled in the UK Lung Screening (UKLS) trial. (A) Inconspicuous small nodule not fulfilling the volumetric size criterion for a UKLS category 1 nodule (>15 mm³); this 9·59 mm³ nodule would not be followed up in the UKLS care pathway. (B) Category 2 nodule (15–49 mm³); follow-up CT would be done at 1 year. (C) Category 3 nodule (50–500 mm³); follow-up CT would be done at 3 months. (D) Category 4 nodule (>500 mm³); such a nodule would mandate referral for multidisciplinary team assessment.
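The volume bands in the figure caption amount to a simple lookup from measured nodule volume to a follow-up action. The sketch below is illustrative only: the function name ukls_pathway is hypothetical (it is not part of LungCare or any UKLS software), and the handling of values between the caption's ranges (for example, 49–50 mm³, or exactly 500 mm³) is an assumption rather than something taken from the UKLS protocol.

```python
from typing import Optional, Tuple


def ukls_pathway(volume_mm3: float) -> Tuple[Optional[int], str]:
    """Map a nodule volume (mm^3) to a category and action, per the figure caption.

    Hypothetical helper. Boundary handling between the caption's ranges
    (e.g. 49-50 mm^3, exactly 500 mm^3) is an assumption.
    """
    if volume_mm3 < 0:
        raise ValueError("volume must be non-negative")
    if volume_mm3 < 15:
        # Below the 15 mm^3 threshold: not followed up in the pathway
        return None, "no follow-up"
    if volume_mm3 < 50:
        return 2, "follow-up CT at 1 year"
    if volume_mm3 <= 500:
        return 3, "follow-up CT at 3 months"
    return 4, "referral for multidisciplinary team assessment"


# The four panels of the figure, using the volumes reported by LungCare
for panel, volume in [("A", 9.59), ("B", 32.90), ("C", 163.73), ("D", 789.18)]:
    category, action = ukls_pathway(volume)
    print(panel, volume, category, action)
```

Keeping the thresholds in one function makes it easy to see how a change in the lowest cutoff (15 mm³ in UKLS) would move nodules between follow-up categories.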
Both non-radiological readers and computer-assisted detection systems lack the ability to contextualise; that is, they are less able to judge the significance of the nodules that they detect. In both cases, the size of the nodule, either its diameter or volume, is the main determinant of whether it is reported and how it will be followed up or managed (figure). However, other features indicate (with varying degrees of certainty) that a nodule is a normal structure or part of a benign disease process; a radiologist would recognise and dismiss these features for what they are, but a computer-assisted detection system or a non-radiologist reader might record the nodule as needing further assessment.
Screen interval and threshold for further work-up
Advances in technology have enabled tiny nodules to be identified with minimal radiation exposure. Harm in these circumstances is much more likely to be attributable to the consequences of the findings than to radiation. These harms can be physical and psychological. We need to establish at what point further investigation of nodules becomes unnecessary and likely to cause more harm than good. This point will depend to some extent on patient-related factors, including fitness and comorbidity, and on institutional factors, such as local expertise in undertaking and interpreting investigations. However, the most important, and inter-related, factors are the nodule size threshold for further investigation and the screening frequency. Screening at frequent intervals allows the nodule size threshold to be raised, because smaller nodules will be reassessed at the next scheduled screen. A less frequent screen will cost less but risks missing very early cancers that might shift stage in the intervening year but could still be treated successfully. Therefore, if lung cancer screening is undertaken less frequently, the nodule size threshold will need to be reconsidered. Interval cancers are inevitable, but a longer screen interval might miss more indolent tumours that have a greater chance of benefiting from early diagnosis. Henschke and colleagues54 reported a retrospective review of the I-ELCAP database from 2006 to 2010. They examined the effect of the nodule size threshold for further radiological or other work-up on the proportion of nodules that were cancerous and on the proportion of cancers with a diagnostic delay of 9 months or less (the effect of this delay on outcome could not be assessed). As expected, compared with a 5 mm cutoff, the larger the size threshold, the greater the proportion of delayed diagnoses (6·7% for a 9 mm cutoff, 5·9% for 8 mm, 5% for 7 mm, and 0% for 6 mm).
The findings also showed that diagnostic work-up (including repeat CT) could be reduced by 75% with a 9 mm cutoff. Opinion is divided on whether raising the threshold for follow-up is justifiable on size alone,55 because of the possibility of missing interval cancers and the scarcity of data on the effect on mortality. Another way to improve cost-effectiveness could be to lengthen the screen interval. Two trials have incorporated screening every 2 years: NELSON (final screen) and MILD. Results for 2-year intervals are available only for MILD,31 in which 4099 participants were randomised in a three-way comparison to no CT (n=1723), CT every 2 years (n=1186), or CT every year (n=1190). Pastorino and co-workers recorded cumulative 5-year lung cancer incidences of 311 per 100 000 for no CT, 457 per 100 000 for CT every 2 years, and 620 per 100 000 for CT every year (p=0·036); mortality rates were 109 per 100 000, 109 per 100 000, and 216 per 100 000, respectively (p=0·21). Nodules with a volume of 60 mm³ (4·8 mm diameter) or more were treated as positive. Although no mortality benefit was seen in this small study, clearly more cancers were detected in the annual group. These findings provide some insight into the effect of extending the screen interval. In the NELSON trial,11 which used a cutoff of 50 mm³ (4·6 mm diameter), the chance of detecting lung cancer on a CT scan after a negative baseline screen was 0·1% at 1 year and 0·3% at 2 years. In the same trial, the baseline cancer detection rate was 0·9% and the rate at the second annual screen was 0·7%. Therefore, extending the screening interval to 2 years might delay diagnosis of a substantial proportion of cancers, even when the threshold for further work-up is as low as 50 mm³. The effect of the screening interval on mortality is not known but can be modelled. Table 4 presents modelling data based on a high-risk group for screening every year or every 2 years, with a threshold of 4 mm (as used in NLST).
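The diameters quoted alongside the volume cutoffs are close to those of a sphere of the same volume, for which d = (6V/π)^(1/3). A minimal sketch, assuming perfectly spherical nodules (an idealisation; the function names are illustrative):

```python
import math


def equivalent_diameter_mm(volume_mm3: float) -> float:
    """Diameter (mm) of a sphere with the given volume (mm^3): d = (6V/pi)^(1/3)."""
    return (6.0 * volume_mm3 / math.pi) ** (1.0 / 3.0)


def sphere_volume_mm3(diameter_mm: float) -> float:
    """Inverse conversion: V = pi * d^3 / 6."""
    return math.pi * diameter_mm ** 3 / 6.0


# Volume cutoffs mentioned in this Review (UKLS, NELSON, MILD, UKLS category 4)
for v in (15, 50, 60, 500):
    print(f"{v:>4} mm^3 -> {equivalent_diameter_mm(v):.2f} mm")
```

This gives 4·57 mm for the 50 mm³ NELSON cutoff and 4·86 mm for the 60 mm³ MILD cutoff. Because diameter scales with the cube root of volume, a doubling of nodule volume increases the equivalent diameter by only about 26% (2^(1/3) ≈ 1·26), which is one reason small changes are easier to detect volumetrically than by diameter.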
In table 4 we consider two possible response rates: the 30% rate is roughly that observed in the UKLS trial, and the higher rate of 60% might be anticipated for a national programme, given the public health endorsement that such a programme implies. We should not assume that both respondent populations have the same risk. Therefore, we postulate that around 10% of responders in the 30% group would have sufficient risk to qualify, versus 8% of those in the 60% group. The detection rate of cancers at screening depends on whether it is the first or a subsequent screen, the incidence of the disease in the screened population, the mean sojourn time (in years), and the sensitivity of the screening test.56,57 Estimates of incidence in the populations are taken from the reported empirical annual incidence in NLST. The estimated average incidence is taken from the LLP risk model in a subset of positive responders who met the 5% risk criterion. Based on the formulae of Launoy and colleagues56 and Duffy and co-workers,57 the numbers of cancers detected at screening and the numbers arising in the intervals between screens can be estimated for the different screening frequencies.
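How detection rates depend on incidence, mean sojourn time, sensitivity, and screen interval can be illustrated with a simplified model. The sketch below assumes an exponentially distributed preclinical sojourn time (rate 1/mean sojourn), constant incidence, and a steady state in which cancers missed at one screen may still be found at the next; it is not necessarily identical to the formulae of Launoy and colleagues or Duffy and co-workers. The parameters are those given later in this section (annual incidence 1·4%, mean sojourn 2 years, sensitivity 95%); the function names are illustrative.

```python
import math


def prevalence_screen_rate(incidence: float, mean_sojourn: float,
                           sensitivity: float) -> float:
    """Detection rate at the first screen.

    Preclinical prevalence is approximately incidence x mean sojourn time,
    and the test detects a fraction `sensitivity` of it.
    """
    return sensitivity * incidence * mean_sojourn


def incidence_screen_rate(incidence: float, mean_sojourn: float,
                          sensitivity: float, interval: float) -> float:
    """Steady-state detection rate at a subsequent screen after `interval` years.

    With exponential sojourn times (rate lam = 1/mean_sojourn), the preclinical
    prevalence P just before each screen satisfies
        P = (1 - sensitivity) * P * exp(-lam * t) + incidence * (1 - exp(-lam * t)) / lam
    i.e. previously missed cancers still preclinical plus newly arising ones.
    """
    lam = 1.0 / mean_sojourn
    still_preclinical = math.exp(-lam * interval)
    new_preclinical = incidence * (1.0 - still_preclinical) / lam
    prevalence = new_preclinical / (1.0 - (1.0 - sensitivity) * still_preclinical)
    return sensitivity * prevalence


incidence, sojourn, sens = 0.014, 2.0, 0.95
print(f"first screen:    {prevalence_screen_rate(incidence, sojourn, sens):.2%}")
for years in (1, 2):
    rate = incidence_screen_rate(incidence, sojourn, sens, years)
    print(f"{years}-year interval: {rate:.2%}")
```

Run as written, this prints 2.66%, 1.08%, and 1.71%, which round to the 2·7%, 1·1%, and 1·7% used for table 4. That agreement is a consistency check on these assumptions, not confirmation that this is the exact calculation behind the table.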
Outcome | Screen every 2 years, 30% response | Screen every 2 years, 60% response* | Screen every year, 30% response | Screen every year, 60% response*
Willing to participate | 300 000 | 600 000 | 300 000 | 600 000
Eligible to participate | 30 000 | 48 000 | 30 000 | 48 000
First screens | 30 000 | 48 000 | 30 000 | 48 000
Subsequent screens | 60 000 | 96 000 | 120 000 | 192 000
Cancers detected at first screen | 810 | 1300 | 810 | 1300
Cancers detected at subsequent screens | 1020 | 1640 | 1320 | 2110
Interval cancers | 660 | 1050 | 360 | 580
Cancer deaths prevented (NLST period) | 100–240 | 160–380 | 160–280 | 260–440
Predicted deaths prevented in long term | 180–430 | 290–680 | 290–500 | 470–790
Data are number of people. A million people aged 60–74 years were approached under various scenarios. Data for size threshold parameters and assumptions are available from the authors. The ratio of screen-detected to interval cancers is similar to that in mammography programmes, although survival from lung cancer is much shorter than for breast cancer. NLST=National Lung Screening Trial. *60% response is the approximate rate for bowel screening in the UK.
Table 4: Estimated activity and outcomes over 4 years of a national low-dose CT screening programme
From a million individuals approached, we estimated that either 300 000 or 600 000 would respond (table 4). Of these, 30 000 or 48 000 would reach the UKLS criterion of a minimum risk of 5% in the next 5 years, with an average risk of 1·4% per year. An approximate sensitivity of 95% and mean sojourn time of 2 years58 would imply a detection rate of 2·7% at the first screen and of 1·1% and 1·7% at subsequent screens with 1-year and 2-year intervals, respectively.56 Probable mortality reductions are calculated essentially from NLST results; the lower bound is based on the recorded number of deaths prevented in NLST per screening episode and the upper bound on deaths prevented per screen-detected cancer. Longer-term estimates are based on inflating absolute numbers by a factor of 1·8, to give 85% fatality in the NLST control group, as noted at 10 years in US Surveillance, Epidemiology and End Results (SEER) data.59 A balance must be struck between the number of lives that can be saved and the cost of implementing a yearly screening programme. Currently, no solid cost-effectiveness data are available. A judgment call might have to be made between providing an affordable screening programme with 2-year screen intervals that does save lives (180–430 lives would be saved per 30 000 people screened in the short term for 90 000 CT screening episodes; table 4) and potentially no screening programme at all. The absolute effect of screening depends heavily on baseline risk, which in turn depends on age, smoking, and other risk factors. The screening interval and nodule work-up threshold could be tailored to individual risk in future programmes. The screen interval and size threshold would then have to be based on accurate individual risk estimation; the next generation of risk-prediction models that incorporate baseline CT characteristics (other than merely size) could potentially provide a personalised screening interval.
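The activity figures in table 4 follow from a short chain of multiplications. The sketch below recomputes them under stated assumptions: annual screening means screens at years 0–4 and screening every 2 years means screens at years 0, 2, and 4 (which gives the subsequent-screen counts in table 4); detection rates are the rounded 2·7%, 1·1%, and 1·7% from the text; and long-term deaths prevented are the NLST-period figures multiplied by 1·8 and rounded to the nearest 10. All names are illustrative, and integer arithmetic is used to avoid floating-point rounding noise.

```python
from dataclasses import dataclass

APPROACHED = 1_000_000

# Detection rates per 10 000 screens, from the text (2.7%, 1.1%, 1.7%).
FIRST_SCREEN_RATE = 270
SUBSEQUENT_RATE = {1: 110, 2: 170}  # keyed by screen interval in years

# Assumption: screens at years 0-4 (annual) or at years 0, 2, 4 (every 2 years).
SUBSEQUENT_SCREENS = {1: 4, 2: 2}

LONG_TERM_FACTOR_X10 = 18  # the 1.8 inflation factor, scaled by 10


@dataclass
class Scenario:
    response_pct: int  # % of people approached who respond
    eligible_pct: int  # % of responders meeting the 5%-in-5-years risk criterion
    interval: int      # years between screens (1 or 2)


def activity(s: Scenario) -> dict:
    """Screening activity and screen-detected cancers for one column of table 4."""
    willing = APPROACHED * s.response_pct // 100
    eligible = willing * s.eligible_pct // 100
    subsequent = eligible * SUBSEQUENT_SCREENS[s.interval]
    return {
        "willing": willing,
        "eligible": eligible,
        "first_screens": eligible,
        "subsequent_screens": subsequent,
        "first_screen_cancers": eligible * FIRST_SCREEN_RATE // 10_000,
        "subsequent_cancers": subsequent * SUBSEQUENT_RATE[s.interval] // 10_000,
    }


def long_term(deaths_prevented: int) -> int:
    """Inflate an NLST-period estimate by 1.8, then round to the nearest 10.

    Inputs here are multiples of 10, so the integer product is exact.
    """
    inflated = deaths_prevented * LONG_TERM_FACTOR_X10 // 10
    return (inflated + 5) // 10 * 10


for interval in (2, 1):
    for response, eligible in ((30, 10), (60, 8)):
        print(interval, response, activity(Scenario(response, eligible, interval)))

nlst_period = [100, 240, 160, 380, 160, 280, 260, 440]
print([long_term(d) for d in nlst_period])
```

For the 60% response scenarios, the raw products (for example, 1296 first-screen cancers) appear rounded in table 4; the long-term list reproduces the final row of the table.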
Work-up of nodules
Harms
Debates about the best way to investigate nodules are typically complex, but the aim is to avoid harmful investigations while diagnosing lung cancer promptly. Nodules deemed to represent an intermediate or positive test (according to size threshold) can be managed in three ways: further imaging, either immediate or after an interval; minimally invasive biopsy, usually image-guided transthoracic biopsy; or surgical resection. All these strategies have their merits, and the ideal approach has long been debated. For small nodules (