Review

CT screening for lung cancer: countdown to implementation

John K Field, David M Hansell, Stephen W Duffy, David R Baldwin

Implementation of lung cancer CT screening is currently the subject of a major policy decision within the USA. Findings of the US National Lung Screening Trial showed a 20% reduction in lung cancer mortality and a 6·7% decrease in all-cause mortality; subsequently, five US professional and clinical organisations and the US Preventive Services Task Force recommended that screening should be implemented. Should national health services in Europe follow suit? The European community awaits mortality and cost-effectiveness data from the NELSON trial in 2015–16 and pooled findings of European trials. In the intervening years, a recommendation is proposed that a demonstration trial is done in the UK. In this Review, we summarise the existing evidence and identify questions that remain to be answered before the implementation of international lung cancer screening programmes.

www.thelancet.com/oncology Vol 14 December 2013

Introduction

Until recently, evidence was unclear about a mortality benefit from early detection of lung cancer.1,2 However, in 2011, findings of the US National Lung Screening Trial (NLST) showed a 20·0% decrease in mortality from lung cancer and a 6·7% reduction in all-cause mortality.3 NLST researchers compared low-dose CT at baseline, at 1 year, and at 2 years with chest radiography in the control arm. Eligible participants were aged 55–74 years, had a smoking history of 30 or more pack-years, and had smoked within the previous 15 years. In the USA at least, screening is now recommended by several professional organisations for people who match NLST entry criteria, with some additions (table 1).4–8 Potentially, large numbers of individuals could be screened who will gain only minimum benefit, and some people who are at high risk might not be screened, with concomitant reductions in cost-effectiveness. Data for cost-effectiveness have not yet been published by NLST investigators, but estimates based on models of NLST data vary from US$19 000 to $126 000–169 000 per quality-adjusted life-year (QALY).9,10 Cost-effectiveness is a key issue for many countries, including the UK, and will be strongly influenced by the design of the screening programme.

In Europe, seven randomised controlled trials of low-dose CT screening for lung cancer are underway. These trials have recruited substantially fewer individuals than NLST and only one (NELSON, the Dutch–Belgian lung cancer screening trial) is powered at 80% to show a reduction in lung cancer mortality of at least 25% at 10 years after randomisation.11–13 Table 2 presents an overview of the European trials.11,14–19 Recruitment has been completed in all trials, although the UK Lung Screening (UKLS) trial has recruited into the pilot phase only (the full trial planned to recruit an additional 28 000 people but has not been funded at this time).

Volumetric and two-dimensional (2D) analysis of nodules21 is done in UKLS,19 NELSON,11 the Danish Lung Cancer Screening Trial (DLCST),14 the German LUng cancer Screening Intervention (LUSI) study,16 and the Multi-centric Italian Lung Detection (MILD) trial.18 In NELSON and UKLS, prespecified algorithms are used to manage indeterminate nodules, rather than regarding all nodules of a specific size as positive. Because of the single-screen design of the UKLS trial, the lowest limit of nodule volume and diameter is specified to prompt further imaging (15 mm³). Recruitment in UKLS also differs from that in other studies: a randomised population postal approach was used to recruit people within the eligible age band, followed by individual risk stratification for lung cancer with a validated risk assessment method. By contrast, in NELSON,11 LUSI,16 and the Italian lung cancer CT screening trial (ITALUNG),17 participants were recruited via random samples from age bands of the population, followed by selection on the basis of smoking habit.

Although individually small, the European trials will together contribute important information that could help us to design future screening programmes. Already, they have provided valuable data on smoking cessation,22 chronic obstructive pulmonary disease,23 coronary artery calcification,24 surgical resection methodology,25 and the value of biomarkers, the most notable being circulating DNA26,27 and microRNA plasma signatures.28 Findings of NELSON and MILD will enable some comparison of annual screening intervals with intervals of 2 years. People have been recruited with different age distributions, and findings already show, as expected, higher detection rates with an older average age (table 2).

Table 3 presents early mortality data from three European trials and NLST. In DLCST and DANTE (the Detection And screening of early lung cancer by Novel

Lancet Oncol 2013; 14: e591–600

Roy Castle Lung Cancer Research Programme, University of Liverpool Cancer Research Centre, Liverpool, UK (Prof J K Field PhD); Department of Radiology, Royal Brompton Hospital, London, UK (Prof D M Hansell MD); Wolfson Institute of Preventive Medicine, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, UK (Prof S W Duffy PhD); and Respiratory Medicine Unit, David Evans Research Centre, Nottingham University Hospitals, Nottingham, UK (Prof D R Baldwin MD)

Correspondence to: Prof John K Field, Roy Castle Lung Cancer Research Programme, University of Liverpool Cancer Research Centre, Liverpool L3 9TA, UK
[email protected]

| | NCCN | ALA | AATS | ACCP and ASCO | ACS |
|---|---|---|---|---|---|
| Age (years) | 55–74 | 55–74 | 55–79 | 55–74 | 55–74 |
| Smoking history (pack-years) | 30 | 30 | 30 | 30 | 30 |
| Last smoked within (years) | 15 | 15 | NA | 15 | 15 |
| Other recommendations | 20 pack-years, age >50 years, and one other risk factor | .. | 20 pack-years, age >50 years, and if 5% risk over 5 years | .. | .. |
| Interval | Annual | Annual | Annual | Annual | Annual |

NCCN=National Comprehensive Cancer Network. ALA=American Lung Association. AATS=American Association for Thoracic Surgery. ACCP=American College of Chest Physicians. ASCO=American Society of Clinical Oncology. ACS=American Cancer Society. NA=not available.

Table 1: Recommendations for lung cancer CT screening


| | NELSON, van Klaveren et al (2007)11 | DLCST, Pedersen et al (2009)14 | LUSI, Becker et al (2012)16 | DANTE, Infante et al (2007)15 | ITALUNG, Lopes Pegna et al (2009)17 | MILD, Pastorino et al (2012)18 | UKLS, Baldwin (2011)19 |
|---|---|---|---|---|---|---|---|
| Number of rounds | 3 | 5 | 5 | 5 | 4 | 5 | 1 |
| Number of screening sites | 4 | 1 | 1 | 3 | 5 | 3 | 2 |
| Vendor CT scanner | Siemens and Philips | Philips | Toshiba and Siemens | Philips | Siemens and General Electric | Siemens and Philips | Siemens |
| Number of rows | 16 | 16 | 16 and 128 | 1 and 16 | 1 and 16 | 6–16 | 16 |
| Volumetric software | Yes | Yes | Yes | No | No | Yes | Yes |
| 2nd reading | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Quality control | Training set | Expert opinion | Expert opinion | Training set | Training set | Training course | Training course |
| Screen interval | 1, 2, and 2·5 years | 1 year | 1 year | 1 year | 1 year | Randomisation to 1 or 2 years | One screen design |
| Number in screen arm | 7515 | 2052 | 2029 | 1276 | 1613 | 1185 and 1182 | 2030* |
| Number in control arm | 7907 | 2052 | 2023 | 1196 | 1593 | 1630 | 2030* |
| Mean (SD) age at randomisation (years) | 59 (6) | 57 (5) | 58 (5) | 65 (5) | 61 (4) | 59 (6) | NA |
| Current smokers at randomisation (%) | 55% | 76% | 61% | 55% | 65% | 63% | NA |
| Mean (SD) pack-years | 42 (19) | 36 (13) | 36 (18) | 47 (25) | 43 (18) | 43 (15) | NA |
| Women (%) | 16% | 45% | 34% | 0% | 35% | 32% | NA |
| Follow-up since randomisation (years) | 6 | 5 | 3 | 6 | 6 | 5 | NA |
| Person-years of follow-up† | 90 655 | 23 248 | 4073 | 13 541 | 14 453 | 15 589 | NA |
| Recruitment completed | Yes | Yes | Yes | Yes | Yes | Yes | No |
| Screening completed | No | Yes | No | Yes | Yes | No | No |
| Detection rate baseline (%) | 0·9% | 0·8% | 1·1% | 2·2% | 1·5% | 0·8% | NA |

Adapted from Field et al,20 with permission of Wiley. NELSON=Dutch–Belgian lung cancer screening trial. DLCST=Danish Lung Cancer Screening Trial. LUSI=German LUng cancer Screening Intervention study. ITALUNG=ITAlian LUNG cancer CT screening trial. DANTE=Detection And screening of early lung cancer by Novel imaging TEchnology and molecular assays. MILD=Multi-centric Italian Lung Detection trial. UKLS=UK Lung Screening trial. NA=not available or applicable. Data are up to August, 2010, apart from *UKLS, 2013. †Cutoff date, Jan 1, 2011.

Table 2: Overview of European randomised CT screening trials

| | Patients in study/control groups (n) | Planned screening rounds in study group | Average follow-up (years) | Intervention regimen | Control regimen | Risk ratio (95% CI) for lung cancer mortality |
|---|---|---|---|---|---|---|
| DLCST, Saghir et al (2012)29 | 2052/2052 | Five | 4·8 | Annual CT | Usual care | 1·37 (0·62–2·99) |
| DANTE, Infante et al (2009)30 | 1276/1196 | Five | 3 | Annual CT | Baseline chest radiograph | 0·94 (0·50–1·79) |
| NLST, Aberle et al (2011)3 | 26 722/26 732 | Three | 6·2 | Annual CT | Annual chest radiograph | 0·80 (0·73–0·93) |
| MILD, Pastorino et al (2012)31 | 2376/1723 | Five every year or three every 2 years | 4·4 | CT every 1 or 2 years plus smoking cessation advice and spirometry | Smoking cessation advice and spirometry | 1·50 (0·62–3·60) |

DLCST=Danish Lung Cancer Screening Trial. DANTE=Detection And screening of early lung cancer by Novel imaging TEchnology and molecular assays. NLST=National Lung Screening Trial. MILD=Multi-centric Italian Lung Detection trial.

Table 3: Low-dose CT trials reporting effects on lung cancer mortality

imaging TEchnology and molecular assays study),29,30 the intervention group was offered annual low-dose CT screening, and in MILD,31 two active intervention groups of CT screening every 1 or 2 years were included. The European trials are underpowered and have suboptimum follow-up periods; therefore, as expected, no significant reduction was reported in lung cancer mortality. However, findings of a meta-analysis (table 3) of these trials, including NLST, showed an overall mortality reduction of 19% (risk ratio 0·81, 95% CI 0·70–0·92), very similar to the NLST result alone. Analysis of mortality data from NELSON could be possible in 2015–16, and pooling of data is intended if similarity of trial design allows.32 In the meantime, findings of NLST are the dominant driver for implementation of low-dose CT screening.
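The dominance of NLST in the pooled estimate can be illustrated with a rough calculation. The sketch below is not the method used for the published meta-analysis, which is not described here. It assumes a fixed-effect, inverse-variance pooling of the log risk ratios in table 3, with standard errors back-calculated from the reported 95% CIs. The function name `pooled_risk_ratio` and the dictionary `TRIALS` are illustrative only.

```python
import math

# Risk ratios (95% CI) for lung cancer mortality, copied from table 3.
TRIALS = {
    "DLCST": (1.37, 0.62, 2.99),
    "DANTE": (0.94, 0.50, 1.79),
    "NLST": (0.80, 0.73, 0.93),
    "MILD": (1.50, 0.62, 3.60),
}

Z95 = 1.96  # normal quantile for a two-sided 95% interval


def pooled_risk_ratio(trials):
    """Fixed-effect inverse-variance pooling on the log scale (assumed method)."""
    weight_sum = 0.0
    weighted_log_sum = 0.0
    weights = {}
    for name, (rr, lower, upper) in trials.items():
        # Standard error of log(RR), recovered from the width of the CI.
        se = (math.log(upper) - math.log(lower)) / (2 * Z95)
        weights[name] = 1.0 / se ** 2
        weight_sum += weights[name]
        weighted_log_sum += weights[name] * math.log(rr)
    pooled_log = weighted_log_sum / weight_sum
    pooled_se = math.sqrt(1.0 / weight_sum)
    estimate = math.exp(pooled_log)
    ci = (math.exp(pooled_log - Z95 * pooled_se),
          math.exp(pooled_log + Z95 * pooled_se))
    shares = {name: w / weight_sum for name, w in weights.items()}
    return estimate, ci, shares


estimate, (ci_low, ci_high), shares = pooled_risk_ratio(TRIALS)
print(f"pooled RR {estimate:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f})")
print(f"NLST share of total weight: {shares['NLST']:.0%}")
```

Under these assumptions the pooled risk ratio comes out at roughly 0·82, close to but not identical with the published 0·81, and NLST carries more than 90% of the weight, which is why the pooled result resembles the NLST result alone.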

Selection of the population at risk

Screening based on individual risk estimation is likely to be cost effective and will reduce harm to people at the lowest risk of lung cancer, who are therefore unlikely to benefit. Individualised risk estimation has been developed in several models.33–37 The Prostate, Lung, Colorectal, and Ovarian (PLCO) cancer screening trial lung cancer risk model37 was developed from the largest dataset known to date. A revised version of this model has recently been applied to the NLST dataset and selected 81 additional people for screening who received a diagnosis of lung cancer in follow-up, which would have resulted in 12 fewer deaths.38 The Liverpool Lung Project (LLP) risk model was used to select participants for the UKLS trial.19,39 As far as we know, this trial remains the only randomised controlled trial of low-dose CT to select on the basis of formal individual risk estimation. LLP has similar receiver-operator characteristics to the PLCO model.34,35

Ideally, the lung cancer community would like to identify high-risk individuals for CT screening trials with biomarkers and, indeed, use these biomarkers to assist in the management of indeterminate CT-detected nodules. However, even though much work has been published on biomarkers for early diagnosis of lung cancer, none has been added to criteria in Early Detection Research Network (EDRN) guidelines and, thus, no biomarker is ready for integration into national screening programmes.40 Nevertheless, in expectation that such biomarkers will, in time, be validated, all current and planned CT screening trials (eg, NLST, NELSON, UKLS) must gather suitable biological material and specimens as part of the trial protocol.

Recruitment into screening programmes

The NLST investigators noted that, when compared with US census figures for people aged 55–74 years, their participants were younger (91% were aged 55–69 years) and better educated.3 In the UKLS trial,41 of people approached for inclusion who were younger than 60 years, very few were at high risk of lung cancer and, therefore, eligible for screening. Worldwide, lung cancer is more common with increasing deprivation and age; only researchers on the UKLS trial have looked at the effect of these variables on level of recruitment. Older people and those with greater levels of deprivation (the hard-to-reach group) are less likely to participate in trials but are more likely to fulfil risk criteria. These findings might predict what will happen in a national screening programme. To increase participation from the hard-to-reach community, innovative awareness programmes will be needed, with integration of smoking cessation, symptom awareness, and screening participation to maximise the cost-effectiveness of the exercise. Some of these interventions could be cross-cutting and used for prevention and awareness of other tumours.

Technical considerations

The application of CT to screen for lung cancer only became possible with the advent of multidetector CT (MDCT), which allows quick acquisition (in one breath-hold) of thin overlapping slices. The resulting volumetric scan has near-isotropic resolution (that is, equal resolving capability in all three dimensions), meaning that small nodules are not blurred in the craniocaudal axis and, therefore, are less likely to be missed. The ability to scan the lungs at a low radiation dose, especially important in a screening setting, predates the development of MDCT.42 For the specific task of nodule detection in the lungs, dose reduction can be quite extreme because, although image noise (graininess) is increased, the high contrast between white nodules and background black lung means that the conspicuity of solid nodules is not compromised. However, the disadvantage of dose reduction is that small semi-solid or pure ground-glass nodules might be less well visualised on very-low-dose CT examinations, but these types of nodules only represent a small proportion of all those identified to date in screening trials. Nevertheless, rigorous attempts to keep the radiation dose as low as possible while obtaining diagnostically adequate images remain a key tenet of any screening programme. The main determinant of how low the tube current and, therefore, the radiation dose can go is body habitus (ie, body type or physique), in particular weight. In the near future, developments in CT technology are unlikely to allow a further large step-down in radiation dose. With current MDCT, the effective radiation dose to an individual is below 1·6 mSv, which is roughly the amount of background natural radiation that a person receives in 1 year and compares with up to 8 mSv from regular CT of the thorax.43

Detection of small nodules among the structured noise of branching vessels and bronchi could be judged a task that requires considerable skill and is best left to expert chest radiologists.
However, non-radiological personnel can, with adequate training, read CT scans and identify nodules with a performance that approaches that of skilled (and expensive) radiologists.44 The idea of using technologists to read screening examinations is not new and has been applied successfully in mammographic screening programmes.45 However, can radiographers (for example) be used as sole readers or do they need to be part of a dual-read approach, whereby they read the CT before a radiologist assesses it? Cost-effectiveness has been examined in the setting of mammographic screening46 but needs further investigation in lung cancer screening with CT.

Another potential aid to increasing the yield (if not accuracy) of nodule identification is computer-assisted detection software.47–52 Variations in software design, and the way in which the effect of computer-assisted detection is measured, make generalisations about use of this approach unhelpful. Nevertheless, some conclusions can be drawn from the plethora of studies related to computer-assisted detection. First, the sensitivity of computer-assisted detection grows with increasing nodule size. Second, this technique is most effective when used after, rather than before, reading of a CT scan. Finally, the incremental effect of computer-assisted detection applies to both expert and inexperienced (or less skilled) readers.53 It is noteworthy that no completed or extant lung cancer screening trial uses computer-assisted detection, even though this method has been available at various levels for many years.

For EDRN guidelines see http://edrn.nci.nih.gov/

[Figure: four CT images (panels A–D), each with the volumetric software readout for the marked nodule, summarised below]

| Panel | Nodule ID | Volume (mm³) | X-diameter (mm) | Y-diameter (mm) | Z-diameter (mm) | Min diameter (mm) | Max diameter (mm) | Density average (HU) | Density SD (HU) |
|---|---|---|---|---|---|---|---|---|---|
| A | 17 | 9·59 | 0·70 | 0·00 | 0·35 | 0·00 | 0·78 | .. | .. |
| B | 1 | 32·90 | 3·15 | 3·15 | 3·85 | 2·60 | 4·27 | .. | .. |
| C | 3 | 163·73 | 7·00 | 7·00 | 6·65 | 5·50 | 6·16 | 35·54 | 37·84 |
| D | 4 | 789·18 | 11·20 | 11·90 | 12·95 | 8·43 | 14·95 | –147·17 | 92·34 |

Figure: Examples of nodules detected visually and characterised volumetrically
Siemens LungCare software was used for characterisation. Images are from patients enrolled in the UK Lung Screening (UKLS) trial. (A) Inconspicuous small nodule not fulfilling volumetric size criterion for UKLS category 1 nodule (>15 mm³); this 9·59 mm³ nodule would not be followed up in the UKLS care pathway. (B) Category 2 nodule (15–49 mm³); follow-up CT would be done at 1 year. (C) Category 3 nodule (50–500 mm³); follow-up CT would be done at 3 months. (D) Category 4 nodule (>500 mm³); such a nodule would mandate referral for multidisciplinary team assessment.
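The volume bands in the figure legend amount to a simple triage rule. The sketch below encodes only the thresholds and actions stated in the legend; it is not the UKLS protocol itself, which also takes nodule diameter and other features into account. The function name `triage_by_volume` is hypothetical, and the handling of volumes that fall between 49 mm³ and 50 mm³ (assigned here to category 2) is an assumption, because the legend does not say.

```python
def triage_by_volume(volume_mm3: float) -> tuple[str, str]:
    """Map a nodule volume to the band and action given in the figure legend.

    Boundary handling is an assumption: 15 mm3 and 500 mm3 are treated as
    inclusive limits of their bands, as the legend's ranges suggest.
    """
    if volume_mm3 < 0:
        raise ValueError("volume must be non-negative")
    if volume_mm3 < 15:
        return ("below threshold", "no follow-up in the UKLS care pathway")
    if volume_mm3 < 50:
        return ("category 2", "follow-up CT at 1 year")
    if volume_mm3 <= 500:
        return ("category 3", "follow-up CT at 3 months")
    return ("category 4", "referral for multidisciplinary team assessment")


# Volumes reported for panels A-D of the figure.
for panel, volume in [("A", 9.59), ("B", 32.90), ("C", 163.73), ("D", 789.18)]:
    band, action = triage_by_volume(volume)
    print(f"Panel {panel}: {volume} mm3 -> {band}; {action}")
```

Applied to the four panel volumes, the rule reproduces the assignments described in the legend.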


Neither non-radiological readers nor computer-assisted detection systems can contextualise; that is, they are less able to judge the significance of the nodules that they detect. In both cases, the size of the nodule, either its diameter or volume, is the main determinant of whether it is reported and how it will be followed up or managed (figure). However, other features indicate (with varying degrees of certainty) that a nodule is a normal structure or is part of benign disease; these characteristics would be recognised and dismissed for what they are by a radiologist but might be recorded by a computer-assisted detection system or a non-radiologist reader as a nodule needing further assessment.

Screen interval and threshold for further work-up

Advances in technology have enabled tiny nodules to be identified with the minimum of radiation exposure. Harm in these circumstances is much more likely to be attributable to the effect of the findings than to radiation. These harms can be physical and psychological. We need to establish at what point further investigation of nodules is unnecessary and likely to cause more harm than good. This process will depend (to some extent) on patient-related factors, including fitness and comorbidity, and institutional factors, such as local expertise in undertaking and interpreting investigations. However, the most important and inter-related factors are nodule size threshold for further investigation and the frequency of the screen. Screening undertaken at frequent intervals allows the nodule size threshold to increase because smaller nodules will be followed up at the next interval screen. A less frequent screen will cost less but risk missing very early cancers that might shift stage in the intervening year but that still could be treated successfully. Therefore, if lung cancer screening is undertaken less frequently, the size threshold of nodules will need to be reconsidered. Interval cancers are inevitable, but a longer screen interval might miss more indolent tumours that have a greater chance of benefiting from early diagnosis.

Henschke and colleagues54 reported a retrospective review of the I-ELCAP database from 2006 to 2010. They examined data for the effect of nodule size threshold for further radiological or other work-up on the proportion of nodules that were cancerous and on the proportion of cancers that had a diagnostic delay of 9 months or less (the effect of this delay on outcome could not be assessed). As expected, compared with a 5 mm cutoff, the larger the nodule size threshold, the greater the number of delayed diagnoses (6·7% for a 9 mm cutoff, 5·9% for 8 mm, 5% for 7 mm, and 0% for 6 mm). The findings also showed that diagnostic work-up (including repeat CT) could be reduced by 75% for a 9 mm cutoff. Opinion is divided on whether raising the threshold for follow-up is justifiable on size alone,55 because of the possibility of missing interval cancers and the scarcity of data for the effect on mortality.

Another way to improve cost-effectiveness could be to lengthen the screen interval. Two trials have incorporated screening every 2 years: NELSON (final screen) and the MILD project. Results for 2-year intervals are only available for MILD,31 in which 4099 participants were randomised in a three-way comparison to either no CT (n=1723), CT every 2 years (n=1186), or CT every year (n=1190). Pastorino and co-workers on the MILD trial recorded cumulative 5-year lung cancer incidences of 311 per 100 000 for no CT, 457 per 100 000 for CT every 2 years, and 620 per 100 000 for CT every year (p=0·036); mortality rates were 109 per 100 000, 109 per 100 000, and 216 per 100 000, respectively (p=0·21). Nodules with a volume of 60 mm³ (4·8 mm diameter) or more were treated as positive. Although no mortality benefit was seen in this small study, clearly more cancers were detected in the annual group. These study findings provide some insight into the effect of extending the screen interval on the detection of interval cancers. In the NELSON trial,11 in which a cutoff of 50 mm³ (4·6 mm diameter) was used, the chance of detecting lung cancer on a CT scan after a baseline negative screen was 0·1% at 1 year and 0·3% at 2 years. In the same trial, the baseline cancer detection rate was 0·9% and at the second annual screen the rate was 0·7%. Therefore, extending the screening interval to 2 years might delay diagnosis of a substantial proportion of cancers, even when the threshold for further work-up is as low as 50 mm³.

The effect of the screening interval on mortality is not known but can be modelled. Table 4 presents modelling data based on a high-risk group for screening every year or every 2 years, with a threshold of 4 mm (as used in NLST).
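The volume thresholds quoted above for MILD and NELSON are paired with approximate diameters. A minimal sketch of that conversion follows, assuming that a nodule is treated as a perfect sphere so that V = πd³/6. This is an idealisation, and volumetric software measures segmented volume rather than deriving it from a diameter. The function names are illustrative.

```python
import math


def sphere_volume_mm3(diameter_mm: float) -> float:
    """Volume of a sphere with the given diameter: V = pi * d^3 / 6."""
    return math.pi * diameter_mm ** 3 / 6.0


def sphere_diameter_mm(volume_mm3: float) -> float:
    """Diameter of a sphere with the given volume (inverse of the above)."""
    return (6.0 * volume_mm3 / math.pi) ** (1.0 / 3.0)


# Thresholds mentioned in the text: MILD 60 mm3, NELSON 50 mm3, UKLS 15 mm3.
for trial, volume in [("MILD", 60.0), ("NELSON", 50.0), ("UKLS", 15.0)]:
    print(f"{trial}: {volume:.0f} mm3 ~ {sphere_diameter_mm(volume):.1f} mm diameter")
```

The spherical approximation agrees with the diameters quoted in the text to within about 0·1 mm. Because volume scales with the cube of diameter, a small change in measured diameter corresponds to a much larger proportional change in volume.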
In table 4 we consider two possible response rates: the 30% rate is roughly that observed in the UKLS trial, and the higher rate of 60% might be anticipated if there were a national programme, with the public health endorsement implied in such a programme. We should not assume that both respondent populations have the same risk. Therefore, we postulate that around 10% of responders in the 30% group would have sufficient risk to qualify, versus 8% of those in the 60% group. The detection rate of cancers at screening is dependent on whether it is the first or a subsequent screen, the incidence of the disease in the screened population, mean sojourn (in years), and the sensitivity of the screening test.56,57 Estimates of incidence in the populations are taken from NLST, from the reported empirical annual incidence. The estimated average incidence is taken from the LLP risk model in a subset of positive responders who met the 5% risk criteria. Based on the formulae of Launoy and colleagues56 and Duffy and co-workers,57 the numbers of cancers detected at screening and the numbers arising in the intervals between screens can be estimated for the different screening frequencies.

| | Screen every 2 years, 30% response | Screen every 2 years, 60% response* | Screen every year, 30% response | Screen every year, 60% response* |
|---|---|---|---|---|
| Willing to participate | 300 000 | 600 000 | 300 000 | 600 000 |
| Eligible to participate | 30 000 | 48 000 | 30 000 | 48 000 |
| First screens | 30 000 | 48 000 | 30 000 | 48 000 |
| Subsequent screens | 60 000 | 96 000 | 120 000 | 192 000 |
| Cancers detected at first screen | 810 | 1300 | 810 | 1300 |
| Cancers detected at subsequent screens | 1020 | 1640 | 1320 | 2110 |
| Interval cancers | 660 | 1050 | 360 | 580 |
| Cancer deaths prevented (NLST period) | 100–240 | 160–380 | 160–280 | 260–440 |
| Predicted deaths prevented in long term | 180–430 | 290–680 | 290–500 | 470–790 |

Data are number of people. A million people aged 60–74 years were approached with various scenarios. Data for size threshold parameters and assumptions are available from the authors. The ratio of screen-detected to interval cancers is similar to that in mammography programmes, although survival from lung cancer is much shorter than for breast cancer. NLST=National Lung Screening Trial. *60% response is the approximate rate for bowel screening in the UK.

Table 4: Estimated activity and outcomes over 4 years of a national low-dose CT screening programme

From a million individuals approached, we estimated that either 300 000 or 600 000 would respond (table 4). Of these, 30 000 or 48 000 would reach the UKLS criterion of a minimum risk of 5% in the next 5 years, with an average risk of 1·4% per year. With an approximate sensitivity of 95% and mean sojourn of 2 years,58 the implied detection rate is 2·7% at the first screen, and 1·1% and 1·7% at subsequent screens with 1-year and 2-year intervals, respectively.56 Probable mortality reductions are calculated essentially from NLST results; the lower bound is based on the recorded number of deaths prevented in NLST per screening episode and the upper bound is based on deaths prevented per screen-detected cancer. Longer term estimates are based on inflation of absolute numbers by a factor of 1·8, to give 85% fatality in the NLST control group, as noted at 10 years in US Surveillance, Epidemiology and End Results (SEER) data.59

A balance must be struck between the number of lives that can be saved and the cost of implementing a yearly screening programme. Currently, no solid cost-effectiveness data are available. A judgment call might have to be made between providing an affordable screening programme with 2-year screen intervals that does save lives (180–430 lives would be saved per 30 000 people screened in the short term for 90 000 CT screening episodes; table 4) or potentially no screening programme. The absolute effect of screening is very dependent on baseline risk, which in turn depends on age, smoking, and other risk factors. The screening interval and nodule work-up threshold could be tailored to individual risk in future programmes. The screen interval and size threshold would have to be based on accurate individual risk estimation; the next generation of risk-prediction models that incorporate baseline CT characteristics (other than merely size) potentially could provide a personalised screening interval.
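The detection rates quoted above can be approximated with a simple calculation. The sketch below is not the exact formulae of references 56 and 57. It assumes an exponentially distributed sojourn time, a steady-state pool of preclinical disease before the first screen, and that the subsequent-screen rate describes the screen immediately after the first. The function names are illustrative; the parameter values are those given in the text.

```python
import math


def first_screen_rate(incidence, sensitivity, mean_sojourn):
    """Detection rate at the first (prevalence) screen."""
    # At steady state, preclinical prevalence = incidence * mean sojourn time.
    return sensitivity * incidence * mean_sojourn


def subsequent_screen_rate(incidence, sensitivity, mean_sojourn, interval):
    """Detection rate at the screen held `interval` years after the first."""
    # Probability that a preclinical cancer is still preclinical after `interval`.
    still_preclinical = math.exp(-interval / mean_sojourn)
    # Cancers that arose since the last screen and have not yet surfaced.
    new_cases = incidence * mean_sojourn * (1.0 - still_preclinical)
    # Cancers missed at the last screen that have not yet surfaced.
    missed_cases = (1.0 - sensitivity) * incidence * mean_sojourn * still_preclinical
    return sensitivity * (new_cases + missed_cases)


INCIDENCE = 0.014    # average risk of 1.4% per year
SENSITIVITY = 0.95   # approximate sensitivity of low-dose CT
MEAN_SOJOURN = 2.0   # mean sojourn time in years

first = first_screen_rate(INCIDENCE, SENSITIVITY, MEAN_SOJOURN)
annual = subsequent_screen_rate(INCIDENCE, SENSITIVITY, MEAN_SOJOURN, 1.0)
biennial = subsequent_screen_rate(INCIDENCE, SENSITIVITY, MEAN_SOJOURN, 2.0)
print(f"first screen {first:.1%}, annual {annual:.1%}, biennial {biennial:.1%}")
```

Under these assumptions the rates round to 2·7%, 1·1%, and 1·7%. Multiplying them by the numbers of first and subsequent screens in table 4 gives the tabulated counts of screen-detected cancers (for example, 2·7% of 30 000 first screens is 810).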

Work-up of nodules

Debates about the best way to investigate nodules are typically complex, but the aim is to avoid harmful investigations while diagnosing lung cancer promptly. Nodules deemed to represent an intermediate or positive test (according to size threshold) can be managed in three ways: further imaging, either immediate or after an interval; minimally invasive biopsy, usually image-guided transthoracic biopsy; or surgical resection. All these strategies have their merits, and the ideal one has long been a cause for debate. For small nodules (
