Epidemiologic Reviews. Copyright © 1992 by The Johns Hopkins University School of Hygiene and Public Health. All rights reserved.

Vol. 14, 1992 Printed in U.S.A.

The Benefits and Risks of Mammographic Screening for Breast Cancer

Susan F. Hurley¹ and John M. Kaldor²

Received for publication January 6, 1992, and in final form July 14, 1992.

Abbreviations: BCDDP, Breast Cancer Detection Demonstration Projects; CI, confidence interval; HIP, Health Insurance Plan; RR, relative risk; SEER, Surveillance, Epidemiology and End Results.

¹ Department of Social and Preventive Medicine, Monash University, Victoria, Australia.
² School of Community Medicine, University of New South Wales, and National Centre in HIV Epidemiology and Clinical Research, New South Wales, Australia.

Reprint requests to Dr. Susan Hurley, Department of Social and Preventive Medicine, Monash Medical School, Alfred Hospital, Commercial Road, Prahran, Victoria, Australia, 3181.

This work was supported in part by the Anti-Cancer Council of Victoria. The authors thank Dr. B. Armstrong, Dr. J. Carlin, Dr. P. Champness, Prof. N. Day, Prof. A. Miller, Prof. S. Shapiro, and Prof. N. Weiss for comments on an earlier draft of this paper, and Michelle Rimmer for typing the tables.

Proposals for mass mammographic screening of asymptomatic women to detect early breast cancer have evolved over many years. Salomon reportedly made the first mammograms, of mastectomy specimens, in 1913 (1), but it was not until 1930 that Warren (2) successfully performed mammography on patients. In the 1950s, Gershon-Cohen et al. (3) proposed the use of mammography to detect early breast cancer in healthy women. Reports describing mammographic detection of impalpable carcinomas in series of women were published in the 1950s and 1960s by Gershon-Cohen et al. (4, 5) and Egan (6), but the impact of early detection on survival was not investigated. A randomized controlled trial, sponsored by the National Cancer Institute and conducted by the Health Insurance Plan of Greater New York, was, therefore, started in 1963 to investigate whether periodic mammographic and physical examination of the breasts reduced deaths from breast cancer (7).

Early results of this study, 6 years after the trial began, indicated that deaths were reduced in the study group offered screening compared with the control group (8). This reduction was sustained: 10 years from entry, the number of breast cancer deaths among study-group women whose cancer was diagnosed at screening or within 5 years of entry was 28.6 percent lower than in the control group (7). Despite encouraging findings in this and subsequent large prospective studies and case-control studies, acceptance of the wisdom of mass mammographic screening of asymptomatic women is not universal. The debate over screening has centered on three issues. First, the effectiveness of screening at reducing deaths from breast cancer has been questioned, particularly in women aged less than 50 years. Second, there has been controversy as to whether the benefits of screening justify the risks and economic costs. Consideration of adverse effects as well as benefits is important because screening is offered to asymptomatic women, most of whom will not develop breast cancer in their lifetime; these women will, therefore, not benefit individually from a reduction in breast cancer mortality, but may well experience adverse effects. Third, it is unclear whether the benefits of screening observed in research settings can be realized in routine screening practice (9).

A number of reviews of mammographic screening have been published, but they have focused almost exclusively on the effect of screening on breast cancer mortality. To inform policy recommendations, breast cancer screening was reviewed by a Canadian Workshop Group in 1988 (10), by the United States Preventive Services Task Force (11) and the Council on Scientific Affairs of the American Medical Association (12) in 1989, and by an Advisory Committee of the Department of Health in the United Kingdom in 1991 (13). The wisdom of the 1987 decision to introduce a national screening program in the United Kingdom was reviewed recently by Forrest (14), and the International Union Against Cancer has also evaluated breast cancer screening at a series of workshops (15, 16).

In the absence of proven primary prevention strategies for breast cancer, mammographic screening has taken on considerable importance as a public health measure. This review was prompted by the recent publication of further analyses of the effectiveness of mammographic screening (17-20), continuing controversy over the balance between risks and benefits (21-28), and the need for a comprehensive examination of the epidemiologic data concerning both the benefits and the risks of screening. The review has five goals: first, to describe the methodological issues in evaluating the effectiveness of screening; second, to critically review, in the context of these issues, studies of the ability of screening to reduce mortality from breast cancer; third, to review the epidemiologic data relating to the risks of mammographic screening; fourth, to describe methods for comparing risks and benefits; and finally, to summarize reported comparisons of risks and benefits.

The health-related benefits and risks that have been attributed to mammographic screening are listed in table 1. Mammographic screening is also associated with economic costs (such as the cost of the screening procedure and of consequent assessment) and economic benefits (such as savings in treatment costs), but these are beyond the scope of this paper.

In this review, mammographic screening refers to any radiographic examination of the breasts of asymptomatic women. Screening practice has varied over time and across countries, and has undoubtedly evolved in sophistication; in discussing each study of screening we will, therefore, consider the specific form of mammography used.

Apart from mammography, physical examination of the breasts by a physician or nurse (11), breast self-examination (29), ultrasound (29), thermography (30), transillumination light scanning (diaphanography), computed tomography, and magnetic resonance imaging (31) have been used to detect breast cancer. Physical examination, alone or in combination with mammography (11, 32-34), and breast self-examination, in combination with either physical examination or mammography (32, 33), are recommended for breast cancer screening by a number of professional organizations. However, none of physical examination, breast self-examination, or the other tests listed above has been studied as intensively as mammographic screening, and none has been evaluated in a randomized controlled trial. In this paper we, therefore, consider only screening examinations that include mammography.

BENEFIT: REDUCED BREAST CANCER MORTALITY

Measurement of effectiveness

Mammographic screening aims to detect malignant disease at a stage where effective treatment can be offered. Its effectiveness, in simplest terms, is measured by the extent to which it reduces mortality in screened women. Breast cancer screening has been evaluated by standard epidemiologic methods, ranging from randomized controlled trials to case-control studies. In addition, special statistical methods have been developed to investigate the relative effectiveness of different screening strategies and to allow the use of surrogate endpoints for mortality. In this section, we briefly describe the study designs, outcome measures, and statistical methods (see table 2) that have been used to assess the effectiveness of mammographic screening.

TABLE 1. Benefits and risks of mammographic screening

Benefits
  Reduced breast cancer mortality (and, hence, increased life expectancy)
  Increased use of conservative surgery
  Reassurance that breast cancer is not present
Risks
  Discomfort associated with procedure
  Unnecessary investigations (including biopsy) in women with false-positive results
  Overtreatment of breast abnormalities
  False reassurance
  Radiation-induced breast cancer
  Psychologic morbidity (screening, investigations, and early diagnosis)

Prospective studies with internal comparison groups (controlled trials). In prospective studies with internal comparison groups, women are assigned either to a study group, which is offered periodic screening (for a fixed or indefinite period), or to a control group, which receives normal care. Women can be assigned randomly or nonrandomly, and either individually or in groups (clusters or blocks) defined, for example, by area of residence or registration with a doctor's practice. Random allocation decreases the possibility of systematic differences, other than screening policy, between the study and control groups that might affect the outcome measure. For a trial of a given size, individual randomization generally yields effect estimates with lower variance than cluster randomization (35), but it may be more difficult to organize and more expensive. Randomization within blocks of study subjects with similar characteristics has been used in some intervention studies, but not in trials of mammographic screening.

The most appropriate endpoint in a trial of mammographic screening is death from breast cancer. Death from any cause would provide information on the overall impact of screening on life expectancy and would be easier to monitor, but screening is not thought to affect death from other causes. Since breast cancer accounts for only 2 to 5 percent of all deaths in women (36), any effect of screening may be diluted by group differences in the much larger number of deaths from other causes.

TABLE 2. Measurement of effectiveness of mammographic screening

Prospective study: internal comparison (controlled trials)
  Groups being compared:
    Women offered screening vs. those not offered screening
    Women attending screening vs. those not attending screening
    Women attending screening vs. those not offered screening
Prospective study: external comparison (screening programs)
  Groups being compared:
    Women attending screening vs. general population or a selected control group
  Outcome measures (prospective studies):
    Breast cancer death: mortality rate; person-years of life lost; case survival rate
    Surrogates: ratio of prevalence rate in screened group and incidence rate in comparison group; ratio of interval cancer rate in screened group and incidence rate in comparison group (proportionate incidence); proportion of breast cancers diagnosed in advanced stage; proportion of breast cancers larger than a specified size
Case-control study
  Groups being compared:
    Women who died of breast cancer vs. living women
  Outcome measures:
    Prevalence of prior screening; frequency of prior screening; time since last screening

In screening trials, cause of death has been determined from death certificates or from reviews of clinical and pathologic records, using a variety of protocols, ideally by people who are unaware of the woman's trial group status. In trials conducted for a limited number of screening rounds, often only deaths in women with breast cancer diagnosed over a defined period have been included, on the grounds that a brief exposure to screening cannot be expected to improve the prognosis of women whose breast cancer is diagnosed some years later.

As shown in table 2, three different outcome measures based on death from breast cancer have been used in prospective studies. The cumulative population breast cancer mortality rate, defined as the number of deaths per unit of person-time at risk, is the most commonly used outcome measure. Some trials have used the number of breast cancer deaths, rather than the rate, as a summary measure of outcome; however, this measure is valid only if group sizes are equal and the population at risk is large compared with the number of deaths. Screening's effectiveness is most appropriately assessed by calculating the ratio of the mortality rates (the relative risk of death from breast cancer) in the group offered screening and the group not offered screening. Such comparisons are based on the "intention to treat" principle (37) and ignore both nonparticipation in screening by women assigned to the study group and any self-initiated screening by women in the control group. The result is an evaluation of the effectiveness of an invitation to attend a screening program, rather than of the screening test itself.

The number of person-years of life lost is a measure of effectiveness that incorporates both the number of breast cancer deaths and the age at which they occur (38).
It is calculated by determining, from a life table, the number of additional years that each woman who died of breast cancer was expected to live, and summing for each trial group.

In some studies, the effectiveness of screening has been evaluated by measuring the survival time from diagnosis of breast cancer and comparing case survival rates (or case mortality rates) between groups. However, comparisons of this kind are potentially subject to the well-described phenomena of lead-time bias and length-biased sampling (39). Lead-time bias occurs because screening advances the date of diagnosis by the interval between the time when the cancer would have been detected clinically and the time of detection by screening. This virtually guarantees that survival measured from the date of diagnosis will be better for screen-detected than for clinically detected patients, even though overall life expectancy is not necessarily greater. Length-biased sampling arises because screening has a higher probability of detecting slow-growing tumors than fast-growing tumors, which tend to spend less time in the preclinical phase. If slow-growing tumors have a better prognosis, screening evaluated in this way will again appear to be associated with better survival.

Tarone and Gart (40) have recently proposed that comparisons of case survival rates between the study and control groups of a trial are appropriate when screening has been offered for only a fixed time and when sufficient follow-up time has subsequently elapsed for the cumulative breast cancer incidence rates in the two groups to be equal. (During the screening period, the breast cancer incidence rate in the study group would be higher than in the control group because of early diagnosis.) Under these circumstances, they claim that comparisons of the proportions of women with breast cancer who survive are more powerful than comparisons of population mortality rates using Poisson models. Even if these conditions are not met, the validity of case survival rate comparisons can be improved in a number of ways.
First, survival can be measured from the date of entry to the trial, rather than the date of diagnosis, thereby avoiding lead-time bias (41). Second, mathematical models can be used in conjunction with data from prospective studies to estimate the extent to which lead time or other factors may have affected the observations, and to adjust the comparisons accordingly (39, 42-44).

Because breast cancer death is a relatively infrequent event, several researchers have considered surrogate endpoints, which occur earlier than breast cancer death, in the evaluation of screening (45, 46). As mammographic screening is thought to reduce the risk of breast cancer death through early diagnosis, both the time of diagnosis and the severity of disease at diagnosis may reflect the mechanism by which screening affects outcome and are, therefore, potentially useful surrogate endpoints.

The extent of advancement of the time of diagnosis can be assessed by two outcome measures described by Day et al. (47, 48) (table 2). The first measure is the ratio of the prevalence rate of cancers detected at the first screening round to the expected annual incidence rate (i.e., the incidence rate in the comparison population). This ratio provides an indication of the number of years by which the date of diagnosis is advanced through screening. However, caution is required when drawing inferences about the effectiveness of screening from this ratio, as prevalent cases at the first screening examination may include cancers that would not have become clinically apparent in the woman's lifetime (49). The second, more useful measure of advancement of the time of diagnosis is the proportionate incidence, defined as the ratio of the incidence rate of interval cancers (cancers diagnosed in the period between screening examinations) in the screened group to the incidence rate of breast cancer in the comparison group (47, 50). A low proportionate incidence indicates that the screening process is highly sensitive in detecting cancers that would otherwise progress during the period following screening.
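Both measures of advancement of diagnosis described above are simple ratios of rates. The Python sketch below computes them; the function names and example counts are hypothetical (not taken from any trial discussed here), and rates are expressed per woman or per woman-year rather than per 1,000.

```python
def prevalence_incidence_ratio(first_round_cancers, women_screened, expected_annual_incidence):
    """Ratio of the first-round prevalence rate to the expected annual incidence rate.

    expected_annual_incidence is the comparison-population rate per woman-year.
    The result roughly indicates the years by which screening advances diagnosis,
    subject to the overdiagnosis caveat noted in the text.
    """
    prevalence_rate = first_round_cancers / women_screened
    return prevalence_rate / expected_annual_incidence


def proportionate_incidence(interval_cancers, interval_person_years, comparison_incidence):
    """Interval-cancer rate in the screened group divided by the comparison incidence rate.

    Low values suggest that screening caught most cancers that would otherwise
    have been diagnosed during the interval.
    """
    interval_rate = interval_cancers / interval_person_years
    return interval_rate / comparison_incidence


if __name__ == "__main__":
    # Hypothetical comparison-population incidence: 2.0 per 1,000 woman-years.
    incidence = 2.0 / 1000
    # Hypothetical first round: 60 cancers among 20,000 women screened.
    print(f"prevalence/incidence ratio: {prevalence_incidence_ratio(60, 20000, incidence):.2f}")
    # Hypothetical interval: 12 interval cancers in 20,000 woman-years.
    print(f"proportionate incidence: {proportionate_incidence(12, 20000, incidence):.2f}")
```

With these invented figures the first ratio is 1.5 (diagnosis advanced by roughly a year and a half) and the proportionate incidence is 0.30, meaning the interval-cancer rate is 30 percent of the rate expected without screening.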
The severity of disease at diagnosis can be measured by tumor size, axillary lymph node involvement, presence of distant metastases, malignancy grade, or combinations of these parameters expressed as clinical or pathologic stage (48, 51). In particular, two outcome measures based on severity of disease at diagnosis have been proposed, namely, the proportion of cancers that are stage II or worse and the proportion of breast cancers larger than a certain size (48) (table 2). Day (45, 47) has emphasized that surrogate measures are useful as potential predictors of the effectiveness of a prospective study, and can be used to evaluate the effectiveness of screening in specific subgroups, at different intervals, or with different modalities, once effectiveness at a global level has been established. However, reduction in breast cancer mortality remains the ultimate measure of effectiveness.

Prospective studies with external comparison groups. The outcome measures described for prospective studies with internal comparison groups can also be measured within programs in which women are invited for periodic screening. Because there is no concurrent control group, effectiveness must be evaluated by comparison with an external reference group, usually the general population (52). Such assessments are clearly less reliable than those from a controlled trial, and mathematical models have been adopted to improve the validity of effectiveness estimates derived from screening programs (43, 53, 54).

Case-control studies. The principles behind the use of case-control studies to evaluate the efficacy of screening have been described by Morrison (55, 56), Weiss (57), and Sasco et al. (58). If the screening test detects a precancerous condition, the endpoint in a case-control study can be cancer itself. However, for a screening examination such as mammography, which aims to detect cancer rather than precancerous lesions, the case definition must be based on a subsequent disease manifestation such as breast cancer death or advanced disease.
To date, all case-control studies of mammography have defined a case as a death from breast cancer. The rationale for such studies is that, if screening decreases breast cancer mortality, women who die of breast cancer will have a history of less intensive screening. If screening is effective, case-control studies may be expected to give larger estimates of relative protection than controlled trials analyzed on the basis of intention to treat, because case-control studies evaluate the effect of screening itself rather than of the invitation to attend screening.

Controls are chosen from the population that gave rise to the women who died from breast cancer. The controls may have had breast cancer diagnosed but, to equalize opportunity for exposure to screening, they should not have been diagnosed before the case (58). Control groups consisting of women with early-stage disease, or of women who have died from another disease, are undesirable: the former can introduce length bias favoring screening, and the latter can lead to confounding if the overall death rate differs between screened women and those who refused screening (57, 58).

In case-control studies, exposure to screening needs to be defined particularly carefully. The sensitivity of analyses is increased by focusing on the period when screening was most likely to have had an effect (for example, by including only deaths occurring after a woman was invited for screening, rather than all deaths occurring after the screening program was started). All mammograms performed for cases and controls up to the time of diagnosis of the case, but not between diagnosis and death, should be regarded as screening exposure. Morrison (56) has also drawn attention to the potential for "healthy screenee" bias in case-control studies, which can occur because diagnosis of disease, either clinically or through screening, precludes the possibility of subsequent screening. Knox (59) has recently challenged the validity of case-control studies for measuring the effectiveness of screening, in the context of screening procedures aimed at preventing disease.
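To make the exposure rule and the analysis concrete, here is a minimal Python sketch for a 1:1 matched case-control study. Matching is an assumption here (the text does not specify a design); a woman counts as exposed only if she had a screening mammogram before the diagnosis date of her matched case, and the conditional (matched-pair) odds ratio is estimated from discordant pairs with a Wald confidence interval on the log scale. All names and data are hypothetical.

```python
import math
from datetime import date


def exposed_before(screen_dates, case_diagnosis_date):
    """True if any screening mammogram preceded the case's date of diagnosis.

    Mammograms between the case's diagnosis and death are ignored, so they
    do not count as screening exposure for either the case or her control.
    """
    return any(d < case_diagnosis_date for d in screen_dates)


def matched_pair_odds_ratio(pairs, z=1.96):
    """Conditional odds ratio for 1:1 matched pairs of (case_exposed, control_exposed).

    Only discordant pairs carry information:
    OR = (case exposed, control not) / (control exposed, case not).
    """
    case_only = sum(1 for case, control in pairs if case and not control)
    control_only = sum(1 for case, control in pairs if control and not case)
    if case_only == 0 or control_only == 0:
        raise ValueError("need at least one discordant pair of each kind")
    odds_ratio = case_only / control_only
    se_log = math.sqrt(1 / case_only + 1 / control_only)
    return odds_ratio, (odds_ratio * math.exp(-z * se_log), odds_ratio * math.exp(z * se_log))


if __name__ == "__main__":
    diagnosis = date(1988, 6, 1)  # hypothetical diagnosis date of one case
    # A 1989 mammogram falls after the case's diagnosis, so it does not count.
    print(exposed_before([date(1989, 3, 1)], diagnosis))  # False
    print(exposed_before([date(1986, 5, 1), date(1989, 3, 1)], diagnosis))  # True

    # Hypothetical 100 pairs: 20 both screened, 15 case only, 30 control only, 35 neither.
    pairs = ([(True, True)] * 20 + [(True, False)] * 15
             + [(False, True)] * 30 + [(False, False)] * 35)
    odds_ratio, (low, high) = matched_pair_odds_ratio(pairs)
    print(f"OR {odds_ratio:.2f} (95% CI {low:.2f}-{high:.2f})")
```

An odds ratio below 1 means that women who died of breast cancer were less likely than their matched controls to have been screened, which is the pattern expected if screening is protective.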
Other practical requirements for case-control studies are a means of accessing screening program records and the widespread prior availability of screening in the population, to increase the chance of detecting an effect.

Observational biases. A proportion of subjects in any study population will decide not to participate, and if participants and nonparticipants have different sociodemographic characteristics, they may well have different breast cancer incidence and survival rates. Comparisons of women attending screening with those not attending, or not offered, screening (table 2) are, therefore, potentially affected by selection bias. Selection bias will result in confounding if breast cancer mortality is independently associated with screening attendance.

Subgroup analyses. To investigate the age-specific effects of screening, analyses of prospective studies and case-control studies have been stratified retrospectively by age at entry, but such analyses often have low statistical power. Stratification by age at diagnosis, mode of cancer detection (for example, mammography or physical examination), or pathologic stage of cancer at diagnosis is invalid, because these variables, which Prorok et al. (60) have described as pseudo covariates or pseudo variables, are not available at the time of randomization and are potentially confounded with screening.

Prospective studies with internal comparison groups

Eight prospective studies of mammographic screening with internal comparison groups have been started since 1963; seven were randomized controlled trials and one was a nonrandomized controlled trial with geographic comparison groups. Study designs are summarized in table 3, attendance and breast cancer detection rates in table 4, and the breast cancer mortality findings published to date in table 5. Results at different follow-up times are included in table 5, when available, to allow examination of any change in the effect of screening over time.

The Health Insurance Plan (HIP) study. The HIP study, mentioned above, involved randomization to a study or a control group of approximately 62,000 women aged 40 to 64 years who had been members of the Health Insurance Plan of Greater New York for at least 1 year (see table 3). Women in the study group were offered an initial screening examination and three additional screenings at annual intervals. Screening examinations comprised physical examination and mammography; the control group received routine care. Mammography consisted of two radiographic views (craniocaudal and mediolateral) of each breast. Approximately 67 percent of women attended the initial screening session, and declining proportions of these women attended subsequent examinations (table 4). Of the women who attended at least one screening examination, 59 percent attended all four (38). As the study progressed, it was noted that the sociodemographic characteristics of participants and nonparticipants in the study group differed (77), and that participants had higher breast cancer mortality and lower all-causes mortality rates than those who declined screening. However, these rates were similar in the total study and control groups (7).

Follow-up of all women in the trial continued for 18 years from entry through mail surveys, tracer organizations, medical insurance claims, the New York State Cancer Registry, death records from four states, and the National Death Index. Ten years after enrollment, follow-up details were available for 82 percent of study and 80.2 percent of control group women, and an estimated 98 percent of breast cancer cases had been identified. Survival status 18 years from entry was determined for all women known to have had breast cancer (38). Mortality analyses, using the number of breast cancer deaths and person-years of life lost as outcome measures, have been reported for different durations of follow-up and have included cancers diagnosed within different periods after the study's commencement.
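The person-years-of-life-lost measure used in the HIP analyses can be sketched in a few lines of Python: for each breast cancer death, look up the remaining life expectancy at the age of death in a life table, then sum within each trial group. The life-table values and ages below are invented for illustration; a real analysis would use an appropriate population life table and some rule for ages between tabulated values, which this sketch omits. Comparing group totals directly assumes groups of similar size, as in the HIP study.

```python
def person_years_of_life_lost(ages_at_death, remaining_life_expectancy):
    """Sum of expected remaining years of life over all breast cancer deaths in a group.

    remaining_life_expectancy maps an age (whole years) to the expected further
    years of life at that age, as read from a life table (hypothetical here).
    """
    return sum(remaining_life_expectancy[age] for age in ages_at_death)


def percent_fewer(study_value, control_value):
    """Percentage by which the study-group value falls below the control-group value."""
    return 100 * (1 - study_value / control_value)


if __name__ == "__main__":
    # Hypothetical life table: age -> expected remaining years for women.
    life_table = {45: 35.0, 52: 29.0, 55: 26.5, 58: 24.0,
                  60: 22.5, 63: 20.0, 67: 17.0, 70: 15.0}
    study_deaths = [52, 60, 67]            # hypothetical ages at breast cancer death
    control_deaths = [45, 55, 58, 63, 70]
    study_pyll = person_years_of_life_lost(study_deaths, life_table)
    control_pyll = person_years_of_life_lost(control_deaths, life_table)
    print(study_pyll, control_pyll)  # 68.5 120.5
    print(f"{percent_fewer(study_pyll, control_pyll):.1f} percent fewer person-years of life lost")
```

Because a death at a younger age carries more remaining years, a reduction in deaths among younger women weighs more heavily on this measure than on a simple count of deaths.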
A statistically significant reduction in breast cancer deaths in the total study group, compared with the control group, persisted even 18 years after entry to the trial and with inclusion of cases diagnosed within 10 years of entry, although the difference between study and control groups diminished over time (table 5). Eighteen years after entry to the trial, there were 22.7 percent fewer breast cancer deaths in the study group than in the control group from cancers diagnosed within 5 years of entry, and 14.3 percent fewer from cancers diagnosed within 10 years of entry (relative risk (RR) 0.86, 95 percent confidence interval (CI) 0.72-1.01) (38). The reverse trend over time was observed for differences in person-years of life lost due to breast cancer. Considering cancers diagnosed within 5 years of entry, there were 21.3 percent fewer person-years of life lost in the study group than in the control group 5 years from entry, 29.7 percent fewer after 10 years, and 25.4 percent fewer after 18 years (38). The increase in the effectiveness of screening with longer follow-up, according to this outcome measure, may be due to a possible emerging effect on deaths in younger women (see table 5), which contribute larger numbers of person-years of life lost.

The earliest analyses of the HIP study suggested that the effect of the screening regimen on breast cancer deaths was restricted to women aged 50 years and older at enrollment (see table 5). As follow-up continued, a favorable, although not statistically significant, effect of screening in women aged 40-49 years at entry emerged, as table 5 shows. Using original records from the study, Habbema et al. (78) analyzed 14-year follow-up data for breast cancer cases diagnosed within 7 years of entry, at which time the numbers of breast cancer cases in the control and study groups were approximately equal. They found no significant difference in the effectiveness of screening across age groups and concluded that there was no evidence of lack of benefit in women below age 50 years.
An alternative explanation may be that the study did not have sufficient power to detect differences across age groups. More recently, Chu et al. (79) analyzed case mortality rates after 18 years of follow-up for women with breast cancer diagnosed within 6 years of entry, using the method of Tarone and Gart (40). The case mortality rate in the study group was lower than in the control group both for women aged 40-49 years at entry (40 percent vs. 53 percent; p = 0.019, Fisher's exact test) and for women aged 50-64 years at entry (42 percent vs. 53 percent; p = 0.014). Effectiveness indices based on surrogate endpoints have also been reported for the HIP study. The ratio of the prevalence rate at the first screening round to the annual incidence rate of the control group was 1.30, and the proportionate incidence of interval cancers in the first 12 months after a negative screening test was 31.7 percent (80).

TABLE 3. Design of prospective studies with internal comparison groups

HIP (7, 60)
  Year started: 1963
  Design: 1. PE + two-view mammography; 2. control. Individual randomization
  Screening interval (months): 12
  Number of rounds: 4
  Age at entry (years): 40-64
  Number of women: study, 30,239; control, 30,756
  Determination of cause of death: blinded review of death certificate, hospital and physicians' records

Swedish two-counties (61, 62)
  Year started: 1977
  Design: 1. single-view mammography; 2. control. Cluster randomization
  Screening interval (months): 24 (subjects aged 40-49 years); 33 (subjects aged 50-74 years)
  Number of rounds: ongoing
  Age at entry (years): 40-74
  Number of women: study, 78,085; control, 56,782
  Determination of cause of death: review of clinical and pathologic records by a committee unblinded to the woman's trial group status

Malmo (63, 64)
  Year started: 1976
  Design: 1. two-view mammography; 2. control. Individual randomization
  Screening interval (months): 18-24
  Age at entry (years): 45-69
  Number of women: study, 21,088; control, 21,195
  Determination of cause of death: independent blinded review of clinical records and postmortem findings by a pathologist and an oncologist, and by an internist if there was disagreement

United Kingdom (65, 66)
  Year started: 1979
  Design: 1. PE + mammography; 2. BSE instruction; 3. control. Nonrandomized allocation by health district
  Screening interval (months): 12, 24
  Number of rounds: group 1, 7; group 2, 1
  Age at entry (years): 45-64
  Number of women: group 1, 45,841; group 2, 63,636; control, 127,117
  Determination of cause of death: death certificate, breast cancer as primary cause

Edinburgh (67)
  Year started: 1979
  Design: 1. PE + mammography; 2. control. Cluster randomization
  Screening interval (months): 12, 24
  Age at entry (years): 45-64
  Number of women: study, 23,226; control, 21,904
  Determination of cause of death: death certificate, breast cancer as primary or secondary cause

Stockholm (68)
  Year started: 1981
  Design: 1. single-view mammography; 2. control. Individual randomization
  Screening interval (months): 28
  Age at entry (years): 40-64
  Number of women: study, 40,318; control, 20,000
  Determination of cause of death: as for the Swedish two-counties study

PE, physical examination; BSE, breast self-examination.

The Swedish two-counties trial. After publication of the early results from the HIP study, four separate randomized controlled trials investigating the effectiveness of mammographic screening, without physical examination, were started in Sweden between 1976 and 1982 (table 3). The Swedish two-counties and Stockholm trials used a single-view mediolateral oblique mammographic examination, a technique developed in Sweden by Lundgren and Jakobsson (81) to maximize the amount of breast tissue visualized in one view. The Malmo and Goteborg trials used two-view mammography (a craniocaudal and an oblique view in Malmo (63)). Breast cancer mortality results for the Swedish two-counties, Malmo, and Stockholm studies have been published, but are not yet available for the Goteborg trial (table 5). The Swedish two-counties study involved cluster randomization of 19 communities of women aged 40 years or older who were resident in the counties of Kopparberg and Ostergotland. Women in the study group were invited for mammographic screening at intervals that depended on their age (table 3). Women in the control group were not invited for screening, but 13 percent had a mammographic examination as part of routine care.
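The variance penalty of cluster randomization noted earlier is relevant here, because the two-counties trial randomized communities rather than individual women. It is often summarized by the design effect: assuming equal cluster sizes m and an intracluster correlation coefficient ρ, the variance of an effect estimate is inflated by roughly 1 + (m - 1)ρ relative to individual randomization, so a cluster trial of n women carries about as much information as an individually randomized trial of n / [1 + (m - 1)ρ] women. The Python sketch below uses hypothetical values of n, the number of clusters, and ρ; it illustrates the general formula and is not a reanalysis of any trial.

```python
def design_effect(cluster_size, icc):
    """Variance inflation factor for equal-sized clusters: 1 + (m - 1) * rho."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return 1 + (cluster_size - 1) * icc


def effective_sample_size(n_women, n_clusters, icc):
    """Number of individually randomized women carrying the same statistical information."""
    cluster_size = n_women / n_clusters  # assumes all clusters are the same size
    return n_women / design_effect(cluster_size, icc)


if __name__ == "__main__":
    # Hypothetical trial: 100,000 women in 20 equal clusters, rho = 0.0001.
    de = design_effect(100_000 / 20, 0.0001)
    print(f"design effect: {de:.4f}")  # 1.4999
    print(f"effective sample size: {effective_sample_size(100_000, 20, 0.0001):,.0f}")
```

Even a very small intracluster correlation matters when each cluster contains thousands of women: in this invented example it removes about a third of the effective sample size.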
Participation in the trial was high, with more than 90 percent of women in the study group aged less than 60 years attending the first round (table 4). As less

110

Hurley and Kaldor

TABLE 4. groups

Attendance and breast cancer detection rates for prospective studies with internal comparison Breast cancer detection rates*

Attendance

Study group

Study (reference no.) Percent

Round

Rate

Group HIPt (38)

1 2 3 4

67 80 of round 1 75 of round 1 69 of round 1

Screened Round 1 Subsequent Interval Nonattenders Total

2.26 2.72 1.49 0.92 1.61 2.05

Control group 1.95

Swedish two-counties (18, 73)

1 2 Control

89.2 83.3 79.9-90.4*

Screened Round 1 Interval Total Control

5.56 n/af 2.18 3.05-11.99$

1.91

Malmo (63, 64)

1 Subsequent

74 70

Screened Round 1 Subsequent Interval Nonattenders

7.5 2.0-2.2 0.9 2.0

2.7

72; 60§

Screened Round 1 PE† rounds PE + mammography rounds Interval Nonattenders Total

5.2 1.3

1.62-1.84

United Kingdom (66)

Edinburgh (67)

Stockholm (17, 24)

Canadian NBSS† (70, 75) Women aged 40-49 years

Women aged 50-59 years

Goteborg (72)

1

1 2 3

61.3 54.6 53.1

3.3 n/a 1.7 2.57

Screened Round 1 PE rounds PE + mammography rounds Interval Nonattenders Total

311 0.53 1.8 2.5

Screened Round 1 Round 2 Interval Control

1.75 4.0 3.1 1.8 4.6

6.2

1.9

m

1 2 Control

80.7 80.1 77

1 2 Subsequent

Not relevant 88 Over 82

Round 1 Round 2 Round 5 Interval

2.18, 5.79‖ 1.18, 2.16 1.70, 1.68 0.86

n/a

1 2 Subsequent

Not relevant 89 Over 83

Round 1 Round 2 Round 5 Interval

6.48, 7.96# 3.41, 3.77 2.06, 4.19 0.88

n/a

n/a

n/a

1

82.7

* For round 1, rates per 1,000 women screened; interval cancer rates are per 1,000 person-years for all intervals for the HIP study, per 1,000 person-years for the first and second intervals for Malmo, and per 1,000 women screened for the first interval for Edinburgh, Stockholm, and the NBSS trial; otherwise, rates are per 1,000 person-years. For the HIP study, rates are for cancers detected in the first 5 years after entry. For the Swedish two-counties and Stockholm trials, rates for screening of the control group are included in the study group column.
† HIP, Health Insurance Plan; n/a, not available; PE, physical examination; NBSS, National Breast Screening Study.
‡ Age-specific rates.
§ Attendance rates are for Guildford; Edinburgh.
¶ Approximate figures over the three rounds.
‖ For women aged 40-44 years; 45-49 years.
# For women aged 50-54 years; 55-59 years.

TABLE 5. Relative risk of breast cancer death in prospective studies with internal comparison groups: RR* (95% CI*), study group : control group

Study (reference no.) | Follow-up (years) | Total trial | Women aged <50 years at entry | Women aged 50-59 years at entry | Women aged ≥60 years at entry
HIP*,† (38) | 5 | 0.62 (0.41-0.91) | 0.95 (0.51-1.78) | 0.46 (0.25-0.84) | ‡
 | 10 | 0.71 (0.55-0.93) | 0.77 (0.50-1.16) | 0.69 (0.46-1.02) | 0.67 (0.34-1.31)
 | 18 | 0.77 (0.61-0.97) | 0.75 (0.52-1.09) | 0.77 (0.55-1.08) | 0.83 (0.46-1.51)
Swedish two-counties§ (18, 62, 76) | 6 | 0.69 (0.51-0.92) | 1.26 (0.56-2.84) | 0.61 (0.44-0.84)¶ |
 | 8 | 0.70 (0.56-0.88) | 0.92 (0.52-1.60) | 0.60 (0.40-0.90) | 0.65 (0.44-0.95)‖
 | 11 | 0.71 (0.59-0.88) | 1.03 (0.65-1.63) | 0.63 (0.45-0.88) | 0.62 (0.45-0.84)‖
Malmo (19, 64) | 8.8 | 0.96 (0.68-1.35) | 1.29 (0.74-2.25)# | 0.79 (0.51-1.24)** |
 | 11 | 0.90 (0.67-1.22) | 1.08 (0.65-1.79)# | 0.82 (0.57-1.19)** |
United Kingdom (66) | 7 | 0.80 (0.64-1.01)†† | ‡‡ | ‡‡ | ‡‡
Edinburgh (67) | 7 | 0.83 (0.58-1.18) | 0.98 (0.45-2.11) | 0.80 (0.54-1.17)§§ |
Stockholm (17) | 7 | 0.71 (0.40-1.20) | 1.09 (0.40-3.00) | 0.57 (0.30-1.10)§§ |

* RR, relative risk; CI, confidence interval; HIP, Health Insurance Plan.
† For cancers diagnosed within 5 years of entry; calculated from breast cancer deaths, not population mortality rates.
‡ Numbers too small for calculation.
§ Crude rates, except for age-specific rates at 8 years of follow-up, which are adjusted for county and time.
¶ For women aged 50-74 years at entry.
‖ For women aged 60-69 years at entry.
# For women aged <55 years at entry.
** For women aged ≥55 years at entry.
†† Adjusted for pretrial differences in breast cancer mortality rates.
‡‡ Not published.
§§ For women aged 50-64 years at entry.

than 50 percent of women over 74 years of age attended, results were reported for the 134,867 women aged 40-74 years at randomization. The breast cancer mortality rate after a mean follow-up of 6 years (with two screening rounds completed) was reduced by 31 percent in the study group compared with the control group (62). At this point, the reduction in mortality was statistically significant only in Kopparberg county, but after 8 years of follow-up the difference was significant in each county separately, and the confidence interval for the relative risk had narrowed (table 5) (76). Screening was offered to the control group in 1986 after publication of the first mortality results. A recently published analysis shows that after 11 years of follow-up, and including cancers diagnosed up until completion of the first screening round for controls, the risk of breast cancer death in the study group relative to the control group was 0.70 (table 5) (18). This and previous analyses were based on Poisson distribution models, which may be inappropriate because of cluster randomization. However, when deaths were modeled using a negative binomial distribution, the relative risk was 0.72 (95 percent CI 0.57-0.90), indicating that the cluster randomization was efficient (18, 82). As with the initial HIP study results, the Swedish two-counties study has not shown a clear reduction in breast cancer mortality in all age groups. Table 5 shows that after 8 and 11 years of follow-up, mortality was reduced in women aged 50-59 years and 60-69 years at enrollment, but not in women aged 40-49 years (76). The Swedish two-counties study investigators, however, have stressed that the trial was not designed to test hypotheses about age-specific effects and that, as Habbema et al. (78) had found with the HIP study, there was no significant difference in effect between age groups (76). Estimates of effectiveness based on surrogate endpoints have also been reported for the Swedish two-counties study. The ratios of prevalence rate to annual control group incidence rate for women aged 40-49 years, 50-59 years, and 60-69 years at entry were 1.95, 3.09, and 4.59, respectively. Estimates of proportionate incidence in the 12 months following a negative test were 37.8 percent, 11.6 percent, and 14.3 percent, respectively, for these age groups (80). As the prevalence to incidence ratios were higher and the proportionate incidence was lower than in the HIP study, Day et al. (80) suggested that screening technology had improved between the 1960s and 1970s. The cumulative number of stage II or more advanced cancers was 25 percent lower in the study group than in the control group, suggesting a benefit of similar magnitude to that shown by other indices of effectiveness (62). A recent analysis used proportional hazards regression to investigate the relation between breast cancer mortality at 11 years of follow-up and three potential prognostic factors, namely, tumor size, malignancy grade, and nodal status (51). The hazard ratio for survival of women with cancers detected at incidence screening rounds (second and subsequent screenings) relative to control subjects was 0.26 (95 percent CI 0.17-0.38). When adjusted for these three factors, the relative hazard was 0.66 (95 percent CI 0.43-1.00). As the adjusted hazard ratio was no longer statistically significant, the authors suggested that these indices of disease severity at diagnosis explain, to a large extent, the better survival of women with screen-detected cancers, and could, therefore, be useful early predictors of effectiveness.

The Malmo trial. The Malmo study was smaller than the Swedish two-counties trial, involved individual rather than cluster randomization, and was less well attended: only 74 percent of women invited for screening participated in the first round (tables 3 and 4). A survey of 500 women in the control group found that 24 percent had undergone mammography during the study period, mostly only once.
After a mean follow-up of 8.8 years, only a 4 percent nonsignificant reduction in breast cancer mortality was found (table 5) (64). The risk of breast cancer death was increased for women younger than 55 years and decreased for women aged 55 years and over, although neither difference was statistically significant. Results at 11 years of follow-up have recently been published (19). The relative risk had fallen to 0.90 (table 5), although data for the last year of follow-up were preliminary.

The United Kingdom trial of early detection of breast cancer. A nonrandomized comparison was made of breast cancer mortality rates between women in two health districts that offered screening by physical examination and mammography, two districts where training in breast self-examination was offered, and four comparison districts where only normal care was available (table 3) (65). A separate randomized trial was conducted at the Edinburgh center (see below). For the United Kingdom trial as a whole, the risk of breast cancer death in the two screening districts combined, relative to the comparison districts, was 0.8 (not statistically significant) at 7 years after adjustment for differences in pretrial breast cancer mortality rates between the districts (table 5) (66). Age-specific results were not published.

The Edinburgh trial. At the Edinburgh center for the United Kingdom trial, 87 general practices were randomized to a screening or a control group. Approximately 60 percent of women in the study group attended the first round (table 4). The investigators noted irregularities in the randomization process, and subsequent studies indicated that the socioeconomic class of women was probably higher in the practices offered screening (83). There was a 17 percent reduction in breast cancer mortality in the study group compared with the control group (table 5), which increased to 19 percent (RR = 0.81, 95 percent CI 0.55-1.18) after adjustment for socioeconomic status. However, neither difference was statistically significant (67).
Age-specific analyses showed a nonsignificant decrease in the risk of breast cancer death for study group women aged 50-64 years, and no change in risk for women aged less than 50 years (table 5).

The Stockholm trial. In this study, women were allocated to screening or a control group on the basis of the day of the month on which they were born. Screening rounds occurred in 1981 and 1983, and in 1986 the control group was also invited for screening. After a mean follow-up of 7.4 years, and considering only cancers diagnosed before 1987, there were 29 percent fewer breast cancer deaths in the study group than in the control group, but the reduction was not statistically significant (see table 5) (17). In study group women aged 50-64 years at enrollment, the relative risk reduction was greater than for the total trial, but was still nonsignificant. In women aged 40-49 years there was no reduction in risk.

Other studies. The mortality results of the other Swedish trial, in Goteborg, and of the Canadian National Breast Screening Study have not yet been published, although some preliminary findings for the Canadian National Breast Screening Study were reported in a Lancet editorial (84). The Canadian study was an individually randomized trial investigating the benefit in women aged 40-49 years of annual screening with mammography and physical examination, and the additional benefit of annual mammography compared with physical examination alone in women aged 50-59 years (71). Unlike other trials of mammographic screening, participants were recruited and then randomized to the study or control group. This trial, therefore, evaluated the effect of randomization to screening rather than randomization to an invitation to attend screening. The Lancet editorial reported a 52 percent increase in breast cancer deaths in women aged 40-49 years who were screened, compared with controls (44 vs. 29 deaths), an average of 7 years from entry into the trial. Miller (85), the trial's principal investigator, has cautioned that the results are still being checked for validity. The editorial raised the possibility that failure to give appropriate therapy to women with mammographically detected cancers may explain this outcome. However, Day and Duffy (86) suggested that the result was more likely to be due to the 58 percent excess of cancers with positive axillary lymph nodes at diagnosis in the screened group, which in turn could be due to faulty randomization, an unforeseen effect of poor-quality mammography, or delayed presentation in women in the screened arm who anticipated a screening visit.

Prospective studies with external comparison groups

Screening programs established in a number of countries have subsequently evaluated effectiveness using an external comparison group. The largest program studied in this way has been the Breast Cancer Detection Demonstration Projects (BCDDP), conducted in 29 United States centers between 1973 and 1977. The program did not include a control group, as it was established after early publication of the HIP study results on the premise that screening was beneficial. Initially the project intended to offer five annual screening examinations, comprising physical examination, two-view mammography, thermography, and breast self-examination instruction, to women aged 35-74 years. After a review of the project in 1975, mammography was restricted to women aged 50 years and older (due to concerns about the risk of radiation-induced breast cancer), thermography was discontinued, and a review of the pathology of minimal tumors was initiated (87). A total of 283,222 women participated, and 51.7 percent completed all five examinations. There was evidence that participation was greater for women who were white, married, or college-educated (88). Two analyses of follow-up data for the whole project have been published. Seidman et al. (89) compared relative survival rates (observed survival rates divided by expected survival rates for women of the same age) of women with screen-detected breast cancer with those of women reported to the National Cancer Institute's Surveillance, Epidemiology, and End Results (SEER) program from 1977 to 1982. One year was subtracted from the survival time of all screen-detected cancers in the BCDDP to allow for lead time. Five-year relative survival rates were 87 percent and 74 percent, respectively, for women in the BCDDP and SEER, and 8-year rates were 81 percent and 65 percent, respectively; on the 8-year figures (excess mortality of 19 percent vs. 35 percent), this corresponds to a 46 percent reduction in breast cancer mortality. Stage-specific survival rates were similar in the two groups, suggesting that the benefit of screening resulted from a shift toward less severe disease at diagnosis. The second analysis involved comparison of the cumulative breast cancer incidence and mortality rates 9 years after entry for 55,053 white women enrolled in the BCDDP with expected rates based on SEER data (52). The breast cancer mortality rate for the SEER group was calculated using age-specific incidence rates and age-specific case fatality rates according to time since diagnosis. Although breast cancer incidence in the BCDDP group was 1.34 times greater than expected, mortality was only 79.6 percent of expected. Breast cancer mortality was 89 percent of expected for women aged 35-49 years, 76 percent for women aged 50-59 years, and 74 percent for women aged 60-74 years. Confidence intervals for these measures of difference in incidence and mortality rates were not reported.
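The 46 percent figure can be reproduced from the 8-year relative survival rates if 1 minus relative survival is read as the excess (breast cancer) mortality in each group; that reading is an assumption, since the section does not state how the figure was derived. The Python sketch below shows that arithmetic and, because confidence intervals were not reported for the observed-to-expected comparison, one common way such an interval could be approximated: treating the observed death count as Poisson and using a normal approximation on the log scale. The function names and the death counts are hypothetical; the counts (80 observed, 100.5 expected) were chosen only to give a ratio near the reported 79.6 percent and are not BCDDP data.

```python
import math


def mortality_reduction_from_relative_survival(rs_screened: float, rs_reference: float) -> float:
    """Proportional reduction in excess mortality between two groups.

    Assumption (for illustration): 1 - relative survival is taken as the
    excess (disease-specific) mortality in each group.
    """
    excess_screened = 1.0 - rs_screened
    excess_reference = 1.0 - rs_reference
    return 1.0 - excess_screened / excess_reference


def observed_expected_ratio(observed: int, expected: float, z: float = 1.96):
    """Observed/expected ratio with an approximate confidence interval.

    Treats the observed count as Poisson and uses SE(log O/E) ~ 1/sqrt(O),
    a normal approximation on the log scale (not an exact Poisson interval).
    """
    if observed <= 0 or expected <= 0:
        raise ValueError("observed and expected must be positive")
    ratio = observed / expected
    se_log = 1.0 / math.sqrt(observed)
    lower = ratio * math.exp(-z * se_log)
    upper = ratio * math.exp(z * se_log)
    return ratio, lower, upper


if __name__ == "__main__":
    # 8-year relative survival: BCDDP 81 percent, SEER 65 percent (from the text).
    reduction = mortality_reduction_from_relative_survival(0.81, 0.65)
    print(f"Reduction in excess mortality: {reduction:.0%}")  # -> 46%

    # Hypothetical counts, not from the BCDDP: 80 observed vs. 100.5 expected deaths.
    ratio, lo, hi = observed_expected_ratio(80, 100.5)
    print(f"O/E = {ratio:.3f} (95% CI {lo:.3f}-{hi:.3f})")
```

Applying the same formula to the 5-year rates (87 vs. 74 percent) gives 1 - 0.13/0.26 = 50 percent, which suggests the quoted 46 percent reflects the 8-year rates. The log-scale interval is only an approximation; with small numbers of deaths an exact Poisson interval would be preferable.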
