STATISTICS IN MEDICINE, VOL. 9, 969-980 (1990)

ESTIMATING BENEFITS OF SCREENING FROM OBSERVATIONAL COHORT STUDIES

W. DANA FLANDERS AND IRA M. LONGINI, Jr.

Department of Epidemiology and Biostatistics, Emory University School of Medicine, 1599 Clifton Rd, N.E., Atlanta, GA 30329, U.S.A.

SUMMARY Analysis and interpretation of observational studies of screening effectiveness is difficult because several biases threaten validity, including the structural healthy screenee bias, length bias, and effects of lead time. Although methods for the analysis of observational studies of screening effectiveness have been proposed, most have limitations such as incomplete control of length bias, or a heavy reliance on distributional assumptions. In this report we present a method for the analysis of observational cohort studies of screening effectiveness. Although developed independently and formulated specifically for estimating benefits of screening, our approach is implied by a more general approach developed previously by Robins. Our approach, in contrast to other available methods, avoids the healthy screenee bias, and length and lead time bias, and allows an empirical approach to analysis that need not depend highly on distributional assumptions. We illustrate application of the approach with analysis of published data from a study of breast cancer screening.

INTRODUCTION

Observational studies of screening effectiveness are increasingly common,1-10 in part because ethics, expense, or other obstacles may preclude a clinical trial. Observational studies, however, are particularly difficult to analyse and interpret because several biases, some unique to studies of screening, threaten validity.1 For example, such studies may be affected by lead time, length bias, the structural healthy screenee bias,1 or confounding. These issues have been reviewed elsewhere.1

Several methods for the analysis of observational studies of screening effectiveness have been proposed in an attempt to avoid these and other potential biases. Sasco et al.11 described a case-control design, but results may be affected by length bias or lead time.12,13 Morrison also described a case-control design,1 but noted that the comparisons were 'impure' because some members of each comparison group may have been 'exposed' to screening.1 Walter and Stitt recently outlined a method of analysis which involves comparison of the survival of screen-detected cases, after adjustment for lead time, with that of other cases.14 This method, although useful, depends on modelling and distributional assumptions which may be difficult to verify. Many additional methods have been proposed to study the benefits of screening and related issues,15-21 but most have limitations similar to those noted above. For example, some methods rely on a specific model whose validity may be difficult to verify,'***' while others" are not designed to estimate directly benefits of screening in observational studies.

The purpose of this report is to present a method for the analysis of cohort studies of screening effectiveness. Although developed independently and formulated slightly differently, our approach is implied by a general method previously developed by Robins.22,23 Robins showed

0277-6715/90/080969-12$06.00 © 1990 by John Wiley & Sons, Ltd.

Received January 1989 Revised December 1989


W. D. FLANDERS AND I. M. LONGINI Jr.

that his general method could be used to analyse cohort studies of the effectiveness of cervical cancer screening. We formulate results specifically in terms of rates available from published analyses of screening programmes and apply the approach to data from the HIP study.15 In contrast to other available methods for the analysis of observational studies, our approach avoids the healthy screenee bias, and length and lead time bias, and allows an empirical approach to analysis that need not depend highly on distributional assumptions.

We consider three aspects of the analysis we propose. First, we discuss the approach qualitatively, specifying the analytic goals, reviewing the structural healthy screenee bias, and outlining the relevant sequence of events which need to be considered in the analysis. Second, we define the parameters needed for the proposed analysis, and calculate the rates of interest using these parameters. Third, we show how to estimate the parameters with use of data from the HIP study,15 analysing this randomized study of breast cancer screening as though it were an observational cohort study.

BACKGROUND

We wish to estimate the proportion, P, of a hypothetical cohort which would survive from age $A_1$ to age $A_2$ if each cohort member underwent screening at, and only at, ages $S_1 < S_2 < \dots < S_J$, unless he or she becomes ineligible for screening. In estimating P, we assume that a person becomes ineligible for screening if he or she develops symptomatic or known disease by age $S_j$, that $A_1 \le S_1$, that $S_J \le A_2$, and that all members of this hypothetical cohort are screen eligible at age $A_1$. We can interpret P as the survival of a cohort offered the screening regimen, with the assumption of usual care, no screening outside the screening programme, and that all eligible members used the programme at the indicated ages. Thus, P represents the survival of a cohort in which everyone complies fully with the regimen.
For a more realistic situation in which some people do not utilize the regimen fully and others obtain additional screening exams, we can estimate survival by calculating P for a variety of regimens and then obtaining a weighted average of the resulting survival estimates. Comparison of P with the estimated survival for a similar, but unscreened, cohort yields an estimate of the screening programme's benefits.

This approach also provides a way to project benefits of a screening programme at the community level. To illustrate, suppose we introduce a screening programme with screening offered at two year intervals. Further suppose, for simplicity, that one third of the community would not use the programme and would never be screened, that one third of the community would fully use the programme and not obtain additional screening, and that one third of the community would use the programme and obtain additional exams so that they were screened every year. We estimate the expected survival in the community by calculating P for each of these screening schedules and averaging the results, weighting each by its share of the community. Similar calculations for, or direct observation of, a similar community without the screening programme, and comparison of results, provide an estimate of the programme's benefits. These projections require only calculation of P and specification of the expected pattern of screening.

Our goal is to estimate P with use of results of an observational cohort study. The study cohort consists of people screened at various times. Since the times at which people in the cohort choose to undergo screening may not conform to the regimen of interest, the observed survival may be a biased estimate of P. For simplicity, we assume temporarily that $J = 1$ and $A_1 < S_1 < A_2$. We relax these assumptions in the next section.
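The community-level projection described above is a weighted average of regimen-specific survival. A minimal sketch follows; the function name and the three survival values are hypothetical placeholders chosen only to show the arithmetic, not estimates from any study.

```python
def community_survival(regimens):
    """Weighted average of regimen-specific survival.

    regimens: list of (share, survival) pairs, where `share` is the
    fraction of the community following a screening schedule and
    `survival` is the value of P computed for that schedule.
    """
    total_share = sum(share for share, _ in regimens)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("shares of the community must sum to 1")
    return sum(share * survival for share, survival in regimens)


# Hypothetical P values: never screened, screened every two years,
# screened every year (one third of the community each).
example = [(1 / 3, 0.990), (1 / 3, 0.993), (1 / 3, 0.996)]
projected = community_survival(example)
```

The programme's benefit would then be the difference between this projected survival and the survival calculated for, or observed in, a comparable community without the programme.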
The observed survival of the study cohort is unlikely to estimate P validly because some members will, presumably, be screened at ages less than $S_1$, others at older ages, and some not at all. To avoid this problem, we might restrict the analysis to study subjects screened at age $S_1$. This

[Figure 1. State-transition diagram for a single screening exam. Of the original labels, only 'State 1', 'State 2', 'State 5' and 'No diagnosis' are recoverable.]

approach, however, is subject to the structural healthy screenee bias, described in detail by Morrison.1 The bias arises because restriction to those screened at $S_1$ implies that all study subjects were free of symptoms and known disease at that age, by the definition of screening eligibility. The restricted study group should, therefore, have lower than average mortality for the period $A_1$ to $A_2$, and bias arises.

Another simple approach to estimation of P might involve consideration of subjects screened before age $S_1$ as censored at the time of screening. This approach is also subject to bias since the censored (excluded) subjects were free of symptoms and known disease when censored, a reflection of their eligibility for screening. In contrast, the underlying cohort would, presumably, include some with symptoms or known disease, which implies that the censored subjects should have a lower than average mortality.

These simple approaches illustrate the phenomenon that underlies the structural healthy screenee bias and that threatens the validity of many analyses based on simple restriction criteria: selection criteria or censoring related to screening history will tend to associate with prognosis according to the definition of screen-eligibility. With application of these criteria after the start of the risk period, bias may result.

We develop and justify an appropriate analysis by:
1. considering relevant states of health which may exist;
2. showing that we can estimate transition probabilities between states;
3. combining these probabilities to estimate the desired probability P.

We consider five states (Figure 1; we consider additional states in the 'Methods' section). Initially, each person has no symptoms, no known disease, and no history of previous screening exam (Figure 1, state 1). Then, one of two events may occur: either the person may develop symptoms or be diagnosed (Figure 1, arrow from state 1 to state 2), or the person may decide to undergo screening (Figure 1, arrow from state 1 to state 3 or 4).
The person may subsequently die from the target disease (Figure 1, arrow from state 2 to state 5, from state 3 to state 5, or from state 4 to state 5). A key feature of screening programmes, reflected in Figure 1, is that subjects with symptoms or a diagnosis are no longer eligible for screening. (This restriction implies interest in screening for the first occurrence of disease. The approach generalizes simply to evaluate screening for second occurrences, upon restriction to those with one previous diagnosis.) Figure 1 illustrates only one screening exam in detail; an arbitrary number is possible by extension of the diagram. The events and states illustrated in Figure 1 are similar to those used by Eddy in his comprehensive screening


A cohort study will provide estimates of each of the required transition rates. We consider only the transition rate from state 1 to state 2, h(C, t), since estimation of the other rates is straightforward. To estimate this rate we define a subcohort who, through age $A_1$, have no symptoms, no known disease, and no prior screen, and observe the incidence rate of symptoms or diagnoses made outside the screening programme. We censor screened subjects at the time of screening. This censoring is 'independent' if we assume that those who undergo screening are 'similar' to those who do not. That is, if screened subjects had, in fact, not been screened, their subsequent risk would have been the same as that of unscreened subjects. Since, however, we can control for known risk and prognostic factors by restriction, stratification, or regression analyses, this 'similarity' must only hold conditional on known risk and prognostic factors. For example, in the evaluation of screening mammography, even if screened women were younger, of lower parity, and more likely to use breast self-examination than other women, bias need not arise provided data were available so that we could control for these factors in the analysis. Thus, assuming that events not illustrated, such as competing risks, are independent of those illustrated, we can estimate the transition rates illustrated in Figure 1 from the information available in a cohort study by, for example, a tabulation of person-years and events.

Why does our approach, but not the approach described previously in which one censors subjects who are screened early, allow valid estimation? The answer reflects differences between the approaches in the definitions of the cohort at risk and the outcome of interest. With our approach, both the censored subjects and the underlying subcohort at risk are free of symptoms and known disease.
In contrast, with the previously described approach, those censored, but not all members of the underlying cohort, are free of symptoms and known disease. With the previously described approach but not our approach, therefore, the censored subjects should have different risk than uncensored subjects. The key assumption is that screened subjects are 'comparable' with other subjects who are also free of symptoms and known disease, conditional on known risk and prognostic factors. We also assume that all who die from disease are diagnosed prior to death. As with other techniques that deal with competing risks, special care and specific biologic assumptions may be required for valid interpretation.25,26

We claim that we can combine the transition rates to estimate P. This claim is supported by arguments given in the 'Methods' section, and was proven formally by Robins (see theorem and corollary 4.1 of Robins22 and theorem 1 of Robins27). The essence of the argument is that subjects in the hypothetical cohort can survive by experiencing one and only one of three (for $J = 1$) mutually exclusive and collectively exhaustive events: survive without symptoms and known disease to $S_1$, be screened positive and survive to $A_2$; survive without symptoms and known disease to $S_1$, be screened negative and survive to $A_2$; or develop symptoms before age $S_1$ and then survive to $A_2$. As shown in the 'Methods' section, we can calculate each of these probabilities from the transition rates and then combine them to estimate P.

METHODS

We assume that at most J screening exams are possible. Since transition rates may depend on age, screening and disease history, and other covariates, we denote, for each subject, these risk determinants by C = C(t), where t is current age. For example, C might be a 1 by M vector, $(x_1, \dots, x_M)$, with:

$x_1(t)$ = number of screens that occurred by age t
$x_2(t)$ = age at first screen, if age < t; 0 otherwise
$x_3(t)$ = 1 if first screen positive and age < t; 0 otherwise
...
$x_{2J}(t)$ = age at Jth screen, if age < t; 0 otherwise
$x_{2J+1}(t)$ = 1 if Jth screen positive and age < t; 0 otherwise
$x_{2J+2}(t)$ = 1 if symptoms or diagnosis occurred prior to t and all screens before symptoms negative; 0 otherwise
$x_{2J+3}(t)$ = age at diagnosis or symptoms, if age < t; 0 otherwise
$x_{2J+4}(t), \dots, x_M(t)$ encode other covariates.

To illustrate this coding scheme, suppose that J = 3 and that the only covariate is gender (encoded as 1 for males, 0 for females), so that M = 10. For a man screened negative at ages 30 and 35 and who developed symptoms at age 37, C(40) is (2, 30, 0, 35, 0, 0, 0, 1, 37, 1).

We define the following rates:

h(C, t) = hazard or incidence rate of symptoms and diagnosis at age t; this rate is defined, of course, only for subjects without symptoms and known disease, and is conditional on age, screening history and other risk determinants.

M(C, t) = mortality rate at age t, conditional on having developed symptoms at age $x_{2J+3}(t)$, or on having been screen detected at the age encoded by $x_2(t), \dots, x_{2J+1}(t)$; it is defined only for subjects with previously known disease or symptoms and is conditional on time since occurrence of symptoms or diagnosis, screening history, age and other risk determinants.

Each of the parameters, h(C, t) and M(C, t), depends only on events known or observable prior to t. In addition, we need the proportion of subjects, P(C, t), who will be detected at a screening exam at age t (true positives), a proportion that may depend on age, prior screening history and covariates as encoded in C(t). To facilitate implementation and estimation of transition rates, one can assume that rates are constant over calendar time (time homogeneous) and over specified intervals of age. We do not need to specify the sensitivity or specificity of the screening test since P(C, t) encodes the required information.

With these definitions, we derive an expression for P, the expected survival of a cohort with the specified screening regimen and a specified combination of baseline risk determinants.
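The covariate coding used for C(t) can be sketched in a few lines. This is only an illustration under stated assumptions: the function name and argument layout are hypothetical, screens are passed as (age, positive) pairs in chronological order, and "age at diagnosis or symptoms" is taken to be the age at symptom onset.

```python
def covariate_vector(t, J, screens, symptom_age=None, other=()):
    """Build C(t) = (x_1, ..., x_M) for a schedule of at most J screens.

    screens:     chronological list of (age, positive) pairs
    symptom_age: age at symptoms or diagnosis outside screening, or None
    other:       remaining covariates, appended unchanged
    """
    # Only events that occurred before age t are visible in C(t).
    done = [(age, positive) for age, positive in screens if age < t]
    x = [len(done)]                              # x_1: number of screens so far
    for j in range(J):
        if j < len(done):
            age, positive = done[j]
            x += [age, int(positive)]            # age and result of screen j + 1
        else:
            x += [0, 0]                          # screen not yet performed
    symptomatic = symptom_age is not None and symptom_age < t
    prior_all_negative = all(
        not positive for age, positive in done
        if symptom_age is None or age < symptom_age
    )
    x.append(1 if symptomatic and prior_all_negative else 0)   # x_{2J+2}
    x.append(symptom_age if symptomatic else 0)                # x_{2J+3}
    x.extend(other)                                            # x_{2J+4}, ..., x_M
    return tuple(x)


# The worked case from the text: J = 3, male (gender coded 1), negative
# screens at ages 30 and 35, symptoms at age 37, evaluated at age 40.
c40 = covariate_vector(40, 3, [(30, False), (35, False)], symptom_age=37, other=(1,))
```

Evaluated for the case described in the text, the sketch returns the same ten coordinates, (2, 30, 0, 35, 0, 0, 0, 1, 37, 1).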
An individual who survives must experience one of the following $2J + 1$ mutually exclusive and exhaustive sequences of events: the person may be diagnosed at one of the J screens and then survive to $A_2$ with disease (screen detected disease); the person may be screen negative at each of the first i screens ($i = 1, \dots, J$), develop symptoms or be diagnosed after the ith exam, and then survive with disease to $A_2$ (interval case); or the person may survive without symptoms or known disease to $A_2$. Since the $2J + 1$ events are mutually exclusive and exhaustive, the expected survival is the sum of the $2J + 1$ corresponding probabilities.

Now, to have disease detected at the ith screen and survive to $A_2$, the following sequence of events must occur: the person must survive without disease from the (j - 1)th to the jth screen and the jth screen must be negative (probability $\exp[-\int_{S_{j-1}}^{S_j} h(C_{j-1}, u)\,du]\,(1 - P(C_{j-1}, S_j))$), for $j = 1, \dots, i - 1$; the person must survive from the (i - 1)th to the ith screen, and the ith screen must be positive (probability $\exp[-\int_{S_{i-1}}^{S_i} h(C_{i-1}, u)\,du]\,P(C_{i-1}, S_i)$); and finally, the person must survive with disease until $A_2$ (probability $\exp[-\int_{S_i}^{A_2} M(C_i', u)\,du]$), where $C_j$ and $C_i'$ encode the prior screening history and covariates. Here $S_0 = A_1$, and $C_0$ denotes the history of a person with no prior screens. For example, the co-ordinates of $C_j$, for $j < i$, are: $x_1 = j$; $x_2 = S_1$; $x_3 = 0$; ...; $x_{2j} = S_j$; $x_{2j+1}$ to $x_{2J+3}$ are 0; and $x_{2J+4}$ to $x_M$ encode the other risk determinants. The co-ordinates of $C_i'$ are similar, except that $x_{2i+1} = 1$. For simplicity, we define the product over j to be 1 if i = 1.

Combining results, the probability, $\beta_i$, that disease is detected at the ith screen ($i = 1, \dots, J$) with subsequent survival to $A_2$ is:

$$\beta_i = \left\{ \prod_{j=1}^{i-1} \exp\left[-\int_{S_{j-1}}^{S_j} h(C_{j-1}, u)\,du\right] \bigl(1 - P(C_{j-1}, S_j)\bigr) \right\} \exp\left[-\int_{S_{i-1}}^{S_i} h(C_{i-1}, u)\,du\right] P(C_{i-1}, S_i)\, \exp\left[-\int_{S_i}^{A_2} M(C_i', u)\,du\right]. \qquad (1)$$

Similarly, the probability of being screened negative at the first i screens, becoming an interval case before the (i + 1)th screen ($i = 1, \dots, J$), and surviving to $A_2$ is:

$$\left\{ \prod_{j=1}^{i} \exp\left[-\int_{S_{j-1}}^{S_j} h(C_{j-1}, u)\,du\right] \bigl(1 - P(C_{j-1}, S_j)\bigr) \right\} \times \int_{S_i}^{S_{i+1}} h(C_i, t)\, \exp\left[-\int_{S_i}^{t} h(C_i, u)\,du\right] \exp\left[-\int_{t}^{A_2} M(C_i'', w)\,dw\right] dt, \qquad (2)$$

where $S_{J+1} = A_2$, and where the co-ordinates of $C_i''$, for $S_i < t < S_{i+1}$, are: $x_1 = i$; $x_2 = S_1$; $x_3 = 0$; ...; $x_{2i} = S_i$; $x_{2i+1}$ to $x_{2J+1}$ are 0; $x_{2J+2} = 1$; $x_{2J+3} = t$; and $x_{2J+4}, \dots, x_M$ encode the other risk determinants. The integration with respect to t is required in equation (2) because symptoms or diagnosis can occur at any time between $S_i$ and $S_{i+1}$.

Finally, the proportion expected to survive without symptoms and known disease (all screens negative) from $A_1$ to $A_2$, for a particular screening schedule, is:

$$\left\{ \prod_{j=1}^{J} \exp\left[-\int_{S_{j-1}}^{S_j} h(C_{j-1}, u)\,du\right] \bigl(1 - P(C_{j-1}, S_j)\bigr) \right\} \exp\left[-\int_{S_J}^{A_2} h(C_J, u)\,du\right]. \qquad (3)$$

We obtain the expected survival, P, for the particular screening schedule by summing over the $2J + 1$ events:

$$P = \sum_{i=1}^{J} \beta_i \;+\; \sum_{i=1}^{J} [\text{expression (2) for } i] \;+\; [\text{expression (3)}]. \qquad (4)$$

If we take $J = 1$ and use $P(C, S_1) = 0$, equation (4) gives the expected survival for a group not offered screening. Denoting the estimated risk for two screening schedules of interest by $Q = 1 - P$ and $Q' = 1 - P'$, respectively, we can calculate a risk ratio as $Q/Q'$. We interpret this risk ratio as the risk of death from the disease, assuming no competing risks and a particular screening history, divided by the corresponding risk for those with the specified alternative screening history.

Robins22 derived an equivalent expression and offered a convenient graphical approach for evaluating equation (4) called the G-computation algorithm. To estimate P, results of Robins show that it is actually sufficient to know only the time, if any, at which each subject became ineligible for screening (due to development of symptoms or diagnosis, see Reference 22, page 1497). In fact, Robins shows that if the raw data from the cohort study were available, no loss of efficiency would result if the investigator deleted data concerning whether ineligibility arose because disease was diagnosed at a screening exam or during the interval between exams. Our formulation, however, specifically uses information such as fatality rates of screen-detected cases and fatality rates of interval cases because it is readily available from published data.
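Equation (4) can be evaluated numerically once the rates are specified. The sketch below is one possible implementation under simplifying assumptions that are not part of the original formulation: the history C is reduced to i, the number of negative screens so far; integrals use a plain midpoint rule; and all function and argument names are hypothetical. It also includes a pathway for symptoms arising between $A_1$ and $S_1$, which vanishes when $S_1 = A_1$ and is an addition of this sketch rather than a term of equation (4).

```python
import math


def integrate(f, a, b, n=200):
    """Midpoint-rule integral of f over [a, b]; 0 for an empty interval."""
    if b <= a:
        return 0.0
    step = (b - a) / n
    return step * sum(f(a + (k + 0.5) * step) for k in range(n))


def expected_survival(a1, a2, screens, h, p_detect, m_screen, m_interval):
    """Expected survival P from age a1 to a2 with screens at the given ages.

    h(i, u)              hazard of symptoms/diagnosis at age u after i negative screens
    p_detect(i, s)       proportion detected by a screen at age s after i negative screens
    m_screen(i, u)       mortality at age u for a case detected at screen i
    m_interval(i, t, w)  mortality at age w for a case with symptom onset at age t
    """
    bounds = [a1] + list(screens) + [a2]     # S_0 = A_1, ..., S_{J+1} = A_2
    J = len(screens)
    symptom_free = 1.0    # probability of no symptoms and all screens so far negative
    total = 0.0
    for i in range(J + 1):
        lo, hi = bounds[i], bounds[i + 1]

        # Interval case: onset at age t in (S_i, S_{i+1}), then survival to A_2.
        def onset_then_survive(t, i=i, lo=lo):
            stay_free = math.exp(-integrate(lambda u: h(i, u), lo, t))
            survive = math.exp(-integrate(lambda w: m_interval(i, t, w), t, a2))
            return h(i, t) * stay_free * survive

        total += symptom_free * integrate(onset_then_survive, lo, hi)

        # Remain free of symptoms until the end of the interval.
        symptom_free *= math.exp(-integrate(lambda u: h(i, u), lo, hi))
        if i < J:
            s = bounds[i + 1]
            p = p_detect(i, s)
            # Detected at screen i + 1, then survival with disease to A_2.
            total += symptom_free * p * math.exp(
                -integrate(lambda u: m_screen(i + 1, u), s, a2))
            symptom_free *= 1.0 - p
    # Final term: never symptomatic and never screen detected.
    return total + symptom_free


def risk_ratio(p_a, p_b):
    """Q / Q' for two screening schedules with expected survival p_a and p_b."""
    return (1.0 - p_a) / (1.0 - p_b)
```

Calling `expected_survival` with a single screen and `p_detect` returning 0 corresponds to the unscreened comparison described above, and `risk_ratio` then gives $Q/Q'$ for any two schedules evaluated with the same rate functions.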


Table I. Incidence, interval cases, by interval since last exam*

Interval from last exam, j (years)    Incidence rate per 1000 women-years, $h_{1j}$
1                                     0.67
2                                     1.00
3                                     0.90
4                                     1.92
5†                                    1.77

* derived from data of Shapiro et al. (Reference 15, Table 6)
† assumed to be identical to that among unscreened women after 5 years

EXAMPLE: PUBLISHED RESULTS FROM THE HIP STUDY

We illustrate our approach with use of published results from the HIP study.15 We restrict our analyses to data that one could have obtained in an observational follow-up study. So that we can derive our results from published data, we use a model in which rates depend in a particularly simple way on prior screening history and other covariates. Specifically, we use the following rates:

P(C, t) = $P_1$ for initial screening exam
        = $P_2$ for first annual exam
        = $P_3$ for second annual exam
        = $P_4$ for third annual exam

h(C, t) = $h_0$ if i = 0, first 10 years of follow-up
        = $h_{1j}$ if $i \ge 1$, $0 < j \le 5$; j is the number of years since the previous exam
        = $h_0$ if $i \ge 1$, $5 < j$

M(C, t) = $M_{1k}$ if screen detected
        = $M_{2k}$ if an interval case (prior negative screen)
        = $M_{3k}$ if never screened; k is the number of years since diagnosis.

We assume that the interval between exams is 1 year. Each of the preceding rates is estimable in an observational study; published data that permit direct estimates are available from the HIP study. With use of data from Shapiro et al. (Reference 15, Table 5, right column), we estimate that $P_1 = 0.00273$, $P_2 = 0.00186$, $P_3 = 0.00115$, and $P_4 = 0.00175$ for women examined at the initial, first, second and third annual screening exams, respectively. Similarly, we estimate that $h_0$ is 0.00177/person-year (Reference 15, Tables 3 and 4). This estimate of $h_0$ is the weighted average incidence of clinical disease for the control group combined with the study group who refused screening, calculated from data of Shapiro et al., since in an observational study we could not distinguish between study and control groups (we would only know that some subjects were unscreened). We also derived the estimated incidence of clinical disease among those with a prior negative screen, $h_{1j}$ (Table I), from data of Shapiro (Reference 15, Table 6). Finally, we calculated observed mortality (case fatality) rates by disease and screening history ($M_{1k}$, $M_{2k}$ and $M_{3k}$), summarized in Table II, from the empirical survival results published by Walter and Stitt (Reference 14, Tables I and II).

Using these parameter estimates, we calculated that among a group of women, screened four times at yearly intervals, the 10 year risk of death from breast cancer is 0.00439 and that the


Table II. Observed mortality by disease and screening history*

                              Screening history
Years since      Screen-detected   Interval case   Refusers + controls
diagnosis (k)    ($M_{1k}$)        ($M_{2k}$)      ($M_{3k}$)
0                0.0230            0.0661          0.1087
1                0.0396            0.0976          0.1210
2                0.0000            0.1016          0.0934
3                0.0163            0.0632          0.0759
4                0.0508            0.0426          0.0481
5                0.0451            0.0815          0.0373
6                0.0096            0.0401          0.0701
7                0.0400            0.0401          0.0333
8                0.0104            0.0471          0.0400
9                0.0671            0.0438          0.0436

* derived from data of Walter and Stitt (Reference 14, Tables I and II)

corresponding risk in a group of unscreened women is 0.00570. In the HIP study, of the 30,289 women offered screening, about 67 per cent actually attended the initial screen and various combinations of subsequent screens. Thus, with use of a simple weighted average, we estimate that for women offered screening, assuming that 67 per cent accepted, the average cumulative mortality is about 0.00488, and that screening prevented about 26.5 breast cancer deaths among the 30,289 women offered screening in the HIP study. DISCUSSION The proposed method of analysis provides a straightforward means for analysis of cohort studies to estimate the benefits of different screening programmes. Our approach avoids the structural healthy screenee bias since we can control for potentially biasing events in the analysis. The approach avoids application of potentially biasing restriction criteria after inception of the subcohort used to estimate a particular transition probability. The approach is somewhat analogous to methods for control of confounding since, in each situation, the analysis is conditional on biasing events or factors. The approach is designed to assess the effect of screening on mortality, but the analysis does not account specifically for screening exams that occur after diagnosis. Instead we treat them analytically as part of medical ‘treatment and follow-up’. Thus, the approach is designed to assess the effects of screening for first, but not subsequent, occurrences of disease. The approach, however, easily extends to assessment of effects of screening for the second occurrence of disease by modelling occurrence of symptoms and diagnosis of second tumours, and by modelling mortality conditional on the more complicated disease and screening history. 
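The figures quoted for the HIP example can be retraced approximately. The sketch below rests on assumptions that the text does not state: incidence among unscreened women is constant at $h_0$, the Table II values for refusers and controls are treated as piecewise-constant yearly hazards (the assignment of that column is itself an inference from the extracted table), and the integral over onset time uses a midpoint rule. Function names are hypothetical.

```python
import math

H0 = 0.00177   # incidence of clinical disease among unscreened women, per person-year
# Mortality by years since diagnosis (k = 0..9) for refusers and controls,
# taken from Table II; treating these as yearly hazards is an assumption.
M3 = [0.1087, 0.1210, 0.0934, 0.0759, 0.0481,
      0.0373, 0.0701, 0.0333, 0.0400, 0.0436]


def case_survival(years, rates):
    """Survival `years` after diagnosis under piecewise-constant yearly rates."""
    whole = min(int(years), len(rates) - 1)
    cum_hazard = sum(rates[:whole]) + rates[whole] * (years - whole)
    return math.exp(-cum_hazard)


def unscreened_risk(horizon=10.0, steps=10000):
    """Risk of death from the disease within `horizon` years, never screened."""
    dt = horizon / steps
    risk = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt                        # time of symptom onset
        onset_density = H0 * math.exp(-H0 * t)
        risk += onset_density * (1.0 - case_survival(horizon - t, M3)) * dt
    return risk


def offered_screening(q_screened, q_unscreened, attendance, cohort_size):
    """Average risk among those offered screening, and deaths prevented."""
    q_offered = attendance * q_screened + (1.0 - attendance) * q_unscreened
    prevented = (q_unscreened - q_offered) * cohort_size
    return q_offered, prevented


q_unscreened = unscreened_risk()
q_offered, prevented = offered_screening(0.00439, 0.00570, 0.67, 30289)
```

Under these assumptions the unscreened risk lands close to the reported 0.00570, and with the rounded inputs quoted in the text (67 per cent attendance, risks of 0.00439 and 0.00570) the weighted average is about 0.0048 and the number of deaths prevented is between 26 and 27.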
Our approach avoids two other key biases that threaten validity of observational studies of screening effectiveness: lead time and length bias.1 Lead time, if not accounted for, may distort results if one compares survival of screen-detected cases with that of cases detected through other means (interval cases). Lead time corresponds to the amount of time that diagnosis is advanced among screen-detected cases. Since we usually estimate survival from the time of diagnosis, the early diagnosis of screen-detected cases would make their survival appear to be longer than that of other cases, even if earlier diagnosis did not affect ultimate survival. Length bias, like lead time,


may occur with direct comparisons of survival of screen-detected cases with that of other cases. It reflects a statistical phenomenon in which cases with a longer detectable preclinical phase (DPCP) tend to be over-represented among screen-detected cases.1 If cases with a long DPCP have a more slowly progressing disease, this phenomenon may make the survival of screen-detected cases artifactually appear greater than that of other cases. We avoid biases associated with lead time and length bias, in part, because we avoid direct comparisons of screen-detected with other cases.

To illustrate our approach with published data, we assumed in our example that transition rates depended on prior screening history in a very simplistic way. Our approach, however, does not require such restrictive assumptions. In practice, we would probably need some simplifying assumptions to limit the number of parameters, but the overly simplistic assumptions used in our example could and presumably should be relaxed. Although the analysis requires estimation of multiple rates, the necessary data are obtainable in an observational study; for the HIP study, published data permit an analysis based on simplifying assumptions.

The rates used to estimate benefits of screening with our approach depend on prior screening history. As the complexity of screening history increases, the investigator will, presumably, need to make simplifying assumptions concerning the way in which rates depend on screening history and other factors. Unfortunately, the approach may not be robust to model misspecification. We might reduce the problem by minimizing the number of simplifying assumptions or by assessing the impact of alternative simplifications. Alternatively, we might apply a technique that Robins developed to avoid this problem.28

The possibility of bias due to self-selection for screening is important to consider in observational studies of screening effectiveness.
In fact, some self-selection may have occurred in the HIP study since mortality rates of women who were offered screening but refused it were lower than those of women randomized to the control group which was not offered screening.15 If the differences are real, women who accepted screening may have been at higher risk than women who did not. If so, we may have underestimated the benefits of screening since we did not control for this bias in our analyses of data published from the HIP study. The proposed approach, however, allows the investigator to address this bias and, potentially, to control it in the analysis since 'comparability' must hold only conditional on recognized risk and prognostic factors which we can control in the analysis. Specifically, we can estimate each required parameter with use of multivariate analyses and with control of recognized risk and prognostic factors that may also relate to self-selection for screening. For example, we could use Poisson regression to derive estimates of each incidence and case fatality rate, conditional on important risk and prognostic factors, and we could apply logistic regression to develop similar estimates of the proportions expected to have screen-detectable disease. Then, we can combine model-predicted rates for specific combinations of risk determinants with use of equation (4) to estimate survival for each combination. We can calculate summary survival estimates, if desired, by standardizing or averaging survival estimates, perhaps with use of precision-based weights. Thus, we can attempt to control the potentially serious bias due to self-selection by adjusting for recognized risk and prognostic factors that might also associate with self-selection for screening. This approach, compared with others, may allow us to control the bias more easily because we can do so, at least partially, by our inclusion of relevant terms in the regression analysis.
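As a simpler stand-in for the regression models mentioned above, stratum-specific incidence rates can be tabulated directly from events and person-years, with subjects censored at their first screen as described in the Background section. The sketch below is illustrative only: the record layout and field names are hypothetical, and a real analysis would also need variance estimates and finer age intervals.

```python
from collections import defaultdict


def stratified_incidence(subjects, start_age, end_age):
    """Events, person-years and rate of symptom onset, by stratum.

    Each subject is a dict with hypothetical keys:
      'stratum'      label of the risk-factor stratum
      'screen_age'   age at first screen, or None (censoring time)
      'symptom_age'  age at symptoms or diagnosis outside screening, or None
      'exit_age'     age at loss to follow-up or other death, or None
    Follow-up runs from start_age to the earliest of these ages or end_age.
    """
    events = defaultdict(int)
    person_years = defaultdict(float)
    for s in subjects:
        stops = [end_age]
        for key in ("screen_age", "symptom_age", "exit_age"):
            if s.get(key) is not None:
                stops.append(s[key])
        stop = min(stops)
        if stop <= start_age:
            continue        # already screened, symptomatic or gone: not at risk
        person_years[s["stratum"]] += stop - start_age
        onset = s.get("symptom_age")
        if onset is not None and onset == stop:
            events[s["stratum"]] += 1
    return {
        stratum: (events[stratum], py, events[stratum] / py)
        for stratum, py in person_years.items()
    }


# Hypothetical records followed from age 50 to age 60.
cohort = [
    {"stratum": "low", "screen_age": 52, "symptom_age": None, "exit_age": None},
    {"stratum": "low", "screen_age": None, "symptom_age": 54, "exit_age": None},
    {"stratum": "low", "screen_age": None, "symptom_age": None, "exit_age": None},
    {"stratum": "high", "screen_age": None, "symptom_age": 53, "exit_age": None},
    {"stratum": "high", "screen_age": None, "symptom_age": None, "exit_age": 57},
]
rates = stratified_incidence(cohort, 50, 60)
```

The resulting stratum-specific rates could then be inserted into equation (4) separately for each stratum, and the survival estimates standardized or averaged as described above.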
Our estimate of benefit among those offered screening in the HIP study, 26.5 breast cancer deaths prevented, is similar to that of Walter and Stitt14 who estimated 25 deaths prevented. On the other hand, our estimate is lower than those of Aron and Prorok16 (about 40 deaths prevented) and that of Habbema et al.17 (also about 40 deaths prevented). Modest differences between the databases used may explain most of the difference between our estimate of benefit


and those of Aron and Prorok16 and of Habbema et al.17 Specifically, we estimated the incidence of breast cancer among unscreened women with use of data from the first five years of follow-up (these data were the only published data that allowed computation of incidence among those who refused screening). On the other hand, rates used by Aron and Prorok,16 based on data from the first 10 years of follow-up, were higher among the controls during the first five years than corresponding estimates published by Shapiro et al.,15 and increased with age. If we use $h_0 = 0.00183$ for the first five years (the incidence rate published by Shapiro et al.15 for the first five years among those who refused screening averaged with the corresponding rate among controls derived from Table 2 of Aron and Prorok16), and $h_0 = 0.00209$ for years 6 to 10 (the incidence among the control group for years 6-10 derived from Table 2 of Aron and Prorok), our estimate of benefit is 35.8 deaths prevented, much closer to the estimates of Habbema et al. and of Aron and Prorok. Part of the remaining difference might be accounted for by controlling for differences between those who accepted and those who refused screening.

Our approach differs importantly from the method proposed by Walter and Stitt for analysis of observational studies. First, our approach need not entail assumptions about lead time and length bias. Thus, our approach allows an empirical approach for assessment of screening effectiveness and can, therefore, complement and verify other approaches based on different assumptions. Second, implementation of our approach involves relatively few modelling assumptions. A key modelling assumption, at least as illustrated in our example, is uniform rates over age intervals. In large studies, one can minimize the importance of this assumption by using narrow intervals. Third, with use of multivariate analyses to estimate parameters, one can address bias due to self-selection for screening.
Finally, the approach does not preclude use of assumptions about lead time and length bias. For example, we could have used the model-predicted survival estimates derived by Walter and Stitt,14 rather than the empirical estimates we used in our example.

Eddy developed a sophisticated method to assess costs and benefits of screening programmes.24 In his approach, he considers events related to screening as a series of Markov processes in which the different states are related by transition rates. One might also formulate the analysis described here more formally in terms of Markov processes, with the various events corresponding to the state transitions of a Markov process, and the incidence rates corresponding to transition rates. Thus, the analysis proposed here and that of Eddy24 have fundamental similarities. Despite these similarities, our approach differs in other ways from that of Eddy. His approach was designed to estimate benefits and costs of screening for different screening programmes, perhaps with use of results from many different studies, whereas ours is designed primarily for analysis of observational studies. Moreover, his approach reflects important modelling assumptions such as the 'progression assumption', perhaps reasonable for many chronic diseases, whereas our approach can be empirical and allows use of different, relatively simple, assumptions such as constancy of rates over short intervals of age.

In summary, our approach provides a method for the analysis of observational studies of screening effectiveness. To apply the approach, the investigator estimates transition rates with use of data from observational or other studies. The transition rates depend on prior screening history, have simple interpretations as mortality rates and incidence rates, and can be estimated directly in observational studies.
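As a minimal sketch of what the constant-rate assumption implies, the Python fragment below estimates a rate as events per person-year and converts rates that are constant within intervals into a cumulative incidence. The two incidence rates (0.00183 for years 1-5 and 0.00209 for years 6-10) are the sensitivity-analysis values quoted earlier in this discussion. The function names and the counts passed to `rate` are hypothetical and serve only as illustration. The sketch ignores competing causes of death and is not the full estimator of deaths prevented.

```python
import math


def rate(events, person_years):
    """Crude rate: number of events divided by person-years at risk."""
    return events / person_years


def survival_piecewise_constant(intervals):
    """Probability of remaining event-free when the rate is constant
    within each interval. `intervals` is a list of
    (rate per person-year, interval length in years) pairs."""
    cumulative_hazard = sum(r * length for r, length in intervals)
    return math.exp(-cumulative_hazard)


# Hypothetical counts, NOT taken from the breast cancer screening study.
example_rate = rate(events=10, person_years=5000.0)

# Incidence rates quoted in the text for years 1-5 and years 6-10.
incidence = [(0.00183, 5.0), (0.00209, 5.0)]
disease_free = survival_piecewise_constant(incidence)
cumulative_incidence = 1.0 - disease_free  # roughly 0.0194 over 10 years

print(f"example rate: {example_rate:.4f} per person-year")
print(f"10-year cumulative incidence: {cumulative_incidence:.4f}")
```

Narrower intervals simply add more (rate, length) pairs to the list, which is how a large study could weaken its reliance on the constant-rate assumption.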
Using our results, the investigator can then combine these transition rates to estimate, compare and study the influence of screening on survival. Although our approach could incorporate modelling assumptions that concern lead time distributions or effects of length bias, its major advantage is that it permits an empirical analysis based primarily on the assumptions that rates are constant over specified age intervals and that those who elect screening are comparable to those who do not, conditional on measured risk and prognostic factors. Our approach, therefore, complements analyses based on different assumptions.

ACKNOWLEDGEMENT

We thank Dr. Jamie Robins for his helpful comments.

REFERENCES

1. Morrison, A. S. Screening in Chronic Disease, Oxford University Press, New York, 1985.
2. Dubin, N., Friedman, D. R., Toniolo, P. G. and Pasternack, B. S. ‘Breast cancer detection centers and case-control studies of the efficacy of screening’, Journal of Chronic Diseases, 40, 1041-1050 (1987).
3. Lynge, E. and Poll, P. ‘Incidence of cervical cancer following negative smear’, American Journal of Epidemiology, 124, 345-352 (1986).
4. Collette, H. J. A., Day, N. E., Rombach, J. J. and DeWaard, F. ‘Evaluation of screening for breast cancer in a non-randomised study (the DOM project) by means of a case-control study’, The Lancet, i, 1224-1226 (1984).
5. Verbeek, A. L. M., Hendriks, J. H. C. L., Holland, R., Mravunac, M., et al. ‘Reduction of breast cancer mortality through mass screening with modern mammography’, The Lancet, i, 1222-1224 (1984).
6. Johannesson, G., Geirsson, G. and Day, N. ‘The effect of mass screening in Iceland, 1965-74, on the incidence and mortality of cervical carcinoma’, International Journal of Cancer, 21, 418-425 (1978).
7. Foster, R. S. and Costanza, M. C. ‘Breast self-examination practices and breast cancer survival’, Cancer, 53, 999-1005 (1984).
8. Feldman, J. G., Carter, A. C., Nicastri, A. D. and Hosat, S. T. ‘Breast self-examination, relationship to stage of breast cancer at diagnosis’, Cancer, 47, 2740-2745 (1981).
9. Huguley, C. M. and Brown, R. L. ‘The value of breast self-examination’, Cancer, 47, 989-995 (1981).
10. Morrison, A. S. ‘Case definition in case-control studies of the efficacy of screening’, American Journal of Epidemiology, 115, 6-8 (1982).
11. Sasco, A. J., Day, N. E. and Walter, S. D. ‘Case-control studies for the evaluation of screening’, Journal of Chronic Diseases, 39, 399-405 (1986).
12. Frommer, D. J. ‘Case-control studies of screening (letter)’, Journal of Clinical Epidemiology, 41, 101 (1988).
13. Sasco, A. J. ‘Lead time and length bias in case-control studies for the evaluation of screening (response)’, Journal of Clinical Epidemiology, 41, 103-104 (1988).
14. Walter, S. D. and Stitt, L. W. ‘Evaluating the survival of cancer cases detected by screening’, Statistics in Medicine, 6, 885-900 (1987).
15. Shapiro, S., Goldberg, J. D. and Hutchinson, G. B. ‘Lead time in breast cancer detection and implications for periodicity of screening’, American Journal of Epidemiology, 100, 357-366 (1974).
16. Aron, J. L. and Prorok, P. C. ‘An analysis of the mortality effect in a breast cancer screening study’, International Journal of Epidemiology, 15, 36-43 (1986).
17. Habbema, J. D. F., Van Oortmarssen, G. J. and Van Putten, D. J. ‘An analysis of survival differences between clinically and screen-detected cancer patients’, Statistics in Medicine, 2, 279-285 (1983).
18. Dubin, N. ‘Benefits of screening for breast cancer: application of a probabilistic model to a breast cancer detection project’, Journal of Chronic Diseases, 32, 145-151 (1979).
19. Day, N. E. and Walter, S. D. ‘Screening for cancer of the breast and cervix - estimating the duration of the detectable preclinical phase’, in Moolgavkar, S. H. and Prentice, R. L. (eds.), Modern Statistical Methods in Chronic Disease Epidemiology, Wiley, New York, 1986, pp. 247-258.
20. Chen, J. S. and Prorok, P. C. ‘Lead time estimation in a controlled screening program’, American Journal of Epidemiology, 118, 740-751 (1983).
21. Brookmeyer, R., Day, N. E. and Moss, S. ‘Case-control studies for estimation of the natural history of preclinical disease from screening data’, Statistics in Medicine, 5, 127-138 (1986).
22. Robins, J. M. ‘A new approach to causal inference in mortality studies with a sustained exposure period - application to control of the healthy worker survivor effect’, Mathematical Modelling, 7, 1393-1512 (1986).
23. Robins, J. M. ‘A graphical approach to the identification and estimation of causal parameters in mortality studies with sustained exposure periods’, Journal of Chronic Diseases, 40, Suppl. 2, 139S-161S (1987).
24. Eddy, D. M. Screening for Cancer: Theory, Analysis, and Design, Prentice-Hall, Englewood Cliffs, 1980.
25. Kalbfleisch, J. D. and Prentice, R. L. The Statistical Analysis of Failure-time Data, Wiley, New York, 1980.
26. Slud, E. and Byar, D. ‘How dependent causes of death can make risk factors appear protective’, Biometrics, 44, 265-269 (1988).
27. Robins, J. M. ‘The control of confounding by intervening variables’, Statistics in Medicine, 8, 679-701 (1989).
28. Robins, J. M. ‘The analysis of randomized and non-randomized AIDS treatment trials using a new approach to causal inference in longitudinal studies’, in Sechrest, L., Freeman, H., Mulley, A. (eds.), Health Service Research Methodology: A Focus on AIDS, NCHSR, USPHS, 1989, pp. 113-159.
