STATISTICS IN MEDICINE, VOL. 11, 1759-1766 (1992)

CONTROL SAMPLE SIZE WHEN CASES ARE GIVEN IN CONSTANT RATIO STRATUM-MATCHED CASE-CONTROL STUDIES

JUN-MO NAM AND THOMAS R. FEARS
Biostatistics Branch, National Cancer Institute, 6130 Executive Blvd., EPN/403, Rockville, Maryland 20892, U.S.A.

SUMMARY

Strata-matched case-control studies based on a given number of cases and k times as many controls are common. We obtain a necessary and sufficient condition for there to exist a numerical solution for k with a desired level of power. We derive the maximum power of the Cochran test, which may be less than one, when the number of cases is fixed. We provide an approximate formula for the minimum number of cases required for a specified power. There exists a numerical solution (that is, convergence of an iterative method) for k if the number of cases given is greater than this minimum number, and no such solution otherwise. We also show that the incremental gain in relative efficiency of the test with respect to k diminishes as k gets large.

1. INTRODUCTION

The specification of sample sizes for the case and control groups is a complex issue in the design of epidemiologic case-control studies. Sample size for a specific power of Cochran's test has been investigated by Woolson et al.1 and Nam2 based on a model of several pairs of binomials. Nam and Fears3,4 have derived the optimum sample size allocation and its determination for a specified power in strata-matched case-control studies with cost consideration.

Motivation for this paper came from problems encountered in the design of an epidemiologic study of rare cancers. Case acquisition was difficult and continued over several years. The number of cases was limited to those ascertained. Comparable controls were easily obtainable at relatively low cost. A chronic disease such as cancer has a strong association with age, so we used stratification by age to remove its confounding effect. The design called for a strata-matched case-control study with k controls for each case. We note that we do not consider 1 to k individually matched case-control designs in this paper. With a desired power specified, we found that no matter how large the control group, we could not obtain that power.
The number of cases was too small.

We consider a frequency-matched case-control study where we classify given cases into several categories (strata) according to the level of a confounding variable and we sample k times as many strata-matched controls as cases from a relevant control pool. We derive the limiting bound for the power of the test when k approaches infinity in Section 2 and present a formula for the minimum number of cases required for a specified power of the test for a constant odds ratio in Section 3. Section 4 presents examples and Section 5 contains final remarks.

Researchers have been well aware that if there are too few cases, no increase in the number of controls can really generate a powerful study. In our paper, we make this idea precise by determining the minimum number of cases needed to achieve a given power for various values of a constant ratio of controls to cases, including the limiting one-sample situation in which controls are unlimited.

This paper was prepared under the auspices of the U.S. Government and is therefore not subject to copyright.

Received September 1991; Revised April 1992

2. ASYMPTOTIC POWER OF STRATIFIED TEST

Consider the design of a strata-matched case-control study with given cases and the use of k times as many controls as cases in each stratum. Denote the proportions of controls and cases exposed to the risk factor in the jth stratum by $p_{0j}$ and $p_{1j}$, sample sizes of controls and cases by $n_{0j}$ and $n_{1j}$, and observed exposure rates of controls and cases by $\hat{p}_{0j} = x_{0j}/n_{0j}$ and $\hat{p}_{1j} = x_{1j}/n_{1j}$ for j = 1, 2, ..., J. Also, define $q_{ij} = 1 - p_{ij}$ for i = 0, 1. Denote summation by dots, for example, $x_{.j} = x_{0j} + x_{1j}$ and $n_{.j} = n_{0j} + n_{1j}$. Define $\hat{p}_j = x_{.j}/n_{.j}$ and $\hat{q}_j = 1 - \hat{p}_j$ for j = 1, 2, ..., J. Assume a constant odds ratio between cases and controls over the J strata, $\psi = p_{1j}q_{0j}/(p_{0j}q_{1j})$ for every j. Cochran's statistic5 with a continuity correction for testing $H_0: \psi = 1$ against $H_1: \psi > 1$ is

$$z_c = (U - 1/2)/\{\widehat{\mathrm{var}}(U)_0\}^{1/2}$$

where $U = \sum w_j(\hat{p}_{1j} - \hat{p}_{0j})$, $\widehat{\mathrm{var}}(U)_0 = \sum w_j\hat{p}_j\hat{q}_j$ and $w_j = n_{1j}n_{0j}/n_{.j}$ for j = 1, 2, ..., J. A modification of the statistic is due to Mantel and Haenszel.6 All summations are henceforth from 1 to J. We can express the asymptotic power of Cochran's test (for example, Nam2) as

$$\Pr(z_c \ge z_{(1-\alpha)} \mid H_1) = 1 - \Phi(u) \qquad (1)$$

where

$$u = \{z_{(1-\alpha)}\,\mathrm{var}(U)_0^{1/2} - E(U) + 1/2\}/\{\mathrm{var}(U)\}^{1/2},$$

$$\mathrm{var}(U)_0 = \sum w_j p_j q_j, \qquad p_j = (n_{0j}p_{0j} + n_{1j}p_{1j})/n_{.j},$$

$$\mathrm{var}(U) = \sum w_j^2(p_{1j}q_{1j}/n_{1j} + p_{0j}q_{0j}/n_{0j}), \qquad E(U) = \sum w_j(p_{1j} - p_{0j}),$$

$z_{(1-\alpha)}$ is the $100 \times (1 - \alpha)$ percentile of the standard normal distribution and $\Phi$ denotes the cumulative standard normal. For large $n_{ij}$'s, (1) is the approximate power of the test. Since the design uses a constant ratio of controls to cases, $n_{0j} = kn_{1j}$ for every j, the weights and exposure rates under the null hypothesis become

$$w_j = n_{1j}/(1 + 1/k) \quad \text{and} \quad p_j = (p_{0j} + p_{1j}/k)/(1 + 1/k) \qquad (2)$$

for j = 1, 2, ..., J, so that we can express u in k without terms involving the $n_{0j}$'s. Since the partial derivative of u with respect to 1/k is always positive under the condition that the total number of cases is greater than a certain number, u is monotone increasing with respect to 1/k. Thus, u is a monotone decreasing function of k when the total number of cases is not very small (Nam and Fears4). Since $1 - \Phi(u)$ is inversely related to u, the approximate power of Cochran's test, (1), is a monotone increasing function of k.

As $k \to \infty$, the variance of U under the null and the alternative and the expectation of U approach $\mathrm{var}(U)_0 = \sum n_{1j}p_{0j}q_{0j}$, $\mathrm{var}(U) = \sum n_{1j}p_{1j}q_{1j}$ and $E(U) = \sum n_{1j}(p_{1j} - p_{0j})$, respectively. The approximate power of Cochran's test with a continuity correction for given cases converges to

$$\Pr(z_c \ge z_{(1-\alpha)} \mid H_1) = 1 - \Phi(u')$$

where

$$u' = \{z_{(1-\alpha)}(\sum n_{1j}p_{0j}q_{0j})^{1/2} - \sum n_{1j}(p_{1j} - p_{0j}) + 1/2\}/(\sum n_{1j}p_{1j}q_{1j})^{1/2}. \qquad (3)$$
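As an illustration only, the following Python sketch evaluates the approximate power (1) under the constant-ratio weights (2), together with the limiting power from (3). The function names (`case_rates`, `power_at_k`, `power_limit`) and the use of the standard-library `statistics.NormalDist` are illustrative assumptions and not part of the paper's method. Case exposure rates are obtained from the constant odds ratio through $p_{1j} = \psi p_{0j}/\{1 + (\psi - 1)p_{0j}\}$.

```python
from math import sqrt
from statistics import NormalDist

_NORMAL = NormalDist()  # standard normal distribution


def case_rates(p0, psi):
    """Case exposure rates implied by control rates p0 and a constant odds ratio psi."""
    return [psi * p / (1 + (psi - 1) * p) for p in p0]


def power_at_k(n1, p0, psi, k, alpha=0.05):
    """Approximate power (1) with n0j = k * n1j, using the weights and rates in (2)."""
    p1 = case_rates(p0, psi)
    z = _NORMAL.inv_cdf(1 - alpha)
    var0 = var1 = mean = 0.0
    for n, a, b in zip(n1, p0, p1):
        w = n * k / (k + 1)              # w_j = n1j / (1 + 1/k)
        pbar = (k * a + b) / (k + 1)     # p_j = (p0j + p1j/k) / (1 + 1/k)
        var0 += w * pbar * (1 - pbar)    # variance of U under the null
        var1 += w * w * (b * (1 - b) / n + a * (1 - a) / (k * n))
        mean += w * (b - a)              # expectation of U
    u = (z * sqrt(var0) - mean + 0.5) / sqrt(var1)
    return 1 - _NORMAL.cdf(u)


def power_limit(n1, p0, psi, alpha=0.05):
    """Limiting power 1 - Phi(u') from (3), reached as k tends to infinity."""
    p1 = case_rates(p0, psi)
    z = _NORMAL.inv_cdf(1 - alpha)
    var0 = sum(n * a * (1 - a) for n, a in zip(n1, p0))
    var1 = sum(n * b * (1 - b) for n, b in zip(n1, p1))
    mean = sum(n * (b - a) for n, a, b in zip(n1, p0, p1))
    return 1 - _NORMAL.cdf((z * sqrt(var0) - mean + 0.5) / sqrt(var1))


if __name__ == "__main__":
    # Two strata with 15 cases each, p01 = 0.05, p02 = 0.1, odds ratio 2
    print(power_limit([15, 15], [0.05, 0.10], 2.0))
```

With 15 cases in each of two strata, $p_{01} = 0.05$, $p_{02} = 0.1$ and $\psi = 2$, `power_limit` returns about 0.31, which agrees with the corresponding entry of Table I.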


Table I. The limit of the power of a one-sided stratified test with a continuity correction for detecting a two-fold increase in the odds ratio in one-to-k strata-matched case-control studies as k increases to infinity when cases are given (number of strata = 2 and level of significance = 0.05)

               n11 = n12 = N1/2                 n11 = N1/3 and n12 = 2N1/3
               p02 =                            p02 =
p01    N1      0.1   0.3   0.5   0.7   0.9      0.1   0.3   0.5   0.7   0.9

0.05   30      0.31  0.39  0.36  0.26  0.15     0.32  0.44  0.40  0.28  0.12
       60      0.50  0.64  0.62  0.50  0.30     0.52  0.70  0.69  0.55  0.27
       120     0.74  0.89  0.89  0.80  0.55     0.77  0.93  0.93  0.85  0.52

0.25   30      0.44  0.52  0.51  0.43  0.33     0.41  0.53  0.50  0.40  0.26
       60      0.69  0.80  0.79  0.72  0.59     0.66  0.80  0.80  0.70  0.49
       120     0.92  0.97  0.97  0.95  0.87     0.89  0.97  0.97  0.94  0.79

0.45   30      0.43  0.53  0.51  0.43  0.33     0.40  0.53  0.51  0.40  0.25
       60      0.71  0.81  0.81  0.75  0.61     0.66  0.81  0.81  0.72  0.50
       120     0.93  0.98  0.98  0.96  0.89     0.91  0.98  0.98  0.95  0.82

0.65   30      0.36  0.46  0.45  0.36  0.23     0.35  0.49  0.46  0.35  0.18
       60      0.63  0.76  0.76  0.67  0.50     0.61  0.78  0.78  0.66  0.41
       120     0.90  0.96  0.97  0.94  0.83     0.87  0.97  0.97  0.94  0.75

0.85   30      0.25  0.37  0.34  0.23  0.10     0.29  0.43  0.40  0.27  0.09
       60      0.47  0.65  0.64  0.51  0.27     0.51  0.71  0.70  0.56  0.24
       120     0.76  0.91  0.92  0.84  0.42     0.78  0.94  0.95  0.88  0.54

n1j: number of cases in the jth stratum. N1 = n11 + n12: total number of cases. p0j: exposure rate of controls in the jth stratum.

The power of the test is a monotone increasing function of k that converges to, and is therefore bounded above by, $1 - \Phi(u')$. Because $u'$ is finite, this bound is less than 1. Thus, when the number of cases is fixed, the power of the continuity corrected Cochran test is bounded away from one even if the total sample size of a case-control study tends to infinity. It is therefore possible to specify power requirements that are unachievable no matter how many controls one selects. This statement also holds for general strata-matched case-control studies, $n_{0j} = k_jn_{1j}$ for j = 1, 2, ..., J, when the minimum of the ratios of controls to cases, $\min_j(k_j)$, increases to infinity. Note that the result becomes essentially the theory of one-sample tests extended to several strata, that is, the control exposure rates approach the corresponding parameters. It is easy to show that as $k \to \infty$, Cochran's test is analogous to the test based on the standardized mortality ratio7 (SMR) where $\mathrm{SMR} = \sum x_{1j}/\sum n_{1j}p_{0j}$. Applying the definition by Stuart,8 we obtain the asymptotic efficiency of Cochran's test as

$$\{k/(1 + k)\}\sum n_{1j}p_{0j}q_{0j}$$

for given cases. When $k \to \infty$, the efficiency of the test approaches $\sum n_{1j}p_{0j}q_{0j}$. The asymptotic relative efficiency of the test with k to that with $k \to \infty$ is therefore $1 - (1 + k)^{-1}$, which increases monotonically with k. The increase in relative efficiency from k to k + 1 is $\{(k + 1)(k + 2)\}^{-1}$, so that the incremental gain diminishes as k increases.

We calculated by (3) power limits for a one-sided continuity corrected Cochran's test of $H_0: \psi = 1$ versus $H_1: \psi = 2$ for a wide range of control exposure proportions and numbers of cases. Table I provides a summary for J = 2. Power ranges from 9 per cent to 52 per cent for $N_1 = 30$, 24 per cent to 81 per cent for $N_1 = 60$ and 42 per cent to 98 per cent for $N_1 = 120$. One needs a large number of cases for good power when the exposure rate among controls is less than 10 per cent or greater than 90 per cent.
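The diminishing return in relative efficiency described in this section can be tabulated directly. The short Python sketch below uses exact fractions. The function names `relative_efficiency` and `incremental_gain` are illustrative assumptions, and k is taken to be a positive integer for simplicity.

```python
from fractions import Fraction


def relative_efficiency(k):
    """Efficiency with k controls per case relative to k -> infinity: 1 - 1/(1 + k)."""
    return 1 - Fraction(1, 1 + k)


def incremental_gain(k):
    """Increase in relative efficiency when moving from k to k + 1 controls per case."""
    return relative_efficiency(k + 1) - relative_efficiency(k)


if __name__ == "__main__":
    for k in range(1, 7):
        print(k, relative_efficiency(k), incremental_gain(k))
```

For k = 1, 2 and 4 the relative efficiency is 1/2, 2/3 and 4/5, and the step from k = 4 to k = 5 adds only 1/30.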

3. CASES REQUIRED FOR SOLUTION FOR k FOR SPECIFIC POWER

Since the asymptotic power of the continuity corrected Cochran test is monotonically increasing with k, we can find, in theory and by the inverse relationship, a value of k that corresponds with a given asymptotic power. We cannot, however, express such a k analytically, but we can obtain it by an iterative method, for example, Nam and Fears.4 We are interested in the necessary and sufficient condition under which a numerical solution for the inverse relationship exists. Define the proportion of all cases in the jth stratum by $t_j = n_{1j}/N_1$, where $N_1 = \sum n_{1j}$, for j = 1, 2, ..., J. From (1) and (2), for given matching ratio k, we can write the minimum number of cases required for a given power $= 1 - \beta$ of Cochran's test with a continuity correction as

$$N_1 = N_1'[1 + \{1 + 2/(aN_1')\}^{1/2}]^2/4 \qquad (4)$$

where

$$N_1' = [z_{(1-\alpha)}\{k/(k+1)\}^{1/2}\{\sum t_j(p_{0j} + p_{1j}/k)(q_{0j} + q_{1j}/k)\}^{1/2} + z_{(1-\beta)}\{\sum t_j(p_{1j}q_{1j} + p_{0j}q_{0j}/k)\}^{1/2}]^2/\{\sum t_j(p_{1j} - p_{0j})\}^2$$

and $a = \{k/(k+1)\}\sum t_j(p_{1j} - p_{0j})$ (see Appendix). $N_1'$ is the number of cases required for the power of the test without a continuity correction. As the matching ratio increases to infinity, the minimum number of cases for the power approaches

$$N_1 = N_1'[1 + \{1 + 2/(N_1'\sum t_j(p_{1j} - p_{0j}))\}^{1/2}]^2/4 \qquad (5)$$

where $N_1'$ now takes its limiting value

$$N_1' = \{z_{(1-\alpha)}(\sum t_jp_{0j}q_{0j})^{1/2} + z_{(1-\beta)}(\sum t_jp_{1j}q_{1j})^{1/2}\}^2/\{\sum t_j(p_{1j} - p_{0j})\}^2.$$

If the number of cases is smaller than that given by (5), then the power of the corrected test is less than $1 - \beta$, no matter how many controls one samples. Therefore the necessary and sufficient condition required for a numerical solution for k for asymptotic power $= 1 - \beta$ is that the number of cases is greater than that provided by (5). The form (4) is a special case of the general sample size formula for a strata-matched case-control study; see Nam.2

Table II provides the type of information a researcher requires to determine if the number of available cases is sufficient for a specific power. We calculated lower limits of $N_1$ for the existence of a value of k with power = 80 per cent and 90 per cent for $H_0: \psi = 1$ against $H_1: \psi = 2$ for various exposure rates among controls. Table II summarizes results for J = 2 and $t_1 = t_2 = 1/2$. The minimum value of $N_1$ is small for $p_{0j}$'s near 1/2 and large for $p_{0j}$'s away from 1/2. For example, at 80 per cent power, $N_1 = 219$ for $p_{01} = 0.05$ and $p_{02} = 0.90$, and $N_1 = 59$ for $p_{01} = 0.45$ and $p_{02} = 0.50$. The former is roughly four times greater than the latter.
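The minimum-case formulas of this section lend themselves to a short numerical sketch. The Python function below is an illustration under stated assumptions, not the authors' program: the name `min_cases`, its argument order and the use of `statistics.NormalDist` are hypothetical choices. The coefficients a and b follow the Appendix, and passing `k=inf` gives the limiting form (5).

```python
from math import ceil, inf, sqrt
from statistics import NormalDist


def min_cases(t, p0, psi, power, k=inf, alpha=0.05):
    """Minimum total number of cases from (4); k = inf gives the limiting form (5).

    t     : proportions of all cases in each stratum (summing to 1)
    p0    : control exposure rates by stratum
    psi   : odds ratio under the alternative
    power : required power, 1 - beta
    """
    z_a = NormalDist().inv_cdf(1 - alpha)
    z_b = NormalDist().inv_cdf(power)
    p1 = [psi * p / (1 + (psi - 1) * p) for p in p0]   # case exposure rates
    r = 0.0 if k == inf else 1.0 / k                   # r = 1/k
    c = 1.0 / (1.0 + r)                                # c = k/(k + 1)
    s0 = sum(tj * (a + b * r) * ((1 - a) + (1 - b) * r) for tj, a, b in zip(t, p0, p1))
    s1 = sum(tj * (b * (1 - b) + a * (1 - a) * r) for tj, a, b in zip(t, p0, p1))
    d = sum(tj * (b - a) for tj, a, b in zip(t, p0, p1))
    a_coef = c * d                                     # coefficient a of the Appendix
    # Number of cases without the continuity correction, (b/a)^2
    n_prime = ((z_a * sqrt(c * s0) + z_b * sqrt(s1)) / d) ** 2
    return n_prime * (1 + sqrt(1 + 2 / (a_coef * n_prime))) ** 2 / 4


if __name__ == "__main__":
    # Setting of Example 1: four age strata, odds ratio 2, 90 per cent power
    t = [0.10, 0.40, 0.35, 0.15]
    p0 = [0.75, 0.70, 0.65, 0.60]
    print(ceil(min_cases(t, p0, 2.0, 0.90)))        # limiting case, (5)
    print(ceil(min_cases(t, p0, 2.0, 0.90, k=4)))   # k = 4, (4)
```

Rounding up, this sketch gives 103 cases in the limit and 129 cases for k = 4 in the setting of Example 1, and 59 and 219 cases for the two Table II settings quoted above.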


Table II. The minimum total number of cases required for power = 80 per cent and 90 per cent of a one-sided stratified test of ψ = 1 against ψ = 2 when the number of strata is 2, α = 0.05 and t1 = t2 = 1/2 (upper and lower entries are those for 80 per cent and 90 per cent power, respectively)

        Required number of total cases = N1
        p02
p01     0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9

0.05    144   104   92    89    93    102   120   152   219
        204   145   126   121   125   137   160   203   295

0.15    96    76    69    68    70    76    85    100   124
        134   106   96    93    95    102   114   134   169

0.25    81    62    61    60    62    66    73    83    100
        112   92    84    82    83    88    97    112   135

0.35    76    63    58    57    59    62    68    78    92
        104   87    79    77    79    83    91    104   123

0.45    77    64    59    57    59    62    69    78    92
        104   86    79    77    79    83    91    103   123

0.55    81    67    61    60    61    65    72    83    99
        110   90    82    80    82    86    95    108   130

0.65    91    73    66    65    66    71    79    92    112
        122   98    89    86    88    93    103   120   143

0.75    106   83    74    72    74    80    90    107   137
        172   111   99    96    98    105   118   139   177

0.85    133   98    86    84    87    94    109   135   185
        179   132   116   111   114   124   142   174   237

4. NUMERICAL EXAMPLES

4.1. Example 1

In a balanced case-control study for testing a possible association of colon cancer and chlorinated drinking water among males in Iowa, proportions of all cases for ages 25-54, 55-69, 70-79 and 80-84 were 10, 40, 35 and 15 per cent, respectively, and the exposure rates for controls by age were $p_{01} = 0.75$, $p_{02} = 0.70$, $p_{03} = 0.65$ and $p_{04} = 0.60$. We consider a 1 to k strata-matched case-control study with 90 per cent power for a one-sided stratified test to detect a two-fold increase in the odds ratio. For a constant ratio matched design, we need $N_1 \ge 103$ by (5). If we have, say, 14, 56, 49 and 21 cases in the four age groups, so that $N_1 = 140$, an iterative procedure (Nam and Fears,4 Sections 4.2 and 4.3) provides k = 2.1. There is, however, no solution for k if we have 10, 40, 35 and 15 cases since the total number of cases is 100, less than the 103 required. We must lower the required power for the study or accumulate more cases to meet the requirement. If constraints on funds and time make k = 4 the affordable upper bound for k, then the minimum number of cases needed for the power is, by (4), $N_1 = 129$. For the same 90 per cent power, we need a total control sample size of only 294 (2.1 × 140) for $N_1 = 140$ and a much larger control size of 516 (4 × 129) for $N_1 = 129$. The change in control size is surprisingly large and shows that the efficiency of a design decreases rapidly as the disparity between case and control proportions increases. It also demonstrates the diminishing return in the gain of efficiency as control size increases.

4.2. Example 2

Nasopharyngeal carcinoma (NPC) is a rare cancer among Whites in the U.S.A.: annual age-sex adjusted incidence and mortality rates in 1983-87 were 0.4 and 0.2 per 100,000, respectively (Ries et al.9). To investigate the role of smoking as an NPC risk factor, Nam et al.10 conducted a case-control study based on the National Mortality Followback Survey data. Their study employed NPC deaths as cases and non-NPC deaths, excluding those deaths related to tobacco, as the pool of potential controls. Male cases by ages 25-29, 30-39, 40-49, 50-59, 60-69 and 70+ were $n_{11} = 2$, $n_{12} = 10$, $n_{13} = 26$, $n_{14} = 58$, $n_{15} = 43$ and $n_{16} = 2$, with the total $N_1 = 141$. The study plan was to sample randomly k controls per case for the desired level of power. We wish to know the minimum number of cases required to provide 80 per cent power for a one-sided stratified test at the 0.05 level to detect a two-fold increase in the odds ratio. For males, the control exposure rates to smoking by age categories were 0.67, 0.70, 0.77, 0.78, 0.79 and 0.66, respectively. Using the alternative odds ratio of two, we can specify exposure rates of cases as $p_{11} = 0.802$, $p_{12} = 0.824$, $p_{13} = 0.870$, $p_{14} = 0.876$, $p_{15} = 0.883$ and $p_{16} = 0.795$. Recalling $t_j = n_{1j}/N_1$ for every j, we have $\sum t_jp_{0j}q_{0j} = (0.4182)^2$, $\sum t_jp_{1j}q_{1j} = (0.3343)^2$ and $\sum t_j(p_{1j} - p_{0j}) = 0.0988$. Using (5), we obtain the minimum number of cases required for power = 80 per cent for the test with a continuity correction as $N_1 = 107$. Since the number of cases exceeds 107, we proceed to a solution for k using the method of Nam and Fears,4 Sections 4.2 and 4.3. A trial value, k = 2, yields power = 76 per cent. A few iterations lead to k = 2.8. Therefore, the control sample sizes for 80 per cent power of the test are $n_{01} = 6$, $n_{02} = 28$, $n_{03} = 73$, $n_{04} = 163$, $n_{05} = 121$ and $n_{06} = 6$. These control sizes, together with the given cases, yield u = -0.841 from (1), and we verify the 80 per cent power of the test for the design.

One might have concern with the small size (2 cases and 6 controls) of the youngest and oldest age strata in a relation based on the asymptotic power of Cochran's stratified test, which assumes large strata sizes. The contribution of these sparse strata to the power, however, is minimal (compared with that from the other four strata) and any adverse effect on approximations is negligible in terms of power. The actual power for this design, estimated with use of a Monte Carlo experiment with 10,000 simulations, is 80.9 per cent, indeed close to the nominal 80 per cent. The asymptotic formula worked reasonably well even with some small strata.
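The arithmetic of Example 2 can be retraced with a short Python sketch, offered as an illustration under stated assumptions rather than as the authors' program. Step 1 applies (5) to the three sums reported in the text, step 2 converts u = -0.841 into a power, and step 3 is a small Monte Carlo experiment in the spirit of the one described above. The helper names, the seed, the 2,000 replicates (fewer than the 10,000 of the paper, to keep the run short) and the exact case rates computed from an odds ratio of two are all assumptions of this sketch.

```python
import random
from math import ceil, sqrt
from statistics import NormalDist

norm = NormalDist()
z_a, z_b = norm.inv_cdf(0.95), norm.inv_cdf(0.80)

# Step 1: minimum number of cases from (5), using the sums reported in the text
s0, s1, d = 0.4182, 0.3343, 0.0988
n_uncorrected = ((z_a * s0 + z_b * s1) / d) ** 2
n_min = n_uncorrected * (1 + sqrt(1 + 2 / (d * n_uncorrected))) ** 2 / 4

# Step 2: asymptotic power implied by u = -0.841 in (1)
power_from_u = 1 - norm.cdf(-0.841)

# Step 3: Monte Carlo check of the final design (assumed settings, see lead-in)
n1 = [2, 10, 26, 58, 43, 2]               # cases by age stratum
n0 = [6, 28, 73, 163, 121, 6]             # controls by age stratum
p0 = [0.67, 0.70, 0.77, 0.78, 0.79, 0.66]
p1 = [2 * p / (1 + p) for p in p0]        # case rates under an odds ratio of 2


def binomial(rng, n, p):
    """Draw one binomial(n, p) count as a sum of Bernoulli trials."""
    return sum(rng.random() < p for _ in range(n))


def cochran_z(x1, x0):
    """Continuity corrected one-sided Cochran statistic of Section 2."""
    u = var0 = 0.0
    for a, b, m1, m0 in zip(x1, x0, n1, n0):
        w = m1 * m0 / (m1 + m0)
        pooled = (a + b) / (m1 + m0)
        u += w * (a / m1 - b / m0)
        var0 += w * pooled * (1 - pooled)
    if var0 <= 0:
        return float("-inf")              # degenerate tables never reject
    return (u - 0.5) / sqrt(var0)


def simulate(reps=2000, seed=1):
    """Proportion of simulated studies in which the test rejects at the 0.05 level."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x1 = [binomial(rng, m, p) for m, p in zip(n1, p1)]
        x0 = [binomial(rng, m, p) for m, p in zip(n0, p0)]
        hits += cochran_z(x1, x0) >= z_a
    return hits / reps


if __name__ == "__main__":
    print(ceil(n_min), round(power_from_u, 3), simulate())
```

Step 1 rounds up to 107 cases and step 2 gives a power of about 0.80, matching the text. The simulated power depends on the seed and the number of replicates, so it should only be expected to fall near the nominal 80 per cent.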

5. DISCUSSION

Nam and Fears4 have presented a numerical method for determining the sample size for a specific power of the Cochran test in a strata-matched case-control study with a given number of cases. They warned that iterated values by their method may not converge if cases are sparse. In this paper, we provide an explicit formula for the minimum number of cases required for the existence of a numerical solution.

Asymptotic power for the stratified test increases with k in a one to k strata-matched design. The incremental gain, however, diminishes as k increases. For a single stratum, Gail et al.11 have suggested that values of k beyond 4 are hardly worthwhile when the alternative is near the null hypothesis. Breslow et al.12 have noted that one might need a value of k more than 4 for a precise estimate of relative risk associated with rare exposures. Others, for example, Miettinen,13 Ury14 and Taylor,15 have also investigated the efficiency of individually matched case-control studies with multiple controls per case and have reported a diminishing return in the gain of efficiency as the number of controls per case increases.

Section 3 provides the minimum number of cases needed for a numerical solution of k for a specified power. If the observed number of cases is more than the required minimum, we can find a value of k by an iterative method (Nam and Fears4). We should then decide whether or not the value is appropriate before proceeding to the sampling of controls. Another strategy is to find a numerical solution of k subject to a practical upper bound for k, for example, k = 4 or 5.

On a related problem, sample size for the power of the Mantel-Haenszel test based on hypergeometric distributions has been studied by Muñoz and Rosner16 as well as Wittes and Wallenstein.17 Their sample size methods apply when all marginals are fixed. In planning a stratified case-control study, the case-control marginals are fixed but the exposure marginals are not. Sample size formulations for the power of the Cochran test based on pairs of binomials, that is, Woolson et al.,1 Nam2 and Nam and Fears,4 are therefore more appropriate in a case-control study.

Finally, in this paper, we assumed that the numbers of given cases and of controls sampled for each stratum are not small and the number of strata is not large. Our formula appeared to be satisfactory as long as most strata are large, even when some strata are sparse. We do not consider a refined matching where the number of strata is very large and all samples within a stratum are sparse.

APPENDIX: DERIVATION OF FORM (4)

From (1) and (2), we can express the variances and expectation of U under $n_{0j} = kn_{1j}$ for every j as

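$$\mathrm{var}(U)_0 = N_1\{k/(k+1)\}^3\sum t_j(p_{0j} + p_{1j}/k)(q_{0j} + q_{1j}/k),$$

$$\mathrm{var}(U) = N_1\{k/(k+1)\}^2\sum t_j(p_{1j}q_{1j} + p_{0j}q_{0j}/k),$$

$$E(U) = N_1\{k/(k+1)\}\sum t_j(p_{1j} - p_{0j}).$$

Setting the power in (1) equal to $1 - \beta$, that is, $u = -z_{(1-\beta)}$, gives

$$aN_1 - bN_1^{1/2} - 1/2 = 0 \qquad (6)$$

where

$$a = \{k/(k+1)\}\{\sum t_j(p_{1j} - p_{0j})\},$$

$$b = z_{(1-\alpha)}\{k/(k+1)\}^{3/2}\{\sum t_j(p_{0j} + p_{1j}/k)(q_{0j} + q_{1j}/k)\}^{1/2} + z_{(1-\beta)}\{k/(k+1)\}\{\sum t_j(p_{1j}q_{1j} + p_{0j}q_{0j}/k)\}^{1/2}.$$

A solution of the equation (6) is

$$N_1^{1/2} = \{b + (b^2 + 2a)^{1/2}\}/(2a). \qquad (7)$$

When there is no continuity correction, that is, $aN_1' - b(N_1')^{1/2} = 0$ from (6),

$$(N_1')^{1/2} = b/a. \qquad (8)$$

The square of (7) with the relation of (8) leads to

$$N_1 = N_1'[1 + \{1 + 2/(aN_1')\}^{1/2}]^2/4$$

where a, b and $N_1'$ are defined in (6) and (8).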
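The algebra linking (6), (7), (8) and form (4) can be checked numerically. The Python sketch below is illustrative only: the function names and the values chosen for a and b are arbitrary assumptions, not quantities taken from the examples.

```python
from math import isclose, sqrt


def root_with_correction(a, b):
    """Square root of N1 solving a*N1 - b*sqrt(N1) - 1/2 = 0, as in (7)."""
    return (b + sqrt(b * b + 2 * a)) / (2 * a)


def n1_from_form_4(a, b):
    """N1 written in terms of the uncorrected size N1' = (b/a)^2, as in form (4)."""
    n_prime = (b / a) ** 2                      # relation (8)
    return n_prime * (1 + sqrt(1 + 2 / (a * n_prime))) ** 2 / 4


if __name__ == "__main__":
    a, b = 0.10, 1.10                           # arbitrary illustrative coefficients
    x = root_with_correction(a, b)
    print(a * x * x - b * x - 0.5)              # residual of (6), zero up to rounding
    print(x * x, n1_from_form_4(a, b))          # the two expressions for N1 agree
```

The agreement follows because $2/(aN_1') = 2a/b^2$, so the bracket in form (4) equals $\{b + (b^2 + 2a)^{1/2}\}/b$, and multiplying its square by $(b/a)^2/4$ reproduces the square of (7).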

ACKNOWLEDGEMENTS

The authors wish to thank the referees for their helpful comments and Mrs. Jennifer Donaldson for typing the manuscript.

REFERENCES

1. Woolson, R. F., Bean, J. A. and Rojas, P. B. ‘Sample size for case-control studies using Cochran’s statistics’, Biometrics, 42, 927-932 (1986).
2. Nam, J. ‘Sample size determination for case-control studies and the comparison of stratified and unstratified analyses’, Biometrics, 48, 389-395 (1992).
3. Nam, J. and Fears, T. R. ‘Optimum allocation of samples in strata-matching case-control studies when cost per sample differs from stratum to stratum’, Statistics in Medicine, 9, 1475-1483 (1990).
4. Nam, J. and Fears, T. R. ‘Optimum sample size determination in stratified case-control studies with cost considerations’, Statistics in Medicine, 11, 547-556 (1992).
5. Cochran, W. G. ‘Some methods for strengthening the common χ² tests’, Biometrics, 10, 417-451 (1954).
6. Mantel, N. and Haenszel, W. ‘Statistical aspects of the analysis of data from retrospective studies of diseases’, Journal of the National Cancer Institute, 22, 719-748 (1959).
7. Registrar General of England and Wales. Decennial Supplement. Occupational Mortality, Part II, Vol. 2, Tables, H. M. Stationery Office, London, 1958.
8. Stuart, A. ‘Asymptotic relative efficiencies of distribution-free tests of randomness against normal alternative’, Journal of the American Statistical Association, 49, 147-157 (1954).
9. Ries, L. A., Hankey, B. F. and Edwards, B. K. Cancer Statistics Review 1983-87, National Cancer Institute, NIH Pub. No. 90-2789, Bethesda, Maryland, 1990.
10. Nam, J., McLaughlin, J. K. and Blot, W. J. ‘Cigarette smoking, alcohol and nasopharyngeal carcinoma: A case-control study among U.S. whites’, Journal of the National Cancer Institute, 84, 619-622 (1992).
11. Gail, M., Williams, R., Byar, D. P. and Brown, C. ‘How many controls?’, Journal of Chronic Diseases, 29, 723-731 (1976).
12. Breslow, N. E., Lubin, J. H., Marek, P. and Langholz, B. ‘Multiplicative models and cohort analysis’, Journal of the American Statistical Association, 78, 1-12 (1983).
13. Miettinen, O. S. ‘Individual matching with multiple controls in the case of all-or-none responses’, Biometrics, 25, 339-355 (1969).
14. Ury, H. K. ‘Efficiency of case-control studies with multiple controls per case: continuous or dichotomous data’, Biometrics, 31, 643-649 (1975).
15. Taylor, J. M. G. ‘Choosing the number of controls in a matched case-control study, some sample size, power and efficiency considerations’, Statistics in Medicine, 5, 29-36 (1986).
16. Muñoz, A. and Rosner, B. ‘Power and sample size for a collection of 2 x 2 tables’, Biometrics, 40, 995-1004 (1984).
17. Wittes, J. and Wallenstein, S. ‘The power of the Mantel-Haenszel test’, Journal of the American Statistical Association, 82, 1104-1109 (1987).
