Oral Epidemiology

Community Dent. Oral Epidemiol. 1977: 6; 30-35 (Key words: dental caries, clinical trials)

Adequate cohort sizes for caries clinical trials

ALBERT KINGMAN
Biometry Section, National Caries Program, National Institute of Dental Research, National Institutes of Health, Bethesda, MD, U.S.A.

ABSTRACT - Methods of determining cohort sizes were examined for their appropriateness in multi-group caries clinical trials. The appropriate method depends on the type of trial being planned. It is shown that multiple comparison methods using certain Bonferroni-type t-statistics ought to be used in trials in which different levels or frequencies of application of known caries inhibitors are being tested. It is also demonstrated that using the F-test procedure to determine cohort sizes can result in unacceptably low sensitivity levels for the comparisons of primary concern. Tables are presented which can be used to determine the group sizes needed to achieve specified sensitivity levels in two-group and multi-group trials. (Accepted for publication 25 September 1977)

The determination of an adequate number of subjects to include in a controlled clinical trial has been addressed by investigators in various clinical settings and appears periodically in the dental and medical literature. Many of these reports include tables giving the cohort sizes needed in clinical trials for specific significance levels, power or sensitivity levels, magnitudes of meaningful differences, and expected variation among responses. It is generally agreed that, whichever cohort size determination method is used for caries clinical trials, one must consider (1) the minimum between-group difference the investigator considers worth detecting, and (2) the specificity (1 − α) and sensitivity (1 − β) levels at which this difference must be detected.

MARTHALER (9) discussed the case of two study groups. His tables present cohort size as a function of the coefficient of variation and the anticipated reduction in caries increments, assuming either constant or proportional variances within groups, but only for the 50% power or sensitivity level (1 − β). As MARTHALER (9) noted, these calculated cohort sizes should serve only as minimum figures, because using them means that a specified between-group difference in caries incidence will be correctly detected only 50% of the time.

MCCLENDON, DRISCOLL, ABRAMS & BARBANO (10) discussed the multi-group situation and obtained approximate cohort sizes for different power levels based on the analysis of variance F-test procedure, which tests the hypothesis that all group mean increments are equal. Whether this procedure is appropriate for cohort size determination depends upon the nature of the trial. When previously untested agents are being compared, or when little prior information is available about expected efficacy levels, one may be forced to use this approach. If this is done, however, it should be clear that the power achieved refers to the conclusion that the groups behave differently, and does not hold for specific comparisons. In trials where known cariostatic agents are being tested at new dosage levels, or where comparing the efficacy of two or more active agents is the intent of the study, cohort size determinations based on the F-procedure are inappropriate. Further, they can severely underestimate the required sizes, and the degree of underestimation depends strongly on the magnitudes of the anticipated differences between the treatments being compared.

The purposes of this paper are (1) to present a method for determining cohort sizes for multi-group caries clinical trials, specifically trials in which the relative effectiveness of the treatments is of primary concern, in terms of parameters familiar to investigators in caries research, and (2) to illustrate the lack of power or sensitivity that will exist in such trials when the cohort sizes are estimated by an inappropriate procedure.

MATERIAL AND METHODS

In the typical prospective study testing the efficacy of a potential caries inhibitor, subjects are randomly assigned to the treatment and control groups and examined at specified time points; their caries increments (DMFT, DMFS, or both) are computed, and the group means are compared by a t-test. Typically the specificity of the test is set at the 95% level, that is, α = 0.05, ensuring that the investigator will falsely conclude that a non-effective treatment is effective no more than 5% of the time. It will be assumed throughout this paper that α = 0.05, although analogous results hold for other values of α. However, the choice of α does not protect the investigator from failing to identify an effective caries inhibitor. This type of error is referred to as the β error, or type 2 error, and it can be of considerable magnitude. The power or sensitivity level of the test is then defined as 1 − β.

In the past this type of error has not caused a problem in two-group trials, because cohort sizes in these trials were usually large enough to detect caries reductions of 25% or more consistently. However, the β error becomes an important consideration in a two-group trial when testing an agent whose anticipated effectiveness is relatively low, say less than 20%. For example, suppose a 15% reduction in caries increment were expected for a treatment group in a 3-year study in which the coefficients of variation could be expected to be 1. If there were 300 subjects per group at the completion of the study, a 15% reduction in caries incidence would be correctly detected only about 50% of the time (see Table 1). The problem is not with the t-test, but with the lack of power inherent in the test procedure when small numbers of subjects are used. As a result, it will be difficult to identify effective cariostatic agents correctly when there are relatively few participants in the trial and small reductions are expected.

Table 1. Number of subjects needed per group to detect a specified percent reduction in caries increment (R) between a treatment group and a control group for α = 0.05 and selected power or sensitivity levels. Group sizes were calculated assuming proportional variances, that is, coefficients of variation in caries increments equal to 1.0.

               Power or sensitivity (1 − β), %
R (%)      50      60      70      80      90      95
  10      695     893   1,124   1,429   1,913   2,368
  15      295     375     472     599     801     990
  20      160     203     255     323     432     534
  25       99     125     157     198     265     327
  30       66      84     105     132     176     218
  35       48      60      74      94     124     153
  40       36      44      55      69      92     114
  45       28      34      42      53      70      86
  50       22      27      33      42      55      68

Another important factor to consider when planning a caries trial is the precise definition of a meaningful caries reduction. A minimum value for this reduction should be identified and agreed upon by those planning the trial. Once this has been done, the investigator is able to select a manageable sensitivity level for the trial. Also, where cohort sizes are fixed by other considerations, it enables him to compute the sensitivity level that can be achieved for the specified reduction.

Fig. 1, A-C. Differences in sensitivity levels that can be achieved for a fixed value of Δ as a function of group size. Panels A, B and C correspond to sensitivity levels of .60, .90 and .40; the horizontal axis is the value of the t-statistic, with the critical value $t_{\alpha/2}$ marked. In each panel one shaded area represents the power or sensitivity level and the other the type 1 error, or significance level, of the t-test.

The methods used here follow the fundamental principles of statistical hypothesis testing. We will use the two-group trial to review the procedure. To simplify the argument, we assume both groups have the same number of subjects (n). If $\mu_c$ and $\mu_t$ represent the true mean caries increments that would occur in the population under the control and treatment conditions respectively, then Δ can be defined by

$$\Delta = \mu_c - \mu_t \qquad (1)$$

with its corresponding proportionate treatment effect

$$R_\Delta = \frac{\Delta}{\mu_c}. \qquad (2)$$

The associated percent reduction is then $100R_\Delta$. The probability of correctly detecting this Δ, or $R_\Delta$, is called the power of the test. A suggested rule of thumb is to set β = 4α; with α = 0.05 this gives β = 0.20, so the power or sensitivity level, 1 − β, would be 80% for most trials. This would be consistent with the two-group trials of cariostatic agents reported in the literature, most of which had sensitivity levels ranging from 70% to 99%.

Fig. 1 illustrates the difference in sensitivity levels that can be achieved for a fixed value of Δ as a function of group size. The curve labeled $P_0$ represents the situation in which there is no treatment effect, and the curve labeled $P_\Delta$ the one in which a difference in group means of size Δ is experienced. Both are t-type distributions having 2(n − 1) degrees of freedom. Any value of the t-statistic larger than $t_{\alpha/2}$ will lead to the conclusion that the corresponding reduction was significant. In Fig. 1 the shaded area under the curve labeled $P_\Delta$ to the right of the point $t_{\alpha/2}$ represents the power of the test, that is, the percentage of trials in which we are able to conclude correctly that a significant caries reduction has occurred in the treated group. The shape of the curves illustrated in A can be altered by varying the group size: increasing the group size compresses the curves (see B), whereas decreasing it expands them (see C). Thus the sensitivity of the study can be regulated by altering the size of the cohorts. A closer examination of Fig. 1A suggests that, for α and β specified, the following relationship must hold:

$$\frac{\Delta}{\sqrt{(s_c^2 + s_t^2)/n}} \ge t_{\alpha/2} - t_{1-\beta} \qquad (3)$$

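where Δ is the minimum value of the difference in group increments, $s_c$ and $s_t$ are the standard deviations for the control and treatment groups respectively, $t_{\alpha/2}$ is the upper α/2 critical value of the t-distribution, and $t_{1-\beta}$ is the value of the t-statistic exceeded with probability 1 − β (negative whenever the sensitivity exceeds 50%). If estimates of the corresponding coefficients of variation, say $k_c$ and $k_t$, are available to the investigator from previous studies, then, writing $s_c = k_c\mu_c$, $s_t = k_t\mu_t = k_t\mu_c(1 - R_\Delta)$ and $\Delta = R_\Delta\mu_c$, (3) can be solved for n and becomes

$$n \ge \frac{(t_{\alpha/2} - t_{1-\beta})^2\,\left[k_c^2 + k_t^2 (1 - R_\Delta)^2\right]}{R_\Delta^2} \qquad (4)$$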

where $100R_\Delta$, $100k_c$ and $100k_t$ represent the corresponding percent reduction in caries incidence and the coefficients of variation expressed as percentages of their respective group means. Clearly the needed cohort size is a function of both the specificity level (1 − α) and the sensitivity level (1 − β).

MULTI-GROUP TRIALS

In multi-group trials the principal goal is usually to determine which of the agents being tested is the most effective, or to compare the relative effectiveness of different pairs of agents. The methods for the general case of g groups are derived, and the three-group trial is discussed in some detail. Specifically, we assume that several agents are being compared together with a placebo. One method for determining the cohort sizes needed in a multi-group study is to use the F-test procedure, which was designed for testing the equality of all group means simultaneously. A significant F-statistic indicates that there are significant differences among the group means. Unfortunately, concluding that differences exist does not identify which differences are significant. Further, including a control group in such trials gives this test substantial power even for relatively small numbers of participants. Thus the investigator may be misled into believing that he is making specific comparisons with much higher power than really exists. For these reasons, individual comparisons between treatment groups were chosen as the basis for determining the power or sensitivity levels that should be achieved in a multi-group trial.

It was recognized that in such trials some group comparisons are intrinsically of greater interest to the investigator than others. It is assumed here that there are c comparisons of primary interest to the investigator, and that c is small. Certain Bonferroni-type t-statistics were used to make these comparisons, and their properties were used to determine the cohort sizes needed to achieve the desired specificity (1 − α) and sensitivity (1 − β) levels for such trials.

To illustrate the low sensitivity that can occur in multi-group trials when improperly determined group sizes are used, Monte Carlo methods were used to simulate a 3-year trial in which two frequencies of application, F and G, of a known cariostatic agent were compared with a placebo. A negative binomial distribution was assumed for individual caries increments, with the parameters varied for each group. The expected percent reductions were 20% and 35% for the F and G groups respectively.
The parameter values were selected so that the control group would experience a caries incidence of about 2 DMFS per year. The data from each simulated trial were analyzed by t-tests and also by the F-test. The significance level for the F-test was set at 5%, and a per-comparison error rate of 5% was assumed for each of the three comparisons made in these trials. This approach implies that the investigator states in advance the particular comparisons to be tested with the trial data. Provided that c is small, this approach generally produces cohort sizes smaller than those that would result from determinations based on the more conservative Tukey or Scheffé methods (3). These latter procedures were designed for a posteriori comparisons and are also appropriate when unplanned comparisons are made post hoc.
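The two-group calculation in (4) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure actually used to build Table 1: the t quantiles come from a standard Cornish-Fisher-type expansion in powers of 1/d.f. (as tabulated, for example, in Abramowitz & Stegun's Handbook of Mathematical Functions) rather than from exact tables, (n − 1) d.f. are used as described for Table 1, and the function names `t_quantile` and `two_group_n` are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def t_quantile(p, df):
    """Approximate lower-tail p-quantile of Student's t with df degrees of freedom.

    Expansion of t in powers of 1/df around the normal quantile z (first three
    correction terms); an approximation, reasonable for roughly df >= 10.
    """
    z = _STD_NORMAL.inv_cdf(p)
    g1 = (z**3 + z) / 4
    g2 = (5 * z**5 + 16 * z**3 + 3 * z) / 96
    g3 = (3 * z**7 + 19 * z**5 + 17 * z**3 - 15 * z) / 384
    return z + g1 / df + g2 / df**2 + g3 / df**3


def two_group_n(reduction, power, alpha=0.05, k_c=1.0, k_t=1.0):
    """Smallest n per group satisfying inequality (4), using (n - 1) d.f."""
    spread = (k_c**2 + k_t**2 * (1 - reduction) ** 2) / reduction**2
    n = 10  # the t approximation above is not trusted for very few d.f.
    while True:
        t_alpha = t_quantile(1 - alpha / 2, n - 1)  # upper alpha/2 critical value
        t_beta = t_quantile(1 - power, n - 1)       # t_{1-beta}: exceeded with prob. 1 - beta
        if n >= (t_alpha - t_beta) ** 2 * spread:
            return n
        n += 1


print(two_group_n(0.25, 0.80))  # 199; Table 1 lists 198
```

With these assumptions the 25%/80% example returns 199 where Table 1 lists 198; discrepancies of a few subjects are to be expected, since the paper does not state exactly how its t values were obtained.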

RESULTS

Two-group trials - Table 1 displays the group sizes needed to achieve selected sensitivity levels for α = 0.05 and coefficients of variation equal to 1. The entries in Table 1 were obtained by finding the minimum value of n that satisfies (4). To minimize the problem caused by unequal variances between groups, which is usually the case, (n − 1) d.f. were assumed for the t-statistic rather than 2(n − 1). An example will illustrate how Table 1 can be used. Suppose a 25% reduction in caries incidence were considered worth detecting with a power or sensitivity level of 80% in a 3-year caries trial. Further assume that the coefficients of variation can be estimated as 1 for each group (plausible for a 3-year study). It can be determined from (4) and Table 1 that 198 persons per group are needed, for a total of 396 persons at the termination of the trial. Note also that MARTHALER's tables (9) can be obtained by setting the sensitivity level (1 − β) at 50%.

Table 2. Values of the quantity $(t_{\alpha/2} - t_{1-\beta})^2/n$ for α = 0.05 and selected power or sensitivity levels

              Power or sensitivity (1 − β)
  n      .50     .60     .70     .75     .80     .85     .90     .95
  50   .08075  .10250  .12871  .14459  .16340  .18688  .21890  .27170
 100   .03936  .05008  .06299  .07080  .08002  .09152  .10718  .13277
 150   .02603  .03313  .04171  .04688  .05301  .06063  .07097  .08788
 200   .01943  .02475  .03116  .03503  .03960  .04529  .05303  .06564
 250   .01551  .01975  .02488  .02797  .03162  .03616  .04234  .05240
 300   .01290  .01644  .02070  .02327  .02632  .03010  .03524  .04360
 350   .01105  .01408  .01773  .01993  .02254  .02578  .03017  .03734
 400   .00966  .01231  .01550  .01743  .01971  .02253  .02638  .03264
 450   .00858  .01094  .01377  .01548  .01751  .02002  .02344  .02901
 500   .00772  .00983  .01239  .01393  .01575  .01800  .02108  .02609
 550   .00701  .00893  .01126  .01266  .01431  .01636  .01916  .02371
 600   .00642  .00819  .01032  .01160  .01311  .01499  .01755  .02172

Multi-group trials - For the general multi-group trial, let us assume there are g groups with anticipated reduction levels $R_1, R_2, \ldots, R_g$, where $R_i$ denotes the expected reduction for the i-th treatment group if it were compared with a control. If a control group is present in the trial, its corresponding $R_i = 0$. Further, let $k_1, k_2, \ldots, k_g$ represent the anticipated coefficients of variation for the g groups respectively. Then the number of subjects needed per group for the comparison between group x and group y at the power or sensitivity level 1 − β is the smallest n satisfying

$$n \ge \frac{2\,(t_{\alpha/2} - t_{1-\beta})^2 \sum_{i=1}^{g} k_i^2 (1 - R_i)^2}{g\,(R_x - R_y)^2} \qquad (5)$$

This derivation follows that given in (3) and (4) if the pooled estimate SD is substituted for the standard deviation in the t-statistic used for comparing the average increments for groups x and y. Clearly, the calculated cohort size will be a function of $R_x - R_y$; that is, for a specified power level, different cohort sizes will be needed depending upon which comparisons are chosen as important. Of the c primary comparisons specified by the investigator a priori, the one for which the difference $R_x - R_y$ is smallest in absolute value is identified, and n is computed using this value of $R_x - R_y$ in the denominator of (5). The n determined by this procedure will be the number of subjects required per group at the termination of the trial; the investigator would have to make appropriate allowance for losses of subjects throughout the trial period. The actual determination of n can be greatly facilitated if (5) is rewritten as

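$$\frac{(t_{\alpha/2} - t_{1-\beta})^2}{n} \le \frac{g\,(R_x - R_y)^2}{2 \sum_{i=1}^{g} k_i^2 (1 - R_i)^2} \qquad (6)$$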

and it can be shown that the quantity on the left-hand side of (6) is a decreasing function of n. Thus, since the right-hand side of (6) is a presumed known constant, there is a minimum value of n for which (6) is satisfied, and this is the desired solution for n. Table 2 presents values of the quantity $(t_{\alpha/2} - t_{1-\beta})^2/n$ for n ranging from 50 to 600, with α = 0.05 and selected values of 1 − β.

These methods can be illustrated by calculating the theoretical cohort sizes that would be needed in the trial comparing two frequencies of application, F and G, for which 20% and 35% reductions were expected. The comparison of primary interest here is $R_W - R_D$, that is, the comparison of the frequencies of application. Substituting g = 3, $|R_W - R_D| = .15$ and $\sum k_i^2 (1 - R_i)^2 = 2.0625$ into (6), we obtain 0.0164 for the right-hand side of (6). Table 2 shows that n = 380 will suffice to achieve a 70% sensitivity level for this comparison.

The results of the Monte Carlo trials are summarized in Tables 3 and 4, which also list the exact parameter values used. Table 3 displays the type 1 errors as a function of cohort size: it shows the percentage of the 1,000 runs in which the indicated comparison was found statistically significant at the 5% level when all three groups actually had identical mean caries increments. That is, these figures represent the percentage of times spurious differences would be declared statistically significant at the 5% level; theoretically, this should happen approximately 5% of the time. Table 4 gives the corresponding results when 20% and 35% reductions in caries incidence were actually experienced by groups F and G respectively. Again, the percentage of the 1,000 runs in which the indicated comparison was found statistically significant at the 5% level is presented, together with the results of testing the equality of the three group means. These percentages approximate the power or sensitivity (1 − β) that the corresponding test would achieve in a study where these reduction levels are experienced, so that 100 minus each entry approximates the type 2 error.

Table 3. Percentage of times out of 1,000 runs that statistically significant differences (at the 5% level) were detected when no differences existed

                              No. of subjects per group
Comparison                  100    200    300    400    500    600
Control vs W (0%)           4.2    5.9    5.2    4.2    4.9    5.0
Control vs D (0%)           4.7    4.8    4.7    5.0    5.2    6.0
W vs D (0%)                 5.3    6.9    5.4    5.1    4.7    4.0
All groups (F-test)         4.6    6.1    5.4    5.0    4.5    5.0

Parameter values used     Control     W      D
Averages                    6.0      6.0    6.0
s.d.                        6.5      6.5    6.5
C.V.                        1.1      1.1    1.1

Table 4. Percentage of times out of 1,000 runs that statistically significant differences (at the 5% level) were detected when 20% and 35% caries reductions actually occurred

                              No. of subjects per group
Comparison                  100    200    300    400    500    600
Control vs W (20%)         36.6   60.3   77.8   87.3   93.2   96.7
Control vs D (35%)         78.7   98.5   99.9  100.0  100.0  100.0
W vs D (15%)               17.9   37.5   54.6   69.4   77.2   83.4
All groups (F-test)        69.2   96.2   99.7  100.0  100.0  100.0

Parameter values used     Control     W      D
Averages                    6.0      4.8    3.9
s.d.                        6.5      5.3    4.4
C.V.                        1.1      1.1    1.1
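The worked example for the comparison of W and D can be reproduced with a short Python sketch of (6). It evaluates the Table 2 quantity with (n − 1) d.f., which is what the tabulated values imply, and searches for the smallest n satisfying (6) instead of interpolating in Table 2. The t quantiles again use an asymptotic expansion rather than exact tables, the helper names are hypothetical, and setting every $k_i = 1$ is an assumption: it is the choice that reproduces the sum 2.0625 quoted in the text.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def t_quantile(p, df):
    """Approximate lower-tail p-quantile of Student's t (expansion in powers of 1/df)."""
    z = _STD_NORMAL.inv_cdf(p)
    g1 = (z**3 + z) / 4
    g2 = (5 * z**5 + 16 * z**3 + 3 * z) / 96
    g3 = (3 * z**7 + 19 * z**5 + 17 * z**3 - 15 * z) / 384
    return z + g1 / df + g2 / df**2 + g3 / df**3


def table2_quantity(n, power, alpha=0.05):
    """Left-hand side of (6), (t_{alpha/2} - t_{1-beta})^2 / n, with n - 1 d.f."""
    t_alpha = t_quantile(1 - alpha / 2, n - 1)
    t_beta = t_quantile(1 - power, n - 1)
    return (t_alpha - t_beta) ** 2 / n


def rhs_eq6(reductions, cvs, r_x, r_y):
    """Right-hand side of (6): g (R_x - R_y)^2 / (2 * sum_i k_i^2 (1 - R_i)^2)."""
    g = len(reductions)
    spread = sum(k**2 * (1 - r) ** 2 for r, k in zip(reductions, cvs))
    return g * (r_x - r_y) ** 2 / (2 * spread)


def multi_group_n(reductions, cvs, r_x, r_y, power, alpha=0.05):
    """Smallest n per group for which (6) holds; valid because the LHS decreases in n."""
    target = rhs_eq6(reductions, cvs, r_x, r_y)
    n = 10
    while table2_quantity(n, power, alpha) > target:
        n += 1
    return n


# Control, W and D with expected reductions of 0, 20 and 35 percent.
# Assumption: k_i = 1 for every group, which reproduces the sum 2.0625 in the text.
reductions = [0.0, 0.20, 0.35]
cvs = [1.0, 1.0, 1.0]
print(round(rhs_eq6(reductions, cvs, 0.35, 0.20), 4))          # 0.0164
print(multi_group_n(reductions, cvs, 0.35, 0.20, power=0.70))  # 380
```

The search lands on n = 380, in agreement with the value read from Table 2; the linear search is simply the most transparent way to find the minimum n for a decreasing left-hand side.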

DISCUSSION

The differences between the numbers of subjects needed in a clinical trial as obtained by different methods illustrate the importance of determining group sizes on the basis of the principal objectives of the study. The methods derived in this paper are based on the assumption that caries increments follow a normal distribution. The results suggest that this assumption is a mild one: the empirical results obtained by the Monte Carlo procedure, which assumed a negative binomial distribution of caries increments, show a remarkable degree of consistency with the results obtained under the normality assumption. Whereas 90 and 380 subjects were found necessary to achieve 70% sensitivity for the F-test and t-test respectively under the normality assumption, approximately 100 and 400 subjects respectively were needed for the F-test and t-test under the negative binomial assumption. The point to be made here is that the degree of underestimation in group size can be substantial if an inappropriate procedure is used in these determinations. Thus, even though one would be able to conclude correctly that the control, F and G groups had different caries increments in 70% of the trials comparing these frequencies of application, in only 35% of these trials would a 15% difference between the effectiveness of F and G be correctly detected. Consequently, a real difference between a more effective and a less effective agent can go undetected more often than not in a clinical trial. If we follow the suggestions of COHEN (2) or SNEDECOR & COCHRAN (13) and set the sensitivity level at 80% or 90% for the comparisons of primary interest, then one can be assured that a specified meaningful difference will go undetected no more than 20% or 10% of the time, respectively. Such a strategy is reasonable in that it imposes sensitivity levels on multi-group trials similar to those realized in most two-group trials reported in the literature.
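The Monte Carlo comparison can be sketched as below. The paper does not specify its random number generator or parameterization, so this sketch assumes a gamma-Poisson form of the negative binomial whose shape is matched to the averages and standard deviations listed in Table 4, compares only the two groups involved with a pooled two-group t-statistic (a simplification of the pooled multi-group SD used in (5)), and uses the normal critical value as a large-sample stand-in for $t_{\alpha/2}$. All names are hypothetical, and estimates will vary with the seed and number of runs, so the output should be read only as a rough check on the order of magnitude of the tabulated percentages.

```python
import math
import random
from statistics import NormalDist

# (mean, s.d.) of 3-year DMFS increments from the Table 4 parameter block
CONTROL = (6.0, 6.5)
W = (4.8, 5.3)
D = (3.9, 4.4)


def nb_shape(mean, sd):
    """Shape r of a gamma-Poisson negative binomial; its variance is mean + mean**2 / r."""
    return mean**2 / (sd**2 - mean)


def poisson(lam, rng):
    """Knuth's multiplication method; adequate for the moderate rates used here."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def nb_sample(mean, sd, size, rng):
    """Draw caries increments as Poisson counts whose rates are gamma distributed."""
    r = nb_shape(mean, sd)
    return [poisson(rng.gammavariate(r, mean / r), rng) for _ in range(size)]


def pooled_t(a, b):
    """Two-sample t-statistic with pooled variance for equal group sizes."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    ss = sum((x - mean_a) ** 2 for x in a) + sum((x - mean_b) ** 2 for x in b)
    pooled_var = ss / (2 * (n - 1))
    return (mean_a - mean_b) / math.sqrt(2 * pooled_var / n)


def simulated_power(group_a, group_b, n, runs, alpha=0.05, seed=1):
    """Fraction of simulated trials in which the a-vs-b difference is significant."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)  # large-sample stand-in for t_{alpha/2}
    hits = 0
    for _ in range(runs):
        a = nb_sample(*group_a, n, rng)
        b = nb_sample(*group_b, n, rng)
        if abs(pooled_t(a, b)) > crit:
            hits += 1
    return hits / runs


if __name__ == "__main__":
    # W vs D with 400 subjects per group (cf. the W vs D row of Table 4)
    print(simulated_power(W, D, n=400, runs=200))
```

Raising `runs` to 1,000 matches the number of runs used for Tables 3 and 4, at the cost of a longer run time; passing `CONTROL` for both groups gives a rough check on the type 1 error rate of Table 3.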


REFERENCES

1. CHILTON, N. W. & FERTIG, J. W.: The estimation of sample size in experiments. I. Using comparisons of averages. J. Dent. Res. 1953: 32: 530-537.
2. COHEN, J.: Statistical power analysis for the behavioral sciences. Academic Press, New York 1969, p. 54.
3. DUNN, O. J.: Multiple comparisons among means. J. Am. Stat. Assoc. 1961: 56: 52-64.
4. FEINSTEIN, A.: Clinical biostatistics: XXXIV. The other side of 'statistical significance': alpha, beta, delta, and the sample size. Clin. Pharm. Ther. 1976: 18: 491-505.
5. FELDT, L. S. & MAHMOUD, M. W.: Power function charts for specification of sample size in analysis of variance. Psychometrika 1958: 23: 201-210.
6. JACKSON, D.: Errors in clinical trials. In: Proceedings of the 12th Congress of ORCA, Utrecht. Adv. Fluor. Res. Dent. Caries Prev. 1965: 4: 23-32.
7. KASTENBAUM, M. A., HOEL, D. G. & BOWMAN, K. O.: Sample size requirements: One-way analysis of variance. Biometrika 1970: 57: 421-430.
8. KRAMER, M. & GREENHOUSE, S. W.: Determination of sample size and selection of cases. In: COLE, J. O. & GERARD, R. W.: Psychopharmacology: Problems in evaluation. National Academy of Sciences, National Research Council 1959, pp. 356-371.
9. MARTHALER, T.: Estimation of sample size for longitudinal clinical caries trials. Helv. Odontol. Acta 1967: 11: 167-174.
10. MCCLENDON, B. G., DRISCOLL, W. S., ABRAMS, A. M. & BARBANO, J. P.: A procedure for estimating sample size in clinical trials of dental caries preventives. J. Dent. Res. 1972: 51: 1589-1593.
11. MILLER, R.: Simultaneous statistical inference. McGraw-Hill, New York 1966, pp. 67-70.
12. SCHLESSELMAN, J.: Planning a longitudinal study. I. Sample size determination. J. Chronic Dis. 1973: 26: 535-560.
13. SNEDECOR, G. W. & COCHRAN, W. G.: Statistical methods. 6th ed. Iowa State University Press, Ames, Iowa 1967, pp. 111-115.

Address:
A. Kingman, Ph.D.
Biometry Section
National Caries Program
National Institute of Dental Research
National Institutes of Health
Bethesda, MD 20014
U.S.A.
