CONCEPTS, COMPONENTS & CONFIGURATIONS research, sequential trials

Sequential Clinical Trials in Emergency Medicine

In a traditional clinical trial, a fixed number of patients are evaluated before the data are analyzed. This has the disadvantage that more patients may be enrolled than are necessary to achieve a statistically significant result. Sequential statistical techniques provide a method for the analysis of clinical trials so that a reliable result is obtained with a minimum number of patients. In a sequential trial, the data are analyzed after each patient's outcome is known, and the trial is halted as soon as treatment efficacy or lack thereof is demonstrated. This study was undertaken to confirm the advantages of sequential statistical techniques over conventional fixed-sample-size statistical techniques for the analysis of clinical trials. Using sequential techniques, we conducted computer simulations of two fixed-sample-size clinical studies from the literature: a trial of hepatitis B vaccine in homosexual men (N Engl J Med 1980;303:833-841) and a trial of the pneumatic antishock garment in hypotensive patients with penetrating abdominal trauma (Ann Emerg Med 1987;16:653-658). In the trial simulations, patients were randomly assigned to the control and test groups, their outcomes were determined randomly according to the frequency of outcomes observed in the actual studies, and the simulated sequential studies were continued until conclusions were reached. Thousands of possible realizations of each trial were simulated; thus, the distribution of required patient numbers was determined. The risks of type I and type II errors were set at 0.05. Given the observed vaccine efficacy of 92%, a sequential trial sufficiently sensitive to detect a vaccine-induced reduction in hepatitis B of 50% would have required a median of 150 patients to complete follow-up before terminating and demonstrating efficacy. The actual hepatitis B vaccine study included 1,083 patients. Given the same lack of efficacy seen in the original trial, a sequential study designed to detect a 50% reduction in mortality with use of the pneumatic antishock garment would have required a median of 102 patients and demonstrated lack of efficacy. The actual study included 201 patients. A disadvantage of sequential trials is that, while the number of required patients is reduced, the estimated magnitude of the treatment effect is slightly less accurate than that of conventional fixed-sample-size trials. The theoretical properties of sequential trials are confirmed by our simulations and make this trial design advantageous for many clinical trials in emergency medicine. [Lewis RJ, Bessen HA: Sequential clinical trials in emergency medicine. Ann Emerg Med September 1990;19:1047-1053.]

Roger J Lewis, MD, PhD
Howard A Bessen, MD, FACEP
Torrance, California

From the Department of Emergency Medicine, Harbor-UCLA Medical Center, Torrance, California; and UCLA School of Medicine, Los Angeles.

Received for publication October 2, 1989. Revision received March 29, 1990. Accepted for publication May 1, 1990. Presented at the Society for Academic Emergency Medicine Annual Meeting in San Diego, May 1989. Address for reprints: Roger J Lewis, MD, PhD, Department of Emergency Medicine, Harbor-UCLA Medical Center, 1000 West Carson Street, Torrance, California 90509.

INTRODUCTION

Many areas of clinical research in emergency medicine concern the efficacy of critical interventions in reducing subsequent morbidity and mortality. Randomized controlled clinical trials are required to properly evaluate the efficacy of such interventions. Optimally, such trials are conducted with the smallest number of patients necessary to obtain a reliable result. In this way, the fewest possible patients are exposed to the less efficacious treatment, which may prove to be the control or the test treatment. In a traditional fixed-sample-size trial, the number of patients to be enrolled is calculated before any data are collected. This number is determined by the magnitude of the minimum treatment effect sought and the allowable risk of obtaining an erroneous result.¹⁻⁶ After the data have been

19:9 September 1990

Annals of Emergency Medicine

1047/145

CLINICAL TRIALS Lewis & Bessen

collected from this fixed number of patients, a statistical test is applied to determine whether the outcomes in the test and control groups are significantly different. Two possible conclusions are that the test treatment is more efficacious than the control treatment and that the test and control treatments are equivalent. These conclusions can usually be made using fewer patients than the number required by a fixed-sample-size trial.

Sequential statistical techniques provide a method for the analysis of clinical trials so that a reliable result is obtained with a minimum number of patients. In a sequential trial, the number of patients depends not only on the magnitude of the minimum treatment effect sought and the allowable risk of obtaining an erroneous result but also on the results obtained from each patient as the trial progresses.⁷⁻¹⁴ Patients are assigned randomly to either the control or the test treatment; after each outcome is known, the data are reanalyzed to determine whether a conclusion has been reached. The trial is halted as soon as a result is established with adequate confidence. Specialized statistical methods must be used in a sequential trial; one cannot simply reanalyze the data after each outcome with statistical tests intended for fixed-sample-size trials because doing so will lead to a high probability of a false-positive result.¹⁵,¹⁶ The notation and formulas for a sequential trial are outlined using the theory of Whitehead.⁹
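The warning about repeated reanalysis can be illustrated with a small simulation. The sketch below is not taken from the article: it assumes a pooled two-proportion z test with a two-sided cutoff of 1.96, a trial of 200 patients with an 80% success rate in both arms (so there is no true difference), and looks after every patient from the 20th onward. All function names and parameter values are hypothetical choices made for illustration.

```python
import math
import random


def pooled_z(s_t, n_t, s_c, n_c):
    """Two-proportion z statistic with pooled variance (0.0 if undefined)."""
    if n_t == 0 or n_c == 0:
        return 0.0
    pooled = (s_t + s_c) / (n_t + n_c)
    variance = pooled * (1.0 - pooled) * (1.0 / n_t + 1.0 / n_c)
    if variance <= 0.0:
        return 0.0
    return (s_t / n_t - s_c / n_c) / math.sqrt(variance)


def null_trial(rng, n_patients=200, p_success=0.8, first_look=20, z_crit=1.96):
    """Simulate one trial in which both arms share the same success rate.

    Returns (significant at some interim or final look,
             significant at the final look only).
    """
    s_t = n_t = s_c = n_c = 0
    significant_at_some_look = False
    z = 0.0
    for enrolled in range(1, n_patients + 1):
        success = int(rng.random() < p_success)
        if rng.random() < 0.5:          # random allocation to the test arm
            n_t += 1
            s_t += success
        else:                           # otherwise the control arm
            n_c += 1
            s_c += success
        z = pooled_z(s_t, n_t, s_c, n_c)
        if enrolled >= first_look and abs(z) > z_crit:
            significant_at_some_look = True
    return significant_at_some_look, abs(z) > z_crit


def false_positive_rates(n_trials=1000, seed=1):
    """Fraction of null trials declared significant under each analysis plan."""
    rng = random.Random(seed)
    some = final = 0
    for _ in range(n_trials):
        at_some_look, at_final_look = null_trial(rng)
        some += at_some_look
        final += at_final_look
    return some / n_trials, final / n_trials


if __name__ == "__main__":
    repeated, single = false_positive_rates()
    print(f"repeated looks: {repeated:.3f}, single final look: {single:.3f}")
```

Under these assumptions the single final analysis should reject at roughly the nominal rate, while stopping at the first "significant" look rejects far more often, which is the inflation of type I error that sequential methods are designed to avoid.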

THEORY OF A SEQUENTIAL TRIAL

In designing any clinical trial, there are two classes of error to consider. A type I error (also called an α error) occurs when one concludes that there is treatment efficacy when in fact there is not; it is a form of false-positive. A type II error (also called a β error) occurs when one fails to detect treatment efficacy that in fact exists; it is a form of false-negative.¹⁷ To reliably interpret the results of a study, regardless of whether the results suggest efficacy of the new treatment or not, studies must be designed with a well-defined risk of both type I and type II errors. The rate of type I error is called α and is equal to the threshold P value below which a result is considered statistically significant (usually .05). The rate of type II error is called β. The "power" of the study is given by 1 − β and is the probability of detecting a true specified treatment difference. A study that concludes that the test and control treatments are equivalent is meaningful only if β is low; otherwise, the negative result has an appreciable probability of occurring even if the test treatment is more efficacious than the control treatment.¹⁸⁻²⁰ The sequential theory used here assumes that α and β are equal. Other sequential theories, or a modification of this theory, can be used when α and β are not equal.⁷,⁹,¹⁰

The symbol θ is used here to represent the degree to which the test treatment is superior to the control treatment. The value of θ is positive if the test treatment is better than the control treatment and is negative if the test treatment is less efficacious. The reference improvement of the test treatment over the control treatment that the trial is designed to detect is denoted θr. It may be the minimum improvement over control that would be clinically significant, or it may be larger if it is impractical to look for the smallest clinically significant improvement. This latter case may occur if only very few patients are available for the study. The true improvement of the test treatment over the control treatment is θ. This is the value the clinical trial is designed to measure. If θ is equal to θr, then a properly designed study will detect this improvement with a probability of 1 − β.⁹ If θ is 0 (ie, the test and the control treatments are equivalent), then a properly designed study will yield a negative result with a probability of 1 − α.⁹

In this study, trials that are randomized, binary, and between-subjects are considered. "Randomized" in this context means that each patient is randomly allocated to the test or the control treatment. "Binary" refers to the fact that there are only two possible outcomes for each patient, loosely defined as success or failure. "Between-subjects" refers to the fact that each patient can receive only one treatment. The probability of success with the test treatment is Pt, and the probability of success with the control treatment is Pc. The numbers of patients who have received the test and control treatments, at any point in the trial, are Nt and Nc, respectively. The number of patients who have had successful outcomes after receiving the test treatment is St, and the number of patients who have had successful outcomes after receiving the control treatment is Sc.

The measure of the relative treatment efficacy, θ, is defined by

$$\theta = \ln\left[\frac{P_t\,(1 - P_c)}{P_c\,(1 - P_t)}\right] \tag{1}$$

where ln refers to the natural log. This measure of the difference between Pt and Pc is useful even if both probabilities of success are very close to 0 or 1.

In deciding the value of θr, the reference improvement of the test treatment over the control treatment that the sequential trial will be designed to detect, Pc, the probability of success with the control treatment, which is usually the present standard of care, must be estimated. Next it must be determined, on clinical grounds, what value of Pt would represent a clinically important increase in treatment success rate over Pc. These two values can be used in Equation 1 to determine θr.

Once the basic parameters of the sequential trial are determined, patients may be enrolled. Each is randomly assigned to either the control or the test treatment, and the outcome recorded. After every patient, two statistics are calculated. The statistic Z is a measure of the degree to which the test treatment seems better than the control treatment, and the statistic V is a measure of the amount of information contained in the data up to that point. They are given by⁹

$$Z = \frac{N_c N_t}{N_c + N_t}\left(\frac{S_t}{N_t} - \frac{S_c}{N_c}\right) = \frac{N_c S_t - N_t S_c}{N_t + N_c} \tag{2}$$

and

$$V = \frac{N_c N_t\,(S_t + S_c)\,(N_t - S_t + N_c - S_c)}{(N_t + N_c)^3} \tag{3}$$

These expressions apply to a binary trial in which Equation 1 is used to define θ.

Each new combination of Z and V values is plotted on a graph with Z as the vertical axis and V as the horizontal axis. As data are collected, these points will form an irregular path that drifts to the right (Figure 1) because the amount of information contained in the data, as measured by the variable V, increases as more data are obtained. The path taken by the plotted points will tend to drift up if the test treatment is more efficacious than the control treatment, and Z will be positive and increase as data are collected. On the other hand, if more patients have successful outcomes with the control treatment than with the test treatment, then Z will be negative and decrease as more data accumulate.

[Figure 1: plot of Z (vertical axis) against V (horizontal axis), with regions labeled "Test Treatment Better," "Continuation Region," and "Test Treatment Not Better Than Control Treatment."]

FIGURE 1. Hypothetical result from a sequential trial in which the test treatment is shown to be superior to the control treatment. Axes are marked in arbitrary units. For clarity, the path shown is much coarser than that of the actual simulations. Terminal value of V, denoted V*, is approximately 3.2 units.

The sequential trial is terminated as soon as one of two boundaries on the graph is crossed. If the upper boundary is crossed, the test treatment is better than the control treatment. If the lower boundary is crossed, the test treatment is not better than the control treatment. The placement of the boundary lines depends on θr, α, and β, which are related to the sensitivity, desired P value, and power of the trial, respectively (Figure 2). The equations for the boundary lines are given in Appendix A.

[Figure 2: four panels, A through D, each plotting Z (vertical axis) against V (horizontal axis).]

FIGURE 2. Effects of variation of θr, α, and β on the position of the boundaries used to determine the termination point of a sequential trial. A. Boundaries are shown for a trial designed with θr = 1.00 and α = β = .05. B. New boundary positions if θr = 0.67 and α and β are unchanged. In other words, the trial is designed to detect a smaller treatment efficacy and therefore will probably require more patients. C. New boundary positions if θr = 1.50, again leaving α and β unchanged. The resulting trial will detect only large differences between the study groups but will require few patients to do so. D. Result of returning θr to a value of 1.00 but reducing α and β to .01. This trial would yield a very low risk of type I and type II errors but would require more patients than the trial shown in A.

Once a sequential trial has terminated, several other pieces of information may be obtained, in addition to the relative efficacy of the two treatments. The value of V that occurred when the trial terminated is denoted V*. With the use of published tables, the values of V* and θr can be used to determine both the P value for the trial and the 95% confidence interval for θ.⁹,²¹,²² The sooner a trial terminates on the upper boundary, giving a small value for V*, the smaller the resulting P value and the larger the resulting estimate of θ.
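The bookkeeping described in this section, recomputing Z and V after every patient and following the resulting path, can be sketched in a few lines. The code below is a minimal illustration and not the authors' software; the function names and the (arm, success) record format are hypothetical. It implements only Equations 2 and 3; the stopping boundaries are given in Appendix A and are deliberately left out here.

```python
def z_and_v(s_t, n_t, s_c, n_c):
    """Return (Z, V) from Equations 2 and 3 for the data observed so far.

    s_t, s_c: successes with the test and control treatments
    n_t, n_c: patients who received the test and control treatments
    """
    n = n_t + n_c
    if n == 0:
        return 0.0, 0.0
    z = (n_c * s_t - n_t * s_c) / n
    successes = s_t + s_c
    failures = n - successes
    v = n_c * n_t * successes * failures / n ** 3
    return z, v


def zv_path(outcomes):
    """Build the (Z, V) path, one point per patient.

    outcomes: iterable of (arm, success) pairs, where arm is "test" or
    "control" and success is True or False (a hypothetical record format).
    """
    s_t = n_t = s_c = n_c = 0
    path = []
    for arm, success in outcomes:
        if arm == "test":
            n_t += 1
            s_t += int(success)
        else:
            n_c += 1
            s_c += int(success)
        path.append(z_and_v(s_t, n_t, s_c, n_c))
    return path


if __name__ == "__main__":
    # 9 of 10 successes with the test treatment, 6 of 10 with the control.
    print(z_and_v(9, 10, 6, 10))
```

For example, with 9 successes among 10 test patients and 6 among 10 control patients, Equation 2 gives Z = (10·9 − 10·6)/20 = 1.5 and Equation 3 gives V = (10·10·15·5)/20³ = 0.9375. Both statistics are zero while only successes (or only failures) have been observed, so the path leaves the origin only once the two arms can be distinguished.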

METHODS

To demonstrate the properties of sequential trials, we used sequential methods in conducting computer simulations of two fixed-sample-size clinical studies from the literature: a trial of hepatitis B vaccine in homosexual men²³ and a trial of the pneumatic antishock garment (PASG) in hypotensive patients with penetrating abdominal trauma.²⁴ In each trial simulation, hypothetical patients were randomly assigned to either the control or the test group, and their outcomes were randomly determined according to the frequencies of outcomes observed in the original fixed-sample-size study. For example, in the hepatitis vaccine trial, 18.1% of the placebo-treated group and 1.4% of the vaccine recipients developed hepatitis B. In the computer-simulated trial, patients were randomly assigned to vaccine or placebo; each placebo-treated patient had an 18.1% chance of "developing" hepatitis, and each vaccine recipient had a 1.4% chance of "developing" hepatitis. The simulated sequential study was continued with additional patients until a conclusion was reached, and then the entire simulation was repeated. Twenty-five thousand different realizations of each trial were simulated; thus, the distributions of required patient numbers, P values, and 95% confidence intervals were determined.

FIGURE 3. Distribution of the number of patients required for the termination of a simulated sequential trial of hepatitis B vaccine. The number of patients required ranged from 38 to 380, with a median of 150. [Histogram; horizontal axis: No. of Patients; vertical axis: No. of Trials.]

The simulated sequential trials were designed with a risk of both type I error (α) and type II error (β) of .05. The sample sizes that would have been required by equivalent fixed trials were determined from expressions in Whitehead.⁹ The simulations were performed on an IBM PC/AT computer with software written in the language C. The random-number routines were based on published shuffled linear congruential generators.²⁵ The simulation of 25,000 trials required approximately six hours of computer time.

One end point of the hepatitis B vaccine trial²³ was hepatitis B infection with an alanine aminotransferase elevation of 90 IU. Of the placebo-treated control group, 81.9% had a successful outcome in that they did not experience infection by this definition. In the vaccine-treated group, 98.6% did not experience infection.²³ Thus, the vaccine produced a 92% reduction in hepatitis B infection. A value of 80%, or 0.80, was used as an estimate for Pc, the success rate with the control treatment; this value might have been known before conducting the study. A 50% reduction in hepatitis B infection was used as the reference improvement in our simulated trials because this would be a clinically significant improvement. With Pc = 0.80, the failure rate with the control treatment is 20%; a 50% reduction in this failure rate with the test treatment would yield a 10% failure rate, or a 90% success rate. Thus, Pt would be 90%, or 0.90. The value of θr is obtained from Pc and Pt by Equation 1, which yields a value of 0.811.

The authors of the study "estimated that a minimum of 800 participants would be required to prove efficacy of the vaccine with a power of 0.9 and a level of significance of 0.01."²³ Such an estimate requires the definition of a minimum level of efficacy to be detected, and although this level was not stated, it is believed to have been an approximately 50% to 70% efficacy (CE Stevens, personal communication). A fixed-sample-size trial designed for a power of 0.95 and a .05 significance level (1 − β and α, respectively) and designed to detect a 50% reduction in hepatitis B infection with the vaccine would require approximately 520 patients.⁹
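The arithmetic behind θr = 0.811 follows directly from Equation 1 and can be checked with the short sketch below. The fixed-sample-size function is an assumption on our part and is not given in this section: it takes the information needed by a one-sided fixed test to be V = ((u_α + u_β)/θr)², where u denotes a standard normal quantile, and converts V to a patient count using Equation 3 with equal allocation, V ≈ N·p̄(1 − p̄)/4. The article itself states only that the figure was "determined from expressions in Whitehead." All function names are hypothetical.

```python
import math
from statistics import NormalDist


def log_odds_ratio(p_t, p_c):
    """Equation 1: theta = ln[P_t (1 - P_c) / (P_c (1 - P_t))]."""
    return math.log(p_t * (1.0 - p_c) / (p_c * (1.0 - p_t)))


def success_rate_after_reduction(p_c, reduction):
    """Test-treatment success rate if the control failure rate falls by
    the given fraction (eg, 0.50 for a 50% reduction in failures)."""
    return 1.0 - (1.0 - p_c) * (1.0 - reduction)


def approx_fixed_sample_size(p_c, p_t, alpha=0.05, beta=0.05):
    """ASSUMED approximation, not taken from the article:
    required information V = ((u_alpha + u_beta) / theta_r)^2 (one-sided),
    and V ~= N * p_bar * (1 - p_bar) / 4 with equal allocation."""
    theta_r = log_odds_ratio(p_t, p_c)
    u_alpha = NormalDist().inv_cdf(1.0 - alpha)
    u_beta = NormalDist().inv_cdf(1.0 - beta)
    v_needed = ((u_alpha + u_beta) / theta_r) ** 2
    p_bar = (p_c + p_t) / 2.0
    return 4.0 * v_needed / (p_bar * (1.0 - p_bar))


p_c = 0.80                                       # estimated control success rate
p_t = success_rate_after_reduction(p_c, 0.50)    # 50% fewer failures
theta_r = log_odds_ratio(p_t, p_c)               # reference improvement
theta_observed = log_odds_ratio(0.986, 0.819)    # observed vaccine efficacy

if __name__ == "__main__":
    print(f"P_t = {p_t:.2f}, theta_r = {theta_r:.3f}")
    print(f"theta from observed rates = {theta_observed:.3f}")
    print(f"approximate fixed sample size = {approx_fixed_sample_size(p_c, p_t):.0f}")
```

With Pc = 0.80 and Pt = 0.90, Equation 1 reduces to ln(0.18/0.08) = ln 2.25 ≈ 0.811. Under the stated assumptions, the sample-size function returns a value close to the approximately 520 patients quoted above, but that agreement should be read as a plausibility check rather than as a reconstruction of the authors' calculation.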

The PASG trial examined the effect of the PASG on survival in hypotensive patients with penetrating abdominal trauma.²⁴ The observed survival rate without the PASG was 77.9% and with the PASG was 69.1%. Again, 0.80 was used as an approximate value for Pc, and the simulated sequential trials were designed to detect a 50% reduction in mortality with the PASG with a power of 0.95 and a significance of .05. Thus, θr was 0.811, and an equivalent fixed-sample-size trial would have required approximately 520 patients. The actual PASG trial, with 201 patients, would have had inadequate power to detect a 50% reduction in mortality with PASG for the estimated Pc. For the actual PASG trial to have had a power of 0.95, a 70% reduction in mortality with PASG would have been necessary.⁹

No attempt has been made to provide all of the information required to replicate our simulations or to design and conduct a sequential clinical trial because some details are beyond the scope of this discussion. Further information on this form of sequential analysis is available in Whitehead.⁹ The major results of our simulations have been independently confirmed (J Whitehead, personal communication) with available software.²⁶,²⁷
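The simulation procedure described above can be sketched as follows. This is an illustration under stated assumptions, not a replication of the authors' C program. In particular, the stopping boundaries are not given in this section (they appear in Appendix A), so the sketch assumes the triangular form commonly associated with Whitehead's test for α = β: an upper boundary Z = a + cV and a lower boundary Z = −a + 3cV, with a = (2/θr)·ln(1/(2α)) and c = θr/4. It also omits any correction for monitoring after each discrete patient and uses Python's built-in random-number generator, so its output should not be expected to match the published figures exactly.

```python
import math
import random
from statistics import median


def z_and_v(s_t, n_t, s_c, n_c):
    """Z and V from Equations 2 and 3."""
    n = n_t + n_c
    if n == 0:
        return 0.0, 0.0
    successes = s_t + s_c
    z = (n_c * s_t - n_t * s_c) / n
    v = n_c * n_t * successes * (n - successes) / n ** 3
    return z, v


def triangular_boundaries(theta_r, alpha=0.05):
    """ASSUMED boundary form (alpha = beta); see the lead-in paragraph.
    Upper: Z = a + c*V.  Lower: Z = -a + 3*c*V."""
    a = (2.0 / theta_r) * math.log(1.0 / (2.0 * alpha))
    c = theta_r / 4.0
    return a, c


def simulate_trial(rng, p_t, p_c, theta_r, alpha=0.05, max_patients=5000):
    """Enroll simulated patients one at a time until a boundary is crossed.
    Returns (number of patients, conclusion)."""
    a, c = triangular_boundaries(theta_r, alpha)
    s_t = n_t = s_c = n_c = 0
    for enrolled in range(1, max_patients + 1):
        if rng.random() < 0.5:                 # random allocation
            n_t += 1
            s_t += int(rng.random() < p_t)     # outcome with test treatment
        else:
            n_c += 1
            s_c += int(rng.random() < p_c)     # outcome with control treatment
        z, v = z_and_v(s_t, n_t, s_c, n_c)
        if z >= a + c * v:
            return enrolled, "test better"
        if z <= -a + 3.0 * c * v:
            return enrolled, "test not better"
    return max_patients, "no conclusion"       # safety cap, not expected


def simulate_many(n_trials, p_t, p_c, theta_r, seed=0):
    """Return (median patients required, fraction concluding 'test better')."""
    rng = random.Random(seed)
    results = [simulate_trial(rng, p_t, p_c, theta_r) for _ in range(n_trials)]
    sizes = [n for n, _ in results]
    better = sum(1 for _, concl in results if concl == "test better")
    return median(sizes), better / n_trials


if __name__ == "__main__":
    # Success = no hepatitis B infection: 98.6% with vaccine, 81.9% with placebo.
    print("hepatitis B:", simulate_many(2000, 0.986, 0.819, 0.811, seed=1))
    # Success = survival: 69.1% with the PASG, 77.9% without.
    print("PASG:", simulate_many(2000, 0.691, 0.779, 0.811, seed=2))
```

Because the assumed boundaries converge and meet at V = a/c, every simulated trial terminates once enough information has accumulated; the `max_patients` cap is only a safeguard. Replacing the assumed boundaries with those of Appendix A, and increasing the number of realizations to 25,000, would bring the sketch closer to the procedure the authors describe.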

RESULTS

All 25,000 sequential simulations of the hepatitis B vaccine trial concluded that the vaccine was more efficacious than the placebo. The number of patients required to reach this conclusion ranged from 38 to 380, with a median of 150; more than 90% of the trials concluded with fewer than 225 patients. The distribution of required patient numbers is shown (Figure 3). The actual study included 1,083 patients.

One disadvantage of a sequential trial is that the reduction in required sample size, relative to a fixed trial, leads to lower accuracy in estimating the actual magnitude of treatment effects. The true efficacy of the vaccine, θ, was 2.745, a number that is calculated from the observed frequencies of hepatitis infection in the placebo and vaccine groups with Equation 1. This is equal to a 92% reduction in hepatitis (Appendix B). The mean estimate of θ from the sequential simulations was 1.859 ± 0.297, where 0.297 is the standard deviation of these estimates (an approximately 81 ± 5% reduction in hepatitis). More important, the mean 95% confidence interval for θ ranged from 0.760 to 2.953 (an approximately 48% to 94% reduction in hepatitis), with 30.1% of these intervals underestimating and not including the true value of θ.

Although a sequential trial designed to detect a 50% reduction in hepatitis would have required far fewer patients than the actual fixed-sample-size trial, the sequential trial probably would have underestimated the efficacy of the vaccine. This systematic underestimation of the true magnitude of the treatment effect probably results from the sequential trial being designed to detect a treatment effect much smaller than the true value; in other words, θ was much larger than θr. Another possible explanation is that the hepatitis vaccine results in so few failures that a normal approximation used in the theory of the analysis is less accurate (J Whitehead, personal communication). The conclusions about the relative efficacy of the test and control groups remain valid, however. Whitehead⁹ discusses the problems of interpretation of confidence intervals when the observed efficacy of the test treatment is very different from θr.

To verify that a sequential trial designed to detect only a very large vaccine efficacy would be more accurate in measuring the effect of the vaccine, sequential trials designed to detect a 90% reduction in hepatitis (θr = 2.506) were simulated. These trials terminated with a median of 59 patients, and 95.7% of the 95% confidence intervals included the true vaccine efficacy.

Of the 25,000 simulations of the PASG trial, all but 12 concluded that the PASG was not more efficacious than the control. The number of patients required to reach this conclusion ranged from 25 to 405, with a median of 102, and more than 90% of the trials terminated with fewer than 180 patients. The distribution of required patient numbers is shown (Figure 4). The actual study used 201 patients and had a lower power. The true efficacy of the PASG, θ, was −0.455 (an approximately 40% increase in mortality), and the average estimate of θ was −0.539 ± 0.477 (an approximately 55 ± 50% increase in mortality). All but 4.4% of the 95% confidence intervals included the true value of θ. The mean limits of the 95% confidence intervals were −1.450 and 0.380 (an approximately 160% increase in mortality to a 27% decrease in mortality with the PASG).

FIGURE 4. Distribution of the number of patients required for the termination of a simulated sequential trial of the PASG. The number of patients required ranged from 25 to 405, with a median of 102. [Histogram; horizontal axis: No. of Patients; vertical axis: No. of Trials.]

The simulations of both the hepatitis B vaccine and the PASG trials were designed with risks of type I and type II errors of .05. Yet no type II errors were seen in the hepatitis B trial simulations, and only 12 of the 25,000 PASG trial simulations yielded a type I error. The low rate of error occurred because the hepatitis B vaccine was much more efficacious than the sequential trial was designed to detect and the PASG was actually less efficacious than the control treatment.

DISCUSSION

Most clinical studies are performed with a fixed sample size. Because of uncertainty in the efficacies of the treatments being studied, the fixed sample size used may not be optimum. The number of patients may be too small to yield a valid conclusion; many fixed trials in which a treatment has been found "ineffective" have had inadequate samples, leading to high rates of type II error and drawing their conclusions into doubt.²⁰ Alternatively, the number of patients enrolled may be greater than that needed to achieve a statistically significant result. This exposes more patients than necessary to the less efficacious treatment and adds time and expense to the study.
In a sequential trial, the data are reanalyzed as soon as each patient's outcome is known, and the trial is halted as soon as treatment efficacy, or lack thereof, is demonstrated. Thus, the number of patients is minimized but is not so small as to yield an unreliable result. This repeated analysis of data cannot be done with traditional fixed-sample-size statistical methods because doing so greatly increases the risk of type I error.¹⁵,¹⁶

Our simulated sequential replications of both the hepatitis B and the PASG trials confirm the economy of sample size expected for a sequential trial. Such sample-size savings can be essential, particularly when the trial involves the therapy of a serious illness or the use of treatments with significant cost or morbidity. In many planned studies, large numbers of patients are not available for enrollment in a clinical trial; sequential techniques may allow the performance of studies that would otherwise be difficult to complete.

On the other hand, although sequential analysis reduces the average number of patients required, the number that will be required for any particular trial is not predetermined; thus, one must allow for the possibility that the trial may require a larger number of patients to terminate than expected. This is most likely to occur if the true efficacy of the test treatment is small but still greater than that of the control treatment.

As we have shown, conclusions regarding relative treatment efficacy are unaffected by the use of a sequential design. The sample size reduction obtained with a sequential trial does, however, result in less accuracy in the estimation of the magnitude of treatment effects.
This is particularly true if the actual magnitude of the treatment effect is much greater than that which the study is designed to detect.

The advantages of sequential methods are most compelling when applied to trials with several specific characteristics. The trial should be randomized, prospective, and, if possible, double-blind. The outcome for each patient should be known in a short time relative to the duration of the trial. The determination of the relative efficacy of two treatments must be more important than an exact estimation of the magnitude of the treatment effect.

Sequential techniques have been used in major published studies²⁸,²⁹ although not yet in emergency medicine. We have shown the application of such techniques to two specific clinical studies, each with two discrete outcomes. It should be noted that sequential methods are also available for the comparison of more than one treatment with a control and for the comparison of continuous variables.⁹

CONCLUSION

For both ethical and economic reasons, investigations should be conducted with the smallest number of subjects required to obtain a valid and reliable result. As seen from our computer simulations, a sequential trial will usually require fewer patients than an equivalent fixed trial. The statistical formulas and procedures used for sequential trials are slightly more involved than for a fixed trial, but an excellent monograph is available detailing the method.⁹

Emergency medicine is uniquely concerned with critical interventions aimed at altering patients' conditions over minutes to hours, making it an ideal area for the application of sequential trials. Our hope is that clinical studies in emergency medicine will begin to use these techniques, thus setting an example for other areas of clinical research and minimizing the number of patients unnecessarily exposed to unproven treatments.

The authors thank John Whitehead, PhD, who provided many insightful comments on this work. They also thank Cladd E Stevens, MD, for information on the hepatitis B trial, and James T Niemann, MD, and Philip Henneman, MD, for helpful suggestions.

REFERENCES

1. Armitage P: Statistical Methods in Medical Research. New York, John Wiley & Sons, 1971.
2. Colton T: Statistics in Medicine. Boston, Little, Brown & Co, 1974.
3. Altman DG: Statistics and ethics in medical research: III. How large a sample? Br Med J 1980;281:1336-1338.

4. Brown GW: Sample size. Am J Dis Child 1988;142:1213-1215.
5. Boag JW, Haybittle JL, Fowler JF, et al: The number of patients required in a clinical trial. Br J Radiol 1971;44:122-125.
6. Freiman JA, Chalmers TC, Smith H, et al: The importance of beta, the type II error and sample size in the design and interpretation of the randomized control trial. N Engl J Med 1978;299:690-694.
7. Wald A: Sequential Analysis. New York, John Wiley & Sons, 1947.
8. Armitage P: Sequential Medical Trials. Oxford, Blackwell Scientific Publications, 1960.
9. Whitehead J: The Design and Analysis of Sequential Medical Trials. New York, John Wiley & Sons, 1983.
10. Wetherill GB, Glazebrook KD: Sequential Methods in Statistics, ed 3. London, Chapman and Hall, 1986.
11. Anscombe FJ: Sequential medical trials. J Am Stat Assoc 1963;58:365-383.
12. O'Brien PC, Shampo MA: Statistics for clinicians: 12. Sequential methods. Mayo Clin Proc 1981;56:753-754.
13. Armitage P: Sequential methods in clinical trials. Am J Public Health 1958;48:1395-1402.
14. Whitehead J, Jones D: The analysis of sequential clinical trials. Biometrika 1979;66:443-445.
15. O'Brien PC, Shampo MA: Statistical considerations for performing multiple tests in a single experiment: 3. Repeated measures over time. Mayo Clin Proc 1988;63:918-920.
16. O'Brien PC, Shampo MA: Statistical considerations for performing multiple tests in a single experiment: 6. Testing accumulating data repeatedly over time. Mayo Clin Proc 1988;63:1245-1250.
17. Brown GW: Errors, types I and II. Am J Dis Child 1983;137:586-591.
18. Detsky AS, Sackett DL: When was a "negative" clinical trial big enough? Arch Intern Med 1985;145:709-712.
19. Makuch RW, Johnson MF: Some issues in the design and interpretation of "negative" clinical studies. Arch Intern Med 1986;146:986-989.
20. Brown CG, Kelen GD, Ashton JJ, et al: The beta error and sample size determination in clinical trials in emergency medicine. Ann Emerg Med 1987;16:183-187.
21. Whitehead J: On the bias of maximum likelihood estimation following a sequential test. Biometrika 1986;73:573-581.
22. Whitehead J: Supplementary analysis at the conclusion of a sequential clinical trial. Biometrics 1986;42:461-471.
23. Szmuness W, Stevens CE, Harley EJ, et al: Hepatitis B vaccine: Demonstration of efficacy in a controlled clinical trial in a high-risk population in the United States. N Engl J Med 1980;303:833-841.
24. Bickell WH, Pepe PE, Bailey ML, et al: Randomized trial of pneumatic antishock garments in the prehospital management of penetrating abdominal trauma. Ann Emerg Med 1987;16:653-658.
25. Press WH, Flannery BP, Teukolsky SA, et al: Numerical Recipes. Cambridge, UK, Cambridge University Press, 1986.
26. Whitehead J, Marek P: A FORTRAN program for the design and analysis of sequential clinical trials. Comput Biomed Res 1985;18:176-183.
27. Whitehead J, Brunier H: The PEST 2.0 Manual. Reading, UK, University of Reading, 1989.
28. Lewis HD, Davis JW, Archibald DG, et al: Protective effects of aspirin against acute myocardial infarction and death in men with unstable angina. N Engl J Med 1983;309:396-403.
29. Storb R, Deeg HJ, Whitehead J, et al: Methotrexate and cyclosporin compared with cyclosporin alone for prophylaxis of acute graft versus host disease after marrow transplantation for leukemia. N Engl J Med 1986;314:729-735.

APPENDIX A
The equation for the upper boundary is

$$Z = a + \tfrac{1}{4}\theta_r V \tag{4}$$

and the equation for the lower boundary is

$$Z = -a + \tfrac{3}{4}\theta_r V \tag{5}$$

where

$$a = \frac{2}{\theta_r}\ln\left(\frac{1}{2\alpha}\right) - 0.583\sqrt{I} \tag{6}$$

and α and β are assumed to be equal. The factor I is equal to the average increment in V after each patient.9 The value of I for each set of simulations was obtained by first performing the simulation with I = 0 and measuring the average increment in V. The simulations were then repeated with the correct value of I, which ranged from 0.021 to 0.049. The constant I was set to 0 in the calculation of the boundary lines shown (Figure 2).
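As an illustration of Equations 4 to 6, the following Python sketch builds the two boundary lines and checks a single (V, Z) point against them. It is a minimal sketch, not the program used for the simulations: the function names, the value theta_r = 0.7, and the value 0.03 for the factor I are hypothetical choices made for the example (0.03 simply lies inside the 0.021 to 0.049 range quoted above), and the labels returned by check_point only name the boundary that was crossed.

```python
import math


def triangular_boundaries(theta_r, alpha, increment=0.0):
    """Build the boundary lines of Equations 4-6 (alpha assumed equal to beta).

    theta_r   : the theta_r of Equations 4-6 (must be positive)
    alpha     : type I error risk (the text sets alpha = beta = 0.05)
    increment : the factor I, the average increment in V per patient
    """
    # Equation 6: intercept, reduced by the 0.583 * sqrt(I) correction.
    a = (2.0 / theta_r) * math.log(1.0 / (2.0 * alpha)) - 0.583 * math.sqrt(increment)

    def upper(v):
        # Equation 4: upper boundary line.
        return a + 0.25 * theta_r * v

    def lower(v):
        # Equation 5: lower boundary line.
        return -a + 0.75 * theta_r * v

    return a, upper, lower


def check_point(z, v, upper, lower):
    """Report where the current (V, Z) point lies relative to the boundaries."""
    if z >= upper(v):
        return "crossed upper boundary"
    if z <= lower(v):
        return "crossed lower boundary"
    return "continue"


# Hypothetical design values, for illustration only.
# First pass with I = 0, second pass with a measured average increment in V.
a0, upper0, lower0 = triangular_boundaries(theta_r=0.7, alpha=0.05)
a1, upper1, lower1 = triangular_boundaries(theta_r=0.7, alpha=0.05, increment=0.03)

# The two lines intersect at V = 4a / theta_r, which bounds the trial length.
v_max = 4.0 * a1 / 0.7
print(check_point(z=1.0, v=5.0, upper=upper1, lower=lower1))
```

Setting the two line equations equal shows that the boundaries meet at V = 4a/θr, so a trial run against them cannot continue indefinitely; a nonzero I lowers a and therefore brings both lines slightly closer together.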

APPENDIX B
This article makes frequent use of both percentage efficacies and θ as measures of treatment effect. Given that Pc and Pt are the probabilities of a successful outcome with the control and test treatments, respectively, then θ, also called the log-odds-ratio, is given by Equation 1. On the other hand, given a measured value of θ from a sequential trial, it is often useful to express this as a percent efficacy. The value of Pt implied by the measured value of θ may be obtained from Pc and θ using

$$P_t = \frac{P_c e^{\theta}}{P_c\left(e^{\theta} - 1\right) + 1} \tag{7}$$

and the percent efficacy, PE, is given by

$$PE = 100\,\frac{P_t - P_c}{1 - P_c} \tag{8}$$

Using Pc and θ in this way to obtain Pt is only valid if Pc is so well known that it may be regarded as "fixed." In the majority of situations, θ is a better measure of treatment efficacy, and there is little reason to calculate explicit values for Pt. The estimate of θ obtained at the termination of a sequential trial is unbiased (equally likely to fall on either side of the true value).9,22 One cannot obtain an unbiased estimate of treatment efficacy by simply using the observed values of Pc and Pt at trial termination.9,21,22
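The conversions in Equations 7 and 8 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the function names and the probabilities 0.80 and 0.90 are hypothetical, and log_odds_ratio uses the form of θ that Equation 7 inverts, which is assumed here to match Equation 1.

```python
import math


def log_odds_ratio(p_c, p_t):
    """theta in the form that Equation 7 inverts (assumed to match Equation 1)."""
    return math.log(p_t * (1.0 - p_c) / (p_c * (1.0 - p_t)))


def implied_p_t(p_c, theta):
    """Equation 7: P_t implied by a fixed P_c and a measured theta."""
    e_theta = math.exp(theta)
    return p_c * e_theta / (p_c * (e_theta - 1.0) + 1.0)


def percent_efficacy(p_c, p_t):
    """Equation 8: percent efficacy from the two success probabilities."""
    return 100.0 * (p_t - p_c) / (1.0 - p_c)


# Hypothetical probabilities of a successful outcome, for illustration only.
p_c, p_t = 0.80, 0.90
theta = log_odds_ratio(p_c, p_t)
p_t_back = implied_p_t(p_c, theta)    # recovers p_t up to rounding
pe = percent_efficacy(p_c, p_t_back)  # about 50: failures fall from 20% to 10%
print(theta, p_t_back, pe)
```

With these values the failure rate halves, from 20% to 10%, so the percent efficacy is about 50. As the text cautions, this conversion is only meaningful when Pc can be treated as fixed.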
