STATISTICS IN MEDICINE, VOL. 10, 1647-1664 (1991)

METHODS FOR DOSE FINDING STUDIES IN CANCER CLINICAL TRIALS: A REVIEW AND RESULTS OF A MONTE CARLO STUDY

JOHN O'QUIGLEY AND SYLVIE CHEVRET

Département de Biostatistique et Informatique Médicale, Hôpital Saint-Louis, 1 avenue Claude Vellefaux, 75010 Paris, France

SUMMARY

We discuss some of the statistical approaches to the design and analysis of phase I clinical trials in cancer. An attempt is made to identify the issues, particular to this type of trial, that should be addressed by an appropriate methodology. A brief review of schemes currently in use is provided together with our views of the extent to which any particular scheme addresses the main issues. Some simulations are provided together with graphical illustration of the operating characteristics of the particular methods. It appears that the continual reassessment method is preferable to other contending schemes.

1. INTRODUCTION

In the development of a new therapeutic agent, phase I clinical trials are those studies where the drug is administered to humans for the first time. Their aim is to establish the range of safe dosage. This first administration in man must obviously be conducted as safely as possible. To maximize the level of safety, most phase I trials are similar in design and are based on three main features. First, the initial choice of the different dose levels to be tested takes into account previous knowledge about the potential toxicity of the new drug regimen, available from animal studies. Secondly, these trials are conducted in healthy volunteers. Thirdly, most of the trials are based on slow dose escalation. Even if it has not been demonstrated how these three features explicitly address the considerations mentioned above, they seem sufficient for many practical situations given the rarity of observed toxic effects.1

In life threatening diseases such as cancer, the situation is rather different. The efficacy of chemotherapy, or of any cancer treatment, is frequently associated with a non-negligible risk of severe toxic effects, often fatal, so that, ethically, the initial administration of such drugs cannot be investigated in healthy volunteers, but only in cancer patients.2 Furthermore, these patients will often be at very high risk of death in the short term under all standard therapies, some of which may have already failed.3 A slow dose escalation, as in traditional schemes, may then put patients at increased risk of death from the disease. The main points we should bear in mind when carrying out a phase I clinical trial in cancer are then:

1. At low doses, the new therapy will probably be ineffective.
2. At high doses, any proposed new therapy may have severe toxic effects.
3. Little is known about the relationship between dose and probability of toxicity.
4. We need to reconcile the risks of toxicity to patients with the potential benefit to these same patients.
5. The design should be efficient and use no more patients than necessary.

0277-6715/91/111647-18$09.00 © 1991 by John Wiley & Sons, Ltd.

1648

J. O'QUIGLEY AND S. CHEVRET

Apart from the second, the usual designs for phase I studies do not explicitly address any of the above points. In addition, we have to keep in mind that the main objective of phase I trials in cancer is to estimate the appropriate dose range for possible efficacy with tolerable toxicity. It is unlikely that we will be able to obtain much information on the overall dose-toxicity relationship and, so, a more limited but realistic goal ought to be the point estimation of the dose associated with a targeted probability of toxicity.6 Phase I trials sometimes attempt to reach the 'median effective dose' (MED), which is defined as the dose which elicits a toxic response in 50 per cent of treated patients.7 More commonly, the targeted dose is less specific, and is referred to as the 'maximal tolerated dose' (MTD), as described in dose ranging studies in animals. The definition of the MTD varies with investigators,8 some of whom define it to be the maximal dose that was administered during the trial without eliciting any toxicity.9 Others define the MTD as the minimal administered dose associated with a toxic response.2 These definitions will be highly data dependent; in particular, they are strongly dependent on sample size. Storer8 takes the MTD to mean some percentile of the dose-toxicity relationship, about the 33rd. More generally, it would seem desirable to estimate some arbitrary percentile chosen by the experimenter.

To attain any of these targeted doses, several methods of carrying out and analysing phase I trials in cancer are used.6 Designs are usually based on slow dose escalation, employing various decision rules. The most commonly used 'up-and-down' designs involve establishing and assigning each patient's dose on the basis of observations on the toxicity of the most recently entered patient(s). Statistical methods have been developed in an attempt to estimate the targeted dose. The 'up-and-down' methods recommend fitting a parametric model after the trial. Stochastic approximation designs usually use the last design point as the estimator, although parametric model fitting with such designs has also been proposed.12,13 Indeed, the separation of the design and estimation procedures allows greater flexibility. Guidelines for the clinical evaluation of antineoplastic drugs edited by the FDA5 recommend that 'statistical procedures may not be essential at this stage', so that, in practice, targeted doses are mainly estimated empirically, for example by the last dose level reached.

The continual reassessment method (CRM) was proposed to estimate the dose associated with some 'acceptable' targeted toxicity level, and to address explicitly the five features of a cancer phase I trial enumerated above.14 Patients are included sequentially and, using Bayes formula, together with all the available information on observed toxicities and doses, we estimate the relative toxic probability associated with each dose level. Thereafter, each patient is treated at a level for which the estimated toxicity is closest to some fixed targeted level.

In this paper, we aim to present and compare the different available methods for conducting and analysing phase I clinical trials in cancer. In Section 2 we review the schemes currently in use and the extent to which they address the five points above. In Section 3 we describe a simulation study to assess the relative small sample performance of CRM, as well as the five classical designs described in Section 2. Finally, as an illustration, and to obtain a more qualitative feel of such relative characteristics, we chose, from the simulations, an example of how each method might behave in a number of different situations. The range of situations covers the case in which the highest used level is less toxic than the targeted toxicity level to the case in which the lowest used level is more toxic than the targeted level.

2. METHODS

Typically, phase I studies proceed by administering various dose levels of the drug regimen to patient(s) and subsequently observing their toxic response, most often characterized in terms of a simple dichotomy: for example, aplasia or not. We denote the dose levels by xi (i = 1, ..., k), and let Yj be a binary random variable (0, 1) where 1 indicates severe toxic response for the jth (j = 1, ..., n) patient. We also use the notation x(j) for the dose level, xi (i = 1, ..., k), at which the jth patient is treated, and xR for the recommended dose level at the end of the trial. For most designs, it is usual that xR = x(n + 1).

2.1. Simple up-and-down design

This dose-escalation design was introduced by Dixon and Mood7 to estimate the MED. Several doses (x1, x2, ..., xk), equally spaced, are chosen. The first patient is given the first level, x(1) = x1, and thereafter patients are treated according to the following decision rule:

    x(j + 1) = x(j) + dI(x(j) < xk)   if Yj = 0,
    x(j + 1) = x(j) - dI(x(j) > x1)   if Yj = 1,

where d = xi+1 - xi (i = 1, ..., k - 1) and I is the indicator variable taking the value 1 when true and 0 otherwise. Classically, logarithmic units are used to achieve the usual dose-doubling scheme 1, 2, 4, 8, 16, ... For cases in which the x are qualitative, we can fix d = 1. Dixon and Mood7 recommended fitting a probit model after collecting the data, and using an approximation of the maximum likelihood estimator to determine the required percentile. However, in practice, the design is used without carrying out any model fitting: the stopping dose is used to estimate the MED.

The principal advantage of the method is its simplicity and its great ease of application. However, several limitations can be pointed out immediately. First, it samples around the median whereas we may, in practice, wish to sample around a different percentile, more likely the 33rd or lower. Secondly, this design supposes that the tested doses are fixed in advance and equally spaced. Even if Dixon and Mood initially suggested that the distance d between two doses should be chosen approximately equal to the standard deviation of the dose-toxicity curve, its unavailability will lead investigators to choose d rather arbitrarily.16 Thirdly, Dixon and Mood recommended that this design be restricted to sample sizes of over 50 observations. Unfortunately, in most phase I trials in cancer, only a small number of patients are available, often no more than about twelve. Furthermore, several authors have already pointed out the poor small sample statistical properties of this method. Brownlee et al.17 showed that the method was effective in estimating the MED only if the starting dose, x(1), was chosen near to the MED, and if the interval, d, between two doses was small enough.
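The decision rule above is easy to simulate. The following sketch is ours, not from the paper: the true toxicity probabilities p_tox are assumed known purely for simulation purposes, and the function name is hypothetical.

```python
import random

def up_and_down(doses, p_tox, n=25, seed=0):
    """Simulate Dixon and Mood's simple up-and-down rule.

    doses: equally spaced dose levels x_1, ..., x_k
    p_tox: assumed true toxicity probability at each level (illustration only)
    Returns the sequence of administered level indices and the stopping dose,
    that is, x(n + 1), which is used to estimate the MED.
    """
    rng = random.Random(seed)
    i = 0                    # start at the lowest level, x(1) = x_1
    path = []
    for _ in range(n):
        path.append(i)
        toxic = rng.random() < p_tox[i]
        if toxic:
            i = max(i - 1, 0)                # step down, unless already at x_1
        else:
            i = min(i + 1, len(doses) - 1)   # step up, unless already at x_k
    return path, doses[i]
```

Running the sketch with extreme toxicity curves (all zero or all one) reproduces the boundary behaviour of the rule: the design walks to the top of the panel, or stays at the lowest dose.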
This was later confirmed by Wetherill,18 who studied the asymptotic properties of the Dixon and Mood design: he specified a logistic model relating the toxic probability to the dose level, and then estimated the unknown parameters using the summary statistics provided by the observations. He showed that this estimate of the MED is biased and that the strength of the bias depends on two main factors: (i) the interval, d, between two doses (the greater the interval, the greater the bias), and (ii) the starting dose, x(1), so that even if the sample size is large, the estimate is significantly biased when x(1) is not close to the MED. These properties are disturbing since it is difficult to choose an appropriate value for d, and we would prefer the starting dose to be less crucial since we are most interested in situations where our initial guess turns out to be wrong. To improve the operating characteristics of this method, several alternatives have been proposed, those described in the following section still being widely used in practice.

2.2. Grouped up-and-down designs

Wetherill18 described various possible modifications of the above design. Most of his proposals were close to Dixon and Mood's scheme, except that the decision rule was based on the observed


response of two patients instead of one (grouped designs). When the two responses observed after administration of the dose level xi are similar (for example, two toxicities) the two following patients are treated according to Dixon and Mood's rule (in this example, they receive the next lower level, xi - d). When the two responses differ (that is, one toxicity and one non-response), then one more patient is treated at the same dose level, xi. However, Wetherill noticed that none of these schemes succeeded in improving the asymptotic properties of Dixon and Mood's design and, for instance, there was still a loss of efficiency when the choice of the initial dose, x(1), was 'incorrect', with little precision as to what is really meant by 'incorrect'.

Bolognese compared three up-and-down dose-ranging designs, namely the classical design of Dixon and Mood7 and two modified designs, both based on the possibility of repeated observations on a smaller number of patients. The underlying idea, rather than leaning on a dose-toxicity model, however approximate, utilizes the notions of threshold and plateau doses. The threshold was defined as the lowest dose at which a subject will respond (that is, the lowest dose level associated with a probability of toxic response > 0), and the plateau dose as the lowest dose to which all subjects will respond (that is, the lowest dose level associated with a probability of 1). Estimation of the threshold and plateau doses was therefore obtained empirically, without any modelling of the data, and based on the response of the first patient (in the first period). If no response was observed for this first patient, then the estimate of the threshold dose was given by the lowest dose at which a response was observed for the other patients, and the plateau dose by one dose higher than the highest dose given in the trial. Otherwise, the estimates were (i) the dose one level higher than the lowest dose given and (ii) the highest administered dose at which no response was observed, respectively.

The subjectivity of the decision rules that are used to estimate the threshold and plateau doses limits the practical value of these designs. Moreover, a Monte Carlo comparison did not demonstrate improved efficiency of these designs compared with that of Dixon and Mood. What is more, the latter, involving one observation on each of 30 subjects, was considered more efficient than designs involving three observations on each of 10 subjects. Finally, in our particular context, it will not, in general, be feasible to carry out more than one experiment on any patient.

Recently, Storer8 compared four grouped designs in the context of dose finding studies in cancer, all of which aim to estimate the MTD, which, in most cases, was taken as the dose associated with a toxic probability of 33 per cent. Using the same terminology as Storer, these designs can be described as follows:

1. Design A consists of including a group of three patients at the lowest dose level. If no toxicity

occurs, another group of three patients is given the next highest dose level. Otherwise, an additional three are treated at the same dose level. If fewer than two toxicities are observed among the six, then the dose is escalated to the next highest level; otherwise, the trial is terminated. The actual stopping dose level is the estimate of the MTD.
2. Design B is the up-and-down scheme as described above.7
3. Design C is similar to design B, except that two consecutive non-toxic responses must be observed before escalation occurs. This is summarized as follows: treat the first two patients at level x1 and next, for j = 3, ..., n,

    x(j) = x(j - 1) + dI(x(j - 1) < xk)   if Yj-1 + Yj-2 = 0,
    x(j) = x(j - 1) - dI(x(j - 1) > x1)   if Yj-1 = 1,

and x(j) = x(j - 1) otherwise.
4. Design D includes groups of three patients at each dose level. The trial starts at x1 and, thereafter, each successive group of three is given the next highest dose level if no toxicity is observed in the current group, the next lowest level if more than one toxicity is observed, and the same level otherwise.

All of the designs were implemented with equally spaced dose levels fixed in advance, and with fixed sample sizes. To obtain an estimate of the MTD, Storer proposed fitting the dose-toxicity observations using a logistic model, the parameters being estimated by maximum likelihood. Although this is clearly a sensible approach to estimation once the trial is completed, its use in the course of the trial would have no influence on subsequent dose allocation. As with the simple up-and-down design, the grouped up-and-down designs have the advantage of simplicity and ease of application. However, the disadvantages of the simple up-and-down design are only partially overcome. The designs tend to sample around the 33rd percentile rather than the median. Design A establishes a stopping rule that might appear attractive to investigators, given legitimate concern to minimize the number of included patients. However, Storer8 and O'Quigley et al.14 showed this design to be inconsistent, so that we did not retain it for further study. Finally, Storer suggested using a combination of Dixon and Mood's design with design C or D; that is, start out using the simple up-and-down design (design B) and, as soon as some toxicity is observed, switch to a grouped up-and-down design (either C or D). Storer showed that these two-stage designs performed better than the single-stage designs, and could be of some help when the initial dose is far below the targeted toxicity level. In summary, all the variants of the initial up-and-down design established by Dixon and Mood failed to produce a clear-cut improvement over the properties of the initial design; estimation is usually biased or inconsistent.
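Design C, for example, can be simulated directly from its decision rule. The sketch below is our illustration (not Storer's code); the true toxicity probabilities are assumptions supplied for simulation only.

```python
import random

def storer_design_c(p_tox, n=25, seed=1):
    """Illustrative simulation of a grouped design in the spirit of Storer's
    design C: de-escalate after any toxicity, escalate only after two
    consecutive non-toxic responses, otherwise stay at the current level.

    p_tox: assumed true toxicity probability at each of the k levels.
    Returns the sequence of administered level indices and the level
    that would be allocated to patient n + 1.
    """
    rng = random.Random(seed)
    k = len(p_tox)
    level = 0
    outcomes = []            # Y_1, ..., Y_n
    path = []
    for j in range(n):
        path.append(level)
        y = 1 if rng.random() < p_tox[level] else 0
        outcomes.append(y)
        if j == 0:
            continue         # the first two patients both receive x_1
        if y == 1:
            level = max(level - 1, 0)        # any toxicity: step down
        elif outcomes[-1] + outcomes[-2] == 0:
            level = min(level + 1, k - 1)    # two consecutive 0s: step up
        # otherwise stay at the same level
    return path, level
```

With no toxicity at all, escalation is delayed by one patient relative to design B, which is the intended conservatism of the grouped rule.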

2.3. Stochastic approximation methods

A class of sequential designs, developed by Robbins and Monro for estimating the root of an unknown regression function, can be adapted for use in phase I trials. The methods are based on stochastic approximation, and were first proposed to estimate the dose x* associated with some probability, θ, of an observable output, which is, in our context, toxic response. The data consist of a series of binary responses Yj (j = 1, ..., n), each of which is taken at a dose level x(j) such that:

    x(j + 1) = x(j) - aj(Yj - θ),    (4)

where aj is a sequence of positive numbers converging to 0. The above process establishes a decision rule which could be used for phase I trials, and which is based on previous observations, as are the up-and-down schemes. The constants aj are chosen so that successive changes in level become smaller and observations converge to the true value x*. After n patients have been included, the recommended dose is usually taken to be the last design point, that is, xR = x(n + 1). By contrast to the up-and-down designs, the Robbins and Monro method takes into account the total accumulated information before allocating a dose level to the most recently included patient. Moreover, as opposed to up-and-down schemes, the decision rule, as expressed in


equation (4), also incorporates the targeted probability, θ, which the experimenter can fix in view of the particular application. This cannot be done with the Dixon and Mood design. Although a little more involved than the designs of the previous two sections, the iterative calculation is, none the less, fairly straightforward.

Several points are worth noting. The principal difficulty with the Robbins-Monro scheme is in the specification of the sequence aj. Classical stochastic approximation methods use aj = cj^(-1), where c is a constant, minimizing the large sample variance of xR.19 Assuming linearity of the dose-toxicity relationship in the neighbourhood of x*, Hodges and Lehmann,20 Sacks,21 and Wetherill18 recommended as a choice for c the inverse of the slope, β, of the dose-toxicity curve, that is, c = β^(-1) = {∂ψ(x)/∂x}^(-1) evaluated at x = x*, where ψ(x) denotes the probability of toxicity at dose x. However, the latter is not known, so that it is usually impossible to place our observations to get the exact optima in this way. To overcome the problems of determination of the constants aj, Anbar9,10 suggested an adaptive design in which aj = (j βj)^(-1), where the slope βj is estimated sequentially via

    βj = Σ {x(i) - x̄j}{Yi - Ȳj} / Σ {x(i) - x̄j}²,

the sums running over i = 1, ..., j - 1, and where x̄j and Ȳj denote the corresponding means of the design points and responses. Since, in the region of x*, βj will converge to the true slope, aj converges to the value minimizing the large sample variance. Unfortunately, the method is very unstable in the early stages of the design. An approach to overcome this instability has been suggested by Wu, although, as with Anbar's scheme, it is necessary to observe some toxicity and a change in dose levels before being able to apply it.

Small sample properties of the Robbins and Monro method were studied by Wetherill.18 He showed that the loss of efficiency was related to the lack of linearity of the tolerance curve in the neighbourhood of x*, and that the strength of the bias was related to the values of x(1) and c. Adopting a Monte Carlo method, we saw that, if the initial guess x(1) was wrong, that is, far from x*, and the constant c too small, then the bias in the estimator was quite large even for large sample sizes.

Although the stochastic approximation method was developed for use with a continuous range of doses, the unavailability of such a range in practice means we will work with a small number of discrete levels for which the method is not ideally suited. Here, in order to compare the stochastic approximation approach with other designs (see Section 3), we proceeded as follows. First, we used a discrete panel of doses. The decision rule was modified in that we computed the exact value of the current recommended dose given by equation (4), as though we were working with a continuous range of values of xi. Then, instead of taking as design point the value obtained from this calculation, we took the closest point to it from our limited set of discrete values.

2.4. Continual reassessment method

The continual reassessment method (CRM) has been described previously14 and will be summarized here only briefly. The main idea consists in sequentially including patients and then, after each observation, using Bayes formula to estimate the dose level x* associated with a previously fixed targeted toxic probability, θ.


By contrast with the up-and-down methods, any targeted toxic probability, θ, can be chosen (for example, θ = 0.25). Furthermore, CRM does not rely on the scale in which the doses are expressed, so that, for instance, unequally spaced doses and different therapies in combination can be chosen for experimentation. Schematically, the dose levels to be tested by CRM consist of k doses associated with some theoretical levels of toxicity, the only constraint being that these doses are ranked in increasing order of the prior best guesses of the toxic probabilities. These levels are chosen initially by the investigator, through an implicit idea of the dose-toxicity relationship, however imprecise. For instance, investigators may choose six dose levels, x1, x2, x3, x4, x5, x6, that they associate with prior probabilities of toxicity, for example p1 = 0.05, p2 = 0.10, p3 = 0.25, p4 = 0.35, p5 = 0.50 and p6 = 0.70, respectively. This differs from the stochastic approximation designs where, at least theoretically, the doses to be administered cannot be defined before experimentation.

A simple one-parameter model, ψa(x), modelling the dose-toxicity relationship, is chosen. Our uncertainty in the nature of the true dose-toxicity relationship is expressed via a prior density, g(a), for a. The one-parameter model must be sufficiently flexible to approximate the true dose-toxicity curve in the neighbourhood of the targeted toxicity value, even if it performs poorly at values distant from this. Many choices are possible. In our previous work, we used:

    ψa(xi) = {(tanh xi + 1)/2}^a,    (6)

whereas here we use

    ψa(xi) = exp(a0 + a xi) / {1 + exp(a0 + a xi)},    (7)

with a0 fixed, the choice of a one-parameter logistic model being made to facilitate comparison with the other models. In both cases, we took the initial prior g(a) = exp(-a). We use this model and our best prior guesses to redefine the original units (possibly multidimensional) in which the doses are expressed by:

    xi = ψ1^(-1)(pi),    i = 1, ..., k.    (8)

Subsequently, we work with these new units.

The main characteristic of CRM is to mix the design and estimation procedures of the phase I trial. Experimentation begins at xs (1 ≤ s ≤ k), the level at which the corresponding prior estimate of toxic probability is close to θ. For instance, considering the above example, the first patient will be administered the third dose of the panel, that is, x3. This means administering this dose to the first patient (j = 1), and then observing his response to treatment in terms of severe toxicity. Let Rj denote the set {x(i), Yi; i = 1, ..., j - 1} and let f(a, Rj), a non-negative function summarizing accumulated information on the parameter a, be our current prior before experimenting on the jth subject. Given the observed response of this subject, we can calculate f(a, Rj+1) from f(a, Rj) via Bayes formula. Furthermore, this sequential scheme enables continuous re-evaluation of our estimates of toxic probabilities, denoted θij, at the various dose levels, where

    θij = ∫ ψa(xi) f(a, Rj) da,    i = 1, ..., k.    (9)

We allocate the jth patient to treatment level xi such that θij is as close as possible to the targeted level. This process is continued until the last patient is entered. The stopping dose, that is the recommended dose computed after n inclusions, estimates x*. Point and interval estimation at the recommended dose has been investigated22 but is not described here.
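One CRM step can be sketched numerically. The following is our illustration, not the authors' code, under the tanh model (6) with prior g(a) = exp(-a); the function names, prior guesses, and midpoint-rule integration grid over a are all our assumptions.

```python
import math

# Investigators' prior guesses p_i for the six levels (assumed, as in the text).
PRIOR_PROBS = [0.05, 0.10, 0.25, 0.35, 0.50, 0.70]

def psi(x, a):
    # One-parameter working model (6): psi_a(x) = {(tanh x + 1)/2}^a.
    return ((math.tanh(x) + 1.0) / 2.0) ** a

# Recode doses via (8) so that psi(x_i, 1) = p_i, i.e. x_i = atanh(2 p_i - 1).
DOSES = [math.atanh(2.0 * p - 1.0) for p in PRIOR_PROBS]

def crm_next_level(history, theta, a_max=10.0, m=2000):
    """history: list of (level_index, toxic) pairs observed so far.
    Returns the index of the level whose current toxicity estimate,
    theta_ij of equation (9), is closest to the target theta."""
    da = a_max / m
    a_grid = [(i + 0.5) * da for i in range(m)]
    # f(a, R_j) is proportional to the prior exp(-a) times the likelihood.
    weights = []
    for a in a_grid:
        w = math.exp(-a)
        for lev, tox in history:
            p = psi(DOSES[lev], a)
            w *= p if tox else (1.0 - p)
        weights.append(w)
    norm = sum(weights) * da
    # theta_ij = integral of psi(x_i, a) f(a, R_j) da, for each level i.
    est = [sum(psi(x, a) * w for a, w in zip(a_grid, weights)) * da / norm
           for x in DOSES]
    return min(range(len(DOSES)), key=lambda i: abs(est[i] - theta))
```

After a run of toxicities the posterior mass moves to small a, pushing every estimated θij upwards, so the recommendation drops to the lowest level; after a run of non-toxic responses the recommendation moves up the panel.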


3. OPERATING CHARACTERISTICS

To assess the relative performance of the continual reassessment method compared with the different approaches presented above, we carried out a simulation study involving several sampling schemes.

3.1. Simulation set-up

Six estimators, denoted a, b, c, d, e, and f, of the targeted dose level x* were compared in this simulation study. They correspond to six different methods of conducting and analysing phase I clinical trials in cancer, as described below. The first design (a) refers to the simple 'up-and-down' method initially described by Dixon and Mood7 (see Section 2.1). The second (b) and third (c) schemes correspond to two grouped 'up-and-down' designs proposed by Storer,8 noted above as schemes C and D, respectively (see Section 2.2). Designs (d) and (e) are based on the Robbins-Monro process previously expressed in relation (4), with aj = cj^(-1), where c was defined either as a constant, c = 1 (design (d)), or as the inverse of the slope in the neighbourhood of x*, which was estimated through the use of the adaptive design described by Anbar9,10 (design (e)) (see Section 2.3). The last (f) is the estimator given by CRM14 (see Section 2.4). The constant, a0, of the one-parameter logistic model (see formula (7)) was chosen to obtain a vague prior, the a priori Bayesian 95 per cent confidence intervals of each pi almost covering the whole range (0, 1).

In all of these six designs, we estimated x* by the recommended dose level after n inclusions, that is, the dose xR = x(n + 1). This corresponds to current practice, even if the theoretical possibility of separate estimation via a parametric model, once the trial is completed, exists. The sample size used was n = 25. Larger sample sizes were not considered here. We simulated experiments with three targeted probabilities of toxic response. The choice of θ = 0.50 corresponds to conditions in which the simple up-and-down scheme was initially built to estimate the MED.7 Storer8 described estimators of the MTD which he defined as the 33rd percentile of the dose-toxicity relationship, so that the second chosen value of θ was 0.33. Finally, we chose a lower value, θ = 0.25, which is more plausible for many phase I trials in cancer.

Six dose levels associated with various theoretical levels of toxic response probability were used. For CRM, we assumed the prior estimates of the toxic probabilities to be respectively p1 = 0.05, p2 = 0.10, p3 = 0.25, p4 = 0.35, p5 = 0.50, and p6 = 0.70. Six general situations were studied. In the first, the model generating the data was the same as that of the prior estimates. In the others, the true toxic probabilities were either higher (situations 2, 3, 4) or lower (situations 5, 6) than the prior estimates. In each case, 500 replications were made, in order to estimate the first decimal place with sufficient accuracy. The logistic model (see formula (7)) gives (i) a scale in which doses are expressed on the basis of the six prior estimates, for CRM, as well as (ii) a scale xi for the other methods.

In summary, we computed six estimates for n = 25 and θ = 0.50, 0.33, and 0.25, in the six situations defined above. The main goal in each case was to estimate the dose level associated with the targeted toxic probability, θ. The comparisons of the various estimators were based on several criteria. For each case studied, we computed the overall observed toxicities, the percentage of experimentation at each dose and the percentage of recommendation for each dose. Finally, we look at a single simulated phase I trial in which the recommended dose was studied as a function of the number of included patients.
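The discretized Robbins-Monro allocation used for designs (d) and (e) can be sketched as follows. This is our illustration of one reading of the projection rule described in Section 2.3, not the authors' code; the dose panel, toxicity probabilities, and function name are assumptions.

```python
import random

def robbins_monro_discrete(doses, p_tox, theta, n=25, c=1.0, seed=2):
    """Sketch of a discretized Robbins-Monro design: compute the continuous
    update x(j+1) = x(j) - (c/j)(Yj - theta) of equation (4), then administer
    the panel dose closest to it. p_tox holds assumed true toxicity
    probabilities at the panel doses, for illustration only."""
    rng = random.Random(seed)
    level = 0                                  # start at the lowest dose
    for j in range(1, n + 1):
        y = 1 if rng.random() < p_tox[level] else 0
        x_next = doses[level] - (c / j) * (y - theta)   # a_j = c j^(-1)
        # project the continuous recommendation back onto the discrete panel
        level = min(range(len(doses)), key=lambda i: abs(doses[i] - x_next))
    return doses[level]                        # recommended dose x_R = x(n+1)
```

Because the step size c/j shrinks with j, the projected design point eventually stops moving between panel doses, which is one reason the method is not ideally suited to a small discrete panel.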



3.2. Results

We first used the targeted probability of 0.50, to simulate the situations where phase I trials aim to estimate the 50th percentile (MED) of the dose-toxicity relationship. CRM was compared with the other designs which are used for this purpose, that is, the Dixon and Mood scheme and the Robbins-Monro methods. Table I shows how these four methods, namely, a, d, e, and f, performed under the six general situations described above. In the first situation (first four rows of Table I), the model generating the data being the same as that of our prior estimates, CRM clearly performed better than the other three in terms of dose level recommended at the end of the trial. Indeed, the correct level was recommended 79 per cent of the time with CRM, as compared to 30 per cent with Dixon and Mood's design and 41 per cent with those based on stochastic approximation. Furthermore, this improvement of CRM was not associated with any marked increase in observed toxicities. When the prior estimates do not differ markedly from the true probabilities (situation 2), CRM yielded the highest percentage of correctly recommended levels (71 per cent), in sharp contrast to the classical designs (27 per cent with a and 29 per cent with d).

For the remaining four situations covered in Table I, in which the model generating the data differs markedly from our prior estimates, the main point that emerges is that CRM and Anbar's adaptive design clearly outperform the other designs, that is, Dixon and Mood's and the non-adaptive Robbins-Monro schemes. As regards CRM and Anbar's design, although neither was able to demonstrate a clear-cut improvement in all situations in which the targeted probability was 0.50, CRM on average was to be preferred. When the prior probabilities were rather underestimated (situations 3 and 4), the performance of the classical designs improved, although it was still no better than that of CRM or Anbar's adaptive design. When the prior probabilities were overestimated (situations 5 and 6), CRM and Anbar's schedule yielded similar results.

For the second set of simulations, the targeted probability was chosen to be 0.33, a more realistic figure for phase I dose finding studies in cancer. Five designs were thus compared: those described by Storer, together with the Robbins-Monro designs and CRM. Table II displays the main findings of these simulations. Two points are immediately apparent: (i) the consistently poor performance of the up-and-down designs, and (ii) the overall good performance of CRM. When the prior estimates were close to those generating the data (situations 1 and 2), CRM was better at identifying the correct dose level, which was recommended more than 60 per cent of the time, as compared to a range of 16-45 per cent with the remaining four designs. Furthermore, CRM was more conservative than the traditional designs, often misguidedly believed to be conservative given that they start out at the lowest dose. In situation 2, for instance, the two up-and-down designs described by Storer ended up recommending the highly toxic levels, levels 5 or 6, in over 16 per cent of runs, while for CRM the corresponding percentage was 4 per cent.

In the third situation of Table II, CRM led to the best performance; the correct dose level was recommended with CRM 59 per cent of the time, as compared to a range of 25 per cent to 43 per cent with all the other methods. In the fourth situation of Table II, the targeted level is surrounded by two dose levels that could be considered as 'correct', namely level 1 (0.30) and level 2 (0.40). The methods based on stochastic approximation did poorly here, favouring more than 38 per cent of the time levels associated with a probability of 0.52 or more. By contrast, CRM and the grouped up-and-down designs performed well, although on average CRM was more conservative. In the remaining situations in which the targeted toxicity was 0.33, results for CRM and Anbar's scheme were comparable.

Table I. Comparison of four sequential methods to estimate the targeted probability θ = 0.50, with n = 25 included patients: a, Dixon and Mood; d, Robbins-Monro (c = 1.00); e, Robbins-Monro (Anbar); f, CRM (exponential prior, a0 = 5). For each of the six situations the table reports, by method, the true probabilities of toxicity at x1 to x6, the percentage of experimentation at each xi, the percentage of recommendation at each xi, and the overall observed toxicity (%).
