Biostatistics (2016), 17, 2, pp. 249–263 doi:10.1093/biostatistics/kxv040 Advance Access publication on October 20, 2015

A robust Bayesian dose-finding design for phase I/II clinical trials

SUYU LIU∗
Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
[email protected]

VALEN E. JOHNSON
Department of Statistics, Texas A&M University, College Station, TX, USA

SUMMARY

We propose a Bayesian phase I/II dose-finding trial design that simultaneously accounts for toxicity and efficacy. We model the toxicity and efficacy of investigational doses using a flexible Bayesian dynamic model, which borrows information across doses without imposing stringent parametric assumptions on the shape of the dose–toxicity and dose–efficacy curves. An intuitive utility function that reflects the desirability trade-offs between efficacy and toxicity is used to guide the dose assignment and selection. We also discuss the extension of this design to handle delayed toxicity and efficacy. We conduct extensive simulation studies to examine the operating characteristics of the proposed method under various practical scenarios. The results show that the proposed design possesses good operating characteristics and is robust to the shape of the dose–toxicity and dose–efficacy curves.

Keywords: Adaptive design; Bayesian method; Phase I/II trial; Trade-off; Utility.

1. INTRODUCTION

Testing to evaluate the toxicity and efficacy of novel drug treatments is normally conducted in two separate trials. In the first trial, the phase I trial, several doses of a drug are tested to determine the toxicity profile of the drug and to establish its maximum tolerated dose (MTD). In the second trial, the phase II trial, the efficacy of the drug at the MTD is evaluated by administering it to a small cohort of patients. If the phase II trial is successful, a larger, confirmatory phase III trial may be conducted. There are several disadvantages of conducting separate trials to evaluate efficacy and toxicity. First, the operating characteristics of the actual phase I/II process are not known because the phase II trials do not account for the uncertainty in the MTD established in the phase I trials. This unreliable identification of the MTD occurs because the sample size in a phase I trial is typically small. Consequently, dose adjustments are often needed in phase II if excessive toxicity is encountered. Such dose adjustments may invalidate the planned operating characteristics of phase II trials.

∗ To whom correspondence should be addressed.
© The Author 2015. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: [email protected].


In this article, we describe a single phase I/II trial design to find the dose of an experimental drug that maximizes a utility function. The utility function is defined as a simple trade-off between toxicity and efficacy. The proposed design is adaptive in the sense that each cohort of patients is more likely to be assigned to the dose of the drug that is currently estimated to have the highest utility. This feature of our trial design has important ethical implications, as well as a potentially positive influence on patient recruitment and accrual rates. Independent, default prior densities are used to provide vague descriptions of the expected toxicity and efficacy of each drug dose prior to the onset of the trial. Importantly, these prior densities are monotonic in both toxicity and efficacy, reflecting the fact that both quantities typically increase with dose, particularly in tests of cytotoxic drugs. Aside from monotonicity constraints, however, we impose only vague constraints on the shape of the dose–toxicity and dose–efficacy curves, thus providing design flexibility in allowing the data to influence these relationships. Our design is not suitable for trials involving biological agents that have non-monotonic dose–toxicity and dose–efficacy curves. For design of such trials, see, for example, Hunsberger and others (2005) and Mandrekar and others (2007). Phase I/II designs to address similar issues have been published in the literature. O’Quigley and Zohar (2006) provided a comprehensive review of earlier phase I/II designs. For example, Gooley and others (1994) proposed a phase I/II clinical trial design for a bone marrow transplantation study. That design sought a dose that balanced the risks of two immunologic complications. A few years later, Thall and Russell (1998) proposed a phase I/II design to find a dose that would satisfy both safety and efficacy requirements based on a trinary outcome. 
Braun (2002) extended the continual reassessment method (CRM) (O’Quigley and others, 1990) to account for both toxicity and efficacy. Thall and Cook (2004) proposed a Bayesian phase I/II trial design based on trade-offs between efficacy and toxicity probabilities. Bekele and Shen (2005) investigated a joint distribution of a binary and a continuous outcome by introducing latent variables in a probit model. Yin and others (2006) developed a phase I/II Bayesian dose-finding design that uses toxicity and efficacy odds ratios. Recently, more phase I/II designs have been developed for various clinical settings, including drug-combination trials (Mandrekar and others, 2010; Whitehead and others, 2011; Yuan and Yin, 2011a, 2011b; Cai and others, 2014; Wages and Conaway, 2014), immunotherapy trials (Messer and others, 2010), dose-schedule finding (Thall and others, 2013; Guo and others, 2015), and ordinal outcomes (Houede and others, 2010). A potential issue of many of these phase I/II trial designs is that they assume parametric models for the dose–toxicity and dose–efficacy curves (e.g. logistic model). If the assumed models for dose–toxicity and/or dose–efficacy are mis-specified, the performance of these designs may be severely compromised. Our design is more robust to this issue because it does not assume a parametric model for the shape of the dose–toxicity and dose–efficacy curves. Our approach is non-parametric for dose–toxicity and dose–efficacy curves. Though it is similar in spirit to the curve-free method proposed by Gasparini and Eisele (2000), there are several important differences. First, as we will describe later, our design utilizes (weakly informative) subjective priors elicited from physicians, rather than non-informative priors, and thus avoids the rigidity problem of the curve-free method (Cheung, 2002). 
Secondly, the dose assignment under our method is based on a trade-off (utility) between toxicity and efficacy, which often is not monotonically increasing with dose. This makes dose finding more challenging. Our research is motivated by a phase I/II trial in patients with castrate-resistant prostate cancer. Patients enrolled in the trial experienced progression within 3 months prior to entering the study, and had no standard treatment options available to them. Patients took study medication in cycles of 28 days. The primary objective of the trial was to find a safe and effective dose of the study agent for subsequent clinical investigation. A total of 5 dose levels were investigated, with a maximum sample size of 48 patients treated in cohorts of size 3. A difficulty encountered in the design of this trial was that there was very limited prior information on the shape of the dose–toxicity and dose–efficacy curves. This made the specification of a parametric form to model the dose–toxicity and dose–efficacy curves difficult, and implied that model
mis-specification could potentially affect the operating characteristics of the trial. For these reasons, it was desirable to use a trial design that did not impose a parametric form on these curves.

The rest of the paper is organized as follows. In Section 2, we propose a Bayesian dynamic model for toxicity and efficacy, define a utility function as a trade-off between toxicity and efficacy, and describe the dose-finding algorithm and decision rules for the proposed trial design based on this utility function. In Section 3, we extend the proposed design for delayed outcomes. In Section 4, we present simulation studies to examine the operating characteristics of this design for a prostate cancer trial. We conclude with a brief discussion of our results in Section 5. The program used to simulate the design can be obtained from http://www.stat.tamu.edu/~vjohnson/.

2. METHODS

2.1 Probability model

We let $y_{Ti}$ and $y_{Ei}$ denote the binary toxicity and efficacy outcomes for the $i$th patient, with a value of 1 denoting that the treatment is toxic or efficacious. Let $j = 1, \ldots, J$ denote the $J$ dose levels under investigation, and let $p_{Tj}$ and $p_{Ej}$ denote the toxicity and efficacy probabilities at dose level $j$. We assume that the $p_{Tj}$'s and $p_{Ej}$'s monotonically increase with the dose, which is generally true for cytotoxic agents. We model the distributions of $y_{Ti}$, $y_{Ei}$, $p_{Tj}$, and $p_{Ej}$ using a Bayesian dynamic model as follows:

\[
\begin{aligned}
y_{ki} \mid d = j &\sim \mathrm{Bern}(p_{kj}), & k &\in \{T, E\}, \\
p_{kj} &= p_{k,j-1} + (1 - p_{k,j-1})\,\beta_{kj}, & j &= 2, \ldots, J, \\
p_{k1} &= \beta_{k1}, \\
\beta_{kj} &\sim \mathrm{Beta}(a_{kj}, b_{kj}), & j &= 1, \ldots, J.
\end{aligned}
\tag{2.1}
\]

Here $\mathrm{Bern}(\cdot)$ denotes a Bernoulli distribution, and $\mathrm{Beta}(a_{kj}, b_{kj})$ is a beta distribution with parameters $(a_{kj}, b_{kj})$. The specification of the hyperparameters $a_{kj}$ and $b_{kj}$ is described later. In this model, the toxicity and efficacy probabilities $p_{kj}$ follow a Markov structure: the toxicity (or efficacy) probability of the current dose (i.e. $p_{kj}$) equals the toxicity (or efficacy) probability of the previous dose (i.e. $p_{k,j-1}$) plus a positive random variate generated from a scaled beta distribution. As a result, toxicity and efficacy probabilities monotonically increase with the dose levels, i.e. $p_{k1} < p_{k2} < \cdots < p_{kJ}$.

As an aside, we note that the distribution of $y_E$ and $y_T$ could be modeled jointly. For example, if $\pi_{a,b} = \mathrm{Pr}(Y_E = a, Y_T = b \mid d)$ for $a, b \in \{0, 1\}$ denotes the joint distribution of $y_E$ and $y_T$ given a dose $d$, we could model the joint distribution of $y_E$ and $y_T$ using a Gumbel distribution (Murtaugh and Fisher, 1990) of the form

\[
\pi_{a,b} = (p_E)^{a} (1 - p_E)^{1-a} (p_T)^{b} (1 - p_T)^{1-b} + (-1)^{a+b}\, p_E (1 - p_E)\, p_T (1 - p_T)\, \frac{e^{\gamma} - 1}{e^{\gamma} + 1},
\tag{2.2}
\]

where $\gamma$ parameterizes the association between $y_E$ and $y_T$. In theory, such joint modeling of $y_E$ and $y_T$ could improve the efficiency of the estimates of $p_E$ and $p_T$. However, our empirical experience suggests that such an approach does not improve the performance of trial designs when the number of subjects available at each dose is small to moderate (see Section 4.3).

Returning to the dynamic model (2.1), we note that $p_{kj}$ can be re-expressed as $p_{kj} = 1 - (1 - p_{k,j-1})(1 - \beta_{kj})$, which implies that

\[
p_{kj} = 1 - \prod_{r=1}^{j} (1 - \beta_{kr}), \qquad j = 1, \ldots, J.
\tag{2.3}
\]
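As an illustration of (2.1) to (2.3), the following Python sketch draws one monotone vector of dose-level probabilities from the beta prior and evaluates the four cell probabilities of the Gumbel model (2.2). The function names and the hyperparameter values in the usage note are illustrative assumptions, not part of the authors' program.

```python
import math
import random


def draw_probabilities(a, b, rng):
    """Draw (p_1, ..., p_J) from the prior in (2.1) using the product form (2.3)."""
    survival = 1.0  # running product of (1 - beta_r)
    probs = []
    for a_j, b_j in zip(a, b):
        beta_j = rng.betavariate(a_j, b_j)
        survival *= 1.0 - beta_j
        probs.append(1.0 - survival)
    return probs


def gumbel_joint(p_e, p_t, gamma):
    """Cell probabilities pi[(a, b)] = Pr(Y_E = a, Y_T = b) under (2.2)."""
    assoc = (math.exp(gamma) - 1.0) / (math.exp(gamma) + 1.0)
    cells = {}
    for a in (0, 1):
        for b in (0, 1):
            indep = (p_e ** a) * ((1 - p_e) ** (1 - a)) * (p_t ** b) * ((1 - p_t) ** (1 - b))
            sign = (-1) ** (a + b)
            cells[(a, b)] = indep + sign * p_e * (1 - p_e) * p_t * (1 - p_t) * assoc
    return cells
```

For instance, `draw_probabilities([0.5] * 5, [0.5] * 5, random.Random(0))` returns five non-decreasing values in the unit interval, and the cells returned by `gumbel_joint(0.4, 0.2, 3.0)` sum to one while preserving the marginal probabilities 0.4 and 0.2.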


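Given that $n$ patients have been treated in the trial, the marginal likelihood for the toxicity and efficacy data $\boldsymbol{y} = \{y_{ki}\}$, $k = T$ or $E$, is thus

\[
L(\boldsymbol{y} \mid \boldsymbol{\beta}_T, \boldsymbol{\beta}_E) = \prod_{k \in \{E, T\}} \prod_{i=1}^{n} \Bigl\{ 1 - \prod_{r=1}^{j[i]} (1 - \beta_{kr}) \Bigr\}^{y_{ki}} \Bigl\{ \prod_{r=1}^{j[i]} (1 - \beta_{kr}) \Bigr\}^{1 - y_{ki}},
\tag{2.4}
\]

where $j[i]$ denotes the dose level assigned to the $i$th patient, $\boldsymbol{\beta}_T = (\beta_{T1}, \ldots, \beta_{TJ})$ and $\boldsymbol{\beta}_E = (\beta_{E1}, \ldots, \beta_{EJ})$. If $f(\boldsymbol{\beta}_T)$ and $f(\boldsymbol{\beta}_E)$ denote the prior distributions of $\boldsymbol{\beta}_T$ and $\boldsymbol{\beta}_E$, then it follows that the posterior distribution of $\boldsymbol{\beta}_T$ and $\boldsymbol{\beta}_E$ can be expressed as

\[
f(\boldsymbol{\beta}_T, \boldsymbol{\beta}_E \mid \boldsymbol{y}) \propto f(\boldsymbol{\beta}_T)\, f(\boldsymbol{\beta}_E)\, L(\boldsymbol{y} \mid \boldsymbol{\beta}_T, \boldsymbol{\beta}_E).
\tag{2.5}
\]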
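Because both the likelihood (2.4) and the independent priors factorize over $k$, the unnormalized log posterior in (2.5) can be evaluated separately for toxicity and for efficacy. The Python sketch below does this for a single outcome type; the function names and the 1-based dose coding are assumptions made for illustration, and any sampler built on top of it (for example a Metropolis scheme) is left out.

```python
import math


def tox_eff_probs(beta):
    """Map (beta_1, ..., beta_J) to (p_1, ..., p_J) using (2.3)."""
    survival, probs = 1.0, []
    for b in beta:
        survival *= 1.0 - b
        probs.append(1.0 - survival)
    return probs


def log_likelihood(beta, doses, outcomes):
    """Log of the factor of (2.4) for one outcome type k.

    doses[i] is the 1-based dose level j[i] of patient i; outcomes[i] is 0 or 1.
    """
    probs = tox_eff_probs(beta)
    total = 0.0
    for j, y in zip(doses, outcomes):
        p = probs[j - 1]
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total


def log_beta_prior(beta, a, b):
    """Log density of independent Beta(a_j, b_j) priors on the beta_j."""
    total = 0.0
    for x, a_j, b_j in zip(beta, a, b):
        log_norm = math.lgamma(a_j + b_j) - math.lgamma(a_j) - math.lgamma(b_j)
        total += log_norm + (a_j - 1.0) * math.log(x) + (b_j - 1.0) * math.log(1.0 - x)
    return total


def log_posterior(beta, a, b, doses, outcomes):
    """Unnormalized log posterior for one outcome type, as in (2.5)."""
    return log_beta_prior(beta, a, b) + log_likelihood(beta, doses, outcomes)
```

The arguments must satisfy $0 < \beta_{kj} < 1$ so that all logarithms are finite.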

2.2 Subjective prior specification

We propose to determine the values of the hyperparameters $a_{kj}$ and $b_{kj}$ that appear in the prior distributions in (2.1) based on the prior estimates of toxicity and efficacy probabilities elicited from physicians. Let $\tilde{p}_{kj}$ denote the prior estimates of the toxicity and efficacy probabilities at dose level $j$ for $k \in \{T, E\}$ and $j = 1, \ldots, J$. Based on equation (2.3), we solve for $\beta_{kj}$ as a function of $p_{kj}$ to obtain

\[
\beta_{kj} = \frac{p_{kj} - p_{k,j-1}}{1 - p_{k,j-1}}, \qquad j = 1, \ldots, J,
\]

where $p_{k0} \equiv 0$. To fix the values of $\{a_{kj}\}$ and $\{b_{kj}\}$ subjectively, we equate the prior mean of $\beta_{kj}$, $a_{kj}/(a_{kj} + b_{kj})$, to $(\tilde{p}_{kj} - \tilde{p}_{k,j-1})/(1 - \tilde{p}_{k,j-1})$, and impose one additional constraint on the values of $a_{kj}$ and $b_{kj}$ so as to achieve identifiability. Such a constraint can be obtained by fixing the value of $a_{kj} + b_{kj}$ to equal a constant $m$, where $m$ is interpreted approximately as the “effective sample size” contained in the prior. With this constraint, the hyperparameters $a_{kj}$ and $b_{kj}$ are fixed at the following values:

\[
a_{kj} = m\, \frac{\tilde{p}_{kj} - \tilde{p}_{k,j-1}}{1 - \tilde{p}_{k,j-1}}, \qquad
b_{kj} = m\, \frac{1 - \tilde{p}_{kj}}{1 - \tilde{p}_{k,j-1}}.
\tag{2.6}
\]
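Formula (2.6) is simple to evaluate numerically. The sketch below, in Python, computes the hyperparameters from a vector of elicited prior estimates and also returns the prior means of the $p_{kj}$ implied by (2.3) under independent $\beta_{kj}$. The function names and the example estimates (0.1, 0.2, 0.4) are hypothetical and chosen only for illustration.

```python
def beta_hyperparameters(prior_probs, m=1.0):
    """Return (a, b) from (2.6) for elicited estimates p~_1 < ... < p~_J."""
    a, b = [], []
    prev = 0.0  # p~_{k0} is defined as 0
    for p in prior_probs:
        a.append(m * (p - prev) / (1.0 - prev))
        b.append(m * (1.0 - p) / (1.0 - prev))
        prev = p
    return a, b


def prior_mean_probs(a, b):
    """Prior mean of each p_j: 1 - prod_r E(1 - beta_r), using independence."""
    survival, means = 1.0, []
    for a_j, b_j in zip(a, b):
        survival *= b_j / (a_j + b_j)
        means.append(1.0 - survival)
    return means
```

With `a, b = beta_hyperparameters([0.1, 0.2, 0.4], m=1.0)`, each pair satisfies $a_j + b_j = m$, and `prior_mean_probs(a, b)` recovers the elicited estimates, which is the sense in which the prior is centered on the physicians' guesses.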

Weakly informative priors can be obtained by setting $m$ at a small positive value, e.g. $m = 1$ (i.e. an effective prior sample size of 1 patient).

2.3 Utility function

To account for the trade-off between efficacy and toxicity, we define the utility of a dose according to the following function,

\[
U(p_E, p_T) = p_E - w_1 p_T - w_2 p_T I(p_T > \phi_T),
\tag{2.7}
\]

where $I(\cdot)$ is the indicator function, $w_1$ and $w_2$ are non-negative weights, and $\phi_T$ is an upper toxicity threshold prespecified by clinicians. A larger value of $U(p_E, p_T)$ represents a higher level of utility. In this function, we include two components for toxicity. The first two terms, $p_E - w_1 p_T$, specify the trade-off between toxicity and efficacy before toxicity reaches the threshold $\phi_T$; $w_1$ can be interpreted as the number of units of efficacy that patients are willing to trade for one unit of decrease in toxicity. This penalty is used to account for the fact that even low probabilities of toxicity, lower even than $\phi_T$, are undesirable. For
example, suppose $\phi_T = 0.3$ and there are two doses for which $(p_E, p_T) = (0.48, 0.05)$ and $(0.50, 0.25)$. It is then possible that patients might prefer the first dose because it has much lower toxicity with little sacrifice of efficacy. The utility of that dose might then be higher. The second penalty, $w_2 p_T I(p_T > \phi_T)$, is used to incorporate the practical consideration that a higher penalty on toxicity is needed if it exceeds a generally accepted threshold (e.g. $\phi_T = 0.3$). A large value of $w_2$ indicates a strong preference for choosing doses with toxicity probabilities below $\phi_T$. When $w_2$ is very large, we essentially exclude the doses with toxicity probabilities higher than $\phi_T$. We note that our definition of utility is not unique; we have chosen this form of utility because it is intuitive and easy to explain to clinicians and patients. Our designs can be adapted to more sophisticated utility functions in a straightforward way.

To implement the proposed utility function, we elicit the values of $w_1$ and $w_2$ directly from patients or physicians. Alternatively, we can elicit three pairs of toxicity and efficacy rates that are clinically equally desirable, say $(p^*_{Ei}, p^*_{Ti})$, $i = 1, 2, 3$, with $p^*_{T1} < \phi_T$ and $p^*_{T3} > \phi_T$, and then determine the values of $w_1$ and $w_2$ by solving the two equations $U^*_1 = U^*_2$ and $U^*_1 = U^*_3$, where $U^*_i \equiv U(p^*_{Ei}, p^*_{Ti})$. In the event that patients or physicians provide $I > 3$ equally desirable efficacy–toxicity pairs, we can determine the values of $w_1$ and $w_2$ by minimizing the variation of the $U^*_i$'s according to

\[
(w_1, w_2) = \operatorname*{argmin}_{(w_1, w_2)} \sum_{i=1}^{I} \Bigl( U^*_i - \sum_{i'=1}^{I} U^*_{i'} / I \Bigr)^{2}.
\tag{2.8}
\]
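The utility (2.7) and the criterion in (2.8) can be written in a few lines of Python. This is a minimal sketch with hypothetical function names; the weights $w_1 = 0.33$ and $w_2 = 1.09$ and the threshold $\phi_T = 0.3$ used in the usage note are the values reported for the simulation study in Section 4.1.

```python
def utility(p_e, p_t, w1, w2, phi_t):
    """Utility (2.7): efficacy minus toxicity penalties."""
    extra_penalty = w2 * p_t if p_t > phi_t else 0.0
    return p_e - w1 * p_t - extra_penalty


def utility_spread(pairs, w1, w2, phi_t):
    """Objective in (2.8): sum of squared deviations of the U_i* from their mean.

    pairs is a list of elicited (p_E*, p_T*) tuples judged equally desirable.
    """
    values = [utility(p_e, p_t, w1, w2, phi_t) for p_e, p_t in pairs]
    mean = sum(values) / len(values)
    return sum((u - mean) ** 2 for u in values)
```

With these weights, the two doses in the example above have utilities `utility(0.48, 0.05, 0.33, 1.09, 0.3)`, about 0.46, and `utility(0.50, 0.25, 0.33, 1.09, 0.3)`, about 0.42, so the less toxic dose is preferred. Minimizing `utility_spread` over a grid of candidate $(w_1, w_2)$ values is one simple way to carry out (2.8); the choice of optimizer is left open here.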

2.4 Dose-finding algorithm

Given the utility function and assumptions described in the preceding sections, our dose-finding algorithm can be described as follows:

(1) Treat the first cohort at the starting dose specified by the physician.

(2) Based on the observed data $\boldsymbol{y}$, randomize the next cohort of patients to dose $j^*$, where dose $j^*$ is either the dose with the highest utility or one of the neighboring doses (i.e. the next higher or lower doses). The randomization probability between these doses is taken to be proportional to the posterior probability of each dose having the highest utility, i.e. $\mathrm{pr}(U_{j^*} = U_{\max} \mid \boldsymbol{y})$, where $U_{\max} = \max(U_i : i = 1, \ldots, J)$. Note that this algorithm implies that a dose cannot be chosen unless a previous cohort of patients has been treated at the next lower dose.

(3) When the maximum sample size is reached, select the dose with the highest posterior probability of having the highest utility, i.e. the dose $j$ that maximizes $\mathrm{pr}(U_j = U_{\max} \mid D)$, where $D$ denotes the observed data.

In practice, for patients' safety and benefit, we impose stopping rules to terminate a clinical trial early if there is evidence of either unexpectedly high toxicity or unexpectedly low efficacy. One possible stopping rule can be described as follows. Let $\phi_E$ denote a lower threshold for efficacy and $\phi_T$ denote an upper threshold for toxicity. We propose to terminate a trial early if none of the doses satisfies both of the following inequalities:

\[
\mathrm{pr}(p_{Ej} > \phi_E \mid D) > C_E
\tag{2.9}
\]

and

\[
\mathrm{pr}(p_{Tj} < \phi_T \mid D) > C_T,
\tag{2.10}
\]

where $C_E$ and $C_T$ are pre-specified thresholds, and $j = 1, \ldots, J$. Appropriate values of $C_E$ and $C_T$ can be determined through simulation experiments so that the design has acceptable safety characteristics.
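Step (2) and the stopping rule (2.9) and (2.10) can be approximated from posterior draws of the dose-level probabilities. The Python sketch below assumes such draws are already available as lists of length-$J$ vectors. It takes the "dose with the highest utility" to be the dose with the largest estimated $\mathrm{pr}(U_j = U_{\max} \mid D)$, which is one reading of the algorithm, and it does not enforce any further restriction on escalation to untried doses. All function names are hypothetical.

```python
def utility(p_e, p_t, w1, w2, phi_t):
    """Utility (2.7)."""
    return p_e - w1 * p_t - (w2 * p_t if p_t > phi_t else 0.0)


def prob_best(draws_e, draws_t, w1, w2, phi_t):
    """Monte Carlo estimate of pr(U_j = U_max | D) for each dose j."""
    counts = [0] * len(draws_e[0])
    for p_e, p_t in zip(draws_e, draws_t):
        utils = [utility(e, t, w1, w2, phi_t) for e, t in zip(p_e, p_t)]
        counts[utils.index(max(utils))] += 1
    return [c / len(draws_e) for c in counts]


def randomization_probs(best_probs):
    """Renormalize pr(U_j = U_max | D) over the leading dose and its neighbors.

    Returns a dict mapping 0-based dose index to randomization probability.
    """
    lead = best_probs.index(max(best_probs))
    candidates = [j for j in (lead - 1, lead, lead + 1) if 0 <= j < len(best_probs)]
    total = sum(best_probs[j] for j in candidates)
    return {j: best_probs[j] / total for j in candidates}


def admissible_doses(draws_e, draws_t, phi_e, phi_t, c_e, c_t):
    """0-based indices of doses satisfying both (2.9) and (2.10)."""
    n = len(draws_e)
    keep = []
    for j in range(len(draws_e[0])):
        pr_eff = sum(d[j] > phi_e for d in draws_e) / n
        pr_safe = sum(d[j] < phi_t for d in draws_t) / n
        if pr_eff > c_e and pr_safe > c_t:
            keep.append(j)
    return keep
```

Under this sketch the trial would be stopped early when `admissible_doses(...)` returns an empty list; otherwise the next cohort is randomized according to `randomization_probs(prob_best(...))`.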


3. DELAYED OUTCOMES

Like most phase I/II designs, the proposed design relies on the observation of patient outcomes to modify the dose assignment rules for new patients. In many situations, however, toxicity and efficacy outcomes may be “delayed” in the sense that they cannot be evaluated immediately. Such delayed outcomes cause logistical difficulties in implementing the proposed design because it is undesirable, and often impossible, to delay a new patient's treatment while awaiting the outcomes of patients who have already been treated in the trial. In addition, making inferences and dose assignment decisions based solely on currently observed data can lead to biases (Yuan and Yin, 2011a, 2011b). To address this issue, we adopt the approach of Cheung and Chappell (2000), in which weights are assigned to the outcomes of patients whose final outcome status has not been determined.

Let $T_E$ and $T_T$ denote the assessment periods needed to fully evaluate patient efficacy and toxicity, respectively. If efficacy or toxicity occurs during the corresponding assessment period, $y_k = 1$; otherwise $y_k = 0$ for $k \in \{E, T\}$. Suppose that at an interim decision time in the trial (e.g. the moment when a new patient is accrued and ready for dose assignment), the $i$th patient has been followed for a period of time $x_{ki}$, where $x_{ki} \le T_k$, and let $y^*_{ki}$ indicate whether that patient has experienced toxicity or efficacy at that time. We assume that

\[
\mathrm{pr}(y^*_{ki} = 1) = \mathrm{pr}(t_{ki} \le x_{ki}) = \mathrm{pr}(t_{ki} \le x_{ki} \mid t_{ki} \le T_k)\, \mathrm{pr}(t_{ki} \le T_k) \equiv w_{ki}\, p_{k,j[i]},
\]

where $t_{ki}$ denotes the time to toxicity or efficacy for the $i$th patient and $p_{k,j[i]}$ is given by equation (2.3). Therefore, the marginal likelihood for the toxicity or efficacy data $\boldsymbol{y}^*_k = \{y^*_{ki}\}$ observed at that interim decision time is given by

\[
L(\boldsymbol{y}^*_k \mid \boldsymbol{\beta}_k) = \prod_{i=1}^{n} (w_{ki}\, p_{k,j[i]})^{y^*_{ki}} (1 - w_{ki}\, p_{k,j[i]})^{1 - y^*_{ki}}, \qquad k \in \{E, T\}.
\]
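The weighted likelihood above is straightforward to evaluate once the weights are fixed. The following Python sketch uses the linear weight $w_{ki} = x_{ki}/T_k$ of Cheung and Chappell (2000), capped at 1 for completed assessments, and applies the weight to every patient exactly as the displayed likelihood is written. The function names and the 1-based dose coding are illustrative assumptions.

```python
import math


def follow_up_weight(x, t_assess):
    """Linear weight w = x / T, equal to 1 once the assessment is complete."""
    return min(x / t_assess, 1.0)


def weighted_log_likelihood(probs, doses, events, follow_up, t_assess):
    """Log of the weighted likelihood for one outcome type k.

    probs[j - 1] is p_{k,j}; doses[i] is the 1-based dose of patient i;
    events[i] is the interim indicator y*_{ki}; follow_up[i] is x_{ki}.
    """
    total = 0.0
    for j, y, x in zip(doses, events, follow_up):
        wp = follow_up_weight(x, t_assess) * probs[j - 1]
        total += math.log(wp) if y == 1 else math.log(1.0 - wp)
    return total
```

A patient with no follow-up and no event contributes $\log(1) = 0$, that is, no information, while a patient who has completed the assessment contributes the usual Bernoulli term.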

Following Cheung and Chappell (2000), we set weights $w_{ki} = x_{ki}/T_k$ so that the closer a patient is to having a completed assessment, the larger the weight we assign to information collected from that patient. For patients who have completed the assessment (i.e. $x_{ki} = T_k$), we claim full information and take $w_{ki} = 1$. Ideally, the weight $w_{ki}$ should be chosen based on the joint distribution of times to toxicity and efficacy. However, due to the small sample size of early phase trials, the use of a joint model does not necessarily improve the performance of the design (Cunanan and Koopmeiners, 2014), as shown by the sensitivity analysis described later. In addition, although theoretically the weight $w_{ki} = x_{ki}/T_k$ is “optimal” only when the times to toxicity or efficacy are uniformly distributed, Cheung and Chappell (2000) showed that the design based on $w_{ki}$ was robust when the time to toxicity deviated from the uniform distribution. Our simulation in Section 4.2 showed that the proposed design possessed robust performance when the times to toxicity and efficacy followed Weibull distributions.

4. NUMERICAL STUDIES

We examine the performance of the proposed dose-finding design under 8 scenarios using simulation studies. We focus our comparisons on the phase I/II design used at the UT MD Anderson Cancer Center, proposed by Thall and Cook (2004, hereafter referred to as TC). We also compared our design to the more recently proposed method by Yin and others (2006, denoted as YLJ), which used the Dale model for toxicity and efficacy.

4.1 Operating characteristics

In the trial that motivated our study, there were 5 investigational doses: 0.25, 0.5, 0.75, 1.00, and 1.25 mg/m$^2$. The maximum number of patients available for accrual was 48, and patients were treated
in cohorts of size 3. After consulting with physicians, we took the prior estimates of the efficacy probabilities to be $\tilde{\boldsymbol{p}}_E = (0.2, 0.3, 0.4, 0.5, 0.6)$, and the prior estimates of toxicity probabilities to be $\tilde{\boldsymbol{p}}_T = (0.05, 0.1, 0.2, 0.3, 0.35)$, and assigned each vector an “effective” sample size of $m = 1$. Based on formula (2.6), the hyperparameters for the beta prior on $\boldsymbol{\beta}_T$ were $(a_{T1}, \ldots, a_{T5}) = (0.05, 0.1, 0.2, 0.3, 0.35)$ and $(b_{T1}, \ldots, b_{T5}) = (0.95, 0.9, 0.8, 0.7, 0.65)$; the hyperparameters for the beta prior on $\boldsymbol{\beta}_E$ were $(a_{E1}, \ldots, a_{E5}) = (0.2, 0.3, 0.4, 0.5, 0.6)$ and $(b_{E1}, \ldots, b_{E5}) = (0.8, 0.7, 0.6, 0.5, 0.4)$. To fix the utility function, we set the upper toxicity threshold at $\phi_T = 0.3$ and the lower efficacy threshold at $\phi_E = 0.2$. We set $w_1 = 0.33$ and $w_2 = 1.09$, which represent a 33% penalty for toxicities less than 0.3, and a 142% penalty for toxicities greater than 0.3. For the early stopping rules, we chose the cut-offs $C_E = 0.2$ and $C_T = 0.2$ because they provided good operating characteristics in preliminary studies of the design.

We considered 8 dose–toxicity and dose–efficacy scenarios (see Figure 1), representing what we felt was a realistic range of dose–response relationships. The first 3 scenarios were generated from the logistic model assumed under the TC design (i.e. implying the TC design was exactly correct), while the last 5 were generated to reflect the range of efficacy and toxicity relationships we anticipated in practice. Under each scenario, the efficacy and toxicity data were generated from the Gumbel model (2.2) with correlation parameter $\gamma = 3$. We conducted 1000 simulation runs for each scenario.

We compared our trial design to the TC and YLJ designs. The TC design assumes that the marginal distributions of $y_E$ and $y_T$ follow a standard logistic regression model (with dose as a covariate), and assumes a Gumbel model for the joint distribution of $y_E$ and $y_T$. By assuming this parametric model for the dose–response curve, the TC design is potentially subject to the influence of model misspecification. To facilitate comparisons between the TC and YLJ designs and our design, we used the same utility function (2.7) in all designs. This implies, of course, that the target dose is the same under the three designs.

The simulation results are summarized in Table 1 as the selection percentage and the number of patients treated at each dose level. The category “None” indicates the percentage of trials that were terminated with all doses rejected. We defined the percentage of correct selections (PCS) as the percentage of simulation runs in which a design correctly selected the target dose (i.e. the dose with the highest value of the utility or desirability). Scenarios 1 to 3 were generated based on the logistic models from the TC design. In scenario 1, the dose–efficacy curve increased slowly from dose level 1 to 2 and then rapidly thereafter, the dose–toxicity curve increased rapidly, and the target dose was level 1. In this scenario, all parametric assumptions of the TC design were satisfied and the TC design performed well, resulting in a PCS of 77.3% and 22.4 patients allocated to the target dose. The performance of the proposed design was comparable with that of the TC design, with an optimal dose selection percentage of 71.9% and an average of 21.2 patients assigned to that dose. Similarly, under scenarios 2 and 3, in which all parametric assumptions of the TC design were again satisfied, the TC design performed well and the proposed design yielded a very similar performance in both PCS and the number of patients treated at the target dose. In these scenarios, the YLJ design yielded performance similar to that of the TC design and the proposed design.
For example, under scenario 3, the PCS was 86.8% for the YLJ design, 95.3% for the TC design, and 88.4% for the proposed design, and the number of patients treated at the optimal dose was 25.8 for the YLJ design, 27.1 for the TC design, and 26.8 for the proposed design.

When the toxicity and efficacy probabilities were not generated from the logistic model, e.g. scenarios 4 to 8 in Table 1, the proposed design outperformed the TC and YLJ designs. Specifically, in scenario 4, the dose–toxicity curve increased slowly from dose level 1 to 3 and then rapidly thereafter, and the dose–efficacy curve increased from dose level 2 to 3 and from 4 to 5, but remained similar for the increases in dose from dose level 1 to 2 and from dose level 3 to 4. In this scenario, the PCS of the proposed design was 15.4 and 0.8 percentage points higher than those of the TC and YLJ designs, respectively. In scenario 5, dose 4 was the true optimal dose, and the PCS of the proposed design was 71.1%, which was 8.7 percentage points higher than the PCS using the YLJ design. The TC design was more likely to select an incorrect optimal dose (dose 5). This is because the logistic model assumption made in the TC design overestimates the efficacy probability of dose 5 after observing a sharp increase of efficacy from dose 3 to dose 4. In scenario 6, where the dose–toxicity and dose–efficacy curves crossed around dose level 3, the proposed design outperformed the TC and YLJ designs with a higher PCS. Scenario 7 is an interesting case in that two doses (i.e. dose levels 3 and 4) had similar utility. In this case, the total probability of selecting these two doses under the proposed design was 15.2 percentage points higher than that under the TC design, and 2.2 percentage points higher than that under the YLJ design. Scenario 8 was used to examine whether the designs terminated early when none of the doses were admissible (i.e. too toxic or futile). All designs stopped the trial with high probabilities under scenario 8.

Fig. 1. Illustration of scenarios (eight panels, Scenarios 1 to 8; horizontal axis: Dose, levels 1 to 5; vertical axis: Probability). The solid lines with spheres represent true probabilities of efficacy and the dashed lines with triangles represent true probabilities of toxicity. The numbers appearing across the top of each panel are the value of the utility for each dose.


Table 1. Selection percentage and the number of patients treated at each dose level (shown in parentheses) under the proposed design, the TC design, and the YLJ design. The target dose is marked with an asterisk. In scenarios 1 to 3, data were generated from the TC model (assuming a logistic model for dose–efficacy and dose–toxicity probabilities); in scenarios 4 to 8, data were generated arbitrarily.

                        Dose level
Scenario  Design        1              2              3              4              5              None

1         (p_E, p_T)    (0.28, 0.15)*  (0.30, 0.32)   (0.44, 0.45)   (0.60, 0.55)   (0.74, 0.62)
          U             0.23           −0.15          −0.20          −0.18          −0.14
          TC            77.3 (22.4)    10.8 (10.8)    2.5 (4.8)      2.0 (4.5)      2.0 (4.1)      5.4
          YLJ           71.8 (17.2)    16.4 (14.9)    2.2 (8.1)      1.0 (3.8)      0.1 (1.4)      8.5
          Proposed      71.9 (21.2)    10.9 (13.9)    2.1 (5.4)      1.0 (2.3)      0.8 (1.0)      13.3

2         (p_E, p_T)    (0.10, 0.04)   (0.27, 0.18)*  (0.44, 0.37)   (0.58, 0.54)   (0.69, 0.67)
          U             0.07           0.20           −0.03          −0.02          −0.01
          TC            12.8 (12.6)    63.0 (16.2)    13.0 (8.7)     0.1 (3.8)      0.3 (3.6)      10.8
          YLJ           18.0 (10.1)    52.3 (16.9)    10.0 (12.9)    6.5 (5.5)      3.8 (1.8)      9.4
          Proposed      20.1 (11.4)    58.7 (18.2)    9.8 (10.8)     0.7 (3.6)      0.0 (1.0)      10.7

3         (p_E, p_T)    (0.05, 0.02)   (0.08, 0.05)   (0.15, 0.07)   (0.28, 0.10)   (0.43, 0.12)*
          U             0.04           0.06           0.13           0.25           0.39
          TC            0.2 (5.1)      0.0 (3.8)      2.2 (5.5)      1.1 (6.2)      95.3 (27.1)    1.2
          YLJ           0.0 (6.7)      0.1 (2.5)      2.9 (7.9)      9.2 (5.1)      86.8 (25.8)    1.0
          Proposed      0.1 (3.2)      0.6 (3.5)      2.1 (4.1)      7.3 (9.9)      88.4 (26.8)    1.5

4         (p_E, p_T)    (0.15, 0.10)   (0.18, 0.12)   (0.38, 0.15)*  (0.40, 0.36)   (0.60, 0.65)
          U             0.12           0.14           0.33           −0.11          −0.32
          TC            17.8 (13.4)    20.1 (10.4)    53.5 (10.9)    4.3 (6.6)      1.1 (6.0)      3.2
          YLJ           5.3 (9.0)      18.3 (11.6)    68.1 (17.4)    4.7 (8.0)      2.1 (1.9)      1.5
          Proposed      5.2 (5.8)      18.3 (10.6)    68.9 (17.7)    3.4 (9.7)      0.3 (3.1)      3.9

5         (p_E, p_T)    (0.10, 0.05)   (0.20, 0.07)   (0.25, 0.10)   (0.50, 0.15)*  (0.54, 0.35)
          U             0.08           0.18           0.22           0.45           0.04
          TC            3.9 (6.9)      0.6 (5.6)      19.8 (9.5)     14.7 (9.1)     60.3 (16.7)    0.7
          YLJ           1.2 (3.2)      4.2 (3.6)      16.8 (7.8)     62.4 (18.0)    14.0 (13.3)    1.4
          Proposed      0.7 (3.6)      3.0 (4.7)      15.6 (8.7)     71.1 (16.6)    8.4 (13.9)     1.2

6         (p_E, p_T)    (0.05, 0.10)   (0.30, 0.15)*  (0.32, 0.35)   (0.35, 0.45)   (0.50, 0.55)
          U             0.02           0.25           −0.18          −0.29          −0.28
          TC            11.8 (12.3)    52.0 (13.4)    20.4 (9.1)     2.1 (6.4)      1.0 (3.6)      12.7
          YLJ           0.6 (9.8)      61.6 (12.1)    26.8 (13.7)    0.2 (5.3)      0.0 (1.8)      10.8
          Proposed      6.8 (8.8)      75.6 (18.8)    6.7 (11.3)     0.3 (4.1)      0.4 (1.9)      10.2

7         (p_E, p_T)    (0.02, 0.10)   (0.10, 0.12)   (0.42, 0.15)*  (0.45, 0.30)   (0.50, 0.60)
          U             −0.01          0.06           0.37           0.35           −0.35
          TC            4.7 (8.8)      10.6 (10.1)    66.2 (12.3)    4.4 (8.7)      5.3 (6.3)      8.8
          YLJ           2.3 (8.0)      7.3 (13.4)     72.1 (16.2)    11.5 (7.8)     1.0 (2.2)      5.8
          Proposed      2.4 (4.6)      6.9 (8.2)      75.4 (17.5)    10.4 (12.0)    0.0 (4.0)      4.9

8         (p_E, p_T)    (0.02, 0.10)   (0.05, 0.25)   (0.35, 0.55)   (0.40, 0.60)   (0.52, 0.70)
          U             −0.01          −0.03          −0.43          −0.45          −0.47
          TC            10.6 (8.3)     6.7 (8.6)      0.7 (4.4)      0.1 (2.1)      0.1 (1.3)      81.8
          YLJ           10.0 (6.0)     2.0 (9.2)      0.8 (6.1)      0.5 (1.8)      3.5 (0.8)      83.2
          Proposed      10.9 (8.8)     3.2 (8.7)      0.6 (5.0)      0.1 (1.6)      0.3 (0.6)      84.9


Table 2. Selection percentage and the number of patients treated at each dose level (shown in parentheses) under the proposed design when the outcomes are delayed. The target dose is marked with an asterisk.

                        Dose level
Scenario  Design        1              2              3              4              5              None

1         (p_E, p_T)    (0.28, 0.15)*  (0.30, 0.32)   (0.44, 0.45)   (0.60, 0.55)   (0.74, 0.62)
          Proposed      62.8 (20.1)    16.9 (14.2)    4.2 (5.9)      0.5 (2.5)      1.3 (0.9)      14.3

2         (p_E, p_T)    (0.10, 0.04)   (0.27, 0.18)*  (0.44, 0.37)   (0.58, 0.54)   (0.69, 0.67)
          Proposed      15.7 (11.0)    54.3 (18.0)    17.4 (10.9)    1.2 (4.1)      0.2 (1.2)      11.2

3         (p_E, p_T)    (0.05, 0.02)   (0.08, 0.05)   (0.15, 0.07)   (0.28, 0.10)   (0.43, 0.12)*
          Proposed      0.2 (3.3)      0.5 (3.8)      2.3 (4.6)      4.5 (10.6)     90.6 (25.5)    1.9

4         (p_E, p_T)    (0.15, 0.10)   (0.18, 0.12)   (0.38, 0.15)*  (0.40, 0.36)   (0.60, 0.65)
          Proposed      3.1 (5.7)      10.4 (10.7)    66.1 (17.3)    15.6 (9.9)     1.7 (3.6)      3.1

5         (p_E, p_T)    (0.10, 0.05)   (0.20, 0.07)   (0.25, 0.10)   (0.50, 0.15)*  (0.54, 0.35)
          Proposed      0.7 (3.8)      2.5 (5.6)      11.5 (9.6)     66.9 (16.2)    17.8 (12.7)    0.6

6         (p_E, p_T)    (0.05, 0.10)   (0.30, 0.15)*  (0.32, 0.35)   (0.35, 0.45)   (0.50, 0.55)
          Proposed      4.2 (9.4)      69.5 (17.5)    10.3 (10.8)    1.5 (4.5)      0.9 (2.0)      13.6

7         (p_E, p_T)    (0.02, 0.10)   (0.10, 0.12)   (0.42, 0.15)*  (0.45, 0.30)   (0.50, 0.60)
          Proposed      1.7 (5.0)      5.4 (8.7)      65.6 (16.6)    20.6 (11.8)    1.3 (4.5)      5.4

8         (p_E, p_T)    (0.02, 0.10)   (0.05, 0.25)   (0.35, 0.55)   (0.40, 0.60)   (0.52, 0.70)
          Proposed      7.3 (9.7)      4.6 (9.8)      2.1 (5.3)      0.0 (2.2)      0.1 (0.6)      85.9

4.2 “Delayed” outcomes

A second simulation study was conducted to evaluate the performance of the proposed design when the toxicity and efficacy outcomes were delayed. We assumed that patients arrived according to a Poisson process with an accrual rate of three patients per month. The assessment periods were T_T = 2 months for toxicity and T_E = 3 months for efficacy. We used the same 8 toxicity and efficacy scenarios as in Table 1. Under each scenario, we simulated the times to toxicity and efficacy independently from Weibull distributions such that 50% of the toxicity and efficacy events occurred in the latter half of the assessment period, (T_k/2, T_k), where k = T or E. Specifically, at each dose level, the scale and shape parameters of the Weibull distribution were chosen so that (i) the cumulative distribution function at the end of the assessment window T_k equaled the toxicity or efficacy probability of that dose; and (ii) among all the toxicity and efficacy events that occurred in (0, T_k), 50% occurred in (T_k/2, T_k). Because the toxicity and efficacy probabilities varied across dose levels, the scale and shape parameters of the Weibull distribution also differed across dose levels.

The results (see Table 2) show that the proposed design performed well when toxicity and efficacy were delayed. The design selected the target dose with the highest probability and allocated the highest percentage of patients to the target dose in all 8 scenarios. Compared with the results in Table 1, where the toxicity and efficacy outcomes were immediately ascertainable, the selection percentage and the number of patients treated at the target dose under delayed outcomes were slightly lower (mostly by < 10%). This is because less information was available for making dose-assignment decisions when toxicity and efficacy were delayed.
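The two calibration conditions described above determine the Weibull parameters in closed form. Writing the distribution function as F(t) = 1 − exp{−(t/scale)^shape}, condition (i) requires F(T_k) = p and condition (ii) requires F(T_k/2) = p/2, where p is the toxicity or efficacy probability of the dose. The following Python sketch solves these two equations. The function names and the (shape, scale) parameterization are illustrative assumptions, not the authors' code.

```python
import math
import random


def weibull_params(p, window):
    """Return (shape, scale) with F(window) = p and F(window / 2) = p / 2.

    F(t) = 1 - exp(-(t / scale) ** shape) is the Weibull CDF.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    a = -math.log(1.0 - p)        # (window / scale) ** shape
    b = -math.log(1.0 - p / 2.0)  # (window / (2 * scale)) ** shape
    shape = math.log(a / b) / math.log(2.0)  # because a / b = 2 ** shape
    scale = window / a ** (1.0 / shape)
    return shape, scale


def weibull_cdf(t, shape, scale):
    """Weibull cumulative distribution function."""
    return 1.0 - math.exp(-((t / scale) ** shape))


def simulate_event_time(p, window, rng):
    """Draw one event time; the event counts only if it is <= window."""
    shape, scale = weibull_params(p, window)
    # random.weibullvariate takes (scale, shape) in that order.
    return rng.weibullvariate(scale, shape)


if __name__ == "__main__":
    # Toxicity probability 0.15 with a 2-month window (values taken from the text).
    shape, scale = weibull_params(0.15, 2.0)
    print(shape, scale, weibull_cdf(2.0, shape, scale), weibull_cdf(1.0, shape, scale))
    print(simulate_event_time(0.15, 2.0, random.Random(1)))
```

Because −log(1 − p) is convex in p, the ratio a/b exceeds 2, so the resulting shape parameter is always greater than 1 (an increasing hazard), which is what pushes half of the observed events into the second half of the window.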


Table 3. Results of the sensitivity analysis

Scenario  Varied parameter  Dose 1        Dose 2        Dose 3        Dose 4        Dose 5        None
1         (p_E, p_T)        (0.28, 0.15)  (0.30, 0.32)  (0.44, 0.45)  (0.60, 0.55)  (0.74, 0.62)
          Prior mean        68.5 (19.5)   11.8 (13.7)   2.4 (6.2)     0.6 (2.5)     0.9 (0.9)     15.8
          m = 0.5           71.3 (20.6)   8.7 (13.7)    2.4 (5.5)     1.2 (2.2)     1.4 (1.1)     15.0
          N = 36            70.5 (15.0)   14.0 (10.8)   2.8 (4.9)     1.1 (2.0)     0.5 (0.8)     11.1
2         (p_E, p_T)        (0.10, 0.04)  (0.27, 0.18)  (0.44, 0.37)  (0.58, 0.54)  (0.69, 0.67)
          Prior mean        18.7 (10.0)   56.9 (17.6)   13.8 (12.6)   0.5 (4.2)     0.0 (0.9)     10.1
          m = 0.5           17.3 (10.7)   55.3 (17.1)   10.0 (10.3)   0.8 (3.6)     0.1 (1.1)     16.5
          N = 36            23.5 (8.8)    53.2 (12.7)   11.2 (8.5)    1.2 (3.2)     0.1 (0.9)     10.8
3         (p_E, p_T)        (0.05, 0.02)  (0.08, 0.05)  (0.15, 0.07)  (0.28, 0.10)  (0.43, 0.12)
          Prior mean        0.2 (3.2)     0.2 (3.4)     2.5 (4.6)     10.1 (11.1)   86.0 (25.5)   1.0
          m = 0.5           0.1 (3.2)     0.7 (3.5)     2.6 (4.4)     9.4 (10.4)    84.7 (25.8)   2.5
          N = 36            0.3 (3.2)     1.3 (3.5)     5.3 (4.0)     12.6 (7.9)    79.1 (17.1)   1.4
4         (p_E, p_T)        (0.15, 0.10)  (0.18, 0.12)  (0.38, 0.15)  (0.40, 0.36)  (0.60, 0.65)
          Prior mean        3.8 (4.9)     14.6 (9.9)    74.9 (19.1)   3.5 (10.5)    0.2 (2.6)     3.0
          m = 0.5           4.7 (5.4)     16.0 (9.9)    70.1 (17.9)   3.3 (9.9)     0.3 (2.9)     5.6
          N = 36            7.8 (5.3)     23.8 (8.4)    61.8 (12.0)   3.5 (7.0)     0.2 (2.8)     2.9
5         (p_E, p_T)        (0.10, 0.05)  (0.20, 0.07)  (0.25, 0.10)  (0.50, 0.15)  (0.54, 0.35)
          Prior mean        0.3 (3.4)     1.7 (4.6)     17.0 (9.8)    72.9 (17.6)   7.0 (12.2)    1.1
          m = 0.5           0.2 (3.5)     4.9 (5.1)     15.0 (9.3)    70.0 (16.5)   7.0 (12.7)    2.9
          N = 36            0.6 (3.5)     5.9 (4.5)     20.8 (7.0)    61.9 (11.2)   9.9 (9.6)     0.9
6         (p_E, p_T)        (0.05, 0.10)  (0.30, 0.15)  (0.32, 0.35)  (0.35, 0.45)  (0.50, 0.55)
          Prior mean        6.8 (8.6)     72.5 (18.3)   8.9 (11.8)    0.3 (4.4)     0.3 (1.6)     11.2
          m = 0.5           7.1 (8.6)     72.1 (18.4)   5.5 (10.7)    0.4 (3.9)     0.4 (1.9)     14.5
          N = 36            11.5 (7.7)    70.9 (13.3)   7.0 (8.2)     0.4 (3.4)     0.5 (1.7)     9.7
7         (p_E, p_T)        (0.02, 0.10)  (0.10, 0.12)  (0.42, 0.15)  (0.45, 0.30)  (0.50, 0.60)
          Prior mean        1.9 (4.1)     3.7 (7.5)     77.9 (18.3)   11.8 (13.0)   0.1 (3.6)     4.6
          m = 0.5           1.3 (4.3)     6.4 (7.9)     74.6 (17.4)   13.0 (12.3)   0.6 (4.0)     4.1
          N = 36            4.5 (4.5)     11.6 (7.0)    66.5 (11.1)   11.2 (8.6)    0.3 (3.5)     5.9
8         (p_E, p_T)        (0.02, 0.10)  (0.05, 0.25)  (0.35, 0.55)  (0.40, 0.60)  (0.52, 0.70)
          Prior mean        10.5 (8.7)    2.7 (8.8)     0.6 (5.2)     0.0 (1.8)     0.0 (0.4)     86.2
          m = 0.5           11.0 (8.0)    1.8 (8.4)     0.3 (4.4)     0.3 (1.5)     0.2 (0.6)     86.4
          N = 36            17.9 (7.9)    4.4 (7.7)     0.7 (4.4)     0.0 (1.5)     0.2 (0.5)     76.8

4.3 Sensitivity analysis

We evaluated the sensitivity of the proposed design to the prior specification of the unknown parameters β_E and β_T, the sample size, and the cohort size. Specifically, we considered (i) a different set of prior estimates of the efficacy probabilities, p̂_E = (0.1, 0.3, 0.5, 0.6, 0.7), and of the toxicity probabilities, p̂_T = (0.02, 0.08, 0.15, 0.3, 0.45); (ii) a more dispersed prior with a smaller “effective” sample size of m = 0.5; and (iii) a smaller sample size of N = 36 patients, treated in cohorts of size 2. Table 3 shows the results of the sensitivity analysis. The proposed design was robust to these changes in the simulation conditions, and the results were generally comparable with those reported in Table 1.


Table 4. Comparison of the proposed design based on marginal or joint distribution of toxicity and efficacy

Scenario  Design      Dose 1        Dose 2        Dose 3        Dose 4        Dose 5        None
1         (p_E, p_T)  (0.28, 0.15)  (0.30, 0.32)  (0.44, 0.45)  (0.60, 0.55)  (0.74, 0.62)
          Marginal    71.9 (21.2)   10.9 (13.9)   2.1 (5.4)     1.0 (2.3)     0.8 (1.0)     13.3
          Joint       68.6 (21.4)   13.6 (13.9)   2.6 (5.2)     1.1 (2.2)     1.9 (1.2)     12.2
2         (p_E, p_T)  (0.10, 0.04)  (0.27, 0.18)  (0.44, 0.37)  (0.58, 0.54)  (0.69, 0.67)
          Marginal    20.1 (11.4)   58.7 (18.2)   9.8 (10.8)    0.7 (3.6)     0.0 (1.0)     10.7
          Joint       14.4 (11.9)   59.4 (18.3)   14.5 (10.5)   0.9 (3.5)     0.2 (1.0)     10.6
3         (p_E, p_T)  (0.05, 0.02)  (0.08, 0.05)  (0.15, 0.07)  (0.28, 0.10)  (0.43, 0.12)
          Marginal    0.1 (3.2)     0.6 (3.5)     2.1 (4.1)     7.3 (9.9)     88.4 (26.8)   1.5
          Joint       0.2 (3.2)     0.3 (3.5)     2.1 (4.2)     6.1 (9.8)     90.1 (26.9)   1.2
4         (p_E, p_T)  (0.15, 0.10)  (0.18, 0.12)  (0.38, 0.15)  (0.40, 0.36)  (0.60, 0.65)
          Marginal    5.2 (5.8)     18.3 (10.6)   68.9 (17.7)   3.4 (9.7)     0.3 (3.1)     3.9
          Joint       3.2 (5.8)     10.3 (10.9)   70.8 (17.8)   12.1 (9.6)    0.8 (3.0)     2.8
5         (p_E, p_T)  (0.10, 0.05)  (0.20, 0.07)  (0.25, 0.10)  (0.50, 0.15)  (0.54, 0.35)
          Marginal    0.7 (3.6)     3.0 (4.7)     15.6 (8.7)    71.1 (16.6)   8.4 (13.9)    1.2
          Joint       0.2 (3.6)     2.9 (5.2)     9.4 (8.8)     59.0 (16.4)   28.0 (13.9)   0.5
6         (p_E, p_T)  (0.05, 0.10)  (0.30, 0.15)  (0.32, 0.35)  (0.35, 0.45)  (0.50, 0.55)
          Marginal    6.8 (8.8)     75.6 (18.8)   6.7 (11.3)    0.3 (4.1)     0.4 (1.9)     10.2
          Joint       4.6 (9.0)     65.6 (18.5)   15.7 (10.9)   1.6 (4.0)     0.6 (1.7)     11.9
7         (p_E, p_T)  (0.02, 0.10)  (0.10, 0.12)  (0.42, 0.15)  (0.45, 0.30)  (0.50, 0.60)
          Marginal    2.4 (4.6)     6.9 (8.2)     75.4 (17.5)   10.4 (12.0)   0.0 (4.0)     4.9
          Joint       0.9 (4.4)     4.6 (8.4)     63.7 (17.4)   24.6 (12.1)   1.2 (4.1)     5.0
8         (p_E, p_T)  (0.02, 0.10)  (0.05, 0.25)  (0.35, 0.55)  (0.40, 0.60)  (0.52, 0.70)
          Marginal    10.9 (8.8)    3.2 (8.7)     0.6 (5.0)     0.1 (1.6)     0.3 (0.6)     84.9
          Joint       8.5 (8.8)     5.6 (8.7)     1.0 (4.7)     0.1 (1.5)     0.6 (0.5)     84.2

In addition, we evaluated the impact of the correlation between toxicity and efficacy on the design. Table 4 summarizes the simulation results. The rows labeled “Marginal” show the proposed design without considering the correlation. The rows labeled “Joint” show the proposed design when the correlation was modeled using the Gumbel copula given by (2.2). The proposed design did not lose efficiency in selecting the optimal dose when the correlation between efficacy and toxicity was ignored. In scenarios where the toxicity and efficacy curves crossed, such as scenarios 6 and 7, the design that ignored the correlation had higher optimal dose selection percentages, owing to the flexibility of estimating toxicity and efficacy independently.

A joint model that accounts for the correlation might reflect the true relationship between toxicity and efficacy. However, with the limited number of patients treated at each dose level, the available information might not be sufficient to provide accurate estimates of the correlation. Using a joint model added computational complexity but did not necessarily improve performance in the phase I/II trial. When the functional form of the correlation is misspecified, using the joint model can lead to incorrect selection of the optimal dose and to the treatment of patients at overly toxic doses. This has been shown in a comprehensive simulation study by Cunanan and Koopmeiners (2014).
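To make the contrast between the “Marginal” and “Joint” designs concrete, the following Python sketch computes the four joint cell probabilities of efficacy and toxicity under a Gumbel-type copula. Equation (2.2) is not reproduced in this section, so the sketch assumes the Gumbel–Morgenstern form used, for example, by Thall and Cook (2004); the function name, the association parameter `psi`, and the numerical inputs are illustrative assumptions.

```python
import math


def gumbel_joint_probs(p_e, p_t, psi):
    """Joint probabilities pi[(a, b)] for efficacy a and toxicity b in {0, 1}.

    Assumed form (may differ from equation (2.2) of the paper):
      pi_ab = pE^a (1 - pE)^(1 - a) * pT^b (1 - pT)^(1 - b)
              + (-1)^(a + b) * pE (1 - pE) pT (1 - pT) * (e^psi - 1) / (e^psi + 1)
    """
    assoc = math.tanh(psi / 2.0)  # identical to (e^psi - 1) / (e^psi + 1)
    cross = p_e * (1.0 - p_e) * p_t * (1.0 - p_t) * assoc
    probs = {}
    for a in (0, 1):
        for b in (0, 1):
            indep = (p_e if a else 1.0 - p_e) * (p_t if b else 1.0 - p_t)
            probs[(a, b)] = indep + (-1) ** (a + b) * cross
    return probs


if __name__ == "__main__":
    # Marginal probabilities of dose level 3 in scenario 7; psi = 1.0 is arbitrary.
    for cell, prob in sorted(gumbel_joint_probs(0.42, 0.15, 1.0).items()):
        print(cell, round(prob, 4))
```

Under this form the marginal probabilities are unchanged for every value of `psi`, and `psi = 0` recovers independence. The joint model therefore differs from the marginal model only through one extra association parameter, which is the quantity that is hard to estimate from a few patients per dose.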


4.4 Trial conduct

We applied our design to conduct the prostate cancer clinical trial. For proprietary reasons, only a stylized summary of the results is reported here. A total of 48 patients were treated sequentially in cohorts of size 3. The posterior estimates of the toxicity and efficacy probabilities at each dose, as well as the posterior utility, were updated after each cohort. Five doses were investigated, and the trial started by treating the first cohort at dose level 1. The first five cohorts were treated sequentially at dose levels 1, 2, 3, 4, and 5. Upon the arrival of the sixth cohort of patients, the posterior probabilities that each dose was the optimal dose were 0.02, 0.14, 0.40, 0.42, and 0.01 for dose levels 1, 2, 3, 4, and 5, respectively. The sixth cohort was assigned to dose level 4. Meanwhile, 3 toxicity outcomes had been observed among the 3 patients treated at dose level 5, and no more patients were treated at dose level 5 because of toxicity. The remaining patients were assigned to the most desirable dose, that is, the dose with the highest posterior probability of having the maximum utility. Upon the arrival of the ninth cohort of patients, 6 efficacy outcomes and 3 toxicity outcomes had been observed among the 12 patients treated at dose level 4, and 3 efficacy outcomes but no toxicity outcomes had been observed at dose level 3. The posterior probabilities that each dose was the optimal dose were 0.02, 0.12, 0.43, 0.40, and 0.03 for dose levels 1, 2, 3, 4, and 5, respectively. Therefore, dose level 3 became the most desirable dose. At the end of the trial, 3, 3, 24, 15, and 3 patients had been treated at dose levels 1, 2, 3, 4, and 5, respectively. Dose level 3 was selected as the optimal dose, with a posterior mean utility of 0.42. The posterior estimates of the toxicity and efficacy probabilities at dose level 3 were 0.17 and 0.49, respectively.
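The assignment rule applied in this example can be sketched in a few lines of Python: among the doses still under consideration, pick the one with the highest posterior probability of being optimal. The function name and the `excluded` argument are hypothetical stand-ins for the design's admissibility rules, which are defined earlier in the paper and not reproduced here. The probabilities are the ones reported above.

```python
def most_desirable_dose(prob_optimal, excluded=()):
    """Return the 1-based dose level with the highest probability of being optimal.

    prob_optimal: posterior probabilities that each dose is the optimal dose.
    excluded: dose levels closed to further enrollment (hypothetical argument).
    """
    candidates = [
        (dose, prob)
        for dose, prob in enumerate(prob_optimal, start=1)
        if dose not in excluded
    ]
    if not candidates:
        raise ValueError("no dose is available for assignment")
    # max() keeps the first maximum, so ties go to the lower dose.
    return max(candidates, key=lambda pair: pair[1])[0]


if __name__ == "__main__":
    # Arrival of the sixth cohort.
    print(most_desirable_dose([0.02, 0.14, 0.40, 0.42, 0.01]))
    # Arrival of the ninth cohort, with dose level 5 closed for toxicity.
    print(most_desirable_dose([0.02, 0.12, 0.43, 0.40, 0.03], excluded={5}))
    # The reported final allocation accounts for all 16 cohorts of 3 patients.
    print(sum([3, 3, 24, 15, 3]))
```

The two calls reproduce the reported assignments to dose level 4 and then dose level 3. Breaking ties toward the lower dose is a conservative choice made for this sketch, not a rule stated in the paper.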

5. DISCUSSION

We have proposed a new Bayesian adaptive phase I/II trial design to find the most desirable dose using both toxicity and efficacy information. An important advantage of the proposed design is that it borrows information across doses using a novel Bayesian dynamic model, but does not impose stringent model assumptions on the dose–toxicity and dose–efficacy curves. As a result, the proposed design yields robust performance across many shapes of the underlying true dose–toxicity and dose–efficacy curves. In addition, sensitivity analyses suggest that the proposed design is relatively insensitive to the prior specification, and that using the marginal model for toxicity and efficacy does not lose efficiency in selecting the optimal dose.

The proposed design assumes that both toxicity and efficacy outcomes are binary. In practice, toxicity and efficacy are often measured on an ordinal scale. For example, in oncology, toxicity is often measured on a scale of five grades according to the Common Terminology Criteria for Adverse Events, and efficacy is often classified according to four levels of disease status: disease progression, stable disease, partial remission, and complete remission. Iasonos and others (2011) proposed incorporating lower-grade toxicity information into dose-finding studies based on the continual reassessment method (CRM). Meter and others (2012) extended the CRM using a continuation ratio model. Tighiouart and others (2012) extended the method of escalation with overdose control by introducing an intermediate grade 2 toxicity when assessing dose-limiting toxicity. One extension of the proposed design would be to account for the ordinal nature of toxicity and efficacy. For example, a proportional odds model (Agresti, 2002) could be used to accommodate the ordinal outcomes, and the utility could be modified accordingly to account for the different levels of desirability associated with the different grades of toxicity and levels of efficacy.
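As an illustration of the ordinal extension mentioned above, the following Python sketch evaluates category probabilities under a generic proportional odds (cumulative logit) model, logit Pr(Y ≤ j) = α_j − η, where the cutpoints α_j are increasing and η is a dose-dependent linear predictor. The cutpoints, the values of η, and the function name are hypothetical. This is a textbook form of the model, not a specification given by the authors.

```python
import math


def proportional_odds_probs(cutpoints, eta):
    """Return [P(Y = 0), ..., P(Y = J)] for J = len(cutpoints) ordered cutpoints.

    Model: logit P(Y <= j) = cutpoints[j] - eta for j = 0, ..., J - 1.
    """
    if any(hi <= lo for lo, hi in zip(cutpoints, cutpoints[1:])):
        raise ValueError("cutpoints must be strictly increasing")
    # Cumulative probabilities P(Y <= j); the last category closes at 1.
    cumulative = [1.0 / (1.0 + math.exp(-(c - eta))) for c in cutpoints] + [1.0]
    probs = [cumulative[0]]
    for lower, upper in zip(cumulative, cumulative[1:]):
        probs.append(upper - lower)
    return probs


if __name__ == "__main__":
    # Four ordered efficacy levels (progression, stable disease, partial
    # remission, complete remission) need three cutpoints; values are made up.
    cutpoints = [-1.0, 0.5, 2.0]
    for eta in (0.0, 1.0):  # a larger eta could represent a higher dose
        print(eta, [round(p, 3) for p in proportional_odds_probs(cutpoints, eta)])
```

Increasing η shifts probability mass toward the higher categories while keeping the odds ratio between any two doses the same at every cutpoint, which is the defining property of the proportional odds model.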

ACKNOWLEDGMENTS Conflict of Interest: None declared.


FUNDING S.L.'s research was partially supported by Award Number R01 CA154591 from the National Cancer Institute. V.E.J.'s research was supported by Award R01 CA158113 from the National Cancer Institute.

REFERENCES

AGRESTI, A. (2002). Categorical Data Analysis, 2nd edition. New York: Wiley.
BEKELE, B. N. AND SHEN, Y. (2005). A Bayesian approach to jointly modeling toxicity and biomarker expression in a phase I/II dose-finding trial. Biometrics 61, 343–354.
BRAUN, T. M. (2002). The bivariate continual reassessment method: extending the CRM to phase I trials of two competing outcomes. Controlled Clinical Trials 23, 240–256.
CAI, C., YUAN, Y. AND JI, Y. (2014). A Bayesian dose-finding design for oncology clinical trials of combinational biological agents. Journal of the Royal Statistical Society, Series C 63, 159–173.
CHEUNG, Y. AND CHAPPELL, R. (2000). Sequential designs for phase I clinical trials with late-onset toxicities. Biometrics 56, 1177–1182.
CHEUNG, Y. K. (2002). On the use of nonparametric curves in phase I trials with low toxicity tolerance. Biometrics 58, 237–240.
CUNANAN, K. AND KOOPMEINERS, J. K. (2014). Evaluating the performance of copula models in phase I–II clinical trials under model misspecification. BMC Medical Research Methodology 14, 51.
GASPARINI, M. AND EISELE, J. (2000). A curve-free method for phase I clinical trials. Biometrics 56, 609–615.
GOOLEY, T. A., MARTIN, P. J., FISHER, L. D. AND PETTINGER, M. (1994). Simulation as a design tool for phase I/II clinical trials: an example from bone marrow transplantation. Controlled Clinical Trials 15, 450–462.
GUO, B., LI, Y. AND YUAN, Y. (2015). A dose-schedule finding design for phase I–II clinical trials. Journal of the Royal Statistical Society, Series C. doi:10.1111/rssc.12113.
HOUEDE, N., THALL, P. F., NGUYEN, H., PAOLETTI, X. AND KRAMAR, A. (2010). Utility-based optimization of combination therapy using ordinal toxicity and efficacy in phase I/II trials. Biometrics 66, 532–540.
HUNSBERGER, S., RUBINSTEIN, L. V., DANCEY, J. AND KORN, E. L. (2005). Dose escalation trial designs based on a molecularly targeted endpoint. Statistics in Medicine 24, 2171–2181.
IASONOS, A., ZOHAR, S. AND O’QUIGLEY, J. (2011). Incorporating lower grade toxicity information into dose finding designs. Clinical Trials 8(4), 370–379.
MANDREKAR, S. J., CUI, Y. AND SARGENT, D. J. (2007). An adaptive phase I design for identifying a biologically optimal dose for dual agent drug combinations. Statistics in Medicine 26, 2317–2330.
MANDREKAR, S. J., QIN, R. AND SARGENT, D. J. (2010). Model-based phase I designs incorporating toxicity and efficacy for single and dual agent drug combinations: methods and challenges. Statistics in Medicine 29, 1077–1083.
MESSER, K., NATARAJAN, L., BALL, E. D. AND LANE, T. A. (2010). Toxicity-evaluation designs for phase I/II cancer immunotherapy trials. Statistics in Medicine 29, 712–720.
METER, E. M. V., GARRETT-MAYER, E. AND BANDYOPADHYAY, E. (2012). Dose-finding clinical trial design for ordinal toxicity grades using the continuation ratio model: an extension of the continual reassessment method. Clinical Trials 0, 1–11.
MURTAUGH, P. A. AND FISHER, L. D. (1990). Bivariate binary models of efficacy and toxicity in dose-ranging trials. Communications in Statistics, Part A - Theory and Methods 19, 2003–2020.
O’QUIGLEY, J. AND ZOHAR, S. (2006). Experimental designs for phase I and phase I/II dose-finding studies. British Journal of Cancer 94, 609–613.
O’QUIGLEY, J., PEPE, M. AND FISHER, L. (1990). Continual reassessment method: a practical design for phase I clinical trials in cancer. Biometrics 46, 33–48.
THALL, P. F. AND COOK, D. C. (2004). Dose-finding based on efficacy-toxicity trade-offs. Biometrics 60, 684–693.
THALL, P. F. AND RUSSELL, K. E. (1998). A strategy for dose-finding and safety monitoring based on efficacy and adverse outcomes in phase I/II clinical trials. Biometrics 54, 251–264.
THALL, P. F., NGUYEN, H. Q., BRAUN, T. M. AND QAZILBASH, M. H. (2013). Using joint utilities of the times to response and toxicity to adaptively optimize schedule–dose regimes. Biometrics 69, 673–682.
TIGHIOUART, M., COOK-WIENS, G. AND ROGATKO, A. (2012). Escalation with overdose control using ordinal toxicity grades for cancer phase I clinical trials. Journal of Probability and Statistics 2012, doi.org/10.1155/2012/317634.
WAGES, N. A. AND CONAWAY, M. R. (2014). Phase I/II adaptive design for drug combination oncology trials. Statistics in Medicine 33, 1990–2003.
WHITEHEAD, J., THYGESEN, H. AND WHITEHEAD, A. (2011). Bayesian procedures for phase I/II clinical trials investigating the safety and efficacy of drug combinations. Statistics in Medicine 30, 1952–1970.
YIN, G., LI, Y. AND JI, Y. (2006). Bayesian dose-finding in phase I/II clinical trials using toxicity and efficacy odds ratios. Biometrics 62, 777–787.
YUAN, Y. AND YIN, G. (2011a). Bayesian phase I/II adaptively randomized oncology trials with combined drugs. The Annals of Applied Statistics 5, 924–942.
YUAN, Y. AND YIN, G. (2011b). Robust EM continual reassessment method in oncology dose finding. Journal of the American Statistical Association 106, 818–831.

[Received March 31, 2015; revised September 16, 2015; accepted for publication September 24, 2015]
