Biometrical Journal 56 (2014) 6, 947–962

DOI: 10.1002/bimj.201300138


Practical considerations for using functional uniform prior distributions for dose-response estimation in clinical trials

Björn Bornkamp∗

Novartis Pharma AG, CH-4002 Basel, Switzerland

Received 16 July 2013; revised 9 March 2014; accepted 2 May 2014

Estimating nonlinear dose-response relationships in the context of pharmaceutical clinical trials is often a challenging problem. The data in these trials are typically variable and sparse, making this a hard inference problem, despite sometimes seemingly large sample sizes. Maximum likelihood estimates often fail to exist in these situations, while for Bayesian methods, prior selection becomes a delicate issue when no carefully elicited prior is available, as the posterior distribution will often be sensitive to the priors chosen. This article provides guidance on the usage of functional uniform prior distributions in these situations. The essential idea of functional uniform priors is to employ a distribution that weights the functional shapes of the nonlinear regression function equally. By doing so one obtains a distribution that exhaustively and uniformly covers the underlying potential shapes of the nonlinear function. On the parameter scale these priors will often result in quite nonuniform prior distributions. This paper gives hints on how to implement these priors in practice and illustrates them in realistic trial examples in the context of Phase II dose-response trials as well as Phase I first-in-human studies.

Keywords: Bayesian statistics; Dose-finding; Emax model; Functional uniform prior; Jeffreys prior.



Additional supporting information may be found in the online version of this article at the publisher’s web-site

1 Introduction

Characterizing the dose-related effects of a pharmaceutical compound is the major objective of its clinical development. In Phase I first-in-human trials one typically aims to obtain information on safety of the compound, and estimates the maximum tolerated dose (MTD), which constitutes an upper bound on the doses for subsequent clinical development. In Phase II one then gains information on the dose-response relationship for efficacy (and tolerability) by studying several doses of the compound in patients, to select optimal doses for Phase III trials (and general use) in case the drug is sufficiently efficacious and tolerable.

In Phase I trials model-based approaches using, for example, logistic regression have been adopted on a more regular basis only in the past few years, see, for example, Neuenschwander et al. (2008) or the review of methods in Berry et al. (2011, ch. 3). From a statistical perspective one of the challenges in Phase I trials is that adaptation has to start early in the trial, so little data can be used to estimate the dose-response function and thus guide dose-escalation in early stages of the trial. Maximum likelihood (ML) estimation often fails in these situations, as the estimate might not exist for the observed data constellations.

∗ Corresponding author: e-mail: [email protected], Phone: +41-616-963-764

© 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim


The situation is comparable for Phase II dose-finding trials. Nonlinear regression techniques have gained popularity in pharmaceutical statistical practice only in the past few years, see, for example, Bretz et al. (2005), Thomas (2006), Dragalin et al. (2007), Jones et al. (2011) or Tan et al. (2011), although rather complete textbook reviews of nonlinear regression methods date back at least to Bates and Watts (1988) and Seber and Wild (1989). The late uptake of these methods might be related to the fact that usually few dose-levels are studied in Phase II trials (mostly due to logistical constraints) and the variability in the response is high relative to the sample size, which makes the underlying inference problem hard, see also the summary of the PhRMA adaptive dose-ranging working group (Bornkamp et al., 2007).

In both Phase I and II one hence faces similar problems: data are sparse, and when using ML estimation the iterative algorithms used for calculating the estimator often diverge, frequently because the ML estimate does not exist. A partial solution to the problem is to impose bounds on the space for the nonlinear parameters. The parameter space is then compact and the ML estimator is guaranteed to exist (see Jennrich, 1969, for the result in the context of nonlinear least-squares estimation). As a consequence, however, the estimate will often be exactly equal to the selected bound, which can be considered undesirable. In addition, the standard asymptotic theory for confidence intervals does not apply when the parameter is on the boundary of the parameter space.

Another way to partially overcome these problems is to adopt a Bayesian approach. In the case of sparse and highly variable data, however, one can often observe a strong influence of the prior on the posterior distribution. This is undesirable when no carefully determined prior is available.
A potentially reasonable assumption in these cases is to assume that all functional shapes of the assumed nonlinear function are equally likely. This is the idea underlying functional uniform prior distributions (see Bornkamp, 2012). These distributions are hence not noninformative in a formal, for example, information-theoretic sense. Nevertheless, these prior distributions will be a reasonable starting point in a variety of situations, and traditional noninformative priors (like the Jeffreys or related priors, see Eaves, 1983, or Ye and Berger, 1991) result as particular functional uniform distributions in certain situations. The general idea of functional uniform priors is to use a distribution that exhaustively and uniformly covers the functional space underlying the nonlinear regression function in a selected metric. This approach has advantages, for example, over using uniform priors on the model parameters. These can often be unwantedly informative and crucially depend on the selected parameterization (see Bornkamp, 2012), but are nevertheless still often used in practice. In Section 2 the idea of functional uniform priors will be described in detail and guidance will be provided on the choice of the metric, how to incorporate additional prior information and how to implement the priors in computer software. In Section 3 the approach is implemented for two real examples that are representative of situations encountered in drug development practice.

2 Functional uniform prior distributions

2.1 Methodological approach

Suppose it is desired to use μ(x, θ) as a regression function, where μ(x, θ) is nonlinear in the parameter vector θ, with x ∈ X ⊂ ℝ and θ ∈ Θ, where Θ is a compact subspace of ℝ^k. The range X can be thought of as a range of interest for the dose variable x. The main idea of functional uniform priors in the context of nonlinear regression (Bornkamp, 2012) is to map the parameters θ to the space of functions μ(x, θ) endowed with a metric d(μ(·, θ), μ(·, θ′)), and impose a uniform distribution in this space. Several metrics can be reasonable; one example is the L2 metric

$$d(\theta, \theta') = \sqrt{\int_{X} \left(\mu(x, \theta) - \mu(x, \theta')\right)^2 \nu(dx)}, \qquad (1)$$

where ν(dx) is a measure on X. When X is a real interval and ν(dx) the Lebesgue measure one would hence obtain $\sqrt{\int_{X} (\mu(x, \theta) - \mu(x, \theta'))^2 \, dx}$. For a discrete measure ν(dx) one would end up with $\sqrt{\sum_{x_i \in X} (\mu(x_i, \theta) - \mu(x_i, \theta'))^2}$. The main idea is that the so defined metric on the space of functions is a more adequate measure to assess distances between the parameter values θ and θ′ than the original parameter space Θ, and the functional uniform distribution is the distribution that assigns equal probabilities to sets of equal size in the functional metric instead of the parameter space Θ.

Figure 1 Function 1/(1 + exp(−θx)) displayed for θ = 0.8, 1, 1.2 and θ = 3.8, 4, 4.2 (left) and θ = 0.84, 1, 1.19 and θ = 3.1, 4, 5.37 (right).

In Figure 1 the idea of functional metrics is illustrated using the one-dimensional logistic model μ(x, θ) = 1/(1 + exp(−θx)) with x ∈ [−2, 2]. The functions are displayed for θ0 = 1 and θ0 = 4; in addition the θ-values with Euclidean distance |θ − θ0| = 0.2 are plotted on the left hand side. On the right hand side one can see the values corresponding to d(θ, θ0) = 0.2, where d is the L2 metric (1), where a discrete uniform measure for ν(dx) was chosen and X = {−2, −1.9, −1.8, ..., 1.9, 2}. One can see that when using the Euclidean distance the same change in θ (i.e. ±0.2) leads to quite different changes in the functional shape: for θ0 = 4 one obtains small changes, but bigger ones around θ0 = 1. So assigning equal probabilities for intervals of equal size on the parameter scale will poorly represent differences on the functional scale. Doing so would lead to a distribution that has steep functions overrepresented in the prior distribution and more shallow increasing functions (corresponding to small θ) under-represented. Sets of equal size in the L2 metric lead to a visually more uniform coverage, unrelated to the location of θ0. When looking at the parameter scale the interval around θ0 = 1 is given by [0.84, 1.19] and the one bracketing θ0 = 4 is given by [3.1, 5.37], reflecting the fact that a larger change in θ is needed to obtain the same change in the function for θ0 = 4 compared to θ0 = 1. So the functional L2 metric seems better suited in this case to define a distribution that leads to a uniform coverage of the functional shapes. However, it is not completely obvious how to derive a probability distribution that is uniform in a specified metric.
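The intervals at functional distance 0.2 discussed for Figure 1 can be explored numerically. The following is a minimal sketch, not the author's code: the function names are hypothetical, only the Python standard library is used, and it assumes that ν(dx) places unit mass on each of the 41 grid points (no 1/41 normalization) and that the L2 metric includes the square root. Under these assumptions the printed intervals should come out close to the ones quoted in the text.

```python
import math

# Grid X = {-2, -1.9, ..., 1.9, 2} from the Figure 1 example.
X = [i / 10 for i in range(-20, 21)]


def mu(x, theta):
    """One-dimensional logistic model 1 / (1 + exp(-theta * x))."""
    return 1.0 / (1.0 + math.exp(-theta * x))


def l2_dist(theta, theta0):
    """L2 distance between mu(., theta) and mu(., theta0) on the grid.

    Assumption: nu puts unit mass on every grid point (no 1/41 factor).
    """
    return math.sqrt(sum((mu(x, theta) - mu(x, theta0)) ** 2 for x in X))


def solve_theta(theta0, target, lo, hi, tol=1e-10):
    """Bisection for theta in [lo, hi] with l2_dist(theta, theta0) == target.

    Requires that l2_dist - target changes sign between lo and hi.
    """
    f_lo = l2_dist(lo, theta0) - target
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = l2_dist(mid, theta0) - target
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def l2_interval(theta0, target=0.2):
    """Parameter values below and above theta0 at functional distance `target`."""
    lower = solve_theta(theta0, target, theta0 - 3.0, theta0)
    upper = solve_theta(theta0, target, theta0, theta0 + 3.0)
    return lower, upper


for theta0 in (1.0, 4.0):
    lower, upper = l2_interval(theta0)
    print(f"theta0 = {theta0}: [{lower:.2f}, {upper:.2f}]")
```

The search brackets of width 3 on either side of θ0 are an arbitrary choice that happens to contain the solution for both values of θ0 considered here.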
In what follows a heuristic derivation for uniform priors in L_p metrics is given based on Hartigan (1983, chapter 5.4) (see Bornkamp, 2012, for an alternative exposition, based on Dembski, 1990, and Ghosal et al., 1997). The defining property of a uniform probability distribution is that equally sized sets (in a specified metric d) receive equal probability, which means that

$$\lim_{\epsilon \to 0} \frac{P_\theta\{K_\epsilon(\theta')\}}{P_\theta\{K_\epsilon(\theta'')\}} = 1 \quad \text{for all } \theta', \theta'' \in \Theta, \qquad (2)$$


where P_θ denotes the prior probability for θ and K_ε(θ′) = {θ | d(θ, θ′) < ε}. This means that small balls (in the metric d(θ, θ′)) around any point θ′ ∈ Θ receive the same prior probability. The essential task is hence to calculate volumes of the set K_ε(θ′) for specified metrics and ensure that one can derive a distribution for which (2) holds. In what follows the focus is on L_p metrics

$$d(\theta, \theta') = \left(\int_{X} |\mu(x, \theta) - \mu(x, \theta')|^p \, \nu(dx)\right)^{1/p}$$

with p ≥ 1. Under appropriate differentiability conditions on μ one can approximate μ(x, θ) − μ(x, θ′) by a Taylor approximation J_x(θ′)(θ − θ′) + O(||θ − θ′||²), where J_x(θ′) = ∂μ(x, θ)/∂θ is the gradient of the model function evaluated at θ′. Now from the triangle inequality and the reverse triangle inequality one can upper and lower bound d(θ, θ′) by

$$\left(\int_{X} |J_x(\theta')(\theta - \theta')|^p \, \nu(dx)\right)^{1/p} + \left(\int_{X} |O(\|\theta - \theta'\|^2)|^p \, \nu(dx)\right)^{1/p}$$

and

$$\left| \left(\int_{X} |J_x(\theta')(\theta - \theta')|^p \, \nu(dx)\right)^{1/p} - \left(\int_{X} |O(\|\theta - \theta'\|^2)|^p \, \nu(dx)\right)^{1/p} \right|.$$

Hence the major contribution for calculation of K_ε(θ′) for small ε is

$$\left(\int_{X} |J_x(\theta')(\theta - \theta')|^p \, \nu(dx)\right)^{1/p}. \qquad (3)$$

In the one-dimensional case k = 1 one can directly move |θ − θ′| out of the integral and calculate the volume, i.e., the length of the interval, of K_ε(θ′) as

$$\frac{2\epsilon}{\left(\int_{X} |J_x(\theta')|^p \, \nu(dx)\right)^{1/p}}.$$

So a density $\propto \left(\int_{X} |J_x(\theta)|^p \, \nu(dx)\right)^{1/p}$ will fulfill Eq. (2). In the multidimensional case calculation of volumes is more difficult. For p = 2 one can expand (3) to

$$\sqrt{(\theta - \theta')^T Z^*(\theta') (\theta - \theta')}, \qquad (4)$$

where $Z^*(\theta) = \int_{X} J_x(\theta)^T J_x(\theta) \, \nu(dx)$. It is known that sets defined by a quadratic form (4) define an ellipsoid, so when Z*(θ) has finite positive eigenvalues, the volume of the set K_ε(θ′) around θ′ can be derived based on the fact that K_ε(θ′) for small ε approximately corresponds to an ellipsoid of volume proportional to $1/\sqrt{\det(Z^*(\theta'))}$ in the Θ space. Hence a density proportional to

$$\sqrt{\det(Z^*(\theta))}, \qquad (5)$$

will give the desired property that spheres of equal size in the functional space will receive the same prior probability as in (2). For illustration, we will calculate the functional uniform density in the L2 distance for the nonlinear function from Figure 1: The gradient of the model function can be calculated as

$$J_x(\theta) = \frac{\partial}{\partial\theta} \mu(x, \theta) = \frac{x \exp(-\theta x)}{(\exp(-\theta x) + 1)^2},$$

and using a discrete uniform measure for ν(dx) with equally spaced points X = {−2, −1.9, −1.8, ..., 1.9, 2}, the formula above then results in a density proportional to

$$\sqrt{\sum_{x \in X} \left(\frac{x \exp(-\theta x)}{(\exp(-\theta x) + 1)^2}\right)^2}.$$

In Figure 2 (i) we plot the resulting density function, and in (ii) one can observe the 1%, 10%, 20%, 30%, ..., 90%, and 99% quantiles of the distribution in the function space. It can nicely be observed that the distribution is quite nonuniform in the parameter space (putting more probability mass on small values of θ); however, on the functional scale it results in a distribution that uniformly covers the different functional forms that the function can assume.
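The density and quantiles of this logistic example can be approximated numerically. The sketch below is an illustration under stated assumptions rather than the code behind Figure 2: the helper names are hypothetical, the density is normalized by a trapezoidal rule on [−200, 200] (an arbitrary truncation chosen because the tails decay roughly like exp(−0.1|θ|)), and quantiles are obtained by linear interpolation of the resulting cumulative sums.

```python
import math

# Grid X = {-2, -1.9, ..., 1.9, 2} used for the discrete measure nu(dx).
X = [i / 10 for i in range(-20, 21)]


def density(theta):
    """Unnormalized functional uniform density sqrt(sum_x J_x(theta)^2)."""
    total = 0.0
    for x in X:
        # J_x(theta) = x exp(-theta x) / (exp(-theta x) + 1)^2 = x p (1 - p)
        p = 1.0 / (1.0 + math.exp(-theta * x))
        total += (x * p * (1.0 - p)) ** 2
    return math.sqrt(total)


def quantiles(probs, lo=-200.0, hi=200.0, n=20000):
    """Quantiles of the normalized density via a trapezoidal CDF on a grid."""
    step = (hi - lo) / n
    grid = [lo + i * step for i in range(n + 1)]
    dens = [density(t) for t in grid]
    cdf = [0.0]
    for i in range(1, n + 1):
        cdf.append(cdf[-1] + 0.5 * (dens[i - 1] + dens[i]) * step)
    total = cdf[-1]
    out = []
    for pr in probs:
        target = pr * total
        # First grid index whose cumulative mass reaches the target.
        i = next(j for j in range(1, n + 1) if cdf[j] >= target)
        # Linear interpolation inside the grid cell.
        frac = (target - cdf[i - 1]) / (cdf[i] - cdf[i - 1])
        out.append(grid[i - 1] + frac * step)
    return out


qs = quantiles([0.1, 0.5, 0.9])
print([round(q, 2) for q in qs])
```

Because the density is symmetric around θ = 0, the median is zero and lower and upper quantiles mirror each other, which gives a simple sanity check on the numerical integration.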

2.1.1 Some properties

The functional uniform distribution is invariant to parameterization of the model function: once the space (i.e. the metric) in which to impose the uniform distribution is determined, the function space is fixed and the functional uniform prior distribution will be the same distribution, no matter what parameterization is used.

Depending on the function μ and sets Θ and X, the functional uniform “density” (e.g., (5)) does not need to be integrable. The tails might not converge quickly enough to zero or might have nonintegrable singularities for certain values θ. A simple example where (5) can be improper is linear regression functions μ(x, θ) = xθ and L_p metrics. Here the gradient J_x(θ) is independent of θ, and thus constant as a function of θ. So for finite Θ one obtains a uniform distribution, while for infinite Θ the function is not integrable. Whether or not nonintegrability is a problem, and which strategies can be used to overcome it, depends on the modeling situation (e.g., the likelihood function).

Figure 2 (i) Functional uniform density. (ii) Function 1/(1 + exp(−θx)) displayed for 1%, 10%, 20%, ..., 80%, 90%, and 99% quantiles of the functional uniform density, i.e. θ is equal to −28.2, −5.21, −1.99, −0.99, −0.43, 0, 0.43, 0.99, 1.99, 5.21, 28.2.

2.1.2 Additional prior information available

In dose-response situations in Phase II at least vague prior information is typically available: There might be information on the placebo response from historical trials, but also for other parameters of the dose-response curve information might be available from earlier phases of development or similar compounds. This information can then be incorporated on top of a functional uniform distribution. In cases where prior information is available in the form of a probability distribution, one can use multiplication of the densities to combine the information. This approach of information synthesis is also employed in Bayes' theorem, where one multiplies likelihood and prior to combine information, but it has also been derived from information theoretic considerations, see, for example, Zellner (1988). When, for example, moments of the prior distribution should be fixed to certain values, one can find the distribution that minimally deviates from the functional uniform density, in terms of the Kullback-Leibler distance, subject to the given moments, see Sun and Berger (1998).

2.2 Choosing a metric

An obvious question is what metric to use to measure differences between functions and to define uniformity. While there is no clear-cut answer to this question, some guidance can be provided. In what follows the question will be split up into two subquestions: (i) where to measure differences between functions (i.e. how to choose X and ν(dx)) and (ii) how to measure the difference between functions.

Regarding (i) one can argue that the set X should consist of those x where it is expected that the function μ(x, θ) will change with changing parameters, so that the prior will be sensitive to changes at these values. That means in practice X should at least consist of the doses x that are actually being used in the upcoming trial or could potentially be used, when an adaptive trial is planned.

Regarding (ii), it is interesting to see that for specific choices of metrics the resulting functional uniform density is identical to traditional uninformative prior distributions: When using the empirical Hellinger distance (as well as the empirical Kullback-Leibler divergence) of the observed log-likelihood function as distance metric, the resulting functional uniform density is Jeffreys prior distribution, see Ghosal et al. (1997) or Hartigan (1983, chapter 5.4). When a homoscedastic Gaussian residual model is assumed, the Jeffreys prior for θ is equal to (5) with $Z^*(\theta) = \sum_{i=1}^{m} n_i J_{x_i}(\theta)^T J_{x_i}(\theta)$, where x_i are the used dose-levels, i = 1, ..., m, and n_i are the observed numbers of patients allocated to dose x_i, so that the empirical measure of the covariates $\sum_{i=1}^{m} n_i \delta_{x_i}$ is used for ν(dx) and the Hellinger metric is equivalent to the L2 metric in this case. For other residual distributions, the Hellinger metric will result in metrics different from the L2 metric for μ, and these might be a good starting point for the model at hand.

A challenge for the Jeffreys prior is the use of the empirical measure for X and ν(dx): For example for missing data, it is not obvious whether dose values of the missing observations should be included in ν(dx) or not. In adaptive studies the batch-sequential analysis using stagewise Jeffreys priors at each interim and the combined analysis of all data using the Jeffreys prior at the end will produce different results. This means that Jeffreys prior violates the likelihood principle. A simple modification of the Jeffreys prior that might be of interest in many situations is to choose ν(dx) according to the considerations related to (i) and the metric in (ii) according to the residual model. This will avoid some of the problems of the Jeffreys prior, because the prior is fully determined before data collection.

2.3 Implementation

The calculation of the functional uniform densities in explicit form is not straightforward in many cases (see Bornkamp, 2012, for some explicit calculations). In this section we will provide hints toward implementation in computer software. In Section 2.3.1 we describe how to implement L2 functional uniform priors of form (5) with a discrete measure ν(dx) in Bayesian modeling software tools by directly evaluating formula (5). This approach can be used for general Jeffreys-like priors as described in Section 2.2. The direct evaluation of (5) can be computationally demanding. For that reason we will provide in Section 2.3.2 default approximations for some popular dose-response models commonly used in Phase II. As these approximations will be derived in terms of standard distributions, the main advantage is that those are easy to implement and they might serve as a quick starting point in a number of practical situations.

2.3.1 Direct evaluation

To illustrate the approach we will provide an artificial example, and then show how to implement this in various Bayesian modeling software tools in detail in the Supporting Information. Suppose one would like to use the Emax model for analysis of a dose-response trial, with model function given by θ0 + θ1 x/(x + θ2), where θ2 is a positive parameter, and gradient given by J_x(θ) = (1, x/(x + θ2), −xθ1/(x + θ2)²). Suppose that the dose-range of interest is [0, 80] mg. For illustration we assume there are data y = (0, 6, 8, 8) corresponding to doses x = (0, 20, 40, 80), and we use a normal likelihood with, for illustration, known constant standard deviation 1. We select $\sum_{x=0}^{10} \delta_{8x}$ for ν(dx), i.e. a uniform distribution on X = {0, 8, 16, ..., 80}, and let F(θ) be the 11 × 3 matrix with J_x(θ), x ∈ X, in the rows. The functional uniform prior in the L2 metric is then proportional to $\sqrt{\det(Z^*(\theta))}$ with Z*(θ) = F(θ)ᵀF(θ). The codes for implementing this model in different modeling languages (BUGS, PROC MCMC, R, and Stan) are given in the Supporting Information. There we also give some advice on how to incorporate prior information on the parameter scale on top of the functional uniform prior as outlined in Section 2.1.2.
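The direct evaluation for this Emax example can be sketched in a few lines. The code below is not the Supporting Information code; it is a hedged illustration in plain Python with hypothetical function names, following Eq. (5) by taking the square root of det(F(θ)ᵀF(θ)) on the log scale. The parameter values in the usage line are arbitrary.

```python
import math

# Support of nu(dx): X = {0, 8, 16, ..., 80} (11 equally weighted doses).
DOSES = [8.0 * i for i in range(11)]


def emax_gradient(x, theta1, theta2):
    """Gradient of theta0 + theta1 * x / (x + theta2) w.r.t. (theta0, theta1, theta2)."""
    return (1.0, x / (x + theta2), -x * theta1 / (x + theta2) ** 2)


def det3(m):
    """Determinant of a 3 x 3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def log_prior(theta1, theta2):
    """Unnormalized log density 0.5 * log det(F^T F).

    The gradient does not involve theta0, so the prior is flat in theta0.
    theta2 must be positive.
    """
    rows = [emax_gradient(x, theta1, theta2) for x in DOSES]
    z = [[sum(r[j] * r[k] for r in rows) for k in range(3)] for j in range(3)]
    d = det3(z)
    return 0.5 * math.log(d) if d > 0 else float("-inf")


# Arbitrary illustrative values: theta1 = 8, theta2 = 10.
print(log_prior(8.0, 10.0))
```

In an MCMC sampler this log density would simply be added to the log-likelihood. Note that the density is proportional to |θ1| and improper in θ0 and θ1, so the parameter bounds or additional prior information discussed in the text matter in practice.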

2.3.2 Default approximations for selected models

In this section we will provide approximations for the functional uniform density for a selection of nonlinear regression functions used in dose-response analysis in Phase II.

Table 1 Model functions and parameter ranges (D denotes the maximum dose).

Name           Function μ0(x, θ̃) on [0, D]             Proposed bounds
Emax           x/(x + θ2)                               [0, 1.5D]
Sigmoid Emax   x^θ3/(x^θ3 + θ2^θ3)                      [0, 1.5D] × [0.5, 10]
Power model    (x/D)^exp(θ2)                            [−5, 5]
Beta model     (x/(1.2D))^θ2 (1 − x/(1.2D))^θ3          [0.05, 4] × [0.05, 4]

Most nonlinear dose-response functions used in Phase II can be written as

$$\mu(x, \theta) = \theta_0 + \theta_1 \mu_0(x, \tilde{\theta}), \qquad (6)$$

where θ0 is a location parameter, and θ1 a scaling parameter, and only the model parameter θ̃ enters the regression function nonlinearly (see Pinheiro et al., 2006). Table 1 shows popular dose-response functions, which are all of this form and for which we will calculate and approximate functional uniform prior distributions. The Emax and sigmoid Emax models are probably the most popular models used in Phase II dose-response studies (see e.g. Thomas, 2006; Dragalin et al., 2007). The power model is a flexible model that can approximate concave, linear, and convex shapes. The beta model is a flexible unimodal dose-response function, which is a rescaled and truncated version of the density function of the beta distribution.

It is in practice preferable to use parameter bounds in both Bayesian and non-Bayesian analyses, as it typically ensures existence of the ML estimate or integrability of the posterior distribution. Assuming bounds is often no major restriction, because dose-response functions typically converge to a limiting function when one or more of the nonlinear parameters converge to ∞ or 0. In addition the functional uniform prior distribution will, by definition, typically have relatively low probability density anyway in places where the function changes little by changing the parameters, so that the truncation will have a limited impact on the prior. Table 1 gives parameter bounds that typically cover all shapes of relevance for Phase II studies. The bounds are selected relatively wide; in some situations even tighter bounds can be adequate (for example for the beta model, values as small as 1.5 for the upper bounds can be adequate).

We will focus on calculating the functional uniform prior only for the nonlinear part μ0 of the regression function, as only the nonlinear part determines the shape of the model function. This has the advantage that one can also use these priors in situations when either θ0 or θ1 are omitted from the model function (e.g., when fitting on placebo-adjusted scale) or are themselves parameterized in a more complex fashion (e.g., including covariates). As a dose-range we will choose [0, 1]. Note that this is no restriction; all priors can be transformed if the range is [0, D] for general D > 0. Specifics of the metric that was used to measure distances between the model functions are given in Appendix A.1; the essential idea is to use a metric that works on shapes, so that θ0 and θ1 cancel out, and to use for ν(dx) something that is adequate under a variety of commonly used dose designs.

To calculate the approximations, Eq. (5) was integrated for the different situations to calculate L2 based functional uniform densities on the shapes. The ranges for integration for the model parameters were the proposed bounds given in Table 1. As the resulting densities were complex mathematical expressions but not too complex in terms of their shape, simple standard distributions were used for approximation: beta distributions with range given by the proposed bounds in Table 1, and normal or lognormal distributions. In the case of a two-dimensional parameter (i.e., sigmoid Emax and beta models) a Gaussian copula was used to combine the two marginal densities for θ2 and θ3, resulting in one additional parameter. More details on the used computational techniques for approximation are given in Appendix A.2. The final obtained approximations are given in Table 2. Note that also for the normal distributions it is recommended to use the parameter bounds from Table 1, i.e. actually using truncated normal distributions.

Table 2 Default approximations for shape-based functional uniform priors for the parameters θ2 and θ3 of the dose-response models in Table 1.

Model        Type         Parameters of approximation density
Emax         Beta         α = 0.5, β = 5.2
Emax         log-normal   μ = −2.5, σ = 1.8
Sigm. Emax   Beta         α1 = 0.82, β1 = 3.5, α2 = 0.93, β2 = 1.4, ρ12 = 0.32
Sigm. Emax   log-normal   μ1 = −1.5, σ1 = 1.1, μ2 = 1.4, σ2 = 0.88, ρ12 = 0.42
Power        Beta         α = 3.6, β = 3.6
Power        normal       μ = 0, σ = 1.9
Beta model   Beta         α1 = 0.72, β1 = 2.2, α2 = 0.84, β2 = 1.7, ρ12 = 0.52
Beta model   log-normal   μ1 = −0.13, σ1 = 1.1, μ2 = 0.29, σ2 = 1.1, ρ12 = 0.61

If one wants to use the range [0, D] for integration in the metric, the transformation τ (θ2 ) = Dθ2 has to be used for the θ2 parameter of the Emax and sigmoid Emax models. The θ3 parameter of the sigmoid Emax model is not affected by this change. Also for the power and the beta model one can directly use the priors displayed in Table 2 for the general range [0, D], as the underlying model functions depend on D explicitly.
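As an illustration of how the Emax entries of Table 2 might be used for a dose range [0, D], the sketch below draws the θ2 parameter from both approximations. It rests on assumptions that should be checked against the appendix: the beta approximation is read as a Beta(0.5, 5.2) variable rescaled to the bounds [0, 1.5] on the unit dose range, the log-normal approximation is truncated to the same bounds by rejection, and both are then mapped to [0, D] with τ(θ2) = Dθ2. The function names and the choice D = 80 are hypothetical.

```python
import random


def sample_ed50_beta(max_dose, rng, alpha=0.5, beta=5.2):
    """Draw theta2 from the beta approximation, rescaled to [0, 1.5 * max_dose]."""
    return 1.5 * max_dose * rng.betavariate(alpha, beta)


def sample_ed50_lognormal(max_dose, rng, mu=-2.5, sigma=1.8):
    """Draw theta2 = D * exp(N(mu, sigma^2)), truncated to [0, 1.5 * D] by rejection."""
    while True:
        value = max_dose * rng.lognormvariate(mu, sigma)
        if value <= 1.5 * max_dose:
            return value


rng = random.Random(1)  # fixed seed so that the draws are reproducible
D = 80.0                # hypothetical maximum dose
beta_draws = [sample_ed50_beta(D, rng) for _ in range(20000)]
lognormal_draws = [sample_ed50_lognormal(D, rng) for _ in range(20000)]

# Prior medians of theta2 under the two approximations.
print(sorted(beta_draws)[len(beta_draws) // 2])
print(sorted(lognormal_draws)[len(lognormal_draws) // 2])
```

Such draws are mainly useful for prior predictive checks, for example plotting the implied Emax shapes x/(x + θ2) to see that they cover the range of plausible curves reasonably evenly.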

3 Examples

In the following we will illustrate the presented methodology on a Phase II and a Phase I example. Both examples were chosen because they are representative of the data situations in these types of trials. Context-specific prior information that might have existed at the design stage (from historical/preclinical trials or similar compounds) will be ignored in the formulation of the priors; in practice, methods from Section 2.1.2 can be used to combine potential information with a functional uniform prior.

3.1 Phase II dose-response example

In this example we analyze data from a dose-response trial obtained from clinicaltrials.gov, trial number NCT00362115. The purpose of this study was to investigate efficacy and safety of Azilsartan Medoxomil, a treatment to lower blood pressure, in patients with mild to moderate hypertension. The trial is quite representative for dose-response studies in Phase II in terms of the number of doses, the dose spacing, sample size, and the signal to noise ratio (effect size). In total 371 patients completed the trial in the treatment groups placebo, 5, 10, 20, 40, and 80 mg. The allocations were almost balanced between the treatment groups and the primary endpoint was the change from baseline in sitting clinic diastolic blood pressure (in mmHg) after 8 weeks of treatment. The results obtained from the webpage are summary measures (least square mean μ̂_i and standard error) for the different treatment groups. We analyze the data assuming a normal distribution for μ̂_i with the given standard error as likelihood. The sigmoid Emax model will be used for analyzing the data. As parameter bounds we use the bounds from Table 1 with D = 80. We will compare a functional uniform prior approach to ML estimation and an approach using a uniform prior distribution on the parameter scale, as this is still often considered as a supposedly uninformative prior. For the functional uniform approach the L2 metric will be used and for ν(dx) a uniform measure on the actually administered doses, so that the functional uniform prior is essentially equivalent to a Jeffreys prior, but with equal weights on each dose level.

The profile log-likelihood is monotonically increasing for θ3 → ∞, so that the ML estimate formally does not exist without imposing the boundaries on θ3. When imposing boundaries, the estimate for θ3 is exactly equal to the selected upper bound. Figure 3 (i) shows the ML estimate corresponding to θ3 ∈ [0.5, 10] and [0.5, 30]. One can see that the ML estimate on the functional scale essentially approaches a step function when θ3 gets large, which is a priori a biologically rather implausible estimate in dose-response situations.

Figure 3 Comparison of ML estimate with two different upper bounds for θ3 (i), the posterior distribution using the functional uniform prior (ii), and the posterior distribution using a uniform prior (iii).

The posterior distribution resulting from the functional uniform prior can be observed in Figure 3 (ii). Compared to the ML fit, one can see that the posterior median is rather smooth, despite the fact that the best fit is obtained by a step function. The smoothness of the posterior distribution results because the posterior distribution “averages” over alternative parameter values that are supported by the likelihood (and prior). As the likelihood is rather flat in the region of the optimum, less extreme parameter values (in the sense of smaller θ3) are also present in the posterior sample. The posterior distribution resulting from a uniform prior on the parameter scale is displayed in Figure 3 (iii). One can see that the posterior distribution is generally similar to the one obtained with the functional uniform prior, but the placebo response is slightly shifted downwards, away from the data. The reason for this is that the uniform prior places a relatively large amount of weight on shapes with large θ2 and large θ3. The best fit in terms of θ0, θ1 when θ2 and θ3 are large is a constant function (i.e. θ1 ≈ 0). When averaging over the posterior shapes (some of them being close to constant shapes) this results in the downward shift for the placebo response. This problem of the uniform prior becomes more apparent when increasing the upper bound for θ2 and θ3 (see Figure B2 (i) in Appendix B).
The posterior distribution on the functional scale looks more like a constant function, as more prior weight is placed on even larger values of θ2 and θ3, ultimately overruling the data. The functional uniform prior is less affected by increasing the bounds (see Figure B2 (ii)), as larger values of θ2 and θ3 are downweighted in the prior anyway (a change in the parameters for large θ2 and θ3 does not lead to a large change in the functional shape). So the uniform prior induces a strong (and typically unwanted) penalization toward particular shapes, while the functional uniform prior weights all functional shapes equally and allows one to focus on the shapes providing the best data fit.
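The downweighting of large θ2 and θ3 can be made concrete by evaluating the functional uniform density of the sigmoid Emax model on the administered doses. The following sketch is an assumption-laden illustration, not the analysis code of this example: it uses the gradient of the full four-parameter model θ0 + θ1 μ0(x, θ2, θ3), equal weights on the doses 0, 5, 10, 20, 40, and 80 mg, hypothetical helper names, and arbitrary parameter values for the comparison.

```python
import math

# Administered doses (placebo coded as 0), equally weighted in nu(dx).
DOSES = [0.0, 5.0, 10.0, 20.0, 40.0, 80.0]


def sigemax_gradient(x, theta1, theta2, theta3):
    """Gradient of theta0 + theta1 * x^theta3 / (x^theta3 + theta2^theta3)."""
    if x == 0.0:
        return [1.0, 0.0, 0.0, 0.0]
    xh, eh = x ** theta3, theta2 ** theta3
    denom = (xh + eh) ** 2
    d_theta2 = -theta1 * theta3 * xh * eh / (theta2 * denom)
    d_theta3 = theta1 * xh * eh * math.log(x / theta2) / denom
    return [1.0, xh / (xh + eh), d_theta2, d_theta3]


def det(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n, result = len(m), 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return result


def prior_density(theta1, theta2, theta3):
    """Unnormalized functional uniform density sqrt(det(F^T F)) on DOSES."""
    rows = [sigemax_gradient(x, theta1, theta2, theta3) for x in DOSES]
    gram = [[sum(r[j] * r[k] for r in rows) for k in range(4)] for j in range(4)]
    return math.sqrt(max(det(gram), 0.0))


# A moderate shape versus a shape with large theta2 and theta3.
print(prior_density(1.0, 20.0, 1.0), prior_density(1.0, 100.0, 9.0))
```

A uniform prior on the parameter scale would assign both parameter settings the same density, whereas here the setting with large θ2 and θ3 receives a far smaller value because the curve barely changes over the studied doses in that region.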

3.2 Phase I example

A major objective in Phase I studies is to find the maximum-tolerated dose (MTD), which is defined as the smallest dose that achieves a certain target probability (for example 30%) for observing a toxicity. This dose is then typically used as an upper bound for subsequent clinical development. The study participants enter the study in cohorts (for example of size 3 or 6) and the studies are highly adaptive: the dose recommendations for the next cohort of patients depend on the previous results. Neuenschwander et al. (2008) propose a method to design these studies based on dose-response modeling using logistic regression with a set of dose-recommendation rules. In what follows we will not consider the dose-recommendation rules further, but will concentrate on investigating the adequacy of functional uniform priors for the dose-response model. Their dose-response model for the probability p(x) of observing a dose-limiting toxicity at dose x is given by a logistic regression in log-dose, p(x, θ) = 1/(1 + exp(−μ(x))), with μ(x) = α + exp(β) log(x/x∗), where θ = (α, β) and x∗ is a reference dose, so that the parameter α can be interpreted as the log-odds of a dose-limiting toxicity at dose x∗. The gradient of the model function p(x, θ) is given by

$$
\frac{\partial}{\partial\theta}\, p(x,\theta) = J_x(\alpha,\beta) = \left( \frac{\exp(\mu(x))}{(\exp(\mu(x))+1)^2},\; \frac{\exp(\mu(x)+\beta)\,\log(x/x^*)}{(\exp(\mu(x))+1)^2} \right), \qquad (7)
$$

with which one can calculate the functional uniform prior. To illustrate the ideas, we will consider the example study from Neuenschwander et al. (2008), which was a dose-escalation trial designed to characterize the safety profile of a cancer drug and to determine its MTD. The doses that were predetermined for this study were 1, 2.5, 5, 10, 15, 20, 25, 30, 40, 50, 75, 100, 150, 200, and 250 mg. The trial produced the results given in the first row of Table 3, i.e. no toxicities were observed for the doses 1, 2.5, 5, 10 mg but for dose 25 mg two out of two patients had dose-limiting toxicities. In addition to this scenario we also consider four other hypothetical scenarios given in the last four rows of Table 3. Note that in all these data situations the data are very sparse and in most situations an MLE does not exist (see Table 3). The functional uniform distribution concept will be used in two variations: (i) the L2 distance with the potential doses that could be administered as points in X and a uniform discrete measure for ν(dx), and (ii) the Jeffreys prior with the metric induced by the binomial likelihood and, as measure ν(dx), the doses observed so far. It is thus based on the Fisher information matrix given by

$$
\sum_{i=1}^{5} n_i\, \frac{\frac{\partial}{\partial\theta} p(x_i,\theta)^{T}\, \frac{\partial}{\partial\theta} p(x_i,\theta)}{p(x_i,\theta)\,(1 - p(x_i,\theta))}, \qquad (8)
$$

where xi are the doses observed so far, and ni the corresponding number of patients. In this case these priors differ in two aspects: (i) the prior according to Jeffreys rule only uses doses xi in ν(dx) used up to each interim (which will be different at each interim), while the first approach uses all potential doses from the beginning on and (ii) also the induced metric itself is different from the L2 metric, due to the denominator in (8), which will weight changes in the function p(x, θ) higher when p(x, θ) is close to 0 or 1. We will compare these approaches with a vague multivariate normal (MVN) prior distribution for this model proposed by Neuenschwander (personal communication) as an improvement over the vague default prior presented in Neuenschwander et al. (2008). This MVN prior uses prior means of −0.69 and 0 for α and β, with prior standard deviations of 2 and 1. The correlation is assumed to be 0 and x∗ = 50. This implies a 95% prior interval at 50 mg of [0.01, 0.96]. The rationale for using a functional uniform prior distribution is that it covers the underlying function shapes exhaustively without overly favoring particular shapes. When one employs it, one hence hopes for two properties of the posterior distribution: (i) The posterior is able to concentrate on those functional shapes that achieve a good data fit, without artificially favoring shapes with less good fit, (ii) subject to a good fit, one would hope that uncertainty in the posterior distribution is as large as possible. Item (i) can be assessed by measuring the log-likelihood values of the posterior draws. Item (ii) will be measured by the average length of central 90% credible intervals at all potential doses 1, 2.5, 5, . . . , 250. In Figures 4 and B1 one can observe the posterior distributions on functional scale for the different scenarios in Table 3. 
The vertical bars overlaid display quantiles from the posterior distribution when looking at each dose separately (not employing a logistic regression model), using an uninformative beta B(1/3, 1/3) prior distribution, see Kerman (2011).
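The priors compared in this section can be sketched in a few lines of Python. This is only a minimal illustration. It assumes that the functional uniform prior density is proportional to the square root of the determinant of the summed outer products of the gradient (7), as in Bornkamp (2012). The function names are hypothetical and not taken from the code accompanying the paper, and the densities are left unnormalized. The Jeffreys variant uses (8) after cancelling one factor p(1 − p), which is algebraically the same expression.

```python
import math
from statistics import NormalDist

# Potential doses (mg) and reference dose x* = 50, both as stated in the text.
POTENTIAL_DOSES = [1, 2.5, 5, 10, 15, 20, 25, 30, 40, 50, 75, 100, 150, 200, 250]
X_REF = 50.0


def mu(x, alpha, beta):
    """Linear predictor mu(x) = alpha + exp(beta) * log(x / x*)."""
    return alpha + math.exp(beta) * math.log(x / X_REF)


def p_tox(x, alpha, beta):
    """Toxicity probability 1 / (1 + exp(-mu(x))), evaluated in a stable way."""
    m = mu(x, alpha, beta)
    if m >= 0:
        return 1.0 / (1.0 + math.exp(-m))
    e = math.exp(m)
    return e / (1.0 + e)


def gradient(x, alpha, beta):
    """Gradient (7): (p(1-p), p(1-p) * exp(beta) * log(x / x*))."""
    p = p_tox(x, alpha, beta)
    w = p * (1.0 - p)
    return (w, w * math.exp(beta) * math.log(x / X_REF))


def _sqrt_det(weighted_vectors):
    """sqrt(det(sum_i w_i v_i^T v_i)) for 2-dimensional row vectors v_i."""
    a = b = d = 0.0
    for w, (v1, v2) in weighted_vectors:
        a += w * v1 * v1
        b += w * v1 * v2
        d += w * v2 * v2
    return math.sqrt(max(a * d - b * b, 0.0))  # guard against rounding below 0


def fu_l2_density(alpha, beta, doses=POTENTIAL_DOSES):
    """Unnormalized functional uniform (L2) prior, uniform discrete measure on doses."""
    weight = 1.0 / len(doses)
    return _sqrt_det([(weight, gradient(x, alpha, beta)) for x in doses])


def jeffreys_density(alpha, beta, observed_doses, n_patients):
    """Unnormalized Jeffreys prior from (8), written as n_i p(1-p) h^T h."""
    terms = []
    for x, n_i in zip(observed_doses, n_patients):
        p = p_tox(x, alpha, beta)
        h = (1.0, math.exp(beta) * math.log(x / X_REF))  # gradient / (p(1-p))
        terms.append((n_i * p * (1.0 - p), h))
    return _sqrt_det(terms)


def mvn_prior_interval_at_reference(mean_alpha=-0.69, sd_alpha=2.0, level=0.95):
    """Central prior interval for p(x*) implied by the normal prior on alpha."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    lo, hi = mean_alpha - z * sd_alpha, mean_alpha + z * sd_alpha
    return (1.0 / (1.0 + math.exp(-lo)), 1.0 / (1.0 + math.exp(-hi)))


if __name__ == "__main__":
    print(fu_l2_density(0.0, 0.0))
    print(jeffreys_density(0.0, 0.0, [1, 2.5, 5, 10, 25], [3, 4, 5, 4, 2]))
    print(mvn_prior_interval_at_reference())  # rounds to (0.01, 0.96)
```

At x = x∗ the linear predictor reduces to α, so the interval for the MVN prior only depends on the prior mean and standard deviation of α. The result agrees with the 95% prior interval of [0.01, 0.96] quoted above.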


Table 3 Behavior of the functional uniform (FU) prior, Jeffreys prior and a MVN prior in terms of the 0.1, 0.5, 0.9-quantiles of the log-likelihood (log-lik) of the posterior draws and the average length of the posterior 0.9 credible intervals (CI) under five alternative data constellations. Scenario 1 is the situation observed in the actual trial.

| Scenario | Toxicities/patients at 1, 2.5, 5, 10, 25 mg | MLE exists | FU (L2) prior: log-lik | FU (L2) prior: CI | Jeffreys prior: log-lik | Jeffreys prior: CI | MVN prior: log-lik | MVN prior: CI |
|---|---|---|---|---|---|---|---|---|
| 1 | 0/3, 0/4, 0/5, 0/4, 2/2 | no | −2.4, −0.45, −0.021 | 0.29 | −2.4, −0.71, −0.11 | 0.25 | −4.8, −2.9, −1.5 | 0.51 |
| 2 | 0/3, 0/4, 0/5, 0/4, 1/2 | no | −2.6, −1.4, −0.77 | 0.52 | −2.8, −1.3, −0.79 | 0.53 | −3, −1.9, −1.1 | 0.54 |
| 3 | 0/3, 0/4, 0/5, 0/4, – | no | −0.54, −0.004, −6.4e-09 | 0.63 | −2.1, −0.6, −0.042 | 0.75 | −0.72, −0.069, −0.00032 | 0.52 |
| 4 | 0/3, 0/4, 1/5, 1/4, 1/2 | yes | −4.6, −3.5, −2.9 | 0.56 | −4.8, −3.5, −2.9 | 0.57 | −4.6, −3.5, −2.9 | 0.55 |
| 5 | 1/3, 1/4, 0/5, 3/4, – | yes | −7.2, −6.1, −5.7 | 0.56 | −7.6, −6.2, −5.8 | 0.57 | −7.2, −6, −5.7 | 0.53 |
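The two summary measures reported in Table 3 can be computed from posterior draws of (α, β) roughly as follows. This is a sketch under assumptions: the helper names are hypothetical, and the binomial coefficient is assumed to be part of the log-likelihood. The quantiles for Scenario 2 exceed log(0.25) ≈ −1.39, which would be the largest attainable value without the coefficient, so this assumption seems consistent with the table. The draws themselves would come from an MCMC sampler and are not produced here.

```python
import math

X_REF = 50.0  # reference dose x* from the text
POTENTIAL_DOSES = [1, 2.5, 5, 10, 15, 20, 25, 30, 40, 50, 75, 100, 150, 200, 250]


def mu(x, alpha, beta):
    """Linear predictor mu(x) = alpha + exp(beta) * log(x / x*)."""
    return alpha + math.exp(beta) * math.log(x / X_REF)


def log_sigmoid(m):
    """Numerically stable log(1 / (1 + exp(-m)))."""
    if m >= 0:
        return -math.log1p(math.exp(-m))
    return m - math.log1p(math.exp(m))


def log_lik(alpha, beta, doses, n_tox, n_pat):
    """Binomial log-likelihood of the logistic model (coefficient included)."""
    total = 0.0
    for x, y, n in zip(doses, n_tox, n_pat):
        m = mu(x, alpha, beta)
        total += (math.log(math.comb(n, y))
                  + y * log_sigmoid(m)          # y * log p
                  + (n - y) * log_sigmoid(-m))  # (n - y) * log(1 - p)
    return total


def quantile(values, q):
    """Empirical quantile with linear interpolation between order statistics."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def log_lik_quantiles(draws, doses, n_tox, n_pat, probs=(0.1, 0.5, 0.9)):
    """Item (i): quantiles of the log-likelihood over the posterior draws."""
    values = [log_lik(a, b, doses, n_tox, n_pat) for a, b in draws]
    return tuple(quantile(values, q) for q in probs)


def avg_ci_length(draws, doses=POTENTIAL_DOSES, level=0.9):
    """Item (ii): average length of central credible intervals over all doses."""
    tail = (1.0 - level) / 2.0
    lengths = []
    for x in doses:
        probs = [math.exp(log_sigmoid(mu(x, a, b))) for a, b in draws]
        lengths.append(quantile(probs, 1.0 - tail) - quantile(probs, tail))
    return sum(lengths) / len(lengths)


if __name__ == "__main__":
    doses = [1, 2.5, 5, 10, 25]
    n_pat = [3, 4, 5, 4, 2]
    n_tox = [0, 0, 0, 0, 2]  # Scenario 1
    # Hypothetical draws, for illustration only (not actual posterior samples).
    draws = [(2.0, 1.5), (3.0, 1.8), (1.0, 1.2), (4.0, 2.0)]
    print(log_lik_quantiles(draws, doses, n_tox, n_pat))
    print(avg_ci_length(draws))
```

For Scenario 1 a curve that is close to a step function between 10 and 25 mg attains a log-likelihood near 0, which is why the upper quantiles in the first row of the table are close to 0.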


Figure 4 Posterior distributions on functional scale for (i) functional uniform prior based on L2, (ii) Jeffreys prior, and (iii) a multivariate normal prior. The dark gray shaded area displays the central 50% credible interval (0.25/0.75 quantile), the light gray shaded area displays the 90% credible interval (0.05/0.95 quantile). The black line is the posterior median. The bold vertical lines are marginal central 50% and 90% posterior intervals, when using B(1/3,1/3) prior distributions and only using data from the corresponding dose.

It can be seen that the posteriors corresponding to the functional uniform priors (L2 and Jeffreys) are in better agreement with the vertical bars, indicating a better fit of these models to the data. From the further results in Figure B1 in Appendix B and in Table 3 one can observe that the posteriors under the functional uniform priors (L2 and Jeffreys) are quite similar over the whole dose-range for all of the scenarios. In some situations the L2 approach predicts the increase slightly later than the Jeffreys prior (see for example Figure 4 and the first two rows in Figure B1). The reason for that is that the Jeffreys prior encodes the information that the change in function happens in the interval [1, 25], as only the doses up to 25 mg are used in ν(dx). In Table 3 one can observe that the L2 functional uniform prior provides a slightly better fit in those scenarios than the Jeffreys prior (while still having similarly wide credible intervals). The posterior under the multivariate normal prior is similar to the other two approaches in the dose range [1, 10] but differs in the upper dose-range, as it penalizes toward a shallower increase. This means that the estimate for the 25 mg dose is quite different in these situations, and it also results in a slightly worse fit compared to the functional uniform L2 and the Jeffreys prior, sometimes even leading to shorter credible intervals.
For Scenarios 4 and 5, where the MLE exists, all three prior distributions result in rather similar posterior distributions. In summary, the functional uniform priors lead to rather large log-likelihood values and thus a good fit, but also to relatively wide posterior uncertainty intervals (with slight advantages for the L2 approach, which includes all potential doses from the beginning). So the posterior distribution under the functional uniform prior concentrates its mass on rather diverse functional shapes, subject to the constraint that they have large log-likelihood values, which is exactly as desired.

4 Conclusions

This paper reviews functional uniform prior distributions, provides computer implementations for this approach in all major Bayesian modeling languages and discusses practical aspects when using these prior distributions, for example the choice of the metric. The main idea of the functional uniform distribution is to obtain a distribution that uniformly covers the potential shapes that are associated with a particular nonlinear function in a specified metric. Intuitively it makes as few additional “assumptions” as possible, beyond the assumption of using a particular dose-response model.


In situations where no additional prior information is available at the design stage, this might be an attractive approach for inference and will also have attractive properties from a frequentist perspective. Simulations conducted in Bornkamp (2012), for example, suggest that the functional uniform priors have competitive frequentist properties compared to ML estimation, in terms of mean squared error for estimating the dose-response function, but also in terms of providing credible intervals with close to nominal frequentist coverage. However, also in situations when additional information is available, functional uniform priors define a reasonable starting point, in the sense that they convey the information of “no preference” between the functional shapes, and additional information should be added on top of that. Combination of additional information on top of a functional uniform prior is discussed in Section 2.1.2. In Phase I, but in particular in Phase II trials, using prior information from historical trials (after an appropriate downweighting) can be very relevant.

Acknowledgment The author thanks the Editor and Reviewers for comments that improved the presentation of the paper. The author would also like to thank Roland Fisch and Beat Neuenschwander, who both provided comments and input on parts of this manuscript.

Conflict of interest The author has declared no conflict of interest.

Appendix A: Derivation of default approximations

A.1 Selection of metric

As it is desired to work with model shapes, we should use a metric where the scaling θ1 and location θ0 do not affect the metric. One way of doing this is to use

$$
d(\theta, \theta') = \sqrt{\sum_{x \in X} \left( \tilde{\mu}(x,\theta) - \tilde{\mu}(x,\theta') \right)^2},
$$

where

$$
\tilde{\mu}(x,\theta) = \frac{\mu(x,\theta) - \frac{1}{|X|}\sum_{x \in X} \mu(x,\theta)}{\left\| \mu(x,\theta) - \frac{1}{|X|}\sum_{x \in X} \mu(x,\theta) \right\|}
$$

and a discrete, finite set X is used for ν(dx). The function is standardized so that

$$
\frac{1}{|X|}\sum_{x \in X} \tilde{\mu}(x,\theta) = 0 \quad \text{and} \quad \sum_{x \in X} \tilde{\mu}^2(x,\theta) = 1,
$$

and the parameters θ0 and θ1 cancel out in this metric. The vector obtained by evaluating μ̃(x, θ) at X hence lies on the unit sphere.

As discussed in Section 2.2 one could choose the measure ν(dx) and X based on the doses that are used or are available for being used in the trial. This of course will be different between trials, so we will choose here a dose-design that is representative for commonly used designs in Phase II. In many cases the smallest dose is in the range of 1/5 to 1/30 of the highest dose. Here we will use 0.05 as smallest dose. Subsequent dose levels are then often chosen as multiples of the previous dose, but most commonly these multiples get smaller for larger doses. Here we chose the sequence 2, 5/3, 3/2, 4/3, 4/3, 4/3, . . . (see Penel and Kramar, 2012, for more details on this dosing sequence, which is also popular in Phase I trials). The resulting doses (rounded) are then 0, 0.05, 0.1, 0.17, 0.25, 0.33, 0.44, 0.59, 0.79, 1.00. These are nine doses and placebo, which is slightly more than commonly used in Phase II, but sufficiently fine to capture the main features of the dose-response functions of interest here.

A.2 Approximation of densities

The approximations in terms of parametric distributions were derived by integrating the term (5) over the bounds specified in Table 1, where the gradient of the function μ̃(x, θ) was used. The resulting normalizing constant was multiplied with (5) to obtain a probability density, which was evaluated on a fine grid. The parameters of the approximating parametric distributions were derived by numerically minimizing the L1 distance at the grid points between the true density and the approximating parametric density (see the code accompanying this paper for the exact details).
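The dose grid and standardization of Appendix A.1, together with the normalization step of Appendix A.2, can be sketched as below. Several parts are assumptions made for illustration only: the Emax form μ(x, θ) = θ0 + θ1 x/(x + θ2), the bounds [0.001, 1.5] for θ2 (Table 1 is not reproduced here), the numerical differentiation, and all function names. The last multiple in the dose sequence gives about 1.05, which the text lists as 1.00, so the sketch caps the doses at 1. The final fitting of a parametric density by minimizing the L1 distance is not shown.

```python
import math


def dose_grid(smallest=0.05, n_active=9):
    """Placebo plus doses built from the multiples 2, 5/3, 3/2, 4/3, 4/3, ..."""
    first = [2.0, 5.0 / 3.0, 1.5]
    doses = [smallest]
    for k in range(n_active - 1):
        mult = first[k] if k < len(first) else 4.0 / 3.0
        doses.append(min(doses[-1] * mult, 1.0))  # cap at 1 (assumption)
    return [0.0] + doses


def emax(x, theta0, theta1, theta2):
    """Assumed Emax model: theta0 + theta1 * x / (x + theta2)."""
    return theta0 + theta1 * x / (x + theta2)


def standardize(values):
    """Center to mean zero and scale to unit Euclidean norm."""
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    norm = math.sqrt(sum(c * c for c in centered))
    return [c / norm for c in centered]


def shape(theta2, doses):
    """Standardized model vector; theta0 and theta1 > 0 cancel out."""
    return standardize([emax(x, 0.0, 1.0, theta2) for x in doses])


def fu_density_unnormalized(theta2, doses, h=1e-6):
    """Norm of d(shape)/d(theta2), using a central finite difference."""
    up, down = shape(theta2 + h, doses), shape(theta2 - h, doses)
    return math.sqrt(sum(((u - d) / (2.0 * h)) ** 2 for u, d in zip(up, down)))


def normalized_density_on_grid(lower, upper, doses, n_grid=400):
    """Evaluate the density on a grid and normalize it by the trapezoid rule."""
    step = (upper - lower) / (n_grid - 1)
    grid = [lower + i * step for i in range(n_grid)]
    dens = [fu_density_unnormalized(t, doses) for t in grid]
    integral = step * (sum(dens) - 0.5 * (dens[0] + dens[-1]))
    return grid, [d / integral for d in dens]


if __name__ == "__main__":
    doses = dose_grid()
    print([round(d, 2) for d in doses])
    grid, dens = normalized_density_on_grid(0.001, 1.5, doses)  # hypothetical bounds
    print(grid[0], dens[0])
```

Because the standardized vector does not depend on θ0 or on a positive θ1, only the nonlinear parameter θ2 needs a grid in this one-dimensional case.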

Appendix B: Additional plots

Figure B1 Posterior distributions for Scenario 2, 3, 4, 5 (from top to bottom) from Table 3: Posterior on functional scale for (i) functional uniform prior based on L2 , (ii) Jeffreys prior, and (iii) a multivariate normal prior. The dark gray shaded area displays the central 50% credible interval (0.25/0.75 quantile), the light gray shaded area displays the 90% credible interval (0.05/0.95 quantile). The black line is the posterior median. The bold vertical lines are marginal central 50% and 90% posterior intervals, when using B(1/3,1/3) prior distributions.

Figure B2 Posterior distributions using the same prior distributions as in Section 3.1, but with bounds for θ2 increased to [0, 200] and for θ3 to [0.5, 30]. (i) corresponds to the uniform distribution on parameter scale and (ii) to the functional uniform prior.

References

Bates, D. M. and Watts, D. G. (1988). Nonlinear Regression Analysis and Applications. John Wiley and Sons, New York, NY.
Berry, S. M., Carlin, B. P., Lee, J. J. and Müller, P. (2011). Bayesian Adaptive Methods for Clinical Trials. CRC Press, Boca Raton, FL.
Bornkamp, B. (2012). Functional uniform priors for nonlinear modelling. Biometrics 68, 893–901.
Bornkamp, B., Bretz, F., Dmitrienko, A., Enas, G., Gaydos, B., Hsu, C.-H., König, F., Krams, M., Liu, Q., Neuenschwander, B., Parke, T., Pinheiro, J. C., Roy, A., Sax, R. and Shen, F. (2007). Innovative approaches for designing and analyzing adaptive dose-ranging trials. Journal of Biopharmaceutical Statistics 17, 965–995.
Bornkamp, B., Pinheiro, J., and Bretz, F. (2013). DoseFinding: Planning and Analyzing Dose Finding experiments. R package version 0.9-6.
Bretz, F., Pinheiro, J. C., and Branson, M. (2005). Combining multiple comparisons and modeling techniques in dose-response studies. Biometrics 61, 738–748.
Dembski, W. A. (1990). Uniform probability. Journal of Theoretical Probability 3, 611–626.
Dragalin, V., Hsuan, F., and Padmanabhan, S. K. (2007). Adaptive designs for dose-finding studies based on the sigmoid emax model. Journal of Biopharmaceutical Statistics 17, 1051–1070.
Eaves, D. M. (1983). On Bayesian nonlinear regression with an enzyme example. Biometrika 70, 373–379.
Ghosal, S., Ghosh, J. K., and Ramamoorthi, R. V. (1997). Non-informative priors via sieves and packing numbers. In: Panchapakesan, S. and Balakrishnan, N. (Eds.), Advances in Statistical Decision Theory and Applications, Birkhäuser, Boston, MA, pp. 119–132.
Hartigan, J. A. (1983). Bayes Theory (1st edn). Springer, New York, Berlin, Heidelberg, Tokyo.
Jennrich, R. I. (1969). Asymptotic properties of nonlinear least squares estimation. Annals of Mathematical Statistics 40, 633–643.
Jones, B., Layton, G., Richardson, H. and Thomas, N. (2011). Model-based Bayesian adaptive dose-finding designs for a phase II trial. Statistics in Biopharmaceutical Research 3, 276–287.
Kerman, J. (2011). Neutral noninformative and informative conjugate beta and gamma prior distributions. Electronic Journal of Statistics 5, 1450–1470.
Neuenschwander, B., Branson, M. and Gsponer, T. (2008). Critical aspects of the Bayesian approach to phase I cancer trials. Statistics in Medicine 27, 2420–2439.
Penel, N. and Kramar, A. (2012). What does a modified-Fibonacci dose-escalation actually correspond to? BMC Medical Research Methodology 12, 103, 1–5.
Pinheiro, J. C., Bornkamp, B., and Bretz, F. (2006). Design and analysis of dose finding studies combining multiple comparisons and modeling procedures. Journal of Biopharmaceutical Statistics 16, 639–656.
Seber, G. A. F. and Wild, C. J. (1989). Nonlinear Regression. John Wiley and Sons, Hoboken, NJ.
Stan Development Team (2013). Stan: A C++ library for probability and sampling. Version 1.3.
Sun, D. and Berger, J. O. (1998). Reference priors with partial information. Biometrika 85, 55–71.
Tan, H., Gruben, D., French, J. and Thomas, N. (2011). A case study of model-based Bayesian dose response estimation. Statistics in Medicine 30, 2622–2633.
Thomas, N. (2006). Hypothesis testing and Bayesian estimation using a sigmoid Emax model applied to sparse dose designs. Journal of Biopharmaceutical Statistics 16, 657–677.
Ye, K. Y. and Berger, J. O. (1991). Noninformative priors for inference in exponential regression models. Biometrika 78, 645–656.
Zellner, A. (1988). Optimal information processing and Bayes’s theorem. The American Statistician 42, 278–280.