STATISTICS IN MEDICINE, VOL. 11, 37-53 (1992)

ON SOME APPLICATIONS OF BAYESIAN METHODS IN CANCER CLINICAL TRIALS

JOEL B. GREENHOUSE
Department of Statistics, Carnegie Mellon University, Pittsburgh, PA 15213, U.S.A.

SUMMARY

The NCCTG randomized controlled clinical trial for the treatment of advanced colorectal carcinoma is a wonderful case study of the dynamic interplay between scientific learning and statistical inference. Ethical concerns for minimizing the number of patients assigned to an inferior treatment, and interest in identifying subsets of patients for whom a treatment is most likely efficacious, pose challenging problems for the practice of statistics. In the first part of this paper, I comment on the applications of Bayesian methods to these problems in the NCCTG trial as presented by Freedman and Spiegelhalter and by Dixon and Simon, respectively. In the second part, I discuss and illustrate a Bayesian approach to model sensitivity analysis with a particular focus on model specification and criticism. The Bayesian approach provides a formal methodology for assessing the sensitivity of inferences to the inputs into an analysis, so that it is possible to investigate the consequences of the specification of the model. I apply these methods to the specification and criticism of a class of survival models for the analysis of survival times in the NCCTG trial.

INTRODUCTION

At a recent meeting of the Society of Clinical Trials, there was a session on data monitoring committees and how they reach consensus about stopping a trial early. A statistician at that session confided to the group that during the deliberations of a data monitoring committee everyone is a Bayesian (in the sense that they informally use as much information about the problem as possible to reach a consensus) but that he himself 'would never publish a Bayesian analysis'. Fortunately, as illustrated in the papers by Freedman and Spiegelhalter¹ and Dixon and Simon,² more and more statisticians are not only willing to indulge in the Bayesian omelette but are also willing to crack the Bayesian egg. (Historical note: it was L. J. Savage³ who originally characterized the aim of fiducial probability as what he termed 'making the Bayesian omelette without breaking the Bayesian eggs'.) The papers by Freedman and Spiegelhalter and by Dixon and Simon, and the introductory remarks by Wieand and Cha,⁴ concern the analysis of the North Central Cancer Treatment Group (NCCTG) clinical trial of the relative efficacies of six different chemotherapy regimens in the treatment of advanced colorectal carcinoma. I have few critical comments about the papers themselves and will focus my remarks on aspects of the Bayesian approach presented in the papers that address challenging problems in the analysis of randomized controlled clinical trials. Although Jerry Cornfield⁵⁻⁷ addressed many of the issues discussed in these papers over 21 years ago, these presentations make a substantial contribution to the growing literature of applications of Bayesian methods to important practical problems.

0277-6715/92/010037-17$08.50 © 1992 by John Wiley & Sons, Ltd.

Received September 1990 Revised June 1991


I divide my remarks into two major parts. The first part discusses the approach to interim analysis and early stopping addressed in the Freedman and Spiegelhalter paper and the problem of subset analysis addressed in the Dixon and Simon paper. In the second part, I review some Bayesian approaches to model specification and criticism and illustrate some of these methods with data from the NCCTG trial. Concerns about model specification and criticism are not uniquely Bayesian. Rather, the Bayesian approach provides a formalism for doing model sensitivity analysis, that is, for evaluating the extent and nature of the dependence of the outputs of a model on the inputs. Finally, since much of the work on Bayesian methods has not appeared in the standard biostatistical literature, I will be fairly liberal about providing references to writings on the applications of Bayesian methods in an attempt to make these works more readily available.

INTERIM ANALYSIS AND EARLY STOPPING

A concern of the NCCTG trial was to minimize the number of patients assigned to ineffective treatments and to stop accrual to any of the experimental regimens assessed as no more effective than the standard treatment, 5-FU alone. Wieand and Cha⁴ describe the two-stage design used to monitor the trial and the decision rules that specified when to stop enrollment to any of the treatment regimens. The trial was in fact stopped at the first stage after enrollment of 420 patients and, according to Wieand and Cha, '... results from other concurrent trials convinced us, from both scientific and ethical perspectives, of the impropriety of continuing accrual of patients to 5-FU alone'. The purpose of the paper by Freedman and Spiegelhalter¹ is to illustrate the application of Bayesian methods in monitoring trials such as the NCCTG trial. They also indicate how the Bayesian approach provides a formal methodology for combining data from an ongoing trial with information from other similar trials that might bear on a decision regarding early termination, and how the Bayesian approach to interim analysis is unaffected by the design of the trial or the stopping rule used in obtaining the data (see also Cornfield). In effect, they demonstrate how a data monitoring committee could use a Bayesian approach to evaluate systematically the impact of different sources of information, along with the data from the ongoing trial, to reach a consensus regarding stopping the trial. The basis for the ethical imperative to monitor a trial is concern with minimizing the number of patients exposed to an inferior or harmful treatment. Freedman and Spiegelhalter operationalize this ethical concern by defining a range of treatment differences that would not lead to a clear clinical decision as to the preferred treatment, what they call the 'range of clinical equivalence'.
This idea is similar to what Baruch Freedman⁸ calls 'clinical equipoise', the state of genuine uncertainty about which of two treatments is superior. According to Freedman, it is ethical for a trial to continue as long as the accumulating data do not disturb the state of equipoise. Freedman and Spiegelhalter provide a Bayesian framework for evaluating whether the state of clinical equipoise is disturbed at any point in time during the course of a trial, based on the posterior probability that the treatment effect (for example, the hazard ratio) exceeds the upper limit of the range of equivalence or falls below its lower limit (see also Kass and Greenhouse⁹). Thus, one could monitor a trial continuously since, with the use of Bayesian methods, inferences or decisions about the treatment effects based on interim analyses are unaffected by previous decisions regarding the continuation of the trial. In practice, one might question the general feasibility of determining a range of equivalence, but Freedman and Spiegelhalter¹⁰ and others¹¹,¹² have demonstrated successfully how one can use interviews with knowledgeable clinicians (or experts) to elicit such a range.
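The monitoring quantity described above can be sketched numerically. The following is a minimal illustration, not Freedman and Spiegelhalter's procedure: it assumes a normal prior for the log hazard ratio θ, summarizes the interim data by a normal approximation (an estimate and its standard error), and reports the posterior probabilities that θ falls below, inside, or above an assumed range of equivalence. All numerical inputs are hypothetical.

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of N(mean, sd^2) at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def posterior_normal(prior_mean, prior_sd, est, se):
    """Conjugate normal update for the log hazard ratio theta.

    Prior: theta ~ N(prior_mean, prior_sd^2).  Data: est ~ N(theta, se^2),
    a large-sample normal approximation to the likelihood.
    """
    prior_prec, data_prec = 1.0 / prior_sd ** 2, 1.0 / se ** 2
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * est)
    return post_mean, math.sqrt(post_var)


def equivalence_probs(post_mean, post_sd, lower, upper):
    """Posterior probabilities that theta lies below, inside and above (lower, upper)."""
    below = normal_cdf(lower, post_mean, post_sd)
    above = 1.0 - normal_cdf(upper, post_mean, post_sd)
    return below, 1.0 - below - above, above


# Hypothetical inputs: a prior centred on no difference, an interim estimate of
# the log hazard ratio, and a range of equivalence on the log scale.
mean, sd = posterior_normal(prior_mean=0.0, prior_sd=0.3, est=-0.25, se=0.12)
below, inside, above = equivalence_probs(mean, sd, lower=math.log(0.8), upper=0.0)
print(f"posterior mean {mean:.3f}, posterior sd {sd:.3f}")
print(f"P(below range) {below:.3f}, P(inside) {inside:.3f}, P(above) {above:.3f}")
```

The posterior is pulled from the interim estimate toward the prior centre, and its standard deviation is smaller than the standard error of the estimate. The same calculation can be repeated at each interim look because it uses only the prior and the current likelihood.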


An interesting component of the Bayesian model considered by Freedman and Spiegelhalter is the use of a mixture prior, which assigns a prior point mass to the null hypothesis value of no difference in the hazard rates between two treatment groups, H₀: θ = 0 (where θ is the logarithm of the hazard ratio). In the hypothesis testing setting, the argument for using this prior is that a continuous prior density would give H₀ prior probability zero and hence posterior probability zero. Therefore, from the Bayesian point of view, as articulated by Cornfield, 'If one is concerned about the high probability of rejecting H₀ it must be because some possibility of its truth is being entertained. ... The appropriate way to entertain the possibility for the truth of H₀ is to assign a non-zero probability to H₀.' Alternatively, we may think of the point null hypothesis as an approximation to an interval null hypothesis, say, H₀: θ ∈ (−ε, ε), which more realistically allows for the possibility that the new treatment might in fact do harm, perhaps because of unexpected toxicities. We can then think of the prior density function specified in Freedman and Spiegelhalter's equation (1) as the limiting distribution of a two-component mixture distribution, where the first component of the mixture is, say, a normal distribution centred at θ = 0 with a smaller variance than the second component. Freedman and Spiegelhalter, in fact, consider such a family of prior distributions in evaluating the sensitivity of their recommendation to stop the trial to the specification of the prior and, not surprisingly, they obtain similar posterior probabilities for θ ≤ 0. To me, the latter family of prior distributions seems more natural in this setting than a prior distribution with a point mass on a single parameter value that indicates no treatment difference.
The relative simplicity of using the family of priors given in their equation (1), however, and the results of the sensitivity analysis seem to support Freedman and Spiegelhalter's use of this model (see also Berger and Delampady¹³).
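The limiting argument can be checked numerically under simplifying assumptions. The sketch below is hypothetical in all its inputs and uses a normal approximation for the data on the log hazard ratio scale. It compares the posterior probability of H₀ under a point-mass mixture prior with the posterior weight of the narrow component of a two-normal mixture prior as that component's standard deviation `eps` shrinks.

```python
import math


def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def null_prob_point_mass(est, se, p_null, tau):
    """P(theta = 0 | data) for the prior p_null * delta_0 + (1 - p_null) * N(0, tau^2).

    The data enter through a normal approximation: est ~ N(theta, se^2), so the
    marginal density of est under a N(0, v) prior is N(0, se^2 + v).
    """
    m0 = p_null * normal_pdf(est, 0.0, se ** 2)
    m1 = (1.0 - p_null) * normal_pdf(est, 0.0, se ** 2 + tau ** 2)
    return m0 / (m0 + m1)


def null_prob_two_normals(est, se, p_null, tau, eps):
    """Posterior weight of the narrow N(0, eps^2) component of a two-normal mixture prior."""
    m0 = p_null * normal_pdf(est, 0.0, se ** 2 + eps ** 2)
    m1 = (1.0 - p_null) * normal_pdf(est, 0.0, se ** 2 + tau ** 2)
    return m0 / (m0 + m1)


# Hypothetical inputs, not values from either paper.
est, se, p_null, tau = -0.25, 0.12, 0.5, 0.3
print(f"point mass: P(H0 | y) = {null_prob_point_mass(est, se, p_null, tau):.4f}")
for eps in (0.1, 0.05, 0.01, 0.001):
    w = null_prob_two_normals(est, se, p_null, tau, eps)
    print(f"eps = {eps}: posterior weight of the narrow component = {w:.4f}")
```

As `eps` decreases, the two posterior probabilities agree, which is the sense in which the point-mass prior is a limit of the continuous mixture.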

SUBSET ANALYSIS

The problem of multiple comparisons, multiplicity, or subset analysis continues to be a source of great concern and interest to practising statisticians, Bayesian and classical alike (see, for example, Cornfield, Meier,¹⁴ Tukey,¹⁵ Buyse¹⁶). Generally, subset analysis refers to multiple looks at a data set for various reasons: in the analysis of a trial with multiple treatment groups, when one must compare each treatment to the control or standard (for example, 5-FU in the NCCTG trial); when there is interest in measuring many different clinical endpoints (such as survival, morbidity, and rate or degree of toxic side effects); or when the concern is with the identification of subgroups of patients who respond differently to treatment. As Cornfield¹⁷ wrote, 'Just as the Sphinx winks if you look at it too long, so, if you perform enough significance tests you are sure to find significance, even when none exists. But repeated examination of the results of a modern clinical trial is both necessary and desirable'. Some of the most challenging and interesting controversies in the history of clinical trials have centred on subset analysis, such as the VA Cooperative Study of Medical versus Surgical Treatment for Stable Angina,¹⁸ which identified a significant difference in survival in favour of surgery among a subgroup of patients with left main disease, or the University Group Diabetes Program (UGDP), which discovered an unexpectedly high cardiovascular mortality rate in the Tolbutamide treatment group.¹⁹ Although both classical and Bayesian statisticians appreciate the need to make adjustments for multiple looks at the data, the two approach the problem differently. Recently Bill DuMouchel²⁰ (p. 130) succinctly characterized the differences between the frequentist and the Bayesian approach to the multiple comparison problem.
He writes, 'Both frequentists and Bayesians agree that, when investigating a great many comparisons, it is necessary to adjust for the "surprise value" of any "significant" differences which show up. But the two schools prefer very different strategies for making such an adjustment. In the Bonferroni method, for example, the severity of the frequentist's adjustment is based solely on how many comparisons are being made, with no reference to how surprising the value of y [the data vector] really was. Frequentists expect to be surprised when making a great many comparisons, and that is all they care about. In contrast, the Bayesian definition of "surprise" is directly tied to how much the data disagree with the prior distribution.... Besides using radically different criteria for when an adjustment for surprise value is necessary, the two types of statisticians make the actual adjustment very differently as well. The usual frequentist adjustment consists solely of increasing the stated uncertainty of the estimate, but leaving the estimate itself unchanged at y. To the contrary, the Bayesian's use of an informative prior will actually decrease the uncertainty somewhat (compared to that in the sampling distribution of y), but the main effect of an adjustment will be to pull in the estimate itself from y toward the set of a priori most likely values of θ.' Dixon and Simon² present an application of Bayesian subset analysis to the problem of identifying subgroups of patients in the NCCTG trial with differential response rates to treatment. They base their approach on the multi-stage hierarchical model described by Lindley and Smith.²¹ Dixon and Simon assume exchangeability among the distributions of the parameters at the second stage of the model and therefore do not focus attention a priori on any one particular subset. Thus, as they acknowledge, this method is best recommended for 'application to screening of a set of subsets'. It is the key assumption of exchangeability in the hierarchical model that leads to the shrinking of the posterior distributions of the interaction terms towards zero.
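The shrinkage mechanism can be sketched with a simplified two-stage normal model. This is an assumption-laden illustration, not Dixon and Simon's model: the interaction estimates and their standard errors are hypothetical, the second-stage prior is taken as N(0, τ²), and τ² is either fixed or estimated by a crude method of moments. The point is that the fraction of each estimate retained, τ²/(τ² + se²), depends directly on the variance component.

```python
import math


def shrink(estimates, ses, tau2):
    """Posterior means and sds of exchangeable effects in a two-stage normal model.

    Stage 1: gamma_hat_j ~ N(gamma_j, se_j^2); stage 2: gamma_j ~ N(0, tau2).
    The posterior mean keeps the fraction tau2 / (tau2 + se_j^2) of each estimate.
    """
    out = []
    for g, s in zip(estimates, ses):
        keep = tau2 / (tau2 + s * s)
        out.append((keep * g, math.sqrt(keep * s * s)))
    return out


def moment_tau2(estimates, ses):
    """Crude method-of-moments estimate of tau2 (prior mean fixed at 0), truncated at zero."""
    k = len(estimates)
    return max(sum(g * g for g in estimates) / k - sum(s * s for s in ses) / k, 0.0)


# Hypothetical subset-by-treatment interaction estimates (not NCCTG values).
est = [0.40, -0.10, 0.05, -0.30, 0.15]
ses = [0.20] * 5
tau2_hat = moment_tau2(est, ses)
for tau2 in (0.005, tau2_hat, 0.1):
    means = [round(m, 3) for m, _ in shrink(est, ses, tau2)]
    print(f"tau2 = {tau2:.3f}: posterior means {means}")
```

Small values of τ² pull every interaction estimate strongly toward zero; larger values leave the estimates nearly unchanged.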
Furthermore, it is important to note that the amount of shrinkage depends greatly on the specification of the components of the hierarchical model, both in terms of distributional assumptions and in terms of the estimation of the relevant components of variance for the parameters of interest (for example, γ), which also may be affected by the prior placed on that variance. The specification of the model and the model assumptions play a central role in both the classical and Bayesian approaches, yet these are inputs into an analysis that, often for mathematical convenience, are taken as known. As indicated above, the impact of the model and the model assumptions can be quite important. In the next section, I discuss some aspects of the Bayesian approach that provide a methodology for assessing the effects of perturbations of the model on subsequent inferences.

MODEL SPECIFICATION AND CRITICISM

In discussing the dynamic interplay between scientific learning and statistical inference, Box²³ describes an iterative process where, 'Known facts (data) suggest a tentative theory or model, implicit or explicit, which in turn suggests a particular examination and analysis of the data and/or the need to acquire further data; analysis may then suggest a modified model that may require further practical illumination and so on ... New knowledge thus evolves by an interplay between dual processes of induction and deduction in which the model is not fixed but is continually developing. The statistician's role is to assist this evolution.' In the remainder of this paper, I focus my discussion on the statistician's role in this process through the specification and criticism of statistical models in scientific investigations, such as the randomized controlled clinical trial. Note that throughout this discussion I tacitly assume an essential and close collaborative relationship between the statistician and the subject-matter specialist. My perspective follows Box and is one that some have called the robust Bayesian view on modelling. It is a view, however, that I believe is shared by anyone who has a serious interest in learning systematically from data and in advancing scientific knowledge. I summarize this view in two parts:

1. Model specification: A model must utilize as much of the available information in a problem as possible, usually based on substantive knowledge about the problem and preliminary data analysis; and
2. Model criticism: Inferences based on a statistical model must be investigated with respect to the sensitivity and the robustness of the inferences to the specification of the model.

Although concerns about model specification and criticism are not limited to Bayesians, the Bayesian approach, as I hope to show, does provide a formalism by which one can investigate the consequences of varying the inputs to the model. Many authors have expounded eloquently on the Bayesian approach to model specification and criticism (for example, Berger,²⁴ Box,²³ Hill²⁵ and Smith²⁶). Therefore, the reader should view what follows as a review in which I attempt to characterize some aspects of this approach with an emphasis on applications and illustrations with data from the NCCTG trial. (For another example of a Bayesian approach to model specification for survival data in cancer patients and additional discussion of Bayesian modelling, see Chen et al.²⁷) From the Bayesian point of view, the model consists of all aspects of the available information about the substantive problem that one can express in the joint density obtained by combining the likelihood and the prior:

$$p(y, \theta \mid M) = p(y \mid \theta, M)\, p(\theta \mid M) \tag{1}$$

where y is the data vector, θ the parameter vector, and conditioning on M indicates the dependence on the assumptions in the model specification. A feature of the Bayesian approach is that the inputs into the model are explicitly laid out, whether in terms of the specification of the sampling distribution of the data, p(y | θ, M), or in terms of the auxiliary information brought to the problem concerning the parameter θ and expressed in the prior distribution p(θ | M). The model as expressed in equation (1) can also be factored as

$$p(y, \theta \mid M) = p(\theta \mid y, M)\, p(y \mid M). \tag{2}$$

The first factor on the right is the posterior distribution of θ given the data, where

$$p(\theta \mid y, M) \propto p(y \mid \theta, M)\, p(\theta \mid M), \tag{3}$$

and the second factor on the right in (2) is the predictive density (which is the marginal density) associated with the data given the model, where

$$p(y \mid M) = \int p(y \mid \theta, M)\, p(\theta \mid M)\, d\theta. \tag{4}$$
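Equations (3) and (4) can be evaluated numerically for a one-parameter model. The sketch below assumes an exponential survival model with hazard rate θ, a Gamma prior for θ, and simulated (hypothetical) censored data. It approximates the posterior density and the predictive density p(y | M) by the midpoint rule on a grid and, because this prior is conjugate, checks the result against the closed form.

```python
import math
import random

random.seed(1)
# Hypothetical censored survival data (not from the NCCTG trial): exponential
# survival with rate 0.003 per day and uniform censoring on (0, 1000) days.
times, events = [], []
for _ in range(100):
    t, c = random.expovariate(0.003), random.uniform(0.0, 1000.0)
    times.append(min(t, c))
    events.append(1 if t <= c else 0)

d, total_time = sum(events), sum(times)
a, b = 2.0, 500.0  # assumed Gamma(shape=a, rate=b) prior for the hazard rate theta


def log_joint(theta):
    """log p(y | theta, M) + log p(theta | M) for the exponential model."""
    log_lik = d * math.log(theta) - theta * total_time
    log_prior = a * math.log(b) - math.lgamma(a) + (a - 1) * math.log(theta) - b * theta
    return log_lik + log_prior


# Midpoint rule on a grid of theta values covering essentially all posterior mass.
lo, hi, n = 1e-6, 0.05, 20000
h = (hi - lo) / n
grid = [lo + (i + 0.5) * h for i in range(n)]
values = [log_joint(th) for th in grid]
m = max(values)
weights = [math.exp(v - m) for v in values]
integral = sum(weights) * h                   # integral of exp(log_joint - m)

log_evidence = m + math.log(integral)         # equation (4): log p(y | M)
posterior = [w / integral for w in weights]   # equation (3): normalized density values
post_mean = sum(th * p for th, p in zip(grid, posterior)) * h

# Closed form for this conjugate pair, used only as a check on the grid result.
exact = a * math.log(b) + math.lgamma(a + d) - math.lgamma(a) - (a + d) * math.log(b + total_time)
print(f"events = {d}, log p(y|M): grid {log_evidence:.6f}, exact {exact:.6f}")
print(f"posterior mean of theta: {post_mean:.5f} (exact {(a + d) / (b + total_time):.5f})")
```

The grid approach needs no conjugacy; the closed form is used here only to confirm that the numerical integral in (4) is accurate.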

From the Bayesian point of view, the basis for inference is the posterior distribution (3), conditional on the model. As summarized by Berger (p. 64), a basic assumption of the Bayesian approach is a belief that 'In any statistical investigation, one will ultimately be faced with making reports, inferences, or decisions that involve uncertainties. Of interest is the information available about these uncertainties after seeing the data, and the only trustworthy and sensible measures of this information are Bayesian posterior measures.' Since the assumptions and specifications of the model are a source of uncertainty, one can investigate the implications of the model specification using Bayesian methods.

MODEL SPECIFICATION

A criticism of Bayesian methods has centred on the issue of subjectivity in the choice of the prior distribution in (1). It is, however, a characteristic of the robust Bayesian approach that it provides a formalism by which one can consider a range of inputs for p(θ | M), either by trying different reasonable priors or a family of priors, to assess the sensitivity or robustness of the analysis to possible misspecifications of the prior. In other words, Bayesian prior sensitivity analysis may be used to achieve an understanding of the information provided by the data by introducing alternative priors and determining the resulting inferences (see, for example, Berger, Section 4.7.9; Kass and Greenhouse⁹). Both the papers by Freedman and Spiegelhalter and by Dixon and Simon provide illustrations of the use of prior sensitivity analysis. Freedman and Spiegelhalter consider an extensive sensitivity analysis to illustrate the impact of various plausible choices of priors on their Bayesian stopping rule and argue, in my view correctly, that such formal analyses can be an important input to a data monitoring committee in its decision whether or not to stop a trial. They find that their recommendations for the NCCTG trial are sensitive to their prior specification, in particular to the value of the point mass on θ = 0. Dixon and Simon assess the sensitivity of the results of their subset analysis to the specification of the prior on the hyperparameter at the third stage of their hierarchical model and find in this example that the effect is minor, a result that is often true for higher-stage priors in hierarchical models. It is interesting to note that in many situations where there is a relatively large amount of sample information, such as in the NCCTG trial, the likelihood function becomes relatively concentrated in a region in which the prior distribution is relatively gentle, and then the prior has little impact on the conclusions.
This phenomenon, which in practice assures almost complete robustness to the prior specification, has been called 'stable estimation'. In addition to the concern with prior sensitivity, an area of sensitivity analysis of interest to all practising statisticians is the robustness of p(y | θ, M), that is, the specification of the model for the data. (See Cox³¹ and Lehmann³² for a non-Bayesian discussion.) Some argue that the model for the data has some theoretical justification or may have some external validity, and therefore treat it as if known with certainty. I agree with Box, however, that in practice scientific learning is a dynamic iterative process in which the model develops continually as we acquire new knowledge, and therefore we need methods for model specification and criticism. For ease of exposition in the remainder of this paper, in considering questions of model robustness, I refer to the 'model' as p(y | θ, M), the likelihood function in (1). One approach to evaluating the sensitivity of the model specification is to enlarge the model by embedding it into a larger family, that is, by incorporating into the model a nuisance parameter, say β, where certain values of β may correspond to special cases of interest, and then to perform a Bayesian analysis (for example, Chen et al.²⁷). As we will see, a feature of the Bayesian approach that is particularly useful here is that inference about a parameter of interest in the presence of other parameters is conceptually straightforward and is based on a well-defined entity, the marginal posterior distribution. For the NCCTG trial, consider the problem of specifying a model for the survival time data. According to Wieand and Cha, inspection of the survival curves in their Figures 3-5 suggests that the assumption of proportional hazards as well as the assumption of a constant hazard rate may not be valid. Although Wieand and Cha and Freedman and Spiegelhalter analyse the data from the NCCTG trial using the proportional hazards model, they could have enlarged this model, as suggested by Cox,³³ by including a time-dependent component to assess the adequacy of the assumption of proportional hazards. Or, if they were particularly interested in the possibility of constant hazard rates, they might have modelled survival times in each of the treatment groups using a parametric family such as the Weibull distribution indexed by scale and shape parameters, θ and β, respectively, where β equal to one corresponds to a constant hazard rate model. The Weibull distribution is a rich family of survival distributions that includes increasing hazard rate models, β > 1, decreasing hazard rate models, 0 < β < 1, and, as noted, the constant hazard rate model, β = 1. (Alternatively, see Kalbfleisch.³⁴) In their paper, Dixon and Simon assume a normal distribution for the parameter vector (estimated from the proportional hazards model) at the first stage of their hierarchical model, but they could have used a larger family of densities, for example, a class of heavy-tailed densities (indexed, say, by a parameter β) of which the normal density is a special case.²³,²⁵ As an illustration of this idea of enlarging a model, and to address the appropriateness of the assumption of a constant hazard rate model for the survival data for the 5-FU and the 5-FU + High Dose Leucovorin treatment groups from the NCCTG trial, I fit a Weibull distribution to each group. Table I presents the posterior modes and posterior standard deviations for (θ, β) for each treatment group, obtained using flat priors for θ and β. To assess the plausibility of a constant hazard rate model, that is, β = 1, Figures 1 and 2 present the marginal posterior distribution for β for each treatment group (dotted lines). The 95 per cent HPD (highest posterior density) interval for β for the 5-FU group is approximately (0·78, 1·34) and for the 5-FU + High Dose Leucovorin group is approximately (0·80, 1·44).

Table I. Posterior modes (and posterior standard deviations) for the Weibull survival model for the 5-FU and 5-FU + High Dose Leucovorin treatment groups

Parameter    5-FU               5-FU + High Dose Leucovorin
θ            0·0031 (0·0004)    0·0020 (0·0003)
β            1·061 (0·14)       1·123 (0·16)
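The Weibull calculation can be sketched on a grid. This is a minimal illustration on simulated data, not a reanalysis of the NCCTG trial. It assumes the parametrization S(t) = exp{−(θt)^β} (the paper does not state its parametrization), flat priors on bounded grids for θ and β, and it integrates θ out numerically to obtain the marginal posterior of β, an approximate 95 per cent HPD interval, and, for comparison, the profile likelihood sup_θ L(θ, β).

```python
import math
import random

random.seed(7)
# Hypothetical data, not the NCCTG trial: exponential survival (rate 0.003/day)
# with uniform censoring on (0, 1000) days; event = 1 means a death was observed.
times, events = [], []
for _ in range(150):
    t, c = random.expovariate(0.003), random.uniform(0.0, 1000.0)
    times.append(min(t, c))
    events.append(1 if t <= c else 0)
d = sum(events)
sum_log_t = sum(math.log(t) for t, e in zip(times, events) if e)


def loglik(theta, beta, s_beta):
    """Weibull log-likelihood, assuming S(t) = exp(-(theta * t) ** beta).

    s_beta is the sum of t ** beta over all observed and censored times.
    """
    return (d * math.log(beta) + d * beta * math.log(theta)
            + (beta - 1.0) * sum_log_t - theta ** beta * s_beta)


thetas = [0.0005 + 0.00002 * i for i in range(300)]  # flat prior on this theta grid
betas = [0.5 + 0.005 * j for j in range(201)]        # flat prior on [0.5, 1.5]

log_marg, log_prof = [], []
for b in betas:
    s_b = sum(t ** b for t in times)
    row = [loglik(th, b, s_b) for th in thetas]
    m = max(row)
    # Summing over the theta grid integrates theta out; the constant grid
    # spacing cancels when p(beta | y) is normalized below.
    log_marg.append(m + math.log(sum(math.exp(v - m) for v in row)))
    # Profile likelihood: for fixed beta the maximizing theta is (d / s_b) ** (1 / beta).
    log_prof.append(loglik((d / s_b) ** (1.0 / b), b, s_b))

top = max(log_marg)
post = [math.exp(v - top) for v in log_marg]
z = sum(post)
post = [p / z for p in post]  # p(beta | y) on the grid

mode = betas[post.index(max(post))]
prof_mode = betas[log_prof.index(max(log_prof))]

# Approximate 95 per cent HPD set: take the highest grid points until 95 per cent
# of the mass is covered (reported as an interval, assuming a unimodal posterior).
mass, chosen = 0.0, []
for j in sorted(range(len(betas)), key=lambda k: -post[k]):
    chosen.append(j)
    mass += post[j]
    if mass >= 0.95:
        break
hpd = (betas[min(chosen)], betas[max(chosen)])

print(f"events {d}; marginal mode of beta {mode:.3f}; profile maximum {prof_mode:.3f}")
print(f"approximate 95% HPD interval for beta: ({hpd[0]:.3f}, {hpd[1]:.3f})")
```

With a moderate number of events the marginal mode and the profile-likelihood maximum typically fall within a grid step or two of each other, consistent with the close agreement reported for the NCCTG groups.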
In both groups it is clear that there is information in the data about the parameter β and that the general Weibull model does not provide much improvement over the exponential model. Further, the relative sharpness of the posterior distributions for β suggests that the phenomenon of stable estimation applies here, and their approximately normal shape (quadratic on the logarithmic scale) suggests that one could base inferences on the likelihood function and use normal theory in approximating various posterior distributions. Specifically, the solid lines in Figures 1 and 2 are the respective profile likelihood functions for β, that is, sup_θ L(θ, β), for each group. As is evident from the figures, the profile likelihood functions in these examples are excellent approximations to the marginal posterior distributions of β. I note that, as discussed by Freedman and Spiegelhalter, there is good a priori information available about the expected survival time in each treatment group which one could incorporate into this analysis, although information about the nuisance parameter β would more likely be diffuse. I use flat priors here, however, to focus attention primarily on the methodology and to provide a benchmark by which to judge the effect of additional prior input.

Figure 1. Marginal posterior distribution (dotted line) and profile likelihood function (solid line) for β for the 5-FU group

A discrete version of the model enlargement approach, for the problem of choosing among competing models, is based on assessing the posterior odds for the models given the data. This approach applies whether the models are nested or non-nested. Suppose that M₀, M₁, ..., Mₖ are models under consideration, where under Mᵢ the model is described by equations (2) and (3). If p(Mᵢ) denotes the prior probability of the model Mᵢ then, conditional on at least one of the Mᵢ being the true model, the posterior probability of the model Mᵢ follows from an application of Bayes' theorem, yielding

$$p(M_i \mid y) = \frac{p(y \mid M_i)\, p(M_i)}{\sum_{j=0}^{k} p(y \mid M_j)\, p(M_j)},$$

where p(y | Mᵢ) is the predictive density given in (4).

Figure 2. Marginal posterior distribution (dotted line) and profile likelihood function (solid line) for β for the 5-FU + High Dose Leucovorin group

Specifically, one could make a comparison between, say, two models, M₀ and M₁, based on the ratio of the posterior to prior odds, called the Bayes factor,

$$B_{01} = \frac{p(M_0 \mid y)}{p(M_1 \mid y)} \bigg/ \frac{p(M_0)}{p(M_1)} = \frac{p(y \mid M_0)}{p(y \mid M_1)}, \tag{5}$$

which has several interpretations, including, from (4), the ratio of integrated or weighted likelihoods, and the ratio of the predictive distributions of the data under each model. Thus, one can use the Bayes factor to assess the relative plausibility of models. Note that when M₀ and M₁ denote simple hypotheses, the Bayes factor is just the likelihood ratio of M₀ against M₁ and is usually interpreted as the odds in favour of M₀ against M₁. Cornfield discusses the use of Bayes factors for evaluating the results from a clinical trial, although he calls (5) 'the relative betting odds'. Other applications appear in Spiegelhalter and Smith,³⁶ Raftery and Akman,³⁷ Kass and Greenhouse,⁹ and Kass and Vaidyanathan.³⁸ The above model selection procedure is essentially a Bayesian hypothesis test of H₀: M₀ against H₁: M₁. Kass and Greenhouse⁹ point out that from the Bayesian point of view, hypothesis testing differs from estimation. Unlike many posterior interval probabilities, the posterior odds on a sharp null hypothesis are generally very sensitive to the choice of prior densities under the null and alternative hypotheses. This is because the Bayes factor, as noted above, is the ratio of two marginal densities associated with the data, one under H₀: M₀ (the numerator) and one under H₁: M₁ (the denominator). The observation y is often in the tail of each density, and the ratio is thus likely to vary substantially when the priors change. Kass and Vaidyanathan suggest an approximation to the Bayes factor, Knormin, which is generally not sensitive to small changes in the prior. They show that for large sample sizes, and when the parameters are what they call null-orthogonal, Knormin is the minimum value of the Bayes factor over a broad class of normal prior distributions (Kass and Vaidyanathan,³⁸ Theorem 3). (One can think of the likelihood ratio in the one-parameter case as the minimum value of the Bayes factor over all priors.) To illustrate, consider again the question of evaluating an exponential survival model versus a general Weibull survival model for the data in the two treatment groups in the NCCTG trial.
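A direct numerical Bayes factor for the exponential-versus-Weibull question illustrates the prior sensitivity described above. The sketch does not compute Knormin. It assumes uniform priors for θ (shared by both models) and for β under M₁ on (1 − w, 1 + w), uses simulated data and the parametrization S(t) = exp{−(θt)^β}, and evaluates (4) under each model by the midpoint rule.

```python
import math
import random

random.seed(11)
# Hypothetical data (not NCCTG): exponential survival, rate 0.003/day, uniform censoring.
times, events = [], []
for _ in range(150):
    t, c = random.expovariate(0.003), random.uniform(0.0, 1000.0)
    times.append(min(t, c))
    events.append(1 if t <= c else 0)
d = sum(events)
sum_log_t = sum(math.log(t) for t, e in zip(times, events) if e)


def loglik(theta, beta, s_beta):
    """Weibull log-likelihood, assuming S(t) = exp(-(theta * t) ** beta); beta = 1 is exponential."""
    return (d * math.log(beta) + d * beta * math.log(theta)
            + (beta - 1.0) * sum_log_t - theta ** beta * s_beta)


def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


# Assumed Uniform(0.0005, 0.0065) prior for theta under both models (midpoint grid).
thetas = [0.0005 + 0.006 * (i + 0.5) / 300 for i in range(300)]


def log_evidence_weibull(w, n_beta=200):
    """log p(y | M1) with beta ~ Uniform(1 - w, 1 + w), midpoint rule over (theta, beta)."""
    vals = []
    for j in range(n_beta):
        b = 1.0 - w + 2.0 * w * (j + 0.5) / n_beta
        s_b = sum(t ** b for t in times)
        vals.extend(loglik(th, b, s_b) for th in thetas)
    return log_mean_exp(vals)


s_1 = sum(times)
log_ev_exp = log_mean_exp([loglik(th, 1.0, s_1) for th in thetas])  # log p(y | M0)

bayes_factors = {}
for w in (0.3, 0.6, 0.9):
    b01 = math.exp(log_ev_exp - log_evidence_weibull(w))
    bayes_factors[w] = b01
    # With even prior odds, P(M0 | y) = B01 / (1 + B01).
    print(f"w = {w}: B01 = {b01:.2f}, P(M0 | y) = {b01 / (1.0 + b01):.2f}")
```

Because the likelihood for β is concentrated well inside each prior range here, B₀₁ grows roughly in proportion to w although the data are unchanged, which is the kind of dependence on the prior under the alternative discussed by Kass and Greenhouse.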
The question of exponential versus Weibull survival is equivalent to testing H₀: β = 1 (M₀) versus H₁: β ≠ 1 (M₁). The value of (Knormin)⁻¹ is 1·46 for the 5-FU group and 0·97 for the 5-FU + High Dose Leucovorin group. Taking even prior odds for H₀ versus H₁, (Knormin)⁻¹ is an approximation to the odds against H₀. Using Jeffreys' rule of thumb of 10:1 as the rough cutoff between 'substantial' and 'strong' evidence against the null hypothesis (Jeffreys,³⁹ Appendix B), it is clear that the evidence against the exponential model in favour of the Weibull model is on the order of even odds and is not substantial.

MODEL CRITICISM

After specifying the model for the data and checking whether the postulated model accords with the data, one needs to explore the extent and nature of the dependence of the outputs of the model on the inputs. As part of model criticism, the Bayesian analysis would now centre on assessing the sensitivity of inferences about the parameter of interest, say θ, to the model by investigating

$$p(\theta \mid y) = \int p(\theta \mid \beta, y)\, p(\beta \mid y)\, d\beta \tag{6}$$

which we can think of as a weighted average of the conditional posterior distributions of $\theta$ given the model indexed by $\beta$, the weight being the posterior density of $\beta$ given the data. That is, $p(\beta \mid y)$ serves as a weight function acting on the various conditional posterior distributions $p(\theta \mid \beta, y)$. As pointed out by Box and Tiao,⁴⁰ the elements in (6) provide a systematic basis both for the critical evaluation of assumptions and for inference. Thus:

1. the sensitivity of inferences about $\theta$ to changes in $\beta$ is reflected by $p(\theta \mid \beta, y)$ considered as a function of $\beta$;
2. the analysis of $p(\beta \mid y)$ indicates the plausibility of several models and shows how the a priori uncertainty about the model, represented by $p(\beta)$, is transformed by the data; and
3. $p(\theta \mid y)$ represents the overall marginal information on $\theta$, so that inferences about $\theta$ are robust in the sense that they are based on a wider class of models.
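Equation (6) can be made concrete on a grid when, as here, there are only two parameters. The following is a minimal sketch under stated assumptions, not the computation behind the NCCTG figures: it simulates hypothetical censored survival times, assumes the Weibull parameterization $S(t) = \exp\{-(\theta t)^\beta\}$ (so $\beta = 1$ is the exponential model), uses flat priors on a bounded grid, and computes $p(\beta \mid y)$, the conditionals $p(\theta \mid \beta, y)$, and $p(\theta \mid y)$ as their $\beta$-weighted average. The function name `weibull_log_lik_grid`, the grid limits and the simulated data are all hypothetical.

```python
import numpy as np


def weibull_log_lik_grid(thetas, betas, t, d):
    """Censored Weibull log-likelihood on a (theta, beta) grid.

    Assumed (hypothetical) parameterization: S(t) = exp(-(theta * t) ** beta),
    so theta is a rate and beta = 1 gives the exponential model.
    d[i] = 1 for a death, 0 for a censored time.
    Returns an array of shape (len(thetas), len(betas)).
    """
    n_events = d.sum()
    sum_log_t_events = np.sum(d * np.log(t))
    sum_t_pow = (t[None, :] ** betas[:, None]).sum(axis=1)  # sum_i t_i**beta for each beta
    th, be = np.meshgrid(thetas, betas, indexing="ij")
    return (n_events * (np.log(be) + be * np.log(th))
            + (be - 1.0) * sum_log_t_events
            - th ** be * sum_t_pow[None, :])


# Hypothetical data: Weibull survival times with uniform censoring.
rng = np.random.default_rng(0)
n = 120
T = rng.weibull(1.1, n) / 0.003          # S(t) = exp(-(0.003 t)^1.1)
C = rng.uniform(200.0, 1200.0, n)
t = np.minimum(T, C)
d = (T <= C).astype(float)

# Flat priors on a bounded grid, so the posterior is proportional to the likelihood.
thetas = np.linspace(0.001, 0.006, 250)
betas = np.linspace(0.6, 1.8, 250)
dth, dbe = thetas[1] - thetas[0], betas[1] - betas[0]
log_joint = weibull_log_lik_grid(thetas, betas, t, d)
joint = np.exp(log_joint - log_joint.max())
joint /= joint.sum() * dth * dbe                      # p(theta, beta | y)

p_beta = joint.sum(axis=0) * dth                      # p(beta | y)
cond = joint / p_beta[None, :]                        # p(theta | beta, y), one column per beta
p_theta = (cond * p_beta[None, :]).sum(axis=1) * dbe  # equation (6): beta-weighted average

# Box-Tiao style check: does p(theta | beta, y) move as beta varies?
beta_mode = betas[np.argmax(p_beta)]
beta_mean = np.sum(betas * p_beta) * dbe
beta_sd = np.sqrt(np.sum((betas - beta_mean) ** 2 * p_beta) * dbe)
for b in (beta_mode - beta_sd, beta_mode, beta_mode + beta_sd):
    j = np.argmin(np.abs(betas - b))                  # nearest grid column
    print(f"beta = {betas[j]:.2f}: E(theta | beta, y) = {np.sum(thetas * cond[:, j]) * dth:.5f}")
print(f"marginal: E(theta | y) = {np.sum(thetas * p_theta) * dth:.5f}")
```

Comparing the conditional means at the mode of $\beta$ and at the mode ± 1 standard deviation is a numerical counterpart of the graphical comparison of conditional distributions; a brute-force grid is adequate for two parameters but would not scale to larger models.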

[Figure 3. Marginal posterior distribution for $\beta$ and contours of the joint posterior distribution of $(\theta, \beta)$ for the 5-FU group. Each contour represents a drop of 1 in the logarithm of the posterior distribution. Panels: $p(\beta \mid y)$ and the joint contours; horizontal axis $\beta$, 0.5 to 1.5.]

Box²³ suggests a particularly informative graphical display for such an investigation: the contours of the joint posterior distribution of $(\theta, \beta)$, presented together with the marginal posterior distribution of $\beta$. The purpose of such a display is to investigate how the conditional distribution $p(\theta \mid \beta, y)$ in (6) changes with $\beta$. Box and Tiao³⁵ (Section 1.6.2) discuss various implications of such an analysis. For example, if $p(\beta \mid y)$ were relatively sharp, with most of its mass concentrated over a small region about its mode, then even if inferences about $\theta$ were sensitive to $\beta$ over a wide range, the posterior distribution $p(\beta \mid y)$ would contain enough information about $\beta$ to rule out values of $\beta$ not close to its mode. If instead $p(\beta \mid y)$ were flat, indicating that there was little information about $\beta$ from prior knowledge and from the sample, this sensitivity analysis would suggest that, in addition to reporting the marginal distribution $p(\theta \mid y)$, it would also be useful to indicate how $p(\theta \mid \beta, y)$ changes over the range in which $p(\beta \mid y)$ is appreciable (see also Smith²⁶). To illustrate such an analysis, Figures 3 and 4 present, for the Weibull survival model in the 5-FU and the 5-FU + High Dose Leucovorin groups respectively, the contours of the joint posterior distribution of $(\theta, \beta)$ along with the marginal posterior distribution of $\beta$, using flat priors on $\theta$ and $\beta$. Notice in both displays that there is little change in the location of the conditional distribution $p(\theta \mid \beta, y)$ as $\beta$ changes over its more plausible values. This point is also evident in Figures 5 and 6, which, for the 5-FU and the 5-FU + High Dose Leucovorin groups respectively, display the conditional distribution $p(\theta \mid \beta, y)$ at three values of $\beta$: at its mode and at the mode ± 1 standard deviation. Thus, it appears that inferences about $\theta$ are not sensitive to changes in $\beta$ in either treatment group within the class of Weibull survival models.

[Figure 4. Marginal posterior distribution for $\beta$ and contours of the joint posterior distribution of $(\theta, \beta)$ for the 5-FU + High Dose Leucovorin group. Each contour represents a drop of 1 in the logarithm of the posterior distribution. Panels: $p(\beta \mid y)$ and the joint contours; horizontal axis $\beta$, 0.6 to 1.8.]

[Figure 5. Conditional distribution $p(\theta \mid \beta, y)$ for $\beta$ at its mode and at the mode ± 1 standard deviation for the 5-FU group. Curves for $\beta$ = 0.92, 1.06 and 1.20; horizontal axis $\theta$, 0.002 to 0.005.]

Another aspect of model criticism focuses on Bayesian diagnostic methods for identifying unusual or influential observations. One approach measures the effect of an 'unusual' observation by calculating the Kullback-Leibler (KL) distance between the predictive distribution of a future observation given the data and the predictive distribution of a future observation given the data with the $j$th observation deleted (see Johnson and Geisser⁴¹). A single 'unusual' observation should yield a KL distance that stands out from the rest. We can consider the KL distance an overall measure of the influence of an observation on predicting a new observation from the data: it reflects how the predictive distributions differ with and without the 'unusual' observation. I will not illustrate this methodology with data from the NCCTG trial, since Kass et al.⁴² discuss a similar approach to the Bayesian analysis of influential observations (although based on changes in posterior expectations rather than on changes in predictive distributions) and illustrate applications of the methodology to an exponential survival model with covariates for a survival analysis of patients with leukaemia.
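A minimal sketch of the predictive KL diagnostic follows, under loudly hypothetical assumptions that differ from both the Johnson and Geisser regression setting and the Kass et al. procedure: uncensored exponential survival times, a conjugate Gamma$(a_0, b_0)$ prior on the hazard rate (hyperparameters chosen only for illustration), and the direction $\mathrm{KL}\{p(\cdot \mid y) \,\|\, p(\cdot \mid y_{(j)})\}$. Under these assumptions each predictive density is Lomax (Pareto type II), $p(t \mid y) = A B^A / (B + t)^{A+1}$ with $A = a_0 + n$ and $B = b_0 + \sum_i y_i$, so deleting case $j$ simply replaces $(A, B)$ by $(A - 1, B - y_j)$. The function names and simulated data are hypothetical.

```python
import numpy as np


def log_predictive(t, shape, rate):
    """Log predictive density of a future survival time t when the hazard has a
    Gamma(shape, rate) posterior: a Lomax (Pareto II) density."""
    return np.log(shape) + shape * np.log(rate) - (shape + 1.0) * np.log(rate + t)


def kl_influence(y, a0=1.0, b0=1.0, n_grid=200_000):
    """KL(p(. | y) || p(. | y with case j deleted)) for each case j.

    Hypothetical setup: uncensored exponential times y and a conjugate
    Gamma(a0, b0) prior on the hazard. The expectation under the full-data
    predictive is taken over its quantiles F^{-1}(u) = B * ((1 - u)**(-1/A) - 1)
    at the midpoints u of a uniform grid on (0, 1).
    """
    y = np.asarray(y, dtype=float)
    A, B = a0 + y.size, b0 + y.sum()                 # full-data posterior for the hazard
    u = (np.arange(n_grid) + 0.5) / n_grid
    t = B * ((1.0 - u) ** (-1.0 / A) - 1.0)          # quantiles of p(t | y)
    log_p_full = log_predictive(t, A, B)
    kl = np.empty(y.size)
    for j, y_j in enumerate(y):
        # Deleting case j removes one event and y_j of exposure from the posterior.
        kl[j] = np.mean(log_p_full - log_predictive(t, A - 1.0, B - y_j))
    return kl


rng = np.random.default_rng(42)
y = np.append(rng.exponential(300.0, size=30), 5000.0)  # hypothetical; last case is unusual
kl = kl_influence(y)
for j in np.argsort(kl)[::-1][:3]:
    print(f"case {j:2d}: y = {y[j]:7.1f}, KL = {kl[j]:.5f}")
```

With these made-up data the appended 5000-day case should give a KL distance well above the others, which is the pattern described for a single unusual observation; censoring, covariates and the choice of KL direction would all need attention in a real survival analysis.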

[Figure 6. Conditional distribution $p(\theta \mid \beta, y)$ for $\beta$ at its mode and at the mode ± 1 standard deviation for the 5-FU + High Dose Leucovorin group. Curves for $\beta$ = 0.96, 1.12 and 1.28; horizontal axis $\theta$, 0.0015 to 0.0040.]

CONCLUDING REMARKS

The NCCTG randomized controlled clinical trial for the treatment of advanced colorectal carcinoma is a wonderful case study of what Box calls the dynamic interplay between scientific learning and statistical inference. Clinical trials are extremely complex experiments on humans and as a result raise challenging problems for the practice of statistics. For example, because of the ethical concern for minimizing the number of patients assigned to an inferior treatment, the NCCTG implemented a monitoring procedure that resulted in early stopping of the study. The concern to learn as much as possible from the trial led the investigators to seek subgroups in which a treatment regimen was most likely efficacious. Careful data analysis revealed that the benefit attributed to the new treatment, 5-FU + High Dose Leucovorin, occurred mainly in a subgroup of patients with non-measurable disease. I note that Wieand and Cha originally reported these results on the basis of good statistical practice, common sense, and good science. The reanalyses of the trial by Freedman and Spiegelhalter and by Dixon and Simon reached the same conclusions using formal analytic methods based on Bayesian posterior measures. It is not surprising that the Freedman and Spiegelhalter analyses and the NCCTG analyses agree: there is a large number of observations in the NCCTG trial, so for the analyses discussed the phenomenon of stable estimation applies, and inferences based on the likelihood function alone provide good approximations to the various posterior distributions.

I have tried to highlight in this discussion how the Bayesian approach provides a conceptually useful framework for model sensitivity analysis, criticism and inference. A feature of the Bayesian approach in applications, as illustrated by Freedman and Spiegelhalter and by Dixon and Simon, is that it provides a formalism that allows a data analyst to be explicit about his or her assumptions, to investigate how the data bear on the multitude of analytic decisions that are made during the analysis of a substantive problem, and to utilize auxiliary information. In other words, the Bayesian approach provides a methodology to assess the sensitivity of inferences to the inputs into a model, so that one can investigate the consequences of whatever goes into the model.
I have illustrated some of these methods using the question of the specification of the survival model for the treatment groups in the NCCTG trial as a case study of the use of model enlargement, marginal and conditional posterior distributions for sensitivity analysis, and Bayes factors for model selection. It is interesting to note the limited application of Bayesian methods in general, and to clinical trials in particular. I suspect this is due to (i) a reluctance to make formal use of prior or auxiliary information and (ii) the heavy computational burden required to conduct a fully Bayesian analysis. As we have seen in the papers by Freedman and Spiegelhalter¹,¹⁰ and Dixon and Simon,² substantial advances in the application of Bayesian methods have been made in both these areas. Recent developments in computational methods have facilitated Bayesian computations; these include Gauss-Hermite quadrature,⁴³ quasi-Monte Carlo quadrature,⁴⁴ sampling-based methods such as Gibbs sampling,⁴⁵ and approximate integration methods.⁴⁶ An understanding of these computational methods has become an essential component of the toolkit of the practising statistician. At Carnegie Mellon University, for instance, students learn about methods for Bayesian computation in a course on statistical computing before they even begin a study of applied Bayesian methods. (I note in passing that there are currently several excellent textbooks available as source material for a course in applied Bayesian methods, including Box and Tiao³⁵ (though out of print), Berger,²⁸ Lee⁴⁷ and Press,⁴⁸ as well as many excellent articles, including the recent review article by Breslow.⁴⁹) Twenty-one years ago, Jerry Cornfield⁷ presented his now classic paper entitled 'The Bayesian outlook and its applications' at a special session of the Spring Biometrics Meetings in Iowa.
Twenty-one years later, we are able to observe Cornfield's profound influence on the practice of clinical trials with respect to the careful design, execution and analysis of the NCCTG trial as presented by Wieand and Cha, and we are now beginning to observe the influence of his 'Bayesian outlook' in the discussion and implementation of methods for interim analysis and subset analysis as presented by Freedman and Spiegelhalter and by Dixon and Simon, respectively. My thanks to all the authors for their lucid and thoughtful discussions of real and challenging applications of statistical methods to the analysis of cancer clinical trials.

ACKNOWLEDGEMENTS

This paper is an expanded version of a discussion presented at the 1990 Biometric Society ENAR meetings, in a session entitled 'Bayesian analysis of cancer clinical trials'. I would like to thank Dr. Jay Herson, who organized the session, for inviting me to participate. I would also like to acknowledge my debt to Professors Constantine Gatsonis, Robert Kass, and Larry Wasserman for many invigorating and enlightening discussions on Bayesian methods and their applications. This work was supported in part by a grant from the National Institute of Mental Health, CRC Grant MH30915, and a grant from the National Cancer Institute, CA 54852.

REFERENCES

1. Freedman, L. S. and Spiegelhalter, D. J. 'Applications of Bayesian statistics to decision making during a clinical trial', Statistics in Medicine, 11, 23-35 (1992).
2. Dixon, D. and Simon, R. 'Bayesian subset analysis in a colorectal cancer clinical trial', Statistics in Medicine, 11, 13-22 (1992).
3. Savage, L. J. 'Discussion of "Fiducial Probability" by John Tukey', Bulletin of the International Statistical Institute, 40(2), 925-927 (1963).
4. Wieand, S. and Cha, S. 'Description of the statistical aspects of a study for advanced colorectal cancer patients', Statistics in Medicine, 11, 5-11 (1992).
5. Cornfield, J. 'Sequential trials, sequential analysis and the likelihood principle', American Statistician, 20, 18-23 (1966).
6. Cornfield, J. 'A Bayesian analysis of some classical hypotheses - with applications to sequential clinical trials', Journal of the American Statistical Association, 61, 577-594 (1966).
7. Cornfield, J. 'The Bayesian outlook and its applications', Biometrics, 25, 617-657 (1969).
8. Freedman, B. 'Equipoise and the ethics of clinical research', New England Journal of Medicine, 317, 141-145 (1987).
9. Kass, R. E. and Greenhouse, J. B. 'Comment: A Bayesian perspective on "Investigating therapies of potentially great benefit: ECMO" by James H. Ware', Statistical Science, 4, 298-340 (1989).
10. Freedman, L. S. and Spiegelhalter, D. J. 'The assessment of subjective opinion and its use in relation to stopping rules for clinical trials', The Statistician, 33, 153-160 (1983).
11. Freedman, L. S., Lowe, D. and Macaskill, P. 'Stopping rules for clinical trials', Biometrics, 40, 575-586 (1984).
12. Kadane, J. B. 'Progress toward a more ethical method for clinical trials', Journal of Medical Philosophy, 11, 385-404 (1986).
13. Berger, J. O. and Delampady, M. 'Testing precise hypotheses (with discussion)', Statistical Science, 3, 317-352 (1987).
14. Meier, P. 'Statistics and medical experimentation', Biometrics, 31, 511-529 (1975).
15. Tukey, J. 'Some thoughts on clinical trials, especially problems in multiplicity', Science, 198, 679-684 (1977).
16. Buyse, M. E. 'Analysis of clinical trials outcomes: Some comments on subgroup analysis', Controlled Clinical Trials, 10, 4S, 187S-194S (1989).
17. Cornfield, J. 'Recent methodological contributions to clinical trials', American Journal of Epidemiology, 104, 408-421 (1976).
18. Takaro, T., Hultgren, H., Lipton, M. and Detre, K. 'The VA cooperative randomized study of surgery for coronary arterial occlusive disease. II. Subgroup with significant left main lesions', Circulation, 54, Suppl 3:III-107-117 (1976).
19. DuMouchel, W. H. 'A Bayesian model and a graphical elicitation procedure for multiple comparisons', in Bernardo, J. M., DeGroot, M. H., Lindley, D. V. and Smith, A. F. M. (eds), Bayesian Statistics 3, Oxford University Press, 1988.
20. Lindley, D. V. and Smith, A. F. M. 'Bayes estimates for the linear model (with discussion)', Journal of the Royal Statistical Society, Series B, 34, 1-41 (1972).
21. O'Hagan, A. 'Modelling with heavy tails', in Bernardo, J. M., DeGroot, M. H., Lindley, D. V. and Smith, A. F. M. (eds), Bayesian Statistics 3, Oxford University Press, 1988.
22. Kass, R. E. and Steffey, D. 'Approximate Bayesian inference in conditionally independent hierarchical models (Parametric empirical Bayes models)', Journal of the American Statistical Association, 84, 717-726 (1989).


23. Box, G. 'Sampling and Bayes' inference in scientific modelling and robustness (with discussion)', Journal of the Royal Statistical Society, Series A, 143, 383-430 (1980).
24. Berger, J. 'The robust Bayesian viewpoint', in Joseph B. Kadane (ed.), Robustness of Bayesian Analyses, North-Holland, New York, 1984.
25. Hill, B. 'A theory of Bayesian data analysis', in Geisser, S., Hodges, J., Press, S. and Zellner, A. (eds), Bayesian and Likelihood Methods in Statistics and Econometrics, North-Holland, 1990.
26. Smith, A. F. M. 'Bayesian approaches to outliers and robustness', in Florens, J. P., Mouchart, M., Raoult, J. P., Simar, L. and Smith, A. F. M. (eds), Specifying Statistical Models: From Parametric to NonParametric, Using Bayesian or Non-Bayesian Approaches, Springer-Verlag, New York, 1983.
27. Chen, W. C., Hill, B. M., Greenhouse, J. B. and Fayos, J. V. 'Bayesian analysis of survival curves for cancer patients following treatment', in Bernardo, J., DeGroot, M. H., Lindley, D. V. and Smith, A. F. M. (eds), Bayesian Statistics 2, North-Holland, New York, 1985.
28. Berger, J. Statistical Decision Theory and Bayesian Analysis, 2nd edn, Springer-Verlag, New York, 1985.
29. Goel, P. and DeGroot, M. H. 'Information about the hyperparameters in hierarchical models', Journal of the American Statistical Association, 76, 140-147 (1981).
30. Edwards, W., Lindman, H. and Savage, L. J. 'Bayesian statistical inference for psychological research', Psychological Review, 70, 193-242 (1963).
31. Cox, D. R. 'Models in statistical analysis', Statistical Science, 5, 169-174 (1990).
32. Lehmann, E. L. 'Model specification: The views of Fisher and Neyman and later developments', Statistical Science, 5, 160-168 (1990).
33. Cox, D. R. 'Regression models and life-tables (with discussion)', Journal of the Royal Statistical Society, Series B, 34, 187-220 (1972).
34. Kalbfleisch, J. D. 'Nonparametric Bayesian analysis of survival time data', Journal of the Royal Statistical Society, Series B, 40, 214-221 (1978).
35. Box, G. and Tiao, G. Bayesian Inference in Statistical Analysis, Addison-Wesley, London, 1973.
36. Spiegelhalter, D. J. and Smith, A. F. M. 'Bayes factors for linear and non-linear models with vague prior information', Journal of the Royal Statistical Society, Series B, 42, 213-220 (1982).
37. Raftery, A. E. and Akman, V. E. 'Bayesian analysis of a Poisson process with a change-point', Biometrika, 73, 85-89 (1986).
38. Kass, R. E. and Vaidyanathan, S. K. 'Approximate Bayes factors and orthogonal parameters, with application to testing equality of two binomial proportions', Journal of the Royal Statistical Society, Series B, in press.
39. Jeffreys, H. Theory of Probability, 3rd edn, Oxford University Press, Oxford, 1961.
40. Box, G. and Tiao, G. 'A Bayesian approach to the importance of assumptions applied to the comparison of variances', Biometrika, 51, 153-167 (1964).
41. Johnson, W. and Geisser, S. 'A predictive view of the detection and characterization of influential observations in regression analysis', Journal of the American Statistical Association, 78, 137-144 (1983).
42. Kass, R. E., Tierney, L. and Kadane, J. 'Approximate methods for assessing influence and sensitivity in Bayesian analysis', Biometrika, 76, 663-674 (1989).
43. Naylor, J. C. and Smith, A. F. M. 'Applications of a method for the efficient computation of posterior distributions', Applied Statistics, 31, 214-225 (1982).
44. Shaw, J. E. H. 'A quasirandom approach to integration in Bayesian statistics', Annals of Statistics, 31, 214-225 (1988).
45. Gelfand, A. E. and Smith, A. F. M. 'Sampling-based approaches to calculating marginal densities', Journal of the American Statistical Association, 85, 398-409 (1990).
46. Tierney, L. and Kadane, J. 'Accurate approximations for posterior moments and marginal densities', Journal of the American Statistical Association, 81, 82-86 (1986).
47. Lee, P. M. Bayesian Statistics: An Introduction, Oxford University Press, New York, 1989.
48. Press, S. J. Bayesian Statistics: Principles, Models, and Practice, Wiley, New York, 1989.
49. Breslow, N. 'Biostatistics and Bayes (with discussion)', Statistical Science, 5, 269-284 (1990).
