Research Article Received 8 October 2013,

Accepted 2 June 2014

Published online 7 July 2014 in Wiley Online Library

(wileyonlinelibrary.com) DOI: 10.1002/sim.6249

Comparison of objective Bayes factors for variable selection in parametric regression models for survival analysis Stefano Cabras,a,b*† Maria Eugenia Castellanosc and Silvia Perrad This paper considers the problem of selecting a set of regressors when the response variable is distributed according to a specified parametric model and observations are censored. Under a Bayesian perspective, the most widely used tools are Bayes factors (BFs), which are undefined when improper priors are used. In order to overcome this issue, fractional (FBF ) and intrinsic (IBF ) BFs have become common tools for model selection. Both depend on the size, Nt , of a minimal training sample (MTS), while the IBF also depends on the specific MTS used. In the case of regression with censored data, the definition of an MTS is problematic because only uncensored data allow to turn the improper prior into a proper posterior and also because full exploration of the space of the MTSs, which includes also censored observations, is needed to avoid bias in model selection. To address this concern, a sequential MTS was proposed, but it has the drawback of an increase of the number of possible MTSs as Nt becomes random. For this reason, we explore the behaviour of the FBF , contextualizing its definition to censored data. We show that these are consistent, providing also the corresponding fractional prior. Finally, a large simulation study and an application to real data are used to compare IBF , FBF and the well-known Bayesian information criterion. Copyright © 2014 John Wiley & Sons, Ltd. Keywords:

improper priors; intrinsic prior; model selection; survival analysis

1. Introduction We consider the problem of model selection in regression analysis when the response variable, Y, follows a parametric distribution (e.g. generalized gamma, Weibull, lognormal and exponential) and observations are right censored, although other types of censoring are discussed later. This is a very common problem in many studies of risk factors such as those for lung cancer, which is one of the leading causes of cancer death in developed countries. Despite all new advances in its treatment, a 5-year absolute survival rate is currently only 10.2% [1], and consequently, new therapeutic strategies based on suitable prognostic factors constitute an important piece of research. From a clinical perspective, solving the model selection problem may allow to personalize the most adequate therapeutic option for each patient. Although the usual procedures to select the most appropriate treatment for a patient suffering cancer are based on clinical trials, the National Cancer Institute and the National Institute for Health and Clinical Excellence agree in promoting complementary designs and methodologies, such as observational studies, to help improving therapy decision making. In particular, understanding the role and significance of prognostic factors in cancer would become a tool to gain knowledge about the different features of this disease such as survival times. Within this framework, we consider the problem of comparison of different objective Bayesian methods for estimating the risk factors of, for instance, stage IV non-small-cell lung cancer (NSCLC), which is the most prevalent type of lung cancer [2]. Here, we revisit the model selection procedure used in [3] extending the analysis to other model selection procedures. a Department of Statistics, Universidad Carlos III de Madrid, Getafe, Spain b Department of Mathematics and Informatics, Università di Cagliari, Cagliari, Italy c Department of Informatics and Statistics, Universidad Rey Juan Carlos, Móstoles, Spain d Department of Social Science and Institutions, Università di Cagliari, Cagliari, Italy

Spain. † E-mail:

[email protected]

Copyright © 2014 John Wiley & Sons, Ltd.

Statist. Med. 2014, 33 4637–4654

4637

*Correspondence to: Stefano Cabras, Department of Statistics, Carlos III University of Madrid, C/ Madrid 126, 28903 Getafe,

S. CABRAS, M. E. CASTELLANOS AND S. PERRA

In Bayesian model selection, the most widely used tools are Bayes factors (BFs) [4], which are unscaled when improper priors are used. To overcome this problem and to approximate the underlying BF, fractional (FBF) and intrinsic (IBF) BFs [5, 6] have been proposed in the literature. Both depend on the size, Nt , of the minimal training sample (MTS), and in particular, the IBF also depends on the specific MTSs used. For this reason, the IBF is non-parametrically bootstrapped over a set of MTS, leading to the averaged IBF, BF AI , or to the median IBF, BF MI . In the presence of censored data, the definition of the MTS is not obvious, as Assumption 0 in [7] must be fulfilled. This requires that in drawing MTS, we have to ensure that the space of all possible MTSs is fully explorable. More formally, let  I be the hypothetical sample space made of proper training samples (TSs) assuming an infinite sequence of data.  Then for every model i under comparison with parameter 𝜽i , Assumption 0 states that Pr𝜽 i ( I ) = 1. i This would not be the case if we only drew uncensored data that are, however, the only samples that allow to train an improper prior into a proper posterior. In order to fulfill this assumption, Berger and Pericchi proposed in [7] a sampling scheme called sequential MTS (SMTS) where samples are drawn until a fixed number of uncensored observations is selected. The size, Nt , of each MTS in SMTS is thus random. In the case of censored data, the number of possible MTSs that can be drawn from a given sample increases, and hence, the use of SMTS may be infeasible to explore a large space of models induced by a large set of covariates. For such exploration, we propose to employ the FBF, which however depends on the specific choice of the likelihood fraction B. In order to mimic the IBF with SMTSs, we propose a double fractionation of the likelihood, to account for the uncensored part of the likelihood and for that of the censored one. Such double fractionation must further account for the randomness of Nt , and in this paper, we derive the exact distribution of Nt , Pr{Nt }, showing that it is governed by the sample size, n, the number of censored observations, nc , and the number of parameters in the model, s. We propose three definitions of the FBF, which account for the uncertainty on Nt : in two of them, Pr{Nt } is summarized by the mode and by the median, respectively, and in the third one, all the conditional FBFs are averaged with respect to Pr{Nt } leading to the marginal FBF, denoted by mFBF. Without loss of generality, we illustrate the procedure mainly for some parametric models as the generalized gamma, Weibull, and lognormal regression, although some exact results may be obtained for the simpler exponential model. More formally, let (ti , 𝛿i , xi ) be the survival time, censoring indicator and covariates, respectively, for ∑n individual i = 1, … , n, where 𝛿i = 0 if right censored and 1 if uncensored, and where i=1 𝛿i = nu = n−nc is the number of uncensored observations. The design matrix X is assumed fixed with p+1 columns, including the intercept. Consider yi = log(ti ) and the following regression model k with the kth set of covariates denoted by xk,i k ∶ yi = 𝜷 Tk xk,i + 𝜎k 𝜖i ,

(1)

4638

where k ∈ {1, 2, … , K = 2p } is the model index with design matrix Xk = (xk,1 , … , xk,n )T ∈ n×pk and parameter 𝜽k = (𝜷 k , 𝜎k ) ∈ Θk = pk × + . If the error term 𝜖i is distributed as a standard Gumbel distribution, with d.f. ) f (𝜖i ) = exp(𝜖i − exp(𝜖i )), then ti |xk,i , 𝜽k follows a Weibull distribution with location ( exp −𝜷 Tk xk,i ∕𝜎k and shape 1∕𝜎k , while if 𝜖i were normally distributed, then the times would be lognormally distributed, and finally for 𝜖i that follows a generalized Gumbel distribution, the observed times would follow a generalized gamma distribution. We want to select the most probable model k given the observed data through BFs and posterior probabilities of each model [4, 8, 9]. In order to calculate BFs, we need to specify a prior distribution 𝜋k (𝜽k ) separately for each model, and this can be complicated, because one often initially entertains K models leading to the impossibility of careful subjective prior elicitation. Hence, Bayesian model selection is usually carried out by means of default methods [9]. In this context and because of the specific type of parametric model, we use the usual default prior for location-scale models, that is, to 𝜋kN (𝜽k ) ∝ 1∕𝜎k [10]. In literature, there exist at least four methods under the label of default Bayesian model selection: IBF and FBF, the Bayesian information criterion (BIC) and the conventional prior (CP) approach. Here, we focus on the first two methods, while more details about BIC and CP approach can be found in [9]. The rest of the paper is organized as follows. For fully observed responses, Section 2 describes the IBF and FBF that in the literature are usually coupled with two strategies of model selection, namely the highest posterior probability model (HPPM) and the median posterior probability model (MPPM) criteria. Section 3 reconsiders IBF and FBF for right-censored data providing definition and theoretical properties of the possible versions of the FBF. Although the illustrated methodology is applicable to Copyright © 2014 John Wiley & Sons, Ltd.

Statist. Med. 2014, 33 4637–4654

S. CABRAS, M. E. CASTELLANOS AND S. PERRA

several parametric models, Section 4 discusses mainly three: generalized gamma, Weibull and lognormal, comparing the model selection methods, in a simulation study, under a wide range of common scenarios in survival analysis (Section 5). Section 6 illustrates an application to two data sets. Finally, remarks and conclusions are contained in Section 7.

2. Objective variable selection for fully observed responses Consider a response variable Y and a set of explanatory variables {X1 = 1, … , Xp+1 } possibly obtained from the set of all principal effects, first-order interactions, and so on of the p covariates. We suppose that such set induces the following family of K models:  = {1 , … , K }, where each k , for k = 1, … , K, has the form specified in (1). Let 𝜋k (𝜽k ), k = 1, … , K, be the prior distribution for the unknown parameters of model k; then the predictive density of y, under k , is mk ( y) =

∫Θk

fk ( y ∣ 𝜽k )𝜋k (𝜽k )d𝜽k ,

where fk is the induced density of Y from Equation (1). The BF of i against j is given by Bij ( y) =

mi ( y) , mj ( y)

(2)

and the posterior probability of i is (

mi ( y)pi

p(i ∣ y) = ∑K

j=1

=

mj ( y)pj

1+

∑ pj j≠i

pi

)−1 Bji ( y)

,

(3)

where pi are the model prior probabilities. A common choice in default Bayesian analysis is pi = 1∕K, ∑ and model posterior probabilities depend only on the BFs or equivalently on mi (y)∕ Kj=1 mj ( y). Under the improper prior 𝜋kN (𝜽k ), the BF in (2) BNij ( y) =

mNi ( y) mNj ( y)

=

ci ∫ fi ( y ∣ 𝜽i )𝜋iN (𝜽i )d𝜽i , cj ∫ fj ( y ∣ 𝜽j )𝜋jN (𝜽j )d𝜽j

is unscaled, because of the ratio between normalizing priors pseudo-constants ci ∕cj . For more than two models, possibly non-nested, BFs may be incoherent in the sense that Bjk ≠ Bji Bik . To avoid this problem, we use the encompassing approach [9, 11], where each submodel i is compared with a reference model 1 , for example, the model with the intercept only. There are different forms of model encompassing, and in this paper, we consider the pairwise comparison from below [12] between the simplest model, 1 nested in k . Another approach of encompassing can be that of comparing each model k with the full one K , for example, the model that includes all the covariates. However, Moreno and Girón in [12] compared the two encompassing methods, in the case of linear regression, and found that both methods are similar although the encompassing from below has more appealing theoretical properties. For our purposes, the most important feature is that encompassing from below provides coherent model posterior probabilities in the space of all models  [12]. With encompassing from below, it is possible to obtain pairwise BFs, Bi1 , i = 1, … , K, such that the BF of i to j is

Copyright © 2014 John Wiley & Sons, Ltd.

Bi1 , i, j = 1, … , K, i ≠ j. Bj1

(4)

Statist. Med. 2014, 33 4637–4654

4639

Bij =

S. CABRAS, M. E. CASTELLANOS AND S. PERRA

2.1. Intrinsic Bayes factors A solution to avoid the indetermination, induced from an improper prior, is the IBF where part of the data acts as TS. The idea is to use the TS to train the improper prior into a proper posterior and to use the rest of the data to compute the BF, using the updated posterior acting as a data-dependent prior distribution [6]. Definition 1 A TS, indexed by l, is a subset of the data y, denoted by ỹ (l). It is called proper if 0 < mNk (̃y(l)) < ∞, for all k , k = 1, … , K, and it is called minimal if it is proper and no subsets are proper. In the sequel, the minimal TS will be denoted by MTS. Given an MTS, the improper prior 𝜋kN (𝜽k ) is trained into a proper posterior as follows 𝜋kN

(

)

𝜽k ∣ ỹ (l) =

( ) fk ỹ (l) ∣ 𝜽k 𝜋kN (𝜽k ) mNk ( ỹ (l))

.

Using this proper posterior as prior, the BF is then computed over the rest of the sample. The resulting BF conditioned on ỹ (l) is Bij ( ỹ (l)) = BNij ( y)BNji ( ỹ (l)) ,

(5)

where BNji ( ỹ (l)) = mNj ( ỹ (l)) ∕mNi ( ỹ (l)) is the correction factor. Observe that the conditional BF given in (5) depends on the specific TS ỹ (l) and there are several proposals to combine a set of L MTSs. In particular, Berger and Pericchi in [6] proposed averaging Bij ( ỹ (l)) over all possible MTSs leading to the arithmetic IBF (AIBF) defined as 1∑ N B ( ỹ (l)) , L l=1 ji L

BFijAI = BNij ( y)

(6)

while another possibility is to consider the median of all the L possible Bij ( ỹ (l)) obtaining the median IBF (MIBF), { } BFijMI = BNij ( y) Median BNji ( ỹ (l)) . (7) l=1,…,L

Further details about AIBF and MIBF can be found in [6,9,13–15] and, specifically for survival models, in [16,17]. We note that, in order to fulfill Definition 1, the size of the MTS should be equal to the number of parameters in the most complex model. However, in cases in which there are important differences between the simplest and the most complex model, the MTS may not fulfill the predictive matching criteria according to which the prior should be such to obtain a unity BF (see Definition 3 in [18]). More generally, establishing the MTS size is a problem of determining the effective sample size of a model [19], which is beyond the scope of this paper. Therefore, unless not explicitly stated, we will strictly adhere to Definition 1 and fix the MTS size according to the most complex model. 2.2. Fractional Bayes factors The FBF developed by O’Hagan in [5] is based on a similar idea behind the IBF, but instead of using part of the data to train priors into proper posteriors, it uses a fraction, b, of the likelihood function for the whole sample y. The unused fraction of the likelihood function is used for model discrimination. The FBF is = BNij ( y) BF,b ij

∫ fj ( y ∣ 𝜽j )b 𝜋 N (𝜽j )d𝜽j ∫ fi ( y ∣ 𝜽i )b 𝜋 N (𝜽i )d𝜽i

,

(8)

4640

where b ∈ (0, 1) is a fixed constant. Among the different recommendations given in [5], the most used one is b = s∕n where s is the size of the MTS. Some appealing arguments on this choice of b are given in [20, 21]. Copyright © 2014 John Wiley & Sons, Ltd.

Statist. Med. 2014, 33 4637–4654

S. CABRAS, M. E. CASTELLANOS AND S. PERRA

2.3. Highest and median posterior probability model Once we have calculated the posterior probability for each model using Equation (3), we rank the aforementioned BFs selecting the model that satisfies one of the following two strategies: (i) the HPPM or (ii) the MPPM discussed in [22]. The HPPM is simply the k∗ th model with the highest posterior probability, k∗ , k∗ = arg maxk∈1,…,K p(k ∣ y), which however may still have a very low probability. The MPPM is the model containing those regressors that have overall posterior inclusion probability larger than 1∕2. Let lj be the set of indexes of all models containing a given variable j. The posterior inclusion probability for a variable j is pj =



Pr(k ∣ y),

(9)

k∈ lj

which is the overall posterior probability that the variable j is included in one model. ( If it exists, the ) ∗ ∗ , MPPM k∗ is the model consisting of those variables for which pj ⩾ 1∕2, that is, k = k2∗ , … , kp+1 where each kj∗ is { 1 if pj ⩾ 1∕2 , kj∗ = 0 otherwise for all j = 2, … , p + 1 covariates in X excluding the intercept (i.e. j = 1). Sometimes, it may happen that the MPPM does not exist in the sense that the most probable is the reference null model [22].

3. Objective variable selection for censored responses In the sequel, we assume that observed log-times Y may be right censored, meaning that if, for instance, we stop our study at a fixed time T, then survival times larger than T are not fully observed. This type of censoring is mostly known as generalized type I censoring. Other types of censoring are possible, but they just complicate the exposition and derivations. When part of the observations is censored, the MTS must be carefully considered. In the uncensored case, the minimal dimension of the TS for a model with s parameters is equal to s and the MTS is made of s uncensored observations. However, Berger and Pericchi in [7] observed that the space of possible MTSs should be fully explorable with probability 1 (Assumption 0 in [7]), and this can be accomplished by using, for instance, the sequential MTS (SMTS) defined as follows. Definition 2 Suppose we have s parameters. Then the SMTS is constructed drawing observations, without replacement, from y stopping when s uncensored observations are obtained. The SMTS induces a TS of the form } { y(l) = {s − 1 uncensored and Nt − s censored observations}, ys (l) , with random size Nt ⩾ s. Note that y(l) is not minimal in the sense of Definition 1 because it contains censored observations that can be removed. The following proposition provides the probability distribution of Nt , which is used in the sequel. Proposition 1 The probability distribution of Nt given n, s and nc is ( Pr(Nt = nt |n, s, nc ) =

nc )(n−nc −1) (nt s−1 nt −s

− 1)!(n − nc )

Dn,nt

, s ⩽ nt ⩽ nc + s,

(10)

where Dn,nt = n!∕(n − nt )!.

4641

Proof 1 See the Appendix. Copyright © 2014 John Wiley & Sons, Ltd.

Statist. Med. 2014, 33 4637–4654

S. CABRAS, M. E. CASTELLANOS AND S. PERRA

3.1. IBF under right censoring The calculation of the IBF can be very costly from a computational point of view, because it is necessary to approximate 2(L + 1) integrals, where L should be the∑number of all possible SMTSs, and this can be n +s ( c )(n−nc −1) simply infeasible even for small samples as L is of order Nc =s Nn−s (Nt − 1)!(n − nc ). A possible s−1 t t approximation consists in considering a subset of the set of all SMTSs. As mentioned in Section 2 of [7] and shown in [23], it often suffices to randomly choose a subset of cardinality n × nt of MTSs, all with fixed size nt . For censored data, MTSs have a different sizes, and thus, we have to account for the randomness of Nt and consider a subset of cardinality L of SMTSs in which L can be given by the mode or the median of Nt , Lmo = n × mode{Nt } or Lme = n × [median{Nt }], with [⋅] denoting the integer part and where the median and the mode are calculated under the probability distribution of Nt in (10). It would be interesting to consider the whole probability distribution of L induced by Nt , and this is carried out in the sequel for the FBF, because for the IBF, this is usually computationally infeasible. Distribution (10) could be further employed for implementing a stratified SMTS sampling such that the distribution of sample sizes of SMTS follows (10). This would require enumerating all possible SMTS, which may be infeasible for large sample sizes; thus, another way to introduce distribution (10) is to use it in re-weighting the Monte Carlo samples of SMTSs according to their sizes. To compare with the actual version of the SMTS, this latter strategy will not be further pursued in this paper. For each SMTS, y(l), we obtain BNi1 (l) = BNi1 ( y)BN1i ( y(l)), and then IBFs are BAI = BNi1 ( y) i1,L ⋆

BMI i1,L⋆

=

L⋆ 1 ∑ N B ( y(l)) L⋆ l=1 1i

BNi1 ( y) Median l=1,…,L⋆

(11)

BN1i ( y(l)),

where L⋆ stands for Lmo or Lme . Default BFs as the IBF or FBF are generally not actual BF; for this reason, it is necessary to study their behaviour. One way to do this is by analysing if they correspond to an actual BF obtained from a certain prior distribution, namely their intrinsic prior. While, in general, the prior intrinsic to an approximation of the BF is called intrinsic prior, we will call fractional priors those priors intrinsic to the FBF and those intrinsic to the IBF intrinsic priors. Sometimes, it is not possible to show directly if a default BF exactly corresponds to a real BF, but this can be carried out using asymptotic arguments. In [6], it is defined an intrinsic prior as a prior distribution that would produce the same default BF with a large amount of data. As pointed out in [6], the intrinsic prior exists when the correction factor of the IBF converges to a positive number as the sample size goes to infinity. The intrinsic priors considered in this paper are for two models j nested in i . Under mild conditions, intrinsic priors corresponding to BFijAI defined in (6) exist and are given by (see Appendix 1 in [9]) ( ) M 𝜋jI (𝜽j ) = 𝜋jN (𝜽j ), 𝜋iI (𝜽i ) = 𝜋iN (𝜽i )E𝜽 i BNji ( y(l)) , i

(12)

4642

) ( M where E𝜽 i BNji ( y(l)) denotes the expectation with respect to the density of y(l) under the model i . i More details on the calculation of the intrinsic prior can be found in [6, 9]. Different types of censoring induce complications and difficulties in the calculation of the intrinsic prior. In particular, for type II censoring, there exists a deterministic stopping rule, which further complicates the likelihood. Finally, random censoring implies that r ∈  becomes random and this induces a space of possible MTS that is also random, and Assumption 0 in [7] should be regarded as marginal to the probability distribution of R, namely HR (r). The analysis may proceed then in two steps: first obtain the intrinsic prior conditioning on R = r and then marginalizing it with respect to HR (r). However, a random censoring mechanism, represented by model HR (r), should be assumed in order to obtain the intrinsic prior as also pointed out in [7]. Copyright © 2014 John Wiley & Sons, Ltd.

Statist. Med. 2014, 33 4637–4654

S. CABRAS, M. E. CASTELLANOS AND S. PERRA

3.2. The exponential model example The following toy example from [7] contains details on the calculation of the intrinsic prior in the case of censored data for the exponential model. Suppose we want to compare two exponential models with mean 𝜃 > 0. 0 ∶ 𝜃 = 𝜃0 1 ∶ 𝜃 ≠ 𝜃0 and let y be a sample of right-censored exponential observations with censoring time r. Under the usual default prior 𝜋 N (𝜃) = 1∕𝜃 and with p(𝜃) = Pr(Y > r|𝜃) = exp(−𝜃r), the two marginal prior distributions are ) ( nu nu N m0 ( y) = 𝜃0 exp −𝜃0 𝜃̂ and ∞

mN1 ( y) where 𝜃̂ = nu ∕

∑n

i=1 yi .

=

∫0

) ( Γ(n ) nu 1 nu d𝜃 = ( )unu , 𝜃 exp −𝜃 nu 𝜃 𝜃̂ 𝜃̂

Then BN10 ( y)

( ( ) ) nu −nu nu = Γ(nu ) 𝜃0 exp 𝜃0 . 𝜃̂ 𝜃̂

(13)

In this case, an SMTS, y(l), consists in one uncensored observation, named y(l), and Nc∗ (l) censored times at r, so the expression of BN01 for a SMTS is as follows: ) ( ( ) ) ( BN01 ( y(l)) = 𝜃0 Nc∗ (l)r + y(l) exp − Nc∗ (l)r + y(l) 𝜃0 , and the approximated IBF corresponding to L random SMTSs is given in [7] 1∑ 𝜃 t(l) exp(−t(l)𝜃0 ), L l=1 0 L

AI BF10 ≅C×

where t(l) = Nc∗ (l)r + y(l), C = Γ(nu )(T𝜃0 )−nu eT𝜃0 , with T = From (12), the intrinsic prior for the SMTS is

(14)

∑n

i=1 yi .

] M [ 𝜋 I (𝜃) = 𝜋 N (𝜃)E𝜃 1 BN01 ( y(l)) ∞ r 1∑ = 𝜃 (jr + y) exp(−(jr + y)𝜃0 )p(𝜃)j 𝜃 exp(−𝜃y)dy 𝜃 j=0 ∫0 0 =

𝜃0 . (𝜃 + 𝜃0 )2

3.3. FBF under right censoring

Copyright © 2014 John Wiley & Sons, Ltd.

Statist. Med. 2014, 33 4637–4654

4643

In the definition of FBF (see (8)), it is necessary to specify the fraction b. When the MTS size is fixed, with value nt , b is usually defined as b = nt ∕n. In the case of censoring, the SMTS is made of s uncensored observations and a random number Nc∗ = Nt −s of censored observations, 0 ⩽ Nc∗ ⩽ nc , whose probability distribution is induced by Pr{Nt } in (10). The two sample subsets have different information for the parameters because only uncensored observations lead to a proper trained posterior, and this should be reflected in the FBF by adopting a double fractionation of the likelihood in order to mimic the information contained in the SMTS. Specifically, we define the following vector of fractions:

S. CABRAS, M. E. CASTELLANOS AND S. PERRA

( B=

∗)

N s , c n − nc nc

,

leading to the following fractional likelihood: (

)



f ( y|𝜽)B =

(

s n−nc

fY (yi |𝜽)

×

i∈uncensored

) Nc∗



nc

SY (yi |𝜽)

,

(15)

i∈censored

where fY and SY denote density and survival functions induced by model (1) for a specified set of covariates. Note that the uncensored part of the likelihood is made of n − nc factors, so, using the fraction 1∕(n − nc ) corresponds to geometrically averaging the uncensored part of the likelihood, and then applying the power s in order to have the same weight that in the SMTS scheme. An analogue argument applies to the uncensored part with a random power Nc∗ , in which we have to consider the probability distribution of the resulting random fraction B. In particular, we consider three different ways of accounting for such randomness: ( ) mode{Nc∗ } s (1) bmo = n−n , in analogy to the IBF calculated using Lmo and FBFij,mo denotes the FBF n c

c

obtained(with b = bmo ; ) [median{Nc∗ }] s analogue to the IBF calculated using Lme and FBFij,me is the FBF , (2) bme = n−n n c

c

obtained with b = bme ; (3) consider the whole probability distribution of the random variable ( B=

∗)

N s , c n − nc nc

, for 0 ⩽ Nc∗ ⩽ nc ,

and marginalizing the FBF over Nc∗ = Nt − s thus obtaining what we call the marginal FBF

mFBFij =

nc ∑

F,b(n∗ ) Bij c

n∗c =0

nc ) ( ∗ ) ∑ ( F,b(n∗ ) ∗ ⋅ Pr∗ Nc = nc = Bij c ⋅ Pr Nt = n∗c + s . Nc

Nt

n∗c =0

(16)

Such double fractionation is not common, but in some challenging situations, as in the Neymann–Scott problem, it has been proven to provide a consistent FBF as shown in Example 4 in [9]. The following lemma is needed in order to state the necessary condition for the consistency the FBFs and obtain its fractional prior. Lemma 1 Let nu be the number of uncensored observations, assuming that nu = w × n, where w is the proportion of ( ) d uncensored observations. Then for n → ∞, we have that Nt −→ Ñ t ∼ NegBinomial(s, w) with E Ñ t = ( ) s∕w and Var Ñ t = s(1 − w)∕w2 . Proof 2 See the Appendix. We recall that in [5], it is stated that the FBF is consistent if the fraction b → 0 for n → ∞. Proposition 2 ̃ for likelihood (15) that tends to 0 when allows to consider the existence of a one-dimensional fraction, b, the two fractions also tends to 0. Proposition 2 Consider the likelihood in Equation (15), then

4644

(i) there exists a one-dimensional fraction b̃ that satisfies f ( y|𝜽)b = f ( y|𝜽)b ; (ii) let b = (b1 , b2 ) and, for n → ∞, b1 → 0, b2 → 0. Then b̃ → 0. ̃

Copyright © 2014 John Wiley & Sons, Ltd.

Statist. Med. 2014, 33 4637–4654

S. CABRAS, M. E. CASTELLANOS AND S. PERRA

Proof 3 See the Appendix. Point (i) of Proposition 2 states that the used double fractioning corresponds to an ordinary FBF with b = b̃ and such unique fraction can be statistically interpreted under the light of a sensible double fractionation for censored and uncensored observations. Proposition 3 states that the two fractions tend asymptotically to 0, and therefore, all the proposed versions of the FBF are consistent. Proposition 3 Assume w is the fixed proportion of uncensored observations, that is, for all n, w = nu ∕n. Then as d

n → ∞, B −→ (0, 0). Proof 4 See the Appendix. In [24], following the approach of Berger and Pericchi [6], it is shown that the FBF asymptotically corresponds to a real BF calculated over suitable fractional priors. Let Bbji ( y) = ∫ fj ( y ∣ 𝜽j )b 𝜋 N (𝜽j )d𝜽j ∕ ∫ fi ( y ∣ 𝜽i )b 𝜋 N (𝜽i )d𝜽i in (8). The FBF is BF,b = BNij ( y)Bbji ( y). ij De Santis and Spezzaferri [24] showed that BF,b corresponds to an actual BF if ij ( ) ( ) 𝜋i 𝜽̂ i 𝜋jN 𝜽̂ j 1 ( ) ( ) + op (1) = b , N ̂ ̂ B ( y) 𝜋i 𝜽i 𝜋j 𝜽j ij where 𝜽̂ i and 𝜽̂ j are the maximum likelihood estimators (MLEs) for the two models i and j , respectively, and for some priors 𝜋i (⋅) and 𝜋j (⋅). The equation that defines the fractional prior for the FBF is 𝜋iFI (𝜽i )𝜋jN (𝜓j (𝜽i )) 𝜋iN (𝜽i )𝜋jFI (𝜓j (𝜽i ))

= B∗i (𝜽i ),

(17)

where B∗i (𝜽i ) is the limit, for n → ∞, of 1∕Bbij ( y) and 𝜓j (𝜽i ) is the limit of 𝜽̂ j under model i . Then the two fractional priors are 𝜋jFI (𝜽j ) = 𝜋jN (𝜽j )u(𝜽j ) 𝜋iFI (𝜽i ) = 𝜋iN (𝜽i )u(𝜓j (𝜽i ))B∗i (𝜽i ), with u(⋅) a continuous non-negative function. In [24], it is observed that under some general conditions of their Theorem 2.1, B∗i (𝜽i ) can be obtained from BNij ( y) by replacing n with nt and the MLEs 𝜽̂ i and 𝜽̂ j with their limits under the model i . We now provide an example of the calculation of the FBF using the double fractionation and also the derivation of the fractional prior for the right-censored exponential model. 3.4. The exponential model example (continued)

Copyright © 2014 John Wiley & Sons, Ltd.

(

) 1∕nu , Nc∗ ∕nc . The fractional predictive

Statist. Med. 2014, 33 4637–4654

4645

For the exponential example, the double fraction is B = distributions for B are as follows:

S. CABRAS, M. E. CASTELLANOS AND S. PERRA

( )) ( m0,B ( y) = 𝜃0 exp −𝜃0 Nc∗ r + ȳ u , with ȳ u = n−1 u

∑nu

y i=1 i

and ∞

m1,B ( y) =

∫0

)) ( )−1 ( ( exp −𝜃 Nc∗ r + ȳ u d𝜃 = Nc∗ r + ȳ u .

The fractional part of the FBF is ( ( ) ( )) BB01 ( y) = Nc∗ r + ȳ u 𝜃0 exp −𝜃0 Nc∗ r + ȳ u , recalling the expression of BN10 ( y) given in (13), the FBF for the fraction B is ( ) ( ( ) ) F,B = C × 𝜃0 Nc∗ r + ȳ u exp − Nc∗ r + ȳ u 𝜃0 BF10

(18)

F,B AI . It is possible to see that BF10 where C is the same quantity that appears in Equation (14) for the BF10 F,B AI amounts to an averaged BF10 over all possible SMTSs with size Nt = Nc∗ + 1, because in BF10 data ∗ evidence is summarized replacing t(l) = Nc r + y(l) by the mean over all SMTS of size Nt , which is Nc∗ r + ȳ u . (𝜃); this quantity is obtained In order to calculate the fractional prior, it is necessary to calculate B∗,B 1 from 1∕BN10 ( y) replacing nu by 1 and 𝜃̂ by their limit under 1 , which is 𝜃. Applying (18), the fractional asymptotic prior, intrinsic to all three versions of the FBF, does not depend on the fraction, and it is

) ( 𝜃 𝜋 FI (𝜃) = 𝜃0 𝜃 −2 exp − 0 = InverseGamma(shape = 1, scale = 𝜃0 ). 𝜃 This is a proper distribution, showing that, for the right-censored exponential model, the three FBFs asymptotically correspond each one to an actual BF. Such interpretation would have been difficult if used directly a unique fraction b̃ arising from point (i) of Proposition 2.

4. Application to Weibull, lognormal and generalized gamma regression models To obtain the above BFs, it is necessary to calculate the marginal predictive distributions, for which there is not an analytical form. In this section, we drop the model subscript k and consider the calculation of the predictive distributions of the log-time for a Gumbel, generalized Gumbel and normal regression model with s regressors whose likelihood is L( y ∣ 𝜽, X) =

n ∏

fY (yi )𝛿i [SY (yi )](1−𝛿i )

i=1

( )]𝛿i [ ( )](1−𝛿i ) n [ ∏ yi − 𝜷 T xi yi − 𝜷 T xi 1 S𝜖 f𝜖 = 𝜎 𝜎 𝜎 i=1

(19)

where fY , SY , f𝜖 and S𝜖 are the density and survival functions of Y and 𝜖, respectively. Under the usual default prior for location-scale models 𝜋(𝜽) ∝ 𝜎1 , for 𝜽 ∈ s × + , the posterior kernel is 𝜋(𝜽 ∣ y, X) ∝

n 1∏ 𝜎 i=1

(

1 f 𝜎 𝜖

(

yi − 𝜷 T xi 𝜎

)]𝛿i [ ( )](1−𝛿i ) yi − 𝜷 T xi S𝜖 , 𝜎

4646

and 𝜋(𝜽 ∣ y, X) has been approximated using Markov chain Monte Carlo (MCMC). In particular, we employed a random walk Metropolis–Hastings (RW-MH) for parameters (𝜷, log(𝜎)) where the proposal is a standard multivariate normal distribution [25]. To initialize the algorithm, we run a Laplace approximation using MLE for the regression parameters. The RW-MH uses a proposal having the Laplace Copyright © 2014 John Wiley & Sons, Ltd.

Statist. Med. 2014, 33 4637–4654

100 10

# of integrals (log-scale)

1000

S. CABRAS, M. E. CASTELLANOS AND S. PERRA

10

20

30

40

50

60

70

80

90

100

Sample size (n)

Figure 1. Number of integrals (vertical axis log-scale) needed for mFBF and IBF with L⋆ = Lmo as a function of the sample size n (horizontal axis) for s = 5 and nc ∕n =30% of censored observations.

approximation’s variance multiplied by a fixed scale factor, chosen in order to have an acceptance rate of 30–40%. The marginal predictive distribution of Y, for each model k , m( y ∣ k ), has been obtained using the method proposed in [25] (Section 2.1). Algorithm 1 contains the pseudo-code for approximating a generic Bij ( y). Approximations of BFs have been carried out using 104 MCMC samples and a necessary massive parallelism, as each BN1i ( y(l)) in (11) or BF,b in (16) can be evaluated separately. While such ij computational effort may not be relevant for the mFBF, it becomes important for the different versions of the IBF. The difference in running time can be compared in terms of the number of integrals needed to calculate mFBF and IBF when, for instance, L⋆ = Lmo . Figure 1 illustrates this for several values of n with s = 5 and nc ∕n =30% of censored observations. Computationally differences are therefore of order exp(n), and they are maximal at nc ∕n = 0.5 and n fixed.

5. Simulation study

Copyright © 2014 John Wiley & Sons, Ltd.

Statist. Med. 2014, 33 4637–4654

4647

In this section, we present the results of a wide simulation study, for the Weibull, lognormal and generalized gamma distributions, in which we investigate and compare the performance of IBFs, FBFs, for censored data defined in Sections 3.1 and 3.3, and the BIC for censored data discussed in [26]. We first generate Y from different regression models chosen among the following set of possible explanatory variables {X1 = 1, X2 , … , Xp+1 }, where p = 3 or 20 covariates. In all models, 𝛽1 = 0 and 𝜎 = 1, while the other coefficients, (𝛽2 , … , 𝛽p+1 ), vary according to the model that generates Y, that is, the null model corresponds to (𝛽2 =, … , 𝛽p+1 = 0) and the model with 1 ⩽ J ⩽ p + 1 covariates have (𝛽2 = … = 𝛽J = 1, 𝛽J+1 = … = 𝛽p+1 = 0). In order to evaluate the behaviour of BFs with respect to the choice of the MTS size, s, we consider either s fixed for all BFs according to the most complex model size or s varying according to the model size of the most complex model of each BF. We have considered two percentages of censored data (10% and 30%), two different sample sizes (50 and 100) and two different types of design matrix (independent and correlated covariates (with all correlations between covariates set to 0.5) simulated from a multivariate normal distribution). Results are based on 100 replications for each simulation scenarios. In each replication, we calculate all versions of IBF, FBF and the BIC for censored data (hence eight different BFs: BAI , BMI , BAI , BMI , FBFmo , FBFme , mFBF and BIC) and select Lmo Lmo Lme Lme the model under HPPM and MPPM. We obtained 100 replicates of the probability of the true model, the proportion of times that the true model has been recognized, p̃ , under HPPM and MPPM, along with the posterior expected model size (PEMS). In order to analyse the whole simulation study, we used a logit regression of p̃ with respect to the main effects of simulation scenarios, type of BFs and selection criteria along with all their possible interactions. As a general result, across all parametric models, p̃ decreases when the percentage of censored data increases and when covariates change from independent to correlated, while it increases when the sample size increases. We did not observe, among the simulated scenarios, relevant differences between HPPM and MPPM nor between Lme and Lmo especially for n large. For this reason, we only provide results for Lmo .

S. CABRAS, M. E. CASTELLANOS AND S. PERRA

Figure 2. Simulation results for the Weibull model with 30% of censored data, independent covariates and two sample sizes: true model (right vertical axes) posterior probability (top), proportion of selection of the true model (bottom left) and posterior expected model size (bottom right).

The general behaviour of all considered BFs is summarized by Figures 2 and 3, which refer to the scenario with 30% of censored observations and independent covariates using MPPM with p = 3 and varying s in the calculation of each BF. Similar results have been obtained for the generalized gamma distribution, and the relative figure is omitted. The height of the vertical segments, for each quantity of interest, represents the standard deviation of the mean value in 100 replications. When s is fixed to 5, in the considered simulation study, there are no significant differences with respect to the results in Figures 2 and 3. From such figures, we can observe the following: (i) As expected, BMI provides best results among all true models although at large sample sizes (e.g. Lmo n = 100), FBFmo and mFBF provide results that do not significantly differ from BMI especially Lmo for most complex models. (ii) With respect to the other considered BFs, BIC performs poorly for the simplest models, while it tends to select the most complex model as also noted in [9] and [27]. behaves worse than BMI . This is known [13], and it is due to the lack of stability of the L (iii) BAI Lmo Lmo partial BFs in (5), which is only partially mitigated by BMI ⋅ . This result recalls the message that the median IBF is often preferable to the arithmetic from a robustness perspective (see Section 6 of [7]).

4648

In order to challenge the comparison, we perform a similar simulation study as that mentioned earlier with p = 20 where, because of the availability of computational resources, the model space has been limited to the null and a model with one covariate, so that K = 21 and the most complex model has three parameters (𝛽0 , 𝛽j , 𝜎), for j ∈ (2, … , p + 1). Essentially, this exercise is useful to have an evaluation of the false discovery rate (FDR) and the false non-rejection rate (FNR) in a multiplicity of tests. Such quantities and p̃ are negatively correlated, and while the FNR is almost the same for all BFs, the FDR changes. In order to account for the multiplicity of comparisons, we consider the model mass prior discussed in [28], where a uniform prior is assumed over the sets of models with equal dimension. For this case, it would Copyright © 2014 John Wiley & Sons, Ltd.

Statist. Med. 2014, 33 4637–4654

S. CABRAS, M. E. CASTELLANOS AND S. PERRA

Figure 3. Simulation results for the lognormal model with 30% of censored data, independent covariates and two sample sizes: true model (right vertical axes) posterior probability (top), proportion of selection of the true model (bottom left) and posterior expected model size (bottom right).

be one-half of prior probability for the null model and 1/40 for the rest of the models. Such prior allows to control the FDR [28]. We consider this prior also in the following section for the analysis of a real data set, although a deep discussion of model prior choice is beyond the scope of this paper. We illustrate the results for the generalized gamma in Figure 4. In particular, we can see that the versions of FBF are a valid alternative to the BIC when the median IBF is not feasible with large p and large n. For large p and small n, the median IBF is feasible, as there are very few different SMTSs, and hence, we recommend its use. Practically, for lower sample sizes, sparsity affects more the FBF than the median IBF, while it is really important for the average IBF and the BIC, because of the curse of dimensionality in approximating the posterior with the Laplace method.

6. Application to real data sets

Copyright © 2014 John Wiley & Sons, Ltd.

Statist. Med. 2014, 33 4637–4654

4649

In this section, we consider an application to two data sets. The first concerns the analysis of survival in NSCLC, a data set also considered in [3]. This contains the survival times for n = 35 patients at the fourth stage of the NSCLC, of which nc = 19 are censored. In the original data set, there are two different types of survival: the overall, which is the time from the entrance in the study until death, and the progression-free survival, which is the time until the cancer progresses. In this analysis, we consider the overall survival. As the sample size is relatively small and there are only nu = 16 uncensored observations, we build the model space with the five possible predictive variables indicated by the oncologist and summarized in Table I. The model space consists of K = 25 = 32 models in which the most complex one has s = 7 parameters. In order to evaluate the difference between s = 7 fixed for all BFs and s not fixed, Table II illustrates the results under such choices of s using the prior in [28]. We can see that when s varies, the simplest models tend to be favoured because their posterior probability increase as well as the PEMS decreases. It is clear that all methods select the Calcemia as the most related covariate to the overall survival as also reported in [3]. Other probable models still contain the Calcemia covariate.

S. CABRAS, M. E. CASTELLANOS AND S. PERRA

Figure 4. Simulation results for the generalized Gamma model with 20 covariates, 30% of censored data, independent covariates and two sample sizes: true model (right vertical axes) posterior probability (top), proportion of selection (bottom left) and posterior expected model size (bottom right).

Table I. Covariates for the overall survival in NSCLC. Variable (unity of measure) Body mass index Age (years) Calcemia (mg/dL) Albumins (g/dL) Anaemia (g/dL)

Values (median)

No. of patients (range)

⩽18 >18 (63) (9.6) (3.5) (12.8)

25 16 (49–82) (8.8–10.8) (2.1–4.6) (9.6–16.3)

To avoid the effect of extreme observations, body mass index has been discretized following medical indications.

4650

In the analysis of this data set, where nu is not large with respect to s, the versions of FBF turn out to be more precise than that of IBFs when the number, L, of TS decreases to a number that equals the computational effort between the IBF and FBF. That is, if L were such that the amount of integrals needed to compute the IBF equals that of FBF, then the latter is more stable and the IBF tends to converge to the values reported in Table II when L → L∗ . The second analysis is with the larynx data set introduced in [29] and further discussed in [30]. This data set describes the survival times of n = 90 patients suffering from larynx cancer where nc = 40 individuals were alive at the end of the study. We have two possible covariates: the stage of the disease, a factor with four levels, and the age at diagnosis. In this case, the most complex model has s = 6 parameters and the most simple s = 2; hence, four TSs over 50 uncensored observations are practically irrelevant, and thus, we reported, in Table III, the results only when varying s with the model prior probability developed in [28]. We can see that all BFs agree that survivor is mostly related with the stage of the disease. Copyright © 2014 John Wiley & Sons, Ltd.

Statist. Med. 2014, 33 4637–4654

S. CABRAS, M. E. CASTELLANOS AND S. PERRA

Table II. Results for the overall survival in NSCLC when s = 7 is fixed in all BFs (top) or when varying s (bottom). Model posterior probability (selection criteria) Model Calcemia *NULL* Calcemia-Age Calcemia-BMI PEMS

Calcemia *NULL* Calcemia-Age Calcemia-BMI PEMS

mFBF

FBFmo

0.08 (H,M) 0.08

BAI L

BIC

BMI L

mo

mo

s = 7 fixed for all BFs 0.08 (H,M) 0.29 (H,M) 0.34 (H,M) 0.13 0.08 0.13

0.21 (H,M)

0.15 2.1

2.1

0.12 (H,M) 0.09

2.0

1.6

1.8

Varying s in all BFs 0.13 (H,M) 0.29 (H,M) 0.30 (H,M) 0.14 0.11 0.13

0.31 (H,M)

0.11 2.0

2.0

2.0

1.5

1.7

The selection criterion is reported in parenthesis when the model is chosen according to HPPM (H) and MPPM (M). The model posterior probability is shown only when the model belongs to the two most probable models according to the BF in each column. The rows labelled ‘PEMS’ report the posterior expected model size.

Table III. Results for the larynx data set when varying s. Varying s in all BFs Model

mFBF stage stage-age *NULL* age PEMS

Model posterior probability (selection criteria) FBFmo BIC BAI L mo

BMI L

mo

0.77 (H,M) 0.21 0.02 0.00

0.91 (H,M) 0.06 0.02 0.01

0.63 (H,M) 0.36 0.01 0.00

0.82 (H,M) 0.14 0.04 0.00

0.82 (H,M) 0.15 0.02 0.01

1.2

1.1

1.4

1.1

1.1

The selection criterion is reported in parenthesis when the model is chosen according to HPPM (H) and MPPM (M). The ‘PEMS’ row reports the posterior expected model size.

7. Discussion

Copyright © 2014 John Wiley & Sons, Ltd.

Statist. Med. 2014, 33 4637–4654

4651

This paper provides more insights about the IBFs for censored data already introduced in [7] with application to widely used regression models in survival analysis. It is already known that AIBF and MIBF are consistent as they satisfy Assumption 0 discussed in [7]. However, they make formidable computational demands when the marginal of the two models are not known analytically. This can be partially mitigated in some models, for example, the exponential model, when the intrinsic prior and hence the prior predictive marginal distribution are available in closed form. In order to provide a less computationally demanding procedure, more precise than the BIC, we explore the use of the FBF proposed in [5] for censored data interpreting its original unique fraction in terms of double fractioning and then accounting for the randomness in the MTS size. Moreover, for the most simple case of the exponential model, we showed that for the versions of FBF, the fractional prior exists, and hence, they can be regarded asymptotically as actual BFs under such prior. In summary, the MIBF should be preferred to the FBF if the time employed in the analysis is not relevant; otherwise, FBF compensates a small decrease in precision in model selection with a quicker answer. In fact, it is known that the FBF considers an averaged MTS while the IBF the whole empirical distribution of the MTSs. Calculating the correction factor over the expected MTSs is less computationally demanding than over the entirely empirical distribution, and it may be a crude approximation especially

S. CABRAS, M. E. CASTELLANOS AND S. PERRA

in small samples. On the other hand, large samples make computationally infeasible the IBF as a large number of SMTSs must be taken into account in order to have a satisfactory approximation of the empirical distribution of the SMTSs. In such a case, the corresponding expected correction factor is a suitable summary for all SMTSs, and it can be considered directly with the FBF making it the only computational solution for model selection. Theoretical results, as well as simulation study and applications to real data, are pervaded by such trade-off between model selection accuracy and computational feasibility. Such trade-off is also considered when comparing the BIC against IBF and FBF. The BIC is a reference model selection technique for many packages because of its simplicity in the implementation and evaluation. However, this paper shows that if one is willing to use Monte Carlo methods, as MCMC, instead of just the numerical optimizations, required for the BIC, then the FBF may be a quick and accurate alternative. Although this paper focused on model selection criteria based on BFs and the related posterior model probabilities, another popular alternative is the deviance information criterion (DIC), which was originally proposed in [31]. The original version of the DIC has been further elaborated with several variants [32, 33], in which we can include the Bayesian predictive information criteria (BPIC) [34]. Most of such variants concern the determination of the effective number of parameters of the model usually denoted by pD as it may be inconsistent depending on the class of models under comparison. Such number is either interpreted as a penalization for model complexity (DIC) or as an estimator of the bias between the posterior mean of the log-likelihood and that expected under the true model (BPIC). We think that while pD may be easy to be determined when all observations are equally informative and independent (as the BPIC case), this may be not the case with censored and uncensored observations. Among the different variants of the DIC, the most related to the problem of model choice under censored data may be those discussed in [33], which deal with missing data. In fact, using their notation, the latent variable z may represent the unobserved variable due to a censoring mechanism described in the observational model f ( y, z|𝜽) that appears in equation (4) of [33]. Depending on how z is regarded, that is, as a new parameter or an unobserved variable, different versions of the DIC can be considered as discussed in [33]. However, at the moment, we are not aware of a specific investigation of these or other suitable versions of the DIC for censored data, and this could be an interesting argument for further research. Finally, although this may be viewed as a personal preference of the authors, we find more straightforward to communicate the results of a model selection procedure using model posterior probabilities. We think that these can be more informative than the DIC. For instance, the HPPM model may also correspond to that with the lowest DIC, but we still have the advantage of providing how much this model is probable given the data.

Appendix Proof of Proposition 1 Expression (10) is given by the ratio between favourable and possible cases for an SMTS of size Nt = nt without replacement. The denominator of (10) is just the number of all possible ways in which we can choose nt elements out of n when their order is taken into account: Dn,nt = n!∕(n − nt )!. To count the favourable cases in the numerator, we sample nt elements until we reach s uncensored observations. Such sample takes the form { ⎧ (s − 1) uncensored ⎪ (nt − 1) observations SMTS of size nt ∶ ⎨ (nt − 1) − (s − 1) censored ⎪ 1 uncensored observation ⎩ ( c −1) ways, while the (nt − 1) − (s − 1) censored We can choose the (s − 1) uncensored observations in n−n s−1 ( nc ) observations can be chosen in n −s ways. All these elements can be permuted in (nt − 1)! ways. Finally, t the last observation can be selected from all uncensored observations, (n − nc ).

4652

Proof of Lemma 1 The result follows from the definition of the negative binomial random variable. In fact, the size Nt of the MTS from an SMTS is the total number of trials, from an infinite population, with probability of success w, and we do not stop until we obtain s successes, namely s uncensored observations. Copyright © 2014 John Wiley & Sons, Ltd.

Statist. Med. 2014, 33 4637–4654

S. CABRAS, M. E. CASTELLANOS AND S. PERRA

Proof of Proposition 2 ̃

b

b

b

b

(i) Write f ( y|𝜽)b = fY 1 SY2 where, for the sake of simplicity, fY 1 and SY2 denote the two factorization terms, with the corresponding exponent, in the likelihood (15). After a simple algebra manipulation, we have that b̃ = b1

log(fY ) log(SY ) + b2 . log(fY ) + log(SY ) log(fY ) + log(SY )

(A.1)

(ii) Being log(SY ) ≠ 0, then it suffices that jointly b1 → 0 and b2 → 0 for n → ∞ in order to have b̃ → 0. Proof of Proposition 3

( ( ( ) ) ) d From Lemma that Nt −→ Ñ t . Then E(B) = s∕nu , E Ñ t − s ∕nc → (0, 0) and ) ( 1, (we )know Var(B) = 0, Var Ñ t ∕n2c → (0, 0) because means and variances of both fractions are fixed constants with respect to nc = n(1 − w) and nu = nw. Moreover, both fractions vanish asymptotically at rate o(n) as required. Approximation of the BF from MCMC Required for k , k = i, j: Dk = ( y, Xk ) observations; fk (Dk |𝜽k ) and 𝜋k (𝜽k |Dk ) the likelihood and kernel of posterior; 𝜏k a fixed scale factor; for the RW-MH we need the following: 𝛼(𝜽, 𝜽′ |Dk ) q(𝜽, 𝜽′ |Dk ) the probability to move from 𝜽 to 𝜽′ and its corresponding proposal density. . (1) Calculate MLE, 𝜽̂ k , and the observed information matrix Σ̂ −1 k (2) Generate 𝜽(1,…,M) ∼ 𝜋 (𝜽 |D ) by an RW-MH using a normal proposal with covariance 𝜏k Σ̂ k . k k k k ̃ (3) Approximate the posterior median of 𝜋k (𝜽k |Dk ) by 𝜃k = median(𝜽(1,…,M) ). k ̃k . (4) Generate J draws, 𝜽(1,…,J) , from the normal proposal with mean 𝜃 k (5) Approximate the posterior ( density,)by( ( ) ) ( ) ∑M ̃k |Dk q 𝜽(m) , 𝜃̃k |Dk ∕ ∑J 𝛼 𝜃̃k , 𝜽(j) |Dk . 𝜋k 𝜃̃k |Dk = m=1 𝛼 𝜽(m) , 𝜃 j=1 k k ( ) ( ) k( ) ( ) ̃ ̃ ̃ ̃ (6) Approximate the predictive density at 𝜃̃k by 𝜃 = f D m ̃ | 𝜃 ( )k k ( ) k k k 𝜋k 𝜃k ∕𝜋k 𝜃k |Dk . Output: approximation of the Bij ( y) as m ̃ i 𝜃̃i ∕m ̃ j 𝜃̃j . Algorithm 1: Approximation of the Bij ( y)

Acknowledgements Maria Eugenia Castellanos was partially supported by Ministerio de Ciencia e Innovación grant MTM2010-19528 and the visiting professor program of the Regione Autonoma della Sardegna. Stefano Cabras has been partially supported by Ministerio de Ciencia e Innovación grant ECO2012-38442, RYC-2012-11455, and Silvia Perra was partially supported by Ministero dell’Istruzione, dell’Univesità e della Ricerca of Italy.

References

Copyright © 2014 John Wiley & Sons, Ltd.

Statist. Med. 2014, 33 4637–4654

4653

1. Sant M, Allemani C, Santaquilani M, Knijn A, Marchesi F, Capocaccia R. EUROCARE Working Group. Eurocare-4. survival of cancer patients diagnosed in 1995-1999. Results and commentary. European Journal of Cancer 2009; 45(6): 931–991. 2. Langer C, Besse B, Gualberto A, Brambilla E, Soria J. The evolving role of histology in the management of advanced non-small-cell lung cancer. Journal of Clinical Oncology 2010; 28:5311–5320.

S. CABRAS, M. E. CASTELLANOS AND S. PERRA 3. Armero C, Cabras S, Castellanos ME, Perra S, Quirós A, Oruezábal MJ, Sánchez-Rubio J. Bayesian analysis of a disability model for lung cancer survival. Statistical Methods in Medical Research 2012. 4. Kass R, Raftery A. Bayes factors. Journal of the American Statistical Association 1995; 90:773–795. 5. O’Hagan A. Fractional Bayes factors for model comparison. Journal of the Royal Statistical Society. Series B (Methodological) 1995; 57(1):99–138. 6. Berger JO, Pericchi LR. The intrinsic Bayes factor for model selection and prediction. Journal of the American Statistical Association 1996; 91(433):109–122. 7. Berger JO, Pericchi LR. Training samples in objective Bayesian model selection. The Annals of Statistics 2004; 32(3): 841–869. 8. Berger JO. Bayes factors. In Encyclopedia of Statistical Sciences, Kotz S, Read S, Banks D (eds), Vol. 3. Wiley: New York, 1999; 20–29. 9. Berger JO, Pericchi LR. Objective Bayesian methods for model selection: introduction and comparison. In Model Selection, Lahiri P (ed.), Institute of Mathematical Statistics, vol. 38, 2001; 135–207. 10. Yang R, Berger J. A catalogue of noninformative priors. Technical Report ISDS Discussion Paper 97-42, Duke University, Durham (NC), 1997. 11. Zellner A, Siow A. Posterior odds ratios for selected regression hypotheses. In Bayesian Statistics 1, Bernardo JM, Berger JO, Dawid AP, Smith AFM (eds), Vol. 31. Springer Berlin: Heidelberg, 1980; 585–603. 12. Moreno E, Girón F. Comparison of Bayesian objective procedures for variable selection in linear regression. Test 2008; 3:472–492. 13. Berger JO, Pericchi LR. Accurate and stable Bayesian model selection: the median intrinsic Bayes factor. Sankhy?: The Indian Journal of Statistics, Series B 1998; 60(1):1–18. 14. Moreno E, Bertolino F, Racugno W. An intrinsic limiting procedure for model selection and hypotheses testing. Journal of the American Statistical Association 1998; 93(444):1451–1460. http://www.jstor.org/stable/2670059. 15. Moreno E, Bertolino F, Racugno W. Default Bayesian analysis of the Behrens-Fisher problem. Journal of Statistical Planning and Inference 1999; 81(2):323–333. http://www.sciencedirect.com/science/article/pii/S0378375899000701. 16. Lingham RT, Sivaganesan S. Intrinsic Bayes factor approach to a test for the power law process. Journal of Statistical Planning and Inference 1999; 77(2):195–220. 17. Kim SW, Sun D. Intrinsic priors for model selection using an encompassing model with applications to censored failure time data. Lifetime Data Analysis 2000; 6(3):251–269. 18. Bayarri M, Berger J, Forte A, Garc´ıa-Donato G. Criteria for Bayesian model choice with application to variable selection. The Annals of Statistics 2012; 40(3):1550–1577. 19. Bayarri M, Berger J, Pericchi LR. The elusive concept of effective sample size. ISBA Regional Meeting and International Workshop/Conference on Bayesian Theory and Applications (IWCBTA), (January 6-10, Varanasi, India), Varanasi (India), 2013. 20. Berger J, Mortera J. Discussion on ‘Fractional Bayes factor for model comparison’. Journal of the Royal Statistical Society. Series B (Methodological) 1995; 57:130–131. 21. Moreno E. Bayes factors for intrinsic and fractional priors in nested models: Bayesian robustness. IMS Lecture Notes 1997; 31:257–270. 22. Barbieri MM, Berger JO. Optimal predictive model selection. The Annals of Statistics 2004; 32(3):870–897. 23. Varshavsky J. On the development of intrinsic Bayes factors. PhD Thesis, Purdue University, 1995. 24. De Santis F, Spezzaferri F. Alternative Bayes factors for model selection. Canadian Journal of Statistics 1997; 25(4): 503–515. 25. Chib S, Jeliazkov I. Marginal likelihood from the Metropolis–Hastings output. Journal of the American Statistical Association 2001; 96(453):270–281. 26. Volinsky CT, Raftery AE. Bayesian information criterion for censored survival models. Biometrics 2000; 56(1):256–262. 27. Bertolino F, Racugno W, Moreno E. Bayesian model selection approach to analysis of variance under heteroscedasticity. Journal of the Royal Statistical Society. Series D (The Statistician) 2000; 49(4):503–517. 28. Scott J, Berger J. Bayes and empirical-Bayes multiplicity adjustment in the variable-selection problem. The Annals of Statistics 2010; 38(5):2587–2619. 29. Kardaun O. Statistical analysis of male larynx-cancer patients – a case study. Statistical Nederlandica 1983; 37:103–126. 30. Klein JP, Moeschberger ML. Survival analysis: techniques for censored and truncated data 2nd ed. Springer: New York, 2003. 31. Spiegelhalter DJ, Best NG, Carlin BP, Van Der Linde A. Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 2002; 64(4):583–639. 32. Gelman A, Carlin JB, Stern HS, Rubin DB. Bayesian Data Analysis, Second Edition. Chapman & Hall/CRC: New York (NY), 2003. 33. Celeux G, Forbes F, Robert C, DM T. Deviance information criteria for missing data models. Bayesian Analysis 2006; 1(4):651–674. 34. Ando T. Bayesian predictive information criterion for the evaluation of hierarchical Bayesian and empirical Bayes models. Biometrika 2007; 94(2):443–458.

4654 Copyright © 2014 John Wiley & Sons, Ltd.

Statist. Med. 2014, 33 4637–4654

Comparison of objective Bayes factors for variable selection in parametric regression models for survival analysis.

This paper considers the problem of selecting a set of regressors when the response variable is distributed according to a specified parametric model ...
642KB Sizes 0 Downloads 4 Views