Research Article Received 16 September 2014,

Accepted 4 April 2015

Published online 30 April 2015 in Wiley Online Library

(wileyonlinelibrary.com) DOI: 10.1002/sim.6517

A Bayesian mixture of semiparametric mixed-effects joint models for skewed-longitudinal and time-to-event data Jiaqing Chena and Yangxin Huangb*† In longitudinal studies, it is of interest to investigate how repeatedly measured markers in time are associated with a time to an event of interest, and in the mean time, the repeated measurements are often observed with the features of a heterogeneous population, non-normality, and covariate measured with error because of longitudinal nature. Statistical analysis may complicate dramatically when one analyzes longitudinal–survival data with these features together. Recently, a mixture of skewed distributions has received increasing attention in the treatment of heterogeneous data involving asymmetric behaviors across subclasses, but there are relatively few studies accommodating heterogeneity, non-normality, and measurement error in covariate simultaneously arose in longitudinal–survival data setting. Under the umbrella of Bayesian inference, this article explores a finite mixture of semiparametric mixed-effects joint models with skewed distributions for longitudinal measures with an attempt to mediate homogeneous characteristics, adjust departures from normality, and tailor accuracy from measurement error in covariate as well as overcome shortages of confidence in specifying a time-to-event model. The Bayesian mixture of joint modeling offers an appropriate avenue to estimate not only all parameters of mixture joint models but also probabilities of class membership. Simulation studies are conducted to assess the performance of the proposed method, and a real example is analyzed to demonstrate the methodology. The results are reported by comparing potential models with various scenarios. Copyright © 2015 John Wiley & Sons, Ltd. Keywords:

Bayesian inference; longitudinal data analysis; mixture of hierarchical joint models; skew-t distributions; time-to-event

1. Introduction Joint analysis of event times and longitudinal measures is an active area of biostatistics and statistics research. There have been a considerable number of statistical approaches in the literature to explore the relationship between a time-to-event outcome and a longitudinal marker [1–7]. For example, the relationship of HIV viral suppression and immune restoration after a treatment has received great attention in HIV/AIDS research [8]. One is often interested in simultaneously studying the HIV dynamics and immune restoration, which may be characterized by the time to decline of CD4/CD8 ratio [9]. The research was motivated by the AIDS study [8] to understand within subject patterns of change in HIV-1 RNA copies (also referred to as viral load) or CD4 cell count and to study the relationship of characteristics of viral load and CD4 profiles with time to decrease in CD4/CD8 ratio. Figure 1 presents the profiles of viral load in log10 scale, CD4 cell count and CD4/CD8 ratio trajectories for three representative patients. It is seen from Figure 1 that compared with the CD4 process, the CD4/CD8 ratio over time looks less wavering and its trend seems more closely associated with the viral dynamics. Researchers may often confront the task of developing inference from data where longitudinal outcomes of interest may follow heterogeneous (not homogeneous) characteristics, exhibit departure from a Department b Department

of Statistics, College of Science, Wuhan University of Technology, Wuhan, Hubei 430070, China of Epidemiology and Biostatistics, College of Public Health, University of South Florida, Tampa, FL 33612,

U.S.A.

2820

*Correspondence

to: Yangxin Huang, Department of Epidemiology and Biostatistics, College of Public Health, University of South Florida, Tampa, FL 33612, U.S.A. † E-mail: [email protected]

Copyright © 2015 John Wiley & Sons, Ltd.

Statist. Med. 2015, 34 2820–2843

J. CHEN AND Y. HUANG

Figure 1. Profiles of viral load in log10 scale, CD4 cell count and CD4/CD8 ratio trajectories for three representative patients. Trajectory class 1: decrease rapidly and constantly in a short-term period (dotted line with solid dot sign); class 2: decrease at the beginning and then maintain stable at a low level (solid line with circle sign) and class 3: decrease at the beginning, but rebound later (dashed line with plus sign).

Copyright © 2015 John Wiley & Sons, Ltd.

Statist. Med. 2015, 34 2820–2843

2821

normality and/or outliers, and suffer from measure with substantial errors. Modeling such data has many challenges due to the following issues of inherent data features. First, in the literature, most studies of longitudinal modeling assume that all subjects come from a homogeneous population where large between and within individual variations were accommodated by random effects and/or time-varying covariates in the models. These typical large between and within individual variations is shown in Figure 1(a), viral load trajectory profiles of three representative patients in an AIDS clinical study [8] (refer to section 3.1 for details of this study and data description). These viral load trajectories can be roughly classified into three classes, which are biologically and clinically interpretable, with the following considerations. The original motivation based on mixture joint modeling to be conducted in this article was to cluster all patients into two classes with virologic suppression and failure (i.e., viral load rebound), respectively, which is of main interest from a clinical prospective. But, the other class is incorporated (class 1) to capture some patients who withdrew too early to be clustered into either class 2 with virologic suppression or class 3 with virologic failure. Thus, the number of components in this analysis was determined empirically based on the viral load trajectory patterns and clinical interpretability. Note that class 1 is referred as a confirmed ‘short-term virologic response’ because of early dropout, which may not indicate virologic suppression in long-term treatment. We, therefore, can reasonably assume that these patients are from a population that consists of three relatively homogeneous classes (refer to section 3.1 for detail) and, thus, it is motivated to consider a finite mixture of models for such data set. Second, most of the published methods assume that the error terms in the models for the longitudinal response and/or covariates follow normal distributions because of mathematical tractability and computational convenience [6, 10–13]. This requires the variables to be ‘symmetrically’ distributed. A violation of this assumption could lead to misleading inferences [14–19]. In fact, observed data in HIV studies are often far from being ‘symmetric’, and asymmetric patterns of observations usually occur. Third, measurement error in covariate is another typical feature of longitudinal data, and ignoring this phenomenon may result in biased statistical inference. This is usually the case in the longitudinal studies; for instance, CD4 cell counts are often measured with substantial measurement errors and exhibit non-normal patterns [6, 11, 13, 17]. To ensure the normality and related assumptions in measurement error models, transforming the covariate variable is one of the most standard methods that might work satisfactorily in some areas of applications (refer to Ho and Lin [16] for detailed discussion). Although such transformation-based methods may yield reasonable empirical results, they should be avoided if a more suitable theoretical model can be found [18]. Some reasons for this are the following: (i) transforming usually provides reduced information on an underlying data generation mechanism; (ii) the transformations are typically applied to each component separately, and joint normality is not ensured; (iii) the transformed variables are more difficult to deal with regarding interpretation, especially when transformations are different for the various coordinates; and (iv) multivariate homoscedasticity often requires a different transformation from the one to get normality. To deal with the problem of departure from normality in models and eliminate the need of ad hoc data transformations, some proposals have been considered in the literature; for example, finite normal mixtures were used to replace the normal assumption [14, 20]. A limiting feature of the aforementioned approach is the assumption of normally distributed variables within each latent class. As Muthén and Asparouhov [21] claimed, with strongly non-normal outcomes, this means that several latent classes are required to capture the observed variable distributions; an advantage of relaxing the assumption of within

J. CHEN AND Y. HUANG

class normality is that a non-normal observed distribution does not require using more than one class to fit data, which is more parsimonious in line with less parameters in such models. Thus, from a practical viewpoint, one needs to seek a more robust parametric family of distributions in regulating possible non-normal and/or heavy-tailed outcomes. Finally, longitudinal studies, where repeated measurements on continuous variables, an observation on a possibly censored time-to-event, and additional covariate information are collected for each subject, are commonplace in medical research. Interest often focuses on interrelationships between these variables [2–4, 6, 7]. A joint model that links the hazard to these longitudinal measures, which can also incorporate information of some typical data features, is becoming essential and increasingly powerful in many applications. The majority of statistical literature for the joint modeling of longitudinal and/or event time data has focused on the development of models that aim at capturing only specific aspects of the motivating case studies. These specific features of data considered in the literature include longitudinal data with, but are not limited to, censored response [13, 22], mixture response [20, 23–25], measurement errors in covariates [11, 13, 17, 26], and longitudinal–survival (or event time) outcomes [1, 3, 6, 7]. However, there are relatively few studies on simultaneous inference for longitudinal data with features of heterogeneity, non-normality, and measurement error in longitudinal response and/or covariate as well as event time data incorporated. It is not clear how heterogeneity, non-normality, measurement error in covariate, and event time of data may interact and simultaneously influence inferential procedures. Statistical inference and analysis complicate dramatically when all of these issues are present. In this article, we develop a mixture of semiparametric mixed-effects joint models to investigate the effects on inference when all of these typical data features exist. In particular, we consider a mixture of joint models with the following components to quantify these data features: (i) a mixture of semiparametric mixed-effects models with skewed distributions for response process. The skewed distributions include the skew-normal (SN) and skew-t (ST) distributions as special cases. (ii) Linear mixed-effects models with skewed distributions for covariate process. (iii) Cox proportional hazards model for event time process, which is linked to longitudinal models through random-effects that characterize the underlying individual-specific longitudinal processes. We present a more flexible and robust Bayesian inferential approach to modeling skewed-longitudinal and time-to-event measures to estimate both model parameters and class membership probabilities based on a finite mixture (with nonlinear mean functions) of joint models. Approaching the mixture of joint modeling problem from Bayesian perspective is more natural and straightforward. It avoids many complicated approximations required by the frequentist approach. Yet, with noninformative priors, we can still get maximum likelihood-type estimates. Different versions of the multivariate ST distribution have been proposed and used in the literature [15, 16, 18, 19, 27–30]. A new class of distributions by introducing skewness in multivariate elliptical distributions were developed in publication [19]. The class, which is obtained by using transformation and conditioning, contains many standard families including the multivariate ST distribution. It is noted that the ST distribution reduces to a t-distribution if skewness parameter is zero (refer to Appendix A for detail). Thus, we use an ST distribution to develop a mixture of joint models-based statistical methodologies. In what follows, we consider multivariate ST distribution introduced by Sahu et al. [19], which is suitable for a Bayesian inference. The remainder of this article evolves as follows. A mixture of joint model setup is constructed, and associated simultaneous Bayesian inferential procedure for estimating both model parameters and probabilities of class membership is presented in section 2. In section 3, we describe the data set that motivated this research and discuss the specific mixture of joint models formulated by three mean functions of different mixture components for viral load response, CD4 covariate process, and time to decline of CD4/CD8 ratio that are used to illustrate the proposed methodologies. In section 4, we demonstrate to apply the proposed methodologies to an AIDS data set described in section 3 by jointly modeling the viral dynamics and the time to decrease in CD4/CD8 ratio in the presence of CD4 count with measurement error and report the analytic results. Section 5 is devoted to simulation studies evaluating the performance of the proposed mixture of joint models-based method. General discussion is included in section 6.

2. A mixture of joint models and Bayesian inferential procedure

2822

In this section, we present the mixture of joint models and associated methodologies in a general form, illustrating that our methods may be applicable in other applications. Figure 1 presents the trajectories of HIV viral load, CD4 cell count, and CD4/CD8 ratio of three randomly selected subjects from an AIDS Copyright © 2015 John Wiley & Sons, Ltd.

Statist. Med. 2015, 34 2820–2843

J. CHEN AND Y. HUANG

clinical trial [8]. While the data indicate a close association of the longitudinal viral load, CD4 cell count and CD4/CD8 ratio, the data show a large variation in the association across subjects and over time within each subject. This observation together with the published researches in the area [3, 6, 11, 12, 23] led us to propose the following mixture of semiparametric mixed-effects joint models for the three components of concern. 2.1. ST-measurement error models for covariate process This section briefly discusses covariate measurement error models. Various covariate mixed-effects models were investigated in the literature [13, 31]. Denote the number of subjects by n and the number of measurements on the ith subject by ni . For simplicity, we consider a single time-varying covariate. Let zi = (zi1 , … , zini )T with zij being the observed covariate value for individual i at time tij (i = 1, … n; j = 1, … , ni ). In the presence of covariate measurement error, we employ the following linear mixed-effects (LME) model with ST distribution for covariate process. zi = uTi 𝜶 + vTi ai + 𝝐 i

(

) ≡ z∗i + 𝝐 i ,

( ) iid 𝝐 i ∼ STni ,𝜈1 −J(𝜈1 )𝛿1 𝟏ni , 𝜎12 Ini , 𝛿1 Ini ,

(1)

where z∗i = (z∗i1 , … , z∗in )T and z∗ij = uTij 𝜶 + vTij ai may be viewed as the true (but unobservable) covariate i value at time tij , ui = (uTi1 , … , uTin )T and vi = (vTi1 , … , vTin )T are l × ni design matrices, 𝜶 = (𝛼1 , … , 𝛼l )T i i and ai = (ai1 , … , ail )T are unknown population (fixed-effects) and individual-specific (random-effects) parameter vectors, respectively, 𝝐 i = (𝜖i1 , … , 𝜖ini )T follows a multivariate ST distribution with degrees of freedom 𝜈1 , unknown scale parameter 𝜎12 , and skewness parameter 𝛿1 , J(𝜈1 ) = (𝜈1 ∕𝜋)1∕2 [Γ((𝜈1 − 1)∕2)∕Γ(𝜈1 ∕2)], and 𝟏ni = (1, … , 1)T . Note that −J(𝜈1 )𝛿1 𝟏ni is specified here to make the ST distribution with mean zero [19]. We assume the skewness matrix being 𝛿1 Ini , indicating that we are interested in iid

skewness of overall pooled measurements from all the patients. We assume that ai ∼ Nl (0, 𝚺a ), where 𝚺a is unrestricted covariance matrix. Model (1) may be interpreted as an ST-measurement error model in covariate that incorporates skewness exhibited in the data. 2.2. ST-mixture of semiparametric mixed-effects models for response process Several studies [20, 21, 24, 25] introduced finite mixture models for longitudinal data analysis, where the latent classes correspond to the mixture components and cluster individuals may provide a better inference. However, most finite mixture models for longitudinal data are currently based on linear (polynomial) [20, 21] or piecewise linear [24, 25] mean functions. The partial reason is that the computation for inference can be conveniently carried out because the likelihood function of a model based on these linear mean functions has a closed form [20]. However, in practice, most longitudinal trajectories appear to be nonlinear patterns. When a mixture model is extended to incorporate nonlinear mean functions, which will be conducted in this article, inferential procedures will complicate dramatically because a closed form of likelihood function no longer exists. We assume that there are K plausible nonlinear trajectory classes with mean functions gk (⋅) (k = 1, … , K), which are known to be specified. The true trajectory∑ mean function of the ith subject might be gk (⋅) with unknown probability 𝜋k = P(ci = k) which satisfies Kk=1 𝜋k = 1, where ci is a latent indicator. An ST-SME model for individual i, given ci = k, can be formulated by ( ) ( ) iid (yij |ci = k) = gk tij , A†k 𝜷 †ij , 𝜑(tij ) + eij , ei ∼ STni ,𝜈2 −J(𝜈)𝛿2 1ni , 𝜎22 Ini , 𝛿2 Ini , [ ] ( ) [ ] iid 𝜷 †ij = h† Zij , 𝜷 † , b†i , 𝜑(tij ) = 𝜐 w(tij ), hi (tij ) , b†i ∼ Ns3 𝟎, 𝚺†b ,

(2)

Copyright © 2015 John Wiley & Sons, Ltd.

Statist. Med. 2015, 34 2820–2843

2823

where h† (⋅) and 𝜐(⋅) are known parametric functions, w(t) and hi (t) are unknown nonparametric smooth fixed-effects and random-effects functions, respectively, hi (t) are iid realizations of a zero-mean stochastic process; 1ni = (1, … , 1)T , 𝜷 †ij is s1 ×1 individual-specific time-dependent parameter vector, 𝜷 † is s2 ×1 population parameter vector (s2 ⩾ s1 ); the random error vector ei = (ei1 , ..., eini )T follows a multivariate ( ) ST distribution STni ,𝜈2 −J(𝜈2 )𝛿2 𝟏ni , 𝜎22 Ini , 𝛿2 Ini with degrees of freedom 𝜈2 , unknown scale parameter 𝜎22 , and skewness parameter 𝛿2 ; s3 × 1 vector of random effects b†i follows multivariate normal distribution with mean zero and unrestricted covariance matrix 𝚺†b (s3 ⩽ s1 ); Ak (s1 × s1 ) (k = 1, … , K) is known

J. CHEN AND Y. HUANG

square indicator matrix, of which diagonal elements are either 0 or 1 and off-diagonal elements are all 0. In model (2), we assume that the individual-specific parameters 𝜷 †ij depend on Zij , a design matrix including time-independent and/or time-varying covariates, such as CD4 cell counts. Given ci = k, the ST-SME model (2) reverts to a commonly used nonlinear mixed-effects (NLME) model with ST distribution when the nonparametric parts w(t) and hi (t) are constants. To fit model (2), we apply a regression spline method to w(t) and hi (t). The working principle is briefly described as follows, and more details can be found in literature [32, 33]. The main idea of regression spline is to approximate w(t) and hi (t) by using a linear combination of spline basis functions. For instance, w(t) and hi (t) can be approximated by a linear combination of basis functions 𝚿p (t) = {𝜓0 (t), 𝜓1 (t), ..., 𝜓p−1 (t)}T and 𝚽q (t) = {𝜙0 (t), 𝜙1 (t), ..., 𝜙q−1 (t)}T , respectively. That is,

w(t) ≈ wp (t) =

p−1 ∑

𝜇k 𝜓k (t) = 𝚿p (t)T 𝝁p , hi (t) ≈ hiq (t) =

k=0

q−1 ∑

𝜉ik 𝜙k (t) = 𝚽q (t)T 𝝃 iq ,

(3)

k=0

where 𝝁p and 𝝃 iq (q ⩽ p) are the unknown vectors of fixed and random coefficients, respectively. Based on the assumption of hi (t), we can regard 𝝃 iq as iid realizations of a zero-mean random vector. For our model, we consider natural cubic spline bases with the percentile-based knots. To select an optimal degree of regression spline and numbers of knots, that is, optimal sizes of p and q, the Akaike information criterion (AIC) or the Bayesian information criterion (BIC) is often applied [32, 33]. Substituting w(t) and hi (t) by their approximations wp (t) and hiq (t), we can approximate yij given ci = k in model (2) in a compact way as follows. ( [ ] [ ]) ( ) (yij |ci = k) = gk tij , A†k h† Zij , 𝜷 † , b†i , 𝜐 𝚿p (tij )T 𝝁p , 𝚽q (tij )T 𝝃 iq + eij ≡ gk tij , Ak h(Zij , 𝜷, bi ) + eij , (4) , 𝝃 Tiq )T are the vectors of fixed-effects and random-effects, respecwhere 𝜷 = (𝜷 †T , 𝝁Tp )T and bi = (b†T i tively, and h(⋅) is a known but possibly nonlinear function. Thus, for given 𝚿p (t) and 𝚽q (t), we approximate the ST-SME model (2) by the following ST-NLME model in vector form: ( ) ( ) iid ( yi |ci = k) = gk ti , Ak 𝜷 ij + ei , ei ∼ STni ,𝜈2 −J(𝜈2 )𝛿2 1ni , 𝜎22 Ini , 𝛿2 Ini , iid

𝜷 ij = h[Zij , 𝜷, bi ], bi ∼ Ns4 (𝟎, 𝚺b ),

(5)

where gk (ti , Ak 𝜷 i ) = (gk (ti1 , Ak 𝜷 i1 ), … , gk (tini , Ak 𝜷 ini ))T , ti = (ti1 , … , tini )T , 𝜷 ij = (𝛽ij1 , … , 𝛽ijs )T being a vector of subject-specific parameters for the ith individual, s4 = s3 + q, and 𝚺 is an unstructured covariance matrix. Similarly, Ak (k = 1, … , K) is known s×s indicator matrix, of which diagonal elements are either 0 or 1 and off-diagonal elements are all 0. Ak is introduced because the mean functions of yij (i = 1, … n; j = 1, … , ni ), specified by the nonlinear functions g1 (⋅), … , gK (⋅), may only involve different subsets of 𝜷 ij . By introducing Ak , Ak 𝜷 ij will set unrelated elements of 𝜷 ij to 0 in the kth trajectory class, respectively. We will illustrate the use of Ak and specify nonlinear mean functions gk (⋅) in the HIV dynamic application in the following. We assume that ei and bi are independent of each other. It is easily shown that model (5) can be specified conditionally and marginally as follows, respectively. ( ) iid ( yi |ci = k) ∼ STni ,𝜈2 gk (ti , Ak 𝜷 ij ) − J(𝜈2 )𝛿2 1ni , 𝜎 2 Ini , 𝛿2 Ini , iid

𝜷 ij = h[Zij , 𝜷, bi ], bi ∼ Ns4 (𝟎, 𝚺b ), iid

yi ∼

∑K k=1

( ) 𝜋k STni ,𝜈2 gk (ti , Ak 𝜷 ij ) − J(𝜈2 )𝛿2 𝟏ni , 𝜎 2 Ini , 𝛿2 Ini , iid

𝜷 ij = h[Zij , 𝜷, bi ], bi ∼ Ns4 (𝟎, 𝚺b ).

(6)

(7)

2824

The preceding model (7) defines a finite mixture of ST-NLME models, which is an approximation from a finite mixture of ST-SME models. In model (7), the vector of mixture probabilities 𝝅 = (𝜋1 , ..., 𝜋K )T can be also viewed as the mixture weights of all plausible components under the framework of finite Copyright © 2015 John Wiley & Sons, Ltd.

Statist. Med. 2015, 34 2820–2843

J. CHEN AND Y. HUANG

mixture models. Model (7) is identifiable, as long as each of the component models is identifiable and distinguishable from each other. When the component models are identifiable but not distinguishable from each other, certain constraints may be required to make model (7) identifiable [34]. 2.3. Survival model for time-to-event For the time-to-event process, we assume that the distribution of ℑi , the event time for the ith subject, depends on the random-effects ai and bi , representing individual-specific longitudinal processes. We therefore consider a frailty model for ℑi , which is linked to both the ST-covariate model (1) and STmixture of response model (7) through the random-effects ai and bi . Specifically, we assume that the conditional hazard rate of ℑi at time ti is ( ( ) ) 𝜆(ti |ai , bi ) = 𝜆0 (ti ) exp 𝜸 T1 ai + 𝜸 T2 bi == 𝜆0 (ti ) exp 𝜸 T di ,

(8)

)T )T ( ( where 𝜆0 (ti ) is an unspecified baseline hazard function, di = aTi , bTi , 𝜸 = 𝜸 T1 , 𝜸 T2 , 𝜸 1 and 𝜸 2 are unknown parameters linking the random-effects ai and bi to the conditional hazard rate, respectively. For the previous survival model, we assume that all individuals have the same set of measurement schedules, although for the longitudinal model, we allow the response measurement schedules to be somewhat different across individuals. Thus, equations (1), (7), and (8) form the finite mixture of joint models conducted in this article. It is noticed that Muthén and Asparouhov [21] considered survival model related to the latent classes of the growth, whereas equation (8) relates survival to random-effects. See further discussion on this relationship in section 6. Let 𝝆i = (𝜌i1 , … , 𝜌ini )T be a vector of censoring (‘event’) indicator for individual i: 𝜌ij = 1 or 0 if the event has happened or not by time tij . We assume that 𝜌i1 = 0 for all i. For individual i , let ℑi be the time to an event (or the duration time until an event occurs) and assume P(ℑi < ∞) = 1. In the AIDS example in succeeding discussions, ℑi is the time to to first decline of the CD4/CD8 ratio (denoted by xij for subject i at time tij ), which is usually not observable but confirmed in practice by an observed decrease of the CD4/CD8 ratio (Figure 1(c)). Specifically, if there is m ⩽ ni such that m is the smallest number with xi,m−1 < xim and xi,m−1 < xi,m+1 , in practice, we take 𝜌i1 = · · · = 𝜌i,m−1 = 0 and 𝜌im = · · · = 𝜌ini = 1, that is, ti,m−1 < ℑi ⩽ tim (the decline takes place during the time period (ti,m−1 , tim ]). If there is no such a number m, we view 𝜌ij = 0 for j = 1, … , ni , and thus ℑi > tini . This type of event time data structure is referred to as interval-censored event times. From equation (8), the probability of time-to-event is given as ( ) P ℑ ⩾ t ( ) ( ) i im pim = P 𝜌im = 1|𝜌im∗ = 0, 0 ⩽ m∗ < m = 1 − P ℑi ⩾ tim |ℑi > ti,m−1 = 1 − ( ) P ℑi ⩾ ti,m−1 ) ( (9) tim ( T ) ( ( )) T 𝜆 (t)dt exp 𝜸 di = 1 − exp − exp 𝛾0m + 𝜸 di , = 1 − exp − ∫ti,m−1 0 ( ) t where 𝛾0m = log ∫t im 𝜆0 (t)dt , m = 1, … , ni . Given the current observation mechanism, one needs i,m−1 only to handle the finite number of parameters 𝛾0m instead of the unknown baseline hazard function 𝜆0 (t) in parameter estimation-based method. The contribution to the likelihood from the time-to-event part for the ith subject is denoted by f (𝝆i |di ). Thus, we have f (𝝆i |di ) =

ni ∏ ( ) f 𝜌im |𝜌im∗ , 0 ⩽ m∗ < m; di ,

(10)

m=1 𝜌

where f (𝜌im |𝜌im∗ , 0 ⩽( m∗ < m; di)) = pimim (1 − pim()1−𝜌im and 𝜌im ) equals 0 before an event and 1 after an event. Let vi = max tij ∶ 𝜌ij = 0 and ui = min tij ∶ 𝜌ij = 1 . With (10), the following probability of time-to-event can be expressed as vi

∫0

Copyright © 2015 John Wiley & Sons, Ltd.

)[ ( ) 𝜆0 (t)dt exp 𝜸 di 1 − exp − (

T

ui

∫vi

( ) 𝜆0 (t)dt exp 𝜸 T di

)] , (11)

Statist. Med. 2015, 34 2820–2843

2825

( ) ( P vi < ℑi ⩽ ui |di = exp −

J. CHEN AND Y. HUANG

where ui = ∞ if 𝜌ij = 0 for j = 1, … , ni . The representation earlier is a way to approximate the Cox model through counting process. With this equation, it may reduce some computation burdens and simplify the presentation. 2.4. Simultaneous Bayesian inferential approach In a longitudinal study, such as the AIDS study described previously, the longitudinal response and covariate processes and the time-to-event are usually connected physically or biologically. They can be jointly modeled through the shared random-effects. This joint modeling is often biologically meaningful. Statistical inference on all of the model parameters can then be made simultaneously. Thus, the underlying association can be fully addressed or captured, and the uncertainty is incorporated from the estimation. Although a simultaneously-inferential method based on a joint likelihood via expectation–maximization algorithm adopted by Wu et al. [6] and Muthén and Asparouhov [21] for the longitudinal covariate and/or response data with heterogeneity, non-normality, and measurement error as well as event time data may be favorable, the computation associated with the joint likelihood inference in such models with a skewed distribution (i.e., ST distribution) for longitudinal–survival data can be extremely intensive and, particularly, may lead to convergence problems [1, 6]. Here, we propose a fully Bayesian inferential method via Markov chain Monte Carlo (MCMC) procedure for the ST-mixture of joint models (1), (7) and (10) to estimate both class membership probabilities and all parameters simultaneously based on the joint likelihood of the longitudinal-survival data and specified prior distributions. Thus, the Bayesian joint modeling approach may pave a way to alleviate the computational burdens and to overcome convergence problems for such complex model setting. Let 𝜽 = {𝜶, 𝜷, 𝜸, 𝜎12 , 𝜎22 , 𝚺a , 𝚺b , 𝜈1 , 𝜈2 , 𝛿1 , 𝛿2 } be the collection of unknown population parameters in models (1), (7), and (10) except for the mixture weight 𝝅 in the mixture model (7). Under the Bayesian framework, we need to specify prior distributions for all the unknown parameters as follows. 𝜶 ∼ Nl (𝝉 1 , 𝚲1 ), 𝜎12 ∼ IG(𝜔1 , 𝜔2 ), 𝚺a ∼ IW(𝛀1 , 𝜂1 ), 𝛿1 ∼ N(0, 𝜂3 ), 𝜷 ∼ Ns5 (𝝉 2 , 𝚲2 ), 𝜎22 ∼ IG(𝜔3 , 𝜔4 ), 𝚺b ∼ IW(𝛀2 , 𝜂2 ), 𝛿2 ∼ N(0, 𝜂4 ),

(12)

𝜸 ∼ Ns6 (𝝉 3 , 𝚲3 ), 𝜈1 ∼ Exp(𝜈10 )I(𝜈1 > 2), 𝜈2 ∼ Exp(𝜈20 )I(𝜈2 > 2), where s5 = s2 + p and s6 = l + s4 ; the mutually independent IG, normal (N), exponential (Exp) and IW prior distributions are chosen to facilitate computations. The super-parameter matrices 𝚲1 , 𝚲2 , 𝚲3 , 𝛀1 , and 𝛀1 can be assumed to be diagonal for convenient implementation. The exponential priors for 𝜈1 and 𝜈2 are truncated to lie above 2 to make variance of ST distribution well-defined; refer to equation (A.3) in Appendix A for details). By its definition, the latent indicating variables ci (i = 1, … , n) follow a categorical distribution (Cat) iid

ci ∼ Cat((1, 2, … , K), (𝜋1 , 𝜋2 , … , 𝜋K )),

(13)

in which 𝝅 = (𝜋1 , 𝜋2 , … , 𝜋K )T follows a Dirichlet distribution(Dir) [35, 36], 𝝅 ∼ Dir(𝜍1 , 𝜍2 , … , 𝜍K ).

(14)

Under the umbrella of a mixture of joint models (1), (7), and (10), an MCMC procedure can be specified by following two iterative steps: (i) Sampling class membership indicators ci (i = 1, … , n), conditional on population parameters, 𝜽, and individual random-effects, ai and bi . Generate ci (i = 1, ..., n) from 𝜋 f ( yi |ai , bi , ci = k, 𝜽) P(ci = k|ai , bi , 𝜽, yi ) = ∑K k ( ), k=1 𝜋k f yi |ai , bi , ci = k, 𝜽

(15)

2826

where f ( yi |ai , bi , ci = k, 𝜽) (k = 1, … , K) is a conditional density function of yi based on (6). Then, the probabilities 𝝅 can be updated from the following distribution for next iteration ) ( ) ( (16) 𝝅|num1 , … , numK ∼ Dir 𝜙1 + num1 , … , 𝜙K + numK , ∑ where numk = ni=1 I(ci = k), (k = 1, … , K), in which I(⋅) is an indicator function. Copyright © 2015 John Wiley & Sons, Ltd.

Statist. Med. 2015, 34 2820–2843

J. CHEN AND Y. HUANG

(ii) Sampling parameters 𝜽, and individual random-effects bi , conditional on class membership indicator c = (c1 , … , cn )T . Following the study by Sahu et al. [19], it can be shown, conditional on ci determined in step (i), that by introducing two ni × 1 random vectors wi1 = (wi11 , … , wini 1 )T and wi2 = (wi12 , … , wini 2 )T based on the stochastic representation for the ST distribution (see Appendix A for details) that zi and yi in connection with the time-to-event model (10) can be hierarchically formulated as follows. ( ) zi |ai , wi1 ∼ tni ,𝜈1 +ni z∗i + 𝛿1 [wi1 − J(𝜈1 + ni )𝟏ni ], 𝜁i1 𝜎12 Ini , ( ) yi |zi , ai , bi , wi2 , ci ∼ tni ,𝜈2 +ni gci (ti , Aci 𝜷 ij ) + 𝛿2 [wi2 − J(𝜈2 + ni )𝟏ni ], 𝜁i2 𝜎22 Ini , (17) w ∼ t (𝟎, I )I(w > 𝟎), w ∼ t (𝟎, I )I(w > 𝟎), i1

ni ,𝜈1

ni

i1

i2

ni ,𝜈2

ni

i2

ai ∼ Nl (𝟎, 𝚺a ), bi ∼ Ns4 (𝟎, 𝚺b ), ℑi ∼ F(ti |di ) =



f (𝝆i |di ),

where 𝜁i1 = (𝜈1 + wTi1 wi1 )∕(𝜈1 + ni ), 𝜁i2 = (𝜈2 + wTi2 wi2 )∕(𝜈2 + ni ), tni ,𝜈 (𝝁, B) denote the ni -variate tdistribution with parameters 𝝁, B, and degrees of freedom 𝜈, I(w > 0) is an indicator function and w follow an ni -dimensional standard t-distribution tni ,𝜈 (0, Ini ) truncated in the space w > 0 (i.e., the standard half-t distribution). Note that an important advantage of the previous representations based on the hierarchical models under a Bayesian framework is that they allow one to easily implement the modeling approach using the freely-available WinBUGS software [37] and that the computational effort for the model with an ST distribution is almost equivalent to that for the model with a t-distribution. Let f (⋅|⋅), F(⋅|⋅) and h(⋅) denote a probability density function, cumulative density function (c.d.f.), and prior density function, respectively. Assume that the elements of 𝜽 are independent of each other, that is, 𝜋(𝜽) = 𝜋(𝜶)𝜋(𝜷)𝜋(𝜸)𝜋(𝜎12 )𝜋(𝜎22 )𝜋(𝚺a )𝜋(𝚺b )𝜋(𝜈1 )𝜋(𝜈2 )𝜋(𝛿1 )𝜋(𝛿2 ). After we specify the mixture of joint models, conditional on membership indicator, for the observed longitudinal-survival data and the prior distributions for the unknown model parameters, we can make statistical inference for the parameters based on their posterior distributions under the Bayesian framework. Thus, the joint posterior density of 𝜽 based on the observed data ℜ = {( yi , zi , zi0 , 𝝆i ), i = 1, … , n}, and classification indicator c can be given by { n ni ∏ ∏ f (𝜽|ℜ, c) ∝ f ( yi |zi , ai , bi , wi2 , ci )f (zi |ai , wi1 )f (wi1 |wi1 > 𝟎) ∫ ∫ j=1 i=1 } (18) f (wi2 |wi2 > 𝟎)f (𝝆i |d)f (ai )f (bi )dai dbi

𝜋(𝜽).

In general, the integrals in (18) are of high dimension and do not have a closed form. Analytic approximations to the integrals may not be sufficiently accurate. Therefore, it is prohibitive to directly calculate the posterior distribution of 𝜽 based on the observed data and class membership. As an alternative, the MCMC procedure can be used to sample population parameters, 𝜽, and random-effects, ai and bi (i = 1, ..., n), from conditional posterior distributions, based on (18), using the Gibbs sampler along with the Metropolis–Hastings algorithm. Steps (i) and (ii) are repeated alternatively in iterations of MCMC procedure until convergence is reached. An important advantage of the previous representations based on the mixture of hierarchical models with ST distribution is that they are easily implemented using the freely available WinBUGS software [37] interacted with a function called bugs in a package R2WinBUGS of R. The other advantage is that when WinBUGS software is used to implement our modeling approach, it is not necessary to explicitly specify the full conditional posterior distributions for parameters to be estimated. Although their derivations are straightforward by working the complete joint posterior, some cumbersome algebra will be involved. We, thus, omit those here to save space.

3. Motivating data set and specifications of a mixture of joint models 3.1. An AIDS clinical study

Copyright © 2015 John Wiley & Sons, Ltd.

Statist. Med. 2015, 34 2820–2843

2827

The data set that motivated this research is from an AIDS clinical study [8]. This study was a randomized, open-label study comparing two different 4-drug regimens of lamivudine, zidovudine, and indinavir with

J. CHEN AND Y. HUANG

either efavirenz (Arm one) or nelfinavir (Arm two) with a standard 3-drug regimen of lamivudine, zidovudine, and indinavir (Arm three) for 517 subjects with no or limited previous experience with antiretroviral treatments (ART) who had a CD4 cell count ⩽200 cells/mm3 or a plasma HIV-1 RNA level ⩾ 80,000 copies/mL at screening and no or limited prior ARTs. The planned study duration was 72 weeks, which was subsequently increased to 96 weeks. The plasma HIV-1 RNA (viral load) is repeatedly quantified at weeks 0, 2, 4, 8, 16, and every 8 weeks until the last patient on study. Sixty-five subjects had no viral load measurements, and twenty-five subjects had less than three viral load measurements, which were excluded from this study. The remaining 427 subjects which the number of viral load measurements varies from 3 to 18 were included in data analysis. A log10 transformation of viral load was used in the analysis in order to stabilize the variation of the measurement errors and to speed up estimation algorithm. The CD4 and CD8 cell counts were also measured throughout study on a similar scheme. In the first 96 weeks of the study, 72% subjects have a confirmed virologic response (HIV-1 RNA level < 200 copies/mL) by week 24. The detailed data description can be found in [8]. Because viral load and CD4 count are important markers for evaluating virologic response and assessing immunologic response of an ART, respectively, and the time to decline of CD4/CD8 ratio characterizes immune restoration, our objective is to study the relationship of features of viral load and CD4 profiles with time to decrease in CD4/CD8 ratio based on the mixture of joint models. Because viral load is an important marker for assessing virologic response of an ART regimen, our objective is to develop a mixture of joint modeling for characterizing population-level and individuallevel viral load trajectories. As discussed in section 1, viral load trajectory profiles of three representative patients displayed in Figure 1(a) can be roughly classified into three classes in terms of the effect of treatment on viral load: for class 1, patient’s viral loads decrease rapidly and constantly in a short-term period (dotted line with solid dot sign) because of the fact that some patients withdrew too early to be clustered into either class 2 without viral load rebound or class 3 with viral load rebound below; for class 2, patient’s viral loads decrease at the beginning and then stay stable at a low level (solid line with circle sign). The classes 1 and 2 with suppression of plasma HIV-1 RNA levels indicate that the treatment can be thought successfully without serious clinical problems arising, suggesting a confirmed (short-term for class 1 or long-term for class 2) virologic response. While for class 3, patient’s viral loads decrease at the beginning, but they experience viral load increase, which results in viral load rebound eventually (dashed line with plus sign), implying a virologic failure. Along with these observations, we can reasonably assume that patients from a population consist of three relatively homogeneous classes. 3.2. Specifications of the mixture of joint models for HIV dynamics As is evident from Figure 1, the inter-patient variations in viral load appear to be large, and these variations change over time. Previous studies suggest that the inter-patient variation in viral load may be partially explained by time-varying CD4 cell count [11, 13, 17]. CD4 cell counts often have nonnegligible measurement errors, and ignoring these errors may lead to severely misleading results in statistical inference [31]. To model the CD4 covariate process conducted in this study, at the absence of theoretical rationale for the CD4 trajectories, we considered empirical polynomial LME models for the CD4 process specified by model (1), and choose the best model based on AIC and BIC values. Specifically, we consider the CD4 covariate model (1) with uij = vij = (1, tij , … , tijl−1 )T and focus on linear (l = 2), quadratic (l = 3), and cubic (l = 4) polynomials. Based on resulting AIC (BIC) values, we thus adopted the following quadratic polynomial LME model for the CD4 process. zij = (𝛼1 + ai1 ) + (𝛼2 + ai2 )tij + (𝛼3 + ai3 )tij2 + 𝜖ij ,

(19)

2828

where z∗ij = (𝛼1 + ai1 ) + (𝛼2 + ai2 )tij + (𝛼3 + ai3 )tij2 , 𝜶 = (𝛼1 , 𝛼2 , 𝛼3 )T is population (fixed-effects) parameter vector, ai = (ai1 , ai2 , ai3 )T is individual-specific (random-effects) vector with multivariate normal distribution N3 (𝟎, 𝚺a ). To avoid too small (large) estimates, which may be unstable, we standardize the time-varying covariate CD4 cell counts and rescale the original time (in days) so that the time scale is between 0 and 1. Following discussion by Lu and Huang [23], for the HIV dynamic application, we consider onecompartment and two-compartment models with constant decay rate(s) for trajectory classes 1 and 2, respectively, and propose a two-compartment model with a time-varying decay rate in the second compartment for trajectory class 3. Thus, the mean functions of K = 3 components in the mixture model are specified as follows. Copyright © 2015 John Wiley & Sons, Ltd.

Statist. Med. 2015, 34 2820–2843

J. CHEN AND Y. HUANG

(1) One-compartment model with a constant decay rate for class 1 trajectory ( ) g1 (tij , A1 𝜷 ij ) = log10 epi1 −𝜆i1 tij ;

(20)

(2) Two-compartment model with constant decay rates for class 2 trajectory ( ) g2 (tij , A2 𝜷 ij ) = log10 epi1 −𝜆i1 tij + epi2 −𝜆i2 tij ;

(21)

(3) Two-compartment model with constant and time-varying decay rates for class 3 trajectory ( ) g3 (tij , A3 𝜷 ij ) = log10 epi1 −𝜆i1 tij + epi2 −𝜆ij2 tij .

(22)

In (20)–(22), 𝛽i1 ≡ pi1 = 𝛽1 + bi1 , 𝛽i2 ≡ 𝜆i1 = 𝛽2 + 𝛽3 zi0 + bi2 , 𝛽i3 ≡ pi2 = 𝛽4 + bi3 , 𝛽i4 ≡ 𝜆i2 = 𝛽5 + bi4 , 𝛽ij5 ≡ 𝜆ij2 = 𝛽5 + 𝛽6 E(zij ) + 𝜑(tij ),

(23)

where 𝜷 ij = (𝛽i1 , 𝛽i2 , 𝛽i3 , 𝛽i4 , 𝛽ij5 )T , 𝜷 † = (𝛽1 , 𝛽2 , 𝛽3 , 𝛽4 , 𝛽5 , 𝛽6 )T , A1 = diag(1, 1, 0, 0, 0), A2 = diag (1, 1, 1, 1, 0) , A3 = diag(1, 1, 1, 0, 1), zi0 is the baseline CD4, E(zij ) = z∗ij + 𝛿1 [wij1 − J(𝜈1 + ni )] is the expected (but unobservable) value of CD4 at time tij according to formulation in (17), and z∗ij is defined in (19). The decay rate of the second compartment in (22), 𝛽ij5 , is time-varying because of E(zij ) and unknown nonparametric function 𝜑(tij ) ≡ w(tij ) + hi (tij ), but other parameters in 𝜷 ij are time independent. As mentioned previously, mean functions in different components may involve different subsets of 𝜷 ij ; for example, g1 (⋅) only involves parameters 𝛽i1 and 𝛽i2 , and g2 (⋅) and g3 (⋅) share the same parameters 𝛽i1 , 𝛽i2 , and 𝛽i3 but have different decay rates of the second compartment, 𝛽i4 and 𝛽ij5 , respectively. The diagonal indicator matrices, A1 , A2 , and A3 , determine which elements of 𝜷 ij are involved and set other unrelated parameters to be 0 in mean functions, g1 (⋅), g2 (⋅), and g3 (⋅), respectively. It is noted that (22) is a natural extension of (21) to employ a time-varying decay rate for capturing viral rebound in class 3 trajectories. It is noted that unlike traditional mixture models [34, 35], in which each component has the same family of densities but differs only in specific values of parameters such as in their means and/or variances, our mixture models specified here have completely different functional forms with parameters of different dimensions and meanings across the submodels. The specification of gk (.) in equations (20)–(22) (known and pre-specified) is thus motivated by ‘data driven’-based HIV viral load trajectories, where all patients were classified into three classes with shorter-term (class 1) or longer-term (class2) virologic suppression and virologic failure (i.e., viral load rebound specified by class 3), respectively, which is of main interest from a clinical prospective. We employ the linear combinations of natural cubic splines with percentile-based knots to approximate the nonparametric functions w(t) and hi (t). Following studies [11, 33], we set 𝜓0 (t) = 𝜙0 (t) ≡ 1 and take the same natural cubic splines in the approximations (3) with q ⩽ p. The values of p and q are determined based on the AIC/BIC, which suggest the following functions for 𝜑(tij ) with p = 3 and q = 1 in model (3): 𝜑(tij ) = w(tij ) + hi (tij ) ≈ 𝜇0 𝜓0 (tij ) + 𝜇1 𝜓1 (tij ) + 𝜇2 𝜓2 (tij ) + 𝜉i0 ,

(24)

( )T T where 𝝁3 = (𝜇0 , 𝜇1 , 𝜇2 )T and 𝜉i0 ≡ bi4 . Thus, we have 𝜷 = 𝜷 † , 𝝁T3 and bi = (bi1 , … , bi4 )T . Let ℑi be time to the first decline in the CD4/CD8 ratio for the ith subject. We are interested in the association of time to immune suppression of the individual-specific initial viral decay rates and the true CD4 trajectory, which are characterized by the random-effects in the viral load response and CD4 covariate models. We may view the (unobservable) random-effects as error-free covariates in time-to-event models. To focus on the primary issues discussed in the article, we consider the following conditional hazard rate of ℑi at time ti for time to first decline of the CD4/CD8 ratio. ( ( ) ) 𝜆(ti |di ) = 𝜆0 (ti ) exp 𝛾1 ai1 + 𝛾2 ai2 + 𝛾3 bi2 + 𝛾4 bi4 = 𝜆0 (ti ) exp 𝜸 T di , (25)

Copyright © 2015 John Wiley & Sons, Ltd.

Statist. Med. 2015, 34 2820–2843

2829

where 𝜸 = (𝛾1 , … , 𝛾4 )T is a vector of parameters, 𝜆0 (ti ) is an unspecified baseline hazard function. One of our objectives is to test if the time to rebound Ti depends on the random-effects di = (ai1 , ai2 , bi2 , bi4 )T that characterize the underlying individual-specific longitudinal processes. It is noted that the random-effects

J. CHEN AND Y. HUANG

ai1 and ai2 capture the main features of individual CD4 trajectories (such as initial intercept and slope), so they are included in the model; although ai3 in the MLE model (19) quantifies variation of individual CD4 trajectories through a quadratic form, which seems not highly predictive for event times, it is excluded from model (25) to reduce number of parameters; the random-effects bi2 and bi4 represent individual variations in the first- and second-phase viral decay rates, respectively, so they may be predictive for event times. While bi1 and bi3 represent variations in the baseline viral loads, they do not appear to be highly predictive of event times, so they are excluded from the model to reduce the number of parameters. Given the CD4 and CD8 evaluation scheme, the times to decline of the CD4/CD8 ratio in the analysis were interval-censored. The median of the observed interval length was 34 days, while all subjects in the study experienced a CD4/CD8 decrease. Model (25) offers the following advantages: (i) the random-effects in the covariate model summarize the history of the covariate process, with individual-specific intercepts and rates of change, and the summary quantities are likely better predictors than the covariates at several particular times; (ii) the random-effects in the response model summarize individual variations in the first-phase and second-phase viral decay rates to predict time to immune suppression; and (iii) the link between the three models is made clear by the shared random-effects.

4. Analysis of AIDS data 4.1. Model implementation As discussed previously, the viral loads (in log10 scale) and CD4 cell counts described in section 3.1 exhibited the asymmetric feature. Thus, it seems plausible to fit models with a skew distribution to the data. Toward this end, the following three statistical models with specifying different distributions for model error in the finite mixture of joint models were employed to compare their performance: • Model T: A mixture of joint model with the t-distribution for response and covariate model errors where the three mean functions are specified by (20–22). • Model ST: A mixture of joint model with the ST distribution for response and covariate model errors where three mean functions are specified by (20–22). • Model C: A common joint model with the ST distribution for response and covariate model errors where viral load response model with the mean function specified by (22) alone .

2830

We conducted the following three scenarios. First, because a t-distribution is a special case of an ST distribution when the skewness parameter is zero, we investigated how asymmetric distribution for model error (Model ST) contribute to modeling results and estimation in comparison with a symmetric (t) distribution for model error (Model T). Second, we investigated a commonly used joint model with ST distribution (Model C) where the mean function in viral load response model is specified by (22) alone. This SME model for viral load response has been widely applied to analyze viral load data [11, 33] in significantly advancing the understanding of pathogenesis of HIV infection and the assessment of effectiveness of ART. We compared this commonly used joint model (ignoring data feature of heterogeneous population) with the mixture of joint model proposed in this article to explore how heterogeneous feature influences modeling results. We further estimated the model parameters by using the ‘naive’ method (NM), which ignores measurement error in CD4 covariate. That is, we modified Model ST to substitute the expected (unobservable) CD4 value E(zij ) by the observed CD4 value zij in (23). We used it as a comparison to the joint modeling (JM) approach proposed in section 3. This comparison attempted to investigate how the measurement error in CD4 cell count contribute to modeling results. To carry out the Bayesian inference, we took weakly informative prior distributions for the parameters. In particular, (i) fixed-effects were taken to be independent normal distribution N(0, 100) for each element of the population parameter vectors 𝜶, 𝜷, and 𝜸; (ii) we assume a noninformative IG prior distribution IG(0.01, 0.01), which has mean 1 and variance 100, for variance parameters 𝜎12 and 𝜎22 ; (iii) the priors for the variance–covariance matrices of the random-effects 𝚺a and 𝚺b were taken to be IW distributions IW(𝛀1 , 𝜂1 ) and IW(𝛀2 , 𝜂2 ), where the diagonal elements for diagonal variance matrix 𝛀1 and 𝛀2 were 0.01, and 𝜂1 = 𝜂2 = 4; (iv) the degrees of freedom parameter 𝜈1 and 𝜈2 followed truncated exponential distribution Exp(0.5)I(𝜈 > 2); (v) for the skewness parameters 𝛿1 and 𝛿2 , we chose independent normal distributions N(0, 100); and (vi) we set hyper-parameters of Dirichlet distribution in (14), 𝜙1 = 𝜙2 = 𝜙3 = 1, assuming individuals have equal probabilities of coming from any one of three classes initially. The MCMC sampler was implemented using WinBUGS software [37] interacted with a function called bugs in a package R2WinBUGS of R, and the program codes for Model ST are available in Appendix B. Copyright © 2015 John Wiley & Sons, Ltd.

Statist. Med. 2015, 34 2820–2843

J. CHEN AND Y. HUANG

Figure 2. Convergence diagnostics with three Markov chains as obtained from the WinBUGS software for representative parameters based on Model ST: (a) Gelman–Rubin (GR) diagnostic plots (top panel). At the initial stage, the middle (green) and bottom (blue) curves below the dashed horizontal line (indicated by the value 1) represent ̂ and average within-sample variance (W), ̂ respectively, and the top (red) curve the pooled posterior variance (V) ̂ above the dashed horizontal line represents their ratio (R). (b) Autocorrelation plots (bottom panel).

When the MCMC procedure was applied to the actual clinical data, convergence of the generated samples was assessed using standard tools within WinBUGS software such as trace plots and Gelman–Rubin (GR) diagnostics [38]. Figure 2 shows the autocorrelation and dynamic version of GR diagnostic plots based on Model ST for the representative parameters 𝛽3 , 𝛽6 , and 𝛿2 . For the dynamic plots of GR diagnostics(left panel), where the three curves are given, the middle and bottom curves below the dashed horizontal ̂ and average within-sample line (indicated by the value one) represent the pooled posterior variance (V) ̂ respectively, and the top curve represents their ratio (R). ̂ It is seen that R̂ is generally variance (W), ̂ stabilize expected to be higher than one at the initial stage of the algorithms, but R̂ tends to 1, and V̂ and W as the number of iterations increase, indicating that the algorithm has approached convergence. We further monitor convergence using autocorrelation plots (right panel) that autocorrelations are very low with a lag being 50, implying that convergence is obtained. Along with this convergence diagnostics observed, we proposed that, after an initial number of 50,000 burn-in iterations of three chains of length 100,000, every 50th MCMC sample was retained from the next 50,000 for each chain. Thus, we obtained a total of 3000 samples of targeted posterior distributions of the unknown parameters for statistical inference. The Bayesian joint modeling approach in conjunction with the three component (the viral load response, the CD4 covariate, and the time to first decline of CD4/CD8 ratio) processes with different scenarios was applied to fit the AIDS data described in section 3.1 and to estimate both the parameters in the mixture of joint models and the probabilities of class memberships. In what follows, we report the results based on the three scenarios highlighted previously. 4.2. Comparison of modeling results

Copyright © 2015 John Wiley & Sons, Ltd.

Statist. Med. 2015, 34 2820–2843

2831

From the model fitting results, we have seen that, in general, all the models provided a reasonably good fit to the observed data for most patients in our study, although the fitting for a few patients (< 7%) was not completely satisfactory because of unusual patterns of viral load fluctuation for these patients, particularly for Model T. Figure 3 displays the three representative individual estimates of viral load trajectories obtained by using the JM approach based on Models ST, T, and C. The following findings are observed from JM results. (i) The estimated individual trajectories for Models ST and C fit the originally observed values more closely than those for Model T, in particular, for the subjects with viral load rebound, but Model ST performs slightly better than Model C. (ii) Overall, the 95% credible intervals (CI, not shown in the plot) associated with fitted value from Model T are generally wider than the corresponding 95% CI from Models ST and C; all the 95% CIs from Models ST and C cover the observed viral load values, while some of 95% CIs from Model T do not. For example, for patient 8, the observed viral load value in log10 scale at day 292 is 3.892, the corresponding 95% CIs for Models ST and C are (3.788, 4.297) and (3.702,4.213) with the fitted values 4.003 and 4.023, respectively, while the corresponding 95% CI for Model T is (3.927, 4.575) with the fitted value 4.207.

J. CHEN AND Y. HUANG

Figure 3. Individual fitted curves of viral load for three representative patients based on Models ST (solid lines), T (dash lines), and Model C (dotted lines). The observed values are indicated by circle “◦”.

Figure 4. Goodness-of-fit: (a) fitted values versus observed values of log10 (RNA) (top panel). (b) ST and normal Q-Q plots with line (bottom panel).

2832

To assess the goodness-of-fit of the three models based on the JM approach, the diagnosis plots of the fitted values versus the observed values (top panel) and ST and t Q-Q plots (bottom panel) from Models ST, T, and C are presented in Figure 4. It was seen from Figure 4 (top panel) that Model T shows much worse fit to the observed viral load data, compared with Models ST and C for which Model ST performs slightly better than Model C. This finding can also be explained by examining the Q-Q plots of the residuals (bottom panel) that all plots show the existence of outliers, but it is clearly seen that Model T has much more outliers than Models ST and C do, which thus, provide a better goodness-of-fit to the data than Model T. The population posterior mean (PM), the corresponding standard deviation (SD), and 95% CI for fixed-effects parameters based on the three models with two methods (i.e., JM and NM) are presented in Tables I and II. The following findings are observed for estimated results of parameters. (i) In the mixture of response model with the three components (20)–(22), the findings show that these estimates are different from zero for the three models because the 95% CI do not contain zero except for 𝛽5 . In particular, Copyright © 2015 John Wiley & Sons, Ltd.

Statist. Med. 2015, 34 2820–2843

2833

Copyright © 2015 John Wiley & Sons, Ltd.

Naive method

Joint modeling

Method

12.18 11.96 12.34 0.096 12.33 11.90 12.82 0.292 12.33 12.06 12.74 0.216

PM LCI UCI SD PM LCI UCI SD PM LCI UCI SD

T

C

ST

12.09 11.29 13.75 0.501

PM LCI UCI SD

𝛽1

ST

Model

156.8 136.5 184.8 16.69

158.0 133.3 192.0 21.45

147.5 136.7 159.2 5.883

131.7 80.16 185.3 38.93

𝛽2

−33.73 −44.47 −23.44 5.349

−31.97 −49.36 −21.03 8.546

−34.70 −49.21 −15.71 9.900

−29.40 −46.28 −15.86 8.541

𝛽3

4.262 3.707 4.889 0.388

4.757 4.289 5.169 0.260

4.448 3.779 5.011 0.352

4.303 3.944 4.962 0.324

𝛽4

−0.494 −1.105 0.064 0.248

0.693 −0.841 1.845 0.777

−0.315 −1.187 0.466 0.432

−0.885 −2.077 −0.049 0.659

𝛽5

4.476 0.528 7.092 2.697

−0.036 −0.198 0.123 0.080

−1.668 −2.604 −0.775 0.450

−0.327 −1.579 −0.076 1.092

𝛽6

−0.508 −0.581 −0.439 0.037

−0.501 −0.583 −0.444 0.034

−0.516 −0.584 −0.449 0.035

−0.507 −0.572 −0.438 0.035

𝛼1

2.314 2.058 2.579 0.134

2.466 2.301 2.643 0.091

2.401 2.168 2.629 0.121

2.404 2.218 2.586 0.094

𝛼2

−1.509 −1.833 −1.205 0.166

−1.778 −2.023 −1.573 0.120

−1.656 −1.931 −1.382 0.147

−1.689 −1.967 −1.424 0.138

𝛼3

0.034 0.026 0.040 0.005

0.013 0.012 0.015 0.001

0.033 0.031 0.035 0.001

0.022 0.014 0.037 0.095

𝜎12

0.092 0.011 0.139 0.056

0.057 0.010 0.151 0.062

0.129 0.119 0.142 0.006

0.052 0.006 0.143 0.060

𝜎22

Table I. Summary of estimated posterior mean (PM) of population (fixed-effects) parameters, the corresponding standard deviation (SD), and lower limit (LCI ) and upper limit (UCI ) of 95% equal-tail credible interval (CI).

J. CHEN AND Y. HUANG

Statist. Med. 2015, 34 2820–2843

J. CHEN AND Y. HUANG

Table II. Summary of estimated posterior mean (PM) of population (fixed-effects) parameters, the corresponding standard deviation (SD), and lower limit (LCI ) and upper limit (UCI ) of 95% equal-tail credible interval (CI) as well as deviance information criterion (DIC) and expected predictive deviance (EPD) values. Method Joint modeling

Naive method

Model

𝛾1

𝛾2

𝛾3

𝛾4

𝛿1

𝛿2

DIC

EPD

ST

PM LCI UCI SD

0.162 0.018 0.308 0.074

−0.084 −0.141 −0.026 0.030

0.003 −0.131 0.151 0.061

−0.012 −0.067 0.043 0.027

0.083 0.005 0.133 0.061

0.325 0.006 0.517 0.225

13916.4

0.359

T

PM LCI UCI SD

0.155 0.013 0.300 0.072

−0.083 −0.142 −0.024 0.030

0.002 −0.100 0.113 0.051

−0.011 −0.049 0.028 0.025

— — — —

— — — —

18573.6

0.930

C

PM LCI UCI SD

0.160 0.016 0.305 0.076

−0.081 −0.142 −0.023 0.030

0.012 −0.108 0.181 0.062

−0.014 −0.058 0.038 0.024

0.152 0.143 0.162 0.006

0.291 0.002 0.478 0.205

15207.3

0.394

ST

PM LCI UCI SD

0.142 0.004 0.282 0.071

−0.078 −0.133 −0.025 0.028

0.001 −0.156 0.148 0.066

−0.011 −0.073 0.040 0.030

0.016 0.009 0.060 0.027

0.144 0.021 0.467 0.220

15804.8

0.613

2834

for the parameters that are of interest, the estimated 𝛽3 with negative value (the coefficient of baseline CD4 in the first-phase viral decay rate 𝜆1 ) from Model ST is slightly larger than those from Models T and C, while 𝛽6 (associated with the second-phase viral decay rate 𝜆2 ) are estimated quite differently. The estimated value of the scale parameter for Model ST (𝜎22 = 0.052) is twice smaller than that of Model T (𝜎22 = 0.129) because the former model takes into account skewness of the data while the latter does not, but the estimated 𝜎22 are slightly different (0.052 and 0.057) between Models ST and C. The estimate of the skewness parameter (𝛿2 ) from Models ST and C are 0.325 with 95% CI (0.006, 0.517) and 0.291 with 95% CI (0.004, 0.478). This finding suggests that there is a significantly positive skewness in the data and confirms the fact that the distribution of the viral load data is skewed even after taking log10 transformation. Thus, incorporating a skewness parameter in the modeling of the data is recommended. (ii) For parameter estimates of the CD4 covariate model (19), the intercept 𝛼1 and coefficient 𝛼3 are negatively estimated, and the coefficient 𝛼2 is positively estimated, but their estimates are qualitatively similar among the three models. It is also seen that the distribution of the CD4 data appears to be slightly skewed based on the estimate of the skewness parameter (𝛿1 ). (iii) For the estimates of parameters in the timeto-event model (25), all parameter estimates appear to be similar to each other for the three models; the estimated results show that time to CD4/CD8 decrease is highly associated with the CD4 changing rates over time because of significant estimates of 𝛾1 and 𝛾2 , but it is not directly associated with the two viral decay rates over time because of insignificant estimates of 𝛾3 and 𝛾4 . The Bayesian joint inference based on MCMC procedure has made it possible to fit increasingly complex statistical models and entailed the wish to determine the best fitting model in a class of candidates. A Bayesian selection criterion, deviance information criterion (DIC) suggested by Spiegelhalter et al. [39], can be used to select the best model that fits the data adequately. As with other model selection criteria such as AIC and BIC, we caution that DIC is not intended for identification of the ‘correct’ model, but rather merely as a method of comparing a collection of alternative formulations. } {∑ As an alternative, 2 (y − y ) we also evaluate expected predictive deviance (EPD) formulated by EPD = E obs,ij i,j rep,ij for model comparison [40], where the predictive value yrep,ij is a replicate of the observed yobs,ij and the expectation is taken over the posterior distribution of the model parameter 𝜽. This criterion chooses the model where the discrepancy between predictive values and observed values is the lowest. We calculate the estimated DICs based on Models ST, T, and C, which are 13916.4, 18573.6, and 15207.3, respectively. As mentioned previously, it is difficult to tell which model is ‘correct’, only which one fits the data better. Furthermore, the model that fits the data better may be more accurate to describe the mechanisms of HIV infection and the CD4 changing process, so it better informs patient treatment. Therefore, Copyright © 2015 John Wiley & Sons, Ltd.

Statist. Med. 2015, 34 2820–2843

J. CHEN AND Y. HUANG

based on the DIC, the results indicate that Model ST is the best fitting model and Model C is next, supporting the contention of a departure from normality. This finding is confirmed by the results of the EPD values (Table II). These results are consistent with those in diagnosis of the goodness-of-fit displayed in Figure 4, indicating that Model ST performs best. In summary, our results may suggest that it is important to assume a skewed distribution for the viral load response and CD4 covariate models in order to achieve reliable results, in particular if the data exhibit non-normality. Based on these findings, in what follows, we will further report our results in detail only for the best Model ST. 4.3. Analysis results based on Model ST

Copyright © 2015 John Wiley & Sons, Ltd.

Statist. Med. 2015, 34 2820–2843

2835

In order to investigate how the measurement error in CD4 contributes to modeling results, we employed the NM based on Model ST, which ignores measurement error in CD4 cell count, to estimate the model parameters. It can be seen from Tables I and II that there are important differences in the estimates, in particular, for the parameters 𝛽5 and 𝛽6 , which are directly associated with whether or not ignoring potential CD4 measurement error for inference. The NM may substantially overestimate the covariate CD4 effect (𝛽6 ). The estimated SD for the CD4 effect (𝛽6 ) using the JM method is much smaller than its counterpart using NM (1.092 vs. 2.697). The difference of the ‘naive’ estimates and the joint modeling estimates, due to whether or not potential CD4 measurement error is ignored for inference, indicates that CD4 measurement errors should not be ignored in the analysis. We also obtained estimated DIC values with 13916.4 (JM) and 15804.8 (NM) based on Model ST, respectively, suggesting that the JM approach performs better in comparison with the NM. Thus, it is important to take the CD4 measurement errors into account when collected data are ‘imperfectly’ measured. We further investigated a commonly used SME joint model (Model C, where data feature of heterogeneous population is ignored), in which the mean function is specified by (22) alone. We compared it with the mixture of SME joint model (Model ST) to explore how heterogeneous feature influences modeling results. We found the important differences in the estimates of parameters, in particular for (𝛽5 , 𝛽6 ), which are associated with the second-phase viral decay rate. It is noted that one advantage of the mixture of joint model over the commonly used joint model is its flexibility to provide not only estimates of all model parameters but also evaluate class membership probabilities at both population and individual levels, which is helpful for clinicians to develop individualized treatment. The estimated results based on Model ST in Table II indicate that the skewness parameters in viral load (0.325) and CD4 cell count (0.083) are estimated to be significantly positive. This indicates the skewness with heavy right tail of the viral load and fairly right tail of the CD4 cell count. Thus, it may suggest that accounting for a mixture of joint models with the ST distribution provides a better fit to the data, which exhibits skewness and, in turn, gives more reliable estimates of the parameters. As mentioned in section 1, one of the primary objectives in this analysis was to cluster all individuals’ membership into three classes based on viral load trajectories. Based on the mixture joint modeling, we are able to obtain a summary of class membership at both the population and individual levels. At population level, the MCMC procedure yields samples from the posterior distribution of (𝜋1 , … , 𝜋K ), K = 3, in (16), the population proportion of individuals in each class. The estimates of population proportion and associated 95% CI of (𝜋1 , 𝜋2 , 𝜋3 ) for three classes are 5.30% (3.26%, 7.46%), 57.47% (54.86%, 60.17%), and 37.23% (34.57%, 39.84%), respectively. It can be seen that class 2 (decrease and maintain stability) has the largest proportion 57.47%, followed by class 3 (decrease and rebound later) with proportion 37.23%, and then class 1 (decrease all the time) with the lowest proportion 5.30%. Thus, out of 427 patients, the patterns of changing viral load of 23, 245, and 159 patients followed classes 1, 2, and 3, respectively. This indicates that a confirmed short-term (class 1 due to early dropout) and long-term (class 2) virologic responses were observed in a total of 62.76% of the patients in classes 1 and 2. At individual level, the posterior probability of∑individual i belonging to the kth (k = 1, 2, 3) class, M pik = E[I(ci = k)], can be approximated by M1 m=1 I(c(m) = k), where c(m) is class membership of i i individual i drawn from the posterior distribution (15) in the mth MCMC iteration (m = 1, ..., M) and M is a total number of posterior sample iterations (M = 3000 in this study). Barplot shown in Figure 5 displays the probabilities for the selected 20 individuals (from the 1st to the 20th subjects). The probability corresponding to individual patient who is classified as either viral load rebound or not may help physicians to refine treatment strategy and to identify the possible reasons of viral load rebound for such individual patient. As examples, for the three patients shown in Figure 1 , the patient 112 belongs to class 1 because the viral load decreases constantly in an early short-term period with probability 88.5%;

J. CHEN AND Y. HUANG

Figure 5. Posterior probabilities of belonging to three trajectory classes for the first 20 individuals based on Model ST.

the viral load of the patient 18 decreases and then maintain stable, and thus, this patient belongs to class 2 with probability 97.1%; and finally, the patient 7 is in class 3 (indicating viral load rebound) with probability 89.3%. The estimated results of fixed-effects presented in Table I based on Model ST indicate that the firstphase decay rate and the second-phase decay rate without and with time-varying CD4 covariate may be approximated by 𝜆̂ 1 = 131.7 − 29.4z0 , 𝜆̂ 2 = −0.885, and 𝜆̂ 2 (t) = −0.885 − 0.327(−0.505 + 2.404t − 1.689t2 ), respectively, where z0 is the standardized baseline{CD4 value. } Thus, the population viral load { } ̂ 1 (t) = exp 12.09 − 𝜆̂ 1 t , V̂ 2 (t) = exp 12.09 − 𝜆̂ 1 t + processes of 3 classes may be approximated by V } { } { } { exp 4.303 − 𝜆̂ 2 t , and V̂ 3 (t) = exp 12.09 − 𝜆̂ 1 t + exp 4.303 − 𝜆̂ 2 (t)t . Because the first-phase viral decay rate, 𝜆1 , is significantly associated with the baseline CD4 cell count (because of significant estimate of 𝛽3 ) and the second-phase time-varying viral decay rate 𝜆2 (t) in the component three is significantly associated with the time-varying CD4 values (because of significant estimate of 𝛽6 ), this suggests that the viral load change V(t) may be significantly associated with both the baseline CD4 and time-varying CD4 values. This simple approximation considered here may provide a rough guidance and point to further research even though the true association described earlier may be more complicated. In addition, the estimates of the parameters in the event time model (25) indicate that the time to first decline of CD4/CD8 ratio is highly associated with the CD4 changes (related to random effects ai1 , ai2 ) over time, but it is not directly associated with the two viral decay rates (related to random effects bi2 and bi4 ) over time. This may be explained by the fact that the higher standardized baseline CD4 (𝛾1 = 0.162) and the slower CD4 (linear) changing rate (𝛾2 = −0.084) result in longer time to CD4/CD8 decrease, which may lead to viral rebound.

5. Simulation studies

2836

To assess the performance of the proposed mixture of joint models and methods, as an illustration, we conducted the following simulation studies where Models ST, T, and C are included and the JM and NM methods are compared. To simulate a heterogeneous population, 40, 180, and 180 response trajectory mean values, out of 400 subjects, are generated based on the three functions in (20)–(22), respectively. We generated 500 datasets from specified models according to the additional specifications described in the succeeding discussions. The measurement time points used in the simulation are similar to those in the real data analysis, and the true parameter values are mimic to those obtained in the example. 𝜷 = (𝛽1 , … , 𝛽6 )T = (12.1, 131.7, −29.4, 4.3, −0.9, −0.3)T , 𝜶 = (𝛼1 , 𝛼2 , 𝛼3 )T = (−0.5, 2.4, −1.7)T , 𝜸 = (𝛾1 , … , 𝛾4 )T = (0.2, −0.1, 0.01, −0.01)T , 𝚺b = diag(1.5, 6.6, 3.2, 9.9). The time-varying CD4 covariate zij are simulated from equation z∗ij = (−0.5 + ai1 ) + (2.4 + ai2 )tij + (−1.7 + ai3 )tij2 , with (ai1 , ai2 , ai3 )T ∼ N(𝟎, diag(1.0, 0.5, 0.6)) and 𝝐 ij following t-distribution t3 (0, 0.02) for convenient implementation. An advantage of the ST distribution for the model errors is its propensity for accommodating skewness. In the simulation, we generated eij according to Γ(2, 1) distribution, yielding a skewed distribution with E(eij ) = 2 and Var(eij ) = 2. Under this specification, data generated from mixture model with Γ(2, 1) Copyright © 2015 John Wiley & Sons, Ltd.

Statist. Med. 2015, 34 2820–2843

J. CHEN AND Y. HUANG

Table III. Summary of estimated 𝜷 and skewness parameter as well as MSE for Models ST, T, and C based on 500 generated data sets, Γ(2, 1) distribution for the model error based on the joint modeling (JM) approach and the naive method (NM). Mean and standard deviation (SD) denote averages of the Markov chain (MC) posterior mean and ) deviation, respectively; mean squared error (MSE) is quantified by percent √ √ MC standard MSE = 100 × MSEl ∕|𝛽l | . JM True value

𝛽1 𝛽2 𝛽3 𝛽4 𝛽5 𝛽6 𝛿

Model T Mean

SD

NM

Model C MSE

Mean

12.1 12.77 0.25 6.12 131.7 142.5 7.22 7.51 −29.4 −23.65 3.49 20.23 4.3 5.57 0.53 9.70 −0.8 −1.41 0.98 33.12 −0.3 −0.63 0.77 7.65 — — —

12.25 139.9 −25.61 5.49 −1.12 −0.52 1.07

SD

Model ST MSE

Mean

0.24 1.92 5.89 3.46 3.01 15.72 0.42 5.39 0.87 13.14 0.47 4.56 0.24 —

12.32 132.7 −27.85 4.41 −0.86 −0.31 1.27

SD

Model ST MSE

Mean

SD

MSE

0.21 2.33 5.57 3.17 2.87 14.8 0.39 5.23 0.86 11.0 0.45 4.91 0.38 —

12.49 135.4 −31.87 4.75 −1.04 −0.45 1.16

0.27 5.10 2.98 0.40 0.91 0.48 0.39

6.01 4.78 18.34 7.80 20.1 5.81 —

distribution may exhibit highly skewed feature. Note that the prior distributions considered are all close to non-informative as similarly treated in real data analysis. Thus, we expect the results to be somewhat robust with respect to prior distributions. For evaluating the objective use of the criteria, the models preferred by DIC and EPD were recorded. For example, in the MCMC sampling result, none of DIC selected the t-distribution specification for any of the 500 data sets, demonstrating the ability of selection method to detect an obvious departure from symmetry and suggesting strong evidence of skewness. Table III shows the numerical results for the estimates of parameters 𝛿2 and 𝜷 only (𝜶 and 𝜸 are not shown here to save space, but they appear to have similar interpretation to what 𝜷 was observed). It is seen from Table III that the estimates of 𝛽1 and 𝛽4 were similar among the three fitted models and the two methods, while the estimates of the other parameters tend to be significantly biased when routinely adopting a (symmetric) t-distribution. MC means of the skewness parameters of Models ST and C are 1.27 and 1.07, respectively, indicating a departure from (symmetric) t-distribution. We found that √ Models ST and C work reasonably well in the current setting in terms of MSE (quantified by 100 × MSEl ∕|𝛽l |, l = 1, … , 6), while Model T may lead to large biases and MSEs for most of the parameters. Interestingly, for JM method we found that the MC SDs in Model T (symmetric) are larger than their counterparts in Models ST and C (asymmetric), and the MC SDs in Model ST is smaller than those in Model C , indicating that the model with longer tails and heterogeneous data feature may produce more precise estimates and the efficiency of estimation on 𝛽2 , 𝛽3 , 𝛽5 , and 𝛽6 associated with two phase decay rates is degraded (higher MC SD) when symmetric distribution for model error is assumed (Model T) or heterogeneous data feature is ignored (Model C) relative to allowing a more flexible representation via both ST distributions and heterogeneous data feature (Model ST). This suggests that adopting the symmetric (t) distribution assumption or ignoring heterogeneous data feature may lead to inaccurate and inefficient inference on fixed-effects of primary interest, in particular, when data exhibit asymmetric and/or heterogeneous features. We can also see from the simulation results in Table III that NM may lead to severely biased estimates and large MC SD and MSEs for all parameters in comparison with JM based on Model ST, in particular, for 𝛽5 and 𝛽6 , which are closely related to whether measurement errors in CD4 cell count, are ignored or not. Therefore, it is important to model the data incorporating measurement errors in CD4 covariate into account.

6. Discussion

Copyright © 2015 John Wiley & Sons, Ltd.

Statist. Med. 2015, 34 2820–2843

2837

With more studies being conducted that repeatedly take measures over time in an effort to evaluate a patient’s health status for some events, a mixture joint modeling approach is a powerful tool to fit these complicated longitudinal–survival models with a variety of data issues ranging from evaluating heterogeneity, non-normality, measurement error, and others, along with uncertainty about the distributional assumptions of model errors in longitudinal studies. The mixture of hierarchical joint models and

J. CHEN AND Y. HUANG

2838

associated inferential approaches proposed in this article have several potential advantages. First, our mixture of joint models was constructed by three sub-model components: the finite mixture of semiparametric mixed-effects (SME) model with skewed distributions and heterogeneity for response, the LME model for covariate with skewed distributions and measurement error, and the Cox model for time-toevent of interest linked through the random-effects that characterize the underlying individual-specific longitudinal processes. Second, our mixture of joint models is useful when there is uncertainty about the distributional assumptions of model errors and population heterogeneity. Along with this line, we investigated how the mixture of joint models with an asymmetric (ST) distribution contributes to inferential results in comparison with that with a symmetric (Student’s t-test) distribution for model error, how the mixture of SME joint modeling is performed by comparing commonly used SME joint model and how the mixture of joint model-based JM approach influences modeling results in comparison with its counterpart-based NM, which measurement error in CD4 covariate is ignored. Third, a major deficiency to date in the mixture of joint modeling literature as we see is the lack of comparison between the mixture of joint models (Model ST) and commonly used joint models (Model C). Our comparison analysis indicates that Model C can be offered to validate Model ST. If, in fact, Model ST shows improvement over Model C, we can be more confident that the heterogenous assumption is valid, which is the case found in this article. Finally, an advantage of our mixture joint models-based approach is to provide not only all three sub-model parameter estimates but also model-based probabilistic clustering to obtain class membership probabilities at both population and individual levels. In addition, the proposed mixture joint modeling approach can be easily implemented using the publicly available WinBUGS software interacted with R. This makes our models and methods quite useful and accessible to practicing statisticians in the fields. In the presence of skewness in the longitudinal data along with heterogenous response and covariate measurement error, an appropriate statistical inference for HIV dynamics is important for making robust conclusions and reliable clinical decisions. The research findings in this article indicate that Model ST may be favorable to Model T and/or Model C by the fact that there is potential to gain efficiency and accuracy in estimating certain parameters while adjusting for non-normality and heterogeneity in the data. For viral load response, the skewness parameter 𝛿2 and the degrees of freedom 𝜈2 based on Model ST are estimated as 0.325 and 3.01. This confirms the positive skewness with heavy tail of the viral load, suggesting that accounting for significant skewness and heaviness in the tails is required to model the data when data exhibit skewness with heavy tails. For the CD4 covariate model, the estimate of skewness parameter 𝛿1 and the degrees of freedom 𝜈1 based on Model ST are 0.083 and 2.02, indicating slightly positive skewness with heavy tail of the CD4 cell count. This may be explained by the fact that an additional parameter 𝜈1 for heaviness in the tails was estimated to have 2.02 lessening the effect of skewness. In addition, the estimated results of the parameters in the time-to-event model (25) based on (best) Model ST suggest that the time to first decline of CD4/CD8 ratio is highly associated with the CD4 changing rates over time, but it is not directly associated with the two viral decay rates over time, which is different from what was anticipated and, thus, a further research may be needed on this issue. The two issues are noted as follows. First, there are certain limitations in our study which are briefly listed below. (i) This article is not intended to consider nonignorable missing data and data with a limit of detection problems. (ii) The skewness parameters in both covariate and response models were held as constant across time and thus apply the skewness only to the residuals; Muthén and Asparouhov [21] showed the shortcomings of this limitation in their real data analysis and through simulation study. (iii) In the time-to-event model (8), we assumed that the hazard depends on random-effects whose value is timeindependent during follow-up. However, it may also be of interest to investigate whether time-dependent covariates are associated with the risk for an event which is referred to review the book published by Rizopoulos [5] in detail on how time-dependent covariates can be handled in practice. These complicated problems with multiple data features considered are, however, beyond the focus of this article, but a further study may be warranted. Second, similar to studies by Beunckens et al. [41] and Muthén et al. [21, 42], the time-to-event models discussed in this article can be extended to incorporate latent class variable as covariate. The study of such extended time-to-event models are interesting and may be important. We are actively investigating these and hope that the relative results could be reported in the near future. Copyright © 2015 John Wiley & Sons, Ltd.

Statist. Med. 2015, 34 2820–2843

J. CHEN AND Y. HUANG

Appendix A: Multivariate skew-t distributions Different versions of the multivariate skew-t (ST) distribution have been proposed and used in the literature [15, 16, 18, 19, 27–30]. A new class of distributions by introducing skewness in multivariate elliptically distributions were developed in publication [19]. The class, which is obtained by using transformation and conditioning, contains many standard families including the multivariate ST distribution. Because the SN distribution is a special case of the ST distribution when degrees of freedom are large, for completeness, this Appendix briefly summarizes the multivariate ST distribution that will be used in defining the finite mixture of NLME models with ST distribution considered in this article. Assume a k-dimensional random vector Y follows a k variate ST distribution with location vector 𝝁, k × k positive (diagonal) covariance matrix 𝚺 and k×k skewness matrix 𝚫 = diag(𝛿1 , 𝛿2 , … , 𝛿k ) with vector of skewness parameter 𝜹 = (𝛿1 , … , 𝛿k )T and degrees of freedom 𝜈. In what follows, we briefly discuss the multivariate ST distribution introduced by Sahu et al. [19], which is suitable for a Bayesian inference because it is built using the conditional method. For detailed discussions on properties of ST distribution, refer to Reference [19]. An k-dimensional random vector Y follows a k-variate ST distribution if its probability density function is given by f ( y|𝝁, 𝚺, 𝚫, 𝜈) = 2k tk,𝜈 ( y|𝝁, A)P(V > 𝟎), (A.1) where A = 𝚺 + 𝚫2 , we denote the k-variate t-distribution with parameters 𝝁, A, and degrees of freedom 𝜈 by tk,𝜈 (𝝁, A) and the corresponding probability density function by tk,𝜈 ( y|𝝁, A) henceforth, V follows the t-distribution tk,𝜈+k . We denote this distribution by STk,𝜈 (𝝁, 𝚺, 𝚫). In particular, when 𝚺 = 𝜎 2 Ik and 𝚫 = 𝛿Ik , equation (A.1) simplifies to {

}−(𝜈+k)∕2 ( y − 𝝁)T ( y − 𝝁) f ( y|𝝁, 𝜎 , 𝛿, 𝜈) = 2 (𝜎 + 𝛿 ) 1+ 𝜈(𝜎 2 + 𝛿 2 ) [{ ] }−1∕2 𝛿( y − 𝝁) 𝜈 + (𝜎 2 + 𝛿 2 )−1 ( y − 𝝁)T ( y − 𝝁) × Tk,𝜈+k , √ 𝜈+k 𝜎 𝜎2 + 𝛿2 2

k

2

2 −k∕2

Γ((𝜈 + k)∕2) Γ(𝜈∕2)(𝜈𝜋)k∕2

(A.2)

where Tk,𝜈+k (⋅) denotes the cumulative distribution function (c.d.f.) of tk,𝜈+k (𝟎, Ik ). However, unlike in the SN distribution to be discussed in the following, the ST density can not be written as the product of univariate ST densities. Here, Y is dependent but uncorrelated. It is noted that when 𝜹 = 𝟎, the ST distribution reduces to usual It can be shown that the mean and covariance matrix of ) ( the t-distribution. the ST distribution STk,𝜈 𝝁, 𝜎 2 Ik , 𝚫 are given by Γ((𝜈 − 1)∕2) 𝜹, Γ(𝜈∕2) [ ]2 [ ] 𝜈 𝜈 Γ{(𝜈 − 1)∕2} cov(Y) = 𝜎 2 Ik + 𝚫2 − 𝚫2 . 𝜈−2 𝜋 Γ(𝜈∕2) E(Y) = 𝝁 + (𝜈∕𝜋)1∕2

(A.3)

Following the results provided by Sahu et al. [19], the ST distribution of Y has a convenient stochastic representation as follows. (A.4) Y = 𝝁 + 𝚫|X0 | + 𝚺1∕2 X1 , where X0 and X1 are two independent random vectors following tk,𝜈 (𝟎, Ik ). Note that the expression (A.4) provides a convenience device for random number generation and for implementation purpose. Let w = |X0 |, then w follows an k-dimensional standard t distribution tk,𝜈 (𝟎, Ik ) truncated in the space w > 𝟎 (i.e., the standard half t-distribution). Thus, a hierarchical representation of (A.4) is given by Y|w ∼ tk,𝜈+k (𝝁 + 𝚫w, 𝜔𝚺), w ∼ tk,𝜈 (𝟎, Ik )I(w > 𝟎),

(A.5)

Copyright © 2015 John Wiley & Sons, Ltd.

Statist. Med. 2015, 34 2820–2843

2839

where 𝜔 = (𝜈 + wT w)∕(𝜈 + k). It is noted that the ST distribution presented in (A.5) can be reduced to the following three special cases: (i) as 𝜈 → ∞, then the hierarchical expression (A.5) becomes a skew-normal distribution SNk (𝝁, 𝚺, 𝚫); (ii) as 𝚫 = 𝟎, then the hierarchical expression (A.5) is a standard multivariate t-distribution; (iii) as 𝜈 → ∞, and 𝚫 = 𝟎, then the hierarchical expression (A.5) reverts to a standard multivariate normal distribution.

J. CHEN AND Y. HUANG

Appendix B: R and WinBUGS program codes for Model ST

2840

## R program code library(arm) library(R2WinBUGS) data

A Bayesian mixture of semiparametric mixed-effects joint models for skewed-longitudinal and time-to-event data.

In longitudinal studies, it is of interest to investigate how repeatedly measured markers in time are associated with a time to an event of interest, ...
1MB Sizes 0 Downloads 10 Views