Biometrics 71, 821–831 September 2015

DOI: 10.1111/biom.12296

Multivariate Longitudinal Data Analysis with Mixed Effects Hidden Markov Models Jesse D. Raffa1, * and Joel A. Dubin2 2

1 Department of Statistics, University of Washington, Seattle, Washington, U.S.A. Department of Statistics & Actuarial Science and School of Public Health & Health Systems, University of Waterloo, Waterloo, Ontario, Canada ∗ email: jraff[email protected]

Summary. Multiple longitudinal responses are often collected as a means to capture relevant features of the true outcome of interest, which is often hidden and not directly measurable. We outline an approach which models these multivariate longitudinal responses as generated from a hidden disease process. We propose a class of models which uses a hidden Markov model with separate but correlated random effects between multiple longitudinal responses. This approach was motivated by a smoking cessation clinical trial, where a bivariate longitudinal response involving both a continuous and a binomial response was collected for each participant to monitor smoking behavior. A Bayesian method using Markov chain Monte Carlo is used. Comparison of separate univariate response models to the bivariate response models was undertaken. Our methods are demonstrated on the smoking cessation clinical trial dataset, and properties of our approach are examined through extensive simulation studies. Key words: Hidden disease state; Hidden Markov model; Longitudinal data; Markov chain Monte Carlo; Multivariate response; Smoking cessation.

1. Introduction Longitudinal studies, where data on the same set of subjects are collected over time, are increasingly focusing on multiple responses. Several different methodological approaches have been proposed to address the situation involving multivariate longitudinal data, with a recent review article having summarized much of the work in this area (Verbeke et al., 2014). The advantages of using multivariate longitudinal data analysis are well-documented and include a more complete use of the data and the ability to estimate and utilize the dependence between longitudinal responses, among others. While these advantages are noteworthy, they in themselves are often not the sole purpose for collection of the multiple responses, where frequently the collective responses are believed to capture relevant features of an underlying disease process. Unfortunately, if the data are modeled as a multivariate longitudinal response, the notion that an underlying disease process may contribute to the heterogeneity observed in the multiple responses is often ignored. Likewise, using features of the measured responses as perfect representations of the hidden disease process can be problematic. In the setting of clinical trials one or more responses are monitored longitudinally, and some function (often involving thresholds of these measured responses) is used to define the occurrence of a clinically meaningful endpoint. Often clinicians are most interested in an intervention’s effect on this clinically meaningful endpoint, but are only able to measure it indirectly through the measured longitudinal responses. While the collected multivariate responses may generally correlate with a meaningful clinical endpoint, they may be © 2015, The International Biometric Society

inadequate to capture the study intervention’s effect on the endpoint of interest. For example, in studies of addiction, the relevant clinical endpoint is often the relapse or abstinence rate, which is conceptually understood, but is not effectively measurable. Estimation of these rates is usually done by attempting to determine the underlying disease states at each study visit via thresholds of often several measures of abstinence through biochemical or self-reported data. The thresholding approach often ignores: any unexplained heterogeneity in the responses by, for example, measurement error or between-subject heterogeneity, the dependence structure of the responses, and any information pertaining to the subject’s response or disease state history. Addressing these issues directly may lead to improved inference in many situations, including our motivating example of a smoking cessation clinical trial. One possible approach to analyze such data is to instead view these data as generated from a latent or hidden process, such as a hidden Markov model (HMM). In this article, we extend the work of Altman (2007) and describe how to utilize mixed effects hidden Markov models (MHMMs) to model multivariate longitudinal responses in cases where multiple responses are collected and the observed responses are thought to be generated from hidden disease states.

1.1. Motivating Example Our work is motivated by a smoking cessation clinical trial (O’Malley et al., 2006) examining the efficacy of a pharmacotherapy to assist people in quitting smoking. In this clinical trial, subjects were given a nicotine-replacement patch and

821

822

Biometrics, September 2015

randomized with equal probability to placebo or one of three active treatment groups which were dosage based. We will be primarily interested here in contrasting the placebo group with the combined active treatment groups. After randomization, subjects were followed for 6 weeks, with longitudinal follow-up occurring weekly. At each study visit, patients selfreported (SR) daily cigarette use over the previous week, and a carbon monoxide (CO) test was administered as physiological monitoring of cigarette intake. Abstinence from smoking is usually estimated as a composite of these measures, and was defined in this study in several ways, but in general required CO levels below 10, and no SR cigarettes smoked in a set time period. Both types of measures used to estimate the disease states are limited by their sensitivity and specificity due to the potential for misclassification and measurement error. Furthermore, it has been hypothesized (Hughes et al., 2003; Killeen, 2011) that smoking cessation interventions are described by a three-state Markov model (regular use, withdrawal and long term abstinence). While these states are generally understood, determining the state of a given subject at a given time remains difficult, as SR smoking is known to generally underreport both the prevalence and intensity of smoking (Patrick et al., 1994), and the higher CO levels observed when smoking have a limited half-life where they are detectable (perhaps as little as 4–6 hours). Also, in most analyses of these types of data, thresholds of one or more of these longitudinal responses define a binary status (smoker/non-smoker) ignoring the intermediate withdrawal state. A HMM seems particularly relevant to these data as the disease states involved in the process of quitting smoking are believed to be described by a discrete state space Markov model, where the disease states are only observable through surrogates which imperfectly estimate these states. Also relevant to this motivating example is the heterogeneity observed in the SR and CO responses which are due to between-subject differences even when subjects are in the same disease state (e.g., different baseline SR smoking behavior or persistent exposure to external CO levels, etc.). Instead of using thresholds or cutoffs to define disease states, it may be more efficient to model the data as generated from these hidden or latent disease states. Latent class models have successfully been used for multivariate longitudinal data (e.g., Proust-Lima et al., 2009), but are generally limited to using static latent states. Recently, HMMs have been used in the context of longitudinal studies (Scott, James, and Sugar, 2005; Altman, 2007; Maruotti and Ryden, 2009). Scott et al. (2005) developed a model for multivariate continuous longitudinal data by reducing the dimension of the response, and modeling the reduced response as an HMM. Dimensional reduction can be challenging in a general setting, where responses may be of different data types (e.g., binary and continuous). Alternatively, Altman (2007) developed a general framework for using HMMs to model univariate longitudinal data by introducing random effects into both the observed response and hidden state components of the HMM. The HMM in this framework allows for modeling of heterogeneity in longitudinal data introduced by transitions between hidden states, while the random effects allow for between-subject differences in the observed process within a given hidden state. This approach has subsequently been used by others (Shirley

Zi1

Zi2

Zi3

···

Y i1

Y i2

Y i3

···

bi

Figure 1. Mixed effects hidden Markov model structure for subject i, with hidden process Zi , observed process Y i , and random effects, bi . et al., 2010; DeSantis and Bandyopadhyay, 2011; Maruotti, 2011). In this article, we utilize a MHMM to model multivariate longitudinal data. In Section 2, we extend the framework described by Altman (2007) to the setting of a multivariate longitudinal response. This is done by modeling the observed longitudinal responses jointly through separate, but correlated random effects, and a common hidden Markov process. In Section 3, we describe a Bayesian approach using Markov chain Monte Carlo (MCMC) to estimate the parameters under a MHMM with a multivariate response. In Section 4, we report the results of a simulation study we conducted to assess the properties of our analysis approach. In Section 5, we apply the analysis approach to data from our smoking cessation clinical trial motivating example. Lastly, in Section 6, we discuss our approach, and summarize what was learned from the results generated for our motivating example. 2.

Mixed Effects Hidden Markov Models for Multivariate Longitudinal Data In this section, we will describe a general framework for using MHMMs to describe longitudinal data. HMMs are characterized by two dependent processes. The first is a hidden process, which is usually assumed to adhere to the Markov assumption. The second, the observed process, which, when conditioned on the current value of the hidden process, is independent of any future or past value of the hidden or observed processes (see Figure 1). HMMs have a great number of applications (Zucchini and MacDonald, 2009), and have recently been applied to the analysis of longitudinal data (Scott et al., 2005; Altman, 2007; Maruotti and Ryden, 2009; Shirley et al., 2010; DeSantis and Bandyopadhyay, 2011). 2.1. Model Framework Let Zit denote the hidden process for subject i(i = 1, 2, . . . , N) at some common time points t(t = 1, 2, . . . , ni ). Further, denote Zi to comprise all values of Zit for subject i, and Z to be all values of Zi for all i. Assume Zit arises from an M-state Markov chain with transition matrix P (si ) and initial probability (row) vector, π(si ) , where si = {1, 2, . . . , S} and represents strata membership, such as treatment group, for subject i, which we will assume remains constant over time. This article will primarily focus on HMMs arising from first-order time homogeneous Markov chain hidden processes, where P(Zit =

Multivariate Response Mixed Effects Hidden Markov Models (s )

(s )

k|Zi,t−1 = j) = pjki for all t > 1, and P(Zi1 = k) = πk i , where j, k = 1, . . . , M. Let yit denote a R-dimensional multivariate longitudinal response for subject i at time t, such that  yit = [yi1t , yi2t , . . . , yiRt ] . Similar to above, let yi denote all the observations for subject i across all longitudinal responses and time points, and y be the collection of all yi for i = 1, 2, . . . , N. Suppose that the joint longitudinal response arises from an HMM such that Y it and Y it  were independent given the  HMM’s current state, Zit and Zit  , respectively, for t = t . Further, assume that there exists an additional set of i.i.d subject-specific random effects, bi , arising from some common distribution with a mean of zero, fbi (bi |), where  are any parameters used to describe this distribution, such as a covariance matrix. Now, assume that conditional on the current hidden state, Zit = k, and random effects, bi , each yirt is independent (see Figure 1) with a distribution, frk (yirt |zit = k), in the exponential family with mean, θirtk , and link function hr (.), where: 





hr (θirtk ) = ηirtk = uirtk τ r + xirt βrk + wirt bi .

(1)

Here τ r represents the fixed effect intercept for each hidden state, as defined through an M-dimensional hidden state vector, uirtk , which defines some contrast dependent on Zit .  We follow Scott (2002) such that for hidden state k, uirtk τ r is the cumulative sum of the first k elements of τ r . Further, xirt is a vector of covariates at time t for subject i when Zit = k for the rth longitudinal response, and the associated fixed effect coefficient parameters for these covariates are represented by βrk , which also could vary over different values of the hidden process. Finally, wirt is the row vector of the model matrix for the random effect for the rth longitudinal response for subject i at time t while in hidden state k. When R = 1, that is, the longitudinal response is univariate, this is one of the models proposed by Altman (2007) where she outlines a procedure to do maximum likelihood estimation for Gaussian random effects based on an approximation to the marginal likelihood: L(; y) =

N   i=1

αini 1fbi (bi |)dbi .

(2)

bi

Extending this structure for R > 1 involves only modifying G(yi1t ), and using the conditional independence assumption, such that G(yit ) = diag

...,

R 

R 

r=1

r=1

fr1 (yirt |zit =1, bi , ),

R 

G(yi1t ) = diag[f11 (yi1t |zit = 1, bi , ), f12 (yi1t |zit = 2, bi , ), . . . , f1M (yi1t |zit = M, bi , )], for any t > 0 and recursively for t = 2, 3, . . . , ni , as αit = αi,t−1 P (si ) G(yi1t ).

fr2 (yirt |zit = 2, bi , ),



frM (yirt |zit = M, bi , ) , for t > 0.

r=1

A common scenario involving R > 1 would be to have separate but correlated random effects for each longitudinal response. Under the scenario of a single random intercept in each response, wirt would simply be a vector of all zeroes, except the rth entry which would be one. Alternatively, a shared random effect across responses could also be used, and is accommodated by this parameterization, but the focus of this article will be on the case of separate, but correlated Gaussian random effects. To evaluate L(; y) under these situations becomes increasingly difficult as R increases, as each evaluation often involves the computation of at least N R-dimensional integrals, most often with no closed-form solutions. Although better numerical approximations to the integral may be possible, the integrand is not a simple function of bi due to the inclusion of the hidden process. Nonetheless, we have been able to fit such models when R = 2 by a Gauss–Hermite quadrature approximation, but we have found in general as R increases there may be a need for more computationally sophisticated methods to fit these models. 2.2. Motivating Example In our motivating smoking cessation example, we have a bivariate longitudinal response (R = 2) consisting of a count, Yi1t , of the number of self-reported (SR) days out of nit in tth week where subject i is reported to have smoked at least one cigarette, and Yi2t which consists of the transformed (log[Y + 1]) CO level at the tth week’s study visit. Under the framework outlined in Section 2, we model these data as follows: Yi1t |(Zit = k, bi ) ∼ Binomial(nit , θi1tk ) where, 

She does this by Gauss–Hermite quadrature, where  comprises all parameters of the MHMM, including parameters associated with the hidden process, P (si ) and π(si ) , parameters associated with the canonical link of the observed processes in (1), β and τ, parameters associated with the random effects, such as their covariance matrix, , and additional parameters associated with the observed process, unrelated to (1), ξ. Here αini is a row vector often referred to as the forward probabilities (Baum et al., 1970) for subject i at time point ni , and defined for t = 1 as αi1 = π(si ) G(yi11 ), where,

823





logit(θi1tk ) = ui1tk τ 1 + xi1t β1k + wi1t bi Yi2t |(Zit = k, bi ) ∼ N(θi2tk , σ2 ) where, 





θi2tk = ui2tk τ 2 + xi2t β2k + wi2t bi . As is the case with most complex multivariate longitudinal data, this is only one form of how this could be described. For example, we use weekly SR counts instead of daily counts to ensure common observation times with their corresponding weekly CO measurements. Hughes et al. (2003) and Killeen (2011) describe smoking cessation intervention by a three-state Markov model (M = 3), which will be our default approach and our primary treatment comparison will involve placebo (86 subjects with si = 0) versus all subjects who received active treatment (268  subjects with si = 1). As stated above, we will define uirtk to describe the cumulative sum of the first k elements of τ r , for re-

824

Biometrics, September 2015

sponse r and will use separate, but correlated random effects, bi , which are assumed bivariate normal with 2 × 2 covariance matrix, , and wi1t = [1, 0] and wi2t = [0, 1]. 3. Bayesian Approach Although it is possible to use maximum likelihood estimation by approximating (2) using numerical integration, in general it remains difficult to apply such methods to the framework discussed in Section 2.1 when the dimension of the longitudinal response exceeds two. Indeed, when examining more complicated random effect structures for a single dimensional longitudinal response under a MHMM, Altman (2007) relies primarily on stochastic integration to fit such models. While it may be possible to continue with this approach as R or the number of random effects increase, we have generally found the Bayesian approach to be more efficient from a computational perspective, as well as relatively easy to adapt to other more complicated situations and models. 3.1. Prior Specification Specifying a prior for  is largely dependent on the specific circumstances and the target of inference. It is often convenient for sampling to put independent conjugate priors on each component of . In other circumstances the choice of prior may often depend on the distributional assumptions of each dimension of the longitudinal response, the chosen link function (1), or other requirements such as subject-expert knowledge, if available. In general, it is often convenient to factor [] as:

forward, and there are many possible approaches to this which are discussed in Section 3.3. Our preferred approach for sampling from [Zi |y, , b] is to use stochastic recursion (Scott, 2002). Each hidden state can be sampled separately for each subject, and involves computation of the forward probabilities, αi , from Section 2.1. Once these have been computed, the states are sampled in reverse chronological order. 3.3. Sampling from the Posterior Under the prior structure discussed in Section 3.1, we can take a Gibbs sampling approach, where groups of parameters with priors in conjugate families are sampled from directly. With all other variables, alternative methods can be used such as Metropolis within Gibbs sampling (Smith and Roberts, 1993). Full details of our approach are available in Web Appendix A. 3.3.1. Sampling from the posterior for the motivating example. Sampling from the posterior for the example in Section 2.2 is relatively straightforward. As above, we will initially choose a relatively uninformative Dirichlet conjugate prior for all treatment groups or strata, s, with prior hyper(s) (s) parameters ζ π and ζ v set to [1, 1, 1] for π(s) and the vth (s) row of P , respectively, and in this setting we will assume iid

While there may be certain circumstances where it is merited to use the joint prior of ξ (such as when two or more responses are normally distributed conditional on their random effects and hidden states), it is often useful to factorize R each component of [ξ] as r=1 [ξr ]. Once the priors have been specified, sampling from the posterior distribution, [|y], can be done through Gibbs sampling (Geman and Geman, 1984) or similar approaches. Conditional on the hidden states, z, at each iteration of the Gibbs sample one samples from each component of , conditional on the current simulated value of the other parameters and y. After this is completed, the hidden states, z, are updated, which will be discussed just below in Section 3.2. This procedure is then repeated until stationarity is reached, effectively sampling from the posterior, [|y]. In most cases, our approach will choose conjugate priors for each component of , as this makes sampling from the required full conditional distributions generally straightforward. One may also need to specify hyperparameters for the prior distributions, and we will tend to use standard “default” choices, or ones which are relatively uninformative, and are fully specified below.

that bi ∼ MVN(0, ). We also choose a conjugate inverse Wishart(ν0 , 0 ) prior for , with ν0 = 3 and let 0 be the 2 × 2 identity matrix. Lastly, we choose a flat prior for β and τ, such that [β, τ] ∝ 1. In this example, there is also one extra parameter in the setting when r = 2, where ξ21 = σ12 , which  will also come from a conjugate family, Gamma(a0 , b0 ), with a0 = 0.001, b0 = 0.0002. Once Zi is updated for each i as outlined in Section 3.2, sampling from the full conditionals are equivalent to sampling from the appropriate conjugate distribution (P (s) , π(s) , , σ12 ), or the full conditional distribution is known up to a  normalizing constant (bi , βr , τ r ). We use a random walk Metropolis within Gibbs step to sample from the full conditionals of [bi |.] and [βr , τ r |.] for r = 1 separately, where r = 2 has a multivariate normal full conditional distribution which can be sampled from directly. The interpretation of τ r depends on the defined model matrix, uirt common to all i and t, but not necessarily r (although we will not examine this here). For instance, the contrasts suggested  by Scott (2002) are similar to the following: uirt1 = [1, 0, 0],   uirt2 = [1, 1, 0], and uirt3 = [1, 1, 1] for Zit = 1, 2, and 3, respectively, and we will use these throughout the remainder of the article. For the two steps which require Metropolis within Gibbs sampling, a multivariate normal proposal distribution is used, and updated during a burn-in period. Approximately 3 million iterations are used to reduce the auto-correlation between sampled iterations and thinned to 50,000 samples to allow for efficient and feasible storage of samples. Trace plots, density estimates, auto-correlation plots and formal diagnostic tests were used as an assessment of stationarity and mixing, and a small sensitivity analysis on the choice of ν0 ,

0 , a0 and b0 was performed.

3.2. Updating the Hidden States Taking a Gibbs sampling approach for sampling from the posterior involves alternating between sampling from [Z|y, , b], and [, b|y, z]. Sampling from the latter is relatively straight-

3.4. A Treatment Effect for the Hidden Process Thus far we have only discussed the situation where covariates are introduced to the parameters associated with the observed component of the MHMM. While this may be use-

[, b] = [β, τ, P, π, b, , ξ] = [β, τ] × [P] × [π] × [b|] × [] × [ξ].

(3)

Multivariate Response Mixed Effects Hidden Markov Models ful in many situations, often the desired treatment contrast may manifest itself through some function of the transition probabilities. Indeed, in situations where y are surrogate endpoints or biomarkers, defining a treatment effect (via β in (1)) in terms of observed longitudinal responses may be difficult depending on the treatment’s effect on: the underlying hidden process, the observed longitudinal responses, or the relationship between these observed and hidden responses. In situations where the hidden states resemble the underlying true disease outcomes, it may be advantageous to think about the treatment’s effect on the hidden states. Our approach outlined allows for separate transition probabilities, P (s) , and initial probability vectors, π(s) . for different strata, s, including a treatment effect. Often, the HMM is assumed to be stationary, and/or the initial probability vector is considered a vector of nuisance parameters, but there may be circumstances where this quantity has a particular meaning. This scenario is probably most interesting in the setting of a randomized clinical trial where the hidden states of the study subjects may be considered very homogeneous at baseline. For instance, at enrollment in the motivating example, per protocol, all subjects were required to be current regular smokers, so in principle the initial probability vector has a very specific interpretation—being that it is the first week’s transition probability from hidden state three (the highest smoking state), or something conceptually similar. This potentially allows for treatments to have different types of treatment effects which can be modeled simultaneously, where some treatments may help subjects quit smoking initially, but fail to maintain abstinence, while others only help maintain smoking abstinence. Given a large enough sample size in each stratum with a reasonably high frequency of transitions between states within each strata, it is possible to add as many treatment groups as needed. Other methods would be required to sample from the posterior in cases where the desired covariate of interest was non-categorical. In our motivating example from Section 2.2, we examine four different scenarios involving different ways to look at the treatment effect within the hidden process. Scenarios include: I. No treatment effect, where P i = P and πi = π (common) for all i, II. Treatment effect in only the initial state probabilities (π(0) and π(1) ), with common transition matrix, P (0) = P (1) = P, III. Treatment effect in only the transition probabilities (P (0) and P (1) ), with common initial state probabilities, π(0) = π(1) = π, and IV. Treatment effect in both the initial state probabilities (π(0) and π(1) ), and the transition probabilities (P (0) and P (1) ). 3.5. Label Switching A well-known phenomenon in using MCMC approaches to mixture models including HMMs is that of label switching. This phenomenon occurs because of the invariance of the likelihood under any relabeling of the mixture components. Several approaches have been proposed to alleviate this prob-

825

lem, including putting identifiability constraints on the parameter space, and decision theoretic approaches (Stephens, 2000). Under an MHMM, we have found label switching to occur only in certain situations. In such instances, after the sampling is completed, we relabel the states by ordering one of the observed responses’ population means. Further details can be found in the Web Appendix C. 4. Simulation Study A simulation study was conducted to assess the properties of the estimators derived from the modeling framework and MCMC sampling methods described above and to assess robustness in a few situations of possible concern. Data were simulated for a model similar to those described in the example discussed in Section 2.2, where the hidden process, z, and the random effects, b, are sampled independently, and given these entities, the observed multivariate longitudinal response, y is simulated. Patients are randomly allocated in a 3:1 (265:89) manner to treatment and placebo, which have identical population level models with the exception of their respective transition probability matrices (Scenario III in Section 3.4). As is the case in the motivating example, the observed process is sampled at six time points (ni = 6 for all i). For the purposes of this simulation study, data are assumed to not be missing, and 200 simulations are run using the methodology described in Section 3.3. The data were generated from a MHMM with well-separated modes (τ 1 = [−5, 3, 3], τ 2 = [1.2, 0.2, 1.1]) with a large between-subject heterogeneity in the binomial response via independent multivariate normal random effects. The random effects are i.i.d and have a 0.5 covariance matrix  = [ 8.5 0.5 0.1 ] with the inverse of the residual error, σ12 , in the normally distributed response set at 10.0.  The hidden process is generated from a common initial probability vector, π = [0.6, 0.25,  0.15], and placebo transition ma0.85 0.10 0.05 0.60 0.20 0.20 0.10 0.10 0.80 0.85 0.10 0.05 0.60 0.20 0.20 . 0.10 0.30 0.60

trix P (0) =

P (1) =





with treatment transition matrix,

This situation closely describes the motivating example, where the modes of the hidden states are well-separated, and there is a great deal of heterogeneity in the binomial response. As can be seen in Table 1, the estimates generated from sampling from the posterior, π(|y), are generally well-behaved, and the posterior means are generally unbiased, while the posterior standard deviations are generally close to the standard deviation of the posterior means over the simulations. With the exception of 22 , the 0.025 and 0.975 posterior quantiles cover the true parameter value very close to 95% of the time, with overall mean coverage of 94.7%. In the exception ( 22 ), the reduced coverage may be due in part to the bias introduced by our prior, but may be also due to the structure of the data or link function utilized. We further considered six other simulated situations in Web Appendix B, where we vary the amount of separation between the conditional modes of each response, in addition to varying the amount of between-subject heterogeneity in the observed process. In general, we found that reducing the separation between the conditional modes or increasing the diagonal elements of  resulted in flat posterior distributions for some parameters, in particular P. This could be resolved by in-

826

Biometrics, September 2015 Table 1 Simulation results for 200 runs of Situation (1)

Parameter π1 π2 p012 p013 p021 p023 p031 p032 p112 p113 p121 p123 p131 p132 τ11 τ12 τ13 τ21 τ22 τ23 11 12 22 1 σ2

True value 0.60 0.25 0.10 0.05 0.60 0.20 0.10 0.10 0.10 0.05 0.60 0.20 0.10 0.30 −5.00 3.00 3.00 1.20 0.20 1.10 8.50 0.50 0.10 10.00

Average of posterior means 0.5961 0.2555 0.1075 0.0534 0.5601 0.2033 0.1134 0.1135 0.1042 0.0510 0.5935 0.1976 0.1043 0.3070 −4.9905 2.9859 3.0311 1.2028 0.1983 1.0987 8.5833 0.5100 0.1092 10.0868

Average posterior SD

SD of posterior means

95% Posterior quantile coverage

0.0437 0.0423 0.0391 0.0175 0.1115 0.0735 0.0445 0.0466 0.0237 0.0101 0.0631 0.0387 0.0366 0.0455 0.2292 0.1755 0.2162 0.0217 0.0384 0.0362 0.9981 0.0760 0.0103 0.3916

0.0436 0.0440 0.0385 0.0178 0.0993 0.0681 0.0440 0.0452 0.0245 0.0093 0.0580 0.0383 0.0356 0.0454 0.2374 0.1772 0.2262 0.0236 0.0424 0.0381 1.0238 0.0648 0.0108 0.3939

0.960 0.955 0.955 0.950 0.955 0.960 0.935 0.960 0.945 0.975 0.960 0.945 0.960 0.955 0.945 0.925 0.940 0.940 0.940 0.930 0.955 0.985 0.855 0.945

SD, standard deviation. where πk is the initial probability of hidden state k, psjk is the j, k transition probability for the sth treatment group for the hidden states, τrk is the intercept term for response r and hidden state k,  is the covariance matrix of the random effects, and σ2 is the residual error associated with response 2.

creasing the length of the time series, or by using a more informative prior on the problematic parameters. 5.

Results from the Smoking Cessation Motivating Example We return to our motivating example from a smoking cessation clinical trial (O’Malley et al., 2006) examining a pharmacological intervention’s effect on the rates of maintained smoking abstinence over 6 weeks. To contrast the analysis under two separate univariate response (UVR) models, and one multivariate response (MVR) model, we fit the three models under Model I (no treatment effects) and present these results in Table 2. During the model fitting process we encountered several fitting issues under UVR models which we did not encounter under the MVR model. Label switching did not occur under the MVR model, but was a significant issue in the UVR models, particularly for the SR only model. Under the SR UVR model, we also encountered a problem related to the parameter estimates of τ which often resulted in estimates on the inverse-logit scale which were essentially invariant to changes on the logit scale, due to the parameter estimates being very close to the boundary of the parameter space. This was largely a result of using our chosen flat prior for τ which may perform poorly in such instances, and was partially resolved by using a Gaussian prior in its place, without much substantive effect on the inferences we discuss below.

In Table 2, we note that the parameter estimates under the MVR model can often be seen as a compromise between the two UVR models, but we note several differences. The between-subject heterogeneity ( 11 and 22 ) under the UVR models is noticeably smaller than compared to the MVR model. Second, SR UVR model is largely detecting differences between observations with very low smoking probability (abstainers) and very high probability of smoking (essentially smoking on a given day with probability almost 0 or 1). To compare how the models fit the data, we estimated the observation specific means from the j = 1, . . . , J MCMC

(j) outputs: μ irt = J1 Jj=1 h−1 r (ηirtk ), and constructed 95% posterior intervals for μirt . Examples of model fits to the data are presented in Figure 2, where the UVR and MVR means along with their corresponding intervals are presented side-by-side along with observed values for two study subjects for illustration purposes. In general, we see reasonably close mean estimates between the UVR and MVR models. Since UVR models are fitting the univariate data alone, with no consideration of the additional response, the UVR means will be on average slightly closer to the observed values. The interval estimates provide a stark contrast in several scenarios. We note that in both the UVR and MVR settings, interval widths are often larger at times when there appears to be a hidden state transition (e.g., in the top row subject, observations 1 and 2, in Figure 2). This makes sense, particularly when the tran-

Multivariate Response Mixed Effects Hidden Markov Models

827

Table 2 Bayesian and maximum likelihood estimates (MLE) for no treatment effect model (Scenario I) for individual univariate and multivariate response models Bayesian posterior mean (posterior SD) Parameter π1 π2 p12 p13 p21 p23 p31 p32 τ11 τ12 τ13 τ21 τ22 τ23 11 12 22 1 σ2

SR UVR

CO UVR

MVR

MLE (estimate SE) MVR

0.67 0.19 0.09 0.01 0.36 0.14 0.09 0.18 −5.80 2.95 4.08 — — — 9.81 — — —

0.81 0.16 0.04 0.007 0.10 0.07 0.15 0.19 — — — 1.12 0.86 1.06 — — 0.05 11.46

0.76 (0.04) 0.17 (0.04) 0.04 (0.01) 0.006 (0.003) 0.26 (0.06) 0.21 (0.05) 0.05 (0.03) 0.18 (0.06) −5.29 (0.26) 4.05 (0.28) 1.88 (0.38) 1.18 (0.02) 0.35 (0.08) 1.10 (0.06) 6.67 (1.07) 0.30 (0.08) 0.10 (0.01) 10.49 (0.52)

0.76 (0.04) 0.17 (0.04) 0.03 (0.01) 0.005 (0.001) 0.25 (0.05) 0.21 (0.05) 0.04 (0.03) 0.17 (0.09) −5.30 (0.26) 4.08 (0.28) 1.88 (0.40) 1.18 (0.02) 0.35 (0.08) 1.10 (0.06) 6.59 (1.06) 0.30 (0.07) 0.09 (0.01) 10.49 (0.53)

where πk is the initial probability of hidden state k, pjk is the j, k transition probability for the hidden states, τrk is the intercept term for response r and hidden state k,  is the covariance matrix of the random effects, and σ2 is the residual error associated with response 2. SD, standard deviation; SE, standard error; SR, self-reported smoking response; CO, carbon-monoxide response; UVR, univariate response; MVR: multivariate response.

sition is in isolation, where it may be difficult to distinguish from noise. Under the MVR model, one particularly interesting observation occurs when the study cut-offs used to define smoking status differ in their determination. This has no direct effect on the intervals generated by the UVR analyses, as it does not consider the other response, but it in general widens the intervals for the MVR, as there is less certainty about the underlying hidden state. This can be seen in observation 2 on the lower plots of Figure 2, where a decrease in CO levels for the second observation results in wider SR intervals, where under the UVR, the SR interval remains about the same as the adjacent observations. The differences in interval widths in MVR and UVR analyses can differ in many situations, and largely depends on the level of agreement and consistency between responses. For the multivariate response model, we also computed the maximum likelihood estimates (MLEs), by maximizing (2) using Gauss–Hermite quadrature in a procedure similar to Altman (2007). The standard errors for these estimates are derived from the inverse of the estimated Hessian matrix, adapted to the parameterization used for our approach outlined in Section 3. As one can see in Table 2, the MLEs are very close to the posterior means. When introducing methods to model the treatment effects (Section 3.4), we enumerated four different ways to incorporate treatment effects for the hidden states. In addition to the no treatment effect situation presented in Table 2, we fit each of the other three models with a varying number of parameters (18–26 parameters), and present the results in Table 3.

We generally note very little effect of treatment with the exception of perhaps a difference between p032 and p132 . These quantities represent the transition probability of individuals transitioning from the smoking state to the intermediate withdrawal state in the placebo and treatment groups, respectively. Ideally, this would be high, suggesting the readiness of individuals to stop smoking, even if they had restarted during the study period. Surprisingly however, the rate is higher in the placebo group, suggesting that those on treatment actually do worse than those on placebo. From our MCMC samples, we can estimate quantities like, P(p132 > p032 ), which is estimated as 0.11 in model III and 0.08 in model IV. This provides some evidence that the treatment may inhibit abstinence in those subjects who remained or returned to smoking after the first week of the study period. Without formally performing model selection or hypothesis testing on the two different treatment effects, this single effect remains somewhat unconvincing. Using model I, we can estimate the transformed means of the CO levels for each of the hidden states by using τ 2 and the chosen structure of uirtk . In our case, uirtk corresponds to the cumulative sum of the elements of τ 2 , resulting in estimates of the means of 1.18, 1.53, and 2.63 for the first, second and third hidden states, respectively. By inverting the log(Y + 1) transformation, the estimates on the untransformed scale are 2.25, 3.61, and 12.82, for hidden states one, two and three, respectively. In most studies a CO threshold of 8 or sometimes 10 ppm is used as evidence that a subject is smoking, suggesting our results provide some face validity. Similarly,

828

Biometrics, September 2015 CO Response

2.5

3.0

0.4

1.0

0.0

1.5

2.0

Predicted Mean

0.6

UVR MVR Observation Value

0.2

Predicted Mean

0.8

3.5

1.0

SR Response

1

2

3

4

5

6

1

2

Study Week

4

5

6

5

6

2.5 1.0

0.0

1.5

2.0

0.4

0.6

Predicted Mean

3.0

0.8

3.5

1.0

Study Week

0.2

Predicted Mean

3

1

2

3

4

5

6

Study Week

1

2

3

4

Study Week

Figure 2. Multivariate response (MVR) and univariate Response (UVR) model fits for two study subjects. Left side plots are the self-reported (SR) (binomial) response, and the right side is for the carbon monoxide (Normal) response. The dashed line on the CO response represents the study defined cut-off (log(10+1)) to define smoking. Bars represent the 0.025 and 0.975 posterior quantiles.

for the daily SR binomial response, an individual with this response’s corresponding random effect equal to zero would have an estimated probability of smoking on a single day of 0.005, 0.23, and 0.65 for the first, second and third hidden states, respectively. Our analysis has assumed that the data is generated from a three hidden state HMM, and although this is hypothesized to be the case, we can examine this issue by fitting models and varying the number of hidden states. We examine this issue in more detail in Web Appendix C, and find evidence that a higher number of hidden states better describe the data. These additional hidden states further divide up the intermediate withdrawal state, with the lowest and highest hidden states remaining both qualitatively similar in interpretation as before. There are many possible reasons why this may occur, and is an area that merits further direct analytical and clinical investigation. 6. Discussion Overall, we have outlined a general framework for modeling multivariate longitudinal data in the setting where heterogeneity in the longitudinal responses is explained by random subject-specific differences and/or introduced from a given subject transitioning through multiple disease states. We be-

lieve this could have a great number of applications in the modeling of longitudinal data arising from the study of addiction, where the underlying disease states are conceptually understood to contribute significantly to the underlying heterogeneity of measured responses, but ultimately are difficult to measure directly. Modeling a multivariate longitudinal response through a MHMM has a degree of flexibility in accounting for the correlation and heterogeneity between- and within-responses by the incorporation of the hidden process along with the random effects, that is not present by using only one of these components in isolation. We have found that in most instances, the parameter estimates generated from a MHMM using a bivariate response can be viewed as a compromise between the estimates generated from the two UVR models, but this observation may be dataset specific. In our examination we have found that using more than one response can more appropriately capture the uncertainty of the predicted mean responses, particularly in situations where there is some discordance between the perceived diseases states interpreted from the responses separately. In fact, intervals generated for the predicted mean responses under the multivariate response analysis were generally wider than their corresponding UVR models, except in instances with strong support from the data (i.e., data with persistent and con-

Multivariate Response Mixed Effects Hidden Markov Models

829

Table 3 Results for Scenarios II–IV from the motivating smoking cessation example Parameter π10 π20 π11 π21 p012 p013 p021 p023 p031 p032 p112 p113 p121 p123 p131 p132 τ11 τ12 τ13 τ21 τ22 τ23 11 21 22 1 σ2

Model II

Model III

Model IV

0.75 [0.76, 0.068, (0.60,0.87)] 0.19 [0.18, 0.066, (0.08,0.33)] 0.74 [0.75, 0.043, (0.65,0.82)] 0.18 [0.18, 0.041, (0.11,0.27)] 0.036 [0.035, 0.009, (0.02,0.06)] 0.006 [0.006, 0.003, (0.001,0.013)] 0.26 [0.26, 0.056, (0.16, 0.37)] 0.20 [0.20, 0.052, (0.12, 0.31)] 0.054 [0.050, 0.029, (0.01, 0.12)] 0.18 [0.17, 0.056, (0.08,0.30)] — — — — — — −5.32 [−5.32, 0.26, (−5.88, −4.84)] 4.03 [4.03, 0.28, (3.48,4.59)] 1.84 [1.84, 0.41, (1.05,2.64)] 1.18 [1.18, 0.023, (1.13, 1.22)] 0.33 [0.34, 0.081, (0.19, 0.50)] 1.10 [1.09, 0.059, (0.98, 1.21)] 6.73 [6.63, 1.10, (4.83, 9.12)] 0.31 [0.31, 0.076, (0.17, 0.47)] 0.097 [0.097, 0.011, (0.08, 0.12)] 10.53 [10.52, 0.53, (9.54,11.60)]

0.75 [0.75, 0.04 (0.67, 0.82)] 0.18 [0.18, 0.04 (0.12, 0.27)] — — 0.024 [0.021, 0.014 (0.004,0.06)] 0.013 [0.012, 0.008 (0.002, 0.03)] 0.22 [0.22, 0.08, (0.08, 0.41)] 0.17 [0.16, 0.07, (0.06, 0.34)] 0.062 [0.046, 0.058, (0.002, 0.22)] 0.30 [0.29, 0.12, (0.10, 0.56)] 0.042 [0.041, 0.011, (0.02, 0.07)] 0.005 [0.004, 0.004, (0.0003,0.01)] 0.26 [0.26, 0.062, (0.15, 0.39)] 0.21 [0.21, 0.06, (0.11, 0.34)] 0.076 [0.071, 0.037, (0.02, 0.16)] 0.13 [0.13, 0.063, (0.02, 0.27)] −5.32 [−5.31, 0.26, (−5.87, −4.84)] 4.03 [4.02, 0.28, (3.50, 4.58)] 1.92 [1.92, 0.42, (1.09, 2.73)] 1.18 [1.18, 0.023, (1.13,1.23)] 0.33 [0.32, 0.081, (0.17, 0.50)] 1.10 [1.10, 0.063, (0.98, 1.23)] 6.60 [6.51, 1.08, (4.77, 8.98)] 0.31 [0.31, 0.08, (0.17, 0.47)] 0.10 [0.10, 0.011, (0.08, 0.12)] 10.42 [10.41, 0.52, (9.41, 11.49)]

0.76 [0.76, 0.07, (0.61,0.88)] 0.18 [0.17, 0.07, (0.07,0.33)] 0.74 [0.75, 0.044, (0.65,0.82)] 0.18 [0.18, 0.042, (0.11,0.27)] 0.024 [0.021, 0.014, (0.004,0.06)] 0.013 [0.012, 0.008, (0.002,0.03) 0.27 [0.27, 0.094, (0.11, 0.48)] 0.20 [0.19, 0.079, (0.06,0.38)] 0.062 [0.046, 0.057, (0.002,0.21)] 0.31 [0.30, 0.12, (0.11, 0.55)] 0.043 [0.042, 0.011, (0.02, 0.07)] 0.005 [0.005, 0.003, (0.0004, 0.01)] 0.27 [0.27, 0.065, (0.15, 0.41)] 0.22 [0.22, 0.058, (0.11, 0.35)] 0.071 [0.067, 0.037, (0.02, 0.15)] 0.13 [0.13, 0.060, (0.03, 0.27)] −5.32 [−5.31, 0.27, (−5.87, −4.83)] 3.98 [3.97, 0.28, (3.44, 4.55)] 1.83 [1.83, 0.42, (1.01, 2.64)] 1.18 [1.18, 0.024, (1.13, 1.23)] 0.32 [0.32, 0.078, (0.18, 0.49)] 1.09 [1.09, 0.061, (0.97,1.21)] 6.97 [6.87, 1.16, (4.96, 9.49)] 0.33 [0.33, 0.080, (0.19,0.50)] 0.10 [0.10, 0.011, (0.08, 0.12)] 10.50 [10.49, 0.53, (9.48, 11.57)]

The displayed table are Posterior Mean [(Posterior Median, Posterior standard deviation (0.025 quantile, 0.975 quantile)], where πks is the initial probability of hidden state k for the sth treatment group, psjk is the j, k transition probability for the sth treatment group for the hidden states, τrk is the intercept term for response r and hidden state k,  is the covariance matrix of the random effects, and σ2 is the residual error associated with response 2.

cordant high/low observations). Others have applied HMMs to the setting of addiction (Shirley et al., 2010; DeSantis and Bandyopadhyay, 2011), as well as other diseases where a similar argument can be made pertaining to the underlying disease states, such as multiple sclerosis (Altman, 2007) and schizophrenia (Scott et al., 2005). While these methods are in their relative infancy, we believe they can play an important role in understanding how a treatment works on true clinical outcomes through understanding how a treatment may affect the modeled hidden states. While our methodology described throughout this article may generalize to three or more longitudinal responses, we focus on the bivariate case largely motivated by the smoking cessation example used throughout. We were able to successfully adapt the MLE approach of Altman (2007) in our bivariate response example, and it required significant computing resources (multithreading) to complete within a reasonable time (∼ 1 day). In general, the results of the MLE approach were similar to our Bayesian approach, but the latter required less computing time (16–20 hours) with fewer resources (a single threaded machine). There is also the possibility to improve upon aspects of the MCMC approach outlined in Section 3.3 by using adaptive Metropolis strategies (Haario, Saksman, and Tamminen, 2001). Assessing the computational

feasibility and efficiency or developing methodology for higher dimension longitudinal responses is an area of future work for both the Bayesian and MLE approaches, but the work undertaken in this article for a bivariate longitudinal response seems promising. The methods as presented in this article have several limitations which can be addressed in future work. First, in UVRs, Altman (2007) allows for between-subject heterogeneity in the hidden state process in addition to the observed process. We have not considered this case, mainly due to the limited number of observation times (six) in the motivating smoking cessation study. Scott et al. (2005) take a Bayesian approach allowing for subject-specific transition matrices under a hierarchical model, and we believe a similar approach could be adapted to models presented in this article, if the dataset permitted it. Additionally, a hierarchical approach could also be used to address situations where the effects of a larger number of strata or other covariates are an important target of inference. In other situations, such as when there is a large number of observation times, a small hidden state space, little between-subject heterogeneity or well-separated modes in the observed process, or specific types of hidden process transition probability matrices, it may be possible to cautiously consider including random effects in the hidden

830

Biometrics, September 2015

process. If a homogeneous common transition matrix cannot be applied to a specific dataset, it may be more worthwhile looking at somewhat simpler alternatives such as, for example, increasing the order of the hidden process or using a inhomogeneous but common hidden process, which may be more feasible than using subject-specific transition matrices in certain situations. Extensions which allow additional sources of heterogeneity may have difficulties with parameter identification. Identifiability for different variations of HMMs has been examined in the literature (Rosychuk and Islam, 2009; Garcia-Zattera et al., 2012), and for the models we consider, this may be a concern as well, particularly for short time-series, or MHMMs with conditional modes which are not well-separated or exhibit high between-subject heterogeneity. Secondly, we were limited in our ability to perform formal hypothesis testing, or other model selection techniques. Bayes factors and their analogues are usually the basis of Bayesian hypothesis testing, but are generally reliant upon estimating the integrated likelihood of the models in question. We have undertaken some investigation as it relates to the number of hidden states in the application dataset, and found evidence (see Web Appendix C) that more than three hidden states generally improved model fit by using the WAIC criteria (Watanabe, 2010). The best practices of this approach in all settings has not been well examined, but was generally consistent with other approaches. There are many possible explanations for this phenomena, which merits further separate investigation. Consideration of other analytic approaches including using reversible jump (Green, 1995) approaches to evaluate the number of hidden states as part of the MCMC algorithm, and also considering the latent disease process on a continuous scale (Luo, 2014) should also be undertaken. As a whole, hypothesis testing, model selection and effectively determining the number of hidden states are crucial to wide use of these models, and is an area of future work, particularly as it relates to differences between inference in MVR and UVR models. Development of methodology to handle longitudinal missing data in this context will also be important, as it is for most longitudinal data analyses. Further, when the multiple longitudinal responses are collected at differing time points, this could be modeled through a binning process (e.g., Xiong and Dubin, 2010) or otherwise viewed as a missing data problem. When the missingness or observation time processes can be considered ignorable, both issues can likely be resolved as others have pointed out by making small modifications to the likelihood (Altman, 2007). Sensitivity of this solution in either context to deviations from a missing-at-random assumption is an area which merits further investigation. Additionally, further considerations for prior choice, including the influence on parameters estimates, but also computational feasibility, is an important area to further investigate. In our motivating example using a smoking cessation study, we note a possible difference between the hidden state 3-2 transition probabilities in the placebo and treatment groups. This was a somewhat surprising result, as we would normally expect the treatment group to have a beneficial effect, when here it appears it may not. Given the relatively good performance of our method in the simulation study in this article, which are fairly representative of the data generated from the

motivating study, we are confident this noted difference is not due to a computational issue with our methodology. It is quite possible that the difference estimated in models III and IV for p132 and p032 is due to some unmodeled missing data mechanism, and is an area we are currently investigating. Also, we note that that O’Malley et al. (2006) did not find evidence for a treatment effect in the entire study sample. Lastly, although maximum likelihood (ML) methods could be used for fitting models such as the ones presented in this article, we have found, in general, that a Bayesian approach using MCMC is more computationally efficient. Although we were able to fit MHMMs to multivariate longitudinal response data via ML to such data by approximating the integration in Equation (1) by Gauss–Hermite quadrature, successful computation required multi-threading to do in a reasonable amount of time. In circumstances where practitioners have access to a computer cluster or large multi-threaded computer, using ML to fit many such models may be practical, but until a more efficient ML can be provided, the computation times of the Bayesian approach we outline, relative to the ML approach, differ by up to an order of magnitude on a single-threaded machine. This improvement in computational efficiency, along with the flexibility and general applicability of such models, provides a great opportunity for future work and application to other disease areas and problems. 7. Supplementary Materials Web Appendices and Tables referenced in Sections 3–6 along with R/C++ code to fit the models discussed above are available with this paper at the Biometrics website on Wiley Online Library.

Acknowledgements The authors would like to thank Dr. Stephanie O’Malley from Yale University for the use of the smoking cessation dataset. They would also like to thank the editor, associate editor and anonymous referees for their constructive and important feedback. J.D.R. is partially funded by a Pacific Institute of Mathematical Sciences (PIMS) Post-doctoral Fellowship. J.A.D. is partially supported by NSERC Discovery Grant RGPIN2014-05911, and both J.A.D. and J.D.R. were earlier supported by a previous NSERC Discovery Grant.

References Altman, R. M. (2007). Mixed Hidden Markov models: An extension of the Hidden Markov model to the longitudinal data setting. Journal of the American Statistical Association 102, 201– 210. Baum, L. E., Petrie, T., Soules, G., and Weiss, N. (1970). A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Annals of Mathematical Statistics 41, 164–171. DeSantis, S. M. and Bandyopadhyay, D. (2011). Hidden Markov models for zero-inflated Poisson counts with an application to substance use. Statistics in Medicine 30, 1678–1694. Garcia-Zattera, M. J., Jarab, A., Lesaffrec, E., and Marshall, G. (2012). Modeling of multivariate monotone disease processes in the presence of misclassification. Journal of the American Statistical Association 107, 976–989.

Multivariate Response Mixed Effects Hidden Markov Models Geman, S. and Geman, D. (1984). Stochastic relaxation, Gibbs distribution and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence 6, 721–741. Green, P. J. (1995). Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82, 711–732. Haario, H., Saksman, E., and Tamminen, J. (2001). An adaptive Metropolis algorithm. Bernoulli 7, 223–242. Hughes, J. R., Keely, J. P., Niaura, R. S., Ossip-Klein, D. J., Richmond, R. L., and Swan, G. E. (2003). Measures of abstinence in clinical trials: Issues and recommendations. Nicotine and Tobacco Research 5, 13–25. Killeen, P. R. (2011). Markov model for smoking cessation. Proceedings of the National Academy of Science 108, 15549– 15556. Luo, S. (2014). A Bayesian approach to joint analysis of multivariate longitudinal data and parametric accelerated failure time. Statistics in Medicine 33, 580–594. Maruotti, A. and Ryden, T. (2009). A semiparametric approach to hidden Markov models under longitudinal observations. Statistics and Computing 19, 371–393. Maruotti, A. (2011). Mixed hidden Markov models for longitudinal data: An overview. International Statistical Review 79, 427– 454. O’Malley, S. S., Cooney, J. L., Krishnan-Sarin, S., Dubin, J. A., McKee, S. A., Cooney, N. L., Blakeslee, A., Meandzija, B., Romano-Dahlgard, D., Wu, R., Makuch, R., and Jatlow, P. (2006). A controlled trial of naltrexone augmentation of Nicotine replacement therapy for smoking cessation. Archives of Internal Medicine 166, 667–674. Patrick, D. L., Cheadle, A., Thompson, D. C., Diehr, P., Koepsell, T., and Kinne, S. (1994). The validity of self-reported smoking: A review and meta-analysis. American Journal of Public Health 84, 1086–1093. Proust-Lima, C., Joly, P., Dartigues, J. F., and Jacqmin-Gadda, H. (2009). Joint modelling of multivariate longitudinal outcomes and a time-to-event: A nonlinear latent class ap-

831

proach. Computational Statistics & Data Analysis 53, 1142– 1154. Rosychuk, R. J. and Islam, S. (2009). Parameter estimation in a model for misclassified Markov data—A Bayesian approach. Computational Statistics and Data Analysis 53, 3805–3816. Scott, S. L. (2002). Bayesian methods for hidden Markov models: Recursive computing in the 21st century. Journal of the American Statistical Association 97, 337–351. Scott, S. L., James, G. M., and Sugar, C. A. (2005). Hidden Markov models for longitudinal comparisons. Journal of the American Statistical Association 100, 359–369. Shirley, K., Small, D., Lynch, K. G., Maisto, S. A., and Oslin, D. W. (2010). Hidden Markov models for alcoholism treatment trial data. Annals of Applied Statistics 4, 366–395. Smith, A. F. M. and Roberts, G. O. (1993). Bayesian computation via the Gibbs sampler and related Markov chain Monte Carlo methods (with discussion). Journal of the Royal Statistical Society, Series B 55, 3–24. Stephens, M. (2000). Dealing with label switching in mixture models. Journal of the Royal Statistical Society, Series B 62, 795–809. Verbeke, G, Fieuws, S, Molenberghs, G., and Davidian, M. (2014). The analysis of multivariate longitudinal data: A review Statistical Methods in Medical Research 23, 42–59. Watanabe, S. (2010). Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory. Journal of Machine Learning Research 11, 3571–3594. Xiong, X. and Dubin, J. A. (2010). A binning method for analyzing mixed longitudinal data measured at distinct time points. Statistics in Medicine 29, 1919–1931. Zucchini, W. and MacDonald, I. L. (2009). Hidden Markov Models for Time Series: An Introduction Using R. Boca Raton, FL: Chapman and Hall/CRC.

Received February 2014. Revised December 2014. Accepted December 2014.

Multivariate longitudinal data analysis with mixed effects hidden Markov models.

Multiple longitudinal responses are often collected as a means to capture relevant features of the true outcome of interest, which is often hidden and...
256KB Sizes 3 Downloads 10 Views