
Article

On estimating and testing associations between random coefficients from multivariate generalized linear mixed models of longitudinal outcomes

Statistical Methods in Medical Research 0(0) 1–16. © The Author(s) 2015. Reprints and permissions: sagepub.co.uk/journalsPermissions.nav. DOI: 10.1177/0962280214568522. smm.sagepub.com

Susan K Mikulich-Gilbertson,1,2 Brandie D Wagner,2 Paula D Riggs1 and Gary O Zerbe2

Abstract

Different types of outcomes (e.g. binary, count, continuous) can be simultaneously modeled with multivariate generalized linear mixed models by assuming: (1) same or different link functions, (2) same or different conditional distributions, and (3) conditional independence given random subject effects. Others have used this approach for determining simple associations between subject-specific parameters (e.g. correlations between slopes). We demonstrate how more complex associations (e.g. partial regression coefficients between slopes adjusting for intercepts, time lags of maximum correlation) can be estimated. Reparameterizing the model to directly estimate coefficients allows us to compare standard errors based on the inverse of the Hessian matrix with more usual standard errors approximated by the delta method; a mathematical proof demonstrates their equivalence when the gradient vector approaches zero. Reparameterization also allows us to evaluate significance of coefficients with likelihood ratio tests and to compare this approach with more usual Wald-type t-tests and Fisher's z transformations. Simulations indicate that the delta method and inverse Hessian standard errors are nearly equivalent and consistently overestimate the true standard error. Only the likelihood ratio test based on the reparameterized model has an acceptable type I error rate and is therefore recommended for testing associations between stochastic parameters. Online supplementary materials include our medical data example, annotated code, and simulation details.

Keywords

Joint modeling, random coefficient associations, stochastic parameter associations, delta method, Fisher's z transformation, likelihood ratio tests

1 Department of Psychiatry, School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, USA
2 Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, USA

Corresponding author: Susan K Mikulich-Gilbertson, University of Colorado Anschutz Medical Campus, Mail Stop F478, 12469 East 17th Place, Aurora, CO 80045, USA. Email: [email protected]


1 Introduction

Biological and medical research is often designed to investigate changes in a collection of response variables which are measured repeatedly over time on the same subjects. Multivariate longitudinal data of this kind provide the opportunity to address a variety of clinically relevant questions with joint analyses of outcomes. Early research on simultaneously modeling multiple longitudinal outcomes was restricted to normal multivariate linear models.1–3 More recently, the approach has been extended to multivariate nonlinear models for outcomes with normal residuals4,5 and multivariate generalized linear models.6–13

We describe a multivariate generalized linear mixed model for multiple outcomes which may be nonnormal and differently distributed and evaluate random coefficient associations among the outcomes, beginning with simple correlation coefficients discussed by others.7,9–13 A series of extensions and novel contributions follow that together inform procedures for estimating and testing random coefficient associations. We demonstrate how more complex partial correlation and regression coefficients may be obtained through recursion and indicate that the alternate method discussed by Zucker et al.3 for computing partial regression coefficients can be extended to nonnormal, differently distributed outcomes. We describe correlation between predicted outcomes at different times and determine the time lag of maximum correlation. A reparameterization of the model allows us to directly estimate association coefficients of interest as model parameters and to obtain standard errors based on the inverse of the Hessian matrix. These inverse Hessian-based standard errors can be compared with more usual standard errors approximated by the delta method; a mathematical proof demonstrates their equivalence when the gradient vector associated with model convergence is zero and approximate equivalence when the gradient is small. Importantly, the reparameterization also facilitates likelihood ratio tests (LRTs) of the significance of these simple and more complex correlation and regression coefficients. We compare them with results from more usual Wald-type t-tests and with Fisher's z transformations using data from a pharmacotherapy trial and simulations, and we show that only the LRTs achieve the nominal significance level. The dataset, annotated code, and simulation methods and results are included in an online supplement (available at: http://smm.sagepub.com/).

2 Multivariate generalized linear mixed model

For subjects i = 1, ..., N, assume independent vectors

$$ y_i = \begin{pmatrix} y_{1i}^T & y_{2i}^T & \cdots & y_{ri}^T \end{pmatrix}^T $$

Here, y_i denotes a vector of observations on r outcomes at multiple times, and y_hi is the subvector of these observations for outcome h. Neither the times nor the number of outcomes need be identical for different subjects, although we will assume that such differences are not informative of outcome. Let the joint distribution of the likely correlated observations on subject i be conditional on a k × 1 vector of population parameters b, a p × 1 vector of subject-specific random effects u_i, and a covariate matrix X_i. Additionally, the random effects are assumed to have multivariate normal distribution n(u_i | θ), with mean 0 and p × p covariance D, a function of θ, a vector of unknown parameters. Then inferences about b and θ are based on the joint likelihood (density)

$$ L = \prod_{i=1}^{N} \int f\!\left(y_i \mid b, u_i, X_i\right)\, n\!\left(u_i \mid \theta\right)\, \partial u_i $$


We assume that the within subject distributions are conditionally independent. The joint density for subject i conditional on u_i is then the product of densities for the h = 1, ..., r outcomes

$$ f\!\left(y_i \mid b, u_i, X_i\right) = \prod_{h=1}^{r} f_h\!\left(y_{hi} \mid b_h, u_{hi}, X_{hi}\right) $$

where bh, uhi, and Xhi are the respective components of b, u_i, and X_i for the hth outcome. The conditional densities fh(.) need not be the same for each outcome, and the joint log likelihood for subject i can be entered into appropriate software (e.g. SAS Procedure NLMIXED14) as ln f1(y_1i | b_1, u_1i, X_1i) for the first outcome, ln f2(y_2i | b_2, u_2i, X_2i) for the second outcome, etc. The conditional density fh(y_hi | b_h, u_hi, X_hi) of outcome h given the random effects u_hi may be (but does not have to be) a member of the exponential family.15 We do require that models utilize functions gh(.) linking the conditional expectations to linear combinations of the fixed and random effects such that for subject i

$$ g_h\!\left(E\!\left(y_{hi} \mid u_{hi}\right)\right) = X_{hi} b_h + Z_{hi} u_{hi} $$

where b_h is a kh × 1 vector of fixed effects such that k = Σ_{h=1}^{r} kh and u_h is a ph × 1 vector of random effects such that p = Σ_{h=1}^{r} ph for outcome h. Multivariate generalized linear mixed models of this type have been utilized by others to estimate correlations among stochastic parameters.7,9–13 The matrices of the link function can be partitioned such that

$$ g_h\!\left(E\!\left(y_{hi} \mid u_{hi}\right)\right) = \begin{pmatrix} X^{(1)}_{hi} & X^{(2)}_{hi} \end{pmatrix} \begin{pmatrix} b^{(1)}_{h} \\ b^{(2)}_{h} \end{pmatrix} + \begin{pmatrix} Z^{(1)}_{hi} & Z^{(2)}_{hi} \end{pmatrix} \begin{pmatrix} u^{(1)}_{hi} \\ u^{(2)}_{hi} \end{pmatrix} $$

In the common case where X^{(1)}_{hi} = Z^{(1)}_{hi}, then

$$ g_h\!\left(E\!\left(y_{hi} \mid u_{hi}\right)\right) = X^{(1)}_{hi}\left(b^{(1)}_{h} + u^{(1)}_{hi}\right) + X^{(2)}_{hi} b^{(2)}_{h} + Z^{(2)}_{hi} u^{(2)}_{hi} $$

and (b^{(1)}_{h} + u^{(1)}_{hi}) can be interpreted as a ph(1) × 1 vector of stochastic parameters (i.e. random coefficients, latent variables) for outcome h and subject i. In further discussions of stochastic parameters, the superscript (1) will be implied, but omitted for brevity and clarity. Let b_hvi = b_hv + u_hvi denote the vth element of this stochastic parameter vector. Then b_h + u_hi = (b_h1i, b_h2i, ..., b_hphi)^T, and the r vectors b_h + u_hi for subject i can be concatenated into a p × 1 vector

$$ \begin{pmatrix} b_{11i} & b_{12i} & \cdots & b_{1p_1 i} & b_{21i} & b_{22i} & \cdots & b_{2p_2 i} & \cdots & b_{r1i} & b_{r2i} & \cdots & b_{rp_r i} \end{pmatrix}^T $$

If the double subscripts (h, v), h = 1, ..., r; v = 1, ..., ph, are replaced by a single subscript w, w = 1, ..., p, the above vector is expressed more simply as (b_1i, b_2i, ..., b_pi)^T, a p × 1 vector of latent stochastic parameter predictors for subject i. The focus of this manuscript is to further develop methods for estimating and testing association between these stochastic parameters. In subsequent regression analyses, we will refer to the p stochastic parameters as variables b1, b2, ..., bp and their collection as B = (b1, b2, ..., bp)^T.
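To make this structure concrete, the following sketch (Python, with hypothetical Poisson and Bernoulli stand-in densities and illustrative function names; it is not the SAS NLMIXED implementation used for this paper) assembles a subject's conditional log density as the sum of per-outcome log densities and approximates that subject's marginal likelihood contribution by Monte Carlo integration over the random effects.

```python
# Sketch only: conditional independence given u_i means the per-subject conditional
# log density is a sum over outcomes, and the marginal likelihood integrates u_i out
# against the multivariate normal density n(u_i | theta).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def log_f1(y1, t, b, u):
    # hypothetical outcome 1: Poisson with log link, random intercept and slope
    mu = np.exp((b[0] + u[0]) + (b[1] + u[1]) * t)
    return stats.poisson.logpmf(y1, mu).sum()

def log_f2(y2, t, b, u):
    # hypothetical outcome 2: Bernoulli with logit link, random intercept and slope
    p = 1.0 / (1.0 + np.exp(-((b[2] + u[2]) + (b[3] + u[3]) * t)))
    return stats.bernoulli.logpmf(y2, p).sum()

def log_f_joint(y1, y2, t, b, u):
    # conditional independence: ln f(y_i | b, u_i) = sum_h ln f_h(y_hi | b_h, u_hi)
    return log_f1(y1, t, b, u) + log_f2(y2, t, b, u)

def log_marginal_i(y1, y2, t, b, D, n_draws=4000):
    # L_i = integral of f(y_i | b, u) n(u | 0, D) du, approximated here by Monte Carlo
    u_draws = rng.multivariate_normal(np.zeros(D.shape[0]), D, size=n_draws)
    logf = np.array([log_f_joint(y1, y2, t, b, u) for u in u_draws])
    m = logf.max()
    return m + np.log(np.exp(logf - m).mean())   # log-mean-exp for numerical stability

# toy usage with made-up data for a single subject
t = np.arange(10) / 100
b = np.array([0.2, -1.0, -2.0, -0.5])
D = 0.3 * np.eye(4)
y1, y2 = rng.poisson(1.2, size=10), rng.integers(0, 2, size=10)
print(log_marginal_i(y1, y2, t, b, D))
```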


3 Illustrative example: Daily marijuana and alcohol use in adolescent patients

One useful application of the methods discussed here is to understand the interrelationships among addiction to multiple drugs, especially given high rates of polysubstance use in adolescents. Data for this example come from a multisite pharmacotherapy trial of co-occurring attention-deficit hyperactivity disorder and substance use in adolescents.16 Self-reported daily use of marijuana, alcohol, and other drugs is available on up to 112 days of the treatment period, and all participants received cognitive behavioral therapy targeting their primary addiction, which was predominantly marijuana.16 The sample utilized here comprises the 65 adolescents who completed the trial, used marijuana on 14 or more days in the month before treatment, and also used alcohol at least once in the month prior to treatment. If treatment is successful, adolescents who receive treatment should reduce their marijuana use; their concurrent or subsequent alcohol use might feasibly also decrease as an indirect result of drug treatment or might feasibly increase in compensation for reduced marijuana use.

Research questions of interest include: (1) What is the association between daily number of marijuana joints smoked and daily use of alcohol at baseline (prior to treatment)? (2) What is the association between the rates of change in daily joints smoked and daily use of alcohol during treatment? (3) Is the association of their use stronger at specific times during treatment (e.g. midtreatment) or for specific time lags for each drug (e.g. marijuana use two weeks prior to alcohol use)? Evaluating the time of maximum correlation between use of the drug that treatment is targeting (marijuana) and use of a second drug (alcohol) might indicate when secondary drug use should be addressed in treatment. Strong correlation of one outcome early in treatment with a second outcome subsequently could be suggestive of causality and could indicate that the first outcome might be predictive of the second. We have not seen methods for evaluating the following more complicated associations discussed in the literature. (4) For alcohol use at a specific time (e.g. midtreatment, day 56), when is it most strongly correlated with marijuana use and how strong is that correlation? (5) What is the association between the rates of change in daily joints smoked and daily use of alcohol adjusting for their baseline use of both prior to treatment? (6) Is the subject-specific rate of change of marijuana usage (targeted by treatment) predictive of the subject-specific rate of change of alcohol usage, with or without adjusting for their baseline use?

Negative binomial regression is often used to model count variables whose distributions are overdispersed Poisson (e.g. Hayaki et al.17), as is the case with marijuana joints smoked per day. To simultaneously model daily counts of joints smoked with daily use of alcohol (yes, no), we assume the conditional likelihood is a negative binomial distribution with natural log link for marijuana use (h = 1) and the conditional likelihood is a Bernoulli distribution with logit link for alcohol use (h = 2). Assuming the natural log of the conditional expectation of number of marijuana joints is linear in time, the log of expected daily marijuana use for subject i at time t is

$$ \ln\!\left(E(Y_{1i}(t)) \mid u_{1i}, u_{2i}, u_{3i}, u_{4i}\right) = (b_1 + u_{1i}) + (b_2 + u_{2i})\, t $$


with an overdispersion parameter k. Assuming the logit of the conditional expectation of daily alcohol use is linear in time, the logit of predicted daily alcohol use for subject i at time t is

$$ \mathrm{logit}\!\left(E(y_{2i}(t)) \mid u_{1i}, u_{2i}, u_{3i}, u_{4i}\right) = (b_3 + u_{3i}) + (b_4 + u_{4i})\, t $$

Subject-specific intercepts and slopes (stochastic parameters) for both outcomes on their respective scales may be specified collectively as

$$ \begin{pmatrix} b_{1i} \\ b_{2i} \\ b_{3i} \\ b_{4i} \end{pmatrix} = \begin{pmatrix} b_1 + u_{1i} \\ b_2 + u_{2i} \\ b_3 + u_{3i} \\ b_4 + u_{4i} \end{pmatrix} $$

Subject-specific random effects are assumed to have a four-variate normal distribution with mean vector 0 and covariance matrix

$$ D = \begin{pmatrix} d_{11} & d_{12} & d_{13} & d_{14} \\ d_{12} & d_{22} & d_{23} & d_{24} \\ d_{13} & d_{23} & d_{33} & d_{34} \\ d_{14} & d_{24} & d_{34} & d_{44} \end{pmatrix} $$

where the upper left 2 × 2 block contains the marijuana variances and covariance, the lower right 2 × 2 block the alcohol variances and covariance, and the off-diagonal blocks the marijuana/alcohol covariances, labeling covariance elements corresponding to each outcome in this example for clarity. Because our primary purpose is to most clearly illustrate the proposed methodology for estimating and testing coefficients of association, we chose to avoid complicating the model further with higher degree polynomial coefficients in time and clinical covariates. The online supplement includes results from a more complex model with fixed effects of up to cubic coefficients in time for each outcome and evaluates inclusion of three potential clinical covariates for each outcome in that more complex model.
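A minimal sketch of the two conditional log likelihood contributions implied by this specification is given below (Python; this is not the annotated SAS NLMIXED code in the online supplement). It assumes the NB2 parameterization in which Var(Y) = μ + kμ², so that scipy's nbinom corresponds to (n, p) = (1/k, 1/(1 + kμ)).

```python
import numpy as np
from scipy import stats

def loglik_marijuana(y1, t, b1, b2, u1, u2, k):
    # negative binomial with natural log link: ln E(Y1(t) | u) = (b1 + u1) + (b2 + u2) t
    mu = np.exp((b1 + u1) + (b2 + u2) * t)
    n, p = 1.0 / k, 1.0 / (1.0 + k * mu)        # assumed NB2 form: Var = mu + k*mu^2
    return stats.nbinom.logpmf(y1, n, p).sum()

def loglik_alcohol(y2, t, b3, b4, u3, u4):
    # Bernoulli with logit link: logit E(Y2(t) | u) = (b3 + u3) + (b4 + u4) t
    eta = (b3 + u3) + (b4 + u4) * t
    return stats.bernoulli.logpmf(y2, 1.0 / (1.0 + np.exp(-eta))).sum()

# By conditional independence, subject i contributes
#   loglik_marijuana(...) + loglik_alcohol(...)
# to the joint conditional log likelihood, which is then integrated over
# (u1, u2, u3, u4) ~ N(0, D) to form the marginal likelihood.
```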

4 Measures of association between random coefficients

These measures of association among the stochastic parameters from multivariate nonlinear and generalized linear mixed models are functions of the elements of D or D⁻¹.

4.1 Simple correlation and regression coefficients

As noted, others have reported simple correlations between stochastic parameters. Examples relevant to this medical application include:

(1) The correlation between the subject-specific intercepts, e.g. the correlation between daily number of marijuana joints and daily use of alcohol at baseline prior to treatment is

$$ \rho_{b_1, b_3} = \rho_{13} = \frac{\mathrm{cov}(b_1, b_3)}{\sqrt{V(b_1)V(b_3)}} = \frac{\mathrm{cov}(u_1, u_3)}{\sqrt{V(u_1)V(u_3)}} = \frac{d_{13}}{\sqrt{d_{11} d_{33}}} $$


(2) The correlation between the subject-specific slopes, e.g. the correlation between the rates of change in daily number of joints and daily use of alcohol during treatment is

$$ \rho_{b_2, b_4} = \rho_{24} = \frac{\mathrm{cov}(u_2, u_4)}{\sqrt{V(u_2)V(u_4)}} = \frac{d_{24}}{\sqrt{d_{22} d_{44}}} $$

(3) The correlation between number of joints smoked on day t and alcohol use on day t′ is

$$ \rho_{b_1 + b_2 t,\, b_3 + b_4 t'} = \frac{\mathrm{cov}(u_1 + u_2 t,\, u_3 + u_4 t')}{\sqrt{V(u_1 + u_2 t)\, V(u_3 + u_4 t')}} = \frac{d_{13} + t d_{23} + t' d_{14} + t t' d_{24}}{\sqrt{\left(d_{11} + 2 t d_{12} + t^2 d_{22}\right)\left(d_{33} + 2 t' d_{34} + t'^2 d_{44}\right)}} $$

When t and t′ are both zero, this is simply the correlation between the random intercepts. As t and t′ increase, the marginal correlation converges to the correlation between the random slopes.

(4) For a specified day t of marijuana use (h = 1), the day (designated t′max) that maximizes the correlation between joints smoked and alcohol use (h = 2) can be obtained by differentiating (3) with respect to t′ and setting the resulting derivative equal to zero. Then

$$ t'_{\max} = \frac{-\left[t\left(d_{24} d_{33} - d_{34} d_{23}\right) + \left(d_{14} d_{33} - d_{34} d_{13}\right)\right]}{2 d_{14} d_{34} - d_{34} d_{14} - d_{44} d_{13} + t\left(2 d_{24} d_{34} - d_{34} d_{24} - d_{44} d_{23}\right)} $$

(5) The regression coefficient of the alcohol slope on the marijuana slope is the ratio

$$ \beta_{b_4 \mid b_2} = \beta_{4|2} = \mathrm{cov}(u_2, u_4)/V(u_2) = d_{24}/d_{22} $$
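For illustration, the sketch below (Python, helper names are ours) evaluates these quantities from an estimated D; the numeric matrix uses the point estimates later reported in Table 1, and the t2_max function uses an algebraically equivalent rearrangement of the expression in (4).

```python
import numpy as np

# estimated covariance matrix of the stochastic parameters, ordered as
# (mar intercept, mar slope, alc intercept, alc slope); values from Table 1
D = np.array([[ 3.12, -0.76,  1.15, -0.09 ],
              [-0.76,  4.43, -0.27,  1.018],
              [ 1.15, -0.27,  4.84, -3.18 ],
              [-0.09,  1.018, -3.18,  4.64 ]])

rho_13 = D[0, 2] / np.sqrt(D[0, 0] * D[2, 2])      # (1) intercept correlation
rho_24 = D[1, 3] / np.sqrt(D[1, 1] * D[3, 3])      # (2) slope correlation
beta_4_2 = D[1, 3] / D[1, 1]                       # (5) regression of alc slope on mar slope

def lagged_corr(D, t, t2):
    # (3) correlation between b1 + b2*t and b3 + b4*t'
    num = D[0, 2] + t * D[1, 2] + t2 * D[0, 3] + t * t2 * D[1, 3]
    v1 = D[0, 0] + 2 * t * D[0, 1] + t**2 * D[1, 1]
    v2 = D[2, 2] + 2 * t2 * D[2, 3] + t2**2 * D[3, 3]
    return num / np.sqrt(v1 * v2)

def t2_max(D, t):
    # (4) maximizer of lagged_corr over t' for fixed t (equivalent rearrangement)
    a = D[0, 2] + t * D[1, 2]          # d13 + t*d23
    b = D[0, 3] + t * D[1, 3]          # d14 + t*d24
    return (a * D[2, 3] - b * D[2, 2]) / (b * D[2, 3] - a * D[3, 3])

# sanity check of the closed form against a grid search, at midtreatment t = 0.56
grid = np.linspace(-5, 5, 20001)
print(t2_max(D, 0.56), grid[np.argmax(lagged_corr(D, 0.56, grid))])
```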

4.2 Partial correlation and partial regression coefficients obtained through recursion

Using well-known recursive formulae,18

6. the partial correlation coefficient between marijuana slope b2 and alcohol slope b4 adjusting for alcohol intercept b3 is

$$ \rho_{b_2, b_4 \mid b_3} = \rho_{24 \cdot 3} = \left(\rho_{24} - \rho_{23}\rho_{34}\right) \Big/ \sqrt{\left(1 - \rho_{23}^2\right)\left(1 - \rho_{34}^2\right)} $$

The residual variance from the regression of marijuana slope on alcohol intercept, i.e. b2 on b3, is d22·3 = d22(1 − ρ23²), and the residual variance from the regression of alcohol slope on alcohol intercept, i.e. b4 on b3, is d44·3 = d44(1 − ρ34²). Then,

7. the partial regression coefficient of b4 on b2 adjusting for b3 is

$$ \beta_{4|2 \cdot 3} = \rho_{24 \cdot 3} \sqrt{d_{44 \cdot 3} / d_{22 \cdot 3}} $$


8. The partial correlation coefficient between the slopes adjusting for both intercepts is

$$ \rho_{24 \cdot 13} = \left(\rho_{24 \cdot 3} - \rho_{12 \cdot 3}\,\rho_{14 \cdot 3}\right) \Big/ \sqrt{\left(1 - \rho_{12 \cdot 3}^2\right)\left(1 - \rho_{14 \cdot 3}^2\right)} $$

The residual variance from the regression of marijuana slope on both intercepts, i.e. b2 on b1 and b3, is

$$ d_{22 \cdot 13} = d_{22} - \frac{d_{33} d_{12}^2 - d_{13} d_{12} d_{23} - d_{13} d_{23} d_{12} + d_{11} d_{23}^2}{d_{11} d_{33} - d_{13}^2} $$

and the residual variance from the regression of the alcohol slope on both intercepts, i.e. b4 on b1 and b3, is

$$ d_{44 \cdot 13} = d_{44} - \frac{d_{33} d_{14}^2 - d_{13} d_{14} d_{34} - d_{13} d_{34} d_{14} + d_{11} d_{34}^2}{d_{11} d_{33} - d_{13}^2} $$

9. Then, the partial regression coefficient of b4 on b2 adjusting for b1 and b3 is

$$ \beta_{4|2 \cdot 13} = \rho_{24 \cdot 13} \sqrt{d_{44 \cdot 13} / d_{22 \cdot 13}} $$

In the context of linear models with conditional normal distributions, Zucker et al.3 alternately expressed β4|2·13 as

$$ \beta_{4|2 \cdot 13} = -d^{24} / d^{44} $$

where d^{w,w′} denotes element (w, w′) of D⁻¹. Because Zucker et al.3 require only the asymptotic normality of the estimates of the distinct elements of the covariance matrix of the random effects, their methods extend to these multivariate generalized linear mixed models.
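The sketch below (Python, our helper names, with D again taken from the Table 1 estimates for illustration) computes the partial coefficients both ways: through the recursion in 6.–9. and through the inverse of D as in Zucker et al.; the two routes agree numerically.

```python
import numpy as np

# covariance of the stochastic parameters (0 = b1 mar int, 1 = b2 mar slope,
# 2 = b3 alc int, 3 = b4 alc slope); values from Table 1
D = np.array([[ 3.12, -0.76,  1.15, -0.09 ],
              [-0.76,  4.43, -0.27,  1.018],
              [ 1.15, -0.27,  4.84, -3.18 ],
              [-0.09,  1.018, -3.18,  4.64 ]])
R = D / np.sqrt(np.outer(np.diag(D), np.diag(D)))        # correlations among the u's

def partial_r(R, i, j, k):
    # first-order recursion: rho_{ij.k}
    return (R[i, j] - R[i, k] * R[j, k]) / np.sqrt((1 - R[i, k]**2) * (1 - R[j, k]**2))

r24_3 = partial_r(R, 1, 3, 2)                            # (6)
d22_3 = D[1, 1] * (1 - R[1, 2]**2)                       # residual variances
d44_3 = D[3, 3] * (1 - R[3, 2]**2)
b4_2_3 = r24_3 * np.sqrt(d44_3 / d22_3)                  # (7)

# second application of the recursion, now on first-order partials: (8)
r21_3, r41_3 = partial_r(R, 1, 0, 2), partial_r(R, 3, 0, 2)
r24_13 = (r24_3 - r21_3 * r41_3) / np.sqrt((1 - r21_3**2) * (1 - r41_3**2))

# Zucker et al. route: fully adjusted coefficients directly from D^{-1}
P = np.linalg.inv(D)
r24_13_inv = -P[1, 3] / np.sqrt(P[1, 1] * P[3, 3])       # partial correlation from the precision matrix
b4_2_13_inv = -P[1, 3] / P[3, 3]                         # beta_{4|2.13} = -d^{24}/d^{44}

print(round(r24_13, 3), round(r24_13_inv, 3))            # both about 0.31
```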

5 Estimating and testing measures of association

Typically a measure of association is estimated by a nonlinear function of maximum likelihood estimates of the elements of D and is tested for significance by a t-test using a standard error approximated by the delta method. Next, we describe various reparameterizations of the model so that the desired correlation or regression coefficient can be directly estimated as a parameter. Then the standard error of the estimate is based on the inverse of the Hessian. This offers a potential improvement over the delta method standard error because the former is based on an approximation involving second partial derivatives of the log likelihood whereas the delta method additionally contains a second approximation involving second derivatives of a nonlinear function of the parameters.

5.1 Reparameterization to directly estimate coefficients of association and inverse Hessian-based standard errors

For a simple correlation such as ρ24 = d24/√(d22 d44), D can be reparameterized as

$$ D = \begin{pmatrix} d_{11} & d_{12} & d_{13} & d_{14} \\ d_{12} & d_{22} & d_{23} & d_{24} \\ d_{13} & d_{23} & d_{33} & d_{34} \\ d_{14} & d_{24} & d_{34} & d_{44} \end{pmatrix} = \begin{pmatrix} d_{11} & d_{12} & d_{13} & d_{14} \\ d_{12} & d_{22} & d_{23} & \rho_{24}\sqrt{d_{22} d_{44}} \\ d_{13} & d_{23} & d_{33} & d_{34} \\ d_{14} & \rho_{24}\sqrt{d_{22} d_{44}} & d_{34} & d_{44} \end{pmatrix} $$


Similarly, for a simple regression such as β4|2 = d24/d22, D can be reparameterized to include d24 = β4|2 d22.

To reparameterize D in terms of partial correlation and partial regression coefficients, consider the partitioned random effects distribution

$$ \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} \sim N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \right) $$

where v1 is the 2 × 1 vector of stochastic parameters that we wish to associate, v2 is the (p − 2) × 1 vector of stochastic parameters for which we wish to adjust, and Σ21 = Σ12′. The covariance matrix of v1 conditional on v2 is Σ11·2 = Σ11 − Σ12 Σ22⁻¹ Σ21, implying that the 2 × 2 matrix Σ11 = Σ11·2 + A11·2 may be used to express the covariance matrix of v1 as the sum of the conditional covariance matrix Σ11·2 of v1 on v2 and a matrix A11·2 = Σ12 Σ22⁻¹ Σ21 of adjustments for v2. Reparameterization is now made simple since only three elements of the p × p D matrix, those corresponding to the three distinct elements of Σ11·2, need to be modified.

For a partial correlation between two stochastic parameters, e.g. u2 and u4, adjusted for two other stochastic parameters, e.g. u1 and u3, with v1 = (u2, u4)′ and v2 = (u1, u3)′, the conditional covariance matrix of v1 on v2 is

$$ \Sigma_{11 \cdot 2} = \begin{pmatrix} d_{22 \cdot 13} & d_{24 \cdot 13} \\ d_{24 \cdot 13} & d_{44 \cdot 13} \end{pmatrix} = \begin{pmatrix} d_{22 \cdot 13} & \rho_{24 \cdot 13}\sqrt{d_{22 \cdot 13} d_{44 \cdot 13}} \\ \rho_{24 \cdot 13}\sqrt{d_{22 \cdot 13} d_{44 \cdot 13}} & d_{44 \cdot 13} \end{pmatrix} $$

where ρ24·13 is the partial correlation coefficient between u2 and u4 adjusted for u1 and u3, the matrix A11·2 of adjustments is

$$ \begin{pmatrix} a_{22 \cdot 13} & a_{24 \cdot 13} \\ a_{24 \cdot 13} & a_{44 \cdot 13} \end{pmatrix} = \begin{pmatrix} d_{12} & d_{23} \\ d_{14} & d_{34} \end{pmatrix} \begin{pmatrix} d_{11} & d_{13} \\ d_{13} & d_{33} \end{pmatrix}^{-1} \begin{pmatrix} d_{12} & d_{14} \\ d_{23} & d_{34} \end{pmatrix} = \begin{pmatrix} \dfrac{d_{33} d_{12}^2 - d_{13} d_{12} d_{23} - d_{13} d_{23} d_{12} + d_{11} d_{23}^2}{d_{11} d_{33} - d_{13}^2} & \dfrac{d_{33} d_{12} d_{14} - d_{13} d_{14} d_{23} - d_{13} d_{34} d_{12} + d_{11} d_{34} d_{23}}{d_{11} d_{33} - d_{13}^2} \\ \dfrac{d_{33} d_{12} d_{14} - d_{13} d_{14} d_{23} - d_{13} d_{34} d_{12} + d_{11} d_{34} d_{23}}{d_{11} d_{33} - d_{13}^2} & \dfrac{d_{33} d_{14}^2 - d_{13} d_{14} d_{34} - d_{13} d_{34} d_{14} + d_{11} d_{34}^2}{d_{11} d_{33} - d_{13}^2} \end{pmatrix} $$

and the unconditional covariance matrix Σ11 is

$$ \begin{pmatrix} d_{22} & d_{24} \\ d_{24} & d_{44} \end{pmatrix} = \begin{pmatrix} d_{22 \cdot 13} + a_{22 \cdot 13} & \rho_{24 \cdot 13}\sqrt{d_{22 \cdot 13} d_{44 \cdot 13}} + a_{24 \cdot 13} \\ \rho_{24 \cdot 13}\sqrt{d_{22 \cdot 13} d_{44 \cdot 13}} + a_{24 \cdot 13} & d_{44 \cdot 13} + a_{44 \cdot 13} \end{pmatrix} $$

Substituting Σ11 for the appropriate elements of D yields the reparameterized D

$$ \begin{pmatrix} d_{11} & d_{12} & d_{13} & d_{14} \\ d_{12} & d_{22} & d_{23} & d_{24} \\ d_{13} & d_{23} & d_{33} & d_{34} \\ d_{14} & d_{24} & d_{34} & d_{44} \end{pmatrix} = \begin{pmatrix} d_{11} & d_{12} & d_{13} & d_{14} \\ d_{12} & d_{22 \cdot 13} + a_{22 \cdot 13} & d_{23} & \rho_{24 \cdot 13}\sqrt{d_{22 \cdot 13} d_{44 \cdot 13}} + a_{24 \cdot 13} \\ d_{13} & d_{23} & d_{33} & d_{34} \\ d_{14} & \rho_{24 \cdot 13}\sqrt{d_{22 \cdot 13} d_{44 \cdot 13}} + a_{24 \cdot 13} & d_{34} & d_{44 \cdot 13} + a_{44 \cdot 13} \end{pmatrix} $$


Likewise, D can be reparameterized to include a partial regression coefficient adjusted for two stochastic parameters, β4|2·13 = d24·13/d22·13, with the same elements in D as for the analogous partial correlation coefficient except that d24 = d24·13 + a24·13 = β4|2·13 d22·13 + a24·13, where

$$ \begin{pmatrix} d_{22 \cdot 13} & \beta_{4|2 \cdot 13}\, d_{22 \cdot 13} \\ \beta_{4|2 \cdot 13}\, d_{22 \cdot 13} & d_{44 \cdot 13} \end{pmatrix} $$

is the covariance matrix of u2 and u4 adjusted for u1 and u3, and

$$ \begin{pmatrix} a_{22 \cdot 13} & a_{24 \cdot 13} \\ a_{24 \cdot 13} & a_{44 \cdot 13} \end{pmatrix} $$

is the matrix of adjustments for u1 and u3.
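The sketch below (Python, our variable names, D again from Table 1 for illustration) makes the bookkeeping explicit: it computes Σ11·2 and A11·2 for v1 = (u2, u4) and v2 = (u1, u3), then applies the forward map the reparameterized model uses and recovers the original d22, d24 and d44.

```python
import numpy as np

D = np.array([[ 3.12, -0.76,  1.15, -0.09 ],
              [-0.76,  4.43, -0.27,  1.018],
              [ 1.15, -0.27,  4.84, -3.18 ],
              [-0.09,  1.018, -3.18,  4.64 ]])

i1, i2 = [1, 3], [0, 2]                       # v1 = (u2, u4), v2 = (u1, u3)
S11 = D[np.ix_(i1, i1)]
S12 = D[np.ix_(i1, i2)]
S22 = D[np.ix_(i2, i2)]

A = S12 @ np.linalg.inv(S22) @ S12.T          # adjustments A_{11.2}
C = S11 - A                                   # conditional covariance Sigma_{11.2}
rho_24_13 = C[0, 1] / np.sqrt(C[0, 0] * C[1, 1])

# forward map used by the reparameterized model: estimate (d22.13, d44.13, rho_24.13)
# directly, then rebuild the three modified elements of D
d22_13, d44_13 = C[0, 0], C[1, 1]
d22 = d22_13 + A[0, 0]
d24 = rho_24_13 * np.sqrt(d22_13 * d44_13) + A[0, 1]
d44 = d44_13 + A[1, 1]
print(np.allclose([d22, d24, d44], [D[1, 1], D[1, 3], D[3, 3]]))   # True: round trip recovers D
```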

5.2 Relationship between delta method and inverse Hessian-based standard errors

Reparameterizing the model to directly estimate the association coefficient of interest provides a standard error estimate based on the inverse of the Hessian. We conjectured this inverse Hessian-based standard error would be superior to the delta method standard error obtained after more conventionally estimating covariances in D as parameters and computing association coefficients as nonlinear functions of those covariance estimates, and we investigated their mathematical relationship.

Consider a model where γ̂ = (β̂′, θ̂′)′ denotes the entirety of m maximum likelihood estimators, both fixed effects and variance components. Then γ̂ minimizes g(γ), the −log likelihood with respect to γ. Nonlinear functions of γ̂, say α̂ = f1(γ̂), are also maximum likelihood estimators if the reparameterization of the parameter space from γ onto α is a one-to-one mapping, such that f, the inverse of f1, exists, and γ̂ = f(α̂). Then α̂ minimizes h(α), the −log likelihood with respect to α. At maximization the two likelihoods with respect to α and with respect to γ are equal, and we have h(α̂) = g(γ̂) = g(f(α̂)). Using the chain rule, Magnus and Neudecker19 (p. 96) prove the relationship

$$ \underbrace{Dh(\hat{\alpha})}_{1 \times m} = \underbrace{\left(Dg(\hat{\gamma})\right)}_{1 \times m}\, \underbrace{Df(\hat{\alpha})}_{m \times m} $$

between Dh(α̂), the gradient vector of first partial derivatives of h with respect to α evaluated at α̂; Dg(γ̂), the gradient of g with respect to γ evaluated at γ̂; and Df(α̂) = (∂γi/∂αj), the m × m Jacobian matrix of first partial derivatives of f with respect to α evaluated at α̂. Using the chain rule again, Magnus and Neudecker19 (p. 110) prove that

$$ \underbrace{Hh(\hat{\alpha})}_{m \times m} = \underbrace{\left(Df(\hat{\alpha})\right)'}_{m \times m}\, \underbrace{\left(Hg(\hat{\gamma})\right)}_{m \times m}\, \underbrace{Df(\hat{\alpha})}_{m \times m} + \underbrace{\left(Dg(\hat{\gamma}) \otimes I_m\right)}_{m \times m^2}\, \underbrace{Hf(\hat{\alpha})}_{m^2 \times m} $$

where Hh(α̂) is the m × m Hessian (matrix of second partial derivatives) of the scalar h with respect to α and α′ evaluated at α̂, Hg(γ̂) is the m × m Hessian of the scalar g with respect to γ and γ′


evaluated at γ̂, and Hf(α̂) is the m² × m Hessian matrix of second partial derivatives of the vector f with respect to α and α′ evaluated at α̂; dimensions are included for clarity. Then if the likelihood is nearly maximized such that the gradient vector Dg(γ̂) ≅ 0,

$$ \begin{aligned} \left(Hh(\hat{\alpha})\right)^{-1} &= \left[\left(Df(\hat{\alpha})\right)'\left(Hg(\hat{\gamma})\right)Df(\hat{\alpha}) + \left(Dg(\hat{\gamma}) \otimes I_m\right)Hf(\hat{\alpha})\right]^{-1} \\ &\cong \left[\left(Df(\hat{\alpha})\right)'\left(Hg(\hat{\gamma})\right)Df(\hat{\alpha})\right]^{-1} \\ &= \left(Df(\hat{\alpha})\right)^{-1}\left(Hg(\hat{\gamma})\right)^{-1}\left(\left(Df(\hat{\alpha})\right)'\right)^{-1} \\ &= \left(Df1(\hat{\gamma})\right)\left(V(\hat{\gamma})\right)\left(Df1(\hat{\gamma})\right)' \end{aligned} $$

recalling that the Jacobian of a transformation, Df1(γ̂), is equal to the inverse of the Jacobian of the inverse transformation, (Df(α̂))⁻¹. Hence (Hh(α̂))⁻¹, the inverse Hessian approximation of the covariance matrix of α̂, and (Df1(γ̂))(V(γ̂))(Df1(γ̂))′, the delta method approximation of the covariance matrix of α̂, are equal when the gradient vector Dg(γ̂) = 0 (when the likelihood is maximized), and approximately equal when the gradient is small (when the likelihood is nearly maximized). In particular, if we choose a reparameterization α̂ = f1(γ̂) such that α̂j estimates a correlation or regression coefficient of interest, then the two standard errors of α̂j, the square roots of the jth diagonal elements of the two covariance matrices, will be approximately equal.
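A numerical illustration of this equivalence is sketched below (Python). It uses a deliberately simple model, a bivariate normal sample with known zero mean, parameterized either by the covariances γ = (d11, d12, d22) or by α = (d11, ρ, d22); the data, finite-difference Hessian, and step size are our own illustrative choices, not the NLMIXED output discussed in the paper. At the maximum likelihood estimate the gradient vanishes, and the delta method and inverse Hessian standard errors agree.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 200
y = rng.multivariate_normal([0, 0], [[2.0, 0.6], [0.6, 1.5]], size=n)

def nll_gamma(g):                     # -log L in the covariance parameterization
    d11, d12, d22 = g
    S = np.array([[d11, d12], [d12, d22]])
    return -stats.multivariate_normal(mean=[0, 0], cov=S).logpdf(y).sum()

def nll_alpha(a):                     # -log L in the (d11, rho, d22) parameterization
    d11, rho, d22 = a
    return nll_gamma([d11, rho * np.sqrt(d11 * d22), d22])

def hessian(f, x, h=1e-4):            # central-difference Hessian of a scalar function
    x = np.asarray(x, float); m = len(x); H = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            ei, ej = np.eye(m)[i] * h, np.eye(m)[j] * h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
    return H

S_hat = y.T @ y / n                                   # MLE of the covariance (known zero mean)
g_hat = np.array([S_hat[0, 0], S_hat[0, 1], S_hat[1, 1]])
a_hat = np.array([g_hat[0], g_hat[1] / np.sqrt(g_hat[0] * g_hat[2]), g_hat[2]])

V_gamma = np.linalg.inv(hessian(nll_gamma, g_hat))    # inverse Hessian for gamma
d11, d12, d22 = g_hat                                 # Jacobian of alpha = f1(gamma)
J = np.array([[1.0, 0.0, 0.0],
              [-0.5 * d12 / (d11**1.5 * np.sqrt(d22)),
               1.0 / np.sqrt(d11 * d22),
               -0.5 * d12 / (np.sqrt(d11) * d22**1.5)],
              [0.0, 0.0, 1.0]])
V_delta = J @ V_gamma @ J.T                           # delta method covariance of alpha-hat
V_invH = np.linalg.inv(hessian(nll_alpha, a_hat))     # inverse Hessian after reparameterization
print(np.round(np.sqrt(np.diag(V_delta)), 4))
print(np.round(np.sqrt(np.diag(V_invH)), 4))          # essentially identical standard errors
```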

5.3 LRTs of association using reparameterized models

The t-tests using inverse Hessian or delta method standard errors are Wald-type tests. Fears et al.20 note that even though the Wald test is convenient to compute from standard computer output, it is a large sample procedure with a value dependent on the parameterization, and it has been shown to have poor power in certain situations.21–23 After studying their behavior for hypothesis testing in exponential families, Vaeth23 concludes that Wald tests should be used with caution in discrete probability models.

There are several reasons to prefer LRTs to Wald tests. Unlike Wald tests, LRTs are invariant under any monotonic transformation of the parameters. Whereas LRTs require only the approximate chi-square distribution of the test statistic, Wald tests require additional approximations about the standard error. Reparameterizing the model to directly estimate the coefficient of interest enables a LRT of that coefficient. The model can be run again, constraining the coefficient to be 0, and the difference between these models' −2 log likelihoods provides the approximate LRT for the significance of the coefficient.
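A hedged sketch of the resulting test (Python; the −2 log likelihood values below are hypothetical placeholders, in practice they come from the unconstrained and constrained model fits):

```python
from scipy import stats

neg2ll_full = 1234.5   # hypothetical: -2 log L with the association coefficient estimated freely
neg2ll_null = 1238.7   # hypothetical: -2 log L with the coefficient constrained to 0
lrt = neg2ll_null - neg2ll_full            # likelihood ratio statistic
p_value = stats.chi2.sf(lrt, df=1)         # one constrained parameter -> 1 df
print(lrt, p_value)
```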

5.4 Fisher's z transformation

A common test of significance for correlation and partial correlation coefficients between two u's adjusted for k other u's, where the u's have a multivariate normal distribution, is Fisher's z, where

$$ z = \frac{\tfrac{1}{2}\ln\!\left(\dfrac{1 + \hat{\rho}}{1 - \hat{\rho}}\right)}{\sqrt{\dfrac{1}{n - k - 3}}} $$

has a standard normal distribution under the null hypothesis. Here the u's are latent (unobserved), but z can still be computed from an observed estimate of ρ. The standard error depends only on the sample size n, without delta method or inverse Hessian approximations.
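A minimal sketch (Python, our function name) applies the transformation to a partial correlation between the slopes adjusting for both intercepts, with n = 65 subjects and k = 2 adjusted variables; because the input correlation is rounded, the result is only approximately the value reported later in Table 3.

```python
import numpy as np
from scipy import stats

def fisher_z_test(r, n, k=0):
    # z = 0.5*ln((1+r)/(1-r)) / sqrt(1/(n-k-3)); k = number of adjusted variables
    z = 0.5 * np.log((1 + r) / (1 - r)) * np.sqrt(n - k - 3)
    return z, 2 * stats.norm.sf(abs(z))

z, p = fisher_z_test(0.31, n=65, k=2)    # rho_24.13 from the example, rounded
print(round(z, 2), round(p, 3))          # roughly z = 2.5, p = 0.01
```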


Table 1. Estimates from the bivariate random effect model of marijuana use (mar) and alcohol use (alc).

                                          Estimate (InvH SE)    tInvH (p > |t|)
Overdispersion k                          0.87 (0.037)          t = 23.5 (<0.0001)
Mar intercept (int) b1                    –0.21 (0.23)          t = –0.90 (0.37)
Mar slope b2                              –0.0095 (0.0030)      t = –3.19 (0.0023)
Alc int b3                                –2.86 (0.32)          t = –9.04 (<0.0001)
Alc slope b4                              –0.0019 (0.0036)      t = –0.52 (0.61)
Variances (Var) and Covariances (Cov)
Var mar int d11                           3.12 (0.72)           t = 4.34 (<0.0001)
Cov mar int/slope d12                     –0.76 (0.64)
Var mar slope d22                         4.43 (1.02)
Cov mar int/alc int d13                   1.15 (0.61)
Cov mar slope/alc int d23                 –0.27 (0.69)
Var alc int d33                           4.84 (1.26)
Cov mar int/alc slope d14                 –0.09 (0.65)
Cov mar slope/alc slope d24               1.018 (0.73)
Cov alc int/alc slope d34                 –3.18 (1.14)
Var alc slope d44                         4.64 (1.26)

Table 3. Estimates and tests of correlation and regression coefficients between stochastic parameters.a

Coefficient   Estimate (Delta SE)   tDelta (p > |t|)    Estimate (InvH SE)   tInvH (p > |t|)    LRT (p > χ²)   Fisher's z (p > |z|)
ρ13           0.30 (0.14)           t = 2.17 (0.03)     0.30 (0.14)          t = 2.17 (0.03)    4.23 (0.040)   2.40 (0.016)
ρ24           0.22 (0.15)           t = 1.47 (0.15)     0.23 (0.16)          t = 1.48 (0.15)    2.010 (0.16)   1.81 (0.070)
ρ24·3         0.25 (0.15)           t = 1.66 (0.10)     0.25 (0.15)          t = 1.65 (0.10)    2.54 (0.11)    1.99 (0.046)
ρ24·13        0.31 (0.15)           t = 2.15 (0.036)    0.31 (0.15)          t = 2.14 (0.036)   4.10 (0.043)   2.51 (0.012)
β4|2          0.23 (0.16)           t = 1.42 (0.16)     0.23 (0.16)          t = 1.4 (0.16)     2.010 (0.16)
β4|2·3        0.19 (0.12)           t = 1.61 (0.11)     0.19 (0.12)          t = 1.60 (0.11)    2.54 (0.11)
β4|2·13       0.24 (0.12)           t = 2.03 (0.047)    0.24 (0.12)          t = 2.02 (0.048)   4.10 (0.043)

a Subscripts: 1 = marijuana use intercept, 2 = marijuana use slope, 3 = alcohol use intercept, 4 = alcohol use slope.

6.3 Partial correlation and regression coefficients: Estimates and evaluating significance

Partial correlation coefficients and partial regression coefficients that are of medical interest from the example are reported in Table 3. Estimates and delta method standard errors in column 2 and t statistics and p-values in column 3 are computed as nonlinear functions of covariance parameters (elements of D). Estimates and inverse Hessian standard errors in column 4 and t statistics and p-values in column 5 are estimated directly as parameters in the reparameterized models. LRTs computed as the difference in −2 log likelihoods and their p-values are reported in column 6. Fisher's z and its p-value are reported in column 7.

For this example, Fisher's z transformation always results in smaller p-values. As expected, the p-values for LRTs for correlation and regression coefficients with the same numerators (i.e. ρ24 and β4|2; ρ24·3 and β4|2·3; ρ24·13 and β4|2·13) are always identical, whereas the p-values for Wald-type t-tests (using either delta method standard errors or inverse Hessian standard errors) are not. We also note that delta method standard errors and inverse Hessian-based standard errors from the reparameterized models are nearly identical, as are their p-values, which led us to investigate their mathematical relationship in Section 5.2. Although not addressed here, approximate confidence intervals for the correlation and regression coefficients could be easily constructed from the methods used here or with bootstrap intervals.

Unadjusted, the simple correlation between stochastic slopes for marijuana and alcohol use is nonsignificant (p > 0.05) by all methods; adjusting for the alcohol intercept, slopes are minimally correlated (ρ̂24·3 = 0.25, significant only after Fisher's z transformation, p = 0.046). Stochastic slopes are significantly correlated by all methods after adjusting for both intercepts (ρ̂24·13 = 0.31), indicating that rates of change in alcohol and marijuana use are correlated after adjusting for baseline (i.e. pretreatment) use of both drugs. Similarly, the regression coefficient of latent alcohol slopes on latent marijuana slopes (β̂4|2 = 0.23) increases slightly and becomes significant by available methods after adjusting for baseline use of both drugs (β̂4|2·13 = 0.24). Data and annotated code to reproduce results in this section are included in online supplementary material. For illustrative purposes we kept this example simple with linear fixed and random effects and no covariates. Supplementary material also includes results from a more complex model


including significant higher order terms in time (up to cubic) for fixed effects in combination with intercept and linear random effects and evaluating inclusion of potential clinical covariates. Clinical conclusions are similar to those from the simpler model.

Table 4. Probability of type I error for null hypothesis ρ24 = 0 and power for alternative hypotheses ρ24 = 0.22 and ρ24 = 0.60, by test.

Test          ρ24 = 0.00 (Nsim = 1000)a    ρ24 = 0.22 (Nsim = 1000)a    ρ24 = 0.60 (Nsim = 991)a
              Type I error (%)             Power (%)                    Power (%)
tDelta        6.8                          33.9                         97.8
tInvH         6.8                          33.7                         97.7
Fisher's z    7.4                          51.7                         99.8
LRT           5.6                          31.4                         97.4

a Nsim = number of simulated datasets where models converged with a positive definite Hessian matrix and without errors regarding optimization.

7 Size and power of tests of correlation coefficients

We conducted limited simulations based on scenarios similar to the example to evaluate type I error and power of the different approaches for testing a correlation between two stochastic parameters from a multivariate generalized linear mixed model. Simulations were based on a sample size of n = 80, each with 100 days of observations. Values for the two outcomes and the subject-specific u vectors were generated using parameter estimates from the medical example, i.e. estimates from Table 1 including a specified correlation between slopes of ρ24 = 0.00, ρ24 = 0.22 (the observed ρ̂24), and ρ24 = 0.60; 1000 datasets were simulated for each condition. Results from analyses of the simulated datasets agree with those for the example in the near equivalence between the delta method and inverse Hessian method standard errors for these models and show that they consistently overestimate the standard error of the association. Details of the methodology used and the specific results are included in supplementary materials.

Table 4 shows the type I error rate and power for testing ρ24 = 0 via each test. The LRT was the only test with a type I error near 5% (5.6%) and thus the only test where power can be reasonably assessed. The LRT had 31.4% power to detect a correlation between stochastic parameters of the magnitude observed in the medical example (ρ24 = 0.22) and 97.4% power to detect a larger ρ24 = 0.60.
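The sketch below (Python) illustrates the kind of data-generating step such a simulation requires; the fixed effect values, time scaling, and overdispersion shown are placeholders loosely based on Table 1, and the exact generating mechanism the authors used is documented in the online supplement.

```python
import numpy as np

rng = np.random.default_rng(2015)

def simulate_dataset(n=80, n_days=100, rho24=0.22):
    # covariance of the random effects, with the slope-slope element set to the target rho24
    D = np.array([[ 3.12, -0.76,  1.15, -0.09],
                  [-0.76,  4.43, -0.27,  0.00],
                  [ 1.15, -0.27,  4.84, -3.18],
                  [-0.09,  0.00, -3.18,  4.64]])
    D[1, 3] = D[3, 1] = rho24 * np.sqrt(D[1, 1] * D[3, 3])
    b = np.array([-0.2, -1.0, -2.9, -0.2])              # placeholder fixed effects
    k = 0.87                                            # placeholder NB overdispersion
    u = rng.multivariate_normal(np.zeros(4), D, size=n)
    t = np.arange(1, n_days + 1) / 100.0                # days scaled as in the example
    eta1 = (b[0] + u[:, [0]]) + (b[1] + u[:, [1]]) * t  # n x n_days linear predictors
    eta2 = (b[2] + u[:, [2]]) + (b[3] + u[:, [3]]) * t
    mu = np.exp(eta1)
    y1 = rng.negative_binomial(1.0 / k, 1.0 / (1.0 + k * mu))   # daily joints smoked
    y2 = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta2)))           # any alcohol use that day
    return t, y1, y2

t, y1, y2 = simulate_dataset()
print(y1.shape, y2.shape)   # (80, 100) each
```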

8 Discussion

Others have used the multivariate generalized linear mixed model to estimate simple correlations between random subject effects (e.g. subject-specific slopes) using Wald-type tests with delta method standard errors to test significance.7,9–13 We consider an example where both outcomes are nonnormal and, more importantly, we extend the association options to include partial correlation coefficients, regression coefficients, partial regression coefficients, and to determine the time of maximum correlation. Reparameterizing the models to directly estimate association coefficients leads to two contributions. First, we have demonstrated that standard errors based on delta method approximation and standard errors based on the inverse of the Hessian after reparameterization


are nearly equivalent when gradient vectors are near zero, raising the specter that inverse Hessian approximations may be no better than their delta method counterparts and corresponding Wald-type tests. Second, reparameterization provides a LRT as an alternative to Wald-type tests of association.

These models require some (but not unreasonable) finesse in fitting. To analyze the example, we first fit separate models to marijuana use and alcohol use in NLMIXED using the default method of adaptive Gaussian quadrature with specified negative binomial and binomial likelihoods, respectively. For both outcomes, we divide time (days) by 100. In joint models, we use rounded starting estimates from separate models as initial values and the general option in NLMIXED to specify different log-likelihoods of the outcomes conditional on the random effects. Complete independence is assumed in the initial joint model by setting all covariance terms between outcomes equal to zero; we then fit the more complex model allowing for partial dependence (conditional independence) by estimating the four between-subject covariances between outcomes. It is difficult to fit models with complete dependence except for the case where all distributions are normal; Zucker et al.3 and Fieuws and Verbeke7 estimate within-subject covariances between outcomes for the normal case. To fit reparameterized models that directly estimate a correlation or regression coefficient, we use rounded starting estimates computed from prior runs that estimated the covariance matrix D. In general, computational difficulties can sometimes result from starting with initial values that are too close to the final estimated parameter.

In their discussion of pitfalls of joint modeling, Fieuws and Verbeke7 distinguish between two situations for joint modeling: (1) where the primary interest is in the fixed effects parameters versus (2) where the covariance parameters are of interest, as is our case in determining associations between them. In this second situation, more scrutiny of the correctness of the covariance structure is necessary because the interpretation of a covariance parameter depends on the set of other covariance parameters in the model. Our experience (and the more complex example in the supplement) suggests that including more predictors, nonlinear functions, etc. in the fixed component of a generalized linear mixed model does not hamper convergence and often improves fit, but expanding the random effects beyond a four-dimensional covariance matrix leads to computational difficulties discussed next.

Theoretically, evaluating associations among more complex curves with higher order terms as random effects is possible with our proposed methodology, but we struggle with the clinical meaning of associations in the higher order terms and with computational difficulties. An important limitation to successful model convergence is the number of random effects. Like others,11,12,24 our experience suggests that model fitting becomes slow and intractable beyond four-dimensional covariance matrices of random effects when using adaptive Gaussian quadrature implemented in widely available software such as NLMIXED. Fieuws et al.6,8 describe a solution for when the high dimensionality arises from increasing the number of outcome variables. Their method first fits bivariate mixed models to all possible pairs of outcomes and averages across shared covariance parameters to provide estimates for the full multivariate model; inference follows from pseudolikelihood arguments. This pairwise fitting does not solve the problem for when the number of random effects per outcome increases. Simulation methods based on the EM algorithm have been used successfully in other applications involving high dimensionality11–13 but can be very computationally intensive.13 Gueorguieva and Sanacora12 recommend considering alternatives based on analytical approximations to the marginal likelihood for generalized linear mixed models25 or a Bayesian paradigm for parameter estimation in higher dimensions.

Others have used LRTs in the context of multivariate generalized linear mixed models to test subsets of D, i.e. test for covariance between outcomes evaluating whether outcomes are completely


independent versus conditionally independent, and to test the assumption of conditional independence for the special case of normal outcomes (e.g. Fieuws and Verbeke7). By introducing model reparameterizations, we extend the ability to apply LRTs to more complex functions of D such as partial correlation and partial regression coefficients. One referee suggested noting that the tests of association described here are not subject to the problems of testing a parameter being equal to a value on the boundary of its space (e.g. testing whether a variance equals zero) discussed by others.26 Our simulation study suggests that only the LRT has an acceptable type I error rate near the nominal 5%, with the other methods being inflated, and the LRT is therefore recommended for testing associations between stochastic parameters.

Funding

This work was supported in part by the National Institute on Drug Abuse (NIDA) (grant numbers R01DA034604, R01DA034604-02S1).

References

1. Grizzle JE and Allen DM. Analysis of growth and dose response curves. Biometrics 1969; 25: 357–381.
2. Schluchter MD. Estimating correlation between alternative measures of disease progression in a longitudinal study. Modification of Diet in Renal Disease Study. Stat Med 1990; 9: 1175–1188.
3. Zucker DM, Zerbe GO and Wu MC. Inference for the association between coefficients in a multivariate growth curve model. Biometrics 1995; 51: 413–424.
4. Reinsel G. Estimation and prediction in a multivariate random-effects generalized linear model. J Am Stat Assoc 1984; 79: 406–414.
5. Marshall G, De la Cruz-Mesia R, Baron AE, et al. Nonlinear random effects model for multivariate responses with missing data. Stat Med 2006; 25: 2817–2830.
6. Fieuws S, Verbeke G, Boen F, et al. High-dimensional multivariate mixed models for binary questionnaire data. Appl Stat 2006; 55: 1–12.
7. Fieuws S and Verbeke G. Joint modelling of multivariate longitudinal profiles: pitfalls of the random-effects approach. Stat Med 2004; 23: 3093–3104.
8. Fieuws S and Verbeke G. Pairwise fitting of mixed models for the joint modeling of multivariate longitudinal profiles. Biometrics 2006; 62: 424–431.
9. Fieuws S, Verbeke G and Molenberghs G. Random-effects models for multivariate repeated measures. Stat Methods Med Res 2007; 16: 387–397.
10. Fitzmaurice GM. Longitudinal data analysis. Boca Raton: CRC Press, 2009.
11. Gueorguieva R. A multivariate generalized linear mixed model for joint modeling of clustered outcomes in the exponential family. Stat Model 2001; 1: 177–193.
12. Gueorguieva RV and Sanacora G. Joint analysis of repeatedly observed continuous and ordinal measures of disease severity. Stat Med 2006; 25: 1307–1322.
13. Gueorguieva RV and Agresti A. A correlated probit model for joint modeling of clustered binary and continuous responses. J Am Stat Assoc 2001; 96: 1102–1112.
14. SAS Institute Inc. The NLMIXED procedure. In: SAS/STAT 13.2 user's guide. Cary, NC: SAS Institute Inc., 2014.
15. McCulloch CE and Searle SR. Generalized, linear, and mixed models. New York: John Wiley & Sons, 2001.
16. Riggs PD, Winhusen T, Davies RD, et al. Randomized controlled trial of osmotic-release methylphenidate with cognitive-behavioral therapy in adolescents with attention-deficit/hyperactivity disorder and substance use disorders. J Am Acad Child Adolesc Psychiatry 2011; 50: 903–914.
17. Hayaki J, Hagerty CE, Herman DS, et al. Expectancies and marijuana use frequency and severity among young females. Addict Behav 2010; 35: 995–1000.
18. Afifi AA and Azen SP. Statistical analysis: a computer oriented approach. New York: Academic Press, 1972.
19. Magnus JR and Neudecker H. Matrix differential calculus with applications in statistics and econometrics. New York: John Wiley & Sons, 1990.
20. Fears TR, Benichou J and Gail MH. A reminder of the fallibility of the Wald statistic. Am Stat 1996; 50: 226–227.
21. Hauck WW Jr and Donner A. Wald's test as applied to hypotheses in logit analysis. J Am Stat Assoc 1977; 72: 851–853.
22. Storer BE, Wacholder S and Breslow NE. Maximum likelihood fitting of general risk models to stratified data. J R Stat Soc 1983; 32: 171–181.
23. Vaeth M. On the use of Wald's test in exponential families. Int Stat Rev 1985; 53: 199–214.
24. Diggle PJ, Heagerty P, Liang KY, et al. Analysis of longitudinal data. Oxford: Clarendon, 2002.
25. Wolfinger R and O'Connell M. Generalized linear mixed models: a pseudo-likelihood approach. J Stat Comput Simul 1993; 43: 233–243.
26. Molenberghs G and Verbeke G. Likelihood ratio, score, and Wald tests in a constrained parameter space. Am Stat 2007; 61: 22–27.
