Journal of Clinical Epidemiology 68 (2015) 1165e1175

Analytical results in longitudinal studies depended on target of inference and assumed mechanism of attrition Mark Jones*, Gita D. Mishra, Annette Dobson School of Public Health, University of Queensland, Public Health Building, Herston Road, Herston, Brisbane, Qld 4006, Australia Accepted 18 March 2015; Published online 31 March 2015

Abstract Objectives: To compare methods for analysis of longitudinal studies with missing data due to participant dropout and follow-up truncated by death. Study Design and Setting: We analyzed physical functioning in an Australian longitudinal study of elderly women where the missing data mechanism could either be missing at random (MAR) or missing not at random (MNAR). We assumed either an immortal cohort where deceased participants are implicitly included after death or a mortal cohort where the target of inference is surviving participants at each survey wave. To illustrate the methods a covariate was included. Simulation was used to assess the effect of the assumptions. Results: Ignoring attrition or restricting analysis to participants with complete follow up led to biased estimates. Linear mixed model was appropriate for an immortal cohort under MAR but not MNAR. Linear increment model and joint modeling of longitudinal outcome and time to death were the most robust to MNAR. For a mortal cohort, inverse probability weighting and multiple imputation could be used, but care is needed in specifying dropout and imputation models, respectively. Conclusion: Appropriate analysis methodology to deal with attrition in longitudinal studies depends on the target of inference and the missing data mechanism. Ó 2015 Elsevier Inc. All rights reserved. Keywords: Missing data; Longitudinal study; Mortal cohort; Dropout; Simulation study; Attrition

1. Introduction The prevention and treatment of missing data in clinical trials was discussed recently in the New England Journal of Medicine [1], and an accompanying article suggests the issues raised also apply to observational studies [2]. The recommendations were that missing data should not be ignored; methods such as complete case analysis or single imputation should only be used in the minority of cases where a simple approach is justified; and model-based methods or methods that include appropriate weighting are generally preferred. These high profile publications

Funding: The Australian Longitudinal Study on Women’s Health is funded by the Australian Department of Health. M.J. is funded by the Australian National Health and Medical Research Council (APP1000986) and G.D.M. is funded by an Australian Research Council Future Fellowship. The funding sources had no involvement in the study design; in the collection, analysis, and interpretation of data; in the writing of the report; and in the decision to submit the article for publication. Conflict of interest: None. * Corresponding author. Tel.: (61) 7-3365-5116; fax: (61) 7-3365-5540. E-mail address: [email protected] (M. Jones). http://dx.doi.org/10.1016/j.jclinepi.2015.03.011 0895-4356/Ó 2015 Elsevier Inc. All rights reserved.

reflect the increasing awareness of the importance of dealing appropriately with missing data in health research studies. Longitudinal studies where participants are repeatedly measured over time are essential for estimating changes in health-related variables in populations because they estimate within-cohort change which is not possible with repeated cross-sectional studies. Within-cohort change may represent the natural progression of health in a population and provide valuable information for clinical management of patients and governmental policy making. For these purposes, it is critical that information provided is accurate and representative of the population of interest. In this regard, a significant limitation of many longitudinal studies is that a proportion of the participants fail to provide data at all waves of data collection. Nonresponse may be transitory, and the participant later returns to the study or it may be due to participant dropout or death, in which case further response is not provided. This latter type of nonresponse is the focus of this article. Participant dropout leads to missing data, whereas participant death results in truncation of follow-up. This

1166

M. Jones et al. / Journal of Clinical Epidemiology 68 (2015) 1165e1175

What is new?  Researchers may not be aware that for some methods of analysis, such as likelihood-based methods, deceased participants are implicitly included after death (i.e., cohorts are immortal).  Using a longitudinal study of 12,432 Australian women aged over 70, we illustrated that appropriate methods for dealing with attrition depend on the target of inference and the missing data mechanism.  Linear-mixed models are appropriate for data missing at random for an immortal cohort but not for a mortal cohort, whereas multiple imputation and inverse probability weighting can be used for a mortal cohort.  Linear increment model and joint modeling of longitudinal outcome and time to death are both appropriate under missing not at random (MNAR) and an immortal cohort, but more research is needed for methods robust to MNAR for a mortal cohort.

situation is particularly problematic if the participants who drop out or die are systematically different with respect to the outcomes of interest from the participants who remain. Common sense would suggest that participants who drop out or die are likely to have poorer health on average than participants who remain in the study. Therefore, ignoring the potential impact of this type of missing data is likely to lead to biased estimates, especially at later stages of the study where greater proportions of participants have either dropped out or died. In addition, the direction of the bias is likely to be toward showing better health over time than is true. There is an important philosophical aspect of nonresponse due to death. How should deaths be taken into account when estimating mean levels of health-related variables over time? For example, how is a dead person’s quality of life to be represented on a quality of life scale? Is their quality of life zero? Should their quality of life even be estimated in an analysis after they have died? If not then how should deaths be handled in the analysis? Clearly, there are no simple answers to these questions [3]. Moreover, the way in which deaths are handled in the analysis will depend on the study aims [4]. For example, are we describing the population as defined at the initiation of the study? Or are we describing the population as it existed at each stage of the study? The former scenario implies an immortal population and a cohort where participants who die continue to be implicitly included after death, whereas the latter scenario implies a mortal population where

deceased participants are excluded after death leading to a cohort of diminishing size over time. Alternatively, we could report estimates separately for dropouts, deaths, and those participants who remain in the study. Missing data can be classified into one of three types according to Rubin and Little [5]. Missing completely at random (MCAR) refers to situations where the missing data are a random subset of the total data and do not depend on observed or unobserved measurements. In this scenario, an analysis that ignores the missing data will lead to unbiased estimates and valid inference. Missing at random (MAR) occurs when the missing data are not a random sample of the total data; however, given the observed data, the missingness mechanism does not depend on the unobserved data. In this case, there is sufficient information in the data collected to enable unbiased estimates and valid inference provided an appropriate analysis such as a likelihoodbased method is performed. Missing not at random (MNAR) is where the missing data are systematically different from the nonmissing data, and even after accounting for the observed information, the reason for the missing data still depends on the unobserved observations. In this situation, there is insufficient information contained in the observed data to ensure unbiased estimates and valid inference. There have been many analytical methods proposed for dealing with missing data due to participant dropout or death in longitudinal studies [6e8]. These methods include: joint models [9,10], inverse probability weighting (IPW) [11,12], multiple imputation (MI) [13e15], and linear increment (LI) models [16,17]. All these methods rely on specific assumptions being met to ensure valid inference. Although these assumptions often cannot be formally tested, their plausibility can be carefully considered in the context of the study, the data collected, and the aims of the analysis. In addition, sensitivity analyses and/or simulation studies can be performed to determine the robustness of the conclusions obtained. The aim of this article was to illustrate and compare a number of methods of analysis for longitudinal studies with significant participant dropout or death. In Section 2, we describe the example data used for the analysis. The methods of analysis are then described in Section 3 (with the more technical details included in an Appendix at www.jclinepi.com). Simulation studies are described in Section 4; analysis results presented in Section 5, and we conclude with a discussion of the results in Section 6 where we also provide some general guidelines for presenting results of longitudinal studies with dropout and death.

2. Example data Data were obtained from the Australian Longitudinal Study on Women’s Health (ALSWH) [18]. The ALSWH began in 1996 when around 40,000 adult women in three age group cohorts were recruited by randomly sampling

M. Jones et al. / Journal of Clinical Epidemiology 68 (2015) 1165e1175

the nationally representative Medicare database. The study was approved by Ethics Committees at the University of Queensland and University of Newcastle. We use data from the older cohort of 12,432 participants born between 1921 and 1926. These women have been surveyed at approximately 3-year intervals resulting in five waves of data by 2008. At that time, 28% had died and a further 27% had failed to respond to the survey. Failure to respond could have been due to participant withdrawal (17%), loss of contact (5%), or participant failure to return the survey (5%). Occasionally, participants returned to the study after previously not responding; however, this occurred rarely; hence, for the purposes of the analysis was ignored. A full breakdown of participant status by survey wave is provided in Supplementary Table S1 on the journal’s Web site at www.elsevier.com. The outcome of interest is SF36 (http://www.rand.org/health/surveys_tools/mos/mos_core_ 36item.html) physical functioning (PF) which is expected to change over time and be at high risk of bias if missing data are ignored. PF is a continuous measure, which can range from 0 to 100, where lower values indicate poorer health. In addition to the main analysis where we aimed to estimate PF at each wave, we also illustrate fitting a binary covariate, in this case highest educational achievement for each participant categorized as ‘‘did not complete high school’’ (73% of participants) or ‘‘completed high school or higher’’ (27% of participants).

3. Methods for comparison The methods compared are described below. Methods 1 and 2 are often used in practice but would only be valid in a minority of cases. They have been included to illustrate the potential problem with using simplistic methods. Methods 3e5 assume an immortal cohort where responses are implicitly included for missing data after death. This may not be an appropriate assumption in many cases. Therefore, we have also included two methods that assume a mortal cohort (Methods 6 and 7). In the case of a mortal cohort, we are referring to the participants alive at each wave. Methods 3e6 all involve fitting regression models and the more technical details on these methods including model equations and SAS code are available in the Appendix at www.jclinepi.com where we also provide details on fitting the covariate. 3.1. Method 1 Ignore the missing data and calculate the arithmetic mean of the observed data available at each wave. This method is only appropriate if missing data are MCAR. 3.2. Method 2 Calculate the arithmetic mean of the observed data at each wave only for the subset of participants with complete

1167

data for all five waves (completers). This method is also generally only appropriate under MCAR. However, there are other mechanisms where complete case analysis may be appropriate [19]. 3.3. Method 3 Fit a linear mixed model (LMM) with PF as the response and wave as a covariate. In the example, a random intercept was fitted for each participant and wave was included as a categorical variable because PF did not change linearly over time. In fact, the change in PF was approximately linear between waves 2 and 5, but there was a notably smaller change between waves 1 and 2. This is a likelihood-based method hence is appropriate under MCAR or MAR, and it assumes an immortal cohort because participants are implicitly included after death. 3.4. Method 4 Fit a linear increment (LI) model that involves fitting separate linear models of the difference in PF at each successive wave and then cumulating the estimates of PF at wave 1 and difference in PF at successive pairs of waves 2 to 5. Assumptions for the appropriateness of this method are:  The target for inference is the hypothetical response for each participant (i.e., the response that would have been observed if the participant had continued in the study). Hence, this method assumes an immortal cohort.  Any unexplained variability (in PF) exhibits stability before dropout that is maintained beyond each dropout time by the remaining participants.  The observed increments remain representative of the original sample. The LI method was initially published by Diggle et al. [16]; however, we have based the implementation used here on the model-based version of Farewell [20] so that we could include a covariate in the analysis. This implementation uses generalized estimating equations where LIs are incorporated via a user-defined working correlation matrix appropriate for the five survey waves. 3.5. Method 5 Fit joint regression models (JM) for longitudinal outcome (PF) and time to death with shared normally distributed random effects fitted to each model. The model for PF is a LMM with a function of observation time in years fitted as an explanatory continuous variable and random intercepts and slopes fitted for each participant. Survival time is modeled using a distribution such as Weibull with random frailty terms. This method assumes an immortal cohort and is valid under MNAR providing that conditional on the joint random effects, the dependent

1168

M. Jones et al. / Journal of Clinical Epidemiology 68 (2015) 1165e1175

Use multiple imputation (MI) to impute the missing PF at each wave. To make this method appropriate for a mortal cohort, we set all the imputed values to missing for the deceased participants after death. We based our imputation on a multivariate normal distribution for PF over time using Markov Chain Monte Carlo specification available in SAS Proc MI. MI is known to be appropriate under MAR provided the imputation model is well specified. Because MI provides complete data (albeit for the mortal cohort in our example), we could simply estimate means and standard deviations at each wave. We imputed 10 complete data sets for the mortal cohort, which we combined using the rules by Rubin [23]. For a detailed breakdown of the pattern of missing PF over the five survey waves see Supplementary Table S2 on the journal’s Web site at www.elsevier.com.

using the SAS procedure Proc Surveyselect. We specified simple random samples without replacement. To generate MAR data, we specified logistic models to estimate probabilities of death and dropout based on the explanatory variables: previous PF, previous MH score, and smoking status (see Appendix at www.jclinepi.com for model equations). These variables were chosen because they were the most predictive of dropout and death at the fifth wave. We used model equations based on logistic models for the participants with data at the fourth wave for predicting death and dropout at the fifth wave. However, we applied these models to waves 2 to 5. Using the predicted probabilities for each participant, we generated Bernoulli random variables using the ‘‘rand’’ function in SAS where 1 indicated dropped out (or died) and 0 indicated did not drop out (or die). We assumed participants who ‘‘dropped out’’ did not return at later waves. Fifty simulations of sample size, 1,000 were run for each scenario to ensure a 95% confidence interval (CI) for the mean of average PF of 60.5 units based on a standard deviation of 1.8. Results are reported in terms of mean bias and root mean square error (RSME) where bias is estimated as the difference from the ‘‘true’’ value (i.e., the value based on the complete observed data before generating missing values). An estimated mean bias of greater than 1 over the 50 simulations indicates a systematic difference from the true value. To generate MNAR missing data, we specified logistic models to estimate probabilities of death and dropout based on the explanatory variables PF and MH (see Appendix at www.jclinepi.com for model equations). This mechanism is MNAR because the probability of dropout or death is based on the current values of PF and MH; hence, if the Bernoulli random variable indicates participant dropout or death, then the current value for PF is unobserved. For inverse probability weighting, we initially specified a ‘‘correct’’ model (where all the variables determining the dropout mechanism are included as covariates and specified appropriately in the model) for estimating probability of response. However, as it is generally unknown whether a correct model has been specified, we also specified an incorrect model where one of the covariates (PF) was not included in the logistic model.

4. Simulation studies

5. Analysis results

We conducted simulation studies to evaluate each of the analytical methods based on MAR and MNAR missing data mechanisms. However, we did not include complete case analysis (method 2) in the evaluation because the direction of bias is clear from the analysis results alone. For the simulation studies, we identified the participants with complete data for all five waves for the variables of interest. This resulted in a total of 3,799 participants. For each simulation, we randomly chose a sample of 1,000 participants

Fig. 1 shows the decline in PF stratified by completers, dropouts, and deaths. As expected, the dropouts and participants who died had poorer PF than completers at each wave. Trends over time are similar for the groups; however, there appears to be a slightly steeper decline for the participants between the two surveys before death. Other notable features are that dropouts had higher PF than the participants who died and PF was worst for the participants who died earlier in the study. This trend was not apparent

variables are independent (e.g., in our case, PF is independent of time to death). 3.6. Method 6 Fit a marginal regression model with inverse probability weighting (IPW) where the weights are estimated using a logistic model. Covariates to be potentially included in the logistic model were based on those shown to be associated with probability of dropout in the ALSWH [21] as well as previous values of PF and mental health (MH). Estimation of weights was based on the method illustrated by Dufouil et al. [12] and Kurland and Heagerty [22] where the weights are based on probability of remaining in the study conditional on responding to the previous survey and surviving to the current survey. This method is partly conditional, and the target for inference is the mortal cohort (i.e., participants alive at each wave). As recommended by Kurland et al., we used generalized estimating equations with an independent working correlation structure for modeling PF to ensure that an immortal cohort is not implicitly implied. This is the only published method we could identify where the target for inference is based on the mortal cohort. Inverse probability weighting is appropriate under MAR providing a correct dropout model has been specified. 3.7. Method 7

M. Jones et al. / Journal of Clinical Epidemiology 68 (2015) 1165e1175

1169

Fig. 1. Physical functioning by wave for completers, dropouts, and deaths. DeathsXY refers to participants who died between waves X and Y. Similarly DropoutXY refers to participants who dropped out between waves X and Y.

Fig. 3. Physical functioning by wave based on various methods of analysis. MI, multiple imputation; LMM, linear mixed model; IPW, inverse probability weighting; LI, linear increment model.

for dropouts and timing of dropout appeared to make little difference. To investigate further the decline in PF before death, we plotted PF by years before death and estimated a smoothed average. The results shown in Fig. 2 suggest PF declines rapidly within 6 months of death. As time between survey waves is 3 years on average, this finding suggests the ‘‘dropout’’ mechanism for deaths is unlikely to be MAR and an assumption of MNAR appears more plausible. The results of the various methods for estimating PF across the five waves are presented in Fig. 3 (and Supplementary Table S3 on the journal’s Web site at www. elsevier.com). For method 1, where missing data are ignored, estimates show lesser decline in PF over waves compared with the other methods suggesting the participants who dropped out or died had lower PF. The results for method 2, where only completers are included, show a steeper decline in PF compared with method 1; however, the estimates at each survey are consistently higher than for the other methods. This confirms that the completers are a group with

higher PF compared with the participants who died or dropped out. Estimates based on LMM (method 3) show a similar decrease in PF to completers, but the estimates at each wave are much lower, with the value at wave 1 being consistent with the other methods (except method 2). The results for LI and JM show steeper declines in PF than for LMM with the greatest decline for JM. Estimates based on a mortal cohort (IPW and MI) showed the least decline of all the methods apart from ignoring the missing data. The results of analyses including the covariate are shown in Table 1. The estimates reported represent: the baseline PF for participants who did not complete high school (the reference group), the difference in PF for participants with higher educational achievement (compared with the reference group), the change in PF over a function of time for the reference group, and the difference in change in PF for the participants with higher educational achievement. The methods provide somewhat differing estimates; however, they all consistently show higher PF in participants with higher educational achievement by 4 to 5 units at baseline and little evidence of a difference in change of PF over time between groups. Using a joint model, time to death can be compared between participants who did not complete high school and participants with higher educational achievement. Results suggest longer time to death for participants with higher educational achievement by 17% (95% CI: 10%, 24%). This analysis illustrates that differential death rate in comparison groups of interest can be accounted for and reported using this methodology. Tables 2 and 3 show results of the simulation study for MAR where on average, 55% of participants either dropped out or died over the five waves. For MNAR, 57% of participants either dropped out or died over the five waves (Tables 4 and 5). The simulation results showed bias (and RMSE) tended to increase as the proportion of missing data increased. They

Fig. 2. Physical functioning by years before death for deceased patients.

1170

M. Jones et al. / Journal of Clinical Epidemiology 68 (2015) 1165e1175

Table 1. Effect of highest educational achievement on physical functioning (PF) by method of analysis

Method of analysis 1. 2. 3. 4. 5. 6. 7.

Ignore missing data Complete cases only Linear mixed model Linear increments Joint models Inverse probability weighting Multiple imputation a b

Difference in baseline PF between groups (standard error)

Reference groupa PF at baseline (standard error) 62.5 68.4 62.1 61.8 62.0 62.2 62.3

(0.28) (0.37) (0.28) (0.29) (0.27) (0.29) (0.28)

4.5 3.9 4.7 5.3 5.0 4.7 4.6

Reference group change in PF over a function of timeb (standard error) 0.39 0.53 0.53 0.58 0.66 0.46 0.44

(0.51) (0.63) (0.53) (0.53) (0.52) (0.53) (0.52)

(0.011) (0.010) (0.008) (0.011) (0.012) (0.013) (0.009)

Difference in change in PF between groups (standard error) 0.002 0.026 0.015 0.004 0.029 0.014 0.004

(0.020) (0.019) (0.014) (0.019) (0.018) (0.022) (0.017)

Reference group is participants who did not complete high school, and comparison group is participants who completed high school. The function of time that provided the best fit based on Akaike Information Criterion was time in years to the power of 1.5.

also suggest that for both MAR and MNAR, simply ignoring the missing data leads to significant bias in the estimates. Results for the immortal cohort analyses showed LMM performed adequately under MAR as expected but is biased under MNAR. However, bias is moderate and less than that if missing data are ignored. JM and LI showed the least bias under MNAR. However, these methods tended to overcompensate for the missing data under MAR although LI tracked the observed values reasonably closely until the last wave. Under a mortal cohort scenario, IPW performed adequately under MNAR provided a correct model for estimating probability of response was implemented. An incorrectly specified model for IPW led to some bias, but results were still clearly better than ignoring the missing data. Somewhat surprisingly IPW based on a correct model tended to slightly underestimate PF under MAR, whereas an incorrect dropout model performed adequately. However, the estimated bias for the correct model reduced for the final wave suggesting the underestimates at waves 3 and 4 may have been due to random fluctuation. MI performed satisfactorily under MAR as expected but also performed well under MNAR.

6. Discussion The results obtained from our comparative analysis and simulation study suggest that ignoring missing data leads to biased estimates of PF. In addition, the apparent bias increased over time as greater proportions of participants had either dropped out or died. Restricting the analysis to just the completers also produced biased estimates at all waves. Bias was in the expected direction, that is, the estimates suggest better average PF than was actually the case.

These results are consistent with the observed data that show that the participants who dropped out or died had lower values than the participants who remained. Informal comparison of the estimates obtained from the various methods should be conducted within the context of the target for inference (mortal or immortal population). If the target for inference is the immortal population where participants continue to be included implicitly after death, then LMM, LI, and JM are the relevant methods. The results for LI showed a steeper decline in PF compared with LMM, and JM showed an even steeper decline possibly because this method explicitly takes account of the rapid decline in PF 6 months before death. The differences in estimates between the three methods are up to 6 units at the final time point. This is clearly a clinically important difference; therefore, care is needed in deciding how the results are to be reported. Assuming an immortal cohort, the simulation study showed LMM is appropriate under MAR as expected but may not be under MNAR. Given that it is not possible to determine whether data are MAR or MNAR [24], it would be advisable to conduct primary analysis based on the most plausible dropout mechanism and appropriate method; as well as sensitivity analysis using a method that appears robust under alternative mechanism(s). The simulation study suggests JM and LI are the most appropriate for primary (MNAR) analysis of the example data and LMM could be used for sensitivity analysis (MAR). The LI method is an attractive option due to its simplicity compared with JM. Furthermore, JM may not be available in standard software packages. In SAS, we needed to use the Nlmixed procedure, which involved coding the likelihood function (see Appendix at www.jclinepi.com for details). In addition, the ensuing program took up to an hour to run or even longer

Table 2. Mean bias (root mean square error) physical functioning based on various methods of analysis for an immortal cohort and MAR (50 simulated samples of 1,000 participants) Method Ignore missing values Linear mixed model Linear increments Joint models

Wave 1 0 0 0 0.29

(0) (0) (0) (0.86)

Abbreviation: MAR, missing at random.

Wave 2 0.94 0.13 0.21 0.49

(1.01) (0.29) (0.93) (0.95)

Wave 3 2.02 0.32 0.20 2.26

(2.09) (0.48) (1.16) (2.48)

Wave 4 3.36 0.60 0.60 3.50

(3.43) (0.81) (1.19) (3.75)

Wave 5 4.22 0.42 1.26 5.04

(4.35) (0.94) (1.79) (5.41)

M. Jones et al. / Journal of Clinical Epidemiology 68 (2015) 1165e1175

1171

Table 3. Mean bias (root mean square error) physical functioning based on various methods of analysis for a mortal cohort and MAR (50 simulated samples of 1,000 participants) Method Ignore missing values IPW (correct) IPW (incorrect) Multiple imputation

Wave 1 0 0 0 0

(0) (0) (0) (0)

Wave 2 0.44 0.69 0.20 0.02

(0.52) (0.82) (1.06) (0.92)

Wave 3 1.09 1.33 0.39 0.45

(1.17) (1.55) (1.20) (1.05)

Wave 4 2.05 1.27 0.51 0.68

(2.14) (1.59) (1.53) (1.51)

Wave 5 2.70 0.85 0.78 0.77

(2.84) (1.34) (1.92) (1.85)

Abbreviations: MAR, missing at random; IPW, inverse probability weighting.

and often the model did not converge to a plausible solution. For further information on joint modeling, we recommend Rizopoulos [25] excellent textbook. If the target of inference is the mortal population where only the surviving participants are included at each wave, then IPW and MI are the appropriate methods. A further option could be to report estimates by dropout and survival status as we illustrate in Fig. 1. However, it is often desirable to report overall estimates and also fit covariates; therefore, this descriptive approach may have limited applicability in practice. Given the importance of specifying a correct dropout model for IPW, the relative simplicity of MI is compelling. However, care is needed in specifying the imputation model. In our example, we found previous values of PF were very predictive of current values with little need to include auxiliary variables predictive of missingness in the imputation process. This may help to explain why MI did not appear to lead to biased estimates in the simulation study under MNAR. This could also explain why a correct dropout model for IPW also appeared robust to MNAR, but an incorrect model (one that did not include previous PF as a predictor) was biased. To illustrate the predictability of previous PF, based on linear regression of PF at wave 2, previous PF at wave 1 explained 50% of the variation in the outcome, whereas nine auxiliary variables together only explained a further 2%. For more information, Spratt et al. [26] illustrate the steps needed to ensure the appropriate use of MI in longitudinal studies. Based on a reviewers comment that doubly robust methods [27] may be helpful in the context of IPW (where results may not be robust to a miss-specified dropout model), we investigated applying a doubly robust method in our study. We used regression estimation with residual bias correction [28] (see Appendix at www.jclinepi.com for more details). Results obtained (not shown) were similar to those for LI suggesting this method may be

robust to MNAR. Furthermore, this method could be applied to the mortal cohort. However, a limitation is standard errors, which may need to be estimated using bootstrapping, and fitting a covariate may not be straightforward. In addition, Kang and Schafer [28] suggest that two wrong models may not be better than one. Often the primary aim of a longitudinal analysis in epidemiological research is not to estimate the average response over time but rather to investigate the relationship between covariates and response. In our study, we illustrated the methods by investigating the relationship between highest education attainment and change in PF over time. Although there were large differences in average response over time depending on the method of analysis, the differences in results for the effect of education on change in PF were not large and conclusions from all analysis methods were largely consistent. Gustavson et al. [29] reported a similar finding in their longitudinal populationbased study over 15 years. However, this may not be the case for other covariates or studies. 7. Limitations The simulation study was based on those participants with complete data hence may not reflect the entire older cohort of the ALSWH. Furthermore, the models used to simulate risks of death and dropout were simplified compared with those found in the study. The outcome measure we used for analysis suffers from floor and ceiling effects as well as skewness; however, these potential departures from normality were negligible in this sample. We did not include all possible methods of analysis for missing data in our comparison. For example, we did not include selection models [30] mainly because these are complicated to fit and are not available in standard statistical software such as SAS. We have made an implicit assumption throughout that dropout had no effect on

Table 4. Mean bias (root mean square error) physical functioning based on various methods of analysis for an immortal cohort and MNAR (50 simulated samples of 1,000 participants) Method Ignore missing values Linear mixed models Linear increments Joint models

Wave 1 0 0 0 0.31

(0) (0) (0) (0.56)

Abbreviation: MNAR, missing not at random.

Wave 2 1.26 0.68 0.34 0.04

(1.31) (0.73) (1.02) (0.53)

Wave 3 2.70 1.25 0.63 0.69

(2.76) (1.30) (1.25) (0.99)

Wave 4 4.17 1.79 0.68 0.63

(4.21) (1.87) (1.42) (1.39)

Wave 5 5.80 2.36 0.90 0.71

(5.88) (2.49) (1.58) (1.99)

1172

M. Jones et al. / Journal of Clinical Epidemiology 68 (2015) 1165e1175

Table 5. Mean bias (root mean square error) physical functioning based on various methods of analysis for a mortal cohort and MNAR (50 simulated samples of 1,000 participants) Method

Wave 1

Ignore missing values IPW (correct) IPW (incorrect) Multiple imputation

0 0 0 0

(0) (0) (0) (0)

Wave 2 0.66 0.35 0.29 0.12

(0.72) (0.61) (1.21) (0.99)

Wave 3 1.50 0.79 0.68 0.03

Wave 4

(1.57) (1.29) (1.52) (1.13)

2.54 0.65 1.08 0.06

(2.59) (1.39) (1.78) (1.20)

Wave 5 3.86 0.41 1.67 0.13

(3.95) (1.29) (2.33) (1.42)

Abbreviations: MNAR, missing not at random; IPW, inverse probability weighting.

measurement apart from causing it to be missing. This is reasonable for observational studies but may not be for clinical trials where dropout could lead to assigned treatment not being taken, for example. Diggle et al. [16] discuss this issue at length in their article.

8. Conclusion The assumptions of any method of analysis need to be carefully considered before implementation in studies with missing data due to participant dropout. In addition, the method should appropriately reflect the study aims and target of inference when follow-up has been truncated by death. Sensitivity analysis can be useful to determine the robustness of results to varying plausible missing data mechanisms. We suggest the primary analysis should be performed using a method that is robust to the most plausible missing data mechanism.

where Yij is the observed physical functioning (PF) at the jth wave for participant i; b0 is the unknown fixed intercept to be estimated; bj1 are the unknown fixed effects at waves 2 to 5 to be estimated; IðWavej Þ is an indicator function for the jth wave; εij |Nð0; s2ε Þ is a random error term and ui |Nð0; s2u Þ is the random intercept to be estimated for each participant We also tried specifying an unstructured covariance for the residuals to take account of diminishing correlations within participants over time; however, it made little difference to the estimates. SAS code used to fit the linear mixed model: proc mixed data 5 long; class participant_id wave; model pf 5 wave/s; random int/subject 5 participant_id; lsmeans wave; run; Method 4: A model equation and user-defined working correlation matrix R appropriate for the five survey waves is illustrated below:

Appendix Additional technical information on methodology We have included the more technical information including model equations in this Appendix at www. jclinepi.com for interested readers. SAS code used for the regression analyses is also provided. The data were in long format for all the regression analyses but wide format for multiple imputation. Method 3: A model equation for participant i at wave j is given by: Yij 5 ðb0 þ ui Þ þ

5 X j52

  bj1  I Wavej þ εij ;

Yij 5 b0 þ

5 X j52

  bj1  I Wavej þ εij ; where εij |Nð0;Vi Þ 0

1 B B1 B and R 5 B B1 B @1 1

1 1 1 1 1 C 2 2 2 2C C 2 3 3 3C C C 2 3 4 4A 2 3 4 5

SAS code used to fit the linear increment model:

proc genmod data 5 long; class participant_id wave; model pf 5 wave; repeated subject 5 participant_id/TYPE 5 USER (1.0 1.0 1.0 1.0 1.0. 1.0 2.0 2.0 2.0 2.0. 1.0 2.0 3.0 3.0 3.0. 1.0 2.0 3.0 4.0 4.0. 1.0 2.0 3.0 4.0 5.0); lsmeans wave; run;

M. Jones et al. / Journal of Clinical Epidemiology 68 (2015) 1165e1175

Method 5: The joint models we fitted are as follows: Model for longitudinal outcome Yij 5 ðb11 þ U1i Þ þ Timeaj  ðb12 þ U2i Þ þ εij     V11 where εij |N 0; s2ε ; U1i ; U2i |N ½0; 0; V12 and a 5 1:5

V12



V22

Model for time to death ti |Weibullðr;mi ðtÞÞ; logðmi ðtÞÞ5b21 þ k1  U1i þ k2  U2i and rO0; where U1i and U2i are the shared random effects; mi ðtÞ is the event intensity (hazard) at time t and k1 and k2 are scaling factors These models were fitted in SAS using the Proc Nlmixed procedure as illustrated by Guo et al. [1]. Because of the nonlinear relationship between observation time and PF, we explored various functions of time (including linear and quadratic terms) using Akaike Information Criterion to compare models. The results of this analysis showed ‘‘Time’’ in years to the power of a 5 1.5 provided the best fit. It was not feasible to include wave as a categorical variable because we needed to include time as a random coefficient so that the effect of time to death could be taken account of in the trajectory of PF. An implied assumption of specifying these joint models is that the missing data mechanism for dropout is missing at random (MAR) (see Results where we attempt to justify this assumption). In cases where the missing data mechanism for dropout (and death) is believed to be missing not at random (MNAR), the joint survival model should allow for this proposed mechanism by modeling time to dropout as well as death. SAS code used to fit the joint models: proc nlmixed data 5 long qpoints 5 7; parameters bl1 5 63.5 bl2 5 0.5 v11 5 500 v12 5 0 v22 5 5 s2 5 250 r1 5 0.01 r2 5 0.5; bounds gamma O0, v22 O 0; *weibull survival; if (last) then do; linpsurv 5 bs1 þ r1*u0 þ r2*u1; alpha 5 exp(-linpsurv); G_t1 5 exp(-(alpha*start)**gamma); g 5 gamma*alpha*((alpha*start)**(gamma-1))*G_t1; G_t2 5 exp(-(alpha*finish)**gamma); if dropoutdead in (0, 1) then llsurv 5 log(G_t1); else llsurv 5 log(g); end; else llsurv 5 0; *linear mixed model; linplong 5 (bl1 þ u0) þ (bl2 þ u1)*obstime15;

1173

resid 5 (pf-linplong); if (abs(resid) O 1.3E100) or (s2 ! 1e-12) then do; lllong 5 -1e20; end; else do; lllong 5 0.5*(1.837877 þ resid**2/s2 þ log(s2)); end; model last | general(lllong þ llsurv); random u0 u1 | normal([0,0],[v11,v12,v22]) subject 5 participant_id; *These commands create estimates at 3-year intervals appropriate for each wave; estimate ’wave20 bl1 þ 5.2*bl2; estimate ’wave30 bl1 þ 14.7*bl2; estimate ’wave40 bl1 þ 27*bl2; estimate ’wave50 bl1 þ 41.57*bl2; run; Method 6: Model equations are given by: q X   logit pij 5 b0 þ bk  Xij ; k51

where pij is the probability of remaining in the study conditional on responding to the previous survey and surviving to the current survey, and Xij represents the design matrix for the q predictor variables Yij 5 b0 þ

5 X

  bj1  I Wavej þ εij ;

j52

where εij |Nð0; s2ε Þ and participants at the jth wave are weighted by 1=pij The HosmereLemeshow test [2] and the C statistic were used to assess goodness-of-fit for each of the models. Results from the HosmereLemeshow tests showed no evidence of lack of fit; however, the C statistics ranged from 0.62 to 0.65 indicating only moderate ability of the models to discriminate between participants who dropped out and those that remained. SAS code used to fit the dropout and estimation models: proc logistic data 5 long; class wave participant_id education smoking language self_rated_health exercise bmi; model dropout 5 education smoking language self_ rated_health exercise prev_pf prev_mh/lackfit; output out 5 logout pred 5 prob; by wave; run; proc genmod data 5 long; class participant_id wave; model pf 5 wave; weight invprob; repeated subject 5 participant_id/type 5 ind; lsmeans wave; run;

1174

M. Jones et al. / Journal of Clinical Epidemiology 68 (2015) 1165e1175

Method 7: SAS code for multiple imputation (data in wide format): proc mi data 5 wide seed 5 654321 nimpute 5 10 out 5 miout1; mcmc; var o1pf o2pf o3pf o4pf o5pf; run; *This data step creates the mortal cohort by setting PF to missing if a participant died before initiation of data collection at each survey wave; data miout2; set miout1; if 0 ! deathdate !17,606.00 then o5pf 5 .; if 0 ! deathdate !16,510.00 then o4pf 5 .; if 0 ! deathdate !15,414.00 then o3pf 5 .; if 0 ! deathdate !14,349.00 then o2pf 5 .; run; proc univariate data 5 miout2 noprint; var o1pf o2pf o3pf o4pf o5pf; output out 5 outuni mean 5 o1pf o2pf o3pf o4pf o5pf. stderr 5 so1pf so2pf so3pf so4pf so5pf; by _Imputation_; run; proc mianalyze data 5 outuni; modeleffects o1pf o2pf o3pf o4pf o5pf; stderr so1pf so2pf so3pf so4pf so5pf; run;

Simulation studies model equations The logistic model equations to estimate MAR probabilities of dropout and death are:   logit pdrop 5  0:7  0:007  PFprev  0:011  MHprev þ 0:21  Smk1 þ 0:23  Smk2  0:2  Smk3 logitðpdead Þ 5  0:7  0:018  PFprev  0:009  MHprev  0:28  Smk1 þ 0:1  Smk2 þ 0:23  Smk3; where PFprev, previous physical functioning score; MHprev, previous mental health score; Smk1, missing smoking status; Smk2, current smoker; Smk3, ex-smoker (never smokers are the reference group). The logistic model equations used to estimate MNAR probabilities of dropout and death are:   logit pdrop 5  0:7  0:007  PFcurrent  0:011  MHcurrent logitðpdead Þ 5  0:7  0:018  PFcurrent  0:009  MHcurrent ; where PFcurrent, current physical functioning score; MHcurrent, current mental health score.

Doubly robust estimation Fitting the covariate Fitting the covariate was relatively straightforward for the methods that involved regression modeling (methods 3e6). However, methods 1, 2, and 7 involved simple calculation of arithmetic means; hence, for these methods, we used generalized estimating equation regression models (specifying an independent working correlation matrix). For all methods, we fitted three covariates: an indicator variable for highest educational attainment; a continuous covariate representing a function of observation time; and the interaction between these two main effects. The function of time that provided the best fit based on Akaike Information Criterion was time in years to the power of 1.5. A general model equation for fitting the covariate is: Yij 5 b0 þ b1  IðEduÞ þ b2  Timeaj þ b12  IðEduÞ  Timeaj þ εij ; where IðEduÞ51 indicates ‘‘completed high school or higher’’ and IðEduÞ50 indicates ‘‘did not complete high school.’’

Doubly robust methods apply two types of models simultaneously and consistently estimate a parameter if either model is correctly specified. One model is based on using covariates to model the outcome and then using the estimates obtained to predict missing values. The other model is based on predicting probability of missing using covariates and then using the probabilities to obtain a weighted estimate. Based on a reviewers comment, we applied the following doubly robust method of estimation [3] to our study. We chose this method from a number of possible methods because of its simplicity and applicability to our study. 1 X 1 b i bε i ; m b BCOLS 5 m b OLS þ ti p n i where m b OLS is the least squares estimate of PF at each wave obtained from a linear regression model; n is the number of respondents at each wave; ti indicates whether a participant b 1 responded (1 5 yes, 0 5 no); p i is the inverse probability weight; and bε i is the residual obtained from a linear regression model of PF at each wave.

M. Jones et al. / Journal of Clinical Epidemiology 68 (2015) 1165e1175

References [1] Guo X, Carlin B. Separate and joint modeling of longitudinal and event time data using standard computer packages. Am Statistician 2004;58:16e24. [2] Hosmer D, Lemeshow S. Applied logistic regression. New York, NY: Wiley; 2000. [3] Kang J, Schafer J. Demystifying double robustness: a comparison of alternative strategies for estimating a population mean from incomplete data. Stat Sci 2007;22(4):523e39.

Supplementary data Supplementary data related to this article can be found at http://dx.doi.org/10.1016/j.jclinepi.2015.03.011.

References [1] Little R, D’Agostino R, Cohen M, Dickersin K, Emerson SS, Farrar JT, et al. The prevention and treatment of missing data in clinical trials. N Engl J Med 2012;367:1355e60. [2] Ware J, Harrington D, Hunter D, D’Agostino R. Missing data. N Engl J Med 2012;367:1353e4. [3] Chaix B, Evans D, Merlo J, Suzuki E. Weighing up the dead and missing: reflections on inverse-probability weighting and principal stratification to address truncation by death. Epidemiology 2012;23: 129e31. [4] Kurland B, Johnson L, Egleston B, Diehr P. Longitudinal data with follow up truncated by death: match the analysis to research aims. Stat Sci 2009;24:211e22. [5] Rubin D, Little R. Statistical analysis with missing data. 2nd ed. New York, NY: Wiley; 2002. [6] Little R. Modelling the dropout mechanism in repeated measures studies. J Am Stat Assoc 1995;90:1112e21. [7] Matthews J, Henderson R, Farewell D, Ho W, Rodgers L. Dropout in crossover and longitudinal studies: is complete case so bad? Stat Methods Med Res 2014;23:60e73. [8] Hogan J, Roy J, Korkontzelou C. Tutorial in Biostatistics: handling drop-out in longitudinal studies. Stat Med 2004;23:1455e97. [9] Henderson R, Diggle P, Dobson A. Joint modelling of longitudinal measurements and event time data. Biostatistics 2000;1:465e80. [10] Verbeke G, Fieuws S, Molenberghs G, Davidian M. The analysis of multivariate longitudinal data: a review. Stat Methods Med Res 2014; 23:42e59. [11] Seaman S, White I. Review of inverse probability weighting for dealing with missing data. Stat Methods Med Res 2013;22: 278e95.

1175

[12] Dufouil C, Brayne C, Clayton D. Analysis of longitudinal studies with death and drop-out: a case study. Stat Med 2004;23:2215e26. [13] Schafer J. Analysis of incomplete multivariate data. Boca Raton, Florida: Chapman & Hall/CRC; 1997. [14] Donders A, van der Heijden G, Stijnen T, Moons K. Review: a gentle introduction to imputation of missing data. J Clin Epidemiol 2006;59: 1087e91. [15] Crawford S, Tennstedt S, McKinlay J. A comparison of analytic methods for non-random missingness of outcome data. J Clin Epidemiol 1995;48:209e19. [16] Diggle P, Farewell D, Henderson R. Analysis of longitudinal data with drop-out: objectives, assumptions and a proposal. Appl Statist 2007;56:499e550. [17] Gunnes N, Farewell D, Seierstad T, Aalen O. Analysis of censored discrete longitudinal data: estimation of mean response. Stat Med 2009;28:605e24. [18] Lee C, Dobson A, Brown W, Bryson L, Byles J, Warner-Smith P, et al. Cohort profile: the Australian Longitudinal Study on Women’s Health. Int J Epidemiol 2005;34:987e91. [19] White I, Carlin J. Bias and efficiency of multiple imputation compared with complete-case analysis for missing covariate values. Stat Med 2010;29:2920e31. [20] Farewell D. Marginal analyses of longitudinal data with an informative pattern of observations. Biometrika 2010;97:65e78. [21] Brilleman S, Pachana N, Dobson A. The impact of attrition on the representativeness of cohort studies of older people. BMC Med Res Methodol 2010;10:71. [22] Kurland B, Heagerty P. Directly parameterized regression conditioning on being alive: analysis of longitudinal data truncated by deaths. Biostatistics 2005;6:241e58. [23] Rubin D. Multiple imputation for nonresponse in surveys. New York, NY: J. Wiley & Sons; 1987. [24] Molenberghs G, Beunckens C, Sotto C, Kenward M. Every missingness not at random model has a missingness at random counterpart with equal fit. J R Statist Soc B 2008;70:371e88. [25] Rizopoulos D. Joint models for longitudinal and time-to-event data with applications in R. Boca Raton: Chapman & Hall/CRC; 2012. [26] Spratt M, Carpenter J, Sterne J, Carlin JB, Heron J, Henderson J, et al. Strategies for multiple imputation in longitudinal studies. Am J Epidemiol 2010;172:478e87. [27] Bang H, Robins J. Doubly robust estimation in missing data and causal inference models. Biometrics 2005;61:962e73. [28] Kang J, Schafer J. Demystifying double robustness: a comparison of alternative strategies for estimating a population mean from incomplete data. Stat Sci 2007;22(4):523e39. [29] Gustavson K, von Soest T, Karevold E, Røysamb E. Attrition and generalizability in longitudinal studies: findings from a 15-year population-based study and a Monte Carlo simulation study. BMC Public Health 2012;12:918. [30] Molenburghs G, Kenward M. Missing data in clinical studies. Chichester: England Wiley; 2007.

Analytical results in longitudinal studies depended on target of inference and assumed mechanism of attrition.

To compare methods for analysis of longitudinal studies with missing data due to participant dropout and follow-up truncated by death...
511KB Sizes 0 Downloads 8 Views