Research Article Received 2 September 2014,

Accepted 23 July 2015

Published online 16 August 2015 in Wiley Online Library

(wileyonlinelibrary.com) DOI: 10.1002/sim.6621

Terminating observation within matched pairs of subjects in a matched cohort analysis: a Monte Carlo simulation study Rinku Sutradhar,a,b,c*† Nancy N. Baxtera,b,d and Peter C. Austina,b Matched cohort analyses are becoming increasingly popular for estimating treatment effects in observational studies. However, in the applied biomedical literature, analysts and authors are inconsistent regarding whether to terminate follow-up among members of a matched set once one member is no longer under observation. This paper focused on time-to-event outcomes and used Monte Carlo simulation methods to determine the optimal approach. We found that the bias of the estimated treatment effect estimate was negligible under both approaches and that the percentage of censoring had no discernible effect on the magnitude of bias. The mean model-based standard error of the treatment estimate was consistently higher when we terminated observation within matched pairs. Furthermore, the type 1 error rate was consistently lower when we did not terminate follow-up within matched pairs. In conclusion, when the focus was on time-to-event outcomes, we demonstrated that there was no advantage to terminating follow-up within matched pairs. Continuing follow-up on each subject until their observation was naturally complete was superior compared with terminating a subject’s observation time once its matched pair had ceased to be under observation. Given the frequency with which these analyses are conducted in the applied literature, our results provide important guidance to analysts and applied researchers as to the preferred analytic approach. Copyright © 2015 John Wiley & Sons, Ltd. Keywords:

matched cohort study; terminate follow-up; censoring; treatment effect; Monte Carlo simulations; bias

1. Introduction Matched cohort analyses are becoming increasing popular for estimating treatment effects in observational studies [1–4]. Matched cohort analyses consist of matching each treated/exposed subject from a well-defined study population to one or more control/unexposed subjects from the same study population. To be consistent throughout the paper, we will use the terms exposed and unexposed to refer to treated and control individuals, respectively. Typically, subjects are matched either on specific characteristics defined at baseline or the time of cohort entry (e.g., matching on age and sex) or on their propensity of being exposed, or a combination of both [5]. The matched subjects are then followed over time to ascertain the occurrence of a specific outcome/event. The aim of matched cohort studies is to examine the association between the exposure and the outcome [6]. Matched cohort analyses have been shown to reduce confounding and minimize selection bias and thus provide an estimate of exposure association that is reflective of the experience in the study population [6]. In the applied biomedical literature, there is a lack of consistency in how analyses are conducted under a matched cohort design when the outcome is time to the occurrence of an event. In particular, authors are inconsistent as to whether to terminate follow-up among members of a matched set once one member is no longer under observation. Some authors censor all subjects in a matched set once one member a Institute for Clinical Evaluative Sciences, Toronto, Canada b Institute of Health Policy, Management, and Evaluation, University of Toronto, Toronto, Canada c Department of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, Canada d Department of Surgery and Keenan Research Centre, Li Ka Shing Knowledge Institute, St. Michael’s Hospital,

294

University of Toronto, Toronto, Canada *Correspondence to: Rinku Sutradhar, Institute for Clinical Evaluative Sciences, G1-06 2075 Bayview Avenue, Toronto, M4N 3M5, Ontario. † E-mail: [email protected]

Copyright © 2015 John Wiley & Sons, Ltd.

Statist. Med. 2016, 35 294–304

R. SUTRADHAR, N. N. BAXTER AND P. C. AUSTIN

in the set is not under observation [7–9]. For example, a recent study examined whether survivors of young adult malignancies had higher hospitalization rates compared with healthy individuals without a prior cancer diagnosis. The authors matched cancer survivors to healthy individuals based on age, sex, and geographic region. In regression analyses, the authors terminated observation among members of a matched set as soon as one member of the set was no longer under follow-up [8]. Another study examined the association between prostate cancer and hazard of a new primary cancer [9]. The authors matched men with prostate cancer to men without prostate cancer based on age. In their time-to-new primary cancer analysis, the authors terminated observation among members of a matched set once one member was not under observation. Other authors do not impose this restriction and instead follow each subject until their observation naturally ends because of the event occurring or because of censoring from loss of follow-up or study end [1–4, 10–12]. For example, a recent study examined the association between major depressive disorder and hazard of stroke [12]. Patients diagnosed with major depressive disorder were matched to healthy individuals on age and sex. In their time-to-stroke analysis, each subject continued to contribute to the analysis until their observation ended naturally. Similarly, another study investigated the association between exposure to androgen deprivation therapy and hazard of acute myocardial infarction among men over 65 years of age [3]. They matched exposed and unexposed men to be similar with respect to age and several other measured baseline covariates. In their time-to-acute myocardial infarction analysis, each individual continued to contribute to the analysis until their observation ended naturally. From our review of the literature, we discovered that the majority of papers did not declare how they treated follow-up within matched pairs during the time-to-event analysis; upon contacting some of the authors, we found that the decision to prematurely terminate observation was not uncommon. Reflective of the lack of consistent practice in the applied biomedical literature, this issue of whether to terminate observation among members of matched set has received little attention in the methodological literature. To the best of our knowledge, there has been no thorough evaluation of which analytic approach (termination or no termination) offers superior performance for estimating the exposure effect. In the regression analysis of time-to-event data for matched pairs, we anticipate that following each member of a pair until their observation naturally ends will be the favored approach. Generally, under the assumption of non-informative right censoring, forcing termination of follow-up should not lead to biased regression estimates. However, the termination approach may result in fewer events being captured in the analysis, which in turn may result in larger standard errors of the regression estimates. It remains unclear how various proportions of censoring will impact the estimates under the termination approach compared with the no termination approach. The aim of this paper is to estimate the exposure effect on a time-to-event outcome under a matched cohort design using two different analytic approaches concerning the management of the subjects within a pair (termination approach versus no termination approach). Section 2 describes a series of Monte Carlo simulations to offer insight into the optimal approach, and the results of the simulation study are provided in Section 3. Section 4 applies and compares the analytic approaches to a real data example studying the hazard of hospitalization among cancer survivors compared with matched healthy individuals. The paper concludes with a discussion in Section 5.

2. Monte Carlo simulations: methods

Copyright © 2015 John Wiley & Sons, Ltd.

Statist. Med. 2016, 35 294–304

295

We conducted an extensive series of Monte Carlo simulations to examine the consequences of different decisions about when to terminate follow-up on matched subjects on estimating hazard ratios in a matched cohort analysis. Specifically, we considered two different approaches. The first approach censored a subject within a matched pair once the other member of the pair was no longer under followup, either because of the occurrence of the event or because of censoring (termination approach). The second approach consisted of following each subject within a matched pair until they reached their natural observation end, either due to them experiencing the event or due to them being censored (no termination approach). We used a data-generating process similar to that described in previous work [13]. Based on a sample size of N = 10, 000 individuals, we first simulated data for 10 independent baseline covariates (X1 , … , X10 ). Six of these covariates were generated from a standard normal distribution (X1 , … , X3 ; X8 , … , X10 ), and the remaining four covariates were generated from a Bernoulli distribution with probability parameter 0.6 (X4 , … , X7 ). Of the 10 covariates, seven influenced exposure selection (X1 , … , X7 ), and seven influenced the outcome (X4 , … , X10 ). For each individual i(i = 1, … , N), we

R. SUTRADHAR, N. N. BAXTER AND P. C. AUSTIN

calculated the probability of being exposed (pi ) using a logistic regression model defined as follows: logit(pi ) = 𝛼0,𝑒𝑥𝑝𝑜𝑠𝑒𝑑 + 𝛼W xi1 + 𝛼M xi2 + 𝛼S xi3 + 𝛼W xi4 + 𝛼M xi5 + 𝛼S xi6 + 𝛼VS xi7 . The intercept parameter 𝛼0,𝑒𝑥𝑝𝑜𝑠𝑒𝑑 reflects the baseline log odds of being exposed for a reference subject whose covariates are all equal to zero, which we varied at the desired level. The regression parameters 𝛼W , 𝛼M , 𝛼S , and 𝛼VS were assigned to reflect weak, moderate, strong, and very strong associations, respectively; these coefficients were set to values log(1.25), log(1,5), log(1.75), and log(2), respectively. The i-th subject’s exposure status (Zi ) was generated from a Bernoulli distribution with probability parameter pi . The next step was to generate the event time for each individual, using a data-generating process described by Bender et al. [14]. The linear predictor for the i-th subject was calculated as LPi = 𝛽𝑒𝑥𝑝𝑜𝑠𝑒𝑑 Zi + 𝛼W xi4 + 𝛼M xi5 + 𝛼S xi6 + 𝛼VS xi7 + 𝛼W xi8 + 𝛼M xi9 + 𝛼S xi10 . Note that 𝛽𝑒𝑥𝑝𝑜𝑠𝑒𝑑 was the true exposure regression parameter denoting the effect of exposure on the hazard of the outcome, which we varied at the desired level. The i-th subject’s event time ti was determined using the function ti = (− log(ui ) / 𝜓 eLPi )1∕𝛾 , where u was drawn from a standard uniform distribution and the scale parameter 𝜓 and the shape parameter 𝛾 were set to 0.00002 and 2, respectively. We also varied the percentage of censored individuals in the simulated dataset by changing the study end time; this was a straightforward way to ensure non-informative right censoring. Subjects whose calculated event time occurred after the study end time were simply censored at the study end time. To avoid any confusion with the censoring that will be induced in the matched dataset under the forced termination method, we refer to the percentage of censoring in the simulated (unmatched) dataset as the original percentage of censoring. We varied the following factors in our Monte Carlo simulations: (i) the percentage of individuals exposed (10% and 20%); (ii) the true log-hazard ratio for the effect of exposure on the hazard of the outcome (0, log(1.5), log(3.0), and log(5.0)); and (iii) the original percentage of censoring (10%, 25%, 50%, 75%, and 90%). We therefore examined 40 (2 × 4 × 5) scenarios. Under each of the 40 scenarios, we simulated M = 500 datasets, each consisting of N = 10, 000 patients. Within each simulated dataset, we constructed a matched dataset by matching each exposed subject to one unexposed subject. An exposed subject was matched to an unexposed subject by hard matching on the binary covariates and greedy nearest-neighbor matching on the continuous covariates [15, 16]. The greedy nearest-neighbor matching was carried out by specifying a caliper width equal to 0.2 of the standard deviation of the continuous covariate; because each continuous covariate was generated from a standard normal distribution, this implied that the value of the covariate for the unexposed subject was within 0.2 of the value of the covariate for the exposed subject. Unexposed subjects were selected without replacement, meaning an unexposed subject could not be matched to more than one exposed subject. Within each matched dataset, we implemented four different estimation strategies. The first approach did not terminate follow-up and used a standard multivariable Cox model to estimate the true exposure regression parameter (referred to as the ‘no termination, na¨ıve model’). The term na¨ıve implies that the data were analyzed without accounting for the matched design. The second approach did not terminate follow-up and used a multivariable Cox model with a robust variance estimation approach (to accommodate the matched design) to compute the standard error of the estimated regression coefficient (referred to as the ‘no termination, robust model’) [17]. Compared with the na¨ıve model, note that Lin and Wei’s robust variance estimation approach affects only the standard errors; the estimates of the regression coefficients remain unchanged (page 116) [18]. The third approach terminated follow-up of both subjects in a pair as soon as one subject experienced the event or was censored; a standard multivariable Cox model was then used to estimate the true exposure regression parameter (referred to as the ‘termination, na¨ıve model’). The fourth approach also terminated follow-up of both subjects in a pair as soon as one subject experienced the event or was censored; however, a multivariable Cox model with a robust variance estimation approach was used to compute the standard error of the estimated regression coefficient (referred to as the ‘termination, robust model’). Note that there were a total of eight variables in each multivariable model: exposure status covariate Z and the seven covariates X4 to X10 . The multivariable Cox regression model for the i-th subject can be expressed as 𝜆i (t) = 𝜆0 (t) exp {𝛽𝑒𝑥𝑝𝑜𝑠𝑒𝑑 Zi + 𝛼4 X4i + · · · + 𝛼10 X10i }.

296

Maximum likelihood estimates of the regression parameters are obtained from Cox’s partial likelihood function. Under the standard Cox model approach, the inverse of the observed information matrix is used as the variance–covariance matrix of the estimate of the regression parameter vector; however, this variance–covariance matrix does not account for the matched design. To account for the correlation of Copyright © 2015 John Wiley & Sons, Ltd.

Statist. Med. 2016, 35 294–304

R. SUTRADHAR, N. N. BAXTER AND P. C. AUSTIN

individuals within matched pairs, a robust variance–covariance matrix is computed as a function of the inverse of the information matrix and the matrix of the pair efficient score residuals [17]. This is referred to as the Cox model under a robust variance estimation approach. After applying each of the four methods, we estimated the mean bias, the mean model-based standard error, and the empirical standard deviation of the exposure regression estimates over the M = 500 simulated datasets, under each of the 40 scenarios. The mean bias, under any given scenario, is the average of the exposure coefficient estimates minus the value of the true exposure coefficient over the 500 1 ∑ ̂ 500 simulated datasets; it is defined as Bias = 500 {𝛽𝑒𝑥𝑝𝑜𝑠𝑒𝑑,r − 𝛽𝑒𝑥𝑝𝑜𝑠𝑒𝑑,true }. The mean model-based r=1

standard error under any given scenario is the average of the model-based standard error of the expo500 1 ∑ sure coefficient estimates over the 500 datasets; it is defined as 500 SE𝛽̂𝑒𝑥𝑝𝑜𝑠𝑒𝑑,r . The empirical standard r=1

deviation of the exposure regression estimates under any given scenario is the standard deviation of the estimated exposure coefficient estimates over the 500 datasets; it can be expressed as the square root of 500 500 2 ∑ ̂ 1 ∑ ̂ 𝛽 {𝛽 − 𝛽̂̄ } , where 𝛽̂̄ = 1 . We also determined the relative increase 500

r=1

𝑒𝑥𝑝𝑜𝑠𝑒𝑑,r

𝑒𝑥𝑝𝑜𝑠𝑒𝑑

𝑒𝑥𝑝𝑜𝑠𝑒𝑑

500

r=1

𝑒𝑥𝑝𝑜𝑠𝑒𝑑,r

in the mean model-based standard error comparing the termination approach with the no termination approach; under any given scenario, this was computed as the ratio of the mean standard error from the termination approach to the mean standard error from the no termination approach. Finally, we compared the different methods with respect to the empirical type 1 error rate under the null hypothesis of no exposure effect (true exposure hazard ratio 1.0). In our Monte Carlo simulations, for any given scenario with true 𝛽𝑒𝑥𝑝𝑜𝑠𝑒𝑑 = 0, the type 1 error rate was calculated as the proportion of simulated datasets (out of M = 500) in which we incorrectly rejected the null hypothesis using a significance level of 0.05.

3. Monte Carlo simulations: results

Copyright © 2015 John Wiley & Sons, Ltd.

Statist. Med. 2016, 35 294–304

297

Under an exposure probability of 0.10 and 0.20, the average number of matched pairs formed across the 10,000 simulated samples for each scenario was 992.2 and 1866.4, respectively. Thus, we matched approximately 99.2% and 93.3% of exposed subjects to an unexposed subject, respectively. Our matching methods should have minimal bias due to incomplete matching [19]. The mean bias of the estimated exposure regression estimates from the four estimation approaches are graphically presented in Figure 1 for each of the 40 scenarios. As the estimates of the regression parameters are the same under the na¨ıve multivariable Cox model and the multivariable Cox model with a robust variance estimation approach, each plot was simplified to have only two lines (one line for the termination approach and one line for the no termination approach). Irrespective of the scenario, bias was negligible under each method. In fact, there was no estimation approach for which the absolute of the mean bias values was greater than 0.005. This implies that the decision to either terminate followup within matched pairs or continue follow-up until each subject naturally reaches their observation end does not have an impact on the bias of the exposure regression estimate. Moreover, the degree of the original proportion of censoring had no discernible effect on the magnitude of bias. Figure 2 illustrates the mean of the model-based standard errors of the exposure regression estimates from the four estimation approaches under each of the 40 scenarios. Regardless of whether or not follow-up was terminated, the results from implementing the standard multivariable Cox model and the multivariable Cox model with a robust variance estimation approach were only different following the first decimal place and were thus graphically indistinct. Each plot was therefore simplified to have only two lines each (one line for the termination approach with robust variance estimation and one line for the no termination approach with robust variance estimation). In every scenario, we found that the mean standard error was consistently higher when we terminated observation within matched pairs. The mean standard error was also consistently higher when the exposure probability was low (10%) compared with when exposure probability was high (20%). Furthermore, the difference in the mean standard error between the two methods was the greatest when the original proportion of censoring was the lowest; this difference gradually diminished as the original proportion of censoring increased. The same patterns were found when examining the empirical standard deviation of the exposure regression estimates (data not shown). Under each estimation approach, there was an increasing trend in the mean model-based standard error as the original proportion of censoring became higher. However, the relative increase in the mean standard error comparing the termination approach with the no termination approach noticeably decreased as the

R. SUTRADHAR, N. N. BAXTER AND P. C. AUSTIN

298

Figure 1. Mean bias of the exposure regression estimates from the termination and no termination approaches under each of the 40 scenarios, according to the true exposure parameter and to the exposure probability.

Copyright © 2015 John Wiley & Sons, Ltd.

Statist. Med. 2016, 35 294–304

R. SUTRADHAR, N. N. BAXTER AND P. C. AUSTIN

Copyright © 2015 John Wiley & Sons, Ltd.

Statist. Med. 2016, 35 294–304

299

Figure 2. Mean standard error of the exposure regression estimates from the termination and no termination approaches with the robust variance method under each of the 40 scenarios, according to the true exposure parameter and to the exposure probability.

R. SUTRADHAR, N. N. BAXTER AND P. C. AUSTIN

Figure 3. Relative increase in mean standard error comparing the termination approach versus the no termination approach.

original proportion of censoring became higher. This was further illustrated in Figure 3, which plots the relative increase in the mean model-based standard error comparing the termination approach analyzed under a multivariable Cox model with a robust variance estimator (termination, robust model) with the no termination approach analyzed under a multivariable Cox model under a robust variance estimator (no termination, robust model). When the association between exposure and the event hazard was large (true exposure hazard ratio 5.0) and when the original censoring proportion was low (implying the event proportion was high), the mean standard error was nearly 50% larger when we terminated follow-up within matched pairs. Under all levels of exposure association and exposure probability, the relative increase dropped below 1.1 when the original censoring proportion was at 0.90. This decrease in the relative standard error as the original censoring proportion increased was intuitively reasonable. Because subjects in the simulated (unmatched) datasets were censored at the determined study end time, a low proportion of events (equivalent to a high original proportion of censoring) implied there was less need to terminate follow-up in the matched datasets. This in turn meant that the termination approach became more similar to the no termination approach. We also compared the different methods with respect to the empirical type 1 error rate under the null hypothesis of no exposure effect (true exposure hazard ratio 1.0). The type 1 error rate was consistently lower when we did not terminate follow-up within matched pairs. The largest difference could be seen when there was a high event proportion (that is, a low original proportion of censoring): the type 1 error rate when we did not terminate observation was less than 0.05, whereas the type 1 error rate when we terminated observation was nearly 0.07. Finally, to address the concern that in individual empirical analyses, the covariates may not be independent of one another, we conducted an additional simulation in which the continuous covariates were generated from a multivariate standard normal distribution, where the correlation within pairs of covariates was assumed to be 0.5. As an illustration, this simulation was conducted under one randomly chosen scenario: 10% exposure probability, true log-hazard ratio of log(1.5), and original censoring percentage of 25%. The results of the mean bias and mean standard errors for both the termination approach and the no termination approach were very similar to that obtained under the generation of independent covariates (data not shown). The message remained the same: The mean standard errors were larger under the termination approach compared with the no termination approach, and there was no advantage in forcing the termination of follow-up within matched pairs.

4. Real data example

300

In this section, we present an empirical case study, borrowed from Sutradhar et al. [20]. These authors carried out a matched cohort study to determine if the rate of hospitalizations was higher among young adult survivors of colorectal cancer (CRC) relative to matched individuals who had never had cancer. To correspond to the dataset structure in our simulation studies, we modified the cohort used by Sutradhar Copyright © 2015 John Wiley & Sons, Ltd.

Statist. Med. 2016, 35 294–304

R. SUTRADHAR, N. N. BAXTER AND P. C. AUSTIN

Table I. Results from the real data example comparing cancer survivors with healthy controls for the hazard of first hospitalization, under the four estimation approaches. Approach

Exposure estimate

Hazard ratio estimate

Standard error

p-value

0.8223 0.8223 0.8361 0.8361

2.2757 2.2757 2.3075 2.3075

0.0942 0.0927 0.1159 0.1162

Terminating observation within matched pairs of subjects in a matched cohort analysis: a Monte Carlo simulation study.

Matched cohort analyses are becoming increasingly popular for estimating treatment effects in observational studies. However, in the applied biomedica...
1KB Sizes 1 Downloads 6 Views