MAIN PAPER (wileyonlinelibrary.com) DOI: 10.1002/pst.1680

Published online 6 April 2015 in Wiley Online Library

Assessing the treatment effect in a randomized controlled trial with extensive non-adherence: the EVOLVE trial

Yumi Kubo (a,*), Lulu Ren Sterling (a), Patrick S Parfrey (b), Karminder Gill (c), Kenneth W Mahaffey (d), Ioanna Gioni (e), Marie-Louise Trotman (a), Bastian Dehmel (a) and Glenn M Chertow (d)

Intention-to-treat (ITT) analysis is widely used to establish efficacy in randomized clinical trials. However, in a long-term outcomes study where non-adherence to study drug is substantial, the on-treatment effect of the study drug may be underestimated using the ITT analysis. The analyses presented herein are from the EVOLVE trial, a double-blind, placebo-controlled, event-driven cardiovascular outcomes study conducted to assess whether a treatment regimen including cinacalcet, compared with placebo, in addition to other conventional therapies reduces the risk of mortality and major cardiovascular events in patients receiving hemodialysis with secondary hyperparathyroidism. Pre-specified sensitivity analyses were performed to assess the impact of non-adherence on the estimated effect of cinacalcet. These analyses included lag-censoring, inverse probability of censoring weights (IPCW), the rank preserving structural failure time model (RPSFTM) and iterative parameter estimation (IPE). The relative hazard (cinacalcet versus placebo) of mortality and major cardiovascular events was 0.93 (95% confidence interval 0.85, 1.02) using the ITT analysis; 0.85 (0.76, 0.95) using lag-censoring analysis; 0.81 (0.70, 0.92) using IPCW; 0.85 (0.66, 1.04) using RPSFTM and 0.85 (0.75, 0.96) using IPE. These analyses, while not providing definitive evidence, suggest that the intervention may have an effect while subjects are receiving treatment. The ITT method remains the established method to evaluate the efficacy of a new treatment; however, additional analyses should be considered to assess the on-treatment effect when substantial non-adherence to study drug is expected or observed.
Copyright © 2015 John Wiley & Sons, Ltd.

Keywords: non-adherence; intention-to-treat; lag-censoring; inverse probability of censoring weights; rank preserving structural failure time model; iterative parameter estimation

1. INTRODUCTION

The intention-to-treat (ITT) analysis is the established method for evaluating the efficacy of a new treatment in randomized clinical trials. Although the ITT analysis is a valid test for comparing two treatment strategies, its estimate of the treatment effect can differ considerably from on-treatment effect estimates when non-adherence is substantial. In event-driven trials with long-term follow-up, non-adherence to study drug may be extensive, particularly in populations with substantial morbidity. If a large proportion of follow-up time and of the accumulated events occur while patients are not taking randomized treatment, the on-treatment effect may be underestimated [1]. Assessment of the on-treatment effect is important for clinical decision-making, especially if no other therapies have effectively reduced the burden of the disease. For these reasons, we performed pre-specified analyses accounting for non-adherence in the EVOLVE (Evaluation of Cinacalcet Therapy to Lower Cardiovascular Events) trial.

1.1. The EVOLVE trial


The EVOLVE trial was a global, multi-center, placebo-controlled, double-blind, event-driven trial (N = 3883) designed to assess the risks and benefits of cinacalcet (Sensipar®/Mimpara®), compared with placebo, in addition to conventional standard-of-care therapies, on a composite endpoint consisting of all-cause mortality and major cardiovascular events in patients receiving hemodialysis with moderate to severe secondary hyperparathyroidism (sHPT). It was the largest trial, and the one with the longest follow-up, ever conducted in the dialysis population, and it was marked by a large proportion of patients prematurely withdrawing from randomized treatment [2]. At the time of enrollment, cinacalcet was commercially available in 18 of the 22 (82%) countries participating in the study. The use of commercial cinacalcet was not allowed per the protocol; however, some physicians prescribed it if they deemed it medically necessary. Papers describing the EVOLVE trial design [3], baseline characteristics [4] and primary results [2] have been published. The study was approved by the Institutional Review Boards at participating study sites, and written informed consent was obtained from all patients.

As previously reported [2], the trial was originally anticipated to last 4 years, but because of the lower-than-expected pooled (blinded) event rate, it was extended to 5.5 years. The first patient was enrolled in August 2006, and the trial was terminated on 28 January 2012, when 1882 primary endpoint events were anticipated to have occurred. During the course of the study, a large proportion of patients discontinued randomized treatment. Of the 1935 patients randomized to placebo, 1365 (71%) discontinued study drug, and of the 1948 patients randomized to cinacalcet, 1300 (67%) discontinued study drug (Table I). The median (Q1, Q3) time patients were followed for endpoints was 50.4 (26.7, 56.4) and 50.6 (31.3, 56.4) months in patients randomized to placebo and cinacalcet, respectively. The median (Q1, Q3) time on randomized treatment was 17.5 (7.1, 37.9) months in the placebo group compared with 21.2 (8.1, 40.8) months in the cinacalcet group. In addition, a substantial proportion of patients received commercially available cinacalcet during the trial (11% in the group randomized to cinacalcet and 23% in the group randomized to placebo). Moreover, 14% of patients randomized to placebo and 7% of patients randomized to cinacalcet underwent parathyroidectomy, a surgical and more definitive approach to managing sHPT. A total of 384 (19.8%) patients randomized to placebo received commercially available cinacalcet before the occurrence of a primary endpoint event (corresponding to an annual 'drop-in' rate of 7.4%). Conversely, 1207 (62.0%) patients randomized to cinacalcet discontinued study drug before the occurrence of a primary endpoint event (corresponding to an annual 'drop-out' rate of 27.3%), effectively resulting in crossover between study arms [2].

Non-adherence was expected on the basis of our previous phase 3 experience and was incorporated into the sample size calculation to ensure adequate power, assuming a drop-out rate of 10% per year and a drop-in rate of 10% per year [2]. In addition to incorporating these hypothesized rates into the study design, we implemented measures in trial conduct to reduce non-adherence. Rigorous plans were put in place to ensure compliance with the trial protocol, including an extensive investment of time by trial monitors engaging with local research nurses, and by principal investigators communicating with local clinical investigators. We also planned several sensitivity analyses to account for non-adherence in both treatment groups, including lag censoring, inverse probability of censoring weights (IPCW), the rank preserving structural failure time model (RPSFTM) and iterative parameter estimation (IPE). Although other, more commonly used methods such as a per-protocol analysis were performed, only methods that include and analyze all randomized patients according to their randomized treatment assignment are presented here. The alternative analytic approaches we describe herein can be generalized to other therapeutics evaluated in long-term studies.

a Amgen, Inc., Thousand Oaks, CA, USA
b Health Sciences Center, St. John's, Newfoundland, Canada
c Ascentiant International, Carlsbad, CA, USA
d Stanford University School of Medicine, Palo Alto, CA, USA
e on behalf of Amgen, Ltd., Uxbridge, UK
*Correspondence to: Yumi Kubo, Amgen, Inc., 1 Amgen Center Drive M/S 24-2-C, Thousand Oaks, CA 91320, USA. E-mail: [email protected]

Pharmaceut. Statist. 2015, 14 242–251

Table I. Reasons for discontinuing study drug.

Subjects who discontinued study drug (%)
Ineligibility determined
Consent withdrawn
Lost to follow-up
Adverse event
Protocol-specified reasons
Parathyroidectomy
Kidney transplant
Calcium

1 represents a detrimental treatment effect. An alternative form of Equation (2) is [19]:

$$T_{Li} = \int_0^{T_{Ri}} \exp\left[\psi X_i(t)\right] dt \qquad (3)$$

$X_i(t)$ is a binary indicator that equals 1 when the patient is exposed to cinacalcet and 0 otherwise. For patients randomized to placebo who were never exposed to cinacalcet, $T_{Li} = T_{Ri}$. In the RPSFTM framework, $T_{Li}$ is a pre-randomization quantity and is therefore independent of randomization. Consequently, the treatment effect $\psi$ can be obtained by a grid search over a range of plausible values of $\psi$, selecting the value at which $T_{Li}$ is equally distributed between the two treatment groups according to a test-based method (i.e. the log-rank test) [20]. Note that Equations (2) and (3) apply only when all patients can be followed until an event is observed.
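The grid search described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the SAS program used for EVOLVE: each patient's follow-up is represented as a hypothetical list of (duration, exposed) segments, all events are assumed to be observed (so no recensoring is applied), and an unstratified log-rank chi-square statistic serves as the test. The helper names (`counterfactual_time`, `logrank_chisq`, `rpsftm_grid_search`) are invented for this example.

```python
import math

def counterfactual_time(segments, psi):
    """Equation (3): integrate exp(psi * X(t)) over the observed follow-up.

    `segments` is a list of (duration, exposed) pairs covering the observed time;
    exposed time is rescaled by exp(psi), unexposed time is kept as-is.
    """
    scale = math.exp(psi)
    return sum(d * (scale if exposed else 1.0) for d, exposed in segments)

def logrank_chisq(times, events, arms):
    """Unstratified two-sample log-rank chi-square statistic (1 df)."""
    o_minus_e, variance = 0.0, 0.0
    for s in sorted({t for t, e in zip(times, events) if e}):
        # (arm, failed at s) for every patient still at risk at time s
        at_risk = [(a, bool(e) and t == s) for t, e, a in zip(times, events, arms) if t >= s]
        n = len(at_risk)
        n1 = sum(1 for a, _ in at_risk if a == 1)
        d = sum(1 for _, failed in at_risk if failed)
        d1 = sum(1 for a, failed in at_risk if failed and a == 1)
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e ** 2 / variance if variance > 0 else 0.0

def rpsftm_grid_search(patients, grid):
    """Return (psi, statistic) for the grid value that best balances T_L between arms.

    `patients` is a list of (arm, segments, event), with arm = 1 for cinacalcet.
    """
    arms = [arm for arm, _, _ in patients]
    events = [event for _, _, event in patients]
    best = None
    for psi in grid:
        times = [counterfactual_time(segments, psi) for _, segments, _ in patients]
        stat = logrank_chisq(times, events, arms)
        if best is None or stat < best[1]:
            best = (psi, stat)
    return best

# Toy data: placebo patients never exposed; cinacalcet patients always exposed,
# with observed times roughly twice as long (true exp(psi) close to 0.5).
patients = [(0, [(float(k), False)], 1) for k in range(1, 7)]
patients += [(1, [(2.0 * (k + 0.1), True)], 1) for k in range(1, 7)]
grid = [round(-1.5 + 0.1 * j, 10) for j in range(26)]
psi_hat, stat = rpsftm_grid_search(patients, grid)
```

Because the log-rank statistic depends only on the ordering of the counterfactual times, neighboring grid values can give identical statistics; a finer grid, or locating where the signed statistic changes sign, may be preferable in a real analysis.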


where $i$ is the patient index, $t$ is the time interval index, $C(t) = 1$ if the patient was censored (i.e. stopped study drug) within time interval $t$ and 0 otherwise, $R$ denotes randomized treatment (0 for placebo, 1 for cinacalcet), $V$ is a vector of baseline covariates and $L(t-1)$ is a vector of time-varying confounders, specified below. In our analysis, data were split into 4-week intervals during the titration phase and 16-week intervals during the maintenance phase. We also assessed 8-week intervals during the maintenance phase (the frequency of protocol-scheduled assessments), but because of convergence problems, 16-week intervals were selected to minimize missing data while preserving clinical interpretability. In the calculation of weights, we included variables thought to be true confounders and predictors of the outcome [13]. Baseline covariates included in both Models 1 and 2 were randomized treatment group, age group (< 65, > 65 years), sex, race (White, Black and Other), region (the USA, Europe, Latin America, Russia, Canada and Australia), history of diabetes (yes/no) and categories of baseline laboratory measurements (PTH, corrected serum calcium and serum phosphorus). Model 2 also included time-varying confounders: adverse events of hypocalcemia and on-study laboratory measurements of PTH, corrected serum calcium and serum phosphorus. All clinically relevant baseline covariates and time-varying confounders that were considered to be affected by prior exposure to study drug in both treatment groups were included in the calculation of the weights. In addition, we incorporated interaction terms between treatment and the time-varying laboratory measures to account for potential differences in their effect on adherence between the treatment groups.
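The interval construction used for the weights can be sketched as follows, assuming time is measured in weeks since randomization. The constants mirror the 4-week titration intervals over the first 20 weeks and the 16-week maintenance intervals described above; `interval_index` and `last_value_per_interval` are hypothetical helpers, and the second one implements the rule, stated in the next paragraph, that the last laboratory value within an interval is used.

```python
TITRATION_WEEKS = 20    # length of the titration phase
TITRATION_WIDTH = 4     # interval width during titration (weeks)
MAINTENANCE_WIDTH = 16  # interval width during maintenance (weeks)

def interval_index(week):
    """Return the 1-based analysis interval containing `week`."""
    if week < 0:
        raise ValueError("week must be non-negative")
    if week < TITRATION_WEEKS:
        return int(week // TITRATION_WIDTH) + 1
    n_titration = TITRATION_WEEKS // TITRATION_WIDTH  # 5 titration intervals
    return n_titration + int((week - TITRATION_WEEKS) // MAINTENANCE_WIDTH) + 1

def last_value_per_interval(assessments):
    """Keep the last laboratory value observed in each interval.

    `assessments` is a list of (week, value) pairs for one patient.
    """
    by_interval = {}
    for week, value in sorted(assessments):
        by_interval[interval_index(week)] = value  # later values overwrite earlier ones
    return by_interval
```

With these widths, interval 22 (the last interval shown in Figure 1) starts at week 276.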
Although adverse events of hypocalcemia were anticipated to occur more frequently in patients randomized to cinacalcet, on the basis of our previous phase 3 experience we expected that some patients randomized to placebo would also discontinue study drug for this reason. Because investigators were blinded to randomized treatment, the same reasons for discontinuing study drug apply to both treatment groups. Covariates that deterministically imply non-adherence (e.g. kidney transplant and parathyroidectomy) were not included in the models, because the probability of remaining adherent after these events was 0 [14]. If more than one laboratory assessment occurred within a time interval, the last value was used in the calculation of the weight. On average, laboratory assessments occurred 3 weeks before the end of the interval, which was considered adequate time for these values to inform subsequent treatment. Of note, 5.6% of weights could not be calculated because of missing laboratory values. To limit the bias that might result from excluding these patients, missing weights in individual time intervals were set to 1, so that those intervals did not change the cumulative weight. For each interval, a cumulative weight for each patient was calculated by multiplying the weights from each of the prior time intervals. Once the weights were derived, the treatment effect and corresponding 95% CI were estimated using a weighted pooled logistic regression, modelling the probability of an event within each time interval.
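The cumulative weight calculation, including the rule that missing interval weights are set to 1, can be sketched as below. The sketch assumes, as is usual for stabilized weights, that Model 1 (baseline covariates) supplies the numerator probability and Model 2 (baseline covariates plus time-varying confounders) supplies the denominator probability of remaining adherent; it also assumes the product runs over intervals up to and including the current one, which is one common convention. The function name and the input probabilities are hypothetical.

```python
import math

def cumulative_weights(p_num, p_den):
    """Cumulative stabilized IPCW weights for one patient.

    p_num[k]: Model 1 probability of remaining adherent in interval k.
    p_den[k]: Model 2 probability of remaining adherent in interval k.
    None marks a prediction that could not be computed (e.g. missing laboratory values).
    """
    cumulative, out = 1.0, []
    for num, den in zip(p_num, p_den):
        # A missing interval weight is set to 1, leaving the running product unchanged.
        w = 1.0 if num is None or den is None else num / den
        cumulative *= w
        out.append(cumulative)
    return out

weights = cumulative_weights([0.9, 0.8, None, 0.9], [0.9, 0.5, 0.7, 0.6])
log_weights = [math.log(w) for w in weights]  # the scale plotted in Figure 1
```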

However, as in many other long-term outcomes studies, observing the event for all patients may not be feasible when follow-up is terminated at a pre-defined date, the administrative censoring date $C_i$. Thus, $T_{Li}$ is censored accordingly at time $D_i$, given by Equation (4) [19]:

$$D_i = \int_0^{C_i} \exp\left[\psi X_i(t)\right] dt \qquad (4)$$

As discussed previously by White [19], non-informative censoring of the observed failure time induces informative censoring of $T_{Li}$ if $X_i$ and $C_i$ are associated with prognosis [18]. To overcome this, $T_{Li}$ was recensored at the earliest possible censoring time (Equation 5) to avoid potential bias [19]:

$$D_i = \min\left(C_i,\; C_i e^{\psi}\right) \qquad (5)$$
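Equations (3) to (5) can be combined into a single recensoring step, sketched below under the same hypothetical (duration, exposed) segment layout. Following White's approach as described above, a counterfactual time beyond $D_i$ is censored at $D_i$ even if an event was observed; the function name is illustrative and not taken from the EVOLVE programs.

```python
import math

def recensored_counterfactual(segments, event, admin_censor, psi):
    """Counterfactual time T_L (Equation 3) with full recensoring at D_i (Equation 5).

    segments: (duration, exposed) pairs covering the observed follow-up;
    event: 1 if the primary endpoint was observed, 0 if censored;
    admin_censor: the administrative censoring time C_i.
    Returns (time, event) on the counterfactual scale.
    """
    t_l = sum(d * (math.exp(psi) if exposed else 1.0) for d, exposed in segments)
    d_i = min(admin_censor, admin_censor * math.exp(psi))  # earliest possible censoring time
    if t_l > d_i:
        return d_i, 0  # beyond D_i: censored at D_i, even if an event was observed
    return t_l, event
```

When $\psi > 0$ the minimum in Equation (5) is simply $C_i$, so recensoring only bites for values of $\psi$ that imply a beneficial effect.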

2.1.5. Iterative Parameter Estimation (IPE).

IPE was first proposed by Branson and Whitehead in 2002 [21]. Like RPSFTM, IPE relates the observed failure time to a counterfactual time using a causal model. Unlike RPSFTM, the counterfactual time $T_{Li}$ is defined as the failure time that would have been observed had patients remained adherent to the assigned treatment. Also, rather than using a test-based method, IPE uses parametric likelihood estimation to derive $\psi$. Like RPSFTM, IPE aims to estimate the efficacy of the study drug as if patients had maintained their randomized treatment for the entire study duration. In both methods, the estimate reflects the potential efficacy of the study drug when adherence to study drug is achieved and maintained. However, this may be an unlikely scenario in a chronically ill patient population, given the high pill burden and other therapies used to treat co-morbid conditions. Furthermore, both approaches assume that the treatment effect is the same regardless of when treatment is taken, that is, an equal treatment effect [18], which may be biologically implausible.

Using the same notation and definitions as above, $T_{Li}$ was derived for each patient as

$$T_{Li} = \begin{cases} T_{\mathrm{Exposed}} + e^{-\psi}\, T_{\mathrm{Unexposed}}, & \text{patients randomized to cinacalcet} \\ T_{\mathrm{Unexposed}} + e^{\psi}\, T_{\mathrm{Exposed}}, & \text{patients randomized to placebo} \end{cases} \qquad (6)$$

For patients who remained on their assigned treatment, $T_{\mathrm{Unexposed}}$ (cinacalcet group) or $T_{\mathrm{Exposed}}$ (placebo group) is 0. Based on the distribution of the observed failure times, a Weibull parametric model was selected for our analysis. Fit plots of the observed survival time using common survival distributions are presented in Supplementary Figure 2. An initial estimate of $e^{\psi}$ was obtained by comparing the observed failure times between the two treatment groups. Using this initial estimate, counterfactual failure times were derived for each patient using Equation (6). Using the same parametric model, a new estimate of $e^{\psi}$ was derived by comparing these counterfactual times. If the difference between the initial and new estimates of $e^{\psi}$ was less than a pre-defined threshold ($10^{-5}$), the IPE algorithm was considered to have converged. Otherwise, $T_{L}$ was re-derived using Equation (6) based on the new estimate of $e^{\psi}$. The iterative process was repeated until the difference between two consecutive estimates of $e^{\psi}$ was less than $10^{-5}$. A flow diagram of the process and further details can be found in Branson and Whitehead's publication [21]. Once the IPE algorithm converged, the relative hazard comparing cinacalcet with placebo was derived from the final estimate of $\psi$ as $\exp(\psi/k)$, where $k$ is the estimated scale parameter of the Weibull distribution [22]. As recommended by Branson and Whitehead, 95% CIs were derived using the bootstrap method, because the parametric analysis may underestimate variances [21]. It is worth noting that Branson and Whitehead only proposed recensoring at the administrative censoring date if the counterfactual survival time extends past this date. As suggested by White [23], the full recensoring method described above for RPSFTM was applied to this method as well, because there can be substantial bias towards the null when full recensoring is not applied. Statistical analyses were performed using SAS software, version 9.2 (SAS Institute, Cary, NC, USA).

Table II. Pooled logistic regression analysis of baseline predictors on remaining adherent to study drug for IPCW Model 1.

Baseline variable: odds ratio (95% Wald confidence limits); p-value
Treatment (cinacalcet vs placebo): 1.24 (1.14, 1.35); p < 0.001
Age > 65 years (yes vs no): 1.04 (0.93, 1.15); p = 0.511
Male versus female: 1.05 (0.97, 1.15); p = 0.217
Race group (White, Black and Other): odds ratio not presented; p < 0.001
Region (the USA, Europe, Latin America, Russia, Australia and Canada): odds ratio not presented; p < 0.001
History of diabetes (yes vs no): 0.95 (0.86, 1.05); p = 0.299
Baseline PTH categories (≤ 63.6, > 63.6–95.4, > 95.4–127.2, > 127.2 pmol/L): odds ratio not presented; p < 0.001
Baseline serum calcium categories (≤ 2.00, > 2.00–2.25, > 2.25–2.50, > 2.50–2.75, > 2.75 mmol/L): odds ratio not presented; p < 0.001
Baseline serum phosphorus categories (≤ 1.60, > 1.60–2.24, > 2.24 to < 2.88, > 2.88 mmol/L): odds ratio not presented; p = 0.499

Odds ratios (95% confidence intervals) are not presented for categorical variables with more than two levels. A restricted cubic spline function (the daspline macro from Frank Harrell) [31] was used for the continuous interval variable; its estimates are not presented in the table. IPCW, inverse probability of censoring weights; PTH, intact plasma parathyroid hormone. Unit conversions: PTH, 1 pg/mL = 0.106 pmol/L; corrected serum calcium, 1 mg/dL = 0.25 mmol/L; serum phosphorus, 1 mg/dL = 0.32 mmol/L.
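The IPE iteration described in Section 2.1.5 can be sketched as follows. This is an illustration under loudly stated simplifications, not the analysis program used for EVOLVE: an exponential accelerated failure time model stands in for the Weibull model (so the scale parameter $k$ equals 1 and the relative hazard is simply $e^{\psi}$), censoring and recensoring are ignored, and the (arm, t_exposed, t_unexposed, event) data layout and function names are hypothetical. The counterfactual times follow Equation (6), using the sign convention of Equation (3), under which $e^{\psi} < 1$ is beneficial.

```python
import math

# Hypothetical layout: each patient is (arm, t_exposed, t_unexposed, event),
# with arm = 1 for cinacalcet and 0 for placebo; times are on the observed scale.

def counterfactual_ipe(arm, t_exposed, t_unexposed, psi):
    """Equation (6): fully treated time (cinacalcet) or fully untreated time (placebo)."""
    if arm == 1:
        # Unexposed time is converted to the treated scale.
        return t_exposed + math.exp(-psi) * t_unexposed
    # Exposed time is converted to the untreated scale.
    return t_unexposed + math.exp(psi) * t_exposed

def exponential_psi(t_cin, e_cin, t_pbo, e_pbo):
    """Stand-in for the Weibull fit: under an exponential model, treated times are
    exp(-psi) times untreated times, so psi = log(rate_cinacalcet / rate_placebo)."""
    rate_cin = sum(e_cin) / sum(t_cin)
    rate_pbo = sum(e_pbo) / sum(t_pbo)
    return math.log(rate_cin / rate_pbo)

def ipe(patients, tol=1e-5, max_iter=100):
    """Iterate until successive estimates of exp(psi) differ by less than `tol`."""
    cin = [p for p in patients if p[0] == 1]
    pbo = [p for p in patients if p[0] == 0]
    e_cin = [p[3] for p in cin]
    e_pbo = [p[3] for p in pbo]
    # Initial estimate from the observed (untransformed) failure times.
    psi = exponential_psi([p[1] + p[2] for p in cin], e_cin,
                          [p[1] + p[2] for p in pbo], e_pbo)
    for iteration in range(1, max_iter + 1):
        t_cin = [counterfactual_ipe(1, p[1], p[2], psi) for p in cin]
        t_pbo = [counterfactual_ipe(0, p[1], p[2], psi) for p in pbo]
        new_psi = exponential_psi(t_cin, e_cin, t_pbo, e_pbo)
        if abs(math.exp(new_psi) - math.exp(psi)) < tol:
            return new_psi, iteration
        psi = new_psi
    raise RuntimeError("IPE did not converge")

# Toy data with switching in both arms.
psi_hat, n_iter = ipe([(1, 2.0, 1.0, 1), (1, 3.0, 0.0, 1), (1, 1.0, 2.0, 1),
                       (0, 0.5, 1.0, 1), (0, 0.0, 2.0, 1), (0, 0.0, 1.5, 1)])
relative_hazard = math.exp(psi_hat)  # exp(psi / k) with k = 1 for the exponential model
```

For these toy data the fixed point can be found by hand: with $x = e^{\psi}$, the update requires $x = (4.5 + 0.5x)/(6 + 3/x)$, giving $x = 3/11$. In a real analysis, the bootstrap confidence interval would wrap the whole iteration, refitting it in each resample.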

3. RESULTS

3.1. Intention-to-treat

Nine hundred thirty-eight patients randomized to cinacalcet and 952 patients randomized to placebo experienced a primary composite endpoint event. For the primary analysis, Kaplan–Meier product-limit estimates of event-free survival were compared between the two groups using a two-sided, stratified log-rank test. As reported previously, the primary endpoint did not achieve statistical significance (p = 0.112) [2]. The relative hazard comparing cinacalcet with placebo was 0.93 (95% CI: 0.85, 1.02) (Table IV) using an unadjusted Cox proportional hazards regression model stratified by country and history of diabetes. After adjustment for baseline characteristics, the relative hazard was 0.88 (95% CI: 0.81, 0.97) [2]. The median (Q1, Q3) observation time to primary composite endpoint event or censor date was 48 (21, 55) months in the cinacalcet group and 46 (17, 55) months in the placebo group (Table V).

3.2. Lag censoring

In the lag-censoring analysis, 638 patients randomized to cinacalcet and 658 patients randomized to placebo experienced a primary composite endpoint event. Five hundred eighty-six (31.1%) primary endpoint events used in the ITT analysis occurred after the end of the lag-censoring period and were excluded from the lag-censoring analysis. The relative hazard was 0.85 (95% CI: 0.76, 0.95) (Table IV). The median (Q1, Q3) observation time to primary composite endpoint event or censor date was 21 (11, 40) months and 18 (10, 35) months in the cinacalcet and placebo groups, respectively (Table V).

3.3. Inverse probability of censoring weights

In these analyses, 1207 (48.8%) patients randomized to cinacalcet and 1266 (51.2%) patients randomized to placebo discontinued study drug before experiencing a primary composite endpoint event or reaching the censor date; their data were censored at the time of study drug discontinuation. Four hundred forty-nine patients randomized to cinacalcet and 463 patients randomized to placebo experienced a primary composite endpoint event in this analysis. As previously reported, using a simplified application of the IPCW method, the relative hazard from the weighted Cox regression model was 0.77 (95% CI: 0.66, 0.88) [2]. In the current analysis, we expanded on this approach by incorporating interaction terms and additional baseline and time-dependent serum calcium and serum phosphorus laboratory parameters. Table II provides the results of the pooled logistic regression analysis of baseline predictors on remaining adherent to study drug, and Table III provides the analysis of baseline predictors together with time-varying confounders. A boxplot of the log(cumulative weight) at each interval is presented in Figure 1. IPCW down-weights and up-weights subjects' data to create a pseudo-population that would have been observed had subjects remained on randomized treatment. No large differences in weights between treatment groups were observed, suggesting that subjects in both treatment groups discontinued study drug in a similar fashion. In the weighted pooled logistic regression model incorporating intra-subject variability by using the weights as a time-varying variable, the odds ratio was 0.81 (95% CI: 0.70, 0.92) (Table IV). When the analysis was repeated using a time-varying weighted Cox regression model, the relative hazards were almost identical. The median (Q1, Q3) observation time to primary composite endpoint event or censor date was 18 (7, 38) and 14 (6, 31) months in the cinacalcet and placebo groups, respectively (Table V). When values other than 1 were used to impute missing weights, the impact on the effect estimate was small, as expected given the small proportion of missing weight values (5.6%). These results are presented in Supplementary Table I.

Table III. Pooled logistic regression analysis of baseline predictors and time-varying confounders on remaining adherent to study drug for IPCW Model 2.

Variable: odds ratio (95% Wald confidence limits); p-value
Baseline variables
Treatment (cinacalcet vs placebo): odds ratio not presented; p < 0.123
Age > 65 years (yes vs no): 0.93 (0.82, 1.07); p = 0.314
Male versus female: 0.99 (0.89, 1.11); p = 0.859
Race group (White, Black and Other): odds ratio not presented; p < 0.001
Region (the USA, Europe, Latin America, Russia, Australia and Canada): odds ratio not presented; p < 0.001
History of diabetes (yes vs no): 0.89 (0.78, 1.01); p = 0.074
Baseline PTH categories (≤ 63.6, > 63.6–95.4, > 95.4–127.2, > 127.2 pmol/L): odds ratio not presented; p = 0.026
Baseline serum calcium categories (≤ 2.00, > 2.00–2.25, > 2.25–2.50, > 2.50–2.75, > 2.75 mmol/L): odds ratio not presented; p = 0.171
Baseline serum phosphorus categories (≤ 1.60, > 1.60–2.24, > 2.24 to < 2.88, > 2.88 mmol/L): odds ratio not presented; p = 0.840
Time-dependent variables
Hypocalcemia adverse event (yes vs no): 0.92 (0.52, 1.62); p = 0.764
PTH categories (≤ 63.6, > 63.6–95.4, > 95.4–127.2, > 127.2 pmol/L): odds ratio not presented; p < 0.001
Serum calcium categories (≤ 2.00, > 2.00–2.25, > 2.25–2.50, > 2.50–2.75, > 2.75 mmol/L): odds ratio not presented; p < 0.001
Serum phosphorus categories (≤ 1.60, > 1.60–2.24, > 2.24 to < 2.88, > 2.88 mmol/L): odds ratio not presented; p = 0.065
Treatment × PTH categories: odds ratio not presented; p = 0.113
Treatment × serum calcium categories: odds ratio not presented; p = 0.214
Treatment × serum phosphorus categories: odds ratio not presented; p = 0.007

Odds ratios (95% confidence intervals) are not presented for categorical variables with more than two levels. A restricted cubic spline function (the daspline macro from Frank Harrell) [31] was used for the continuous interval variable; its estimates are not presented in the table. IPCW, inverse probability of censoring weights; PTH, intact plasma parathyroid hormone. Unit conversions: PTH, 1 pg/mL = 0.106 pmol/L; corrected serum calcium, 1 mg/dL = 0.25 mmol/L; serum phosphorus, 1 mg/dL = 0.32 mmol/L.

3.4. Rank preserving structural failure time model

Using the RPSFTM method, the relative hazard was 0.85 (95% CI: 0.66, 1.04) (Table IV). A plot of the values of $\psi$ assessed against the Wald chi-square test statistic can be found in Supplementary Figure 1. Because the main principle of the method is to equalize the survival distributions, the numbers of primary composite events and the counterfactual failure times were similar between the two treatment groups and are not reported.

3.5. Iterative parameter estimation

Using full recensoring in the IPE method, the relative hazard for the primary composite endpoint was 0.85 (95% CI: 0.75, 0.96) (Table IV). The median (Q1, Q3) counterfactual failure time to primary composite endpoint event or censor date was 48 (21, 55) and 41 (16, 47) months in the cinacalcet and placebo groups, respectively (Table V).
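The IPCW estimate in Section 3.3 is an odds ratio from a weighted pooled logistic regression fitted to person-interval records. The sketch below fits such a model by Newton–Raphson in plain Python. It is a simplified, hypothetical illustration: it includes only an intercept and a treatment indicator (omitting the time-interval terms a pooled logistic regression would normally include), uses invented toy records, and computes no confidence interval. The `solve` helper is a generic small linear solver written for this example, not a library function. With a saturated two-group model, the fitted odds ratio must reproduce the weighted 2×2 odds ratio, which gives a simple check.

```python
import math

def solve(a, b):
    """Solve the small linear system a x = b (Gauss-Jordan with partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def weighted_logistic(X, y, w, max_iter=50, tol=1e-10):
    """Weighted logistic regression fitted by Newton-Raphson.

    X: covariate rows including a leading 1 for the intercept;
    y: 0/1 event indicator for each person-interval record;
    w: cumulative IPCW weight for each record.
    """
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        score = [0.0] * p
        info = [[0.0] * p for _ in range(p)]
        for xi, yi, wi in zip(X, y, w):
            mu = 1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, xi))))
            for j in range(p):
                score[j] += wi * (yi - mu) * xi[j]
                for k in range(p):
                    info[j][k] += wi * mu * (1.0 - mu) * xi[j] * xi[k]
        step = solve(info, score)  # Newton step: information^-1 * score
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

# Toy person-interval records: [intercept, treatment], event indicator, weight.
X = [[1.0, 1.0], [1.0, 1.0], [1.0, 0.0], [1.0, 0.0]]
y = [1, 0, 1, 0]
w = [2.0, 6.0, 3.0, 5.0]
beta = weighted_logistic(X, y, w)
odds_ratio = math.exp(beta[1])  # cinacalcet vs placebo
```

As the Discussion notes, model-based standard errors from such a fit ignore the randomness of the estimated weights, so they tend to understate the variance.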

Figure 1. Boxplot of log(cumulative weight) by interval (y-axis: log(cumulative weight), range −3 to 3; x-axis: interval 1 to 22). Crosses represent the mean value; horizontal lines represent the 25th, 50th and 75th percentiles. Whiskers represent minimum and maximum values. The time interval splits are based on the periods used in the calculation of the weights. These intervals correspond to the protocol-specified visits: intervals 1–5 correspond to 4-week visits during the first 20 weeks of the study (i.e. the titration phase), followed by 16-week visits thereafter (i.e. the maintenance phase).

Table IV. Relative hazard (cinacalcet vs placebo) and 95% confidence intervals in EVOLVE using different statistical methods.

Analysis: HR (95% CI)
ITT: 0.93 (0.85, 1.02)
Lag censoring: 0.85 (0.76, 0.95)
Inverse probability of censoring weights*: 0.81 (0.70, 0.92)
Rank preserving structural failure time model: 0.85 (0.66, 1.04)
Iterative parameter estimation**: 0.85 (0.75, 0.96)

EVOLVE, Evaluation of Cinacalcet Therapy to Lower Cardiovascular Events; ITT, intention-to-treat. *Odds ratio and 95% confidence interval (CI) from the final pooled logistic regression model. **95% CIs were derived by bootstrapping with 1000 iterations.


Table V. Median (Q1, Q3) time to primary composite endpoint event or censor date (months).

Analysis: Cinacalcet (N = 1948); Placebo (N = 1935)
ITT*: 48 (21, 55); 46 (17, 55)
Lag censoring: 21 (11, 40); 18 (10, 35)
Inverse probability of censoring weights: 18 (7, 38); 14 (6, 31)
Iterative parameter estimation*: 48 (21, 55); 41 (16, 47)

In RPSFTM, the numbers of primary composite events and the counterfactual failure times were similar between the two treatment groups because of the equalization of the survival distributions, and are not reported. ITT, intention-to-treat. *The transformed time in the cinacalcet group is the same as the observed time, because patients discontinued study drug close to the event/censor date and/or discontinued the study close to the study termination date.

4. DISCUSSION


The unadjusted ITT analysis in EVOLVE identified no significant treatment effect with cinacalcet compared with placebo on mortality and cardiovascular morbidity [2] As expected, the treatment estimates from a series of comprehensive pre-specified analyses using various statistical methodologies to adjust for non-adherence differed from the ITT results. The estimates of the ‘on-treatment’ benefit from these analyses were more favorable to cinacalcet. The many advantages of the ITT method have been discussed extensively in publications and regulatory guidelines (e.g. International Conference on Harmonisation of Technical Requirements) [24]. ITT is a valid and universally accepted analytic approach for testing two treatment strategies as it provides an unbiased estimate of the effect of randomization. The ITT results are considered to reflect the effect of the assigned treatment in the ‘real world’ (i.e. patients may have low adherence or discontinue study drug due to an adverse event). The ITT results reflect an average treatment effect in the study population, regardless of adherence to randomized treatment. In clinical practice where physicians treat patients in a highly individualized manner, knowledge of the on-treatment effect is important for clinical decision-making. Furthermore, many international healthcare authorities may require the on-treatment effect to support reimbursement strategies. Examples where on-treatment effect may be required include when on-treatment estimates can be used in a persistency improvement program. The on-treatment estimates can be used to weigh potential investments in improving adherence against potential offsets due to improved effectiveness. Another scenario is the risk-sharing agreement strategies where sponsors and payors agree to pay parts of the therapeutic trajectory. For example, the payor may agree to pay for benefits while patients are on treatment for the first year based on expected discontinuation rates (i.e. 
price could be based on on-treatment estimates), while sponsors would pay thereafter. Despite understanding the nature of the treated population (patients receiving hemodialysis who are chronically ill, frequently develop adverse events and are often prescribed 20 or more pills per day), and the known side effects of cinacalcet (nausea, vomiting and hypocalcemia), non-adherence was higher than expected. The discontinuation rates were 2–3 times higher than in other large, long-term cardiovascular outcomes studies of comparable sample size, in which study drug discontinuations rates ranged between 20% and 30% [25]. In the group randomized to

cinacalcet, 16% stopped study drug due to adverse events, 22% for protocol specified reasons and 21% for non-protocol specified reasons. As previously reported, the incidence of adverse events was higher in patients randomized to cinacalcet that was expected given the gastrointestinal side effects of the drug. Adherence in patients randomized to placebo was also diminished by treatment of sHPT using surgical interventions (i.e. parathyroidectomy and kidney transplantation) and the use of off-protocol, commercially available cinacalcet. The ‘other administrative decision/subject request’ reason for study drug discontinuation was also higher in the placebo group. However, this was a ‘catch-all’ category for reasons that were not determinable. Because the reasons for study drug discontinuation were recorded by study personnel and were not adjudicated, the distribution of the reasons should be interpreted with caution. As non-adherence was anticipated, we planned several sensitivity analyses that might account for reduced adherence in both treatment groups to estimate the on-treatment effect of cinacalcet. We performed lag-censoring analysis to account for the markedly shorter time on study drug compared with study follow-up in both treatment groups. This method assumes that discontinuation of study drug is independent of the outcome. However, this assumption is rarely correct in practice, because patients who discontinue treatment may have different prognostic characteristics and be more or less likely to experience an outcome than those who remain on treatment (i.e. informative censoring) [25]. If informative censoring occurs, then the lag-censoring results are biased. Although the limitations of censoring analyses have been discussed, these analyses are widely used as exploratory analyses when reporting trial results [26]. Inverse probability of censoring weight has been used in many large, long-term clinical trials to account for non-adherence to therapy (e.g. 
AIDS Clinical Trial Group 320 [27] and the Management of Elevated Cholesterol in the Primary Prevention Group of Adult Japanese (MEGA) study [28]) and is accepted by many health technology assessment agencies (e.g. the National Institute for Health and Care Excellence [NICE]) as supportive evidence for reimbursement assessments, alongside other methods. Examples include the submissions to NICE for everolimus for the second-line treatment of advanced renal cell carcinoma and for pazopanib for the first-line treatment of advanced renal cell carcinoma, in both of which IPCW and RPSFTM methods were used [29]. A limitation of the IPCW method is that it assumes that data are missing at random and that there are no unmeasured confounders. In practice, not all confounders may be known, and some cannot be measured; if a key confounder is not included in the model, the results derived from IPCW may be biased [30]. Also, IPCW is computationally intensive, especially for large trials of long duration. The calculation of the weights can be challenging because it requires partitioning the data into appropriate time intervals. The intervals should be wide enough to minimize the amount of missing data, yet narrow enough that the probability of an outcome within each interval is small and that the confounders remain predictive of exposure to study drug and of outcomes at the end of the interval. These requirements help ensure that the pooled logistic regression model closely approximates the time-dependent Cox regression model [15]. In our analysis, the time intervals were wider during the later part of the study because of the assessment schedule, which could have introduced some bias. In addition, the weighted pooled logistic regression model does not account for the randomness of the estimated weights, which will tend to bias the variance of the estimate downwards. Moreover, the choices made in specifying the model and selecting the time intervals may affect whether the model converges.
We explored other methodologies, including RPSFTM and IPE, to further evaluate the consistency of the treatment effect estimate. Although these methods are easier to implement than IPCW, they impose a stronger assumption on the survival time by using parametric accelerated failure time (AFT) models. They do not account for informative censoring, because they assume that the treatment effect on survival time is the same for adherent and non-adherent patients. As anticipated, the results from the RPSFTM and IPE methods were comparable, because the IPE method is a variant of RPSFTM. When assessing an on-treatment effect using an AFT model, it may be sufficient to apply one of these methods, because they are quite similar.
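The IPCW weight calculation discussed above can be sketched as follows. This is a minimal illustration, not the EVOLVE analysis code: it assumes that two pooled logistic regression models for remaining on study drug have already been fitted over the chosen time intervals (a numerator model with baseline covariates only and a denominator model that also includes time-varying confounders), and it only turns one patient's interval-specific predicted probabilities into cumulative stabilized weights. The function name and the example probabilities are hypothetical.

```python
from itertools import accumulate
from operator import mul


def stabilized_ipcw(p_num, p_den):
    """Cumulative stabilized IPCW weights for one patient, one per interval.

    p_num[k]: predicted probability of remaining on study drug during
              interval k (given on drug at its start) from the numerator
              pooled logistic model with baseline covariates only.
    p_den[k]: the same probability from the denominator model, which also
              conditions on time-varying confounders.
    Both models are assumed to be fitted already over the same intervals.
    While still on drug at the end of interval k, the weight is
    prod(p_num[:k+1]) / prod(p_den[:k+1]); after discontinuation the
    patient is censored and contributes no further records.
    """
    if len(p_num) != len(p_den):
        raise ValueError("numerator and denominator must cover the same intervals")
    if not all(0.0 < p <= 1.0 for p in [*p_num, *p_den]):
        raise ValueError("probabilities must lie in (0, 1]")
    cum_num = accumulate(p_num, mul)  # running products of numerator probabilities
    cum_den = accumulate(p_den, mul)  # running products of denominator probabilities
    return [n / d for n, d in zip(cum_num, cum_den)]


# Hypothetical probabilities for one patient over three intervals
print(stabilized_ipcw([0.95, 0.90, 0.90], [0.90, 0.80, 0.95]))  # approximately 1.056, 1.188, 1.125
```

These weights would then be attached to the patient's interval records in the weighted pooled logistic regression. Patients whose confounder history made discontinuation likely (small denominator probabilities) receive larger weights so that they stand in for similar patients who were censored; very small denominator probabilities produce extreme weights, which is one practical reason the choice of intervals and model specification matters.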
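The g-estimation idea behind RPSFTM can be sketched in a few lines. The sketch assumes the common parameterization in which a patient's counterfactual time without study drug is U(ψ) = T_off + exp(ψ)·T_on, where T_on and T_off are the observed times on and off drug; because randomization balances U between arms, ψ is estimated as the value at which a log-rank test comparing U(ψ) between the randomized arms gives Z = 0. Recensoring at min(C, C·exp(ψ)), with C the potential administrative censoring time, is applied to every patient. The class and function names are hypothetical, a simple bisection search stands in for a grid search or other root finder, and it assumes Z(ψ) changes sign once on the search interval. Note that a single ψ is applied to every patient's on-drug time, which is the common-treatment-effect assumption discussed above.

```python
import math
from dataclasses import dataclass


@dataclass
class Patient:
    arm: int       # randomized arm: 1 = active drug, 0 = placebo
    t_on: float    # observed time on active drug (from randomization)
    t_off: float   # observed time off active drug
    event: bool    # True if the outcome occurred at t_on + t_off
    cens: float    # potential administrative censoring time C


def logrank_z(times, events, arms):
    """Standardized log-rank statistic: (observed - expected) events in arm 1."""
    o_minus_e, var = 0.0, 0.0
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        n = sum(ti >= t for ti in times)
        n1 = sum(ti >= t and a == 1 for ti, a in zip(times, arms))
        d = sum(ti == t and e for ti, e in zip(times, events))
        d1 = sum(ti == t and e and a == 1 for ti, e, a in zip(times, events, arms))
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e / math.sqrt(var) if var > 0 else 0.0


def counterfactual(p, psi):
    """Counterfactual untreated time U = t_off + exp(psi) * t_on, with recensoring."""
    u = p.t_off + math.exp(psi) * p.t_on
    d_star = min(p.cens, p.cens * math.exp(psi))  # recensoring time, all patients
    if u > d_star:
        return d_star, False  # censored at d_star, even if an event was observed
    return u, p.event


def z_at(patients, psi):
    """Log-rank Z comparing counterfactual times between randomized arms."""
    cf = [counterfactual(p, psi) for p in patients]
    return logrank_z([u for u, _ in cf], [e for _, e in cf], [p.arm for p in patients])


def g_estimate(patients, lo=-2.0, hi=2.0, iters=60):
    """Bisection for the psi at which Z(psi) crosses zero (assumes one crossing)."""
    z_lo = z_at(patients, lo)
    if z_lo * z_at(patients, hi) > 0:
        raise ValueError("Z(psi) has the same sign at both ends of the interval")
    for _ in range(iters):
        mid = (lo + hi) / 2
        if z_at(patients, mid) * z_lo > 0:
            lo = mid  # mid lies on the same side of the root as lo
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    # Hypothetical data: active-arm patients stay on drug and live twice as long.
    pats = [Patient(1, 2.0 * t, 0.0, True, 1e6) for t in range(1, 6)]
    pats += [Patient(0, 0.0, float(t), True, 1e6) for t in range(1, 6)]
    print(round(g_estimate(pats), 3))  # -0.693, i.e. about -ln 2
```

In this hypothetical example the estimate is close to −ln 2, so exp(−ψ̂) ≈ 2 is the implied factor by which time on drug is stretched. Converting ψ̂ to a hazard ratio requires an additional step that is not shown; IPE differs mainly in replacing the log-rank step with iterative fitting of a parametric AFT model.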
We did not implement special handling for patients who stopped being followed for outcomes (because of loss to follow-up or withdrawal of consent) before the administrative censoring date and whose survival status was therefore unknown, because the dropout rates were small and similar between the two groups (2.2% in patients randomized to placebo and 1.9% in patients randomized to cinacalcet). It is worth noting that both the ITT and the censoring analyses can be easily implemented using common statistical software. The other methods, including IPCW, RPSFTM and IPE, may require extensive programming, depending on the software available.
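As noted above, censoring analyses are straightforward to implement. A minimal sketch of lag-censoring, assuming each patient's ITT record is available as the time to event or end of follow-up plus the time of study drug discontinuation: follow-up is censored at discontinuation plus a lag, and any outcome after that point is not counted. The field names and the 30-day lag in the usage line are hypothetical and are not the EVOLVE specification.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FollowUp:
    time: float                 # days from randomization to event or end of follow-up (ITT)
    event: bool                 # True if the outcome occurred at `time`
    stop_drug: Optional[float]  # days to study drug discontinuation; None if never stopped


def lag_censor(rec: FollowUp, lag: float) -> Tuple[float, bool]:
    """Censor follow-up at discontinuation + lag; later outcomes are not counted."""
    if rec.stop_drug is None:
        return rec.time, rec.event  # on study drug throughout: ITT record unchanged
    cutoff = rec.stop_drug + lag
    if rec.time <= cutoff:
        return rec.time, rec.event  # event or end of follow-up falls within the lag window
    return cutoff, False            # censored at the end of the lag window


# Hypothetical patient: stops drug on day 100, has an event on day 400; 30-day lag.
print(lag_censor(FollowUp(400.0, True, 100.0), lag=30.0))  # (130.0, False)
```

The resulting (time, event) pairs would then typically be analyzed with the same Cox model as the ITT analysis. Because the censoring time depends only on when the patient stopped study drug, any link between stopping and prognosis passes straight into the estimate, which is the informative-censoring problem described earlier.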

5. CONCLUSION


The pre-specified analyses performed in the EVOLVE trial, while not providing definitive evidence, suggest that cinacalcet may have an effect while subjects are receiving treatment. Although the ITT method provides unbiased estimates of the treatment effect based on randomization, additional analyses should be considered in long-term trials with substantial non-adherence to randomized treatment to assess the impact of non-adherence on the estimate from the ITT analysis. Such analyses are particularly useful in assessing the biologic effect of the study drug, which can aid evaluation of clinical trial results.

Copyright © 2015 John Wiley & Sons, Ltd.

Acknowledgements
The authors would like to thank Bill Goodman, Caroline Mattin, Thy Do, Steve Snapinn, Arline Nakanishi and the members of the EVOLVE Executive Committee for their helpful comments. We would also like to thank the EVOLVE investigators for their participation in the EVOLVE trial. Contract Grant Sponsor: Amgen, Inc. ClinicalTrials.gov number: NCT00345839.

REFERENCES
[1] Sheiner LB, Rubin DB. Intention-to-treat analysis and the goals of clinical trials. Clinical Pharmacology and Therapeutics 1995; 57:6–15.
[2] Chertow GM, Block GA, Correa-Rotter R, Drueke TB, Floege J, Goodman WG, Herzog CA, Kubo Y, London GM, Mahaffey KW, Mix TC, Moe SM, Trotman ML, Wheeler DC, Parfrey PS. Effect of cinacalcet on cardiovascular disease in patients undergoing dialysis. The New England Journal of Medicine 2012; 367:2482–94.
[3] Chertow GM, Pupim LB, Block GA, Correa-Rotter R, Drueke TB, Floege J, Goodman WG, London GM, Mahaffey KW, Moe SM, Wheeler DC, Albizem M, Olson K, Klassen P, Parfrey P. Evaluation of Cinacalcet Therapy to Lower Cardiovascular Events (EVOLVE): rationale and design overview. Clinical Journal of the American Society of Nephrology : CJASN 2007; 2:898–905.
[4] Chertow GM, Correa-Rotter R, Block GA, Drueke TB, Floege J, Goodman WG, Herzog CA, Kubo Y, London GM, Mahaffey KW, Mix TC, Moe SM, Wheeler DC, Parfrey PS. Baseline characteristics of subjects enrolled in the Evaluation of Cinacalcet HCl Therapy to Lower Cardiovascular Events (EVOLVE) trial. Nephrology, Dialysis, Transplantation : Official Publication of the European Dialysis and Transplant Association - European Renal Association 2012; 27:2872–9.
[5] Schulz KF, Altman DG, Moher D. CONSORT 2010 Statement: updated guidelines for reporting parallel group randomised trials. BMJ 2010; 340.
[6] Gupta SK. Intention-to-treat concept: a review. Perspectives in Clinical Research 2011; 2:109–12.
[7] Block GA, Spiegel DM, Ehrlich J, Mehta R, Lindbergh J, Dreisbach A, Raggi P. Effects of sevelamer and calcium on coronary artery calcification in patients new to hemodialysis. Kidney International 2005; 68:1815–24.
[8] Chertow GM, Burke SK, Raggi P. Sevelamer attenuates the progression of coronary and aortic calcification in hemodialysis patients. Kidney International 2002; 62:245–52.
[9] Raggi P, Chertow GM, Torres PU, Csiky B, Naso A, Nossuli K, Moustafa M, Goodman WG, Lopez N, Downey G, Dehmel B, Floege J; ADVANCE Study Group. The ADVANCE study: a randomized study to evaluate the effects of cinacalcet plus low-dose vitamin D on vascular calcification in patients on hemodialysis. Nephrology, Dialysis, Transplantation : Official Publication of the European Dialysis and Transplant Association - European Renal Association 2011; 26:1327–39.
[10] Robins JM, Finkelstein DM. Correcting for noncompliance and dependent censoring in an AIDS Clinical Trial with inverse probability of censoring weighted (IPCW) log-rank tests. Biometrics 2000; 56:779–88.
[11] Hernan MA, Brumback B, Robins JM. Marginal structural models to estimate the causal effect of zidovudine on the survival of HIV-positive men. Epidemiology (Cambridge, Mass) 2000; 11:561–70.
[12] Robins JM, Hernan MA, Brumback B. Marginal structural models and causal inference in epidemiology. Epidemiology (Cambridge, Mass) 2000; 11:550–60.
[13] Brookhart MA, Schneeweiss S, Rothman KJ, Glynn RJ, Avorn J, Sturmer T. Variable selection for propensity score models. American Journal of Epidemiology 2006; 163:1149–56.
[14] Howe CJ, Cole SR, Chmiel JS, Munoz A. Limitation of inverse probability-of-censoring weights in estimating survival in the presence of strong selection bias. American Journal of Epidemiology 2011; 173:569–77.

Pharmaceut. Statist. 2015, 14 242–251

[15] D’Agostino RB, Lee ML, Belanger AJ, Cupples LA, Anderson K, Kannel WB. Relation of pooled logistic regression to time dependent Cox regression analysis: the Framingham Heart Study. Statistics in Medicine 1990; 9:1501–15.
[16] Robins JM, Tsiatis AA. Correcting for non-compliance in randomized trials using rank preserving structural failure time models. Communications in Statistics-Theory and Methods 1991; 20:2609–31.
[17] Kleinbaum DG. Survival Analysis, a Self-Learning Text. Biometrical Journal 1998; 40:107–8.
[18] Morden JP, Lambert PC, Latimer N, Abrams KR, Wailoo AJ. Assessing methods for dealing with treatment switching in randomised controlled trials: a simulation study. BMC Medical Research Methodology 2011; 11:4.
[19] White IR, Babiker AG, Walker S, Darbyshire JH. Randomization-based methods for correcting for treatment changes: examples from the Concorde trial. Statistics in Medicine 1999; 18:2617–34.
[20] Greenland S, Lanes S, Jara M. Estimating effects from randomized trials with discontinuations: the need for intent-to-treat design and G-estimation. Clinical Trials 2008; 5:5–13.
[21] Branson M, Whitehead J. Estimating a treatment effect in survival studies in which patients switch treatment. Statistics in Medicine 2002; 21:2449–63.
[22] Allison PD. Survival Analysis Using SAS®: A Practical Guide, Second Edition. SAS Institute Inc.: Cary, NC, 2010.
[23] White I. Estimating treatment effects in randomized trials with treatment switching [Letter to the Editor]. Statistics in Medicine 2006; 25:1619–22.
[24] International Conference on Harmonisation. Guideline for statistical principles for clinical trials, 1998.

[25] Snapinn SM, Jiang Q, Iglewicz B. Informative noncompliance in endpoint trials. Current Controlled Trials in Cardiovascular Medicine 2004; 5:5.
[26] Lee YJ, Ellenberg JH, Hirtz DG, Nelson KB. Analysis of clinical trials by treatment actually received: is it really an option? Statistics in Medicine 1991; 10:1595–605.
[27] Cain LE, Cole SR. Inverse probability-of-censoring weights for the correction of time-varying noncompliance in the effect of randomized highly active antiretroviral therapy on incident AIDS or death. Statistics in Medicine 2009; 28:1725–38.
[28] Yoshida M, Matsuyama Y, Ohashi Y. Estimation of treatment effect adjusting for dependent censoring using the IPCW method: an application to a large primary prevention study for coronary events (MEGA study). Clinical Trials (London, England) 2007; 4:318–28.
[29] NICE. Everolimus for the second-line treatment of advanced renal cell carcinoma, 2011. Available from: https://www.nice.org.uk/guidance/ta219/chapter/3-the-manufacturers-submission (accessed 2014 Dec 11).
[30] Cole SR, Hernan MA. Constructing inverse probability weights for marginal structural models. American Journal of Epidemiology 2008; 168:656–64.
[31] Harrell F. Macro procedure daspline, 2013. Available from: http://www.public.iastate.edu/~kkoehler/stat565/sas/daspline.macro.sas (1986, accessed 2013 Nov 05).

SUPPORTING INFORMATION
Additional supporting information may be found in the online version of this article at the publisher’s web-site.



