BMJ 2014;349:g7513 doi: 10.1136/bmj.g7513 (Published 5 December 2014)

Endgames

ENDGAMES STATISTICAL QUESTION

Randomised controlled trials: subgroup analyses

Philip Sedgwick, reader in medical statistics and medical education
Institute for Medical and Biomedical Education, St George’s, University of London, London, UK

The effectiveness of wound edge protection devices in reducing surgical site infection after abdominal surgery was investigated. A randomised controlled trial was undertaken. The intervention was standard intraoperative care plus use of a wound edge protection device during the intra-abdominal part of the operation. The control was standard intraoperative care alone. The sample size for the trial was based on having 80% power to detect a 50% reduction in the infection rate, assuming a rate of 12% under standard care. A total sample size of 710 participants (355 in each arm) was needed. It was necessary to recruit 750 patients to the trial to allow for an estimated 5% dropout rate. In total, 760 patients undergoing laparotomy were recruited; 382 patients were allocated to the intervention group and 378 to the control group.¹ The primary outcome was surgical site infection within 30 days of surgery, as assessed by clinicians who were blind to the patients’ device group allocation. Secondary outcomes included quality of life and duration of hospital stay. Subgroup analyses that compared surgical site infection within 30 days of surgery between the device groups were performed, with the aim of establishing whether the intervention was beneficial in particular groups of patients. The analyses incorporated 15 characteristics of the patients and the operations.

Six patients in the intervention group and five in the control group did not undergo laparotomy. Furthermore, seven patients in each group were lost to follow-up. Although the proportion of patients with surgical site infection was lower in the intervention group, the difference was not significant (24.7% (n=91) v 25.4% (n=93); odds ratio 0.97, 95% confidence interval 0.69 to 1.36; P=0.85). No significant difference was seen between the groups in the secondary outcomes (quality of life and duration of hospital stay). The subgroup analysis did not identify any group of patients for which there was evidence that the wound edge protection device was associated with clinical benefit. It was concluded that wound edge protection devices do not reduce the rate of surgical site infection in patients undergoing laparotomy and that their routine use for this role cannot be recommended. Which of the following statements, if any, are true?

a) The subgroup analyses that compared the device groups in proportion of patients with surgical site infection were prone to confounding
b) Before undertaking the subgroup analyses, the comparison of device groups would have had statistical power of at least 80% to demonstrate the smallest effect of clinical interest
c) The subgroup analyses were prone to type I errors
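The percentages and odds ratio quoted in the question can be checked against the counts given. The sketch below assumes that the analysed denominators are the numbers allocated minus the patients who did not undergo laparotomy and those lost to follow-up (369 and 366); the function name `crude_odds_ratio` is hypothetical. It computes a crude (unadjusted) odds ratio with a Wald 95% confidence interval, which comes out at about 0.96 (0.69 to 1.34; P≈0.81). That is close to, but not identical with, the reported 0.97 (0.69 to 1.36; P=0.85); the published estimate may have come from an adjusted model, which this sketch does not attempt to reproduce.

```python
from math import exp, log, sqrt
from statistics import NormalDist

# Assumed analysed numbers: allocated minus no laparotomy minus lost to follow-up
n_int = 382 - 6 - 7    # intervention group
n_ctrl = 378 - 5 - 7   # control group
inf_int, inf_ctrl = 91, 93  # patients with surgical site infection


def crude_odds_ratio(a, n1, c, n2, level=0.95):
    """Crude odds ratio (group 1 v group 2) with a Wald confidence interval
    and a two sided P value. a, c: events; n1, n2: group sizes."""
    b, d = n1 - a, n2 - c                        # patients without the event
    log_or = log((a * d) / (b * c))
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)     # standard error of the log odds ratio
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    lower = exp(log_or - z_crit * se)
    upper = exp(log_or + z_crit * se)
    p = 2 * (1 - NormalDist().cdf(abs(log_or / se)))
    return exp(log_or), lower, upper, p


pct_int = 100 * inf_int / n_int     # about 24.7%
pct_ctrl = 100 * inf_ctrl / n_ctrl  # about 25.4%
or_, lo, hi, p = crude_odds_ratio(inf_int, n_int, inf_ctrl, n_ctrl)
print(f"{pct_int:.1f}% v {pct_ctrl:.1f}%; OR {or_:.2f} ({lo:.2f} to {hi:.2f}); P={p:.2f}")
```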

Answers

Statements a and c are true, whereas b is false.

The aim of the trial was to determine the effectiveness of wound edge protection devices in reducing surgical site infection after abdominal surgery. No significant difference was seen between the intervention group and control group in the primary outcome of surgical site infection within 30 days of surgery. However, the participants would have been heterogeneous with respect to their characteristics, and there may have been differences between the operations in how they were performed. The researchers therefore undertook subgroup analyses for the outcome of surgical site infection within 30 days of surgery, incorporating 15 characteristics of the patients and the operations. For example, the effectiveness of the intervention was investigated separately in subgroups based on age (≤65 years; >65 years), diabetes status (absent; present), and type of surgery (elective; non-elective). The purpose of the subgroup analyses was to establish whether the intervention worked significantly better (or worse) for a particular subgroup of patients but not for others. Such subgroup analyses might help target clinical recommendations to particular patient subgroups and therefore improve patient care. However, the results of subgroup analyses in trials are typically misleading.

The trial participants were allocated to a device group using simple random allocation. The aim of simple random allocation was to achieve two groups similar in baseline characteristics, thereby minimising confounding.² Confounding would have resulted if the device groups had differed in baseline characteristics that influence the outcome. These factors include demographic characteristics, prognostic factors, and other characteristics that may influence someone to participate in or withdraw from a trial. Therefore, if
confounding is minimised, any differences between device groups in outcomes at the end of the trial will tend to be due to differences in devices and not differences in baseline characteristics. Random allocation achieves similarity between groups in baseline characteristics only if the sample size is large enough. The number of participants within a subgroup would have been smaller than the overall sample size, and sometimes relatively small. For example, in the subgroup of patients without diabetes, 309 were allocated to the intervention and 316 to the control, whereas in the subgroup of patients with diabetes, 60 were allocated to the intervention and 50 to the control. If the number of participants in a subgroup is relatively small, random allocation does not guarantee that the intervention and control groups will be similar in size, and chance imbalances between them in baseline characteristics become more likely. Therefore, subgroup analyses are prone to confounding (a is true). Although a multiple regression analysis could have adjusted for potential confounding, this would have been possible only for those variables that were measured in the study, and not all confounding factors would have been measured, if they can be measured at all.

In the study above, the sample size was based on having 80% power to detect a 50% reduction in the infection rate (from 12% in the control group to 6%), using a two sided hypothesis test and a critical level of significance of 0.05. The 50% reduction in the infection rate is called the smallest effect of clinical interest. Statistical power is the probability of demonstrating the smallest effect of clinical interest if it exists in the population. A total sample size of 710 participants (355 in each arm) was required, and to allow for an estimated 5% dropout rate it was necessary to recruit 750 patients to the trial.
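The point above, that simple random allocation reliably balances baseline characteristics only when numbers are large, can be illustrated with a small simulation. This is a sketch under stated assumptions: each patient has a hypothetical binary baseline characteristic with 50% prevalence, each patient is allocated by a fair coin toss, and the subgroup totals of 110 (60 + 50, patients with diabetes) and 625 (309 + 316, patients without diabetes) are used only as convenient sizes. The function name `mean_imbalance` is hypothetical.

```python
import random
import statistics


def mean_imbalance(n_patients, prevalence=0.5, n_trials=500, seed=2014):
    """Average absolute difference between the two arms in the proportion of
    patients with a hypothetical binary baseline characteristic, when each
    patient is allocated by simple randomisation (a fair coin toss)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_trials):
        counts = [[0, 0], [0, 0]]  # per arm: [patients with characteristic, total]
        for _ in range(n_patients):
            arm = rng.randrange(2)                       # simple random allocation
            counts[arm][0] += rng.random() < prevalence  # True counts as 1
            counts[arm][1] += 1
        if counts[0][1] == 0 or counts[1][1] == 0:
            continue  # skip the (very rare) allocation that leaves one arm empty
        props = [with_char / total for with_char, total in counts]
        diffs.append(abs(props[0] - props[1]))
    return statistics.mean(diffs)


for n in (110, 625):
    print(n, round(mean_imbalance(n), 3))
```

Under these assumptions, the average chance imbalance for a subgroup of 110 patients comes out at more than twice that for a subgroup of 625, which is the sense in which small subgroups are more exposed to confounding.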
Sample size and power in clinical trials have been described in a previous question.³ In the above study, 760 patients were recruited, of whom 11 did not undergo laparotomy and 14 were lost to follow-up, leaving 735 for analysis, more than the 710 required. Therefore, the trial had sufficient power to detect the smallest effect of clinical interest. However, the subgroup analyses would have lacked sufficient power to demonstrate the smallest effect of clinical interest within a subgroup, because the number of participants in each subgroup would have been smaller than the overall number that needed to be recruited to the trial. Therefore, within each subgroup the statistical comparison of the device groups would have had less than 80% power to demonstrate the smallest effect of clinical interest (b is false). Although the researchers reported that no significant difference in effectiveness between the intervention and control groups was seen in any of the subgroups, it cannot be inferred that none existed in the population; it is most likely that the subgroup analyses were underpowered to demonstrate the smallest effect of clinical interest.

If the difference between the device groups in infection rates within a subgroup had been significant despite the comparison lacking statistical power, it would most likely have been a type I error (c is true). A type I error would have occurred if the null hypothesis for the comparison of devices in infection rates for a subgroup was rejected in favour of the alternative hypothesis
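The sample size calculation and the loss of power within subgroups can be sketched with the usual normal approximation formulas for comparing two proportions (two sided test, no continuity correction). This choice of formula is an assumption: the trial report may have used a different formula or software, which is why the sketch gives about 356 per arm rather than exactly 355. Power is computed under the planning assumptions (12% v 6%), not the infection rates actually observed, using the numbers allocated in the subgroups quoted earlier. The function names `n_per_arm` and `power_two_proportions` are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def n_per_arm(p_ctrl, p_int, alpha=0.05, power=0.80):
    """Patients per arm to detect p_ctrl v p_int with a two sided test."""
    z_a = Z.inv_cdf(1 - alpha / 2)
    z_b = Z.inv_cdf(power)
    p_bar = (p_ctrl + p_int) / 2
    numerator = (z_a * sqrt(2 * p_bar * (1 - p_bar))
                 + z_b * sqrt(p_ctrl * (1 - p_ctrl) + p_int * (1 - p_int))) ** 2
    return ceil(numerator / (p_ctrl - p_int) ** 2)


def power_two_proportions(p_ctrl, p_int, n_ctrl, n_int, alpha=0.05):
    """Approximate power of a two sided test comparing two proportions
    (ignores the negligible chance of rejecting in the wrong direction)."""
    z_a = Z.inv_cdf(1 - alpha / 2)
    p_bar = (n_ctrl * p_ctrl + n_int * p_int) / (n_ctrl + n_int)
    se_null = sqrt(p_bar * (1 - p_bar) * (1 / n_ctrl + 1 / n_int))
    se_alt = sqrt(p_ctrl * (1 - p_ctrl) / n_ctrl + p_int * (1 - p_int) / n_int)
    return Z.cdf((abs(p_ctrl - p_int) - z_a * se_null) / se_alt)


P_CTRL, P_INT = 0.12, 0.06            # planning assumption: 50% reduction from 12%
n = n_per_arm(P_CTRL, P_INT)          # patients needed per arm
recruit = ceil(2 * n / (1 - 0.05))    # inflate the total for an estimated 5% dropout

whole_trial = power_two_proportions(P_CTRL, P_INT, 355, 355)
no_diabetes = power_two_proportions(P_CTRL, P_INT, 316, 309)  # control, intervention
diabetes = power_two_proportions(P_CTRL, P_INT, 50, 60)       # control, intervention
print(n, recruit, round(whole_trial, 2), round(no_diabetes, 2), round(diabetes, 2))
```

Under these assumptions the power is about 80% for the whole trial, about 75% in the subgroup without diabetes, and only about 20% in the subgroup with diabetes.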
when in fact no difference existed in the population.⁴ As described in a previous question, the probability of a type I error occurring when performing a single statistical test at the 5% level, if the null hypothesis is true, is 0.05 (5%).⁵ The probability of at least one type I error occurring increases as the number of tests increases. In the above study, a total of 41 hypothesis tests were performed as a result of the subgroup analyses. If these tests were independent and every null hypothesis were true, the probability of at least one type I error would have been 1 − 0.95^41, which is about 0.88 (88%).

Subgroup analyses were performed to investigate the effectiveness of wound edge protection devices in subgroups of patients, with the aim of establishing whether the intervention worked better (or worse) for a particular subgroup. For example, the effectiveness of the intervention was analysed separately for each age group (≤65 years; >65 years), with a separate significance test in each subgroup. However, as described, such an analysis can be misleading. It would have been inappropriate to have concluded that a subgroup effect existed if the effect of the intervention was significant in one age subgroup but not the other. It would have been more appropriate to have undertaken a single significance test for a potential interaction between the device group and the age of the patient.⁶ If an interaction had existed, the effect of the intervention would have differed between the age subgroups. The test for an interaction would have been less likely than separate subgroup analyses to produce misleading results.

As described, caution is needed when interpreting apparent treatment or device differences on the basis of subgroup analyses in clinical trials. It has been suggested that certain criteria should be met when judging whether the results of subgroup analyses are credible. These recommendations include a requirement for the subgroup analyses to have been planned a priori, that is, before the trial started.
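Two points from the paragraphs above can be sketched numerically. The first function reproduces the multiple testing arithmetic, assuming independent tests each at the 5% level with every null hypothesis true. The remaining functions illustrate one common way of testing for an interaction with a binary outcome: comparing the log odds ratios from two subgroups with a z test. All function names are hypothetical, and the subgroup counts are invented purely for illustration; they are not data from the trial.

```python
from math import log, sqrt
from statistics import NormalDist


def familywise_error(k, alpha=0.05):
    """Probability of at least one type I error in k independent tests
    when every null hypothesis is true."""
    return 1 - (1 - alpha) ** k


def log_or_and_se(events_int, n_int, events_ctrl, n_ctrl):
    """Log odds ratio (intervention v control) and its standard error."""
    a, b = events_int, n_int - events_int
    c, d = events_ctrl, n_ctrl - events_ctrl
    return log((a * d) / (b * c)), sqrt(1 / a + 1 / b + 1 / c + 1 / d)


def two_sided_p(z):
    return 2 * (1 - NormalDist().cdf(abs(z)))


def interaction_test(sub1, sub2):
    """Separate within-subgroup P values plus a single z test comparing the
    two log odds ratios. Each subgroup is (events_int, n_int, events_ctrl, n_ctrl)."""
    lor1, se1 = log_or_and_se(*sub1)
    lor2, se2 = log_or_and_se(*sub2)
    p1 = two_sided_p(lor1 / se1)  # separate test within subgroup 1
    p2 = two_sided_p(lor2 / se2)  # separate test within subgroup 2
    p_int = two_sided_p((lor1 - lor2) / sqrt(se1 ** 2 + se2 ** 2))
    return p1, p2, p_int


print(round(familywise_error(1), 2), round(familywise_error(41), 2))  # 1 test v 41 tests

# Hypothetical counts (NOT from the trial): infected, n for intervention, then control
younger = (20, 150, 35, 150)  # age <=65 years
older = (25, 100, 30, 100)    # age >65 years
p_young, p_old, p_interaction = interaction_test(younger, older)
print(round(p_young, 3), round(p_old, 3), round(p_interaction, 3))
```

In this hypothetical example the intervention appears significant in the younger subgroup only, yet the single interaction test gives no evidence that its effect differs by age, which is exactly the trap that separate subgroup tests set.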
Such an approach prevents “data dredging,” also known as a “fishing expedition,” whereby subgroup analyses that were not specified before the start of the study are carried out within a dataset. Planning the analyses in advance thereby minimises spurious significant results arising from type I errors. Obviously, any differences in the primary outcome within subgroups should be clinically important. Furthermore, if the treatment or device effect is consistent across studies the results will have greater credibility. Spurious results from subgroup analyses can also occur in the analysis of cohort studies, not just clinical trials. This scenario will be described in a future question.

Competing interests: None declared.

1 Pinkney TD, Calvert M, Bartlett DC, Gheorghe A, Redman V, Dowswell G, et al; on behalf of the West Midlands Research Collaborative and the ROSSINI Trial Investigators. Impact of wound edge protection devices on surgical site infection after laparotomy: multicentre randomised controlled trial (ROSSINI Trial). BMJ 2013;347:f4305.
2 Sedgwick P. Why randomise in clinical trials? BMJ 2012;345:e5584.
3 Sedgwick P. Sample size: how many participants are needed in a trial? BMJ 2013;346:f1041.
4 Sedgwick P. Pitfalls of statistical hypothesis testing: type I and type II errors. BMJ 2014;349:g4287.
5 Sedgwick P. Pitfalls of statistical hypothesis testing: multiple testing. BMJ 2014;349:g5310.
6 Sedgwick P. Randomised controlled trials: tests of interaction. BMJ 2014;349:g6820.

Cite this as: BMJ 2014;349:g7513 © BMJ Publishing Group Ltd 2014
