Confidence Intervals Assess Both Clinical Significance and Statistical Significance

Investigators sometimes base their claim of superiority for a new treatment over older treatments on statistical significance (P values) without stressing whether there is a real clinical advantage. Not all statistically significant differences are clinically significant (1-3). Fortunately, confidence intervals can address both clinical and statistical significance (1). Annals (4-6) and other journals (1, 7-11) recommend using confidence intervals in reporting the main results of studies. In this editorial, I use hypothetical examples to illustrate point estimates and confidence intervals of the difference between the percentages of patients responding to two treatments for a cancer. These examples show how confidence intervals can help assess the clinical and statistical significance of such differences. In this issue of Annals, Epstein and colleagues (12) provide another example of this approach.

Example 1. Suppose, in a multicenter clinical trial, 480 of 800 patients (60%) responded to a new cancer treatment and 416 of 800 different patients (52%) responded to the standard treatment. An "observed significance level," or P value, assesses the existence of a real difference between the percentages responding to these two treatments among all the cancer patients (called the population) (13). The chi-square test comparing 480/800 (60%) with 416/800 (52%) yields P = 0.001 (14, 15). The P value of 0.001 is the probability of obtaining, by chance, the observed 8-percentage-point difference between the study samples (60% minus 52%), or an even larger difference, under the hypothesis that the population of interest really would show no difference between the percentages responding if tested (13). Because 0.001 is less than the conventional 0.05 threshold, the percentages responding are statistically distinguishable and their difference is termed "statistically significant" (13).
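The arithmetic behind this P value can be sketched in a few lines of Python. This is an illustrative reconstruction, not the original analysis; the pooled z test used here is algebraically equivalent to the uncorrected chi-square test on the 2x2 table:

```python
from math import erf, sqrt

def two_sided_p(x1, n1, x2, n2):
    """Two-sided P value for comparing two proportions via the pooled
    normal (z) approximation; z**2 equals the uncorrected chi-square
    statistic for the corresponding 2x2 table."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)          # overall response proportion
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = abs(p1 - p2) / se
    # two-tailed area under the standard normal curve beyond |z|
    return 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))

# Example 1: 480/800 (60%) vs. 416/800 (52%)
print(round(two_sided_p(480, 800, 416, 800), 3))  # 0.001
```

The same function reproduces the P values quoted for the smaller studies discussed below; only the sample sizes change.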
Although the small P value suggests a difference between the groups, it does not measure the size or importance of that difference. Two statistical measures do estimate the size of the true difference in the target population: point estimates and confidence intervals. The point estimate is the actual difference observed, 8 percentage points. Although 8 percentage points is the best single-number estimate, it is unlikely to be exactly equal to the unknown true difference (between the percentages responding to the two treatments) among all such patients. A confidence interval provides a plausible range for the true value, given the difference seen in the trial. A standard method gives the 95% confidence interval (95% CI) of the 8-percentage-point difference as 3% to 13% (4, 7, 8) (Figure 1, example 1). About 95% of all such intervals would include the unknown true difference and 5% would not (13, 15). The following examples explain a single 95% CI as the set of values that are statistically indistinguishable from the observed sample difference.

It is important to specify the minimum difference between treatment responses needed to conclude that a new treatment has a clinically important advantage (2, 3). Experienced clinicians weigh the side effects, long-term complications, and other costs against the benefits of the two treatments to judge the size of the smallest clinically important difference. Assume in my examples that the smallest clinically important difference is 15 percentage points. In example 1, because every value in the 95% CI, 3% to 13%, is less than 15%, the trial's difference of 8 percentage points is considered "not clinically significant" (although it is statistically significant) (Figure 1, example 1).

Example 2. Suppose, in a phase II study, 15 of 25 patients (60%) responded to a new treatment and 13 of 25 (52%) to a standard treatment. The sample size is 1/32 of that in example 1, but the percentages responding are the same. The P value is 0.57 in this smaller study, in contrast to P = 0.001 in the large trial. These disparate P values correspond to the same observed 8-percentage-point difference, demonstrating that the P value provides no information about the size of the response difference. In the small study, the point estimate of the difference, 8 percentage points, and the 95% CI, -19% to 35%, are statistical estimates of the size of the true but unknown population difference (4, 7, 8).
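For readers who want to reproduce these intervals, a minimal sketch follows, assuming the standard large-sample (Wald) method, whose results match the intervals quoted in these examples; it is an illustration, not necessarily the exact procedure of references 4, 7, and 8:

```python
from math import sqrt

def diff_ci_95(x1, n1, x2, n2):
    """95% Wald (large-sample) confidence interval for p1 - p2,
    returned in whole percentage points."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return round(100 * (diff - 1.96 * se)), round(100 * (diff + 1.96 * se))

# Example 1: large trial, narrow interval
print(diff_ci_95(480, 800, 416, 800))  # (3, 13)
# Example 2: small phase II study, wide interval
print(diff_ci_95(15, 25, 13, 25))      # (-19, 35)
```

The identical point estimate (8 percentage points) with very different interval widths is exactly the contrast the two examples are meant to show.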
This 95% CI extends from the new treatment's being 19 percentage points worse than the standard to 35 percentage points better; these values are all statistically indistinguishable from the observed difference of 8 percentage points (15). Although the point estimate is the same in examples 1 and 2, the larger study enables us to estimate the true value more precisely. The greater width of the confidence interval in example 2 indicates the greater uncertainty in an estimate based on a smaller sample (Figure 1, examples 1 and 2) (5, 6, 16). Uncertainty is an inherent part of the statistical inference about the population of all the cancer patients (the whole) that was drawn from a comparison of the two treatments in samples (the part) (1). This unavoidable uncertainty is called sampling variability (sampling error, random error, chance error) (1, 15-17). These confidence intervals address chance errors but not non-chance errors such as treatment selection bias (differing average pretreatment prognoses in the treatment groups) (17). In sum, confidence intervals provide information on both the magnitude of a difference estimate and the uncertainty of that estimate (16), which helps us assess both the statistical significance and the clinical (practical) significance of the difference.

15 March 1991 • Annals of Internal Medicine • Volume 114 • Number 6

Figure 1. Statistical significance and clinical significance using 95% confidence intervals (95% CI) of differences between percentages of patients responding to two treatments.

Example 3. Suppose, in another phase II trial, 15 of 25 (60%) and 9 of 25 (36%) patients responded to the new and the standard treatment, respectively. The observed difference between the percentages responding to the new treatment (60%) and the standard treatment (36%) is 24 percentage points, and the 95% CI is -3% to 51%. A close relation exists between a P value and the corresponding confidence interval: a difference that is not statistically significant (P > 0.05) is equivalent to a 95% CI of that difference that includes zero (7, 17). The 95% CI, -3% to 51%, includes 0%; thus, the difference is not statistically significant (P > 0.05) (Figure 1, example 3). Because 0%, or no difference, is only one among many values inside the 95% CI, "no statistically significant difference" does not imply "no true difference." In fact, the point estimate, 24%, is the best single-number estimate of the unknown population difference, and the further any value is from the observed difference (24%), the less plausible an estimate it is. How can we decide whether that large difference (24%) represents a clinically significant advantage? A difference in outcome between two groups in a study is defined as "not clinically significant" when its entire 95% CI is below the smallest clinically important difference (Figure 1, example 1).
An observed difference is clinically significant when its 95% CI is completely above the smallest clinically important difference (Figure 1, example 4) (18). When the 95% CI contains the smallest clinically important difference, as in examples 2 and 3 (Figure 1), no definite conclusion about clinical significance is possible. However, the point estimate and its confidence interval may still suggest trends. Because, in example 3, the point estimate (24%) and two thirds of its 95% CI exceed 15% (Figure 1), the difference tends toward clinical significance. This new treatment should not be discarded prematurely by misinterpreting "no statistically significant difference" (absence of evidence) as "no difference" (evidence of absence of effect) (2). Nevertheless, additional evidence is needed to reach a definite conclusion. In example 2, the point estimate and most (but not all) of the 95% CI are below the smallest clinically important difference (15%) (Figure 1). Because this difference tends toward being "not clinically significant," the new treatment does not look promising in that inconclusive study. The next example shows how using a confidence interval lets readers with dissimilar views of the size of the smallest clinically important difference reach their own conclusions about the clinical significance of the difference.

Example 4. Suppose, in another multicenter trial, 240 of 400 patients (60%) responded to the new treatment compared with 144 of 400 patients (36%) receiving the standard treatment. The observed difference is 24 percentage points, and the 95% CI is 17% to 31%. Interpret the 95% CI for the difference as the set of values for the population difference that are statistically indistinguishable from the observed sample difference. Because a 0% difference is outside the 95% CI, 17% to 31%, the observed difference (24 percentage points) is statistically distinguishable from zero; thus, the difference is statistically significant (P < 0.05) (Figure 1, example 4). More important, in this example we can assess clinical significance using the 95% CI. Because all of the 95% CI, 17% to 31%, lies above the predetermined 15% (Figure 1), the difference is clinically significant. If, however, a reader decided that the smallest clinically important difference was not 15 percentage points but 35 percentage points, the difference would be considered "not clinically significant" (but still statistically significant).
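The decision rule applied in these examples can be written as a small helper; the function name and its threshold argument are my own illustrative choices, not terminology from the editorial:

```python
def clinical_call(ci_low, ci_high, smallest_important):
    """Classify an observed difference (in percentage points) by
    comparing its 95% CI with the smallest clinically important
    difference chosen by the reader."""
    if ci_high < smallest_important:
        return "not clinically significant"  # entire CI below the threshold
    if ci_low > smallest_important:
        return "clinically significant"      # entire CI above the threshold
    return "inconclusive"                    # CI contains the threshold

print(clinical_call(3, 13, 15))   # example 1: not clinically significant
print(clinical_call(-3, 51, 15))  # example 3: inconclusive
print(clinical_call(17, 31, 15))  # example 4: clinically significant
print(clinical_call(17, 31, 35))  # stricter reader: not clinically significant
```

The last two calls use the same data and differ only in the reader's threshold, which is the point of example 4: the interval lets each reader apply his or her own standard.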
In this way, confidence intervals enable readers to assess clinical significance using their own value for the smallest clinically important difference, without having to rely on the author's interpretation (1).

What are the roles of point estimates, confidence intervals, and P values in presenting study results? Although each of them uses sample data to infer properties of the population of interest, only the point estimate is a summary of the sample data. When certain statistical assumptions (for example, random sampling or randomization, sufficiently large samples) (1, 13, 15, 19, 20) are grossly violated, statistical inference from the study sample to the population of interest using either the P value or the confidence interval is risky, though it is often done anyway. Readers should be aware of the limited accuracy of P values and confidence intervals in study samples not representative of the population of interest (13, 19, 20). Because the point estimate is the observed sample difference or ratio (17), it both summarizes the sample and estimates the true value; it should always be reported. Confidence intervals should be used to assess the clinical significance as well as the statistical significance of the main study results. When space permits, presenting all the raw data for important results (for example, in a graph) is best; this is practical only for relatively small studies. In reporting results of statistical tests, exact P values are preferable to verbal statements of "statistical significance" (or P < 0.05) or of nonsignificance (P > 0.05) because they contain more information. To judge whether a new treatment is clinically advantageous, interpret statistical results using the smallest clinically important difference and carefully consider plausible alternative explanations for the observed difference.

Leonard E. Braitman, PhD
University of Pennsylvania Cancer Center
Philadelphia, PA 19104

Acknowledgments: The author thanks Paul Rosenbaum, Al Schar, Eli Abrutyn, Howard Frumpkin, Gary Sorock, Delray Schultz, Jesse Berlin, Ataharul Islam, and Cynthia Little for their assistance.

Requests for Reprints: Leonard E. Braitman, PhD, Biostatistics Unit, University of Pennsylvania Cancer Center, 312 NEB/S2 6020, 420 Service Drive, Philadelphia, PA 19104.

Annals of Internal Medicine. 1991;114:515-517.

References
1. Berry G. Statistical significance and confidence intervals [Editorial]. Med J Aust. 1986;144:618-9.
2. Sackett DL, Haynes RB, Tugwell P. Clinical Epidemiology: A Basic Science for Clinical Medicine. Boston: Little, Brown & Co.; 1985.
3. Levitt SH, Boen J, Potish RA. Rebuttal to letter entitled "Clinical trials: statistical significance and clinical importance." Int J Radiat Oncol Biol Phys. 1981;7:1741-2.
4. Simon R. Confidence intervals for reporting results of clinical trials. Ann Intern Med. 1986;105:429-35.
5. Rothman KJ. Significance questing. Ann Intern Med. 1986;105:445-7.
6. Braitman LE. Confidence intervals extract clinically useful information from data [Editorial]. Ann Intern Med. 1988;108:296-8.
7. Gardner MJ, Altman DG, eds. Statistics with Confidence: Confidence Intervals and Statistical Guidelines. London: British Medical Journal; 1989.
8. Confidence Interval Analysis (CIA): Microcomputer Program Manual and Disk. London: British Medical Journal; 1989.
9. Bulpitt CJ. Confidence intervals. Lancet. 1987;1:494-7.
10. Rothman KJ, Yankauer A. Confidence intervals vs significance tests: quantitative interpretations [Editor's note]. Am J Public Health. 1986;76:587-8.
11. Pocock SJ, Hughes MD. Estimation issues in clinical trials and overviews. Stat Med. 1990;9:657-71.
12. Epstein WV, Henke CJ, Yelin EH, Katz PP. Effect of parenterally administered gold therapy on the course of adult rheumatoid arthritis. Ann Intern Med. 1991;114:437-44.
13. Freedman D, Pisani R, Purves R. Statistics. New York: Norton; 1978.
14. Armitage P, Berry G. Statistical Methods in Medical Research. 2nd ed. Boston: Blackwell Scientific Publications; 1987.
15. Wonnacott RJ, Wonnacott TH. Introductory Statistics. 4th ed. New York: John Wiley & Sons; 1985.
16. Rothman KJ. Modern Epidemiology. Boston: Little, Brown & Co.; 1986.
17. Fletcher RH, Fletcher SW, Wagner EH. Clinical Epidemiology: The Essentials. 2nd ed. Baltimore: Williams & Wilkins; 1988.
18. Anscombe FJ. The summarizing of clinical experiments by significance levels. Stat Med. 1990;9:707.
19. Shott S. Statistics for Health Professionals. Philadelphia: W.B. Saunders Company; 1990:73-4.
20. Box GE, Hunter WG, Hunter JS. Statistics for Experimenters: An Introduction to Design, Data Analysis and Model Building. New York: Wiley; 1978.

© 1991 American College of Physicians



