STATISTICS IN MEDICINE, VOL. 9, 363-367 (1990)

YATES’S CORRECTION FOR CONTINUITY AND THE ANALYSIS OF 2 × 2 CONTINGENCY TABLES

MARK G. HAVILAND
Department of Psychiatry, Loma Linda University School of Medicine, Loma Linda, CA 92350, U.S.A.

SUMMARY

Despite recommendations to the contrary, medical researchers still routinely use the Yates-corrected chi-square statistic in analyses of 2 × 2 contingency tables. Research has shown that these ‘corrected’ statistics are overly conservative and that the conventional Pearson chi-square generally provides adequate control over type I error probabilities. This paper makes a straightforward argument against the use of Yates’s correction for continuity and Fisher’s exact probability test.

That statistical errors occur in medical journals is well documented.¹ Godfrey, for instance, points out specific statistical errors in the New England Journal of Medicine.²,³ However, it is clear that journal editors, clinician-investigators, and statisticians have taken this problem seriously. For example, in psychiatry, representatives from these three groups met at a conference to discuss methodological issues associated with psychiatric research. Both Kraemer et al.’s report of this conference⁴ and Moses and Louis’s related article⁵ offer clinician-investigators and statisticians useful suggestions on how to work with each other productively. All confront the problem directly, and their responses are uniformly constructive, not damning. In that same spirit, I should like to offer a critical comment on the continuing use of Yates’s correction for continuity⁶ with the chi-square statistic ($\chi^2$).

To analyse frequency data arranged in a 2 × 2 contingency table, most older statistics textbooks and current standard reference works (such as Dixon and Massey⁷ and Fleiss⁸) recommend modifying Pearson’s conventional chi-square statistic, $\chi^2 = \sum [(f_o - f_e)^2 / f_e]$, where $f_o$ and $f_e$ are the observed and expected cell frequencies, with Yates’s correction for continuity. In practice, investigators typically ‘correct’ their test statistics only when they encounter small expected cell frequencies in a 2 × 2 table, even though prominent correction proponents (such as Mantel and Greenhouse⁹) argue in favour of its general use. Statisticians (for example, Camilli and Hopkins¹⁰) have now shown, however, that the routine use of chi-square statistics modified by Yates’s method decreases the accuracy of the resulting probability statements for most experimental situations likely to arise in actual practice. Despite the number of impressive papers that discourage the Yates correction, it still appears often in the medical literature.
0277-6715/90/040363-05$05.00
© 1990 by John Wiley & Sons, Ltd.
Received July 1988
Revised March 1989

Given the prevalence of contingency table analysis in medical research,¹¹ it is important that clinician-investigators and their advisers understand why Yates’s method is inappropriate. (I will also show that Fisher’s exact probability test is often used inappropriately.) The argument is straightforward. Actual chi-square statistics calculated from discrete frequency data take on only discrete values, which produce irregular stepwise distributions; theoretical chi-square distributions, on the other hand, are smooth and continuous. As a consequence, substantial discrepancies may exist between the empirical chi-square step function and the continuous theoretical distribution against which probabilities are evaluated.¹⁴,¹⁵ Yates’s correction for continuity, an adjustment for this discrepancy, involves reducing the absolute differences between observed and expected frequencies by 0.5. Specifically, observed frequencies greater than expected under the null hypothesis are reduced by 0.5; those that are less than expected are increased by 0.5:

$$\chi^2 = \sum [(|f_o - f_e| - 0.5)^2 / f_e]$$

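The two statistics described above can be compared directly on a single table. The following is a minimal sketch, assuming a hypothetical 2 × 2 table of counts chosen only for illustration; the function names are my own, and the guard that stops the corrected difference from falling below zero is an implementation convenience that is not part of the formula in the text. The p-value uses the identity that a chi-square variate with one degree of freedom is the square of a standard normal variate.

```python
import math


def expected_counts(table):
    """Expected cell frequencies under the null: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(table, yates=False):
    """Pearson chi-square for a 2 x 2 table, optionally with Yates's correction."""
    expected = expected_counts(table)
    total = 0.0
    for obs_row, exp_row in zip(table, expected):
        for f_o, f_e in zip(obs_row, exp_row):
            diff = abs(f_o - f_e)
            if yates:
                # Assumption: clamp at zero so the correction never
                # increases the statistic when |f_o - f_e| < 0.5.
                diff = max(diff - 0.5, 0.0)
            total += diff ** 2 / f_e
    return total


def p_value_1df(statistic):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


# Hypothetical counts, for illustration only.
table = [[7, 3], [2, 8]]
pearson = chi_square(table)
yates = chi_square(table, yates=True)
print(f"Pearson: {pearson:.3f}, p = {p_value_1df(pearson):.4f}")
print(f"Yates:   {yates:.3f}, p = {p_value_1df(yates):.4f}")
```

For this illustrative table the uncorrected statistic is about 5.05 and the corrected statistic about 3.23, so only the uncorrected value exceeds the 0.05 critical value of 3.84 for one degree of freedom.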
The ‘correction’ reduces the magnitude of the calculated chi-square, rendering it less likely to be found significant.

For purposes of this discussion, it is important to recognize differences in the constraints placed on the marginal frequencies of a 2 × 2 table by three different research paradigms. First, one may sample individuals randomly from a specified population and record their status on two binary attributes. At issue is the independence of the two discrete variables. Both sets of marginal frequencies in the resulting 2 × 2 table are subject to sampling variability, which in turn affects the sampling distribution of derived statistics. Second, a different situation arises in the use of chi-square to test homogeneity of binary response probabilities in two treatment groups of fixed sample size. The marginal frequencies for the binary response variable are subject to sampling variability, but the marginal frequencies for the treatment groups are not, because the experimenter has fixed the sample sizes. The third situation involves experimenter control over both sets of marginal frequencies in the 2 × 2 table. Although this situation does not occur often in practice, an example might be the measurement of two interval-scaled attributes, each of which one then divides at its median value to form a 2 × 2 table with fixed marginal frequencies (see also Kendall and Stuart¹⁶ and Barnard¹⁷,¹⁸).

The reason for distinguishing these three paradigms is that Yates’s correction for continuity was developed to approximate, with a chi-square statistic, the exact probabilities of different frequency patterns in 2 × 2 tables in which both sets of marginal frequencies are fixed, which is the situation least often encountered in practice! The frequency patterns in such tables conform to a hypergeometric distribution for which Fisher’s exact probability test is the exact solution. It is important here to understand the sense in which Fisher’s test is ‘exact’.
It is an exact hypergeometric probability, not an exact solution for evaluating independence or homogeneity in 2 × 2 tables with marginal frequencies that are free to vary. Yates’s correction for continuity was developed to provide a chi-square approximation to Fisher’s exact hypergeometric probability; hence, both Yates’s ‘corrected’ chi-square and Fisher’s exact test are inappropriate in the unconditional cases where one or both sets of marginal frequencies are free to vary (see Berkson,¹⁹ Barnard,²⁰ Basu²¹ and Kempthorne²² for their views on the appropriate use of Fisher’s exact test).

Some contemporary statisticians, however, continue to recommend the use of Yates’s correction for continuity. For instance, Fleiss⁸ recognizes that the ‘corrected’ chi-square approximates Fisher’s hypergeometric probability, which in turn is based on the assumption that both sets of marginal frequencies are fixed. He reasons (p. 27) that because investigators almost never know the actual values of the population marginals, they must use the observed marginals in the 2 × 2 table to estimate them. From this, he concludes that investigators must proceed to analyse their data with the restriction that the observed marginal proportions represent the true parameters. This appears tantamount to assuming away the recognized fact that the observed marginal frequencies are subject to sampling variability when chi-square is used to test independence or homogeneity in 2 × 2 tables. Clinician-investigators and epidemiologists who operate on a less theoretical plane may justifiably be uncomfortable with assuming away that troublesome fact.
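The relationship between the hypergeometric probability and the two chi-square statistics can be made concrete. The sketch below is illustrative only: the counts are hypothetical, the function names are my own, and the two-sided rule for Fisher’s test (summing the probabilities of all tables no more probable than the observed one) is one common convention that the text does not specify. The chi-square statistics use the shortcut form for 2 × 2 tables, which is algebraically equivalent to the cell-by-cell formula; the clamp at zero is an added guard.

```python
from math import comb, erfc, sqrt


def fisher_two_sided(a, b, c, d):
    """Two-sided hypergeometric probability with both sets of margins fixed."""
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Integer weights; dividing by comb(n, c1) gives each table's probability.
    weights = {x: comb(r1, x) * comb(r2, c1 - x) for x in range(lo, hi + 1)}
    observed = weights[a]
    # Assumed convention: sum all tables no more probable than the observed one.
    extreme = sum(w for w in weights.values() if w <= observed)
    return extreme / comb(n, c1)


def chi_square_p(a, b, c, d, yates=False):
    """p-value (1 df) for the Pearson or Yates-corrected chi-square."""
    n = a + b + c + d
    cross = abs(a * d - b * c)
    if yates:
        cross = max(cross - n / 2, 0.0)  # guard added for this sketch
    stat = n * cross ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return erfc(sqrt(stat / 2))


# Hypothetical counts, for illustration only.
a, b, c, d = 7, 3, 2, 8
p_fisher = fisher_two_sided(a, b, c, d)
p_yates = chi_square_p(a, b, c, d, yates=True)
p_pearson = chi_square_p(a, b, c, d)
print(f"Fisher:  {p_fisher:.4f}")
print(f"Yates:   {p_yates:.4f}")
print(f"Pearson: {p_pearson:.4f}")
```

For these counts the corrected chi-square probability lies close to the hypergeometric probability, while the uncorrected probability is considerably smaller. That closeness is what the correction was designed to achieve; it says nothing about which test is appropriate when the margins are free to vary.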


Quite some time ago, Pearson,²³ Plackett²⁴ and Grizzle²⁵ expressed doubts about the appropriateness of Yates’s correction in their empirical and analytical reports. Subsequently, after other statisticians (for example, Bennett and Underwood,²⁶ Detre and White,²⁷ Roscoe and Byers²⁸ and Conover²⁹; see also Starmer et al.’s,³⁰ Mantel’s³¹ and Miettinen’s³² comments and Conover’s³³ rejoinder) directly and indirectly criticized Yates’s method, computer simulation of data from the different research paradigms produced compelling evidence of the inadequacy of Yates’s correction for continuity. The sampling experiments of Camilli and Hopkins¹⁰ revealed that use of corrected chi-square statistics causes the actual proportion of type I errors to be substantially lower than the nominal alpha levels for the common chi-square tests for independence (both sets of marginal frequencies free to vary) and homogeneity (one set of marginal frequencies fixed and the other free to vary).

Overall and colleagues³⁴⁻³⁷ have been the most persistent in questioning the appropriateness of the Yates correction and Fisher’s exact test. Although they do not consider the Pearson chi-square as totally without fault (it has a slight nonconservative bias),³⁸ they do support Camilli and Hopkins¹⁰ in documenting that the uncorrected chi-square statistic generally provides adequate control over type I error probabilities without the severe conservative bias produced by Yates’s correction for continuity. Conservative bias means that the actual probability of a type I error may be substantially smaller than the nominal alpha level reported. The consequence is that the null hypothesis may not be rejected when it should be. While ‘conservative’ sounds good when referring to science, conservative bias in this instance translates into reduced power.
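The size of the conservative bias can be illustrated for the homogeneity paradigm without random sampling. The sketch below is not a reproduction of the sampling experiments cited above: it enumerates every possible outcome for two independent binomial groups of fixed size and sums the probabilities of the outcomes in which each test rejects a true null hypothesis. The group sizes (20 and 20), the common response probability (0.3), the function names, and the decision to treat tables with an empty response column as non-rejections are all assumptions made for illustration.

```python
from math import comb, erfc, sqrt


def rejects(x1, x2, n1, n2, alpha, yates):
    """True if the chi-square test (1 df) rejects homogeneity for this outcome."""
    a, b, c, d = x1, n1 - x1, x2, n2 - x2
    col1, col2 = a + c, b + d
    if col1 == 0 or col2 == 0:
        return False  # statistic undefined; assumed here to be a non-rejection
    n = n1 + n2
    cross = abs(a * d - b * c)
    if yates:
        cross = max(cross - n / 2, 0.0)  # guard added for this sketch
    stat = n * cross ** 2 / (n1 * n2 * col1 * col2)
    return erfc(sqrt(stat / 2)) < alpha


def binom_pmf(x, n, p):
    """Binomial probability of x responses among n subjects."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def actual_type1_rate(n1, n2, p, alpha=0.05, yates=False):
    """Exact rejection probability when both groups share response probability p."""
    rate = 0.0
    for x1 in range(n1 + 1):
        for x2 in range(n2 + 1):
            if rejects(x1, x2, n1, n2, alpha, yates):
                rate += binom_pmf(x1, n1, p) * binom_pmf(x2, n2, p)
    return rate


# Hypothetical design: two groups of 20, common response probability 0.3.
pearson_rate = actual_type1_rate(20, 20, 0.3)
yates_rate = actual_type1_rate(20, 20, 0.3, yates=True)
print(f"Nominal alpha: 0.05")
print(f"Pearson actual type I error rate: {pearson_rate:.4f}")
print(f"Yates actual type I error rate:   {yates_rate:.4f}")
```

Because the corrected statistic never exceeds the uncorrected one in this sketch, every outcome rejected by the corrected test is also rejected by the uncorrected test, so the corrected test's actual type I error rate cannot be the larger of the two. Varying the assumed group sizes and response probability shows how far each rate departs from the nominal level.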
It is unfortunate that Yates’s correction and Fisher’s exact probability test have been promoted as especially appropriate when cell frequencies are small, for this is precisely where the adequacy of power is most questionable.³⁹ The literature on evaluating 2 × 2 frequency data sets with small expected cell frequencies is ample (see also Camilli and Hopkins’s⁴⁰ and Bradley et al.’s⁴¹ sampling experiments, Delucchi’s⁴² review article, Upton’s⁴³ paper on alternative tests, and D’Agostino et al.’s⁴⁴ recent recommendations). Admittedly, a single best strategy for dealing with this problem across all experimental situations has not emerged. It is clear, however, that if a chi-square test of association (independence or homogeneity) is considered appropriate, the conventional statistic is preferable to the one modified by Yates’s method.

REFERENCES

1. Altman, D. G. ‘Statistics in medical journals’, Statistics in Medicine, 1, 59-71 (1982).
2. Godfrey, K. ‘Comparing the means of several groups’, New England Journal of Medicine, 313, 1450-1456 (1985).
3. Godfrey, K. ‘Simple linear regression in medical research’, New England Journal of Medicine, 313, 1629-1636 (1985).
4. Kraemer, H. C., Pruyn, J. P., Gibbons, R. D., Greenhouse, J. B., Grochocinski, V. J., Waternaux, C. and Kupfer, D. J. ‘Methodology in psychiatric research: report on the 1986 MacArthur Foundation Network I Methodology Institute’, Archives of General Psychiatry, 44, 1100-1106 (1987).
5. Moses, L. and Louis, T. A. ‘Statistical consulting in clinical research: the two-way street’, Statistics in Medicine, 3, 1-5 (1984).
6. Yates, F. ‘Contingency tables involving small numbers and the χ² test’, Journal of the Royal Statistical Society (Supplement), 1, 217-235 (1934).
7. Dixon, W. J. and Massey, F. J. Introduction to Statistical Analysis (3rd edn), McGraw-Hill, New York, 1969.
8. Fleiss, J. L. Statistical Methods for Rates and Proportions, Wiley, New York, 1981.
9. Mantel, N. and Greenhouse, S. W. ‘What is the continuity correction?’, American Statistician, 22, 27-30 (1968).
10. Camilli, G. and Hopkins, K. D. ‘Applicability of chi-square to 2 × 2 contingency tables with small expected cell frequencies’, Psychological Bulletin, 85, 163-167 (1978).
11. Emerson, J. D. and Colditz, G. A. ‘Use of statistical analysis in The New England Journal of Medicine’, New England Journal of Medicine, 309, 709-713 (1983).
12. Fisher, R. A. The Design of Experiments, Oliver and Boyd, Edinburgh, 1935.
13. Fisher, R. A. Statistical Methods for Research Workers, Oliver and Boyd, Edinburgh, 1936.
14. Tate, M. W. and Hyer, L. A. ‘Inaccuracy of the χ² test of goodness of fit when expected frequencies are small’, Journal of the American Statistical Association, 68, 836-841 (1973).
15. Radlow, R. and Alf, E. F. ‘An alternate multinomial assessment of the accuracy of the χ² test of goodness of fit’, Journal of the American Statistical Association, 70, 811-813 (1975).
16. Kendall, M. G. and Stuart, A. The Advanced Theory of Statistics (3rd edn), vol. 2, Hafner, New York, 1973.
17. Barnard, G. A. ‘Significance tests for 2 × 2 tables’, Biometrika, 34, 123-138 (1947).
18. Barnard, G. A. ‘2 × 2 tables. A note on E. S. Pearson’s paper’, Biometrika, 34, 168-169 (1947).
19. Berkson, J. ‘In dispraise of the exact test’, Journal of Statistical Planning and Inference, 2, 27-42 (1978).
20. Barnard, G. A. ‘In contradiction to J. Berkson’s dispraise: conditional tests can be more efficient’, Journal of Statistical Planning and Inference, 3, 181-187 (1979).
21. Basu, D. ‘Discussion of Joseph Berkson’s paper “In dispraise of the exact test”’, Journal of Statistical Planning and Inference, 3, 189-192 (1979).
22. Kempthorne, O. ‘In dispraise of the exact test: reactions’, Journal of Statistical Planning and Inference, 3, 199-213 (1979).
23. Pearson, E. S. ‘The choice of statistical tests illustrated on the interpretation of data classed in a 2 × 2 table’, Biometrika, 34, 139-167 (1947).
24. Plackett, R. L. ‘The continuity correction in 2 × 2 tables’, Biometrika, 51, 327-337 (1964).
25. Grizzle, J. E. ‘Continuity correction in the χ²-test for 2 × 2 tables’, American Statistician, 21, 28-32 (1967).
26. Bennett, B. M. and Underwood, R. E. ‘On McNemar’s test for the 2 × 2 table and its power function’, Biometrics, 26, 339-343 (1970).
27. Detre, K. and White, C. ‘The comparison of two Poisson-distributed observations’, Biometrics, 26, 851-854 (1970).
28. Roscoe, J. T. and Byers, J. A. ‘An investigation of the restraints with respect to sample size commonly imposed on the use of the chi-square statistic’, Journal of the American Statistical Association, 66, 755-759 (1971).
29. Conover, W. J. ‘Some reasons for not using the Yates continuity correction on 2 × 2 contingency tables’, Journal of the American Statistical Association, 69, 374-376 (1974).
30. Starmer, C. F., Grizzle, J. E. and Sen, P. K. ‘Comment’, Journal of the American Statistical Association, 69, 376-378 (1974).
31. Mantel, N. ‘Comment and a suggestion’, Journal of the American Statistical Association, 69, 378-380 (1974).
32. Miettinen, O. S. ‘Comment’, Journal of the American Statistical Association, 69, 380-382 (1974).
33. Conover, W. J. ‘Rejoinder’, Journal of the American Statistical Association, 69, 382 (1974).
34. Overall, J. E. ‘Continuity correction for Fisher’s exact probability test’, Journal of Educational Statistics, 5, 177-190 (1980).
35. Overall, J. E. and Hornick, C. W. ‘An evaluation of power and sample-size requirements for the continuity-corrected Fisher exact test’, Perceptual and Motor Skills, 54, 83-86 (1982).
36. Overall, J. E., Rhoades, H. M. and Starbuck, R. R. ‘Small-sample tests for homogeneity of response probabilities in 2 × 2 contingency tables’, Psychological Bulletin, 102, 307-314 (1987).
37. Overall, J. E. and Starbuck, R. R. ‘F-test alternatives to Fisher’s exact test and to the chi-square test of homogeneity in 2 × 2 tables’, Journal of Educational Statistics, 8, 59-73 (1983).
38. Rhoades, H. M. and Overall, J. E. ‘A sample size correction for Pearson chi-square in 2 × 2 contingency tables’, Psychological Bulletin, 91, 418-423 (1982).
39. Overall, J. E. ‘Power of chi-square tests for 2 × 2 contingency tables with small expected frequencies’, Psychological Bulletin, 87, 132-135 (1980).
40. Camilli, G. and Hopkins, K. D. ‘Testing for association in 2 × 2 contingency tables with very small sample sizes’, Psychological Bulletin, 86, 1011-1014 (1979).
41. Bradley, D. R., Bradley, T. D., McGrath, S. G. and Cutcomb, S. D. ‘Type I error rate of the chi-square test of independence in R × C tables that have small expected frequencies’, Psychological Bulletin, 86, 1290-1297 (1979).
42. Delucchi, K. L. ‘The use and misuse of chi-square: Lewis and Burke revisited’, Psychological Bulletin, 94, 166-176 (1983).
43. Upton, G. J. G. ‘A comparison of alternative tests for the 2 × 2 comparative trial’, Journal of the Royal Statistical Society, Series A, 145, 86-105 (1982).
44. D’Agostino, R. B., Chase, W. and Belanger, A. ‘The appropriateness of some common procedures for testing the equality of two independent binomial populations’, American Statistician, 42, 198-202 (1988).
