STATISTICS IN MEDICINE, VOL. 11, 845-848 (1992)

LETTERS TO THE EDITOR

YATES'S CORRECTION FOR CONTINUITY AND THE ANALYSIS OF 2 x 2 CONTINGENCY TABLES by M. G. Haviland, Statistics in Medicine, 9, 363-383 (1990)

From: Eric Peritz
Department of Social Medicine
The School of Public Health and Community Medicine
The Faculty of Medicine of the Hebrew University and Hadassah
P.O. Box 1172, Jerusalem 91010, Israel

I was surprised to find in your journal Haviland's discussion of Yates's correction and Fisher's exact test. Thirty-one years ago, Lehmann¹ proved that Fisher's exact test is uniformly most powerful unbiased whether the total sample size, the row totals, or the column totals are taken as fixed. This should, indeed, put to rest the issue revived in this article.

The treatment of Yates's correction is equally puzzling. The usual way of deriving the null distribution of Pearson's chi-square (or its square root) is to apply a large-sample normal approximation to the hypergeometric conditional distribution of one of the frequencies in the 2 x 2 table. In what sense, then, could Yates's correction be suitable for the hypergeometric distribution but not for Pearson's chi-square? Surely the question should be in what instances the hypergeometric distribution can be approximated by the normal. If the normal approximation is considered adequate, Yates's correction should be used; otherwise one should probably refrain from using it (the sketch following the reference illustrates this comparison numerically).

If one wishes to add a new feature to the discussion of 2 x 2 tables, one might look at the following question. The uniformly most powerful unbiased version of Fisher's exact test uses randomization on the boundary, and it is well known that researchers are reluctant to incorporate a random element into their decision procedures. It would be useful to propose a procedure in which decisions on the boundary are based exclusively on the data, without inflicting too much harm on the test's optimum property.

REFERENCE

1. Lehmann, E. L. Testing Statistical Hypotheses, Wiley, New York, 1959 (2nd edn 1986), Chapter 5.
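A minimal numerical sketch of this comparison (assuming SciPy is available; the margins and observed count are illustrative): fix the margins of a 2 x 2 table, compute the exact hypergeometric tail probability of one cell, and compare it with the normal approximation with and without the half-unit correction.

```python
# Compare the exact hypergeometric tail probability of the (1,1) cell of a
# 2 x 2 table (total N, first row total m, first column total k all fixed)
# with its normal approximation, with and without Yates's half-unit correction.
from scipy.stats import hypergeom, norm

N, m, k = 40, 20, 15   # total, first row total, first column total (illustrative)
x = 11                 # observed count in the (1,1) cell

mu = k * m / N                                       # hypergeometric mean
var = k * (m / N) * (1 - m / N) * (N - k) / (N - 1)  # hypergeometric variance
sd = var ** 0.5

exact = hypergeom.sf(x - 1, N, m, k)                 # exact P(X >= x)
uncorrected = norm.sf((x - mu) / sd)                 # plain normal approximation
corrected = norm.sf((x - 0.5 - mu) / sd)             # with Yates's 1/2 correction

print(f"exact P(X >= {x})      = {exact:.4f}")
print(f"normal, uncorrected    = {uncorrected:.4f}")
print(f"normal, with 1/2 corr. = {corrected:.4f}")
```

For any given margins, running the comparison shows directly whether the normal approximation, and hence the correction, is adequate.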

AUTHOR'S REPLY

I shall respond first to Professor Peritz's questions and then attempt both to clarify and to amplify the points I made previously. First, although an important contribution, Lehmann's¹ 'proof' does not settle the matter of how best to analyse frequency data arranged in a 2 x 2 contingency table. On the other hand, one might reasonably conclude (as Hirji et al.² have) that Lehmann's work sets the stage well for a potentially fruitful exercise, a re-examination of the conditioning principle. Second, Fisher's exact test (FET) and Yates's chi-square approximation (χ_c^2) often are called into play when adequacy of power already is at issue. For example, a stock element in a paper's 'data analysis' subsection is 'For categorical data, we used chi-square and, where required, Fisher's exact test'. 'Where required' is short for 'when sample sizes were small' or 'when [pick a number] of cells had expected frequencies less than [pick a number]'. That is a mistake. Use of FET rarely is justified by its theoretical superiority, nor is its ultraconservative nature acknowledged. Results then are tagged 'significant' or 'not significant', a second mistake. Finally, a procedure of the sort Professor Peritz
describes in his last paragraph, indeed, would be useful (as, I believe, Hirji et al.² have demonstrated [see also Lancaster³]).

It seems appropriate to argue that the hypergeometric model is responsible for the severity of the discontinuity problem, which, in turn, is responsible for the conservative bias in FET and χ_c^2. To blame the discontinuity problem on the poor fit between the hypergeometric model and stochastic reality, however, is more difficult. I can offer this: 'A statistical distribution is a good model to the extent that its behaviour represents the stochastic realities of the actual experimental paradigm. Exact probabilities notwithstanding, the hypergeometric model is not a good model for the null sampling distribution of 2 x 2 tables in which binary response frequencies are free to vary. Were the model to consider more realistically a distribution of the marginal frequencies that might be observed in testing homogeneity of binary response probabilities in a 2 x 2 table, the number of possible patterns of cell frequencies to which probability could be assigned would markedly increase. Discontinuities would be reduced in magnitude because the combined probability of all patterns admissible under the unconditional model would still be 1.0.' (John E. Overall, Ph.D., personal communication, July 3, 1987).

Four points now seem clear:

1. FET is appropriate if one believes that a statistical model with fixed marginal frequencies actually represents the stochastic realities of the experimental paradigm and if one is willing to interpret P-values, and perhaps mid-P-values, as either Camilli⁴ or Barnard⁵ suggest;
2. Pearson's conventional chi-square statistic without Yates's 'correction' is appropriate if one or both sets of marginal frequencies are free to vary and the marginal splits are not extreme;
3. the exact binomial test⁶ is appropriate if stochastic realities fix only one set of marginal frequencies and there is an extreme (Poisson-like) imbalance on the other;
4. a problem for which there is no clear solution occurs when both sets of marginal frequencies are extreme and sample sizes are not large.

Tea or t for 2 x 2?⁷⁻⁹ That has been the central question; however, there now is considerable dissent within both the 'conditionalist' and the 'frequentist' camps. Because it seems clear that no one test performs best in all situations (see, for example, Richardson¹⁰ and Kroll¹¹), conventional and alternative tests must be selected on the basis of their theoretical and philosophical appeal, as well as their performance, an important point that Greenland¹² and Hirji et al.² have underscored. (A numerical sketch following the references shows how the competing tests can disagree on a single small table.)

MARK G. HAVILAND
Department of Psychiatry
School of Medicine
Loma Linda University
Loma Linda, California 92350, U.S.A.

REFERENCES

1. Lehmann, E. L. Testing Statistical Hypotheses, Wiley, New York, 1959.
2. Hirji, K. F., Tan, S. J. and Elashoff, R. M. 'A quasi-exact test for comparing two binomial proportions', Statistics in Medicine, 10, 1137-1153 (1991).
3. Lancaster, H. O. 'Significance tests in discrete distributions', Journal of the American Statistical Association, 56, 223-234 (1961).
4. Camilli, G. 'The test of homogeneity for 2 x 2 contingency tables: a review of and some personal opinions on the controversy', Psychological Bulletin, 108, 135-145 (1990).
5. Barnard, G. A. 'On alleged gains in power from lower P-values', Statistics in Medicine, 8, 1469-1477 (1989).
6. Overall, J. E. and Starbuck, R. R. 'F-test alternatives to Fisher's exact test and to the chi-square test of homogeneity in 2 x 2 tables', Journal of Educational Statistics, 8, 59-73 (1983).
7. Fisher, R. A. The Design of Experiments, Oliver and Boyd, Edinburgh, 1935.
8. Fisher, R. A. Statistical Methods for Research Workers, Oliver and Boyd, Edinburgh, 1936.
9. D'Agostino, R. B., Chase, W. and Belanger, A. 'The appropriateness of some common procedures for testing the equality of two independent binomial populations', The American Statistician, 42, 198-202 (1988).
10. Richardson, J. T. E. 'Variants of chi-square for 2 x 2 contingency tables', British Journal of Mathematical and Statistical Psychology, 43, 309-326 (1990).

11. Kroll, N. E. A. 'Testing independence in 2 x 2 contingency tables', Journal of Educational Statistics, 14, 47-79 (1989).
12. Greenland, S. 'On the logical justification of conditional tests for two-by-two contingency tables', The American Statistician, 45, 248-251 (1991).
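The following minimal sketch (assuming SciPy; the table is illustrative, and the mid-P shown is one common variant that subtracts half the probability of the observed table from Fisher's P) computes the competing P-values for a single small 2 x 2 table:

```python
# Compare the competing tests for a single small 2 x 2 table:
# Pearson chi-square (no correction), Yates-corrected chi-square,
# Fisher's exact test, and a mid-P variant of Fisher's test.
import numpy as np
from scipy.stats import chi2_contingency, fisher_exact, hypergeom

table = np.array([[3, 9],
                  [7, 4]])   # illustrative small-sample table

chi2_plain, p_plain, _, _ = chi2_contingency(table, correction=False)
chi2_yates, p_yates, _, _ = chi2_contingency(table, correction=True)
_, p_fisher = fisher_exact(table, alternative="two-sided")

# Mid-P variant: subtract half the probability of the observed table.
N = table.sum()
m, k, x = table[0].sum(), table[:, 0].sum(), table[0, 0]
p_mid = p_fisher - 0.5 * hypergeom.pmf(x, N, m, k)

print(f"Pearson chi-square P = {p_plain:.4f}")
print(f"Yates-corrected P    = {p_yates:.4f}")
print(f"Fisher exact P       = {p_fisher:.4f}")
print(f"mid-P                = {p_mid:.4f}")
```

On tables this small the four P-values can straddle any fixed significance boundary, which is exactly why tagging results only 'significant' or 'not significant' is hazardous.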

A COMPARISON OF TWO SIMPLE HAZARD RATIO ESTIMATORS BASED ON THE LOGRANK TEST by G. Berry, R. M. Kitchin and P. A. Mock, Statistics in Medicine, 10, 749-755 (1991)

From: Tosiya Sato
Institute of Statistical Mathematics
Minato, Tokyo 106
Japan

Recently, Berry et al. reported the performance of two incidence rate ratio (or hazard ratio) estimators that are easy to compute from the logrank test: the Pike and the one-step (Peto) estimators. They excluded the Mantel-Haenszel and other estimators because these estimators 'require substantive additional computations'. In this note I show that the Mantel-Haenszel estimator of the constant incidence rate ratio requires no computations beyond those of the logrank test.

The logrank test is the two-group survival-analysis version of the Mantel-Haenszel test.¹ Consider a series of 2 x 2 tables ordered by the failure times, formed from the numbers of failures (d_A, d_B) among the (n_A, n_B) observations at risk, which yields data of the following form:

                Failed      Survived      Total
Treatment A     d_A         n_A - d_A     n_A
Treatment B     d_B         n_B - d_B     n_B
Total           d           n - d         n

For simplicity, no failure-time index appears in this notation. When testing the null hypothesis that the incidence rate ratio between the two treatment groups is unity, the logrank test is given by

    T_0 = (|O_+ - E_+| - c_0)^2 / V_+,    (1)

where O_+ = Σ d_A, E_+ = Σ n_A d / n, V_+ = Σ n_A n_B d (n - d) / [n^2 (n - 1)], and the summations run over all informative tables, that is, tables in which both treatment groups still have observations at risk. The statistic T_0 has asymptotically a chi-squared distribution with one degree of freedom. To approximate the Fisher P-value one takes c_0 = 1/2; for the mid-P value, c_0 = 0.² As described in Berry et al., the Pike and one-step estimates and their associated confidence intervals can be calculated from O_+, E_+, V_+ and Σ d alone.

Let R_+ = Σ d_A (n_B - d_B) / n and S_+ = Σ d_B (n_A - d_A) / n.
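All of these quantities are single-pass sums over the stacked 2 x 2 tables, so the logrank statistic and the sums needed below can be accumulated together. A minimal sketch (the function name and data are illustrative, not from the letter):

```python
# Logrank quantities from a series of 2 x 2 tables, one per failure time.
# Each table is (d_A, n_A, d_B, n_B): failures and numbers at risk per group.
def logrank_sums(tables, c0=0.5):
    O = E = V = R = S = 0.0
    for dA, nA, dB, nB in tables:
        d, n = dA + dB, nA + nB
        if nA == 0 or nB == 0 or n < 2:
            continue                       # skip uninformative tables
        O += dA                            # O_+
        E += nA * d / n                    # E_+
        V += nA * nB * d * (n - d) / (n**2 * (n - 1))   # V_+
        R += dA * (nB - dB) / n            # R_+, numerator sum for IRR_MH
        S += dB * (nA - dA) / n            # S_+, denominator sum for IRR_MH
    T0 = (abs(O - E) - c0)**2 / V          # equation (1); c0=1/2 ~ Fisher P, c0=0 ~ mid-P
    return T0, O, E, V, R, S

# Illustrative data: (d_A, n_A, d_B, n_B) at successive failure times.
tables = [(1, 10, 0, 12), (0, 9, 2, 12), (1, 9, 0, 10), (2, 8, 1, 10)]
T0, O, E, V, R, S = logrank_sums(tables)
print(f"T0 = {T0:.3f}, IRR_MH = R/S = {R/S:.3f}")
```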

The alternative form of the test statistic T_0 is

    T_0 = (|R_+ - S_+| - c_0)^2 / V_+,

and the Mantel-Haenszel estimate of the constant incidence rate ratio is obtained as IRR_MH = R_+ / S_+. An approximate confidence interval can be calculated either from a variance estimate of log_e(IRR_MH)³ or by the Mantel-Haenszel approach.⁴ The latter gives an approximate confidence interval for IRR as the two solutions of the quadratic equation

    (|R_+ - IRR S_+| - c_α)^2 / (IRR W_+) = z^2,    (2)

where W_+ = Σ [d_A (n_B - d_B)(n_A - d_A + d_B + 1) + (n_A - d_A) d_B (d_A + n_B - d_B + 1)] / n^2, z is the 100(1 - α/2) percentile point of the standard normal distribution, and c_α = (1 + IRR)/4 when the continuity correction is used.
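With c_α depending on IRR, equation (2) is no longer an exact quadratic, but its two solutions are easily found by one-dimensional root-finding. A minimal sketch, assuming SciPy's brentq and the illustrative logrank_sums helper from the previous sketch; the bracketing bounds are illustrative and assume the usual case in which the left side of (2) minus z^2 changes sign on either side of the point estimate:

```python
# Approximate confidence limits for IRR from equation (2) by root-finding.
# W must be accumulated over the same informative tables as R and S.
from scipy.optimize import brentq

def w_sum(tables):
    W = 0.0
    for dA, nA, dB, nB in tables:
        n = nA + nB
        if nA == 0 or nB == 0:
            continue                               # uninformative table
        W += (dA * (nB - dB) * (nA - dA + dB + 1)
              + (nA - dA) * dB * (dA + nB - dB + 1)) / n**2
    return W

def mh_interval(R, S, W, z=1.96, corrected=True):
    def f(irr):
        c = (1 + irr) / 4 if corrected else 0.0    # continuity correction c_alpha
        return (abs(R - irr * S) - c)**2 - z**2 * irr * W
    point = R / S                                  # IRR_MH
    lower = brentq(f, 1e-8, point)                 # assumes f(1e-8) > 0 > f(point)
    upper = brentq(f, point, 1e4 * point)          # assumes a sign change above the point
    return lower, upper

# Usage with the previous sketch:
# T0, O, E, V, R, S = logrank_sums(tables)
# low, high = mh_interval(R, S, w_sum(tables))
```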
