
Common Problems With Statistical Aspects of Periodontal Research Papers*

Lawrence J. Emrich

Enormous advances have been made over the past decade concerning the proper use of statistical techniques in reporting periodontal research. However, numerous articles still appear which use inappropriate or less than optimal statistical methods. This paper describes the most common problems encountered in a recent review of the literature, and presents suggestions which will hopefully improve the quality of statistical information reported in future research articles. J Periodontol 1990;61:206-208.

Key Words: Statistics; periodontal research; site/subject.

STATISTICAL TREATMENT OF MEASUREMENTS TAKEN FROM MULTIPLE SITES WITHIN THE SAME MOUTH

The most common problem encountered in the periodontal research literature is the incorrect statistical treatment of measurements taken from multiple sites within the same mouth. Because of a common oral environment and various subject-wide influences, measurements taken at several sites within the same mouth are correlated to some degree. The actual correlation varies from study to study and measurement to measurement. For example, within-subject correlations of 0.57 have been reported for plaque measurements, correlations of 0.17 have been reported for dental probing attachment level, and correlations of 0.43 have been found for changes in counts of Actinobacillus actinomycetemcomitans.1-3
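As a rough illustration of how a within-subject correlation of this kind might be estimated, the sketch below computes a one-way analysis-of-variance intraclass correlation from site-level data. It assumes a balanced design (the same number of sites for every subject); the function name and the probing depths are hypothetical and are not taken from the studies cited.

```python
from statistics import mean

def intraclass_correlation(subjects):
    """One-way ANOVA estimate of the intraclass correlation.

    `subjects` is a list of lists: one inner list of site measurements per
    subject, all of the same length (balanced design, an assumption here).
    """
    n = len(subjects)      # number of subjects
    k = len(subjects[0])   # sites per subject
    if n < 2 or k < 2 or any(len(s) != k for s in subjects):
        raise ValueError("need >= 2 subjects, each with the same number (>= 2) of sites")
    grand = mean(x for s in subjects for x in s)
    subject_means = [mean(s) for s in subjects]
    # Between-subject and within-subject mean squares.
    msb = k * sum((m - grand) ** 2 for m in subject_means) / (n - 1)
    msw = sum((x - m) ** 2
              for s, m in zip(subjects, subject_means)
              for x in s) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# Hypothetical probing depths (mm): 4 subjects, 3 sites each.
depths = [[2.0, 2.5, 3.0], [4.0, 4.5, 5.0], [3.0, 3.0, 3.5], [5.5, 6.0, 5.0]]
icc = intraclass_correlation(depths)
```

In these made-up data the sites within a mouth resemble one another far more than sites from different mouths, so the estimate is high; real periodontal measurements would be expected to give more moderate values, such as those quoted above.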

*Department of Biomathematics, Roswell Park Memorial Institute, Buffalo, NY.

Because of these inherent correlations, any analysis which is based on the assumption of statistical independence is inappropriate. The typical result of such analyses is an inflated Type I error rate; that is, the authors will incorrectly claim a "statistically significant" finding when there is no difference between the groups being compared, or there is no relationship between the variables being studied. For example, the actual Type I error rate is 0.22 rather than 0.05 if the t-test is used inappropriately with only five sites per subject and with only moderate (0.4) intra-subject correlation.4 The effect becomes even more dramatic as the number of sites per subject increases.

The inflated Type I error rate is a result of an underestimate of the variance for the test statistic. Laster4 illustrates this point with data on pocket probing depths for six sites on each of six patients. The intraclass correlation coefficient for these data was estimated to be 0.31. Assuming independence between the sites resulted in an estimated standard error of the mean of 0.23, while the estimate was 0.37 when appropriate methods were used to account for the correlation. Hence, the use of inappropriate statistical techniques resulted in a 38% underestimate of the standard error of the mean. If the data had represented changes in pocket depth after treatment, and if the mean change had been 0.6 mm, for example, one would have incorrectly concluded that the treatment had resulted in a statistically significant change using a two-tailed t-test and a significance level of 0.05. This same phenomenon can occur when analyzing categorical data such as the presence or absence of a suspected periodontal pathogen in a set of sites within a patient.

Of course, this does not mean that every analysis performed using an incorrect statistical technique results in an incorrect conclusion. Reanalysis of the same data using correct statistical methods can frequently result in the same conclusion as was found based on the assumption of statistical independence. This does not imply, however, that the use of the more complicated statistical methods required for correlated data is a "waste of time." The results of any scientific investigation will be believable and most likely to have a lasting impact on the scientific and clinical community if they are based on sound scientific methods. This applies to the statistical methods used to analyze the data as well as to the medical and biologic techniques used to generate and collect the data.

The fact that sites within a mouth cannot be considered to be statistically independent does not necessarily mean that one should revert to analyzing only whole-mouth averages, since this practice frequently makes inefficient use of the data. The appropriate "unit of analysis" should be dictated by the medical or biological question being asked of the data. If interest centers on determining which subject-specific variables (e.g., sex, race, age, systemic diseases) influence the chance that the subject will develop periodontal disease in one or more sites, for example, then the unit of analysis is the subject. On the other hand, if interest centers on whether certain microbiologic pathogens (found only at certain sites, or to varying degrees in different sites) are associated with the depth of a pocket, the appropriate unit of analysis is clearly the site. However, in these types of investigations, the correlation among measurements taken at various sites within a mouth must be taken into account by the use of appropriate statistical techniques.

Techniques which take into account the correlation among measurements taken at various sites within a mouth range from the relatively straightforward paired t-test, to a repeated measures analysis of variance (which is available in most of the standard statistical packages such as SPSS, SAS, and BMDP), to the more recently developed generalized estimating equation approach to analyzing correlated binary data.5-7

Many studies, for example, are designed as matched-pair studies in which one periodontal defect in each of several patients is chosen at random to be treated with a new approach, while another defect is treated with a standard treatment. Because each pair of defects is located within the same mouth, the appropriate analysis is a matched pair analysis, such as the matched pair t-test or Wilcoxon signed rank test for continuous data (e.g., pocket depth, attachment level), or the McNemar test for binary data (e.g., presence or absence of a suspected periodontal pathogen).
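A minimal sketch of the matched-pair idea, using only the Python standard library, is given here. The paired t statistic is computed from the within-mouth differences, and an exact (binomial) form of the McNemar test is computed from the discordant pairs. The data and the function names are hypothetical illustrations, not results from any study, and the t statistic would still have to be referred to a t table with n - 1 degrees of freedom.

```python
from math import comb, sqrt
from statistics import mean, stdev

def paired_t_statistic(new, standard):
    """t statistic for matched pairs; refer to a t table with n - 1 df."""
    diffs = [a - b for a, b in zip(new, standard)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

def mcnemar_exact_p(b, c):
    """Two-sided exact McNemar P-value.

    b = pairs positive only under the first condition,
    c = pairs positive only under the second condition.
    Concordant pairs carry no information and are ignored.
    """
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical pocket-depth reductions (mm) for two defects in each of 8 mouths.
new_treatment = [2.1, 1.8, 2.5, 3.0, 1.2, 2.2, 2.8, 1.9]
standard_treatment = [1.6, 1.5, 2.1, 2.4, 1.0, 1.9, 2.1, 1.6]
t_paired = paired_t_statistic(new_treatment, standard_treatment)

# Hypothetical pathogen data: 2 pairs positive at the new-treatment site only,
# 9 pairs positive at the standard-treatment site only.
p_mcnemar = mcnemar_exact_p(2, 9)
```

Both calculations use only the within-mouth contrast (the differences, or the discordant pairs), which is what allows them to respect the pairing.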
In contrast, the two-sample t-test and Mann-Whitney rank sum test for continuous data, and the usual chi-squared test for binary data, are appropriate only for comparing two independent groups of measurements, and should not be used in the matched-pair setting.

More complicated statistical techniques are required when more than two sites per mouth are considered. (This includes designs in which two or more pairs of sites per mouth are analyzed.) For continuous data, the repeated measures analysis of variance model is often appropriate.8 This technique allows one to compare groups using the data from the individual sites, but takes into account the correlations among the sites. Methods for analyzing correlated binary data are less well developed; however, a recent technique based on generalized estimating equations appears to be very promising.5-7
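The cost of ignoring within-mouth correlation, discussed earlier in this section, can be approximated with the standard large-sample "design effect," 1 + (m - 1)ρ, for m equally correlated sites per subject. The sketch below is an illustration under those assumptions (equal numbers of sites, a common correlation ρ, and a normal approximation for the test statistic). It is not necessarily the calculation used in the papers cited, and the function names are hypothetical.

```python
from math import erf, sqrt

def design_effect(sites_per_subject, rho):
    """Variance inflation factor for equally correlated sites."""
    return 1 + (sites_per_subject - 1) * rho

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def actual_type1_error(sites_per_subject, rho, z_crit=1.96):
    """Approximate true two-sided error rate of a nominal 5% test that
    treats correlated sites as independent (normal approximation)."""
    shrink = sqrt(design_effect(sites_per_subject, rho))
    return 2 * (1 - normal_cdf(z_crit / shrink))

# Six sites per patient, intraclass correlation 0.31, naive standard error 0.23.
adjusted_se = 0.23 * sqrt(design_effect(6, 0.31))

# Five sites per subject, intra-subject correlation 0.4.
alpha = actual_type1_error(5, 0.4)
```

Under these assumptions the adjusted standard error comes out close to the 0.37 quoted earlier, and the approximate Type I error rate comes out close to 0.22.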

USE OF THE t-TEST AND "BELL-SHAPED" FREQUENCY DISTRIBUTION

A second common statistical problem encountered in periodontal research articles is the inappropriate use of the t-test. This test relies for its validity on the assumption that the data from the two groups being compared come from samples of populations in which the data have the Gaussian or "bell-shaped" frequency distribution curve. This is not true for many parameters studied in periodontal and other biomedical research, where data are often highly skewed toward smaller values. If the sample sizes in the two groups are relatively "large" (e.g., 20 or more), this assumption is not required, since a result known as the central limit theorem may be used to justify the use of the t-test. However, for small samples, researchers should either justify the assumption using an appropriate statistical technique prior to applying the t-test, or should analyze their data using a nonparametric technique, such as the Mann-Whitney test, rather than the t-test. These nonparametric techniques make fewer assumptions concerning the distribution of the data and may be more appropriate than the t-test in many circumstances.

One common misconception about these nonparametric methods is that they are not as "powerful" as their parametric equivalents; i.e., that their use will result in more false negative conclusions. The fact is that the power of the Mann-Whitney test is very close to that of the t-test, and the Mann-Whitney test can be much more powerful than an inappropriately applied t-test. An excellent, relatively non-technical summary of the most common nonparametric procedures is given by Siegel.9

INFERENCES FROM NEGATIVE RESULTS

A third problem which arises frequently in the periodontal research literature concerns inferences from "negative" (non-significant) results. A null hypothesis is never "accepted" in classical statistical hypothesis testing. If two or more groups are compared and no statistically significant differences are found, one should not conclude that the groups are the same. Clinically and/or biologically meaningful differences may still be present which were not detected by the statistical hypothesis test. This phenomenon occurs frequently in studies involving small sample sizes, since the statistical test is not "powerful" enough to detect meaningful differences.

In order to draw inferences from a "negative" result, the non-significant P-value should be accompanied by a 90% or 95% confidence interval for the difference between the groups. This confidence interval will show the likely range of possible differences; one can safely conclude that the groups are similar only if this range does not contain any clinically/biologically meaningful values.

To illustrate this concept, consider a study designed to compare two groups of 10 subjects each with respect to average whole-mouth probing pocket depth. Suppose that the mean and standard deviation for the first group were 4.0 mm and 2.5 mm, respectively, while the corresponding values for the second group were 6.2 mm and 2.5 mm. A two-sided t-test for comparing the groups would result in a P-value of 0.06, a non-significant result. On the other hand, the 95% confidence interval for the difference in means between the groups is -0.15 mm to 4.55 mm. Thus, while the possibility that the two groups have the same mean cannot be ruled out by these data, the confidence interval shows that they may differ by as much as 4.55 mm, a clinically meaningful result. Hence, the results are inconclusive, and more data would have to be gathered to draw any further conclusions.

Other, less frequently encountered problems in the periodontal research literature include a lack of detail concerning which statistical procedures were used, the use of the symbol "NS" for non-significant results, and the misuse of the term "random." Authors should state in the Methods section of their papers precisely which statistical procedures were used to design and analyze the experiment. If multiple statistical procedures were used, P-values should be accompanied by a reference to the statistical test (e.g., P = 0.04, t-test) in situations where it may not be obvious which statistical test was used. The results of all statistical comparisons should be accompanied by a P-value, rather than the symbol "NS" for non-significant (P > 0.05) results; an analysis with a P-value of 0.06 should not be considered the same as one in which P = 0.86. Finally, the term "random" should be reserved for situations in which patients or one member of a pair of sites was truly assigned to a group at random by using a fair coin, a random number table, or a computer random number generator. Assignments made haphazardly, or according to the day of the week, are not random assignments.
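The confidence-interval illustration given earlier for two groups of 10 subjects can be reproduced from the summary statistics alone. The sketch below assumes equal variances in the two groups (a pooled standard deviation) and hard-codes the tabled critical value 2.101 for a t distribution with 18 degrees of freedom; the function and constant names are hypothetical.

```python
from math import sqrt

def pooled_ci(mean1, sd1, n1, mean2, sd2, n2, t_crit):
    """Confidence interval for mean2 - mean1 using a pooled standard deviation.

    t_crit is the two-sided critical value of the t distribution with
    n1 + n2 - 2 degrees of freedom, looked up by the caller.
    Returns (lower limit, upper limit, t statistic).
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = mean2 - mean1
    return diff - t_crit * se, diff + t_crit * se, diff / se

T_CRIT_95_DF18 = 2.101  # tabled value: 95% two-sided, 18 degrees of freedom

low, high, t_stat = pooled_ci(4.0, 2.5, 10, 6.2, 2.5, 10, T_CRIT_95_DF18)
```

The t statistic (about 1.97) falls just short of the critical value, consistent with the non-significant P-value, while the interval runs from about -0.15 mm to 4.55 mm, as stated in the text.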

SUMMARY

As discussed earlier, the results of any scientific investigation must be based on sound scientific methods if they are to have an impact on the field. Not all research findings require statistical analysis, but those which do should be based on appropriate statistical techniques. If the appropriate unit of analysis is the site rather than the subject, then the correlations among the data from multiple sites within the same mouth should be taken into account in the analysis. Care should be taken to justify the assumptions inherent in all statistical analyses, particularly those involved with the use of the t-test. Inferences concerning "negative" results should be based on an analysis of confidence intervals rather than on a P-value greater than 0.05. Adherence to these basic principles will hopefully improve the usefulness of periodontal research in developing effective diagnostic, preventive, and treatment regimes.

REFERENCES

1. Laster L, Listgarten M. The effect of subsampling sites within subjects. J Dent Res 1984; 63:223.
2. Haffajee AD, Socransky SS, Goodson JM, Lindhe J. Intraclass correlation of periodontal measurements. J Dent Res 1984; 63:341.
3. Christersson LA, Emrich LJ, Dunford RG, Genco RJ. Analysis of data from clinical studies of localized juvenile periodontitis. J Clin Periodontol 1986; 13:476-480.
4. Laster LL. The effect of subsampling sites within patients. J Periodont Res 1985; 20:91-96.
5. Liang KY, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika 1986; 73:13-22.
6. Zeger SL, Liang KY. Longitudinal data analysis for discrete and continuous outcomes. Biometrics 1986; 42:121-130.
7. Prentice RL. Correlated binary regression with covariates specific to each binary observation. Biometrics 1988; 44:1033-1048.
8. Winer BJ. Statistical Principles in Experimental Design. McGraw-Hill: New York; 1971.
9. Siegel S. Nonparametric Statistics for the Behavioral Sciences. McGraw-Hill: New York; 1956.

Send reprint requests to: Dr. L.J. Emrich, Department of Biomathematics, Roswell Park Memorial Institute, Elm and Carlton Streets, Buffalo, NY 14263.

Accepted for publication October 21, 1989.
