SPECIAL CONTRIBUTION biostatistics

Introduction to Biostatistics: Part 4, Statistical Inference Techniques in Hypothesis Testing

Statistical methods used to test the null hypothesis are termed tests of significance. Selection of an appropriate test of significance depends on the type of data to be analyzed and the number of groups to be compared. Parametric tests of significance are based on the parameters mean, standard deviation, and variance, and thus are used appropriately when interval or ratio data are analyzed. The t-test and analysis of variance (ANOVA) are examples of parametric tests of significance. Assumptions regarding the data to be analyzed when using the t-test or ANOVA include normality of the populations from which the sample data are drawn, homogeneity of the variances of the populations from which the sample data are drawn, and independence of the data points within a sample group. The t-test is the appropriate test of significance to use if there are only two groups to compare. If there are three or more groups to compare, ANOVA is the appropriate test; ANOVA holds the preset α level constant. While ANOVA will indicate that a significant difference exists among the groups compared, a multiple comparison test will define which of the three or more groups differ significantly. [Gaddis GM, Gaddis ML: Introduction to biostatistics: Part 4, statistical inference techniques in hypothesis testing. Ann Emerg Med July 1990;19:820-825.]

Gary M Gaddis, MD, PhD* Monica L Gaddis, PhD† Kansas City, Missouri

From the Departments of Emergency Health Services* and Surgery,† Truman Medical Center, University of Missouri-Kansas City School of Medicine.

Received for publication September 1, 1989. Accepted for publication March 30, 1990.

Address for reprints: Monica L Gaddis, PhD, Department of Surgery, Truman Medical Center, 2301 Holmes, Kansas City, Missouri 64108.

INTRODUCTION

The research process follows an organized, stepwise pattern. A problem is identified, the research hypothesis is generated, methods of data collection are devised, and the statistical analysis of the data to be collected is designed. Calculation of measures of central tendency and variability is easily completed, but alone these numbers have only descriptive value. Making a decision to reject or accept the null hypothesis (H0) requires much more extensive statistical analysis of the data. Statistical methods used to test the null or statistical hypothesis (H0) are termed tests of significance.1 Recall from Part 3 of this series [May 1990;19:591-597] that hypothesis testing involves accepting or rejecting H0.2 Selection of an appropriate test of significance is dependent on several factors, including the number of groups to be compared and the type of data to be analyzed. This fourth in the series of six articles will address the concepts of parametric statistical inference techniques in hypothesis testing.

PARAMETRIC VERSUS NONPARAMETRIC METHODS

The mean and the standard deviation (SD) of a population describe a normally distributed population.3 (Because the SD is computed as the square root of the variance, it can also be said that the variance describes a normal distribution.) In a normal distribution of data, not only are the mean, median, and mode equal, but known percentages of the data fall within set SDs of the mean. The mean, SD, and variance of a population are termed parameters of that population. Parametric statistical methods are based on these parameters.1 Thus, given the relationship between these parameters and normality, the underlying assumption of parametric statistical methods is that the data being analyzed are normally distributed. If the data are not normally distributed and cannot be defined as interval or ratio data, other statistical methods, appropriately termed nonparametric statistical methods, are used.

[FIGURE. Flow chart for multiple comparison decisions (adapted from Hopkins and Chadbourn [1967] and Keppel [1973]). The chart branches on the preset α level, planned comparisons, the F test, control of the experimentwise error rate (layer method vs experimentwise method), the number of contrasts, whether contrasts are with a control group only, and whether more than two means are involved in a contrast, leading to the LSD, Newman-Keuls, Duncan, Dunn, Tukey, Scheffé, and Dunnett tests.]

In addition to differences in the type of data analyzed and the assessment of normality of the data, there are other characteristics of these two classes of statistical tests that illustrate their inherent differences. First, parametric tests are more powerful than nonparametric tests; that is, if a difference between groups truly exists, all else being equal, that difference is more likely to be found using the parametric test. Furthermore, more information about the data is generated by parametric tests.1 However important these differences are, nonparametric statistical tests should not be discounted. Because not all data are normally distributed and not all are of an interval or ratio scale, nonparametric methods that are sound in their mathematical theory often offer the only legitimate means of data analysis available.
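The choice between the two families of methods can be sketched in code. The following minimal Python example (not part of the original article; the data are simulated and the Shapiro-Wilk and Levene screens are one common, though not the only, way to check the assumptions) screens for normality and equal variances before falling back to a nonparametric alternative.

```python
# Minimal sketch: screening simulated data for the parametric assumptions
# discussed above before choosing between a parametric and a nonparametric test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group_a = rng.normal(loc=120, scale=10, size=30)        # roughly normal, interval-scale data
group_b = rng.lognormal(mean=4.8, sigma=0.3, size=30)   # skewed data

# Shapiro-Wilk tests the null hypothesis that a sample was drawn from a normal population.
_, p_norm_a = stats.shapiro(group_a)
_, p_norm_b = stats.shapiro(group_b)

# Levene's test assesses homogeneity of variances between the groups.
_, p_var = stats.levene(group_a, group_b)

if min(p_norm_a, p_norm_b) > 0.05 and p_var > 0.05:
    # Assumptions look tenable: a parametric test (t-test) is reasonable.
    stat, p = stats.ttest_ind(group_a, group_b)
else:
    # Otherwise a nonparametric alternative such as the Mann-Whitney U test may be preferred.
    stat, p = stats.mannwhitneyu(group_a, group_b)
print(f"test statistic = {stat:.3f}, P = {p:.4f}")
```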

PARAMETRIC STATISTICAL INFERENCE TESTS

t-Test

Student's t-test (t-test) is the parametric statistical method with which researchers are most often familiar. It is certainly the most common statistical method reported in the medical literature.1 The t-test is used to accept or reject H0. It is simple in that a comparison between two groups can be made and a decision rendered without further analysis. Yet the t-test is powerful; it is a parametric method that mathematically and theoretically is based on the means, SDs, and variances of the data.

The t-test also requires that several assumptions regarding the data be met prior to use. If the data do not meet the assumptions, the t-test is not the appropriate method to use. Assumptions of the t-test include the following: 1) the populations from which the samples were drawn should approach a normal distribution; 2) the variances of the populations from which sample 1 and sample 2 were drawn should be equal or nearly equal; and 3) the observations within a population or sample group should be independent, ie, "not paired, matched, correlated, or interdependent in any way."4 While these assumptions are important, the t-test is robust enough to be an appropriate test if an assumption is not met in the strictest sense (excepting the assumption of independence, which must be met at all times).4,5 However, this is not to say that it is appropriate to use the t-test for nominal or ordinal data or for data that do not come from a normally or near-normally distributed population.

While the t-test is used to compare two sample groups, the experimental design of the study must be considered because not all t-tests are the same. Consideration of the following is important: 1) Are the observations between groups independent (as is the case for a control vs experimental group design), so that a nonpaired t-test is appropriate? 2) Are the observations between groups dependent (as is the case for a pretest/post-test design), so that a paired t-test is appropriate? 3) Are the groups equal or unequal in size? 4) Is the comparison between a population mean and a sample mean or between two sample means? 5) Is the direction of the difference between the two groups known or unknown? If a direction of difference is postulated, the t-test is termed one-tailed. If no direction of difference is postulated, the t-test is termed two-tailed.

A very common experimental design in the medical literature is one in which there are two different independent groups, a control group and an experimental group. For example, suppose a new drug is being tested to see if it will decrease arterial pressure in persons with hypertension. Two sample groups would be selected by random assignment: group 1 will receive a placebo while group 2 will receive the drug in question. The alpha (α) level is preset. (Because the drug in question is hypothesized to lower arterial pressure, a direction of change is postulated, and these data should be tested by a one-tailed t-test.) The data are collected, descriptive statistics are calculated, and the t value is computed. The t-test calculation is easily referenced.4-6

Once a t value is obtained, the researcher should consult a table of critical values for t with the appropriate α level and degrees of freedom. If the calculated t value is greater than the critical t, H0 is rejected and it is concluded that the medication in question does lower arterial pressure in hypertensives. If the calculated t value is less than the critical t, H0 is accepted as tenable.
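As a concrete illustration of the independent-groups comparison just described, the following minimal Python sketch (not from the original article) runs a one-tailed, two-sample t-test on simulated blood pressure data; the group sizes, means, and the SciPy `alternative` keyword (available in recent SciPy releases) are assumptions made for the example.

```python
# Hypothetical data for the placebo-vs-drug design described above.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
placebo = rng.normal(loc=100, scale=12, size=25)  # diastolic pressures, mm Hg (simulated)
drug = rng.normal(loc=92, scale=12, size=25)

# One-tailed test: H0 is that the drug group's mean is not lower than the placebo group's.
# alternative="less" asks whether mean(drug) < mean(placebo).
t_stat, p_value = stats.ttest_ind(drug, placebo, alternative="less")

alpha = 0.05  # preset α level
print(f"t = {t_stat:.2f}, one-tailed P = {p_value:.4f}")
if p_value < alpha:
    print("Reject H0: the drug appears to lower arterial pressure.")
else:
    print("H0 is accepted as tenable.")
```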

Another experimental design common to the medical literature is the pretest/post-test design. This results in dependent or related data between groups (repeated measures) and is analyzed using the paired t-test. For example, a new thrombolytic agent is developed that is postulated to halt the progression of a myocardial infarction. Patients entering the emergency department with an evolving myocardial infarction undergo Doppler echocardiography to assess ejection fraction. Following this procedure, the experimental thrombolytic agent is administered. Two days later, ejection fraction is assessed again. Pre- and post-thrombolytic administration data are compared using a paired t-test so that patients serve as their own controls. The lack of a significant difference between pre- and post-treatment ejection fraction estimates is expected if the drug is efficacious.

The t-test is the method of choice when making a single comparison between two groups whose data meet the assumptions required of parametric analysis methods. However, what is done if the experimental design consists of three or more groups to be compared? The researcher may incorrectly compare these groups using several t-tests. For example, if an experiment consisted of one control group (C) and three experimental groups (E1, E2, E3), the comparisons made using t-tests would be C versus E1, C versus E2, C versus E3, E1 versus E2, E1 versus E3, and E2 versus E3. While this seems logical and certainly easy, it is improper and can lead to serious errors in drawing conclusions from the data.1,4-6

When several groups from an experiment are compared using "multiple t-tests," the probability of making a type I error (rejecting a true H0) increases as the number of comparisons made using independent t-tests increases.4 The increase in the α level can be calculated as follows:

Step 1. Number of comparisons: C = X(X - 1)/2, where X = number of groups in the experiment and C = number of comparisons.

Step 2. Corrected α level: αcorrected = 1 - (1 - α)^C.

Example: As shown above, with four groups there can be a maximum of 4(4 - 1)/2 = 6 paired comparisons. If the original α level was P = .05, the corrected α will be 1 - (1 - .05)^6 = .26. Thus, there is now a .26 chance of inappropriately rejecting the null hypothesis (type I error) in at least one of the six comparisons made.4 In most studies, this would be unacceptable! Should multiple t-tests be made among dependent groups, the corrected α levels are even greater than those calculated for independent groups.4 Thus, multiple t-tests should not be accepted as a legitimate means of data analysis for the comparison of more than two groups.4,6
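The inflation of the type I error rate described above is easy to reproduce numerically; the short sketch below (an illustration, not taken from the article) evaluates C = X(X - 1)/2 and αcorrected = 1 - (1 - α)^C for the four-group example.

```python
# Reproducing the worked example: type I error inflation from multiple t-tests.

def corrected_alpha(n_groups: int, alpha: float = 0.05) -> tuple[int, float]:
    """Return the number of pairwise comparisons and the effective α level."""
    comparisons = n_groups * (n_groups - 1) // 2          # C = X(X - 1) / 2
    effective_alpha = 1 - (1 - alpha) ** comparisons      # 1 - (1 - α)^C
    return comparisons, effective_alpha

for groups in (2, 3, 4, 5):
    c, a = corrected_alpha(groups)
    print(f"{groups} groups -> {c} comparisons, effective α ≈ {a:.2f}")

# With 4 groups: 6 comparisons and an effective α of about 0.26,
# matching the value quoted in the text.
```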


ANALYSIS OF VARIANCE

Analysis of variance (ANOVA) has long been an accepted method of comparing three or more groups from one experiment. The advantages of ANOVA over multiple t-tests include the following:6 1) the α level is held constant at the preset level with ANOVA, while the α level for multiple t-tests increases as the number of comparisons increases;4 2) one ANOVA is less cumbersome to calculate than are several t-tests; and 3) ANOVA is a more powerful data analysis method than is the t-test.

ANOVA is the appropriate statistical method to test for differences among more than two groups. Often, it is assumed that ANOVA is used to determine if there is a difference among the means of these groups rather than among the groups' collective values. This is an incorrect assumption. While the mean describes a group in a meaningful way, it is simply a descriptor of the group. Many statistical references will discuss ANOVA as a comparison between means, but intragroup and intergroup variability is what is actually being analyzed.

It is also of value to understand how ANOVA relates to the theory of hypothesis testing. Without the tedium of a guided tour through the calculation of ANOVA, a simple explanation follows. A test of the null hypothesis can be made in terms of two sets of differences (subjects participate in only one treatment, ie, subjects are "nested" within treatments). "One of these sets of differences is obtained by comparison of differences among treatment groups, referred to as external or between-group differences. The other set is obtained by comparison of differences among subjects receiving the same treatment within a treatment group, termed internal or within-group differences. Between-group differences are a result of the combined influence of the experimental treatment plus experimental error. Within-group differences are the result of experimental error alone."7 The comparison ratio

   between-group differences / within-group differences

is sensitive to the effects of experimental treatment and can be written as

   (treatment effect + experimental error) / experimental error

Assuming that the experimental error estimates are approximately equal, any influence of treatment will result in a ratio that is greater than 1.7

The above example of hypothesis testing illustrates the general theory behind the mathematical calculations of ANOVA. Just as the t-test involves calculation of a t statistic, which is compared with a critical t, ANOVA involves calculation of an F-ratio, which is compared with a critical F-ratio. The F-ratio answers the question, Is "the variability between the groups large enough in comparison to the variability of data within each group to justify the conclusion that two or more of the groups differ?"6 If the variability between groups is large enough, we can conclude that there is a significant difference between groups. The F-ratio is defined as follows:

   F-ratio = between-groups variance / within-groups variance

ANOVA is not just one simply defined computation. The experimental design possibilities are numerous with ANOVA. By using one test, several factors (eg, drugs, dose levels, dose times) can be analyzed for relationships at one time. The number of F-ratios calculated in an ANOVA is directly related to the number of factors in the experimental design. Thus, each ANOVA computation is unique to the experimental design being tested. It is the researcher's responsibility to ensure that the appropriate ANOVA is used, given the design of the study.

The assumptions for ANOVA are the same as those for the t-test:4-6 1) the populations from which the samples are drawn should approach a normal distribution; 2) the variances of the populations from which the samples were drawn should be equal or nearly equal; and 3) the observations within groups must be independent. These assumptions can usually be met by random sampling and by use of a good measurement scale.6 The more the above assumptions for ANOVA are violated, the more likely a type I or type II error will be made.6 As with the t-test, ANOVA is robust enough to be an appropriate test if the above assumptions are not strictly met (excepting the assumption of independence, which must be met at all times).4,5 When the compared groups have equal values of n, population variances need not be homogeneous. Also, normality of the population distributions may be violated to a limited degree without consequence.4-6 Finally, because ANOVA is calculated using a parameter (variance), it is considered a parametric statistical analysis method and its use should be limited to interval and ratio scale data.

Thus, there exist many similarities between the t-test and ANOVA. This extends to the calculated t from the t-test and the F-ratio from ANOVA: if an ANOVA were used instead of the t-test to compare two groups, it would be found that F = t² for these data.4,5
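To make the F-ratio concrete, the following short sketch (an illustration with simulated data, not part of the original article) runs a one-way ANOVA on three groups with SciPy's `f_oneway` and also checks the F = t² relationship for a two-group comparison.

```python
# One-way ANOVA on three simulated groups, plus the F = t^2 check for two groups.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
control = rng.normal(100, 10, size=20)
treat_1 = rng.normal(95, 10, size=20)
treat_2 = rng.normal(90, 10, size=20)

# One-way ANOVA: is the between-group variance large relative to the within-group variance?
f_stat, p_value = stats.f_oneway(control, treat_1, treat_2)
print(f"ANOVA: F = {f_stat:.2f}, P = {p_value:.4f}")

# With only two groups, ANOVA and the two-tailed, equal-variance t-test agree: F = t^2.
t_stat, _ = stats.ttest_ind(control, treat_1)
f_two, _ = stats.f_oneway(control, treat_1)
print(f"two groups: t^2 = {t_stat**2:.3f}, F = {f_two:.3f}")
```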

MULTIPLE COMPARISON METHODS

Following a significant F test, the next logical step would be to ask, Which of the groups compared in the ANOVA are significantly different? This question can be answered by the use of multiple comparison procedures. "All are essentially based upon the t-test but include appropriate corrections for the fact that more than one comparison is being made."1 There exist numerous legitimate methods of multiple comparison, each looking for unplanned yet "interesting" differences in the experimental data, but operating under a different set of rules and assumptions.5 The test that is selected should be the one that meets the needs of the researcher and the design of the study. Overall, it is important to remember that the reason for using ANOVA and a multiple comparison method is ultimately to control the experimentwise error rate (the type I error rate for all comparisons) while at the same time making several different comparisons.7 The experimentwise error rate can be limited by reducing the number of comparisons made or by reducing the error rate within each comparison. Because most researchers do not want such imposing conditions placed on their work, as would be the case by limiting the number of comparisons allowed, the only other way to control the experimentwise error rate is to control the type I error rate within each comparison; hence, the purpose behind multiple comparison techniques. However, it is important to note that in reducing the type I error rate in such a way, there will be an increase in the type II error rate. Thus, before progressing, the researcher must determine which is more detrimental to the work, making type I errors or making type II errors.7

A summary flow chart for the selection of multiple comparison tests is shown (Figure). Use of this figure will help guide the researcher to select the test most appropriate for the experimental design tested and the research questions asked. This flow chart, developed by Hopkins and Chadbourn8 and modified by Keppel,7 was intended to show the similarities and differences between some of the various multiple comparison methods. It should not be used as a "fixed and rigid plan for analysis."7 For the purposes of this article, this chart serves as a logical guide to aid the reader in the understanding of multiple comparison methods.

Before any multiple comparison test, an α level is determined. Next, an F test is performed. If a significant F-ratio is obtained, the process of data analysis continues to determine which groups differ statistically. The test of least significant difference (LSD) is an option if the researcher wishes to control the comparisonwise error rate (the individual type I error rate for each comparison) and if a small number of comparisons, relative to the total number of comparisons possible, is to be made. However, if the experimentwise error rate (the type I error rate for all comparisons) must be held constant, other methods of multiple comparison must be considered.

There are two ways to control the experimentwise error rate: the layer (or stepwise) method and the experimentwise method. The layer method gradually adjusts the type I error rate; the experimentwise method holds the type I error rate constant for a set of comparisons. The Newman-Keuls test and the Duncan test are examples of layer methods. If an experimentwise method is selected, the type of comparisons to be made will determine the multiple comparison method selected. If comparisons are made only between a control group and experimental groups, the Dunnett test is an option to consider. However, if the comparisons are between any groups, there are other test options. The Dunn test could be considered if there are only a few comparisons to be made. If there are a large number of comparisons to be made, the Tukey test or the Scheffé test might be considered. The Tukey test assumes that the groups being compared are of equal size and is appropriate for the simple comparison of one group with another. The Scheffé test is based on the F statistic and thus is less affected by violations of the assumptions of normality and homogeneity of variances. Should comparisons be desired between complex combinations of groups, the Scheffé test will be sensitive in detecting real differences.7
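As one example of an experimentwise procedure from the flow chart, the sketch below (illustrative only; the data are simulated) applies a Tukey HSD test after a significant ANOVA using the `pairwise_tukeyhsd` helper from the statsmodels package, which is one commonly used implementation.

```python
# Tukey HSD after a significant one-way ANOVA (simulated data, equal group sizes).
import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(3)
values = np.concatenate([
    rng.normal(100, 10, size=20),  # control
    rng.normal(95, 10, size=20),   # experimental group 1
    rng.normal(88, 10, size=20),   # experimental group 2
])
labels = np.repeat(["C", "E1", "E2"], 20)

# Step 1: overall F test.
f_stat, p_value = stats.f_oneway(values[:20], values[20:40], values[40:])
print(f"ANOVA: F = {f_stat:.2f}, P = {p_value:.4f}")

# Step 2: if F is significant, ask which groups differ, holding the
# experimentwise error rate at the preset α.
if p_value < 0.05:
    result = pairwise_tukeyhsd(endog=values, groups=labels, alpha=0.05)
    print(result.summary())
```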

While not included in the flow chart, the Bonferroni t-test is a multiple comparison method frequently used in the medical literature. The Bonferroni t-test adjusts the preset α level by the number of comparisons to be made:1,9

   αadj = αp / n

where αp is the preset α level and n is the number of comparisons to be made. "If each comparison is made using the critical t corresponding to αp/n, the error rate for all comparisons taken as a group will be at most αp."1 Thus the preset α level is protected. However, the Bonferroni t-test becomes very conservative as the number of comparisons made increases.1

Finally, as previously noted, confidence intervals may be more useful than multiple comparison tests in the analysis of intergroup similarity.2,3,9 "Confidence intervals: 1) show the degree of uncertainty in each comparison in an easily interpretable way; 2) make it easier to assess the practical significance of a difference as well as the statistical significance; and 3) are less likely to lead non-statisticians to the invalid conclusion that nonsignificantly different sample means imply equal population means."9

The above discussion of multiple comparison methods and their uses is a basic overview of just a few of the possible options available to the researcher. There are other legitimate methods that have not been included in this discussion because of space limitations. Furthermore, statistical procedures and opinions on multiple comparison theory are continually evolving. The researcher is free to select whatever multiple comparison method is desired as long as the method is appropriate for the experimental design and the research questions asked.
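A minimal sketch of the Bonferroni adjustment just described (illustrative, with simulated data): each pairwise t-test is judged against αp/n rather than the preset αp.

```python
# Bonferroni-adjusted pairwise t-tests among four groups (simulated data).
from itertools import combinations
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
groups = {
    "C": rng.normal(100, 10, size=20),
    "E1": rng.normal(97, 10, size=20),
    "E2": rng.normal(92, 10, size=20),
    "E3": rng.normal(88, 10, size=20),
}

alpha_preset = 0.05
pairs = list(combinations(groups, 2))   # 4 groups -> 6 pairwise comparisons
alpha_adj = alpha_preset / len(pairs)   # α_adj = α_p / n

for name_a, name_b in pairs:
    _, p = stats.ttest_ind(groups[name_a], groups[name_b])
    verdict = "significant" if p < alpha_adj else "not significant"
    print(f"{name_a} vs {name_b}: P = {p:.4f} ({verdict} at adjusted α = {alpha_adj:.4f})")
```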

SUMMARY

In conclusion, when selecting the method for hypothesis testing, simplicity and familiarity must be pushed aside for assurance that the data being analyzed meet the defined assumptions required for use of a given test. For the t-test and ANOVA, these assumptions include normality of the populations from which the data come, homogeneity of the variances of the populations from which the samples were drawn, and independence of the data points within a sample group.

If the experimental design consists of only two groups, the t-test is appropriate to test for a significant difference between those groups. However, if there are three or more groups to compare, the t-test is inappropriate because the preset α level will increase with the number of comparisons made. ANOVA is a powerful statistical test to determine simultaneously whether there is a significant difference among three or more groups. While the F-ratio will tell whether a significant difference exists among any of the groups, it gives no information regarding which of the groups differ. Thus, following a significant F-ratio, a multiple comparison test can be selected that will define which of the three or more groups is different. The multiple comparison method selection is based on the experimental design and the research questions asked.

REFERENCES

1. Glantz SA: Primer of Biostatistics, ed 2. New York, McGraw-Hill Book Co, 1987.

2. Gaddis GM, Gaddis ML: Introduction to biostatistics: Part 3, Sensitivity, specificity, predictive value and hypothesis testing. Ann Emerg Med 1990;19:591-597.

3. Gaddis GM, Gaddis ML: Introduction to biostatistics: Part 2, Descriptive analysis. Ann Emerg Med 1990;19:309-315.

4. Hopkins KD, Glantz GV: Basic Statistics for the Behavioral Sciences. Englewood Cliffs, New Jersey, Prentice-Hall, Inc, 1978.

5. Sokal RR, Rohlf FJ: Biometry, ed 2. New York, WH Freeman and Co, 1981.

6. Elston RC, Johnson WD: Essentials of Biostatistics. Philadelphia, FA Davis Co, 1987.

7. Keppel G: Design and Analysis: A Researcher's Handbook. Englewood Cliffs, New Jersey, Prentice-Hall, Inc, 1973.

8. Hopkins KD, Chadbourn RA: A schema for proper utilization of multiple comparisons in research and a case study. Amer Educ Res J 1967;4:407-412.

9. SAS Institute Inc: SAS/STAT User's Guide, Release 6.03 edition. Cary, North Carolina, SAS Institute Inc, 1988, p 1028.
