This article was downloaded by: [University of Auckland Library] On: 08 October 2014, At: 18:48 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

Journal of Biopharmaceutical Statistics Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/lbps20

Some thoughts on the one-sided and two-sided tests Satya D. Dubey

Center for Drug Evaluation and Research, U.S. Food and Drug Administration, Rockville, Maryland. Published online: 29 Mar 2007.

To cite this article: Satya D. Dubey (1991) Some thoughts on the one-sided and two-sided tests, Journal of Biopharmaceutical Statistics, 1:1, 139-150, DOI: 10.1080/10543409108835011 To link to this article: http://dx.doi.org/10.1080/10543409108835011


Journal of Biopharmaceutical Statistics, 1(1), 139-150 (1991)


SOME THOUGHTS ON THE ONE-SIDED AND TWO-SIDED TESTS

Satya D. Dubey

Center for Drug Evaluation and Research U.S. Food and Drug Administration Rockville, Maryland

Key words. One-sided and two-sided tests; Administrative hearings; Regulatory perspectives; Academic views; Current practices

Abstract

This paper addresses some scientific and regulatory perspectives, insightful regulatory experiences, academic views, and current practices pertinent to one-sided and two-sided tests within the framework of clinical trials designed for specific diseases and drugs. It makes compelling points in favor of applying a two-sided test under a wide variety of situations. Additionally, it discusses situations where one may reasonably argue for use of a one-sided test and concludes by suggesting the use of two-sided tests in controversial situations.

This paper was presented on Monday, Aug. 6, 1990 at the Invited Paper Session on "One-sided and Two-sided Tests" of the 150th Annual Meeting of the American Statistical Association with the Biometric Society (ENAR and WNAR) held at Anaheim Hilton and Towers and Anaheim Marriott Hotels, Anaheim, California (Aug. 6-9, 1990). This session was sponsored by the Biopharmaceutical Section, Biometric Section and ENAR-WNAR. The views expressed in this paper are those of the author and not necessarily of the Food and Drug Administration. This paper has appeared in slightly different form in Dubey SD: Proceedings of the Biopharmaceutical Section of the American Statistical Association, 1990, pp. 11-17.

Copyright © 1991 by Marcel Dekker, Inc.


Introduction

In this paper I address some scientific and regulatory perspectives, as well as experiences, on the topic of one-sided and two-sided tests and make a few concluding comments. The paper applies to confirmatory clinical trials. For exploratory trials, sound rationale can justify the use of one-sided tests in many situations, but that case is beyond the scope of this paper.


Scientific and Regulatory Perspectives

The role of one-sided or two-sided tests depends on the type of hypothesis one intends to test, and the formulation of the hypothesis must be a function of the proper clinical claim. In a regulatory framework, the proper clinical claim has to be finally established between the drug sponsor and the Food and Drug Administration (FDA) review team. This is necessary because the field of clinical trials methodology pertinent to all diseases and drugs is vast and extremely complex. It requires specific considerations that emerge from the current state of knowledge of the disease, the drug under investigation, the study population, the nature of the uncertainties and variabilities, and the variety of difficulties associated with conducting scientifically appropriate and effective clinical trials. The scientific considerations needed to design and conduct effective clinical trials vary by drug class. In recognition of these facts, the FDA has developed clinical guidelines for specific diseases in addition to general guidelines.

Occasionally, drug sponsors have challenged FDA decisions on Drug Efficacy Study Implementation (DESI) drugs at Administrative Hearings, and the Administrative Law Judge, Daniel J. Davidson, has issued legal rulings in several cases that address the p value and one-sided and two-sided tests. I intend to present some selected salient conclusions of these legal judgments, which should shed proper light on this subject matter. Frequently, real experiences reveal surprising results. In this connection I intend to provide selected views of several distinguished clinical trial practitioners as well as prominent researchers.

Regulatory Experiences

In an administrative DESI hearing, the following situations occurred. The FDA opposed a drug sponsor's use of a one-tailed test on two main grounds. (References 1-4, pertinent to DESI Administrative Hearings, are given at the end of this paper.)


1. A two-tailed test is appropriate in studies of a combination product like Vioform-HC because it protects against possible antagonism of the two components (Dr. Paul Levy; Levy, Ex. No. G-85 at 7-8).
2. A one-tailed test is appropriate for a combination drug study only if it is known prior to initiation of the study that the efficacy of one component is not lessened by the presence of another component. This knowledge concerning the interaction of the components must be a priori knowledge (Levy, Ex. No. G-85 at 11).

The drug sponsor's own medical witnesses admitted that the hydrocortisone component interferes with the antifungal action of the Vioform component, even though they characterized this interference as clinically unimportant (Jolly, Ex. No. C-20, #85; Urbach, Ex. No. C-22, #90; Maibach, Ex. No. C-21, #90; Stoughton, Ex. No. C-24, #91).

The administrative judge stated:

In sum, the evidence indicates that the drug sponsor (Ciba-Geigy) originally used two-tailed tests of significance and subsequently decided to apply the more liberal one-tailed test. The effect of this post-hoc decision was to create significance (or near significance) on several parameters when significance did not previously exist. This critical switch was not adequately explained. Dr. Levy's testimony that a two-tailed test is more appropriate for studies involving combination drugs remains uncontroverted (the FDA and the drug sponsor do not agree on what constitutes the most important parameters).

In another administrative hearing, the following points were discussed. A one-tailed test is clearly proper in two types of studies:

1. Where there is truly only concern with outcomes in one tail (see, e.g., B. Brown, Ex. No. S-1.81 at 5).
2. Where it is completely inconceivable that the results could go in the opposite direction (see, e.g., Gehan, Ex. No. S-1.80 at 4).

The drug sponsors infer that the prophylactic value of the combination drug is greater than that posited by the null hypothesis of equal incidence, and that therefore the risk of finding an effect when none in fact exists is located only in the upper tail, thus calling for a one-tailed test (see, e.g., Gehan, Ex. No. S-1.80 at 9). The FDA, on the other hand, applies two-tailed tests in order to take into account not only the possibility that the combination drugs are better than the single agent alone at preventing candidiasis, but also the possibility that they are worse at doing so (e.g., Scott, Ex. No. G-8.6 at 5).


Comment

Biologically, a medical researcher may expect, on the basis of reported research, a new drug to yield positive results. But if clinical trial experts consider such expectations questionable on the basis of reported clinical studies, then it is not completely inconceivable that the results could go in the opposite direction. Therefore, a one-tailed test is inappropriate in this situation.

An investigator cannot simply say, for example, that he will compare a new cancer drug against an established therapy using a one-tailed test because he is interested only in finding out if the new drug is more effective. He must also take into account the possibility that the new drug is worse than the old treatment. In essence, his interest must be sensitive to the consequences implied by the possibility that a postulated benefit is in fact nonexistent. Therefore, a two-tailed test is appropriate in this situation.

A one-tailed test is appropriate in the following situation. Suppose one samples drinking water and performs bacterial counts to detect contamination. There is, then, cause for concern only when the population mean count is above some null hypothesis standard that corresponds with a safe level. A population mean count below this standard, i.e., water more pure than the standard, causes no concern (Ex. No. G-8.57 at 1). Therefore, a one-tailed test is appropriate here.

Comment

This example indicates that the one-sided test may be justified in toxicity studies, safety evaluation, analysis of adverse drug reaction data, risk evaluation, laboratory research, and the like.

Discussion (One-Tailed vs. Two-Tailed)

If the combination drugs at issue here are ineffective and are allowed to remain in the market, the result would be unavailing use of these drugs by patients saddled with the added expense of an antifungal agent, when alternatives for lowering the risk of candidiasis associated with antibiotic therapy would be preferable. Lack of effectiveness also raises the question of possibly increasing safety risk. The consequences that follow from the possibility that the efficacy hypothesis is wrong are clearly important. The submitted studies thus cannot be analyzed using a one-tailed test if that use is justified only by reference to investigatory interest.

The other rationale for using a one-tailed test, the impossibility of a result going in the direction opposite that postulated, requires "solid and convincing evidence." The evidence available in this proceeding, however, does not meet the required standard for concluding that a one-tailed test is appropriate. The possibility of an interaction between the antibiotic and antifungal components, which could result in changed activity for either component, cannot be ruled out. If the effect of the interaction were to change the activity of either component, or in some fashion increase the risk of contracting candidiasis for some individuals, the use of a one-tailed test would end up sacrificing medically correct findings in the attempt to achieve statistically significant ones. Such a result would be untenable.

In addition, the evidence supported by the efficacy studies themselves cannot be used to argue for the appropriateness of applying a one-tailed test. Doing so would be assuming from the beginning the conclusion to be proved: that the combination is better at preventing candidiasis than the antibiotic alone. It is not impossible that the reverse is true, given the inconclusive nature of the evidence on drug activity and interaction in this proceeding.

A one-tailed test is justified if it is inconceivable for a test result to go in the opposite direction. This is very rare in the context of clinical drug investigations (Ex. No. G-8.57 at 1). We cannot here rule out the possibility that the actuality is the opposite of that hypothesized. A two-tailed test of significance is therefore necessary for studies attempting to show the combination drugs superior to antibiotics alone in reducing the risk of contracting candidiasis.

In another administrative hearing, the administrative judge dealt with the following issues.

Placebo-Controlled Trials

The manufacturers have asserted that one-tailed tests are appropriate when one is only concerned that the efficacy of the compound in question exceeds the efficacy of a placebo (Cornell, Ex. No. W-104.2 at 2). The manufacturers then contend that two-tailed tests are only needed when one is determining whether there is a difference in efficacy between the compound and placebo. In contrast, the FDA states that one-tailed tests are only appropriate where there is truly only concern with outcomes in one tail or where it is completely inconceivable that the results could go in the opposite direction (Leung, Ex. No. G-253 at 13). The FDA then concludes that these situations do not apply to a DESI drug because of the concern that a compound may be less effective than a placebo.


Other Pertinent Points


The choice of the test (one-sided or two-sided) should be specified in the protocol, thereby eliminating any possibility of post hoc manipulation. Such a procedure allows the test results to be compared with an appropriate benchmark, which should also be specified in the protocol (DESI Hearing). It is important that a one-tailed p value not be compared with a two-tailed benchmark of significance (DESI Hearing).

The Commissioner's final decision in the Cyclamate proceeding stated:

. . . as the range of the p-value relied on increases, a greater number of similar consistent results would be necessary to establish that the observed effect is real. Other relevant factors include the quality of each study or the introduction of variables that might affect the results of the study (Fed. Reg. 61,474; 61,480, 1980).
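The point about benchmarks can be illustrated numerically. The sketch below is not taken from the hearings. It assumes a hypothetical standardized test statistic of 1.80 in the direction favoring the drug and a standard normal reference distribution, and it uses only Python's standard library; the function name `p_values` is illustrative.

```python
from statistics import NormalDist


def p_values(z: float) -> dict:
    """One-tailed (upper) and two-tailed p values for a z statistic.

    Assumes the statistic is standard normal under the null hypothesis
    and that positive values favor the drug.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    one_tailed = 1.0 - std_normal.cdf(z)               # upper tail only
    two_tailed = 2.0 * (1.0 - std_normal.cdf(abs(z)))  # both tails
    return {"one_tailed": one_tailed, "two_tailed": two_tailed}


result = p_values(1.80)  # hypothetical test statistic
print(f"one-tailed p = {result['one_tailed']:.3f}")
print(f"two-tailed p = {result['two_tailed']:.3f}")
```

Under these assumptions the one-tailed p value is about 0.036 and the two-tailed p value is about 0.072, so the same data fall below a 0.05 benchmark only when a single tail is counted. This is the sense in which a post hoc switch to a one-tailed test can appear to create significance.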

Academic Views

Distinguished clinical trials practitioners, researchers, and educators have expressed the following views on one-sided and two-sided tests.

Peter Armitage: . . . there is often a temptation to use one-sided tests rather than two-sided tests because the probability level is lower, and therefore the apparent significance is greater. A decision to use a one-sided test should never be made after looking at the data and observing the direction of the departure. Before the data are examined one should decide to use a one-sided test only if it is quite certain that departures in one particular direction will always be ascribed to chance, and therefore regarded as non-significant however large they are. This situation rarely arises in practice, and it will be safe to assume that significance tests should almost always be two-sided. (1)

Donald Mainland: If one agent is a placebo, and the other is a drug it is sometimes asserted that only the tail in favor of the drug should be marked off, on the grounds that a placebo, being a no-drug treatment, can not do better than a drug. Those who act on this belief . . . apply a "one-tailed test." This implies that, however much the result of a trial seemed to favor placebo, the investigators would say that they know that it was due to the randomization, but one wonders what would happen if such an extreme result were actually encountered. The method may be suitable in some experiments, but many drugs are so potent that patients are better off on a placebo. In any kind of experiment we should beware of the temptation to lower our standards by giving ourselves a better chance of obtaining a "positive result." (2)

Eric Lehmann: A confusion as to which of the two tests is appropriate sometimes arises when at the beginning of an investigation it is not clear which of the treatments would be expected to be better should they turn out to differ at all. However, after the observations have been taken, a visual inspection of the data may suggest that, say, B is better and also an explanation of why this might have been expected. It is then tempting to see whether the superiority of B is significant by testing the hypothesis H against the one-sided alternative that B is better than A. Under the given circumstances this is not appropriate. If A had come out ahead, an explanation might also have suggested itself for this finding, and the same line of argument would have led to testing the hypothesis H against the alternative that A is superior. This shows that the hypothesis H would have been rejected if the test statistic had been either sufficiently large or sufficiently small. (3)

Friedman et al.: Two-sided tests should be used unless there is strong justification for expecting a difference in only one direction. In the context of safety data monitoring of clinical trials, a one-sided construct "better reflects data monitoring procedures and patient safeguard needs than do conventional two-tailed tests." (6, p. 86)

Joseph L. Fleiss (Columbia University): On the basis of research protocols submitted to Columbia University's Institutional Review Board that I have read, and on the basis of discussions with colleagues elsewhere, I sense a growing tendency toward using one-tailed significance tests in clinical trials. The matter is not trivial, and may be symptomatic of attitudes that are inappropriate in clinical trials.

One explanation offered by several investigators is a variation of "I would not run a trial if I thought the experimental agent (we will label it A) could be worse than the control (we will label it B), so why bother looking for a difference in the 'wrong' direction?" Of course ethical investigators would study a new therapy only if they believed that it might be of benefit, in terms either of efficacy or of safety. What is troublesome to me is that investigators are taking their hopes or expectations as establishing the truth; they're moving from "I do not expect A to be worse than B" to "it is not possible for A to be worse than B." Such certainty has no place in clinical science. A's inferiority to B might be unexpected, but it is never impossible.

Another reason sometimes given for conducting a one-tailed test is, "I acknowledge that A might be worse than B" (perhaps the intention is to recommend that A replace B). In my opinion, the investigator should care if A is inferior to B. Something was wrong in the theory that predicted A's superiority, and further laboratory research might be indicated. At a minimum, others should be advised that A was found to be inferior to B so that they do not, in ignorance, conduct the same kind of trial. Only a two-tailed test will permit the investigator to distinguish between "A may be no different from B" and "A may be worse than B." (5)

Cochran and Cox:

1. A one-tailed test is used when it is known that the new treatment must be at least as good as the standard procedure.
2. A two-tailed test is made when we do not know which treatment is better. (10)

Marvin Zelen (Harvard School of Public Health): My view on one-sided or two-sided tests is that only in very unusual circumstances would one carry out a one-sided test in a clinical trial. Even though trials are initiated comparing an experimental treatment with best standard treatment, in expectation of a positive benefit, there has been a growing body of experience which indicates that it is not uncommon for the experimental treatment to be inferior. The use of a one-sided test is usually a "signal" that the trial has too small a sample size and the investigators are attempting to "squeeze out" a significant result by a "statistical maneuver." (4)

Current Practices

Case 1

If a drug sponsor wants to test a new drug against a vehicle known to have fairly high cure rates, then the two-sided test would be considered appropriate. (Incidentally, examples exist which show that in some studies a chosen vehicle has outperformed the drug.)

Case 2

If a drug sponsor wants to compare a new drug with a true placebo (i.e., the placebo has absolutely no activity), then the use of a one-sided test may be considered reasonable. Here the responsibility for convincing peers that this is a valid case of a true or pure placebo rests with the drug sponsor. An example would be a case in which the measure of efficacy is an objective scientific measurement that cannot be influenced by the chosen "placebo" or by any other intervention factors. On the other hand, if the measure of efficacy is subjective in a placebo-controlled trial, then the use of a two-sided test would be considered appropriate (e.g., psychotropic drug studies).

Comment

Quite frequently, placebo control is not pure; it is usually pseudo. An example of pseudo-placebo control arises when patients in the study population are allowed to receive concomitant medications for other conditions, such as an antacid for heartburn. In gastrointestinal ulcer trials, where the efficacy measure is a healing rate, antacids are also known to heal ulcers. Consequently, the placebo control provides a biased estimate of the efficacy effect, and it is to be considered pseudo-placebo control and not pure placebo control. This kind of situation is not uncommon in clinical trials.

Case 3

In the case of generic bioequivalence clinical trials for topical drugs, a vehicle group is used, and it is required to demonstrate that both the reference and test drugs are superior to the vehicle group. Here it is recognized that the reference drug is known to be effective for the given indication; thus the possibility of the active drug being inferior to vehicle is not a viable option. Consequently, it is reasonable to use a one-sided test for comparisons with the vehicle arm. (A vehicle arm is used here to keep the study blinded and honest.)

Case 4

For establishing therapeutic equivalence between a test drug and a reference drug, one may formulate the statistical hypothesis in the following manner, which requires the use of one-sided tests. Let $\delta$ be the true unknown difference between the response rates of the test drug and the reference drug. Let the interval $(A, B)$ be the prespecified clinical "equivalence interval" for this unknown $\delta$. Then test

$$H_0: \delta \leq A \ \text{or} \ \delta \geq B \quad \text{vs.} \quad H_a: A < \delta < B.$$

Here the null hypothesis is the hypothesis of nonequivalence and the alternative hypothesis is the hypothesis of equivalence.
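One common way to test a null hypothesis of nonequivalence against an alternative of equivalence is with two one-sided tests, one against each end of the equivalence interval. The paper does not say which procedure was intended, so the following is only a sketch. It assumes a large-sample normal approximation for the difference in response rates, and the function name, the counts, and the interval (-0.10, 0.10) are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def tost_rate_difference(x_test, n_test, x_ref, n_ref, lower, upper, alpha=0.05):
    """Two one-sided tests for equivalence of two response rates.

    H0: delta <= lower or delta >= upper   (nonequivalence)
    Ha: lower < delta < upper              (equivalence)
    Uses an unpooled normal approximation (an assumption of this sketch).
    """
    p_test = x_test / n_test
    p_ref = x_ref / n_ref
    diff = p_test - p_ref
    se = sqrt(p_test * (1 - p_test) / n_test + p_ref * (1 - p_ref) / n_ref)
    if se == 0:
        raise ValueError("standard error is zero; approximation not usable")
    norm = NormalDist()
    p_lower = 1.0 - norm.cdf((diff - lower) / se)  # one-sided test of delta <= lower
    p_upper = norm.cdf((diff - upper) / se)        # one-sided test of delta >= upper
    p_overall = max(p_lower, p_upper)              # both one-sided tests must reject
    return diff, p_overall, p_overall < alpha


# Hypothetical data: 160/200 responders on test drug, 164/200 on reference.
diff, p_overall, equivalent = tost_rate_difference(160, 200, 164, 200, -0.10, 0.10)
print(f"difference = {diff:.3f}, p = {p_overall:.4f}, equivalent = {equivalent}")

# Same rates with only 20 patients per arm: too imprecise to claim equivalence.
_, p_small, equivalent_small = tost_rate_difference(16, 20, 16, 20, -0.10, 0.10)
```

Equivalence is concluded only when both one-sided null hypotheses are rejected, which is why the overall p value is the larger of the two.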


Case 5


In the case of combination drugs, if it is not known, on the basis of prior studies, whether the components of a combination drug might be antagonistic, then the use of the two-sided test would be appropriate. On the other hand, if it is known on the basis of prior studies that the components of a combination drug product are not antagonistic, then the use of the one-sided test in the formulation of a proper null hypothesis would be appropriate. For instance, in the situation where two components A and B of a combination drug C both contribute to a single outcome efficacy variable, and where A and B are known to be nonantagonistic, the proper hypothesis can be formulated as follows:

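$$H_0: \mu(C) \leq \mu(A) \ \text{or} \ \mu(C) \leq \mu(B)$$

vs.

$$H_a: \mu(C) > \mu(A) \ \text{and} \ \mu(C) > \mu(B),$$

or equivalently,

$$H_0: \mu(C) \leq \max\{\mu(A), \mu(B)\} \quad \text{vs.} \quad H_a: \mu(C) > \max\{\mu(A), \mu(B)\},$$

where $\mu(A)$, $\mu(B)$, and $\mu(C)$ denote the mean effects of drugs A, B, and the combination drug C, respectively. The one-sided hypotheses for multiple outcome efficacy variables have been formulated by Leung and O'Neill under various situations in a separate paper (7).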
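A one-sided comparison of a combination with both of its components is often carried out by requiring the combination to beat each component separately, so that the smaller of the two test statistics decides the outcome. The sketch below illustrates that idea under stated assumptions and is not presented as the procedure of reference 7: larger mean effects are taken to be better, a large-sample normal approximation is used, and the function names and summary statistics are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_z(mean_new, sd_new, n_new, mean_old, sd_old, n_old):
    """z statistic for 'new is better than old' (larger mean assumed better)."""
    se = sqrt(sd_new ** 2 / n_new + sd_old ** 2 / n_old)
    return (mean_new - mean_old) / se


def combination_beats_both(comb, comp_a, comp_b, alpha=0.05):
    """Reject only if the combination is significantly better than BOTH components.

    Each argument is a (mean, standard deviation, sample size) tuple.
    """
    z_a = one_sided_z(*comb, *comp_a)
    z_b = one_sided_z(*comb, *comp_b)
    z_min = min(z_a, z_b)                 # the weaker comparison decides
    p = 1.0 - NormalDist().cdf(z_min)     # one-sided p value for that comparison
    return z_min, p, p < alpha


# Hypothetical summary data: (mean effect, SD, n) per arm.
z_min, p, reject = combination_beats_both((12.0, 8.0, 100), (9.0, 8.0, 100), (8.0, 8.0, 100))
print(f"min z = {z_min:.2f}, one-sided p = {p:.4f}, reject H0 = {reject}")

# If component A is nearly as good as the combination, the claim fails.
_, p_close, reject_close = combination_beats_both((12.0, 8.0, 100), (11.5, 8.0, 100), (8.0, 8.0, 100))
```

Because the decision rests on the weaker of the two comparisons, a combination that clearly beats one component but not the other does not support the claim.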

Case 6

For exploring dose response, one may develop a reasonable argument for using a one-sided test. In antihypertension drug studies, this is commonly done. Consider an $(r + 1) \times (s + 1)$ factorial trial in which the combination of drugs A and B is studied. Let the doses chosen for study be coded as $0, 1, \ldots, r$ for drug A and $0, 1, \ldots, s$ for drug B, where $r, s \geq 1$. The dose combination $(0, 0)$ represents the placebo. Let $m_{ij}$ denote the true mean effect of the ij-th dose combination. For $i \in \{1, \ldots, r\}$ and $j \in \{1, \ldots, s\}$, let

$$\theta_{ij} = \min(m_{ij} - m_{i0},\ m_{ij} - m_{0j}),$$

which quantifies the minimum advantage in drug effect of the ij-th dose combination as compared to its components. A positive $\theta_{ij}$ indicates that the ij-th dose combination is therapeutically more effective than either of its components. Let $\theta = \max(\theta_{ij})$, where Max stands for taking the maximum over the lattice

$$\{(i, j) : i = 1, \ldots, r;\ j = 1, \ldots, s\}.$$

Then a drug sponsor may test (8, 9)

$$H_0: \theta \leq 0 \quad \text{vs.} \quad H_a: \theta > 0.$$
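The quantity being tested can be made concrete with a small calculation. The sketch below assumes that the advantage of each dose combination is the smaller of its two differences from the corresponding single-drug doses, which equals the combination mean minus the larger of the two component means. It computes plug-in values from a hypothetical table of mean effects and does not reproduce the inferential procedures of references 8 and 9; the function name and the numbers are illustrative only.

```python
def theta_lattice(m):
    """Minimum advantage of each dose combination over its components.

    m[i][j] is the mean effect at dose i of drug A and dose j of drug B;
    m[0][0] is placebo, m[i][0] is drug A alone, m[0][j] is drug B alone.
    Larger values are assumed to mean a better effect.
    """
    r, s = len(m) - 1, len(m[0]) - 1
    theta = {
        (i, j): m[i][j] - max(m[i][0], m[0][j])  # same as min of the two differences
        for i in range(1, r + 1)
        for j in range(1, s + 1)
    }
    best = max(theta, key=theta.get)  # lattice point with the largest advantage
    return theta, best, theta[best]


# Hypothetical 3 x 3 factorial (r = s = 2); rows index drug A, columns drug B.
means = [
    [0.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [5.0, 5.5, 8.0],
]
theta, best, theta_max = theta_lattice(means)
print(theta)
print(f"largest advantage {theta_max} at dose combination {best}")
```

In this hypothetical table every combination exceeds both of its components, and the largest advantage occurs at the highest dose of each drug.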


Case 7

Two-sided tests are usual for active control studies in AIDS and anticancer drugs.

Comment

Special consideration may be made in life-threatening disease situations under the treatment IND regulation and the kind of new drug under investigation.

Case 8

Two-sided tests are routinely used for placebo-controlled trials in psychotropic drugs.

Concluding Comments

In this paper many examples and views have been discussed which make a compelling point in favor of applying a two-sided test. Additionally, several situations have been discussed where one may reasonably argue for use of a one-sided test. In controversial situations, prudence suggests the use of two-sided tests.

Acknowledgments

The author would like to express his gratitude to his colleague Dr. S. Edward Nevius for thoughtful review of this paper. He would also like to express his thanks to Mr. John Leahy and the Computer Applications Group for assistance rendered in the production of this manuscript.


References

1. Armitage P: Statistical Methods in Medical Research. Wiley, New York, 1971, p 104.
2. Mainland D: Elementary Medical Statistics. Saunders, Philadelphia, 1963, p 222.
3. Lehmann EL: Nonparametrics: Statistical Methods Based on Ranks. Holden-Day, San Francisco, 1975, p 26.
4. Ellenberg JH: Biostatistical collaboration in medical research. Biometrics 46(1): 1-32, 1990.
5. Fleiss J: Some thoughts on two-tailed tests (Letter to the Editor). Controlled Clinical Trials 8: 394, 1987.
6. Friedman LM, Furberg CD, DeMets DL: Fundamentals of Clinical Trials, 2nd ed. PSG, Littleton, MA, 1985.
7. Leung HM, O'Neill RT: Statistical assessment of combination drugs: A regulatory view. 1986 Proceedings of the Biopharmaceutical Section of the American Statistical Association, ASA, Washington, DC, 1986, pp 33-36.
8. Ng T, Hung HMJ, Chi GYH: A new statistical method for testing a combination therapy's superiority to both of its components. Unpublished report, 1990.
9. Hung HMJ: Testing for global superiority of a combination agent in the presence of baseline imbalance. Unpublished report, 1990.
10. Cochran WG, Cox GM: Experimental Designs, 2nd ed. Wiley, New York, 1966, p 18.

DESI Administrative Hearings

1. Vioform-Hydrocortisone (HC), Docket No. 80N-0012.
2. Certain Combination Drugs Containing Antibiotics and Antifungal Agents, Docket No. 82N-0153.
3. Deprol Tablets, Docket No. 85N-0083.
4. Certain Single-Entity Coronary Vasodilators Containing Pentaerythritol Tetranitrate (PETN), Docket No. 87N-0262.
