This article was downloaded by: [The University of Texas at El Paso] On: 08 November 2014, At: 11:05 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

Journal of Biopharmaceutical Statistics Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/lbps20

Equivalence Tests for Interchangeability Based on Two One-Sided Probabilities

Xiaoyu Dong, Yi Tsong & Meiyu Shen

Office of Biostatistics/Office of Translational Sciences, Center for Drug Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, Maryland, USA

Accepted author version posted online: 17 Jul 2014. Published online: 31 Oct 2014.

To cite this article: Xiaoyu Dong, Yi Tsong & Meiyu Shen (2014) Equivalence Tests for Interchangeability Based on Two One-Sided Probabilities, Journal of Biopharmaceutical Statistics, 24:6, 1332-1348, DOI: 10.1080/10543406.2014.941987 To link to this article: http://dx.doi.org/10.1080/10543406.2014.941987


Journal of Biopharmaceutical Statistics, 24: 1332–1348, 2014 ISSN: 1054-3406 print/1520-5711 online DOI: 10.1080/10543406.2014.941987

EQUIVALENCE TESTS FOR INTERCHANGEABILITY BASED ON TWO ONE-SIDED PROBABILITIES


Xiaoyu Dong, Yi Tsong, and Meiyu Shen

Office of Biostatistics/Office of Translational Sciences, Center for Drug Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, Maryland, USA

A test treatment is considered to be interchangeable with its reference treatment if they are equivalent and expected to produce the same clinical result in any given patient. To assess interchangeability, FDA Draft Guidance (1999) and Guidance for Industry (2001, 2003) recommend using individual bioequivalence (IBE) and population bioequivalence (PBE) procedures. Chow (1999) and Chow and Liu (1999) gave a discussion on the limitation of the aggregate criteria of the IBE and PBE proposed therein. They mentioned that it is not clear whether IBE or PBE can imply average bioequivalence. Alternative approaches have been proposed to address the weakness of IBE and PBE. Dong et al. (2014) discuss the tolerance interval method and an approximate test for interchangeability defined by a two-sided probability. These tests may not be able to test for the two one-sided tests (TOST) with asymmetric margins around the true mean difference. In addition, the tests of two-sided probability provide no direction when failing the equivalence in interchangeability. Thus, we reexamine the statistical properties of the two one-sided tolerance interval approaches proposed by Tsong and Shen (2007, 2008). In this project, we extend their approach for parallel arms trials and paired/crossover data without the assumption of equal sample sizes and variances. We also develop the exact power function and assess the type I error rate of our proposed approach. In addition, we study the sample size determination based on the interchangeability testing utilizing the tolerance interval method.

Key Words: Interchangeability; Power; Sample size; TOST; Type I error rate.

This article not subject to US copyright law. Received October 7, 2013; Accepted May 7, 2014. Address correspondence to Xiaoyu Dong, PhD, Office of Biostatistics/Office of Translational Sciences, Center for Drug Evaluation and Research, FDA, 10903 New Hampshire Ave., Building 21, Room 3617, Silver Spring, MD 20993, USA; E-mail: [email protected]

1. INTRODUCTION

Individual bioequivalence and population bioequivalence have been proposed in FDA Draft Guidance (1999) and Guidance for Industry (2001, 2003) to assess interchangeability between test and reference treatments. As an alternative approach, the probability that the response difference between treatments is bounded within a prespecified interval was proposed to establish interchangeability. For example, Esinhart and Chinchilli (1994, 1996) proposed to assess interchangeability with a tolerance-interval-based measurement in a replicated crossover design. Tse et al. (2006) proposed to measure this two-sided probability and test the corresponding hypothesis using an approximate approach based on Taylor expansion. As discussed in Dong et al. (2014), these approaches are two-sided tests with respect to the proportion of the response difference covered by a prespecified interval (L, U). However, in equivalence assessment, interchangeability can also be assessed by the tail probabilities, which are the proportions of the response difference beyond the lower and upper equivalence margins. Sometimes we also wish to detect the direction of the difference. In these circumstances, two one-sided tests are more appropriate for interchangeability assessment. Earlier work in this area includes Liu and Chow (1997) as well as Tsong and Shen (2007, 2008), who proposed a parametric tolerance interval approach under the normality assumption to assess interchangeability defined by tail probabilities. One advantage of the tolerance interval is that it is equivalent to testing two one-sided hypotheses on the tail probabilities. Under the normality assumption, the two one-sided hypotheses can be further expressed as linear combinations of the mean and standard deviation (Tsong and Shen, 2007, 2008). However, the tolerance intervals or test statistics provided in these earlier studies assume either equal sample sizes or equal variances.

There are two aims of this article. The first is to extend the two one-sided tolerance-interval approach proposed by Liu and Chow (1997) and Tsong and Shen (2007, 2008) to unequal sample sizes and variances for parallel studies. Unequal sample sizes or variances are likely to happen in parallel studies. The results can be easily simplified to the equal sample size and equal variance cases. The second aim is to develop the exact sample size and power relationship for the interchangeability testing. With these aims, our article is arranged as follows. In section 2, we introduce the notation and the two one-sided tests (TOST).
In particular, the test statistics and the parametric tolerance interval method are proposed for the parallel-arm study with and without Satterthwaite's approximation, as well as for paired and crossover trials. In section 3, we derive the power function for the two one-sided hypotheses for the parallel-arm study and paired/crossover data. In section 4, we show that the TOST for interchangeability is a size-α test and compute the type I error rate and power under various parameter scenarios. Section 5 shows the sample sizes required for different levels of power. The conclusions are summarized in section 6.

2. NOTATION AND HYPOTHESES

2.1. TOST for Parallel-Arm Study

To assess the interchangeability between treatments, the first trial design we consider is the parallel-arm design. This design can be used for drug or biological products with a long half-life. The patient responses in the test (T) arm, $X_T$, and in the reference (R) arm, $X_R$, are assumed to be normally distributed with

$$\begin{pmatrix} X_T \\ X_R \end{pmatrix} \sim N\left( \begin{pmatrix} \mu_T \\ \mu_R \end{pmatrix}, \begin{pmatrix} \sigma_T^2 & 0 \\ 0 & \sigma_R^2 \end{pmatrix} \right).$$

Then we can measure the interchangeability between T and R through the distribution of the response difference. More specifically, interchangeability is measured by the tail probabilities, which are the proportions of the response difference beyond the prespecified limits $L$ and $U$. Following the preceding definition, the two one-sided tests for interchangeability can be written as

$$H_{0L}: \Pr(X_T - X_R < L) \ge P_1 \quad \text{vs.} \quad H_{aL}: \Pr(X_T - X_R < L) < P_1$$

$$H_{0U}: \Pr(X_T - X_R > U) \ge P_2 \quad \text{vs.} \quad H_{aU}: \Pr(X_T - X_R > U) < P_2$$

where $P_1$ and $P_2$ are usually small with $0 < P_1 < 1$ and $0 < P_2 < 1$. Rejecting $H_{0L}$ supports the claim that less than $100P_1\%$ of the differences $X_T - X_R$ fall below $L$. Rejecting $H_{0U}$ supports the claim that less than $100P_2\%$ of the differences $X_T - X_R$ exceed $U$. The two products are interchangeable if both tail probabilities are small, that is, $\Pr(X_T - X_R < L) < P_1$ and $\Pr(X_T - X_R > U) < P_2$. Under the normality assumption, the preceding hypotheses are equivalent to testing the percentiles of the distribution of $X_T - X_R$ as follows:

$$H_{0L}: \Delta - \sqrt{\sigma_T^2 + \sigma_R^2}\, Z_{1-P_1} \le L \quad \text{vs.} \quad H_{aL}: \Delta - \sqrt{\sigma_T^2 + \sigma_R^2}\, Z_{1-P_1} > L \tag{1}$$

$$H_{0U}: \Delta + \sqrt{\sigma_T^2 + \sigma_R^2}\, Z_{1-P_2} \ge U \quad \text{vs.} \quad H_{aU}: \Delta + \sqrt{\sigma_T^2 + \sigma_R^2}\, Z_{1-P_2} < U \tag{2}$$

where $Z_p$ is the $100p$th percentile of the standard normal distribution, $\Delta = \mu_T - \mu_R$, and $\sigma_T^2$ and $\sigma_R^2$ are the variances of the test and the reference products. We will conclude interchangeability if both hypotheses (1) and (2) are rejected at the α level. Usually, $\Delta$, $\sigma_T^2$, and $\sigma_R^2$ are unknown and need to be estimated from the sample. Following the definition in Hall (1984), we define the sample pooled variance $S^2$ by

$$S^2 = \left(1 + \frac{1}{R}\right) \frac{(n_T - 1)S_T^2 + R(n_R - 1)S_R^2}{n_T + n_R - 2}$$

with $R = \sigma_T^2/\sigma_R^2$ assumed to be known. If the variance ratio $R$ is unknown, we can use $\hat{R} = S_T^2 (n_R - 3) / [S_R^2 (n_R - 1)]$ proposed by Hall (1984). Then it can be shown that

$$\frac{(n_T + n_R - 2) S^2}{\sigma_T^2 + \sigma_R^2} \sim \chi^2_{n_T + n_R - 2}$$

where $\chi^2_v$ is the chi-squared distribution with $v$ degrees of freedom. To test $H_{0L}$ in equation (1), let us consider the following test statistic:

$$T_L(N) = \frac{\bar{X}_T - \bar{X}_R - L}{S}.$$

On the boundary of $H_{0L}$, $L = \Delta - \sqrt{\sigma_T^2 + \sigma_R^2}\, Z_{1-P_1}$. Thus, $T_L(N)$ can be written as

$$T_L(N)|H_{0L} = \frac{\hat{\Delta} - L}{S} = \frac{\hat{\Delta} - \left(\Delta - \sqrt{\sigma_T^2 + \sigma_R^2}\, Z_{1-P_1}\right)}{S} = \left( \frac{\hat{\Delta} - \Delta}{\sqrt{\sigma_T^2/n_T + \sigma_R^2/n_R}} + \frac{Z_{1-P_1} \sqrt{\sigma_T^2 + \sigma_R^2}}{\sqrt{\sigma_T^2/n_T + \sigma_R^2/n_R}} \right) \Bigg/ \left( \frac{S}{\sqrt{\sigma_T^2/n_T + \sigma_R^2/n_R}} \right). \tag{3}$$

Let $\hat{\Delta} = \bar{X}_T - \bar{X}_R$ and $Z = (\hat{\Delta} - \Delta)/\sqrt{\sigma_T^2/n_T + \sigma_R^2/n_R} \sim N(0, 1)$; then

$$T_L(N)|H_{0L} = \frac{Z + Z_{1-P_1}/\sqrt{a}}{\sqrt{\dfrac{\chi^2_{n_T+n_R-2}\,(\sigma_T^2 + \sigma_R^2)}{(n_T + n_R - 2)\,(\sigma_T^2/n_T + \sigma_R^2/n_R)}}} = \frac{Z + Z_{1-P_1}/\sqrt{a}}{\sqrt{\dfrac{\chi^2_{n_T+n_R-2}}{n_T + n_R - 2} \cdot \dfrac{\sigma_T^2 + \sigma_R^2}{\sigma_T^2/n_T + \sigma_R^2/n_R}}} = \sqrt{a} \times \frac{Z + Z_{1-P_1}/\sqrt{a}}{\sqrt{\chi^2_{n_T+n_R-2}/(n_T + n_R - 2)}} \sim \sqrt{a} \times t_{n_T+n_R-2}\left(Z_{1-P_1}/\sqrt{a}\right)$$

where $t_v(\gamma)$ denotes the noncentral t-distribution with $v$ degrees of freedom and noncentrality parameter $\gamma$. The constant $a$ is defined by

$$a = \frac{\sigma_T^2/n_T + \sigma_R^2/n_R}{\sigma_T^2 + \sigma_R^2} = \frac{1}{n_T} + \left( \frac{1}{n_R} - \frac{1}{n_T} \right) \frac{1}{1 + R}. \tag{4}$$
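The quantities introduced so far can be checked numerically. The following is a minimal sketch, not the authors' code: the function names and the illustrative parameter values are assumptions made here, and only the Python standard library is used. It evaluates the two tail probabilities, the percentile form of hypotheses (1) and (2), the constant $a$ of equation (4), and the pooled variance $S^2$ with a known variance ratio $R$.

```python
import math
from statistics import NormalDist


def tail_probabilities(delta, var_t, var_r, lower, upper):
    """Return (Pr(X_T - X_R < L), Pr(X_T - X_R > U)) under normality."""
    diff = NormalDist(mu=delta, sigma=math.sqrt(var_t + var_r))
    return diff.cdf(lower), 1.0 - diff.cdf(upper)


def percentile_bounds(delta, var_t, var_r, p1, p2):
    """Left-hand sides of hypotheses (1) and (2)."""
    sd = math.sqrt(var_t + var_r)
    std = NormalDist()
    return delta - sd * std.inv_cdf(1.0 - p1), delta + sd * std.inv_cdf(1.0 - p2)


def constant_a(n_t, n_r, ratio):
    """Equation (4), written in terms of R = var_t / var_r."""
    return 1.0 / n_t + (1.0 / n_r - 1.0 / n_t) / (1.0 + ratio)


def pooled_variance(s2_t, s2_r, n_t, n_r, ratio):
    """Pooled variance S^2 for a known variance ratio R."""
    weighted = (n_t - 1) * s2_t + ratio * (n_r - 1) * s2_r
    return (1.0 + 1.0 / ratio) * weighted / (n_t + n_r - 2)


# Illustrative values only (hypothetical, not taken from the article's tables).
delta, var_t, var_r = 0.1, 0.3, 0.2
lower, upper, p1, p2 = -1.282, 1.282, 0.10, 0.10
pr_low, pr_high = tail_probabilities(delta, var_t, var_r, lower, upper)
bound_low, bound_high = percentile_bounds(delta, var_t, var_r, p1, p2)

# The tail-probability and percentile formulations give the same verdicts.
print((pr_low < p1) == (bound_low > lower), (pr_high < p2) == (bound_high < upper))
print(constant_a(20, 40, var_t / var_r))
```

When the sample variances equal the true variances, `pooled_variance` returns $\sigma_T^2 + \sigma_R^2$, which is the scaling that the chi-squared result above relies on.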

With equal sample sizes, $n_T = n_R = n$ and $a = 1/n$; with equal variances, $a = (1/n_T + 1/n_R)/2$. $H_{0L}$ is rejected if

$$T_L(N)|H_{0L} > \sqrt{a} \times t_{n_T+n_R-2,\,1-\alpha}\left(Z_{1-P_1}/\sqrt{a}\right) \tag{5}$$

where $t_{n_T+n_R-2,\,1-\alpha}\left(Z_{1-P_1}/\sqrt{a}\right)$ is the $1 - \alpha$ percentile of $t_{n_T+n_R-2}\left(Z_{1-P_1}/\sqrt{a}\right)$. Similarly, the following test statistic can be derived for testing $H_{0U}$:

$$T_U(N)|H_{0U} = \frac{\hat{\Delta} - U}{S} = \frac{\hat{\Delta} - \left(\Delta + \sqrt{\sigma_T^2 + \sigma_R^2}\, Z_{1-P_2}\right)}{S} \sim \sqrt{a} \times t_{n_T+n_R-2}\left(-Z_{1-P_2}/\sqrt{a}\right).$$

$H_{0U}$ is rejected if $T_U(N)|H_{0U}$ [...] $t_{v,1-\alpha}\left(Z_{1-P_1}/\sqrt{a}\right)$ and $T_U(N)|H_{0U} < t_{v,\alpha}\left(-Z_{1-P_2}/\sqrt{a}\right)$. Again, the hypothesis testing just shown is equivalent to applying a one-sided tolerance-interval method. With $P_1 = P_2 = P$, interchangeability will be concluded if the tolerance interval $(\hat{\Delta} - kS_{SW},\ \hat{\Delta} + kS_{SW})$ is covered by $(L, U)$ with $k = t_{v,1-\alpha}\left(Z_{1-P}/\sqrt{a}\right)$. With equal sample sizes and equal variances, $S_{SW}^2 = (S_T^2 + S_R^2)/n$, $v = 2(n - 1)$, and $n(n - 1)S_{SW}^2/\sigma^2 \sim \chi^2_{2(n-1)}$ exactly.

2.3. TOST for Paired Data/Crossover Trial

If the data are naturally paired, we can obtain the paired difference variable $d_i = X_{Ti} - X_{Ri}$; then

$$d_i \sim N(\Delta, \sigma_d^2)$$

with mean $\mu_d = \Delta = \mu_T - \mu_R$ and variance $\sigma_d^2 = \sigma_T^2 + \sigma_R^2 - 2\rho\sigma_T\sigma_R$, where $\rho$ is the correlation and $i = 1, \ldots, n$. In this case, the two one-sided probabilities can be represented by the following linear combinations of the parameters from one sample:

$$H_{0L}: \Delta - \sigma_d Z_{1-P_1} \le L \quad \text{vs.} \quad H_{aL}: \Delta - \sigma_d Z_{1-P_1} > L$$

$$H_{0U}: \Delta + \sigma_d Z_{1-P_2} \ge U \quad \text{vs.} \quad H_{aU}: \Delta + \sigma_d Z_{1-P_2} < U$$

with $\hat{\Delta} = \bar{X}_T - \bar{X}_R \sim N(\Delta, \sigma_d^2/n)$ and $S_d^2 \sim \sigma_d^2 \chi^2_{n-1}/(n - 1)$. The one-sided tolerance factor $k$ follows from the one-sample formula, $k = t_{n-1,1-\alpha}\left(Z_{1-P}\sqrt{n}\right)/\sqrt{n}$. Conclude interchangeability if $(\bar{X}_d - kS_d,\ \bar{X}_d + kS_d)$ is completely within $(L, U)$.
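The tolerance factors above require a percentile of the noncentral t-distribution. The sketch below is one possible way to obtain it with the Python standard library alone, assuming the representation $t_v(\delta) = (Z + \delta)/\sqrt{W/v}$ with $W \sim \chi^2_v$; the cumulative probability is integrated over $W$ by Simpson's rule and the percentile is found by bisection. All function names are hypothetical, and the integration settings are a convenience choice rather than anything prescribed by the article (they are intended for roughly three or more degrees of freedom).

```python
import math
from statistics import NormalDist


def norm_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def chi2_pdf(w, df):
    if w <= 0.0:
        return 0.0
    half = df / 2.0
    return math.exp((half - 1.0) * math.log(w) - w / 2.0
                    - half * math.log(2.0) - math.lgamma(half))


def nct_cdf(t, df, nc, steps=600):
    """P(T <= t) for T = (Z + nc) / sqrt(W / df), W ~ chi2(df), via Simpson's rule."""
    top = df + 10.0 * math.sqrt(2.0 * df) + 30.0  # upper limit covering the chi2 mass
    h = top / steps
    total = 0.0
    for i in range(steps + 1):
        w = i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * norm_cdf(t * math.sqrt(w / df) - nc) * chi2_pdf(w, df)
    return total * h / 3.0


def nct_ppf(p, df, nc):
    """Percentile of the noncentral t-distribution by bisection."""
    lo, hi = nc - 50.0, nc + 50.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if nct_cdf(mid, df, nc) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def k_parallel(n_t, n_r, ratio, p, alpha):
    """k = sqrt(a) * t_{nT+nR-2, 1-alpha}(Z_{1-P} / sqrt(a)) for the parallel design."""
    a = 1.0 / n_t + (1.0 / n_r - 1.0 / n_t) / (1.0 + ratio)
    z = NormalDist().inv_cdf(1.0 - p)
    return math.sqrt(a) * nct_ppf(1.0 - alpha, n_t + n_r - 2, z / math.sqrt(a))


def k_paired(n, p, alpha):
    """k = t_{n-1, 1-alpha}(Z_{1-P} * sqrt(n)) / sqrt(n) for paired differences."""
    z = NormalDist().inv_cdf(1.0 - p)
    return nct_ppf(1.0 - alpha, n - 1, z * math.sqrt(n)) / math.sqrt(n)


def conclude_interchangeable(mean_diff, sd, k, lower, upper):
    """Tolerance interval (mean - k*sd, mean + k*sd) lies completely within (L, U)."""
    return lower < mean_diff - k * sd and mean_diff + k * sd < upper


print(round(k_parallel(20, 20, 1.0, 0.10, 0.05), 2))
print(round(k_paired(20, 0.10, 0.05), 3))
```

For the parallel design with $(n_T, n_R) = (20, 20)$, $R = 1$, $P = 0.10$, and $\alpha = 5\%$, the article's Table 1 reports $k = 1.79$, which gives a reference point for the first printed value.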

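3. POWER FUNCTION

After introducing the test procedure of TOST, we derive its exact power function for the interchangeability assessment in this section. First, let us consider the parallel design. To simplify the notation, let $\sigma_d^2$ denote the total variability of the response difference, $\sigma_d^2 = \sigma_T^2/n_T + \sigma_R^2/n_R$, and assume equal tail probabilities with $P_1 = P_2 = P$; then we have $k = \sqrt{a} \times t_{n_T+n_R-2,\,1-\alpha}\left(Z_{1-P}/\sqrt{a}\right)$. The overall power of rejecting both equations (1) and (2), $\gamma(n)$, can be written as

$$\gamma(n) = \Pr\left(R_{H_{0L}} \cap R_{H_{0U}} \mid \Delta, \sigma_T^2, \sigma_R^2\right) = \Pr\left(L < \hat{\Delta} - kS < \hat{\Delta} + kS < U \mid \Delta, \sigma_T^2, \sigma_R^2\right) = \Pr\left( \frac{L - \Delta}{\sigma_d} + k\frac{S}{\sigma_d} < \frac{\hat{\Delta} - \Delta}{\sigma_d} < \frac{U - \Delta}{\sigma_d} - k\frac{S}{\sigma_d} \right).$$

To further derive the function, let $Z = (\hat{\Delta} - \Delta)/\sigma_d$ and $Y^2 = (n_T + n_R - 2)S^2/(\sigma_T^2 + \sigma_R^2)$. Then we have $Z \sim N(0, 1)$ and $Y^2 \sim \chi^2_{n_T+n_R-2}$. The power function $\gamma(n)$ can be expressed as

$$\gamma(n) = \Pr\left( \frac{L - \Delta}{\sigma_d} + k\sqrt{\frac{S^2}{\sigma_T^2 + \sigma_R^2} \cdot \frac{\sigma_T^2 + \sigma_R^2}{\sigma_d^2}} < Z < \frac{U - \Delta}{\sigma_d} - k\sqrt{\frac{S^2}{\sigma_T^2 + \sigma_R^2} \cdot \frac{\sigma_T^2 + \sigma_R^2}{\sigma_d^2}} \right) = \Pr\left( \frac{L - \Delta}{\sigma_d} + \frac{k}{\sqrt{a}}\sqrt{\frac{Y^2}{n_T + n_R - 2}} < Z < \frac{U - \Delta}{\sigma_d} - \frac{k}{\sqrt{a}}\sqrt{\frac{Y^2}{n_T + n_R - 2}} \right)$$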

U− k Y2 L = lim Pθl − lim Pθl (X ∈ R1 ) = lim Pθl  − k σS > σ

l→∞

l→∞

l→∞

( √  U−L −Z(1+P)/2 T +nR −2,δ= n σ

= 1 − lim Gn


σ →0


L− σ



√ nk) = 1

where $G_{v,\delta}(x)$ is the cumulative distribution function of the noncentral t-distribution with $v$ degrees of freedom and noncentrality parameter $\delta$. Based on Theorem 2 in Berger and Hsu (1996), the intersection–union test with rejection region $R = R_1 \cap R_2$ for TOST of interchangeability is a size-α test.

We also compute the rejection rate based on the derived power functions (9) and (11) for the parallel-arm study with $P_1 = P_2 = P$. Let

$$A(L) = \Delta - \sqrt{\sigma_T^2 + \sigma_R^2}\, Z_{1-P} - L \quad \text{and} \quad B(U) = U - \Delta - \sqrt{\sigma_T^2 + \sigma_R^2}\, Z_{1-P}.$$

With the preceding definition, $A(L) > 0$ and $B(U) > 0$ represent the parameter region in which the test and reference treatments are interchangeable. Thus, the rejection rate in this region is the power. The rejection rate in the other regions corresponds to the type I error rate. In our computation, we set $P = 0.10$. In other words, we are testing whether the two tail probabilities are both less than 10%. We also assume the true mean difference between treatments $\Delta = 0$ and the total variability $\sigma_T^2 + \sigma_R^2 = 1$ at the boundary of $H_{0L}$ and $H_{0U}$. With these prespecified values, $(L, U) = (-1.282, 1.282)$. Given any combination of $A(L)$ and $B(U)$, we can solve for the values of $\Delta$ and $\sigma_T^2 + \sigma_R^2$. Under equal sample size, we have shown that for a given total variability, the power function is independent of the variance ratio $R$. Thus, we just need to compute the rejection rate when $R = \sigma_T^2/\sigma_R^2 = 1$. In Table 1, we show the rejection rates for various values of $A(L)$ and $B(U)$ with equal sample sizes $(n_T, n_R) = (20, 20)$ and $(50, 50)$, variance ratio $R = 1$, and $\alpha = 5\%$ on each hypothesis of TOST. As Table 1 shows, the type I error rate achieves the nominal level of 5% when one of the one-sided hypotheses is on the boundary, with either $A(L) = 0$ or $B(U) = 0$, and the other hypothesis is well inside the alternative space, with $A(L) > 0$ or $B(U) > 0$. Second, for a given value of $A(L)$, the rejection rate increases monotonically with $B(U)$. A similar trend is also observed for a given value of $B(U)$. The type I error rate with either $A(L) < 0$ or $B(U) < 0$ is more conservative for larger sample sizes. When $A(L) > 0$ and $B(U) > 0$, the power increases with the sample size, as expected.

In addition, we compute the rejection rate with unequal sample sizes and various values of the variance ratio $R$. Table 2 gives the results with $(n_T, n_R) = (20, 40)$, variance ratios $R = \sigma_T^2/\sigma_R^2 = 0.5$, 1, and 2, and $\alpha = 5\%$ on each hypothesis of TOST.
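The mapping between the margins $A(L)$, $B(U)$ and the parameters $(\Delta, \sigma_T^2 + \sigma_R^2)$ can be written out explicitly. The sketch below is an illustration under the stated definitions, with hypothetical function names: adding the two definitions gives $A(L) + B(U) = U - L - 2\sqrt{\sigma_T^2 + \sigma_R^2}\, Z_{1-P}$, and subtracting them gives $A(L) - B(U) = 2\Delta - L - U$, which is what the inverse function solves.

```python
import math
from statistics import NormalDist


def margins(delta, total_var, lower, upper, p):
    """Return (A(L), B(U)) for a mean difference and total variance."""
    z = NormalDist().inv_cdf(1.0 - p)
    sd = math.sqrt(total_var)
    return delta - sd * z - lower, upper - delta - sd * z


def params_from_margins(a_l, b_u, lower, upper, p):
    """Invert the definitions: return (delta, total variance) for given margins.

    Requires a_l + b_u < upper - lower so that the standard deviation is positive.
    """
    z = NormalDist().inv_cdf(1.0 - p)
    sd = (upper - lower - a_l - b_u) / (2.0 * z)
    delta = (a_l - b_u + lower + upper) / 2.0
    return delta, sd * sd


def region(a_l, b_u):
    """Rejection rate is power when both margins are positive, type I error otherwise."""
    return "power" if a_l > 0 and b_u > 0 else "type I error"


lower, upper, p = -1.282, 1.282, 0.10
delta, total_var = params_from_margins(0.5, 0.5, lower, upper, p)
print(delta, total_var, region(0.5, 0.5))
print(margins(delta, total_var, lower, upper, p))
```

With $A(L) = B(U) = 0$ the inverse returns $\Delta = 0$ and a total variance very close to 1, matching the boundary setting described in the text.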
Table 3 lists the results with a similar setting as in Table 2 but with larger sample sizes of $(n_T, n_R) = (50, 100)$. As Tables 2 and 3 show, the rejection rate decreases with the variance ratio $R = \sigma_T^2/\sigma_R^2$ when $n_T < n_R$. This can be explained as follows. With $n_T < n_R$, $a = \frac{1}{n_T} + \left(\frac{1}{n_R} - \frac{1}{n_T}\right)\frac{1}{1+R}$ is an increasing function of $R$. The tolerance factor $k = \sqrt{a} \times t_{n_T+n_R-2,\,1-\alpha}\left(Z_{1-P}/\sqrt{a}\right)$ is also an increasing function of $a$. Thus, $k$ increases with the variance ratio $R$. As a result, the tolerance interval $(\hat{\Delta} - kS,\ \hat{\Delta} + kS)$ is wider for larger values of $R$. This leads to a lower rejection rate. In other words, the probability of concluding interchangeability is smaller with larger values of $R = \sigma_T^2/\sigma_R^2$ when $n_T < n_R$.

Table 1. Rejection rate (%) of the two one-sided tests (TOST) given the values of A(L) and B(U) with α = 5% for each one-sided hypothesis with equal sample sizes (nT, nR) = (20, 20) and (50, 50) and variance ratio R = 1, with (L, U) = (–1.282, 1.282), P = 0.10, total variance = 1, Δ = 0. Rows are indexed by B(U) and columns by A(L).

(nT, nR) = (20, 20), R = 1, k = 1.79
B(U) \ A(L)     −1.00    −0.75    −0.50    −0.25     0.00     0.25     0.50     0.75     1.00
−1.00           0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000
−0.75           0.000    0.000    0.000    0.000    0.000    0.000    0.001    0.001    0.000
−0.50           0.000    0.000    0.000    0.000    0.001    0.004    0.011    0.015    0.008
−0.25           0.000    0.000    0.000    0.001    0.011    0.062    0.202    0.301    0.222
 0.00           0.000    0.000    0.001    0.011    0.109    0.767    2.864    4.768    4.999
 0.25           0.000    0.000    0.004    0.062    0.767    6.176   23.436   38.756   49.277
 0.50           0.000    0.001    0.011    0.202    2.864   23.436   73.337   94.187   98.978
 0.75           0.000    0.001    0.015    0.301    4.768   38.756   94.187   99.998  100.000
 1.00           0.000    0.000    0.008    0.222    4.999   49.277   98.978  100.000  100.000

(nT, nR) = (50, 50), R = 1, k = 1.58
B(U) \ A(L)     −1.00    −0.75    −0.50    −0.25     0.00     0.25     0.50     0.75     1.00
−1.00           0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000
−0.75           0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000
−0.50           0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000
−0.25           0.000    0.000    0.000    0.000    0.001    0.014    0.050    0.035    0.014
 0.00           0.000    0.000    0.000    0.001    0.058    1.326    4.567    5.000    5.000
 0.25           0.000    0.000    0.000    0.014    1.326   24.773   61.434   73.480   85.466
 0.50           0.000    0.000    0.000    0.050    4.567   61.434   99.590   99.990  100.000
 0.75           0.000    0.000    0.000    0.035    5.000   73.480   99.990  100.000  100.000
 1.00           0.000    0.000    0.000    0.014    5.000   85.466  100.000  100.000  100.000

Table 2. Rejection rate (%) of the two one-sided tests (TOST) given the values of A(L) and B(U) with α = 5% for each one-sided hypothesis with unequal sample sizes (nT, nR) = (20, 40) and variance ratio R = 0.5, 1, and 2, with (L, U) = (–1.282, 1.282), P = 0.10, total variance = 1, Δ = 0. Rows are indexed by B(U) and columns by A(L).

(nT, nR) = (20, 40), R = 0.5, k = 1.68
B(U) \ A(L)     −1.00    −0.75    −0.50    −0.25     0.00     0.25     0.50     0.75     1.00
−1.00           0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000
−0.75           0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000
−0.50           0.000    0.000    0.000    0.000    0.000    0.001    0.003    0.003    0.001
−0.25           0.000    0.000    0.000    0.000    0.004    0.034    0.125    0.144    0.082
 0.00           0.000    0.000    0.000    0.004    0.080    0.924    3.646    4.968    5.000
 0.25           0.000    0.000    0.001    0.034    0.924   11.100   38.478   52.955   65.509
 0.50           0.000    0.000    0.003    0.125    3.646   38.478   92.400   99.198   99.958
 0.75           0.000    0.000    0.003    0.144    4.968   52.955   99.198  100.000  100.000
 1.00           0.000    0.000    0.001    0.082    5.000   65.509   99.958  100.000  100.000

(nT, nR) = (20, 40), R = 1, k = 1.70
B(U) \ A(L)     −1.00    −0.75    −0.50    −0.25     0.00     0.25     0.50     0.75     1.00
−1.00           0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000
−0.75           0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000
−0.50           0.000    0.000    0.000    0.000    0.000    0.001    0.003    0.003    0.001
−0.25           0.000    0.000    0.000    0.000    0.003    0.027    0.119    0.165    0.100
 0.00           0.000    0.000    0.000    0.003    0.058    0.728    3.298    4.936    5.000
 0.25           0.000    0.000    0.001    0.027    0.728    9.188   35.095   50.320   62.607
 0.50           0.000    0.000    0.003    0.119    3.298   35.095   90.054   98.802   99.921
 0.75           0.000    0.000    0.003    0.165    4.936   50.320   98.802  100.000  100.000
 1.00           0.000    0.000    0.001    0.100    5.000   62.607   99.921  100.000  100.000

(nT, nR) = (20, 40), R = 2, k = 1.71
B(U) \ A(L)     −1.00    −0.75    −0.50    −0.25     0.00     0.25     0.50     0.75     1.00
−1.00           0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000
−0.75           0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000
−0.50           0.000    0.000    0.000    0.000    0.000    0.001    0.003    0.004    0.002
−0.25           0.000    0.000    0.000    0.000    0.002    0.021    0.110    0.182    0.120
 0.00           0.000    0.000    0.000    0.002    0.043    0.578    2.962    4.889    5.000
 0.25           0.000    0.000    0.001    0.021    0.578    7.645   31.997   47.954   59.956
 0.50           0.000    0.000    0.003    0.110    2.962   31.997   87.502   98.312   99.863
 0.75           0.000    0.000    0.004    0.182    4.889   47.954   98.312  100.000  100.000
 1.00           0.000    0.000    0.002    0.120    5.000   59.956   99.863  100.000  100.000

When conducting equivalence assessment using a therapeutic endpoint, one also needs to show that the test treatment is effective. This often calls for a three-arm clinical trial including a placebo arm. The equivalence assessment with the interchangeability may be applied only after showing that the test treatment is superior to the placebo (Chang et al., 2013). When it is impractical or unethical to include a placebo arm, the efficacy of the test treatment may need to be evaluated through the comparison to a reference arm. In such a case, one may need to use the probability $\Pr(X_T - X_R > L) > p^*$ to imply superiority over the placebo, as in the noninferiority test, with $\alpha = 2.5\%$ instead of the $\alpha = 5\%$ used in the standard equivalence test. For this reason, we may need to apply different values of $\alpha$ to each one-sided hypothesis, such as testing

$$H_{0L}: \Pr(X_T - X_R < L) \ge P_1 \quad \text{vs.} \quad H_{aL}: \Pr(X_T - X_R < L) < P_1 \tag{13}$$

with $\alpha_L = 2.5\%$ and testing

$$H_{0U}: \Pr(X_T - X_R > U) \ge P_2 \quad \text{vs.} \quad H_{aU}: \Pr(X_T - X_R > U) < P_2 \tag{14}$$

with $\alpha_U = 5\%$. To test equations (13) and (14), the corresponding one-sided tolerance factors are $k_L = \sqrt{a} \times t_{n_T+n_R-2,\,1-\alpha_L}\left(Z_{1-P_1}/\sqrt{a}\right)$ and $k_U = \sqrt{a} \times t_{n_T+n_R-2,\,1-\alpha_U}\left(Z_{1-P_2}/\sqrt{a}\right)$. Then the power function becomes

$$\gamma(n) = \Pr\left(R_{H_{0L}} \cap R_{H_{0U}}\right) = \Pr\left(L < \hat{\Delta} - k_L S < \hat{\Delta} + k_U S < U\right) = \int_0^b \left[ \Phi\left( \frac{U - \Delta}{\sigma_d} - \frac{k_U}{\sqrt{a}} \sqrt{\frac{Y^2}{n_T + n_R - 2}} \right) - \Phi\left( \frac{L - \Delta}{\sigma_d} + \frac{k_L}{\sqrt{a}} \sqrt{\frac{Y^2}{n_T + n_R - 2}} \right) \right] f(Y^2)\, dY^2 \tag{15}$$

with $b = a(n_T + n_R - 2)(U - L)^2 / [(k_L + k_U)^2 \sigma_d^2]$; all other notation is the same as defined earlier for power function (9). We computed the rejection rate with the same scenarios as in Tables 1 to 3, except that different α levels are used for the two one-sided hypotheses, with $\alpha_L = 2.5\%$ and $\alpha_U = 5\%$. The results with equal sample sizes and equal variances are shown in Table 4. The results with unequal sample sizes and different values of the variance ratio $R$ are listed in Table 5 and Table 6. The results show patterns similar to those from Tables 1 to 3, except that the rejection rate is lower with the tighter α value for $H_{0L}$, as expected.

Table 3. Rejection rate (%) of the two one-sided tests (TOST) given the values of A(L) and B(U) with α = 5% for each one-sided hypothesis with unequal sample sizes (nT, nR) = (50, 100) and variance ratio R = 0.5, 1, and 2, with (L, U) = (–1.282, 1.282), P = 0.10, total variance = 1, Δ = 0. Rows are indexed by B(U) and columns by A(L).

(nT, nR) = (50, 100), R = 0.5, k = 1.523
B(U) \ A(L)     −1.00    −0.75    −0.50    −0.25     0.00     0.25     0.50     0.75     1.00
−1.00           0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000
−0.75           0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000
−0.50           0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000
−0.25           0.000    0.000    0.000    0.000    0.000    0.005    0.016    0.007    0.002
 0.00           0.000    0.000    0.000    0.000    0.047    1.888    4.923    5.000    5.000
 0.25           0.000    0.000    0.000    0.005    1.888   44.579   78.446   88.031   95.635
 0.50           0.000    0.000    0.000    0.016    4.923   78.446   99.993  100.000  100.000
 0.75           0.000    0.000    0.000    0.007    5.000   88.031  100.000  100.000  100.000
 1.00           0.000    0.000    0.000    0.002    5.000   95.635  100.000  100.000  100.000

(nT, nR) = (50, 100), R = 1, k = 1.533
B(U) \ A(L)     −1.00    −0.75    −0.50    −0.25     0.00     0.25     0.50     0.75     1.00
−1.00           0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000
−0.75           0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000
−0.50           0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000
−0.25           0.000    0.000    0.000    0.000    0.000    0.005    0.019    0.010    0.003
 0.00           0.000    0.000    0.000    0.000    0.032    1.542    4.858    5.000    5.000
 0.25           0.000    0.000    0.000    0.005    1.542   39.547   75.486   85.676   94.280
 0.50           0.000    0.000    0.000    0.019    4.858   75.486   99.982  100.000  100.000
 0.75           0.000    0.000    0.000    0.010    5.000   85.676  100.000  100.000  100.000
 1.00           0.000    0.000    0.000    0.003    5.000   94.280  100.000  100.000  100.000

(nT, nR) = (50, 100), R = 2, k = 1.542
B(U) \ A(L)     −1.00    −0.75    −0.50    −0.25     0.00     0.25     0.50     0.75     1.00
−1.00           0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000
−0.75           0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000
−0.50           0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000
−0.25           0.000    0.000    0.000    0.000    0.000    0.004    0.022    0.014    0.004
 0.00           0.000    0.000    0.000    0.000    0.022    1.256    4.769    5.000    5.000
 0.25           0.000    0.000    0.000    0.004    1.256   35.057   72.650   83.332   92.812
 0.50           0.000    0.000    0.000    0.022    4.769   72.650   99.962  100.000  100.000
 0.75           0.000    0.000    0.000    0.014    5.000   83.332  100.000  100.000  100.000
 1.00           0.000    0.000    0.000    0.004    5.000   92.812  100.000  100.000  100.000

Table 4. Rejection rate (%) of the two one-sided tests (TOST) given the values of A(L) and B(U) with αL = 2.5% and αU = 5% with equal sample sizes (nT, nR) = (20, 20) and (50, 50) and variance ratio R = 1, with (L, U) = (–1.282, 1.282), P = 0.10, total variance = 1, Δ = 0. Rows are indexed by B(U) and columns by A(L).

(nT, nR) = (20, 20), R = 1, kL = 1.90, kU = 1.79
B(U) \ A(L)     −1.00    −0.75    −0.50    −0.25     0.00     0.25     0.50     0.75     1.00
−1.00           0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000
−0.75           0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.001    0.000
−0.50           0.000    0.000    0.000    0.000    0.000    0.002    0.008    0.012    0.008
−0.25           0.000    0.000    0.000    0.001    0.005    0.037    0.148    0.272    0.221
 0.00           0.000    0.000    0.000    0.005    0.055    0.473    2.233    4.540    4.997
 0.25           0.000    0.000    0.002    0.027    0.391    3.933   19.437   38.148   49.276
 0.50           0.000    0.000    0.004    0.085    1.450   15.404   64.658   93.956   98.978
 0.75           0.000    0.000    0.005    0.121    2.390   26.471   88.180   99.991  100.000
 1.00           0.000    0.000    0.003    0.086    2.500   35.587   97.191  100.000  100.000

(nT, nR) = (50, 50), R = 1, kL = 1.64, kU = 1.58
B(U) \ A(L)     −1.00    −0.75    −0.50    −0.25     0.00     0.25     0.50     0.75     1.00
−1.00           0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000
−0.75           0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000
−0.50           0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000
−0.25           0.000    0.000    0.000    0.000    0.000    0.008    0.042    0.035    0.014
 0.00           0.000    0.000    0.000    0.000    0.026    0.869    4.232    4.999    5.000
 0.25           0.000    0.000    0.000    0.005    0.635   17.998   60.133   73.480   85.466
 0.50           0.000    0.000    0.000    0.017    2.271   48.216   99.166   99.990  100.000
 0.75           0.000    0.000    0.000    0.012    2.500   61.140   99.957  100.000  100.000
 1.00           0.000    0.000    0.000    0.004    2.500   76.004  100.000  100.000  100.000
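Power function (15) is a one-dimensional integral and can be evaluated numerically. The sketch below is an illustration rather than the authors' implementation. It assumes that $\Phi$ is the standard normal cumulative distribution function and that $f$ is the density of $\chi^2_{n_T+n_R-2}$ (the definitions attached to power function (9) are not restated here). The tolerance factors are passed in as inputs, the integral is approximated by Simpson's rule, and the function name and arguments are hypothetical.

```python
import math


def norm_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def chi2_pdf(w, df):
    if w <= 0.0:
        return 0.0
    half = df / 2.0
    return math.exp((half - 1.0) * math.log(w) - w / 2.0
                    - half * math.log(2.0) - math.lgamma(half))


def tost_power(delta, total_var, n_t, n_r, ratio, lower, upper, k_l, k_u, steps=2000):
    """Approximate equation (15); total_var is sigma_T^2 + sigma_R^2."""
    df = n_t + n_r - 2
    a = 1.0 / n_t + (1.0 / n_r - 1.0 / n_t) / (1.0 + ratio)
    sigma_d = math.sqrt(a * total_var)  # sigma_d^2 = sigma_T^2/n_T + sigma_R^2/n_R
    b = a * df * (upper - lower) ** 2 / ((k_l + k_u) ** 2 * sigma_d ** 2)
    # Beyond this cap the chi-squared density is negligible, so truncate the range.
    top = min(b, df + 10.0 * math.sqrt(2.0 * df) + 30.0)
    h = top / steps
    total = 0.0
    for i in range(steps + 1):
        y2 = i * h
        scale = math.sqrt(y2 / df) / math.sqrt(a)  # equals S / sigma_d
        inner = (norm_cdf((upper - delta) / sigma_d - k_u * scale)
                 - norm_cdf((lower - delta) / sigma_d + k_l * scale))
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * inner * chi2_pdf(y2, df)
    return total * h / 3.0


# A(L) = B(U) = 0.5 with (L, U) = (-1.282, 1.282) corresponds to delta = 0 and
# total variance of about 0.3723; k = 1.79 is the rounded factor from Table 1.
print(tost_power(0.0, 0.3723, 20, 20, 1.0, -1.282, 1.282, 1.79, 1.79))
```

Because the tolerance factors are entered as rounded table values, the result should be read as an approximation to the corresponding table entry (73.337% in Table 1 for this setting), not as a reproduction of it.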

5. SAMPLE SIZE CALCULATION

To compute the sample size, we assume equal sample sizes in the test and reference arms with $n_T = n_R = n$ for the parallel-arm study. The power function is shown in equation (10). Because there is no analytical formula for the sample size, we plot the power curve as a function of $n$ with the true mean difference $\Delta = 0$, $(L, U) = (-1.282, 1.282)$, and the total variability $\sigma_T^2 + \sigma_R^2 = 0.2, 0.3, \ldots, 0.7$ in Figure 1. Here, a total variability $\sigma_T^2 + \sigma_R^2 < 1$ lies in the region of the alternative hypotheses $H_{aL}$ and $H_{aU}$. Again, $P_1 = P_2 = P = 0.10$, $\alpha = 5\%$, and $R = \sigma_T^2/\sigma_R^2 = 1$. The sample sizes to achieve power of 0.80, 0.90, and 0.95 are listed in Table 7.


Table 5. Rejection rate (%) of the two one-sided tests (TOST) given the values of A(L) and B(U) with αL = 2.5% and αU = 5% with unequal sample sizes (nT, nR) = (20, 40) and variance ratio R = 0.5, 1, and 2, with (L, U) = (–1.282, 1.282), P = 0.10, total variance = 1, Δ = 0. Rows are indexed by B(U) and columns by A(L).

(nT, nR) = (20, 40), R = 0.5, kL = 1.766, kU = 1.682
B(U) \ A(L)     −1.00    −0.75    −0.50    −0.25     0.00     0.25     0.50     0.75     1.00
−1.00           0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000
−0.75           0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000
−0.50           0.000    0.000    0.000    0.000    0.000    0.000    0.002    0.002    0.001
−0.25           0.000    0.000    0.000    0.000    0.002    0.020    0.097    0.138    0.082
 0.00           0.000    0.000    0.000    0.001    0.039    0.577    3.056    4.915    5.000
 0.25           0.000    0.000    0.000    0.013    0.454    7.376   34.776   52.897   65.509
 0.50           0.000    0.000    0.001    0.048    1.818   27.090   88.192   99.195   99.958
 0.75           0.000    0.000    0.001    0.053    2.484   39.331   97.829  100.000  100.000
 1.00           0.000    0.000    0.000    0.029    2.500   51.926   99.833  100.000  100.000

(nT, nR) = (20, 40), R = 1, kL = 1.785, kU = 1.698
B(U) \ A(L)     −1.00    −0.75    −0.50    −0.25     0.00     0.25     0.50     0.75     1.00
−1.00           0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000
−0.75           0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000
−0.50           0.000    0.000    0.000    0.000    0.000    0.000    0.002    0.003    0.001
−0.25           0.000    0.000    0.000    0.000    0.001    0.015    0.087    0.154    0.100
 0.00           0.000    0.000    0.000    0.001    0.027    0.438    2.667    4.847    5.000
 0.25           0.000    0.000    0.000    0.010    0.351    5.944   31.053   50.215   62.607
 0.50           0.000    0.000    0.001    0.046    1.631   24.317   85.044   98.794   99.921
 0.75           0.000    0.000    0.001    0.062    2.467   36.882   96.952  100.000  100.000
 1.00           0.000    0.000    0.000    0.036    2.500   48.914   99.709  100.000  100.000

(nT, nR) = (20, 40), R = 2, kL = 1.803, kU = 1.713
B(U) \ A(L)     −1.00    −0.75    −0.50    −0.25     0.00     0.25     0.50     0.75     1.00
−1.00           0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000
−0.75           0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000
−0.50           0.000    0.000    0.000    0.000    0.000    0.000    0.002    0.003    0.002
−0.25           0.000    0.000    0.000    0.000    0.001    0.011    0.077    0.165    0.120
 0.00           0.000    0.000    0.000    0.001    0.019    0.335    2.314    4.752    5.000
 0.25           0.000    0.000    0.000    0.008    0.272    4.818   27.709   47.781   59.956
 0.50           0.000    0.000    0.001    0.042    1.451   21.824   81.727   98.297   99.863
 0.75           0.000    0.000    0.001    0.068    2.441   34.728   95.933  100.000  100.000
 1.00           0.000    0.000    0.001    0.043    2.500   46.230   99.532  100.000  100.000

We also plot the power curves in Figure 2 and compute the sample sizes for $\alpha_L = 2.5\%$ and $\alpha_U = 5\%$ in Table 8. As the preceding results show, the power increases with the sample size per arm and decreases with the total variability. Moreover, a larger sample size is required to achieve a given power under the more stringent setting of $\alpha_L = 2.5\%$ and $\alpha_U = 5\%$. The increase in sample size becomes larger with larger total variability.
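Since no closed-form sample size formula is available, the search over $n$ can be automated. The following sketch is an assumption-laden illustration, not the procedure used to produce Tables 7 and 8. It takes equal sample sizes, $R = 1$, and a common $\alpha$ for both hypotheses, obtains the tolerance factor from a numerically inverted noncentral t-distribution, evaluates the power integral by Simpson's rule, and returns the smallest $n$ per arm reaching the target power. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def norm_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def chi2_pdf(w, df):
    if w <= 0.0:
        return 0.0
    half = df / 2.0
    return math.exp((half - 1.0) * math.log(w) - w / 2.0
                    - half * math.log(2.0) - math.lgamma(half))


def chi2_expectation(func, df, top, steps=400):
    """Simpson approximation of the integral of func(w) * chi2_pdf(w) over [0, top]."""
    h = top / steps
    total = 0.0
    for i in range(steps + 1):
        w = i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * func(w) * chi2_pdf(w, df)
    return total * h / 3.0


def nct_ppf(p, df, nc):
    """Percentile of t_df(nc), using T = (Z + nc) / sqrt(W / df) and bisection."""
    top = df + 10.0 * math.sqrt(2.0 * df) + 30.0
    lo, hi = nc - 50.0, nc + 50.0
    for _ in range(45):
        mid = 0.5 * (lo + hi)
        cdf = chi2_expectation(lambda w: norm_cdf(mid * math.sqrt(w / df) - nc), df, top)
        if cdf < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def power_equal_n(n, total_var, lower, upper, p=0.10, alpha=0.05, delta=0.0):
    """Power with n per arm; total_var is sigma_T^2 + sigma_R^2 and a = 1/n."""
    df = 2 * n - 2
    a = 1.0 / n
    z = NormalDist().inv_cdf(1.0 - p)
    k = math.sqrt(a) * nct_ppf(1.0 - alpha, df, z / math.sqrt(a))
    sigma_d = math.sqrt(a * total_var)
    b = df * (upper - lower) ** 2 / (4.0 * k * k * total_var)
    top = min(b, df + 10.0 * math.sqrt(2.0 * df) + 30.0)

    def inner(w):
        scale = k * math.sqrt(w / df) / math.sqrt(a)
        return (norm_cdf((upper - delta) / sigma_d - scale)
                - norm_cdf((lower - delta) / sigma_d + scale))

    return chi2_expectation(inner, df, top)


def sample_size(target, total_var, lower, upper, p=0.10, alpha=0.05, n_max=500):
    """Smallest n per arm (searched from 3) whose power reaches the target."""
    for n in range(3, n_max + 1):
        if power_equal_n(n, total_var, lower, upper, p, alpha) >= target:
            return n
    return None


# Total variance 0.3723 corresponds to A(L) = B(U) = 0.5 in the earlier tables.
print(sample_size(0.80, 0.3723, -1.282, 1.282))
```

The linear search is chosen for clarity; because the power increases with $n$ in the interchangeable region, a bisection over $n$ would reach the same answer with fewer evaluations.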


Table 6 Rejection rate (%) of the two one-sided tests (TOST) given the values of A(L) and B(U) with αL = 2.5% and αU = 5%, unequal sample sizes (nT, nR) = (50, 100), and variance ratio R = 0.5, 1, and 2, with (L, U) = (–1.282, 1.282), P = 0.10, total variance = 1, μ = 0

(nT, nR) = (50, 100), R = 0.5, kL = 1.572, kU = 1.523

B(U) \ A(L)    −1.00    −0.75    −0.50    −0.25     0.00     0.25     0.50     0.75     1.00
−1.00          0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000
−0.75          0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000
−0.50          0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000
−0.25          0.000    0.000    0.000    0.000    0.000    0.003    0.014    0.007    0.002
 0.00          0.000    0.000    0.000    0.000    0.020    1.318    4.824    5.000    5.000
 0.25          0.000    0.000    0.000    0.002    0.896   35.549   78.296   88.031   95.635
 0.50          0.000    0.000    0.000    0.005    2.457   67.288   99.980  100.000  100.000
 0.75          0.000    0.000    0.000    0.002    2.500   79.743  100.000  100.000  100.000
 1.00          0.000    0.000    0.000    0.001    2.500   91.309  100.000  100.000  100.000

(nT, nR) = (50, 100), R = 1, kL = 1.583, kU = 1.532

B(U) \ A(L)    −1.00    −0.75    −0.50    −0.25     0.00     0.25     0.50     0.75     1.00
−1.00          0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000
−0.75          0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000
−0.50          0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000
−0.25          0.000    0.000    0.000    0.000    0.000    0.003    0.017    0.010    0.003
 0.00          0.000    0.000    0.000    0.000    0.013    1.026    4.700    5.000    5.000
 0.25          0.000    0.000    0.000    0.001    0.715   30.742   75.232   85.676   94.280
 0.50          0.000    0.000    0.000    0.006    2.419   63.770   99.955  100.000  100.000
 0.75          0.000    0.000    0.000    0.003    2.500   76.536   99.999  100.000  100.000
 1.00          0.000    0.000    0.000    0.001    2.500   89.087  100.000  100.000  100.000

(nT, nR) = (50, 100), R = 2, kL = 1.595, kU = 1.542

B(U) \ A(L)    −1.00    −0.75    −0.50    −0.25     0.00     0.25     0.50     0.75     1.00
−1.00          0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000
−0.75          0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000
−0.50          0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000
−0.25          0.000    0.000    0.000    0.000    0.000    0.002    0.018    0.014    0.004
 0.00          0.000    0.000    0.000    0.000    0.009    0.798    4.541    5.000    5.000
 0.25          0.000    0.000    0.000    0.001    0.568   26.570   72.254   83.332   92.812
 0.50          0.000    0.000    0.000    0.007    2.365   60.505   99.910  100.000  100.000
 0.75          0.000    0.000    0.000    0.004    2.500   73.460   99.998  100.000  100.000
 1.00          0.000    0.000    0.000    0.001    2.500   86.787  100.000  100.000  100.000

[Figure 1 plot: rejection rate (0.0 to 1.0) against sample size per arm (0 to 300), one curve for each total variance 0.2, 0.3, 0.4, 0.5, 0.6, and 0.7.]

Figure 1 Power versus sample size per arm for TOST with μ = 0, (L, U) = (–1.282, 1.282), P = 0.10, α = 5% for each one-sided hypothesis, total variability σ² = σT² + σR² = 0.2 (0.1) 0.7, and R = σT²/σR² = 1 for parallel-arm study.

Table 7 Sample size per arm for TOST with μ = 0, (L, U) = (–1.282, 1.282), P = 0.10, α = 5% for each one-sided hypothesis, σ² = σT² + σR² = 0.2 (0.1) 0.7, and R = σT²/σR² = 1

               σ² = 0.2  σ² = 0.3  σ² = 0.4  σ² = 0.5  σ² = 0.6  σ² = 0.7
Power = 0.80          8        15        27        50        96       206
Power = 0.85          9        16        30        55       106       228
Power = 0.90         10        18        34        62       120       259

6. SUMMARY

In this article, we propose test statistics and a tolerance-interval approach for testing two one-sided equivalence hypotheses for interchangeability between treatments. Unlike the two-sided hypothesis shown in Dong et al. (2014), the hypothesis testing is equivalent to using a tolerance-interval method for TOST of interchangeability with exact solutions. Under the normality assumption, the two one-sided hypotheses of interchangeability can be expressed as linear combinations of the mean difference and the total variability, as shown in equations (1) and (2). With such properties, exact solutions can be provided. Our work differs from previous work in that we provide the exact test statistics, tolerance interval, and exact power function without assuming equal sample sizes or equal variances. We also show, by both theoretical derivation and simulation studies, that the TOST of interchangeability is a size-α test. We derive the exact power function for both parallel-arm and paired data for the interchangeability assessment. Because of the complexity of the power function, no closed form for the sample size can be obtained.

In general, two one-sided probabilities impose a more stringent requirement for interchangeability than the two-sided probability. For the two-sided probability, we test whether the proportion within (L, U) is larger than a given number, say P = 80%. Equivalently, we test whether the proportion outside (L, U) is less than 1 – P. For the two one-sided hypotheses, we test a particular partition of the proportion outside (L, U). More specifically, we test whether the proportion less than L is less than a specific value P1 and the proportion greater than U is also less than a specific value P2. If either the mean or the variance is given, the two-sided hypothesis and the two one-sided hypotheses can be converted to each other.
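The contrast between the two criteria can be illustrated numerically. The Python sketch below is an illustration only and is not the authors' test procedure: it works with known population parameters (no sampling error), assumes the difference D between test and reference responses is normal with the stated mean difference and total variance, and uses hypothetical helper names (tail_proportions, lower_margin, upper_margin). The margin functions restate each one-sided requirement as a linear combination of the mean difference and the standard deviation of D, which follows from the normal quantile function; whether this matches the exact form of equations (1) and (2) is an assumption.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for quantiles


def tail_proportions(mean_diff, total_var, lower, upper):
    """Return (P(D < lower), P(D > upper)) for D ~ N(mean_diff, total_var)."""
    dist = NormalDist(mu=mean_diff, sigma=total_var ** 0.5)
    below = dist.cdf(lower)
    above = 1.0 - dist.cdf(upper)
    return below, above


def lower_margin(mean_diff, total_var, lower, p1):
    """Positive exactly when P(D < lower) < p1 (hypothetical helper)."""
    z = STD_NORMAL.inv_cdf(1.0 - p1)
    return (mean_diff - lower) - z * total_var ** 0.5


def upper_margin(mean_diff, total_var, upper, p2):
    """Positive exactly when P(D > upper) < p2 (hypothetical helper)."""
    z = STD_NORMAL.inv_cdf(1.0 - p2)
    return (upper - mean_diff) - z * total_var ** 0.5


if __name__ == "__main__":
    L, U = -1.282, 1.282   # limits used in the tables
    P1 = P2 = 0.10         # one-sided tail limits
    P_TWO_SIDED = 0.80     # two-sided coverage requirement

    # Illustrative parameter values chosen for this sketch (an assumption).
    mean_diff, total_var = 0.5, 0.5
    below, above = tail_proportions(mean_diff, total_var, L, U)
    inside = 1.0 - below - above

    print(f"P(D < L) = {below:.4f}, P(D > U) = {above:.4f}")
    print("two-sided requirement met:", inside > P_TWO_SIDED)
    print("lower one-sided requirement met:",
          lower_margin(mean_diff, total_var, L, P1) > 0)
    print("upper one-sided requirement met:",
          upper_margin(mean_diff, total_var, U, P2) > 0)
```

With a mean difference of 0.5 and total variance 0.5, the proportion inside (L, U) exceeds 80%, yet the proportion above U exceeds 10%. The two-sided requirement therefore holds while the upper one-sided requirement does not, which is the sense in which the two one-sided probabilities are more stringent.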

[Figure 2 plot: rejection rate (0.0 to 1.0) against sample size per arm (0 to 350), one curve for each total variance 0.2, 0.3, 0.4, 0.5, 0.6, and 0.7.]

Figure 2 Power versus sample size per arm for TOST with μ = 0, (L, U) = (–1.282, 1.282), P = 0.10, αL = 2.5% and αU = 5%, total variability σ² = σT² + σR² = 0.2 (0.1) 0.7, and R = σT²/σR² = 1 for parallel-arm study.

Table 8 Sample size per arm for TOST with μ = 0, (L, U) = (–1.282, 1.282), P = 0.10, αL = 2.5% and αU = 5%, σ² = σT² + σR² = 0.2 (0.1) 0.7, and R = σT²/σR² = 1

               σ² = 0.2  σ² = 0.3  σ² = 0.4  σ² = 0.5  σ² = 0.6  σ² = 0.7
Power = 0.80          9        17        31        56       108       231
Power = 0.85         10        19        34        62       119       256
Power = 0.90         11        21        38        69       134       289

DISCLAIMER

This article does not represent the official position of the U.S. Food and Drug Administration.

REFERENCES

Berger, R. L., Hsu, J. C. (1996). Bioequivalence trials, intersection–union tests and equivalence confidence sets. Statistical Science 11(4):283–319.

Chang, Y.-W., Tsong, Y., Dong, X. (2014). Sample size determination for a three-arm equivalence trial of normally distributed responses. Journal of Biopharmaceutical Statistics 24(6):XX–XX.

Chinchilli, V. M., Esinhart, J. D. (1996). Design and analysis of intra-subject variability in crossover experiments. Statistics in Medicine 15:1619–1634.

Chow, S.-C. (1999). Individual bioequivalence—A review of the FDA draft guidance. Drug Information Journal 33:435–444.

Chow, S.-C., Hsieh, T.-C., Chi, E., Yang, J. (2010). A comparison of moment-based and probability-based criteria for follow-on biologics. Journal of Biopharmaceutical Statistics 20(1):31–35.

Dong, X., Tsong, Y. (2014). Equivalence assessment for interchangeability based on two sided tests. Journal of Biopharmaceutical Statistics 24(6):XX–XX.

Esinhart, J. D., Chinchilli, V. M. (1994). Sample size considerations for assessing individual bioequivalence based on the method of tolerance interval. International Journal of Clinical Pharmacology and Therapeutics 32(1):26–32.

Esinhart, J. D., Chinchilli, V. M. (1994). Extension to the use of tolerance interval for assessment of individual bioequivalence. Journal of Biopharmaceutical Statistics 4(1):39–52.

Food and Drug Administration. (1999). FDA Draft Guidance for Industry. Average, Population, and Individual Approaches to Establishing Bioequivalence. Rockville, MD: Center for Drug Evaluation and Research, Food and Drug Administration.

Food and Drug Administration. (2001). FDA Guidance for Industry. Statistical Approach to Establishing Bioequivalence. U.S. Department of Health and Human Services. Rockville, MD: Center for Drug Evaluation and Research, Food and Drug Administration.

Food and Drug Administration. (2003). FDA Guidance for Industry. Bioavailability and Bioequivalence Studies for Orally Administered Drug Products—General Considerations. U.S. Department of Health and Human Services. Rockville, MD: Center for Drug Evaluation and Research, Food and Drug Administration.

Hall, I. J. (1984). Approximated one-sided tolerance interval for the difference or sum of two independent normal variables. Journal of Quality Technology 16:15–1.

Liu, J.-P., Chow, S.-C. (1999). Some thoughts on individual bioequivalence. Journal of Biopharmaceutical Statistics 16:861–874.

Tse, S.-K., Chang, J.-Y., Su, W.-L., Chow, S.-C., Hsiung, C., Lu, Q. (2006). Statistical quality control process for traditional Chinese medicine. Journal of Biopharmaceutical Statistics 7(1):41–48.

Tsong, Y., Shen, M. (2007). An alternative approach to assess exchangeability of a test treatment and the standard treatment with normally distributed response. Journal of Biopharmaceutical Statistics 17:329–338.

Tsong, Y., Shen, M., Lostritto, R. T., Poochikian, G. K. (2008). Parametric two-tier sequential quality assurance test of delivery dose uniformity of multiple-dose inhaler and dry powder inhaler drug products. Journal of Biopharmaceutical Statistics 18(5):976–984.
