Research Article Received 23 May 2013,

Accepted 12 January 2014

Published online 9 February 2014 in Wiley Online Library

(wileyonlinelibrary.com) DOI: 10.1002/sim.6103

Likelihood ratio based tests for longitudinal drug safety data‡

Lan Huang,*† Jyoti Zalkikar and Ram Tiwari

This article presents longitudinal likelihood ratio test (LongLRT) methods for large databases with exposure information. These methods are applied to a pooled large longitudinal clinical trial dataset for drugs treating osteoporosis with concomitant use of proton pump inhibitors (PPIs). When the interest is in the evaluation of a signal of an adverse event for a particular drug compared with placebo or a comparator, the special case of the LongLRT, referred to as sequential LRT (SeqLRT), is also presented. The results show that there is some possible evidence of concomitant use of PPIs leading to more adverse events associated with osteoporosis. The performance of the proposed LongLRT and SeqLRT methods is evaluated using simulated datasets and shown to be good in terms of (conditional) power and control of type I error over time. The proposed methods can also be applied to large observational databases with exposure information under the US Food and Drug Administration Sentinel Initiative for active surveillance. Published 2014. This article is a US Government work and is in the public domain in the USA.

Keywords: safety surveillance; drug exposure; active surveillance; sequential method; longitudinal method

1. Introduction

Risk assessment of every drug during development is usually conducted in a thorough manner; however, it is impossible to identify all adverse events (AEs) during clinical trial phases before the Food and Drug Administration (FDA) approval (as trials are usually powered for efficacy). Once the drug is marketed after approval, a large number of patients from the general population become exposed, and reported cases of AEs become available through electronic health record databases, insurance companies, healthcare professionals, patients, consumers, and other reporting resources. Data mining methods for safety signal detection in the large postmarket databases include reporting odds ratio [1], proportional odds ratio [2], multi-Gamma Poisson shrinker [3], Bayesian confidence propagation neural network [4], and likelihood ratio test (LRT) [5]. The postmarket databases, such as the FDA Adverse Event Reporting System (FAERS), do not have the information on the actual patient population at risk or exposed to drugs; consequently, signal detection with these data mining methods is called passive surveillance. In May 2008, the FDA's Sentinel Initiative launched the FDA Sentinel System [6] to develop an enhanced ability to monitor the safety of drugs and other medical products after they reach the market. The electronic health record observational administrative databases (including longitudinal datasets) from participating healthcare partners in the Sentinel Initiative include subject-level information on AEs, drugs, drug exposure, and demographic information. Active surveillance refers to statistical methods for detecting signals at different analysis periods (i.e., at different looks) as the longitudinal data accumulate.
Under the Mini-Sentinel pilot program launched by the FDA, the Sequential Methods Working Group developed a regression-based sequential method [7] and compared its performance with conditional sequential sampling procedure (CSSP) [8] and group sequential likelihood ratio test [7]. Another method, maximized sequential probability ratio test (maxSPRT), was developed for vaccine data monitoring [9]. However, these methods, with controlled type I error, are only for a single AE comparing two drugs (one test drug and a placebo) with exposure information over time. They cannot be used for recurrent cases of a single AE, for multiple AE cases, or for more than two drugs. In safety surveillance, it is common to observe multiple events for one subject (recurrent cases or multiple AE cases). For example, a patient with leukemia can have recurrent thrombocytopenia events during the treatment. A patient with osteoporosis can have multiple fractures such as foot fracture, ankle fracture, and hip fracture (multiple AEs) in a study. Therefore, it is important to develop methods for evaluating the risk of recurrent AE or multiple AEs. Some of the signal detection methods for multiple-drug AE comparisons in passive surveillance for spontaneous databases can be modified for active surveillance of the longitudinal databases [10], allowing the use of exposure-time as denominator (see, for example, OMOP library of methods at http://omop.fnih.org/MethodsLibrary). However, the type I error and false discovery rate (FDR) for these methods are not controlled.

Here, we propose a longitudinal LRT method, referred to as LongLRT, for the exposure-based data for active surveillance in observational or clinical trial databases for one or more drugs (such as a drug class) and/or one or more AEs or safety health outcomes (safety endpoints or events) of interest. This method covers the special case of sequential LRT (SeqLRT) where the process stops at a look when a success (i.e., a signal) is found for a drug of interest versus placebo or a comparator with a single AE of interest such as acute myocardial infarction. In Section 2, we describe the large proton pump inhibitors (PPIs) database consisting of multiple clinical trials conducted in the 1990s and the 2000s. We illustrate, in Section 5, the application of the proposed LongLRT for exploring safety signals of the drugs (test drugs and placebo) treating postmenopausal osteoporosis with and without the concomitant use of PPIs. As a special case, we illustrate the application of SeqLRT for comparison of placebo versus placebo with concomitant use of PPIs for a composite adverse event (AEOST) with recurrence (more details provided in Appendix A1) over five looks.

Division of Biometrics V, Office of Biostatistics, OTS, CDER, FDA, Silver Spring, MD 20993, U.S.A. *Correspondence to: Lan Huang, Division of Biometrics V, Office of Biostatistics, OTS, CDER, FDA, Silver Spring, MD 20993, U.S.A. † E-mail: [email protected] ‡ This article reflects the views of the authors and should not be construed to represent FDA's views or policies.

Published 2014. This article is a US Government work and is in the public domain in the USA.

Statist. Med. 2014, 33 2408–2424

2. Data

2.1. Medical background and data structure


Proton pump inhibitors are a class of drugs that decrease gastric acid secretion through inhibition of the proton pump, which drives the secretion of acid from the stomach glands. Recent studies have found that PPIs are associated with increased risk of hip fractures (side effect) [11, 12]. It is important to study the safety issues of PPIs using clinical trial datasets for regulatory labeling. Six drugs have been used as the test drugs for treating patients with osteoporosis in the clinical trials with data available from FDA/OTS/CSC legacy database. Three of these six drugs (raloxifene, teriparatide (parathyroid hormone PTH 1-34), and ibandronate) were approved in the USA in the 1990s and the 2000s and are sold in the US market after the FDA approval. The other three drugs (bazedoxifene, lasofoxifene, and PTH (parathyroid hormone PTH 1-84)) were only approved in Europe. The intention of these trials was to evaluate if the concomitant use of PPIs reduces the efficacy of the six test drugs treating osteoporosis among targeted patients. The minimum and maximum exposure (EX) times, shown in Table I, are obtained using the start date and the end date of the test drugs and placebo for individuals. The exposure information for the test drugs, including their start and end dates, is complete in the database. However, the exposure information for the concomitant use of PPIs is not complete. Also, in the database, a patient may have taken one or more PPI drugs with varying durations within the overall exposure period of the test drug. Therefore, we simply code the patients taking a test drug with concomitant PPIs as 'test drug+PPIs' and take the exposure-time of test drug+PPIs to be the exposure-time of the test drug. It is expected that more AE cases associated with osteoporosis are to be observed in the subjects with concomitant PPIs.
The minimum and maximum AE times (shown in Table I) are, respectively, the minimum and maximum of the occurrence dates of AEs from all AEs reported in the trials. Aggregated clinical trial data with exposure information were generated from the individual-level clinical trial data. As shown in Table I, 10 trials from the legacy database are included in this study. Among the subjects taking a test drug for treating osteoporosis, there is a small percent of subjects taking PPIs concomitantly. Six test drugs plus placebo were used for treating female participants with postmenopausal osteoporosis and severe osteoporosis. The sample sizes (number of subjects) in the studies ranged from hundreds to thousands in each arm. Some patients were treated with concomitant PPIs (a single PPI or mixed use of several PPIs) during the exposure period of the test drugs. There are a total of five PPIs. The sample size for patients with concomitant PPIs in each arm is usually less than 10% of the total sample size. It is difficult to evaluate the safety issues of the concomitant use of PPIs in a single trial, as generally the study is powered for efficacy. Thus, pooled data consisting of multiple clinical trial datasets, resulting in a large sample size, are needed for evaluating the safety issues of the concomitant use of multiple PPI drugs. The common data models, under the FDA Sentinel Initiative, have similar data structure, and hence, the methodology developed here can be applied to the common data models.

Table I. Summary information of the clinical trials for PPIs in the legacy database.

        Test drug for treating osteoporosis   Concomitant PPIs            EX time        AE time
Trial   Test drug       N (subject)           # PPI drugs   N (subject)   min    max     min    max
1       Bazedoxifene    1025                  5             71            2001   2004    1983   2014
        Placebo         334                   5             25
        Raloxifene      332                   5             29
2       Bazedoxifene    3778                  5             477           2001   2007    1947   2006
        Placebo         1892                  5             245
        Raloxifene      1856                  5             302
3       Lasofoxifene    8556                  5             965           2001   2006    2000   2006
4       Lasofoxifene    685                   5             64            2000   2003    2000   2003
        Placebo         230                   4             20
5       Lasofoxifene    734                   5             60            2000   2003    2000   2003
        Placebo         245                   5             36
6       PTH             1261                  5             176           2000   2003    2000   2003
        Placebo         1223                  5             165
7       Teriparatide    1093                  3             38            1996   1999    1920   2019
        Placebo         544                   3             19
8       Teriparatide    289                   3             12            1997   1999    1920   1999
        Placebo         147                   0             0
9       Ibandronate     1964                  4             82            1996   2000    1996   2000
        Placebo         982                   3             32
10      Ibandronate     489                   3             21            1998   2001    1998   2001
        Placebo         159                   3             9

AE, adverse event; PPIs, proton pump inhibitors; PTH, parathyroid hormone.

2.2. Definitions of drug exposure

We present different definitions of the drug exposure (e.g., event-time, person-time, and exposure-time), which can be used as a denominator for evaluating safety issues, along with different statistical models. We define all AE cases that occur during the exposure period between the start and end dates of a test drug with or without PPIs as countable cases. The event-time is defined as the duration from the start date of the test drug exposure to the AE (event) start date if the AE case is a countable case. Note that this definition of a countable case implies that all countable AE cases occur between the start and end dates of a test drug. However, there are situations, for example, in vaccine safety studies, where other definitions of countable cases are considered such as the AE cases that occur 7 days after the drug exposure evaluation. In this paper, we do not consider events after the exposure duration.
Each countable AE case has one event-time, and each subject could have many countable AE cases and hence many event-time records. As shown in Figure 1, define $P_{ijs}^{l(i,s)}$ as the event-time for subject s, taking drug j, and having the lth occurrence of the ith AE. Note that $P_{ijs}^{l(i,s)}$ allows for unequal event-times for different subjects. The event-time for the jth drug and the ith AE is then the sum of the event-times of the countable cases over all subjects and occurrences, $P_{ij} = \sum_{s} \sum_{l(i,s)} P_{ijs}^{l(i,s)}$, where the summations are over $s = 1, \ldots, S$ and $l(i,s) = 1, \ldots, L(i,s)$, respectively, S is the total number of subjects, and $L(i,s)$ is the total number of occurrences of the ith AE for the sth subject.
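The event-time sum defined above can be illustrated with a small sketch. It is only an illustration under stated assumptions: the record layout, the function name `event_times`, the example dates, and the choice to treat the start and end dates as inclusive bounds are all hypothetical and not taken from the article.

```python
from collections import defaultdict
from datetime import date


def event_times(records):
    """Sum event-times (in days) over subjects and occurrences per (AE, drug) cell.

    Each record is (subject, drug, ae, drug_start, drug_end, ae_start).
    Assumption: an AE is a countable case when its onset falls within the
    exposure period, with both bounds treated as inclusive.
    """
    totals = defaultdict(int)
    for subject, drug, ae, start, end, onset in records:
        if start <= onset <= end:  # countable case
            totals[(ae, drug)] += (onset - start).days
    return dict(totals)


# Hypothetical records: subject s1 has two countable occurrences,
# subject s2 has an AE after the drug was stopped (not countable).
records = [
    ("s1", "drugA", "fracture", date(2001, 1, 1), date(2001, 12, 31), date(2001, 3, 2)),
    ("s1", "drugA", "fracture", date(2001, 1, 1), date(2001, 12, 31), date(2001, 7, 20)),
    ("s2", "drugA", "fracture", date(2001, 2, 1), date(2001, 6, 30), date(2001, 9, 1)),
]
P = event_times(records)
print(P)
```

Here the two occurrences for subject s1 contribute 60 and 200 days, so the cell total is 260 days, while the event for s2 is excluded.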


Figure 1. Definition of event-time. * is the start date of drug j, and ** is the stop date of drug j. Circled dots are the occurrences of adverse events (AEs) (AE i, i = 1, 2, 3, …). Only AEs between * and ** are countable cases and are shown in the plots over time. $P_{ijs}^{1}$ is the event-time for the sth subject taking the jth drug and having the first occurrence of the ith AE. $P_{ijs}^{2}$ is the event-time for the sth subject taking the jth drug and having the second occurrence of the ith AE.

Figure 2. Definition of person-time. * is the start date of drug j, and ** is the stop date of drug j. Circled dots are the occurrences of adverse events (AEs) (AE1 is the AE of interest). Only AE1s between * and ** are countable cases and are shown in the plots over time. $P_{1js}^{1} = P_{1js}$ is the event-time for the sth subject taking the jth drug and having the first occurrence of AE1, which is also the person-time.

The marginal row and column totals of $P_{ij}$ are $P_{i.} = \sum_j P_{ij}$ and $P_{.j} = \sum_i P_{ij}$, and the grand total is $P_{..}$. When each subject takes only one drug and has only a single AE reported, $L(i,s) = 1$ and $i = 1$, and the event-time is exactly the person-time (shown in Figure 2), which is the definition used by Brown et al. [13], Li [8], and Cook et al. [7] (in active surveillance for dealing with one AE of interest and to compare a single drug versus a comparator). If a subject does not have any AE, the person-time is simply the duration of drug exposure. The person-time for the ith AE and the jth drug is simply the sum of the person-times over subjects, $P_{ij} = \sum_s P_{ijs}$.

Another way of incorporating the exposure information is to define the 'exposure-time', $P_{ds}$, for subject s taking drug d (from the start date to the end date of the drug use) as the overall drug exposure duration (assuming that each subject only takes one drug among the 14 drugs (six test drugs, one placebo, six test drug+PPIs, and one placebo+PPIs)). As shown in Figure 3, the total exposure duration for the drug d is $P_d = \sum_s P_{ds}$, and the grand total of the exposure is $P_{.} = \sum_d P_d = \sum_d \sum_s P_{ds}$. Note that all AEs, for subject s, that occurred during $P_{ds}$ share the same exposure duration, $P_{ds}$. The countable cases are the AE cases that occur during the exposure-time. For subjects without any event during the drug exposure, the person-time and the exposure-time are defined as the duration from the start date to the end date of the drug exposure.

Figure 3. Definition of exposure-time. * is the start date of drug d, and ** is the stop date of drug d. Circled dots are the occurrences of adverse events (AEs) (AE i, i = 1, 2, 3, …). Only AEs between * and ** (exposure duration) are countable cases and are shown in the plots over time. $P_{ds}$ is the exposure-time for the sth subject taking the dth drug and having multiple AEs during the exposure duration.

2.3. Defining multiple looks

The analysis period or look k ($k = 1, \ldots, K$) includes the cumulative data from time intervals 1, 2, …, up to the kth interval, and look K is the last look consisting of the largest analysis period (i.e., the period containing the time intervals 1, 2, …, K). Note that there are two ways to define the study intervals: calendar time and time after treatment. Using calendar time, we simply combine all the 10 clinical studies. With unequal time intervals 1996–1997, 1998–1999, 2000–2001, 2002–2003, and 2004–2007, we have five analysis periods or looks of the data defined by 1996–1997, 1996–1999, 1996–2001, 1996–2003, and 1996–2007. Using time after treatment, which is defined as the time of the first occurrence of a single AE (e.g., a composite AE consisting of all PT terms associated with osteoporosis) after the start of the treatment, one can evaluate the safety signal over time after treatment (30 days, half year, 1 year, etc.). However, in this paper, only calendar time is considered for defining the analysis periods.
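As a small illustration of how the cumulative calendar-time looks follow from the unequal intervals, the sketch below builds the five analysis periods and counts events cumulatively. The function names and the example event years are hypothetical; only the interval boundaries come from the text.

```python
# Calendar-time intervals from the text (inclusive year ranges).
intervals = [(1996, 1997), (1998, 1999), (2000, 2001), (2002, 2003), (2004, 2007)]


def cumulative_looks(intervals):
    """Look k spans from the start of interval 1 to the end of interval k."""
    first_year = intervals[0][0]
    return [(first_year, end) for _, end in intervals]


def look_counts(event_years, intervals):
    """Cumulative number of events available at each look."""
    return [
        sum(1 for year in event_years if start <= year <= end)
        for start, end in cumulative_looks(intervals)
    ]


looks = cumulative_looks(intervals)
# Hypothetical event years, for illustration only.
counts = look_counts([1996, 1998, 1998, 2005], intervals)
print(looks)
print(counts)
```

Because each look reuses all earlier intervals, the counts are non-decreasing in k, which is what makes an alpha-spending adjustment across looks necessary.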

3. Exposure-based LongLRT statistics

We define event rate as the ratio of the number of countable cases to the event-time (as denominator). Similarly, we define risk as the ratio of the single countable case to the person-time (as denominator) or ratio of the multiple countable cases to the exposure-time (as denominator).

3.1. LongLRT for comparing multiple events by drug using event-time

We consider K looks ($k = 1, \ldots, K$) of the data. At look k (i.e., the kth analysis period), let $n_{ijk}$ be the number of countable cases for the jth drug and the ith AE. Define $n_{i.k} = \sum_j n_{ijk}$, $n_{.jk} = \sum_i n_{ijk}$, and $n_{..k} = \sum_i n_{i.k}$. For event-times, let $P_{ijk}$ be the sum of event-times for the jth drug and the ith AE, over subjects. Define $P_{i.k} = \sum_j P_{ijk}$, $P_{.jk} = \sum_i P_{ijk}$, and $P_{..k} = \sum_i P_{i.k}$. The two $I \times J$ tables, one for the event counts and the other for event-times, with rows as AEs and columns as drugs are given in Tables A2.1 and A2.2 (shown in Appendix A2). There are I $2 \times 2$ tables, one corresponding to each i, for event counts and event-times, respectively.

With exposure information available, assume that $n_{ijk} \overset{ind}{\sim} \text{Poisson}(p_{ijk} P_{i.k})$, and for other AEs combined, assume that $n_{.jk} - n_{ijk} \overset{ind}{\sim} \text{Poisson}(q_{ijk} (P_{..k} - P_{i.k}))$, where $i = 1, \ldots, I$; $k = 1, \ldots, K$, and where $p_{ijk}$ and $q_{ijk}$ are the event rates. For fixed j and a particular look k, the test hypotheses are

$$H_0: p_{ijk} = q_{ijk} = p_{0k} \text{ for all AE } i\text{'s}, \qquad H_a: p_{ijk} > q_{ijk} \text{ for at least one AE } i.$$

Note that under $H_0$, the index j in $p_{0k}$ is dropped, and $\hat{p}_{0k} = n_{.jk} / P_{..k}$. Under $H_a$, $\hat{p}_{ijk} = n_{ijk} / P_{i.k}$ and $\hat{q}_{ijk} = (n_{.jk} - n_{ijk}) / (P_{..k} - P_{i.k})$. This model evaluates relative event rate $p_{ijk} / q_{ijk}$ instead of relative reporting rate by incorporating the exposure information. The relative event rate is 1 under the null hypothesis and >1 under the alternative hypothesis. The signals detected are the AEs with higher relative event rate (for drug j).
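A minimal sketch of these estimators may help fix the notation. The function name `rate_estimates` and the numeric inputs are hypothetical; the formulas are the maximum likelihood estimates stated above for one AE i, one drug j, and one look k.

```python
def rate_estimates(n_ij, n_dotj, P_i, P_total):
    """Return (p0_hat, p_hat, q_hat, relative_event_rate) for one (AE, drug) cell.

    n_ij    : countable cases of AE i for drug j
    n_dotj  : countable cases of all AEs for drug j
    P_i     : event-time total for AE i (P_{i.k})
    P_total : grand total of event-time (P_{..k})
    Assumes n_dotj > n_ij and P_total > P_i so that q_hat is positive.
    """
    p0_hat = n_dotj / P_total                 # common rate under H0
    p_hat = n_ij / P_i                        # rate for AE i under Ha
    q_hat = (n_dotj - n_ij) / (P_total - P_i)  # rate for all other AEs under Ha
    return p0_hat, p_hat, q_hat, p_hat / q_hat


# Hypothetical counts and event-times, for illustration only.
p0_hat, p_hat, q_hat, rr = rate_estimates(n_ij=30, n_dotj=100, P_i=1000.0, P_total=10000.0)
print(p0_hat, p_hat, q_hat, rr)
```

With these made-up inputs the relative event rate is 27/7, so this AE would be a candidate signal for drug j, subject to the significance assessment described in Section 4.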

The likelihood ratio statistic (based on Poisson model) for the ith AE and fixed jth drug is

$$LR_{ijk} = \frac{\left(\frac{n_{ijk}}{P_{i.k}}\right)^{n_{ijk}} \left(\frac{n_{.jk} - n_{ijk}}{P_{..k} - P_{i.k}}\right)^{n_{.jk} - n_{ijk}}}{\left(\frac{n_{.jk}}{P_{..k}}\right)^{n_{.jk}}} = \left(\frac{n_{ijk}}{E_{ijk}}\right)^{n_{ijk}} \left(\frac{n_{.jk} - n_{ijk}}{n_{.jk} - E_{ijk}}\right)^{n_{.jk} - n_{ijk}},$$

where $E_{ijk} = P_{i.k} n_{.jk} / P_{..k}$. The LRT statistic for testing $H_0$ at look k is $\max_{1 \le i \le I} LR_{ijk}$.

If the counts are assumed to follow independent binomial distributions $n_{ijk} \overset{ind}{\sim} \text{Binomial}(P_{i.k}, p_{ijk})$ and $n_{.jk} - n_{ijk} \overset{ind}{\sim} \text{Binomial}(P_{..k} - P_{i.k}, q_{ijk})$, then the (binomial based) likelihood ratio for the ith AE and the jth drug at look k is

$$LR_{ijk} = \frac{\left(\frac{n_{ijk}}{P_{i.k}}\right)^{n_{ijk}} \left(1 - \frac{n_{ijk}}{P_{i.k}}\right)^{P_{i.k} - n_{ijk}} \left(\frac{n_{.jk} - n_{ijk}}{P_{..k} - P_{i.k}}\right)^{n_{.jk} - n_{ijk}} \left(1 - \frac{n_{.jk} - n_{ijk}}{P_{..k} - P_{i.k}}\right)^{(P_{..k} - P_{i.k}) - (n_{.jk} - n_{ijk})}}{\left(\frac{n_{.jk}}{P_{..k}}\right)^{n_{.jk}} \left(1 - \frac{n_{.jk}}{P_{..k}}\right)^{P_{..k} - n_{.jk}}}.$$

For large $P_{i.k} - n_{ijk}$ and $(P_{..k} - P_{i.k}) - (n_{.jk} - n_{ijk})$, the binomial-based $LR_{ijk}$ converges to the Poisson-based $LR_{ijk}$. If the event-times are each one-unit time, then $P_{i.k} = n_{i.k}$ and $P_{..k} = n_{..k}$, and the $LR_{ijk}$ becomes the likelihood ratio statistic based on spontaneous reports data discussed in [5].

LongLRT is not a regression-based approach; however, covariates can be brought into the LRT statistic through the expected counts, which are derived for the pre-specified strata as $E_{ijk} = \sum_m E_{ijk}^{(m)}$, where $E_{ijk}^{(m)} = P_{i.k}^{(m)} n_{.jk}^{(m)} / P_{..k}^{(m)}$ is the expected count for the mth stratum. The $n_{.jk}^{(m)}$, $P_{i.k}^{(m)}$, and $P_{..k}^{(m)}$ are based on the two $I \times J$ tables for $n_{ijk}^{(m)}$ and $P_{ijk}^{(m)}$ for each stratum m. Usually, a limited number of factors and strata are recommended because of low sample size per stratum. In the presence of a large number of covariates, propensity scores could be obtained for combining the information from the covariates if the subject-level data are available, and strata could be defined by the propensity scores.

3.2. SeqLRT for comparing two drugs and one AE with single occurrence using person-time

When each subject takes only one drug, either the test drug or the comparator, and has one AE (such as death) or the first occurrence of an AE such as the first occurrence of stroke or bleeding, the event-time (time from the drug exposure to the first occurrence of the AE) discussed in Section 3.1 is exactly the person-time (Figure 2). For each k, we work with a $2 \times 1$ table, with the rows denoting the test drug and the comparator (I = 2) and the column as the single AE of interest (J = 1). Here, $P_{i.k} = P_{i1k}$, and $P_{..k} = \sum_i P_{i1k} = P_{11k} + P_{21k}$. The sum of the person-times over subjects for each drug is used as the denominator, and the relative risk is evaluated. For the AE of interest (j = 1), the test hypotheses are

$$H_0: p_{11k} = q_{21k} = p_{0k}, \qquad H_a: p_{11k} > q_{21k}.$$

Under $H_0$, $\hat{p}_{0k} = n_{.1k} / P_{..k}$ (risk of the AE of interest for the test drug or the comparator). Under $H_a$, $\hat{p}_{11k} = n_{11k} / P_{1.k} = n_{11k} / P_{11k}$ (risk of the AE of interest for the test drug) and $\hat{q}_{21k} = (n_{.1k} - n_{11k}) / (P_{..k} - P_{1.k}) = n_{21k} / P_{21k}$ (risk of the AE of interest for the comparator). The likelihood ratio statistic here is

$$LR_{11k} = \frac{\left(\frac{n_{11k}}{P_{11k}}\right)^{n_{11k}} \left(\frac{n_{21k}}{P_{21k}}\right)^{n_{21k}}}{\left(\frac{n_{11k} + n_{21k}}{P_{11k} + P_{21k}}\right)^{n_{11k} + n_{21k}}} = \left(\frac{n_{11k}}{E_{11k}}\right)^{n_{11k}} \left(\frac{n_{21k}}{n_{.1k} - E_{11k}}\right)^{n_{21k}},$$

where $E_{11k} = P_{11k} n_{.1k} / (P_{11k} + P_{21k})$. The LRT statistic for testing $H_0$ at look k is $\max_i LR_{i1k}$, i = 1, 2.
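The two-arm statistic can be sketched numerically as follows. This is a hedged illustration, not the authors' implementation: the function names and example counts are hypothetical, the statistic is computed on the log scale for numerical stability, and the convention 0 log 0 = 0 is an added assumption for empty cells.

```python
import math


def log_lr(n, n_total, expected):
    """log of (n/E)^n * ((N - n)/(N - E))^(N - n), taking 0 * log(0) as 0."""
    def term(count, expected_count):
        return 0.0 if count == 0 else count * math.log(count / expected_count)

    return term(n, expected) + term(n_total - n, n_total - expected)


def seq_lrt(n_test, n_comp, P_test, P_comp):
    """Log likelihood ratio and relative risk for test drug versus comparator.

    n_test, n_comp : first-occurrence counts (n_11k, n_21k)
    P_test, P_comp : person-time totals (P_11k, P_21k)
    Assumes n_comp > 0 so that the relative risk is defined.
    """
    n_total = n_test + n_comp
    expected = P_test * n_total / (P_test + P_comp)  # E_11k
    rr = (n_test / P_test) / (n_comp / P_comp)
    return log_lr(n_test, n_total, expected), rr


# Hypothetical counts and person-times, for illustration only.
llr, rr = seq_lrt(n_test=20, n_comp=10, P_test=100.0, P_comp=100.0)
print(llr, rr)
```

With equal person-time in both arms the expected count is half of the total, so the statistic grows only through the imbalance between the two observed counts; equal counts give a log likelihood ratio of zero.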


This model evaluates relative risk ($RR_{11k} = p_{11k} / q_{21k}$) using person-time instead of relative event rate. The relative risk is 1, under the null hypothesis, and is greater than 1, under the alternative hypothesis. The signal detected is the drug with higher relative risk (for the AE of interest). Stratified analysis can be conducted along the same lines as discussed in Section 3.1.

3.3. LongLRT for comparing multiple drugs and one AE with recurrence using exposure-time

The methods discussed in Section 3.1 cannot be modified directly and applied to comparing multiple drugs using exposure-time. In the following, we consider a total of D drugs and J AEs (note the change in the notations) and K analysis periods. We suppress k in the notations and assume that each subject takes only one drug (from a total of D = 14 drugs described in Section 2). For a fixed j ($j^*$, say, a particular AE of interest or a composite AE), define $P_{ds}$ to be the exposure-time (from start date to end date) for the sth subject taking the dth drug ($d = 1, \ldots, D$; $D > 2$); see Figure 3, in which AE1, AE2, and AE3 in this case can be treated as recurrent events for a composite AE. Let $n_{dj^*s} = \sum_l n_{dj^*s}^{l(i,s)}$ be the total number of countable cases of the $j^*$th AE for the dth drug and the sth subject. Note that the $I \times J$ matrix is now a $D \times 1$ matrix with rows as drugs and the single column as the AE (or the composite AE). The distribution of the events can be written as $n_{dj^*s} \overset{ind}{\sim} \text{Poisson}(p_{dj^*s} P_{ds})$, where $p_{dj^*s}$ is the risk of a single AE or composite AE with recurrence for the sth subject. Assuming that the risks $p_{dj^*s}$, $s = 1, \ldots, S$ (S is the total number of subjects) are homogeneous over the S subjects, and denoting this common risk by $p_{dj^*}$, we can rewrite the distribution for the events as $n_{dj^*s} \overset{ind}{\sim} \text{Poisson}(p_{dj^*} P_{ds})$. Then, the sum of the events over all subjects, for the dth drug, $n_{dj^*} = \sum_s n_{dj^*s}$, follows $n_{dj^*} \overset{ind}{\sim} \text{Poisson}(p_{dj^*} P_d)$, where $P_d = \sum_s P_{ds}$.

Note that index $j^*$ is dropped from $P_{ds}$ (and hence from $P_d$) because we only work with just one AE. Also, assume that the events for other drugs have the following distribution: $n_{.j^*} - n_{dj^*} \overset{ind}{\sim} \text{Poisson}(q_{dj^*} (P_{.} - P_d))$, where $q_{dj^*}$ is the risk (homogeneous across all subjects) for other drugs (not including the dth drug) and $n_{.j^*} = \sum_d n_{dj^*}$ and $P_{.} = \sum_d P_d$ are the total number of countable cases and the total exposure-time for all drugs, respectively. The ratio $RR_{dj^*} = p_{dj^*} / q_{dj^*}$ is called the relative risk of the $j^*$th AE for drug d versus other drugs. The signals identified using this approach are the drugs with higher relative risk (for the AE of interest). For fixed j ($j^*$), the test hypotheses are

$$H_0: p_{dj^*} = q_{dj^*} = p_0 \text{ for all drug } d\text{'s}, \qquad H_a: p_{dj^*} > q_{dj^*} \text{ for at least one drug } d.$$

Under $H_0$, $\hat{p}_0 = n_{.j^*} / P_{.}$, and under $H_a$, $\hat{p}_{dj^*} = n_{dj^*} / P_d$ and $\hat{q}_{dj^*} = (n_{.j^*} - n_{dj^*}) / (P_{.} - P_d)$. The likelihood ratio statistic for the dth drug and fixed $j^*$th AE is

$$LR_{dj^*} = \frac{\left(\frac{n_{dj^*}}{P_d}\right)^{n_{dj^*}} \left(\frac{n_{.j^*} - n_{dj^*}}{P_{.} - P_d}\right)^{n_{.j^*} - n_{dj^*}}}{\left(\frac{n_{.j^*}}{P_{.}}\right)^{n_{.j^*}}} = \left(\frac{n_{dj^*}}{E_{dj^*}}\right)^{n_{dj^*}} \left(\frac{n_{.j^*} - n_{dj^*}}{n_{.j^*} - E_{dj^*}}\right)^{n_{.j^*} - n_{dj^*}},$$

where $E_{dj^*} = P_d n_{.j^*} / P_{.}$. The maximum likelihood ratio test statistic is $\max_d LR_{dj^*}$. Here, for stratified analysis, the expected counts for the pre-specified strata are given by $E_{dj^*k} = \sum_m E_{dj^*k}^{(m)}$, where $E_{dj^*k}^{(m)} = P_{dk}^{(m)} n_{.j^*k}^{(m)} / P_{.k}^{(m)}$.
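For several drugs at once, the per-drug statistic can be evaluated in vectorized form, as in the sketch below. The function name, the example counts and exposure-times, and the one-sided convention (setting the statistic to zero for drugs whose observed count does not exceed the expected count, to reflect the one-sided alternative) are assumptions added for illustration.

```python
import numpy as np


def log_lr_by_drug(counts, exposure):
    """Log LR_{dj*} for each drug d, given counts n_{dj*} and exposure-times P_d.

    Assumes at least two drugs with positive exposure-time.
    Assumption: drugs with n <= E get a statistic of 0 (one-sided alternative).
    """
    n = np.asarray(counts, dtype=float)
    P = np.asarray(exposure, dtype=float)
    N = n.sum()                  # n_{.j*}
    E = P * N / P.sum()          # expected counts E_{dj*}
    with np.errstate(divide="ignore", invalid="ignore"):
        own = np.where(n > 0, n * np.log(n / E), 0.0)
        rest = np.where(N - n > 0, (N - n) * np.log((N - n) / (N - E)), 0.0)
    return np.where(n > E, own + rest, 0.0)


# Hypothetical data for three drugs with equal exposure-time.
llr = log_lr_by_drug([30, 10, 10], [100.0, 100.0, 100.0])
signal_drug = int(np.argmax(llr))
print(llr, signal_drug)
```

The drug attaining the maximum is the candidate signal; whether it is declared a signal depends on the Monte Carlo reference distribution and the alpha assigned to the look.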

3.4. Independence assumption

For the model described in Section 3.1, where each subject may take several drugs and may have multiple AEs, the assumption that the countable cases for the ith AE and the jth drug are independent may not hold in general because the countable cases within one cell (i, j) and across cells could come from the same subject several times. Therefore, we relax the independence assumption for $n_{ijk}$ and $(n_{.jk} - n_{ijk})$, $i = 1, \ldots, I$, at each fixed look k, in the derivation of $LR_{ijk}$ as follows. Let $n_{ijk} \mid \theta \overset{ind}{\sim} \text{Poisson}(\theta p_{ijk} P_{i.k})$, $i = 1, \ldots, I$, and $n_{.jk} - n_{ijk} \mid \theta \overset{ind}{\sim} \text{Poisson}(\theta q_{ijk} (P_{..k} - P_{i.k}))$, $i = 1, \ldots, I$, where $\theta \sim \text{Gamma}(1, 1)$, with both the scale and shape parameters equal to 1 and with PDF $g(\theta \mid 1, 1)$. Then, it can be shown that (dropping indexes k)

$$E(n_{ij}) = P_{i.} p_{ij}, \qquad \text{Var}(n_{ij}) = P_{i.} p_{ij} (1 + P_{i.} p_{ij}), \qquad \text{Cov}(n_{ij}, n_{i'j}) = (P_{i.} p_{ij}) (P_{i'.} p_{i'j}),$$

so that

$$0 < \text{Corr}(n_{ij}, n_{i'j}) = \sqrt{\frac{(P_{i.} p_{ij}) (P_{i'.} p_{i'j})}{(1 + P_{i.} p_{ij}) (1 + P_{i'.} p_{i'j})}} < 1.$$

As shown in Appendix A3, the expression for the likelihood ratio $LR_{ij}$ is unaffected at any look k, and the conditional distribution of $(n_{1jk}, \ldots, n_{Ijk})$, given $n_{.jk}$ under $H_0$, is still $\text{Multinomial}\left(n_{.jk}; \frac{P_{1.k}}{P_{..k}}, \ldots, \frac{P_{I.k}}{P_{..k}}\right)$, independent of $\theta$. However, the serial dependence of $n_{ijk}$ through an autocorrelation between $n_{ijk}$ and $n_{ijk'}$ ($k < k'$) is currently being investigated.
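The moment formulas for the shared-frailty model can be checked by simulation. The sketch below is an illustration only: the two Poisson means (standing in for $P_{i.} p_{ij}$ and $P_{i'.} p_{i'j}$), the sample size, and the seed are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical values of P_{i.} p_{ij} and P_{i'.} p_{i'j}.
mu1, mu2 = 2.0, 3.0

# Shared Gamma(1, 1) frailty: one draw per replicate, common to both cells.
theta = rng.gamma(shape=1.0, scale=1.0, size=200_000)
n1 = rng.poisson(theta * mu1)
n2 = rng.poisson(theta * mu2)

# Theoretical moments from the formulas above.
var_theory = mu1 * (1 + mu1)
corr_theory = np.sqrt(mu1 * mu2 / ((1 + mu1) * (1 + mu2)))

var_sim = n1.var()
corr_sim = np.corrcoef(n1, n2)[0, 1]
print(var_sim, var_theory)
print(corr_sim, corr_theory)
```

The simulated variance exceeds the Poisson variance (which would equal the mean), and the two cell counts are positively correlated, which is the dependence the Gamma mixing is meant to capture.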

4. Statistical inference with multiple looks

Because the asymptotic distribution of the LongLRT is analytically not tractable, a Monte Carlo approach is used to obtain the empirical distribution. For the test statistics discussed in Sections 3.1 and 3.2, the joint distribution of the cell counts is given by

$$(n_{1jk}, \ldots, n_{Ijk}) \mid n_{.jk} \sim \text{Multinomial}\left(n_{.jk}; \frac{P_{1.k}}{P_{..k}}, \ldots, \frac{P_{I.k}}{P_{..k}}\right).$$

For the test statistic discussed in Section 3.3, the joint distribution of the cell counts is given by

$$(n_{1j^*k}, \ldots, n_{Dj^*k}) \mid n_{.j^*k} \sim \text{Multinomial}\left(n_{.j^*k}; \frac{P_{1k}}{P_{.k}}, \ldots, \frac{P_{Dk}}{P_{.k}}\right),$$

where D is the total number of drugs under comparison. The totals $n_{.jk}$, $P_{i.k}$, $P_{..k}$, $n_{.j^*k}$, $P_{dk}$, and $P_{.k}$ are from the observed data and are fixed in the simulation for p-value calculation. The values of the test statistic $\max_{1 \le i \le I} LR_{ijk}$ are calculated for the observed dataset and 9999 simulated null datasets and are used to derive the empirical distribution.

At each look k, we test the hypothesis $H_0$ versus $H_a$, with level of significance $\alpha(k)$ specified. We can use an increasing $\alpha$-spending function [8, 14], given by $\alpha(k) = \frac{1}{K} \alpha$, $k = 1, \ldots, K$, with $\alpha(0) = 0$, so that the cumulative error $\alpha^*(k) = \frac{k}{K} \alpha \le \alpha$, or a decreasing $\alpha$-spending function (proposed by Goodman et al. [15]), given by $\alpha(k) = \alpha / 2^k$, so that the cumulative error $\alpha^*(k) = \alpha \sum_{r=1}^{k} \frac{1}{2^r} \le \alpha$. If $\alpha$ is specified as 0.05, the cumulative error at the kth look is never more than 0.05 for either function. The advantage of the decreasing spending function is that one does not need to specify K, the maximum number of looks. Other $\alpha$-spending functions, incorporating the fraction of information used, can also be considered [16, 17]. A signal is found if the observed $\max_i LR_{ijk}$ from the real data is greater than the $100(1 - \alpha(k))\%$ cutoff point of the empirical distribution of the $\max_i LR_{ijk}$ under $H_0$. We use a step-down procedure to identify the secondary, tertiary, and other lower-order signals.
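The Monte Carlo p-value calculation described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' code: the function names, example data, and seed are hypothetical; the statistic is computed on the log scale; cells with observed count not exceeding the expected count are given a statistic of zero (a one-sided convention assumed here); and the p-value is the rank of the observed statistic among the observed plus 9999 simulated values.

```python
import numpy as np


def max_log_lr(counts, exposure):
    """Maximum over cells of the log Poisson LR; works on one table or a stack of tables."""
    n = np.asarray(counts, dtype=float)
    P = np.asarray(exposure, dtype=float)
    N = n.sum(axis=-1, keepdims=True)   # fixed total per table
    E = P * N / P.sum()                 # expected counts under H0
    with np.errstate(divide="ignore", invalid="ignore"):
        own = np.where(n > 0, n * np.log(n / E), 0.0)
        rest = np.where(N - n > 0, (N - n) * np.log((N - n) / (N - E)), 0.0)
    llr = np.where(n > E, own + rest, 0.0)  # assumption: one-sided alternative
    return llr.max(axis=-1)


def mc_p_value(observed, exposure, n_sim=9999, seed=0):
    """Empirical p-value from null tables drawn as Multinomial(n_total; P_i / P_total)."""
    rng = np.random.default_rng(seed)
    observed = np.asarray(observed)
    P = np.asarray(exposure, dtype=float)
    sims = rng.multinomial(int(observed.sum()), P / P.sum(), size=n_sim)
    null_stats = max_log_lr(sims, P)
    obs_stat = max_log_lr(observed, P)
    return float((1 + np.sum(null_stats >= obs_stat)) / (n_sim + 1))


# Hypothetical tables with equal exposure: one imbalanced, one nearly balanced.
p_signal = mc_p_value([30, 10, 10], [100.0, 100.0, 100.0])
p_null = mc_p_value([17, 17, 16], [100.0, 100.0, 100.0])
print(p_signal, p_null)
```

The resulting p-value would then be compared with the level assigned to the current look, so the same simulated reference distribution can be reused for the step-down search at that look.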
The proposed LongLRT method controls (family-wise) type I error, $\alpha$, in two ways: first, it controls alpha across a pre-specified number of looks of the data using a monotone alpha-spending function; and then, at each look, it controls the fraction of alpha assigned at that look for multiplicity by using the step-down procedure for testing multiple hypotheses for potential signals. The LongLRT method also controls the FDR with FDR $\le \alpha$ (for details, see [5]).
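The two alpha-spending schemes from this section can be tabulated with a few lines of code. The function names are hypothetical; the formulas are the equal allocation $\alpha / K$ per look and the halving allocation $\alpha / 2^k$.

```python
def equal_spending(alpha, K):
    """Per-look levels alpha(k) = alpha / K, cumulative error k * alpha / K."""
    return [alpha / K for _ in range(K)]


def halving_spending(alpha, K):
    """Per-look levels alpha(k) = alpha / 2**k; K is only needed to list the values."""
    return [alpha / 2 ** k for k in range(1, K + 1)]


equal_levels = equal_spending(0.05, 5)
halving_levels = halving_spending(0.05, 5)
print(equal_levels, sum(equal_levels))
print(halving_levels, sum(halving_levels))
```

For alpha = 0.05 and five looks, the halving scheme gives 0.025, 0.0125, 0.00625, 0.003125, and 0.0015625, with a cumulative error of 0.0484375, which stays below 0.05 however many further looks are added.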

5. Applications


In the following, we present the results from applications of LongLRT and SeqLRT discussed in Sections 3 and 4 to the pooled clinical trial data with exposure information for detecting AE signals for a drug or for detecting drug signals for the composite AE associated with osteoporosis. The SeqLRT and LongLRT with Poisson and binomial models provide similar results in applications. Therefore, only results for Poisson model are presented here.


We use ˛ D 0:05 and decreasing alpha-spending function, as it helps to detect more signals early on, in a short duration, and get the confirmation of the detected signals with more data later on (with smaller ˛). If at look k the p-value for an AE from the LongLRT method is less than ˛.k/ D 2˛k (with ˛.1/ D 0:025; ˛.2/ D 0:0125; ˛.3/ D 0:00625; ˛.4/ D 0:003125; ˛.5/ D 0:001563/, then that AE is a signal. Once a signal is identified, the search process can be stopped. However, for the purpose of signal confirmation or refinement, we do not stop the search process even though there may be signals detected in early periods. For example, a signal detected at look k D 2 may not be a signal any more at look k D 3. In that case, the signal detected at look k D 2 is not confirmed. Note that there are approximately 10% patients with concomitant use of PPIs in this clinical trial database. The prevalence of use of the PPIs will likely have an effect on the number of the AEs we observe, and that in turn will affect the power of the test. 5.1. Safety signals (among multiple AEs) by drug The AE signals were explored by comparing all AEs by drug in the pooled clinical trial database (1996–2007) including 14 drugs (six test drugs, one placebo, six test drugsCPPIs, and one placeboCPPIs), over five cumulative analysis periods (1996–1997, –1999, –2001, –2003, and –2007). The LongLRT model discussed in Section 3.1 was applied here. Event-time defined as duration from the start of the drug exposure to any of the AE events is evaluated. The AEs with higher relative event rates were reported as signals. Because of the space limitation, results for only six drugs (not for all the 14 drugs) are presented in Table II. The table gives ‘ndotj’ as the total number of countable cases for the particular drug j , number of AE signals, and selected AE signals (with relative event rate, RR) related to osteoporosis. As some

Table II. AE signals detected by drug for the pooled clinical trial data (1996–2007) with 14 drugs.

Placebo

Placebo+PPIs

Bazedoxifene

Bazedoxifene+PPIs

PTH

PTH+PPIs

ndotj AE signals Muscle cramp (RR) Bone pain (RR) ndotj AE signals Muscle cramp (RR) Muscle spasms (RR) ndotj AE signals Muscle cramp (RR) ndotj AE signals Muscle cramp (RR) Muscle spasms (RR) ndotj AE signals Joint sprain (RR) Muscle spasms (RR) Muscle cramp (RR) ndotj AE signals Foot fracture (RR) Muscle cramp (RR) Bone pain (RR)

k = 1

2

3

4

5

1251 3

4703 6

8282 34 4.1

29731 43 2.3

95 0

273 0

50364 74 4.4 2.2 9043 30 6.8 3.9 74032 115 2.5 13721 61 2.4 4.8 9538 114 4

1094 23

4833 26

516 16

34234 115 3.1 5760 53 6 6.2 9538 73 3.1

846 6

2938 89 11.6 16.6 677 35 22 47.8


8.3 1724 20 9.4

27.1 1724 36 10.5 30.5

RR values are presented when an AE is detected as a signal for a particular look (p-value < $\alpha(k) = \alpha/2^k$). AEs, adverse events; RR, relative risk; PPIs, proton pump inhibitors; PTH, parathyroid hormone.


Statist. Med. 2014, 33 2408–2424


drugs were used only in some trials conducted in later periods ($k = 3$, 4, and 5), we do not have any AE data for them in the two earlier periods. For placebo (PL), the AE data are available for all five periods, $k = 1$ to 5. Some AE signals associated with osteoporosis were detected for all 14 drugs. The relative event rates for these AE signals are usually high for the drug+PPIs groups. In the following subsections, we explore the composite AE, AEOST, whose definition is given in Appendix A1.

5.2. Safety signals for the first occurrence of a composite AE and two drugs using SeqLRT

The SeqLRT (discussed in Section 3.2) was applied to the pooled clinical trial data for the first occurrence of the single composite AE, AEOST, and for comparing placebo versus placebo+PPIs. Once the first occurrence of AEOST is detected as a signal in the $k$th analysis period, we stop searching for safety signals from the $(k + 1)$th analysis period onward. Each subject has at most one event. Here, the event-time, defined as the duration from the start of the drug exposure to the first occurrence of the event (or to the end of drug exposure if there is no countable event), is the person-time for each subject. The relative risk of the first occurrence of AEOST is evaluated in this application. The sample sizes $(n_{.j})$ for AEOST are 57, 163, 232, 439, and 500 for analysis periods 1, 2, 3, 4, and 5, respectively. The relative risks of placebo+PPIs versus placebo are 4.7, 2.4, 2.5, 1.9, and 1.7 for analysis periods 1, 2, 3, 4, and 5, respectively. When $k = 1$, the p-value is 0.001 ($< \alpha(1) = 0.025$). The relative risk of AEOST for placebo+PPIs versus placebo is significant at the first analysis period. Therefore, we chose to stop the search for further signals by the sequential method.

5.3. Safety signals for multiple occurrences of a composite AE and two drugs using LongLRT

The LongLRT (discussed in Section 3.3) was applied to the data for the single composite AE, AEOST, with recurrence, and for comparing pairs of drugs (test drug versus test drug+PPIs, or placebo versus placebo+PPIs). In order to evaluate the relative risk of any occurrence of AEOST, the exposure-time instead of the event-time is used. The exposure-time is defined as the duration from the start of the exposure to a drug $d$ to the end of the exposure to drug $d$ for a subject $s$, $P_{ds}$. In Table III (based on Section 5.3), ndotj is the total number of countable cases of the composite AE for the two drugs being compared (drug and drug+PPIs), and RR is the relative risk of a drug versus the

Table III. The composite AEOST and two drug comparison for seven drug pairs.

Drug pair                            Statistic  k = 1  k = 2  k = 3  k = 4  k = 5
Placebo+PPIs vs. placebo             ndotj      65     195    286    647    787
                                     RR         5.7    3.2    2.9    2.5    2.4
                                     p-value    0      0      0      0      0
Ibandronate+PPIs vs. ibandronate     ndotj      105    338    385    385    385
                                     RR         1.3    1.3    1.4    1.4    1.4
                                     p-value    0.50   0.34   0.15   0.15   0.15
Teriparatide+PPIs vs. teriparatide   ndotj      4      16     16     16     16
                                     RR         -      4      4      4      4
                                     p-value    -      0.07   0.07   0.07   0.07
Raloxifene+PPIs vs. raloxifene       ndotj      -      -      4      157    313
                                     RR         -      -      3.2    1.1    1.2
                                     p-value    -      -      0.14   0.66   0.38
PTH+PPIs vs. PTH                     ndotj      -      -      86     249    249
                                     RR         -      -      1.6    1.4    1.4
                                     p-value    -      -      0.09   0.09   0.09
Bazedoxifene+PPIs vs. bazedoxifene   ndotj      -      -      2      343    648
                                     RR         -      -      -      2.8    1.6
                                     p-value    -      -      -      0      0
Lasofoxifene+PPIs vs. lasofoxifene   ndotj      -      -      36     2105   3643
                                     RR         -      -      2.4    1.4    1.4
                                     p-value    -      -      0.04   0      0

Some drugs are only available from look 3 (k = 3). PPIs, proton pump inhibitors; RR, relative risk; PTH, parathyroid hormone.


drug+PPIs. As shown in Table III, the placebo+PPIs group has a higher relative risk of AEOST (with RR 2–6) than the placebo group over the five analysis periods. The relative risk (RR 1.1–4) was higher for all drug+PPIs groups than for the drug-only groups after the first analysis period. In addition to placebo+PPIs, bazedoxifene+PPIs and lasofoxifene+PPIs also have significant AEOST signals detected at $k = 4$ and 5 (RR 1.4–3). Note that the sample sizes for AEOST for placebo+PPIs versus placebo in Table III are larger than the corresponding sample sizes given in Section 5.2 because here the recurrences of AEOST were counted and added for the analyses.

5.4. Safety signals for a composite AE with recurrence from multiple drugs using LongLRT

Here, we consider 14 drugs in the clinical trial database with the composite AE, AEOST, and apply the LongLRT. In Table IV, the composite AE, AEOST, is a signal when the p-values are 0 (i.e., p-values < $\alpha(k)$), and RR is the relative risk of one drug versus the other 13 drugs. AEOST appears to be a signal for lasofoxifene, lasofoxifene+PPIs, PTH, PTH+PPIs, and bazedoxifene+PPIs for some periods, in addition to being a signal with high risk for placebo+PPIs for all five analysis periods. The patients in the placebo+PPIs group received no drugs for treating osteoporosis but did receive some PPIs. The high risks with significant p-values across all the analysis periods for placebo+PPIs indicate a possible relationship between PPIs and the composite AE, AEOST.

Table IV. Single AE (the composite AEOST) and multiple drugs (14 drugs) comparison.

Drug                 Statistic  k = 1  k = 2  k = 3  k = 4  k = 5
                     ndotj      174    549    815    3902   6041
Placebo              RR         1.0    1.0    1.0    0.7    0.6
                     p-value    0.99   0.98   0.99   0.99   0.99
Placebo+PPIs         RR         5.9    3.4    3.0    1.8    1.6
                     p-value    0      0      0      0      0
Ibandronate          RR         2.0    1.6    1.2    0.6    0.7
                     p-value    0      0      0.05   0.99   0.99
Ibandronate+PPIs     RR         1.8    1.6    1.6    0.9    1
                     p-value    0.50   0.23   0.24   0.99   0.99
Teriparatide         RR         0.1    0.1    0.1    0.1    0.1
                     p-value    0.99   0.99   0.99   0.99   0.99
Teriparatide+PPIs    RR         -      0.6    0.5    0.3    0.3
                     p-value    -      0.99   0.99   0.99   0.99
PTH                  RR         -      -      3.9    1.5    1.7
                     p-value    -      -      0      0      0
PTH+PPIs             RR         -      -      5.9    2.0    2.3
                     p-value    -      -      0      0      0
Bazedoxifene         RR         -      -      1.7    0.7    0.5
                     p-value    -      -      0.99   0.99   0.99
Bazedoxifene+PPIs    RR         -      -      -      2.2    0.9
                     p-value    -      -      -      0      0.99
Lasofoxifene         RR         -      -      0.4    1.6    1.8
                     p-value    -      -      0.99   0      0
Lasofoxifene+PPIs    RR         -      -      1.1    1.9    2.0
                     p-value    -      -      0.99   0      0
Raloxifene           RR         -      -      4.0    0.9    0.6
                     p-value    -      -      0.46   0.99   0.99
Raloxifene+PPIs      RR         -      -      12.9   1.0    0.7
                     p-value    -      -      0.05   0.99   0.99

Some drugs are only available from look 3 (k = 3). AE, adverse event; RR, relative risk; PPIs, proton pump inhibitors; PTH, parathyroid hormone.

6. Simulation study for longitudinal LRT methods

We consider a simulation study to evaluate the performance characteristics of the proposed methods and to understand better how well they identify signals correctly.


The focus of the simulation study is on (i) the SeqLRT using person-time for the first occurrence of AEOST and (ii) the LongLRT using exposure-time for the recurrence of AEOST for the two drugs placebo (PL) versus PL+PPIs. The power and type I error are obtained.

6.1. Data simulation

Consider the simple case of comparing PL ($i = 1$) versus PL+PPIs ($i = 2$). The AE of interest is the first occurrence of AEOST ($j = 1$). We use the person-time $(P_{i1k})$ and cell counts $(n_{i1k})$ from the real data and obtain $P_{i.k} = P_{i1k}$, $n_{.1k} = \sum_i n_{i1k}$, and $P_{..k} = \sum_i P_{i1k}$ over the cumulative analysis periods ($k = 1, \ldots, 5$). The cases $n_{i1k}$ are simulated from the binomial distribution,

$$n_{i1k} \mid n_{.1k} \sim \text{Binomial}\left(n_{.1k},\ \frac{RR_{i1k} P_{i.k}}{RR_{i1k} P_{i.k} + (P_{..k} - P_{i.k})}\right),$$

where $i = 1, 2$. Note that the binomial probability of an AE depends on RR (unknown) and the event-times $P_{i.k}$ and $P_{..k}$ (known from the real data). The SeqLRT method is applied to these simulated data. Similarly, using the exposure-time ($P_d$) and cell counts ($n_{ij}$) from the real data, we simulate

$$(n_{1jk}, \ldots, n_{Djk}) \mid n_{.jk} \sim \text{Multinomial}\left(n_{.jk};\ RR_{1j}\, rr_0 \frac{P_{1k}}{P_{.k}}, \ldots, RR_{Dj}\, rr_0 \frac{P_{Dk}}{P_{.k}}\right),$$

where $rr_0$ is the baseline risk, which may vary in different datasets; $P_{.k} = \sum_d P_{dk}$; $RR_{dj} > 1$; and $\sum_{d=1}^{D} RR_{dj}\, rr_0\, P_{dk}/P_{.k} = 1$.

For generating the data under the null hypothesis, we set $RR_{i1} = 1$ for $i = 1, 2$, and under the alternative hypothesis, we set $RR_{11} = 1$ (PL) and $RR_{21} = c > 1$ (PL+PPIs) for all five analysis periods (the $c$ values could be constant, decreasing, or increasing over looks), assuming that the signal exists in some periods. $M$ (= 1000) datasets are simulated with different $c$ values (1.2, 1.5, 2, 4). To evaluate the effect of sample size, we define the sample size as $\zeta \times n_{.1k}$, where $\zeta = 0.5, 1, 2, 4$. For all cases with $c$ values varying over time, $\zeta$ is 1.
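The binomial step above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names and the person-time values (1000 and 10000) are hypothetical, and only the sample size 57 (look 1 in Section 5.2) comes from the text. The draw is built from Bernoulli trials so that only the standard library is needed.

```python
import random


def binomial_probability(rr, person_time_i, person_time_total):
    """Success probability RR * P_i / (RR * P_i + (P_total - P_i))."""
    weighted = rr * person_time_i
    return weighted / (weighted + (person_time_total - person_time_i))


def simulate_count(n_total, rr, person_time_i, person_time_total, rng):
    """Draw n_i | n_total ~ Binomial(n_total, p) as a sum of Bernoulli trials."""
    p = binomial_probability(rr, person_time_i, person_time_total)
    return sum(1 for _ in range(n_total) if rng.random() < p)


rng = random.Random(2014)  # fixed seed so the sketch is reproducible

# Hypothetical person-time for the PL+PPIs group and for both groups combined;
# these numbers are illustrative and are not taken from the paper's data.
p_ppi, p_total = 1000.0, 10000.0

# Null hypothesis: RR = 1. Alternative: RR = c = 4 for PL+PPIs.
null_draw = simulate_count(57, 1.0, p_ppi, p_total, rng)
alt_draw = simulate_count(57, 4.0, p_ppi, p_total, rng)
print(null_draw, alt_draw)
```

Repeating such draws M = 1000 times per look and applying the test to each simulated dataset would give the rejection frequencies described in Section 6.2.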

6.2. Performance characteristics

Define the probability of rejecting $H_0$ at analysis period $k$ as

$$pr(k) = \frac{\#\{\text{rejecting } H_0 \text{ at the } k\text{th period}\}}{1000}, \quad k = 1, \ldots, 5.$$

This is the conditional power of the LongLRT at analysis period $k$. For the SeqLRT method with person-time generated under $H_a$, the probability of rejecting $H_0$ at analysis period $k$ is the sum of the probabilities of rejecting $H_0$ from analysis periods 1 through $k$:

$$power(k) = pr(1) + \cdots + (1 - pr(1)) \cdots (1 - pr(k-1))\, pr(k).$$

This is the unconditional power of the SeqLRT at analysis period $k$. When the data are generated under $H_0$, $pr(k)$ is the conditional type I error rate for the $k$th analysis period, and the values of $power(k)$ become the type I error for the SeqLRT. For the LongLRT, without stopping the procedure, the cumulative error rate ($cumer(k)$) is the sum of the probabilities of rejecting $H_0$ when $H_0$ is true.

6.3. Simulation results


We present the conditional power, conditional type I error rate, and the cumulative error rate for the LongLRT, and the power, type I error, and unconditional error rate for the SeqLRT in Tables V and VI. As shown in Table V with constant RR values, the power of the SeqLRT is large for the later analysis periods, for large sample sizes, and for higher relative risk (RR) values. The simulation study shows that the power at analysis period 1 is 0.75 for $\zeta = 1$ (sample size as in the real data) and RR = 4 (placebo+PPIs versus placebo). This result supports the finding from the real data analysis. In addition, the cumulative error rate of the LongLRT (shown in Table V) is close to the $\alpha \sum_{r=1}^{k} 1/2^r$ level for each analysis period. Similar patterns in the error rates are observed when the increasing $\alpha$-spending function $\alpha(k) = (k/K)\alpha$ with $K = 5$ is used (results not shown).
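The nominal levels mentioned above are easy to tabulate. The following Python sketch computes the decreasing spending function $\alpha(k) = \alpha/2^k$, its running sum $\alpha \sum_{r=1}^{k} 1/2^r$, and the increasing function $\alpha(k) = (k/K)\alpha$; the function names are illustrative and not part of the paper.

```python
def decreasing_alpha(alpha, k):
    """Per-look level alpha(k) = alpha / 2**k."""
    return alpha / 2 ** k


def cumulative_decreasing_alpha(alpha, k):
    """Level spent through look k: alpha * sum_{r=1..k} 1 / 2**r."""
    return sum(decreasing_alpha(alpha, r) for r in range(1, k + 1))


def increasing_alpha(alpha, k, n_looks):
    """Increasing spending function alpha(k) = (k / K) * alpha."""
    return alpha * k / n_looks


ALPHA, K = 0.05, 5
for look in range(1, K + 1):
    print(
        look,
        decreasing_alpha(ALPHA, look),
        cumulative_decreasing_alpha(ALPHA, look),
        increasing_alpha(ALPHA, look, K),
    )
```

With $\alpha = 0.05$ the running sums are 0.025, 0.0375, 0.04375, 0.046875, and 0.0484375 for looks 1 to 5, which is the benchmark against which the cumer(k) columns of Table V are compared.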


Table V. (Conditional) power and type I error from simulation study.

Performance of SeqLRT

ζ    RR   pr(1)  pr(2)  pr(3)  pr(4)  pr(5)  Power(1)  Power(2)  Power(3)  Power(4)  Power(5)
1    1.2  0.052  0.036  0.029  0.018  0.017  0.052     0.086     0.113     0.129     0.143
1    1.5  0.105  0.110  0.124  0.278  0.317  0.105     0.203     0.302     0.496     0.656
1    2.0  0.220  0.328  0.512  0.900  0.949  0.220     0.476     0.744     0.974     0.999
1    4.0  0.746  0.962  0.998  1.000  1.000  0.746     0.990     1.000     1.000     1.000
0.5  2.0  0.079  0.122  0.282  0.512  0.653  0.079     0.191     0.419     0.717     0.902
1    2.0  0.220  0.328  0.512  0.900  0.949  0.220     0.476     0.744     0.974     0.999
2    2.0  0.326  0.404  0.800  0.999  0.999  0.326     0.598     0.920     1.000     1.000
4    2.0  0.531  0.746  0.990  1.000  1.000  0.531     0.881     0.999     1.000     1.000

ζ    RR   pr(1)  pr(2)  pr(3)  pr(4)  pr(5)  type I er(1)  type I er(2)  type I er(3)  type I er(4)  type I er(5)
1    1.0  0.027  0.012  0.007  0.003  0.001  0.027         0.039         0.045         0.048         0.049
2    1.0  0.022  0.016  0.005  0.003  0.001  0.022         0.038         0.042         0.045         0.046
4    1.0  0.025  0.011  0.007  0.003  0.002  0.025         0.036         0.042         0.045         0.047

Performance of LongLRT

ζ    RR   pr(1)  pr(2)  pr(3)  pr(4)  pr(5)  Unconditional powers
1.0  1.2  0.036  0.019  0.021  0.025  0.045  NA
1.0  1.5  0.087  0.083  0.130  0.412  0.583  NA
1.0  2.0  0.197  0.298  0.580  0.981  0.997  NA
1.0  4.0  0.760  0.973  1.000  1.000  1.000  NA
0.5  2.0  0.129  0.174  0.299  0.711  0.881  NA
1.0  2.0  0.197  0.298  0.580  0.981  0.997  NA
2.0  2.0  0.381  0.501  0.896  1.000  1.000  NA
4.0  2.0  0.601  0.893  0.997  1.000  1.000  NA

ζ    RR   pr(1)  pr(2)  pr(3)  pr(4)  pr(5)  cumer(1)  cumer(2)  cumer(3)  cumer(4)  cumer(5)
1.0  1.0  0.018  0.005  0.007  0.004  0.001  0.018     0.023     0.030     0.034     0.035
2.0  1.0  0.025  0.011  0.007  0.004  0.002  0.025     0.036     0.042     0.046     0.048
4.0  1.0  0.027  0.012  0.007  0.003  0.002  0.027     0.039     0.045     0.048     0.050

RR is relative risk (with constant values 1, 1.2, 1.5, 2, 4 over looks). Pr(k) is the probability of rejecting the null hypothesis for the kth look (conditional power). Power(k) is the unconditional power of SeqLRT for the kth look. Cumer(k) is the cumulative error of LongLRT. NA, not applicable; SeqLRT, sequential likelihood ratio test; LongLRT, longitudinal likelihood ratio test.
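The Power(k) and Cumer(k) columns follow from the pr(k) columns through the formulas of Section 6.2. As a hedged check, the Python sketch below recomputes them for the first SeqLRT row (ζ = 1, RR = 1.2) and the first LongLRT null row (ζ = 1.0) of Table V; the function names are illustrative, and small differences from the table are possible because the published pr(k) values are rounded.

```python
def unconditional_power(pr):
    """power(k) = pr(1) + (1 - pr(1)) pr(2) + ... for a procedure that stops at the first rejection."""
    powers = []
    not_yet_rejected = 1.0  # probability that no earlier look rejected H0
    total = 0.0
    for p in pr:
        total += not_yet_rejected * p
        not_yet_rejected *= 1.0 - p
        powers.append(total)
    return powers


def cumulative_error(pr):
    """cumer(k) = pr(1) + ... + pr(k) for a procedure that does not stop."""
    running, out = 0.0, []
    for p in pr:
        running += p
        out.append(running)
    return out


# pr(1..5) from Table V: SeqLRT with zeta = 1 and RR = 1.2
seq_pr = [0.052, 0.036, 0.029, 0.018, 0.017]
print([round(x, 3) for x in unconditional_power(seq_pr)])

# pr(1..5) from Table V: LongLRT under H0 with zeta = 1.0
long_null_pr = [0.018, 0.005, 0.007, 0.004, 0.001]
print([round(x, 3) for x in cumulative_error(long_null_pr)])
```

The rounded results are 0.052, 0.086, 0.113, 0.129, 0.143 for the power and 0.018, 0.023, 0.030, 0.034, 0.035 for the cumulative error, in agreement with the corresponding rows of the table.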

Table VI. (Conditional) power and type I error from simulation study.

Performance of SeqLRT

      Look 1  Look 2  Look 3  Look 4  Look 5  Look 1    Look 2    Look 3    Look 4    Look 5
Case  pr(1)   pr(2)   pr(3)   pr(4)   pr(5)   Power(1)  Power(2)  Power(3)  Power(4)  Power(5)
1     0.746   0.328   0.124   0.278   0.317   0.746     0.829     0.850     0.892     0.926
2     0.953   0.962   0.007   0.003   0.001   0.953     0.998     0.998     0.998     0.998
3     0.027   0.012   0.007   1.000   1.000   0.027     0.039     0.045     1.000     1.000
4     0.105   0.110   0.124   0.900   1.000   0.105     0.203     0.302     0.930     1.000

Performance of LongLRT

      Look 1  Look 2  Look 3  Look 4  Look 5
Case  pr(1)   pr(2)   pr(3)   pr(4)   pr(5)   Unconditional powers
1     0.760   0.298   0.177   0.412   0.583   NA
2     0.962   0.973   0.009   0.004   0.001   NA
3     0.018   0.005   0.009   1.000   1.000   NA
4     0.087   0.083   0.177   0.981   1.000   NA

RR is relative risk (with different values over looks). Case 1: RR is 4 for look 1, 2 for look 2, 1.5 for the remaining 3 looks; case 2: RR is 6 for look 1, 4 for look 2, and 1 for the remaining looks; case 3: RR is 1 for the first 3 looks, 4 for look 4, and 6 for look 5; case 4: RR is 1.5 for the first 3 looks, 2 for look 4, and 4 for look 5. Pr(k) is the probability of rejecting the null hypothesis for the kth look (conditional power). Power(k) is the unconditional power of SeqLRT for the kth look. NA, not applicable; SeqLRT, sequential likelihood ratio test; LongLRT, longitudinal likelihood ratio test.


Table VI provides the simulation results with RR values varying over time. The results show that the conditional power of both the SeqLRT and the LongLRT increases (decreases) as the RR values increase (decrease). The unconditional power of the SeqLRT increases over the five looks; in cases 3 and 4, where the RR increases over time, it reaches 1 at look 5.

7. Discussion

The longitudinal LRT methodology presented here is quite general and covers handling of data from clinical trials or from administrative/claims databases, with different kinds of exposure information. The proposed methods can be used in active surveillance for signal detection or signal generation, signal refinement, and signal validation. One of the advantages of the LongLRT methods presented here is that they can be used to find signals of multiple AEs (or drugs) in real time, while controlling the FDR and the type I error assigned at each look and also across the looks. The other advantage of these methods is that one does not have to stop the analysis after a signal is detected at a look. Therefore, the number of looks for the LongLRT methods is usually not pre-specified and is left to the user for greater flexibility.

In the special case with two drugs (e.g., placebo and placebo+PPIs) and a single AE, the SeqLRT evaluates the same relative risk that is evaluated by the CSSP method proposed by Li [8]. The two test statistics differ: the SeqLRT statistic is the likelihood ratio test statistic defined as $\max LR = \max(LR_{11k}, LR_{21k})$ (Section 3.2), whereas the CSSP statistic [8] is the number of AEs from the drug conditional on the total number of events from the drug and comparator; that is, $T_{11k} = n_{11k} \mid n_{.1k}$, with $n_{.1k} = n_{11k} + n_{21k}$. However, the null data simulation process for the CSSP is the same as that proposed for the SeqLRT method with exposure data: under $H_0$, $n_{11k} \mid n_{.1k} \sim \text{Binomial}\left(n_{.1k},\ P_{11k}/(P_{11k} + P_{21k})\right)$. For a given AE of interest, when a signal is detected at a look $k$ using the SeqLRT method, that signal is either the drug signal or the placebo/comparator signal. However, for a given AE, when a signal is detected at look $k$ using the CSSP method, that signal is only the drug signal.
We applied the CSSP method to the example in Section 5.2 (for a single AE) and noticed that the results turned out to be the same as those from the SeqLRT (PL+PPIs was detected as the drug signal) at the same significance level. In addition, by a simulation exploration using the setup described in Section 6, we observed that both the SeqLRT and CSSP methods control the type I error and both have comparable power (results not shown).

The binomial maxSPRT method, developed by Lieu et al. [18] for vaccine safety data surveillance, is a special case of the SeqLRT, as shown in the following text. Consider the exposure matching $(P_{1.k} : P_{..k})$ in a fixed matching ratio form $(1 : M+1)$. Let $n_{.k} = n_{.1k}$ be the total number of cases exposed up to time interval $k$, and let $n_k^{drug} = n_{11k}$ be the number of cases exposed to the drug. Then $E_{11k} = n_{.k} P_{1.k}/P_{..k} = n_{.k} \cdot \frac{1}{M+1} = \frac{n_{.k}}{M+1}$, and the $LRT_{11k}$, based on the Poisson model discussed in Section 3, can be rewritten as

$$LRT_k = \left[\left(\frac{n_k^{drug}}{n_{.k}}\right)^{n_k^{drug}} \left(\frac{n_{.k} - n_k^{drug}}{n_{.k}}\right)^{n_{.k} - n_k^{drug}}\right] \Bigg/ \left[\left(\frac{1}{M+1}\right)^{n_k^{drug}} \left(\frac{M}{M+1}\right)^{n_{.k} - n_k^{drug}}\right],$$


which is the binomial maxSPRT [18].

The proposed LongLRT methods with exposure unit as 1 (Section 3) can also be used for count data collected through a spontaneous reporting system. We explored the AE signals for the five PPIs and three test drugs approved in the USA (raloxifene, teriparatide, and ibandronate) in the whole FAERS database (2000–2009) over six analysis periods (2000–2004, –2005, –2006, –2007, –2008, and –2009). Only counts and relative reporting rates were evaluated in this case because the exposure information is very limited in the FAERS data. We found that very few AE signals were detected among the PPI drugs. However, many AE signals, including some AEs associated with osteoporosis such as bone density decreased, bone pain, and muscle spasms, were detected for the three drugs for treating osteoporosis (raloxifene, teriparatide, and ibandronate). It is possible that the patients with the symptoms of osteoporosis (taking the treatment drugs) reported the symptoms of osteoporosis as AEs in the FAERS data. This reveals a common problem in this kind of database: the drug products for treating a disease are reported to be associated with the symptoms of that disease.


Because the safety data analyzed in this paper come from clinical trials, under-reporting (or over-reporting), if any, is not a serious concern. A stratified analysis can be used if information on severity, disease status, and other covariates is available. In observational safety databases, the spontaneous reports could be subject to under-reporting (or over-reporting). For delay or error in reporting the AEs, models can be developed along the lines of Clegg et al. [19] and Midthune et al. [20]. This is a topic for future research.

Appendix A1

A single composite AE related to osteoporosis symptoms is defined as AEOST, including the following AE terms that appeared in the 10 clinical trial datasets: ‘BONE FRACTURE ACCIDENTAL’, ‘BONE PAIN’, ‘Bone pain’, ‘Bone density decreased’; ‘MUSCLE ATROPHY’, ‘MUSCLE CRAMP’, ‘MUSCLE CRAMPS’, ‘MUSCLE SPASMS’, ‘Muscle cramp’, ‘Muscle rupture’, ‘Muscle spasms’, ‘Muscle strain’, ‘MUSCULOSKELETAL PAIN’, ‘Musculoskeletal pain’; ‘JOINT SPRAIN’, ‘JOINT STIFFNESS’, ‘Joint contracture’, ‘Joint crepitation’, ‘Joint injury’, ‘Joint sprain’, ‘Joint stiffness’; ‘OSTEOPOROSIS’, ‘OSTEOPOROSIS FRACTU’, ‘OSTEOPOROSIS FRACTURE’, ‘Osteoporosis’; ‘ANKLE FRACTURE’, ‘Ankle fracture’, ‘Clavicle fracture’, ‘FOOT FRACTURE’, ‘Facial bones fracture’, ‘Femoral neck fracture’, ‘Femur fracture’, ‘Fibula fracture’, ‘Foot fracture’, ‘Fracture’, ‘HUMERUS FRACTURE’, ‘Hand fracture’, ‘Hip fracture’, ‘Humerus fracture’, ‘Lower limb fracture’, ‘Lumbar vertebral fracture’, ‘OSTEOPOROSIS FRACTURE’, ‘PATHOLOGICAL FRACTURE’, ‘Patella fracture’, ‘Pelvic fracture’, ‘RIB FRACTURE’, ‘Radius fracture’, ‘Rib fracture’, ‘SPINAL FRACTURE’, ‘STRESS FRACTURE’, ‘Sternal fracture’, ‘Tibia fracture’, ‘Ulna fracture’, ‘Upper limb fracture’, ‘WRIST FRACTURE’, and ‘Wrist fracture’.

Appendix A2

At look $k$ ($k = 1, \ldots, K = 5$), there are two $I \times J$ tables constructed from the individual level data (discussed in Section 3.1). $n_{ij}$ is the number of events for AE $i$ and drug $j$. $P_{ij}$ is the event-time (the unit here is a day) for AE $i$ and drug $j$. We suppress $k$ in the notation. The rows represent AEs, and the columns represent drugs.

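Table A2.1. $I \times J$ table with event information.

AE            1         ...  j         ...  J = 14    Row total
1             $n_{11}$  ...  $n_{1j}$  ...  $n_{1J}$  $n_{1.}$
2             $n_{21}$  ...  ...       ...  $n_{2J}$  $n_{2.}$
...           ...       ...  ...       ...  ...       ...
i             ...       ...  $n_{ij}$  ...  $n_{iJ}$  $n_{i.}$
...           ...       ...  ...       ...  ...       ...
I             $n_{I1}$  ...  $n_{Ij}$  ...  $n_{IJ}$  $n_{I.}$
Column total  $n_{.1}$  ...  $n_{.j}$  ...  $n_{.J}$  $n_{..}$ as grand total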

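Table A2.2. $I \times J$ table incorporating exposure information.

AE            1         ...  j         ...  J = 14    Row total
1             $P_{11}$  ...  $P_{1j}$  ...  $P_{1J}$  $P_{1.}$
2             $P_{21}$  ...  ...       ...  $P_{2J}$  $P_{2.}$
...           ...       ...  ...       ...  ...       ...
i             ...       ...  $P_{ij}$  ...  $P_{iJ}$  $P_{i.}$
...           ...       ...  ...       ...  ...       ...
I             $P_{I1}$  ...  $P_{Ij}$  ...  $P_{IJ}$  $P_{I.}$
Column total  $P_{.1}$  ...  $P_{.j}$  ...  $P_{.J}$  $P_{..}$ as grand total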


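Appendix A3

Drop the indexes $j$ and $k$ whenever convenient, and assume that

(1) $n_{ij} \mid \theta \overset{ind}{\sim} \text{Poisson}(p_i P_{i.} \theta)$, $i = 1, \ldots, I$;
(2) $(n_{.j} - n_{ij}) \mid \theta \overset{ind}{\sim} \text{Poisson}(q_i (P_{..} - P_{i.}) \theta)$, $i = 1, \ldots, I$;
(3) $\theta \sim \text{Gamma}(1, 1)$.

Then the marginal (integrated) likelihood function under $H_0: p_i = q_i = p_0, \forall i$, is

$$L_{0,ij} \propto \int_0^\infty P(n_{ij} \mid p_0 P_{i.} \theta)\, P(n_{.j} - n_{ij} \mid p_0 (P_{..} - P_{i.}) \theta)\, g(\theta \mid 1, 1)\, d\theta,$$

where $P(x \mid \mu)$ is the Poisson probability mass function with mean $\mu$ and $g(\cdot \mid 1, 1)$ is the PDF of $\text{Gamma}(\cdot \mid 1, 1)$. Simplifying, we have

$$L_{0,ij}(p_0) \propto \frac{p_0^{n_{.j}}}{(1 + P_{..} p_0)^{n_{.j}+1}},$$

or

$$\log L_{0,ij}(p_0) = \text{const} + n_{.j} \ln(p_0) - (n_{.j} + 1) \ln(1 + P_{..} p_0),$$

and differentiating with respect to $p_0$ gives the estimate $\hat{p}_0 = n_{.j}/P_{..}$. Also, under the two-sided alternative, $H_a: p_i \neq q_i$ for at least one $i$, the marginal likelihood function is

$$L_{a,ij}(p_i, q_i) \propto \frac{p_i^{n_{ij}}\, q_i^{n_{.j} - n_{ij}}}{(1 + P_{i.} p_i + (P_{..} - P_{i.}) q_i)^{n_{.j}+1}},$$

or

$$l_{a,ij} = \log L_{a,ij} = \text{const} + n_{ij} \ln(p_i) + (n_{.j} - n_{ij}) \ln(q_i) - (n_{.j} + 1) \ln(1 + p_i P_{i.} + q_i (P_{..} - P_{i.})),$$

so that

$$\frac{\partial l_{a,ij}}{\partial p_i} = 0 \Rightarrow \frac{n_{ij}}{p_i} - \frac{(n_{.j} + 1) P_{i.}}{1 + p_i P_{i.} + q_i (P_{..} - P_{i.})} = 0,$$

$$\frac{\partial l_{a,ij}}{\partial q_i} = 0 \Rightarrow \frac{n_{.j} - n_{ij}}{q_i} - \frac{(n_{.j} + 1)(P_{..} - P_{i.})}{1 + p_i P_{i.} + q_i (P_{..} - P_{i.})} = 0.$$

Solving these equations yields $\hat{p}_i = n_{ij}/P_{i.}$ and $\hat{q}_i = (n_{.j} - n_{ij})/(P_{..} - P_{i.})$. Substituting these estimates in $L_{0,ij}$ and $L_{a,ij}$, and simplifying, we have

$$L_{ij} = \frac{L_{a,ij}(\hat{p}_i, \hat{q}_i)}{L_{0,ij}(\hat{p}_0)} = \left(\frac{n_{ij}}{E_{ij}}\right)^{n_{ij}} \left(\frac{n_{.j} - n_{ij}}{n_{.j} - E_{ij}}\right)^{n_{.j} - n_{ij}},$$

with $E_{ij} = P_{i.}\, n_{.j}/P_{..}$, independent of $\theta$.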
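The closed form above is simple to evaluate numerically. The Python sketch below computes $E_{ij}$ and $\log L_{ij}$ for one cell and, as a sanity check, confirms that the matched-exposure form discussed in Section 7 gives the same value when $E = n_{.k}/(M+1)$. The function names and the example counts are hypothetical, and the sketch assumes $0 < E_{ij} < n_{.j}$; a term with a zero count is taken to contribute 0 to the log-likelihood ratio.

```python
import math


def expected_count(p_row, p_total, n_col):
    """E_ij = P_i. * n_.j / P_.."""
    return p_row * n_col / p_total


def log_lr(n_ij, n_col, e_ij):
    """log of (n_ij / E_ij)^n_ij * ((n_.j - n_ij) / (n_.j - E_ij))^(n_.j - n_ij)."""
    def term(count, expected):
        # A zero count contributes nothing (x * log x -> 0 as x -> 0).
        return 0.0 if count == 0 else count * math.log(count / expected)
    return term(n_ij, e_ij) + term(n_col - n_ij, n_col - e_ij)


def log_lr_matched(n_drug, n_total, m):
    """Same statistic written with a fixed 1:(M+1) exposure matching ratio."""
    def term(count, prob):
        return 0.0 if count == 0 else count * math.log((count / n_total) / prob)
    return term(n_drug, 1.0 / (m + 1)) + term(n_total - n_drug, m / (m + 1))


# Hypothetical cell: 20 of 100 cases on the drug, which holds 10% of the exposure.
e = expected_count(1000.0, 10000.0, 100)
print(e, log_lr(20, 100, e))
print(log_lr_matched(20, 100, 9))  # M = 9 gives the same 1:10 exposure share
```

When the observed count equals the expected count, the log-likelihood ratio is 0, and it grows as the observed count moves away from $E_{ij}$ in either direction.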

Acknowledgements


The authors are thankful to Zhongjun Luo, ShaAvhree Buckman-Garner, Sue Bell, and Susan McCune (OTS/FDA) for providing assistance with valuable medical background and clinical trial datasets. The authors would also like to thank the associate editor and the referees for their valuable comments.


References

1. Rothman KJ, Lanes S, Sacks ST. The reporting odds ratio and its advantages over the proportional reporting ratio. Pharmacoepidemiology and Drug Safety 2004; 13(8):519–523.
2. Evans SJ, Waller PC, Davis S. Use of proportional reporting ratios (PRRs) for signal generation from spontaneous adverse drug reaction reports. Pharmacoepidemiology and Drug Safety 2001; 10(6):483–6.
3. DuMouchel W. Bayesian data mining in large frequency tables, with an application to the FDA spontaneous reporting system. The American Statistician 1999; 53(3):177–190.
4. Bate A, Lindquist M, Edwards IR, Olsson S, Orre R, Lansner A, De Freitas RM. A Bayesian neural network method for adverse drug reaction signal generation. European Journal of Clinical Pharmacology 1998; 54(4):315–21.
5. Huang L, Zalkikar J, Tiwari R. A likelihood ratio test based method for signal detection with application to FDA’s drug safety data. Journal of the American Statistical Association 2011; 106(496):1230–1241.
6. Platt R, Carnahan RM, Brown JS, Chrischilles E, Curtis LH, Hennessy S, Nelson JC, Raccosin JA, Robb M, Schneeweiss S, Toh S, Weiner MG. The U.S. Food and Drug Administration’s Mini-Sentinel program: status and direction. Pharmacoepidemiology and Drug Safety 2012; 21(1):1–8.
7. Cook A, Wellman RD, Tiwari RC, Li L, Heckbert S, March T, Heagerty P, Nelson JC. Statistical approaches to group sequential monitoring of postmarket safety surveillance data: current state of the art for use in the mini-sentinel pilot. Pharmacoepidemiology and Drug Safety 2012; 21:72–81.
8. Li L. A conditional sequential sampling procedure for drug safety surveillance. Statistics in Medicine 2009; 28:3124–3138.
9. Kulldorff M, Davis RL, Kolczak M, Lewis E, Lieu T, Platt R. A maximized sequential probability ratio test for drug and vaccine safety surveillance. Sequential Analysis: Design Methods and Applications 2011; 30:58–78.
10. Schuemie MJ. Methods for drug safety signal detection in longitudinal observational databases: LGPS and LEOPARD. Pharmacoepidemiology and Drug Safety 2011; 20(3):292–299.
11. Yang YX, Lewis JD, Epstein S, Metz DC. Long term proton pump inhibitor therapy and risk of hip fracture. Journal of the American Medical Association 2006; 296(24):2947–53.
12. Khalili H, Huang ES, Jacobson BC, Camargo CA, Feskanich D, Chan AT. Use of proton pump inhibitors and risk of hip fracture in relation to dietary and lifestyle factors: a prospective cohort study. British Medical Journal 2012; 344:e372.
13. Brown JS, Kulldorff M, Petronis KR, Reynolds R, Chan KA, Davis RL, Graham D, Andrade SE, Raebel MA, Herrinton L, Roblin D, Boudreau D, Smith D, Gursitz JH, Gunter MJ, Platt R. Early adverse drug event signal detection within population-based health networks using sequential methods: key methodologic considerations. Pharmacoepidemiology and Drug Safety 2009; 18(3):226–34.
14. Jennison C, Turnbull BW. Group Sequential Methods with Applications to Clinical Trials. Chapman & Hall/CRC: New York, 2000.
15. Goodman M, Li Y, Tiwari R. Detecting multiple change points in piecewise constant hazard functions. Journal of Applied Statistics 2011; 38:2523–2532.
16. O’Brien PC, Fleming TR. A multiple testing procedure for clinical trials. Biometrics 1979; 35:549–556.
17. Lan KKG, Demets DL. Discrete sequential boundaries for clinical-trials. Biometrika 1983; 70:659–663.
18. Lieu TA, Kulldorff M, Davis RL, Levis EM, Weintraub E, Yih K, Yin R, Brong JS, Platt R. Real-time vaccine safety data surveillance. Medical Care 2007; 45(2):S89–95.
19. Clegg LX, Feuer EJ, Midthune DN, Fay MP, Hankey BF. Impact of reporting delay and reporting error on cancer incidence rates and trends. Journal of the National Cancer Institute 2002; 94:1537–45.
20. Midthune DN, Fay MP, Clegg LX, Feuer EJ. Modeling reporting delays and reporting corrections in cancer registry data. Journal of the American Statistical Association 2005; 100(469):61–70.
