pharmacoepidemiology and drug safety 2015; 24: 922–933 Published online 4 June 2015 in Wiley Online Library (wileyonlinelibrary.com) DOI: 10.1002/pds.3797

ORIGINAL REPORT

Association rule mining in the US Vaccine Adverse Event Reporting System (VAERS)†

Lai Wei and John Scott*

Division of Biostatistics, Office of Biostatistics and Epidemiology, Center for Biologics Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, MD, USA

ABSTRACT

Purpose Spontaneous adverse event reporting systems are critical tools for monitoring the safety of licensed medical products. Commonly used signal detection algorithms identify disproportionate product–adverse event pairs and may not be sensitive to more complex potential signals. We sought to develop a computationally tractable multivariate data-mining approach to identify product–multiple adverse event associations.

Methods We describe an application of stepwise association rule mining (Step-ARM) to detect potential vaccine-symptom group associations in the US Vaccine Adverse Event Reporting System. Step-ARM identifies strong associations between one vaccine and one or more adverse events. To reduce the number of redundant association rules found by Step-ARM, we also propose a clustering method for the post-processing of association rules.

Results In sample applications to a trivalent intradermal inactivated influenza virus vaccine and to measles, mumps, rubella, and varicella (MMRV) vaccine and in simulation studies, we find that Step-ARM can detect a variety of medically coherent potential vaccine-symptom group signals efficiently. In the MMRV example, Step-ARM appears to outperform univariate methods in detecting a known safety signal.

Conclusions Our approach is sensitive to potentially complex signals, which may be particularly important when monitoring novel medical countermeasure products such as pandemic influenza vaccines. The post-processing clustering algorithm improves the applicability of the approach as a screening method to identify patterns that may merit further investigation.

Copyright © 2015 John Wiley & Sons, Ltd.

key words: data mining; association rule discovery; signal detection; spontaneous reporting system; vaccine safety; pharmacoepidemiology

Received 3 November 2014; Revised 16 March 2015; Accepted 20 April 2015

INTRODUCTION Spontaneous adverse event reporting systems (SRS), in which adverse medical product experiences are reported to health authorities and/or manufacturers, are critical tools for monitoring the safety of licensed medical products. There are a number of national and international passive surveillance systems, including the US FDA Adverse Event Reporting System (FAERS; formerly AERS),1 the

*Correspondence to: J. Scott, Division of Biostatistics, Office of Biostatistics and Epidemiology, Center for Biologics Evaluation and Research, U.S. Food and Drug Administration, 10903 New Hampshire Ave., Bldg 71 Rm. 1018, Silver Spring, MD 20993, USA. E-mail: [email protected]

Prior postings and presentations Poster presentation at 2014 MCMi Regulatory Science Symposium, Silver Spring, MD. Conference presentation at 2014 Joint Statistical Meetings, Boston, MA.


US Vaccine Adverse Event Reporting System (VAERS),2 and VigiBase by the World Health Organization International Drug Monitoring Programme.3 VAERS is jointly managed by FDA and the Centers for Disease Control and Prevention (CDC). It serves as the primary data resource for the study and identification of vaccine–adverse event reactions for licensed vaccines in the USA. It can be used for detecting new, unusual, or rare events and assessing newly licensed vaccines.2 Through December 2013, VAERS had received over 450 000 reports from vaccine manufacturers, healthcare professionals, and the general public, with approximately 2 850 000 vaccine–adverse event pairs for 84 vaccines. Each report to VAERS contains one or more adverse events that the reporter believes may be associated with the administration of one or more vaccines. In 2013 alone, there were over 30 000 reports received, with approximately 200 000


vaccine–adverse event pairs. One consequence of wider reporting to SRS is an increasing computational challenge to the early detection of potential vaccine safety signals.

Data mining in Vaccine Adverse Event System

Data-mining algorithms (DMAs) are used to conduct screening for statistically “higher than expected” product-event combinations in SRS. Commonly used DMAs involve disproportionality analyses that project high-dimensional SRS data onto a series of two-dimensional contingency tables, based on an independence assumption.4 Huang et al.5 conducted a review of passive surveillance DMAs using frequentist and Bayesian methods. The frequentist methods include proportional reporting ratio,6 reporting odds ratio,7 chi-square tests,8,9 and likelihood ratio test-based method.10 Bayesian methods include Bayesian confidence propagation neural network,11 Bayesian method based on a new information component (IC),12 multi-item gamma Poisson shrinker (MGPS),13 and simplified Bayes.14 These algorithms are designed to find bivariate associations between individual products and individual adverse events. The reduction of vaccine–adverse event analysis to two dimensions does not support the discovery and analysis of more complex vaccine–adverse event relationships. Harpaz et al.15 referred to methods for the detection of higher-dimensional drug safety phenomena as “multivariate” methods.

Vaccine–adverse event associations can involve vaccine–vaccine interactions, multi-symptom clusters, or both. One example of a potential vaccine–vaccine interaction was between trivalent inactivated influenza vaccination and 13-valent pneumococcal conjugate vaccine; the possibility that concomitant administration of these two vaccines increased the incidence of febrile seizures was investigated in a Vaccine Safety Datalink study.16 There are many examples of multi-symptom clusters; Ball and Botsis provided examples of patterns of adverse events associated with syncope and intussusception following vaccine administration.17 Harpaz and colleagues classified existing multivariate methods as15 (i) disproportionality analysis extensions,42 (ii) multivariate logistic regression-based approaches,18,19 and (iii) unsupervised machine-learning approaches such as association rule mining,20 clustering,21 and network analysis.17

Association rule mining

Association rule mining (ARM), also called association rule discovery, is a well-established data-mining method for discovering interesting relations between variables in large databases.22 It is derived from the “market basket” analysis of transaction data. For example, for a set of items I = {milk, bread, butter, beer}, an association rule can be {milk, bread} ⇒ {butter}. Various measures of “interestingness” can be calculated for each rule.23,24 If, for example, the interestingness measures are high for the rule {milk, bread} ⇒ {butter}, a customer who purchases milk and bread together is also likely to buy butter. The left-hand side (LHS) of the rule {milk, bread} ⇒ {butter} is called the antecedent, and the right-hand side (RHS) is called the consequent.

Adverse events reported in VAERS are coded using preferred terms (PTs) from the Medical Dictionary for Regulatory Activities.25 For a set of items {VaccineX, PT_Y1, PT_Y2, …, PT_Yn}, we are interested in studying rules having the form {VaccineX} ⇒ {PT_Y1, PT_Y2, …, PT_Yk}, where {PT_Y1, PT_Y2, …, PT_Yk} is a group of adverse events among {VaccineX, PT_Y1, PT_Y2, …, PT_Yn}. The strength of the rule can be evaluated by different measures of interestingness, such as support, confidence, and lift, which are derived from a 2 × 2 contingency table (Tables 1 and 2).23,24 The support of the rule is the proportion of all reports that contain the itemset {VaccineX, PT_Y1, PT_Y2, …, PT_Yk}. The confidence is the conditional probability of observing the RHS of the rule given the LHS. Lift compares P({VaccineX} and {PT_Y1, PT_Y2, …, PT_Yk}) with P({VaccineX}) × P({PT_Y1, PT_Y2, …, PT_Yk}), as the ratio of the former to the latter (Table 2). Lift is equal to 1 when {VaccineX} and {PT_Y1, PT_Y2, …, PT_Yk} are statistically independent and is greater than 1 when there is a positive association between {VaccineX} and {PT_Y1, PT_Y2, …, PT_Yk}.

Note that the calculation of lift is the same as the usual relative reporting ratio commonly used in pharmacovigilance; the critical distinction here is that the columns of the contingency table (Table 1) indicate presence or absence of one or more PTs rather than one PT alone.

Table 1. 2 × 2 contingency table for rule {VaccineX} ⇒ {PT_Y1, PT_Y2, …, PT_Yk}

| Number of reports | Associated with {PT_Y1, PT_Y2, …, PT_Yk} | Not associated with {PT_Y1, PT_Y2, …, PT_Yk} | |
| --- | --- | --- | --- |
| Containing VaccineX | a | b | n_X = a + b |
| Not containing VaccineX | c | d | n_X̄ = c + d |
| | n_Y = a + c | n_Ȳ = b + d | n |


Table 2. Interestingness measures for rules based on counts from the contingency table

| Measure | Formula based on probability | Formula based on table |
| --- | --- | --- |
| Support | P(XY) | a / (a + b + c + d) |
| Confidence | P(Y ∣ X) | a / (a + b) |
| Lift | P(XY) / (P(X) P(Y)) | (a / (a + b)) / ((a + c) / (a + b + c + d)) |
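The formulas in Table 2 can be illustrated with a short Python sketch. The class name `RuleTable` and the counts in the usage line are hypothetical and chosen only for illustration; they are not taken from VAERS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleTable:
    """2 x 2 counts for a rule {VaccineX} => {PT group}, laid out as in Table 1."""
    a: int  # reports containing VaccineX and the whole PT group
    b: int  # reports containing VaccineX but not the whole PT group
    c: int  # reports without VaccineX that contain the whole PT group
    d: int  # reports with neither

    @property
    def n(self) -> int:
        return self.a + self.b + self.c + self.d

    def support(self) -> float:
        # P(XY) = a / (a + b + c + d)
        return self.a / self.n

    def confidence(self) -> float:
        # P(Y | X) = a / (a + b)
        return self.a / (self.a + self.b)

    def lift(self) -> float:
        # P(XY) / (P(X) P(Y)); algebraically equal to the Table 2 form
        # (a / (a + b)) / ((a + c) / n), written to avoid an extra division.
        return self.a * self.n / ((self.a + self.b) * (self.a + self.c))


# Hypothetical counts, for illustration only.
table = RuleTable(a=3, b=97, c=7, d=893)
print(table.support(), table.confidence(), table.lift())
```

With these made-up counts the PT group appears in 3% of the VaccineX reports but in only 1% of all reports, so the lift is 3.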

Association rule mining provides a potential solution for discovering multi-item vaccine-symptom group associations. Rouane-Hacene et al. conducted ARM of the SRS of the French Medicines Agency using formal concept analysis.26 Harpaz et al. carried out ARM of the FDA Adverse Event Reporting System using the Apriori algorithm; they were able to find association rules involving up to six items.20 Fan et al. proposed a MapReduce-based ARM algorithm to improve the performance of ARM in a distributed computation environment.27 Pool and Chen applied ARM to the US VAERS to detect multi-symptom events associated with one specific vaccine.28 They pointed out that ARM is a promising approach for small patterns or disproportions among rare events. However, they found that existing algorithms produce too many associations and suggested the development of association rule reduction techniques.

When mining association rules of the form {VaccineX} ⇒ {PT_Y1, PT_Y2, …, PT_Yk}, the application of existing ARM algorithms such as Apriori29 can be limited by the computational challenge posed by the presence of multiple PTs on the RHS of the rule. In this paper, we propose a novel stepwise association rule mining (Step-ARM) algorithm to detect vaccine-symptom group associations in addition to vaccine–adverse event associations. We also describe a post-processing algorithm to find clusters of multi-symptom events associated with the vaccine of interest. This helps address the redundant-rule issue noted by previous authors applying ARM.

METHODS

Step-ARM: a stepwise association rule mining algorithm

To find multi-symptom events associated with one vaccine with no limitations on the RHS of the rule, we propose a Step-ARM algorithm as shown in Figure 1 using the measures of interestingness introduced above. The basis of the algorithm is that any subset of a frequent itemset must itself be frequent. The algorithm starts with every possible rule containing a single vaccine, VaccineX, on the LHS and one PT on the RHS. Based on thresholds of support, confidence, and lift, we obtain a number of interesting rules of the form {VaccineX} ⇒ {PT_Y1}, which implies a set of PTs (Pool_1) that are strongly associated with VaccineX. We then move on to association rules with two items on the RHS, which will be generated based on all rules of the form {VaccineX} ⇒ {PT_Y1, PT_Y2}, where both PTs are contained in Pool_1. Sufficiently interesting rules in this second step imply a set of PTs that form Pool_2. This process continues in such a way that in step k, we consider all rules of the form {VaccineX} ⇒ {PT_Y1, PT_Y2, …, PT_Yk} where {VaccineX} ⇒ {PT_Y1, PT_Y2, …, PT_Y(k−1)} was a sufficiently interesting rule in step k − 1 and PT_Yk is another PT from Pool_(k−1). Because the algorithm does not search all k-PT combinations among Pool_(k−1), it reduces the computational burden significantly. The process stops at step n when there are no sufficiently interesting rules of the form {VaccineX} ⇒ {PT_Y1, PT_Y2, …, PT_Yn}. Note that the maximum number of steps is equal to the maximum number of PTs in any report in the database containing VaccineX.

In general, we are interested in finding vaccine-symptom group associations for which there are at least three reports; that is, a ≥ 3. Therefore, our first threshold for interestingness is support ≥ 3/n_report, where n_report is the total number of reports. We also set the thresholds for confidence and lift at 0.001 and 1, respectively. The threshold for confidence was set empirically based on achieving a tractable number of interesting rules across several example applications. The lift threshold indicates statistical association, as lift = 1 corresponds to statistical independence.

Note that the general Apriori ARM algorithm30 carries out a breadth-first search on the subset lattice and determines the support of itemsets by subset tests.
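The stepwise search described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Report`, `Measures`, `measures`, and `step_arm` are hypothetical, reports are held in memory as sets of vaccines and PTs, the lift threshold is applied as a strict inequality (lift > 1), and the support threshold is applied as the equivalent count condition a ≥ 3.

```python
from typing import Dict, FrozenSet, List, NamedTuple


class Report(NamedTuple):
    vaccines: FrozenSet[str]
    pts: FrozenSet[str]


class Measures(NamedTuple):
    count: int         # a in Table 1
    support: float
    confidence: float
    lift: float


def measures(reports: List[Report], vaccine: str, pts: FrozenSet[str]) -> Measures:
    """Support, confidence, and lift for {vaccine} => pts (Table 2)."""
    n = len(reports)
    n_x = sum(1 for r in reports if vaccine in r.vaccines)
    n_y = sum(1 for r in reports if pts <= r.pts)
    a = sum(1 for r in reports if vaccine in r.vaccines and pts <= r.pts)
    if n_x == 0 or n_y == 0:
        return Measures(0, 0.0, 0.0, 0.0)
    return Measures(a, a / n, a / n_x, a * n / (n_x * n_y))


def step_arm(reports: List[Report], vaccine: str, min_count: int = 3,
             min_conf: float = 0.001, min_lift: float = 1.0
             ) -> Dict[FrozenSet[str], Measures]:
    """Return every sufficiently interesting RHS for the given vaccine."""
    def keep(m: Measures) -> bool:
        # Assumption: lift must strictly exceed the threshold.
        return m.count >= min_count and m.confidence >= min_conf and m.lift > min_lift

    # Step 1: every single PT reported together with the vaccine.
    candidates = {frozenset([pt]) for r in reports
                  if vaccine in r.vaccines for pt in r.pts}
    found: Dict[FrozenSet[str], Measures] = {}
    while candidates:
        current = {}
        for pts in candidates:
            m = measures(reports, vaccine, pts)
            if keep(m):
                current[pts] = m
        found.update(current)
        # Pool_k: all PTs appearing in the interesting rules of this step.
        pool = frozenset().union(*current)
        # Step k + 1: extend each interesting rule by one more PT from the pool.
        candidates = {rule | {pt} for rule in current for pt in pool - rule}
    return found


# Made-up toy data: 20 reports, 8 of them for vaccine "X".
toy = ([Report(frozenset({"X"}), frozenset({"fever", "rash"}))] * 4
       + [Report(frozenset({"X"}), frozenset({"fever"}))]
       + [Report(frozenset({"X"}), frozenset({"headache"}))] * 3
       + [Report(frozenset({"Z"}), frozenset({"headache"}))] * 10
       + [Report(frozenset({"Z"}), frozenset({"fever"}))] * 2)
rules = step_arm(toy, "X")
```

In this toy data, headache is reported with "X" three times but is proportionally more common with "Z", so its lift is below 1 and it never enters the pool; only fever, rash, and the pair {fever, rash} survive.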
While Apriori utilizes the downward closure property of support, it suffers from inefficiencies because large numbers of subsets are produced in the candidate generation step. Step-ARM uses a combination of support, confidence, and lift to prune the search space, which improves the efficiency of ARM significantly but may also lose interesting associations when generating higher-order association rules. We mitigated this trade-off by setting conservative thresholds for confidence and lift.

Association rules post-processing

After applying the Step-ARM algorithm, there are typically a number of rules that share common PTs on the RHS. We developed a post-processing algorithm, as

Figure 1. Flow chart of the stepwise association rule mining algorithm

shown in Figure 2, to combine rules sharing common adverse events and generate clusters of adverse events associated with VaccineX. The algorithm starts by setting k to be the maximum number of PTs on the RHS based on the Step-ARM stopping rule. For the {VaccineX} ⇒ {PT_Y1, PT_Y2, …, PT_Yk} rules, we combine rules with (k − 1) PTs in common to form clusters. For instance, we can combine the rules {VaccineX} ⇒ {PT_Y1, PT_Y2, PT_Y3, PT_Y4, PT_Y5} and {VaccineX} ⇒ {PT_Y1, PT_Y2, PT_Y3, PT_Y4, PT_Y6} to form a cluster {VaccineX} ⇒ {PT_Y1, PT_Y2, PT_Y3, PT_Y4, PT_Y5, PT_Y6} because the two rules have four PTs in common. As a rule of thumb, we also remove clusters with size greater than 10 because clusters with large numbers of PTs may be too general for medical evaluation purposes. We obtain the strongest rule for each cluster; if several clusters share the same strongest rule, we keep only the cluster with the greatest lift. This process is then repeated for rules with (k − 1) PTs and so on. The process terminates when k = 3 because it is meaningless to combine rules with two PTs that share one PT in common. After combining the rules, we remove redundant clusters, where redundant clusters are defined as clusters that are contained in and have lower lift than


Figure 2. Flow chart of post-processing clustering algorithm

other clusters. We then calculate the lift mean for all the Step-ARM-identified rules inside each cluster and use that as a summary interestingness score for the cluster. We also note the strongest rule (by lift) contained in each cluster. In the event of ties, other measures of interest such as support and confidence can be used to find the strongest signal.

DATA EXAMPLES

Intradermal influenza vaccine

The first US trivalent inactivated influenza intradermal vaccine (TIV-ID) formulation was licensed by FDA on 9 May 2011.31 In pre-licensure clinical trials,

TIV-ID elicited a higher proportion of local reactions with greater clinical severity than intramuscular TIV, with the exception of pain.32 However, most of these reactions were self-limited. No other clinically important differences were detected for adverse events.33 We applied Step-ARM and the post-processing algorithm described earlier to VAERS to study the adverse events associated with TIV-ID administered from 1 July 2011 through 28 February 2013 and reported up to 15 March 2013. See Moro et al. for a detailed discussion of the clinical review of these reports.33 To compare simple ARM with existing disproportionality analyses, Table 3 shows the top 20 vaccine–adverse event pairs by empirical Bayes geometric mean (EBGM) based on the multi-item gamma Poisson shrinker

Table 3. Top 20 EBGM vaccine–adverse event pairs in comparison with pairwise association rules of TIV-ID => (one adverse event)

| Rule of TIV-ID => | n | Support (E-04) | Confidence (E-02) | Lift | EBGM† | EB05† | Lift rank | EBGM rank |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Injection-site nodule | 24 | 5.36 | 5.41 | 10.25 | 7.28 | 4.81 | 1 | 1 |
| Injection-site pruritus | 64 | 14.30 | 14.40 | 4.33 | 3.95 | 3.19 | 5 | 2 |
| Pruritus generalized | 13 | 2.90 | 2.93 | 4.84 | 3.72 | 2.23 | 3 | 3 |
| Drug administered to patient of inappropriate age | 14 | 3.13 | 3.15 | 4.16 | 2.79 | 1.77 | 6 | 4 |
| Extensive swelling of vaccinated limb | 4 | 0.89 | 0.90 | 5.93 | 2.56 | 1.08 | 2 | 5 |
| Pruritus | 62 | 13.90 | 14.00 | 2.62 | 2.52 | 2.03 | 17 | 6 |
| Mass | 6 | 1.34 | 1.35 | 4.35 | 2.49 | 1.25 | 4 | 7 |
| Lymphadenopathy | 16 | 3.57 | 3.60 | 2.86 | 2.34 | 1.55 | 12 | 8 |
| Swelling | 38 | 8.49 | 8.56 | 2.68 | 2.34 | 1.78 | 15 | 9 |
| Dysgeusia | 4 | 0.89 | 0.90 | 4.03 | 2.21 | 0.98 | 7 | 10 |
| Induration | 16 | 3.57 | 3.60 | 2.84 | 2.16 | 1.43 | 13 | 11 |
| Pharyngeal oedema | 7 | 1.56 | 1.58 | 2.70 | 2.13 | 1.15 | 14 | 12 |
| Respiratory tract congestion | 7 | 1.56 | 1.58 | 2.98 | 2.03 | 1.10 | 10 | 13 |
| Paraesthesia | 24 | 5.36 | 5.41 | 1.89 | 2.03 | 1.44 | 36 | 14 |
| Visual impairment | 6 | 1.34 | 1.35 | 3.07 | 2.00 | 1.04 | 9 | 15 |
| Chest discomfort | 11 | 2.46 | 2.48 | 1.99 | 1.90 | 1.16 | 31 | 16 |
| Contusion | 8 | 1.79 | 1.80 | 2.45 | 1.88 | 1.06 | 20 | 17 |
| Lymph node pain | 4 | 0.89 | 0.90 | 3.54 | 1.87 | 0.85 | 8 | 18 |
| Hypoaesthesia | 20 | 4.47 | 4.50 | 1.75 | 1.85 | 1.28 | 42 | 19 |
| Palpitations | 6 | 1.34 | 1.35 | 2.32 | 1.80 | 0.93 | 21 | 20 |

EBGM, empirical Bayes geometric mean; TIV-ID, trivalent inactivated influenza intradermal vaccine.
† The calculations of EBGM and EB05 are described in detail in DuMouchel13 and Szarfman et al.34



(MGPS) algorithm.13,34 The column labeled n represents the count of vaccine–adverse event(s) combinations (i.e., a in the setting of Table 1). Support, confidence, and lift are the measures of interestingness of the rules introduced in Table 2. The rules listed have the TIV-ID vaccine on the LHS and one adverse event on the RHS. The rank of nodes by lift is very close to the rank of nodes by EBGM. The top 10 pairwise association rules by lift are all included within the top 20 EBGM nodes.
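The overlap statement above can be read directly off the "Lift rank" column of Table 3. As a small Python check (the list is transcribed from that column in EBGM-rank order; the variable names are illustrative):

```python
# Lift rank of each of the top 20 EBGM pairs, in EBGM-rank order (Table 3).
lift_rank_of_top20_ebgm = [1, 5, 3, 6, 2, 17, 4, 12, 15, 7,
                           13, 14, 10, 36, 9, 31, 20, 8, 42, 21]

# Are lift ranks 1..10 all present among the top 20 EBGM pairs?
top10_by_lift_included = set(range(1, 11)) <= set(lift_rank_of_top20_ebgm)

# How many of the top 20 EBGM pairs are also in the top 20 by lift?
shared_top20 = sum(1 for rank in lift_rank_of_top20_ebgm if rank <= 20)

print(top10_by_lift_included, shared_top20)
```

The first value confirms that lift ranks 1 through 10 all occur in the table; the second counts the pairs that fall in the top 20 under both orderings.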

In addition to studying the pairwise association between the vaccine and one adverse event, Table 4 gives the top 20 association rules with multiple adverse events on the RHS by applying the Step-ARM algorithm described above. Adverse events within the top 20 EBGM nodes (i.e., in Table 3) are marked with superscript E. Step-ARM was able to put adverse events in a medically coherent context. For example, the second association rule (R2) contains

Table 4. Top 20 association rules of TIV-ID => (multiple adverse events) by the stepwise association rule mining

| No. | Rule of TIV-ID => | n | Support (E-05) | Confidence (E-03) | Lift |
| --- | --- | --- | --- | --- | --- |
| R1 | Pain, pruritus^E, and generalized pruritus^E | 3 | 6.70 | 6.76 | 100.81 |
| R2 | Erythema, local reaction, and lymphadenopathy^E | 3 | 6.70 | 6.76 | 75.61 |
| R3 | Immediate post-injection reaction, injection-site swelling, and musculoskeletal pain | 3 | 6.70 | 6.76 | 75.61 |
| R4 | Dyspnoea, malaise, and rhinorrhea | 3 | 6.70 | 6.76 | 75.61 |
| R5 | Injection-site erythema, injection-site nodule^E, and injection-site pruritus^E | 17 | 38.00 | 38.30 | 53.55 |
| R6 | Local reaction, pain, and pruritus^E | 3 | 6.70 | 6.76 | 50.40 |
| R7 | Asthenia, fatigue, feeling abnormal, and paraesthesia^E | 3 | 6.70 | 6.76 | 50.40 |
| R8 | Dizziness, fatigue, pain, and pruritus^E | 3 | 6.70 | 6.76 | 50.40 |
| R9 | Injection-site pain, pain, and generalized pruritus^E | 3 | 6.70 | 6.76 | 43.20 |
| R10 | Blister, erythema, pain, and swelling^E | 3 | 6.70 | 6.76 | 37.80 |
| R11 | Injection-site erythema, injection-site swelling, insomnia, and pain in extremity | 3 | 6.70 | 6.76 | 37.80 |
| R12 | Injection-site erythema, injection-site swelling, musculoskeletal pain, and pain in extremity | 4 | 8.94 | 9.01 | 36.66 |
| R13 | Erythema, skin discoloration, and skin warm | 3 | 6.70 | 6.76 | 33.60 |
| R14 | Injection-site erythema, neck pain, and paraesthesia^E | 3 | 6.70 | 6.76 | 33.60 |
| R15 | Asthenia, chills, fatigue, and paraesthesia^E | 3 | 6.70 | 6.76 | 33.60 |
| R16 | Injection-site pain, insomnia, pain in extremity, and paraesthesia^E | 3 | 6.70 | 6.76 | 33.60 |
| R17 | Injection-site erythema, injection-site pain, injection-site swelling, pain in extremity, and paraesthesia^E | 3 | 6.70 | 6.76 | 33.60 |
| R18 | Cough, hypersensitivity, and urticaria | 3 | 6.70 | 6.76 | 30.24 |
| R19 | Injection-site pain, injection-site swelling, musculoskeletal pain, and neck pain | 3 | 6.70 | 6.76 | 30.24 |
| R20 | Hypoaesthesia^E, injection-site erythema, and pain | 5 | 11.20 | 11.30 | 29.65 |

EBGM, empirical Bayes geometric mean. Note that adverse events within the top 20 EBGM nodes (i.e., in Table 3) are marked with superscript E (shown here as ^E).


adverse events related to skin problems, and R5 contains three symptoms of injection-site reactions. It is common for Step-ARM-identified rules to share adverse events. For instance, R1 and R6 share two adverse events (pain and pruritus). Figure 3 is a graphical display of the top 20 association rules of TIV-ID. Each vertex represents a rule, and each adverse event is displayed in a rectangle. The arrows connect each rule with the adverse events that appear on its RHS. We next applied the post-processing algorithm described in Figure 2, which identified 11 clusters as shown in Table 5. These clusters are formed by combining rules with adverse events in common. The algorithm is thereby able to group medically coherent adverse events into clusters and provide a clearer view of the associations between TIV-ID and adverse events (see Figure 4). We list the strongest rule corresponding to each cluster in Table 5, which may be useful for case identification by medical reviewers. For instance, after considering the strongest rule in the first cluster (TIV-ID => {erythema, local reaction, lymphadenopathy}), the corresponding reports can be extracted for further investigation.
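The rule-combining step behind these clusters can be sketched in Python. This is a simplified illustration, not the full procedure of Figure 2: it handles a single rule size k, merges rules transitively whenever they share k − 1 PTs (an assumption about how chains of overlapping rules are combined), keeps uncombined rules as one-rule clusters, and omits the removal of redundant clusters. The function names and the lift values in the example are hypothetical.

```python
from statistics import mean
from typing import Dict, FrozenSet, List, Tuple

Rules = Dict[FrozenSet[str], float]  # RHS PT set -> lift of the rule


def merge_rules_of_size(rules: Rules, k: int, max_size: int = 10) -> List[FrozenSet[str]]:
    """Union rules with k PTs that share k - 1 PTs; drop clusters larger than max_size."""
    groups: List[List[FrozenSet[str]]] = []
    for rule in (r for r in rules if len(r) == k):
        # Existing groups containing a rule that overlaps this one in k - 1 PTs.
        linked = [g for g in groups
                  if any(len(rule & other) == k - 1 for other in g)]
        merged = [rule] + [r for g in linked for r in g]
        groups = [g for g in groups if all(g is not l for l in linked)] + [merged]
    clusters = [frozenset().union(*g) for g in groups]
    return [c for c in clusters if len(c) <= max_size]


def summarize(cluster: FrozenSet[str], rules: Rules) -> Tuple[float, FrozenSet[str]]:
    """Lift mean over all rules inside the cluster, and the strongest rule by lift."""
    inside = {r: lift for r, lift in rules.items() if r <= cluster}
    strongest = max(inside, key=inside.get)
    return mean(inside.values()), strongest


# Hypothetical rules and lifts, mirroring the five-PT example in the Methods text.
example: Rules = {
    frozenset({"PT1", "PT2", "PT3", "PT4", "PT5"}): 12.0,
    frozenset({"PT1", "PT2", "PT3", "PT4", "PT6"}): 8.0,
    frozenset({"PT1", "PT2"}): 4.0,
}
clusters = merge_rules_of_size(example, k=5)
lift_mean, strongest_rule = summarize(clusters[0], example)
```

Here the two five-PT rules share four PTs and merge into one six-PT cluster; the lift mean is taken over every rule contained in that cluster, including the two-PT rule.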

Measles, mumps, rubella, and varicella vaccine

Measles, mumps, rubella, and varicella (MMRV) vaccine was licensed in the USA in September 2005 and is recommended for routine childhood vaccination by CDC’s Advisory Committee on Immunization Practices.35,36 There is a known small risk of febrile convulsion associated with MMRV vaccination, as described in the CDC Vaccine Information Statement and investigated in several publications.37–39 The febrile convulsion risk is an interesting case study for Step-ARM, as it can appear in VAERS reports in several ways, including as separate preferred terms “pyrexia” and “convulsion” or as the specific preferred term, “febrile convulsion.” We applied Step-ARM and the post-processing algorithm described in Methods section to VAERS reports associated with MMRV administration between 1 October 2005 and 31 December 2014, and reported up to 31 December 2014. Tables 6–8 show the results of pairwise vaccine–adverse event signals by EBGM and vaccine–adverse event group association rules and clusters, respectively. In Table 6, the strongest vaccine–adverse event pairs by EBGM score also have high measures of lift, and the top 9 pairwise association rules by lift are found within the top 20 EBGM nodes. However, none

Figure 3. Graph-based visualization of top 20 association rules of TIV-ID => (multiple adverse events) by the stepwise association rule mining (Step-ARM) algorithm: Each vertex represents a rule, and each adverse event is displayed in a rectangle; the arrows connect each rule with the adverse events that appear on its right-hand side


Table 5. Eleven clusters of adverse events associated with TIV-ID vaccine by the post-processing algorithm and the strongest rules of TIV-ID => (multiple adverse events) corresponding to clusters found by the post-processing algorithm

| No. | Cluster | Lift mean | Cluster length | Strongest rule | Lift of the strongest rule |
| --- | --- | --- | --- | --- | --- |
| C1 | Erythema, local reaction, and lymphadenopathy | 17.63 | 3 | Erythema, local reaction, and lymphadenopathy | 75.61 |
| C2 | Dyspnoea, malaise, and rhinorrhea | 14.54 | 3 | Dyspnoea, malaise, and rhinorrhea | 75.61 |
| C3 | Burning sensation, chest discomfort, cough, dyspnoea, hypersensitivity, pruritus, and urticaria | 7.29 | 7 | Cough, hypersensitivity, and urticaria | 30.24 |
| C4 | Chills, cough, and respiratory tract congestion | 4.38 | 3 | Chills, cough, and respiratory tract congestion | 13.15 |
| C5 | Dizziness, fatigue, pain, and pruritus | 3.57 | 4 | Pain and pruritus | 7.15 |
| C6 | Blister, erythema, induration, pain, pruritus, skin warm, and swelling | 3.47 | 7 | Blister and swelling | 7.56 |
| C7 | Injection-site pain, injection-site swelling, musculoskeletal pain, and neck pain | 3.45 | 4 | Injection-site swelling and musculoskeletal pain | 7.47 |
| C8 | Erythema, induration, injection-site erythema, injection-site pain, and swelling | 3.39 | 5 | Induration and injection-site pain | 11.41 |
| C9 | Asthenia, chills, dizziness, fatigue, feeling abnormal, headache, and paraesthesia | 2.50 | 7 | Feeling abnormal and paraesthesia | 6.30 |
| C10 | Erythema, injection-site swelling, peripheral oedema, pain, and skin warm | 2.30 | 5 | Pain and skin warm | 4.43 |
| C11 | Asthenia, dizziness, headache, nausea, and pain | 1.74 | 5 | Dizziness and pain | 3.21 |

TIV-ID, trivalent inactivated influenza intradermal vaccine.

Figure 4. Graph-based visualization of 11 clusters of adverse events associated with trivalent inactivated influenza intradermal vaccine by the post-processing algorithm: Each vertex represents a cluster, and each adverse event is displayed in a rectangle; the arrows connect each cluster with corresponding adverse events

of the PTs most closely associated with febrile convulsion appear in the top 20 EBGM nodes. A different picture emerges from the Step-ARM vaccine-symptom group rules reported in Table 7: two of the top 10 rules (R7 and R10) are closely related to febrile convulsion. The stronger of these two rules includes the separate terms pyrexia and convulsion but not the specific term febrile convulsion, suggesting that the multivariate approach is appropriately sensitive to a more complex signal. None of the five post-processing clusters shown in Table 8 include terms closely related to febrile


Table 6. Top 20 EBGM vaccine–adverse event pairs in comparison with pairwise association rules of MMRV => (one adverse event)

| Rule of MMRV => | n | Support (E-04) | Confidence (E-02) | Lift | EBGM† | EB05† | Lift rank | EBGM rank |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Incorrect product storage | 51 | 2.09 | 1.18 | 7.77 | 15.00 | 11.84 | 2 | 1 |
| Cold compress therapy | 37 | 1.52 | 0.85 | 8.83 | 8.51 | 6.32 | 1 | 2 |
| Incorrect dose administered | 156 | 6.41 | 3.59 | 5.04 | 6.10 | 5.31 | 11 | 3 |
| Varicella | 94 | 3.86 | 2.17 | 5.87 | 5.88 | 4.90 | 7 | 4 |
| Incorrect storage of drug | 201 | 8.26 | 4.63 | 3.03 | 5.73 | 5.08 | 27 | 5 |
| Rash morbilliform | 49 | 2.01 | 1.13 | 6.56 | 5.10 | 3.94 | 4 | 6 |
| Measles | 15 | 0.62 | 0.35 | 6.90 | 5.09 | 2.99 | 3 | 7 |
| No adverse event | 548 | 22.51 | 12.63 | 2.82 | 4.79 | 4.46 | 32 | 8 |
| No adverse effect | 18 | 0.74 | 0.41 | 5.02 | 4.34 | 2.79 | 12 | 9 |
| Autism | 20 | 0.82 | 0.46 | 5.53 | 4.19 | 2.78 | 9 | 10 |
| Overdose | 10 | 0.41 | 0.23 | 6.16 | 4.06 | 2.20 | 6 | 11 |
| Autism spectrum disorder | 10 | 0.41 | 0.23 | 6.45 | 3.81 | 2.10 | 5 | 12 |
| Drug administered to patient of inappropriate age | 62 | 2.55 | 1.43 | 2.87 | 3.53 | 2.85 | 31 | 13 |
| Rash vesicular | 80 | 3.29 | 1.84 | 2.97 | 3.40 | 2.82 | 28 | 14 |
| Injection-site bruising | 28 | 1.15 | 0.65 | 2.74 | 3.24 | 2.35 | 33 | 15 |
| Varicella post vaccine | 53 | 2.18 | 1.22 | 3.40 | 3.12 | 2.48 | 19 | 16 |
| Rash maculopapular | 69 | 2.83 | 1.59 | 3.26 | 2.95 | 2.41 | 22 | 17 |
| Heat therapy | 5 | 0.21 | 0.12 | 5.84 | 2.94 | 1.32 | 8 | 18 |
| Skin tightness | 13 | 0.53 | 0.30 | 2.96 | 2.89 | 1.80 | 29 | 19 |
| Wrong drug administered | 150 | 6.16 | 3.46 | 2.52 | 2.81 | 2.45 | 40 | 20 |

EBGM, empirical Bayes geometric mean; MMRV, measles, mumps, rubella, and varicella.
† The calculations of EBGM and EB05 are described in detail in DuMouchel13 and Szarfman et al.34



Table 7. Top 20 association rules of MMRV => (multiple adverse events) by the stepwise association rule mining

| No. | Rule of MMRV => | n | Support (E-05) | Confidence (E-03) | Lift |
| --- | --- | --- | --- | --- | --- |
| R1 | Incorrect storage of drug and wrong drug administered^E | 6 | 2.46 | 1.38 | 56.10 |
| R2 | Incorrect storage of drug, no adverse event^E, and wrong drug administered^E | 5 | 2.05 | 1.15 | 56.10 |
| R3 | Cold compress therapy^E, injection-site erythema, injection-site swelling, and injection-site warmth | 14 | 5.75 | 3.23 | 27.08 |
| R4 | Overdose^E and wrong drug administered^E | 6 | 2.46 | 1.38 | 21.04 |
| R5 | Immunoglobulins and petechiae | 5 | 2.05 | 1.15 | 20.04 |
| R6 | Incorrect dose administered^E, no adverse event^E, and wrong drug administered^E | 8 | 3.29 | 1.84 | 17.95 |
| R7 | Convulsion, intubation, and pyrexia | 5 | 2.05 | 1.15 | 17.53 |
| R8 | Aphasia, autism^E, and pyrexia | 6 | 2.46 | 1.38 | 16.83 |
| R9 | Cold compress therapy^E and injection-site induration | 5 | 2.05 | 1.15 | 14.03 |
| R10 | Convulsion, febrile convulsion, pyrexia, and tonic clonic movements | 6 | 2.46 | 1.38 | 13.46 |
| R11 | Abnormal behaviour, autism^E, and pyrexia | 5 | 2.05 | 1.15 | 12.20 |
| R12 | Abnormal behaviour, irritability, lethargy, and pyrexia | 6 | 2.46 | 1.38 | 11.61 |
| R13 | Convulsion, febrile convulsion, pyrexia, and status epilepticus | 6 | 2.46 | 1.38 | 11.61 |
| R14 | Lethargy, pyrexia, and generalized rash | 7 | 2.88 | 1.61 | 11.55 |
| R15 | Abscess, cellulitis, and injection-site swelling | 5 | 2.05 | 1.15 | 10.79 |
| R16 | Cold compress therapy^E and pyrexia | 5 | 2.05 | 1.15 | 10.39 |
| R17 | Measles and generalized rash | 6 | 2.46 | 1.38 | 10.20 |
| R18 | Injection-site bruising^E, injection-site erythema, injection-site swelling, and injection-site warmth | 9 | 3.70 | 2.07 | 10.10 |
| R19 | Injection-site papule and pyrexia | 5 | 2.05 | 1.15 | 9.35 |
| R20 | Measles^E and rash | 6 | 2.46 | 1.38 | 9.10 |

MMRV, measles, mumps, rubella, and varicella; EBGM, empirical Bayes geometric mean. Note that adverse events within the top 20 EBGM nodes (i.e., in Table 6) are marked with superscript E (shown here as ^E).

convulsion. This is due to our practice (described in the Association rules post-processing section) of deleting clusters that have length greater than 10 in the post-processing algorithm. There were three clusters deleted for this reason, all of which were related to febrile convulsion. This suggests that further refinements of the post-processing algorithm are required to deal with clusters that are too long.

SIMULATION

In order to evaluate the sensitivity and specificity of the proposed post-processing algorithm of the rules, we created a series of simulated SRS databases with known signals. The simulations were created using a strategy described by Scott et al., in which the SRS database grows over time following a preferential


Table 8. Five clusters of adverse events associated with MMRV vaccine by the post-processing algorithm and the strongest rules of MMRV => (multiple adverse events) corresponding to clusters found by the post-processing algorithm

| No. | Cluster | Lift mean | Cluster length | Strongest rule | Lift of the strongest rule |
| --- | --- | --- | --- | --- | --- |
| C1 | Incorrect dose administered, incorrect storage of drug, no adverse event, and wrong drug administered | 15.95 | 4 | Incorrect storage of drug, no adverse event, and wrong drug administered | 56.10 |
| C2 | Abscess, cellulitis, and injection-site swelling | 3.74 | 3 | Abscess, cellulitis, and injection-site swelling | 10.79 |
| C3 | Idiopathic thrombocytopenic purpura, petechiae, and pyrexia | 2.95 | 3 | Idiopathic thrombocytopenic purpura, petechiae, and pyrexia | 6.47 |
| C4 | Injection-site reaction, injection-site swelling, and macular rash | 2.38 | 3 | Injection-site reaction, injection-site swelling, and macular rash | 5.19 |
| C5 | Abnormal behaviour, irritability, lethargy, and pyrexia | 2.18 | 4 | Abnormal behaviour and lethargy | 3.94 |

MMRV, measles, mumps, rubella, and varicella.

attachment mechanism.⁴⁰ Our simulated networks each contain 36 000 reports; the signal consists of eight adverse events associated with one vaccine. We applied Step-ARM and the post-processing algorithm described earlier to create association rules and cluster them to reduce redundancy. We are interested in the associations between the signal vaccine and the multiple adverse events found in the clusters. Based on 5000 simulations, we calculated the percentage of signals detected in each simulation (that is, the proportion of the eight possible signal PTs included in clusters; Table 9) and the number of non-signal adverse events included in the clusters (Table 10). Note that we excluded simulated SRS databases if there were fewer than three reports corresponding to the true signal; each of the 5000 simulations contained at least three such reports. From Table 9, the probabilities of detecting at least one, at least two, and at least three of the signal adverse events are all higher than 0.95, indicating that the proposed post-processing algorithm is sensitive to detecting at least partial symptom groups. Table 10 shows the probability of having non-signal PTs in the clusters. The probability of including zero non-signal PTs is 0.035. The simulation results suggest that it would be common for the post-processing

Table 9. Probability of detecting given percentages of the signal PTs in the clusters using the post-processing algorithm

Percentage | Probability
12.5% (≥1 PT) | 1.000
25.0% (≥2 PTs) | 0.995
37.5% (≥3 PTs) | 0.958
50.0% (≥4 PTs) | 0.848
62.5% (≥5 PTs) | 0.771
75.0% (≥6 PTs) | 0.685
87.5% (≥7 PTs) | 0.587
100% (8 PTs) | 0.450

PTs, preferred terms.

Table 10. Probability of finding given numbers of non-signal PTs in the clusters using the post-processing algorithm

n = number of non-signal PTs | Probability
0 | 0.035
1 | 0.133
2 | 0.201
3 | 0.155
4 | 0.127
5 | 0.091
6 | 0.079

PTs, preferred terms.
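The two summaries reported in Tables 9 and 10 can be illustrated with a minimal sketch. It assumes that each simulation yields a list of clusters (sets of PTs) and that the eight signal PTs are known; the PT names and the two toy simulation outputs below are hypothetical and are not data from the study.

```python
def summarize_clusters(clusters, signal_pts):
    """Return (fraction of signal PTs found, number of non-signal PTs) for one simulation."""
    found = set().union(*clusters)  # every PT that appears in any cluster
    signal = set(signal_pts)
    fraction_detected = len(found & signal) / len(signal)
    n_non_signal = len(found - signal)
    return fraction_detected, n_non_signal


def detection_probability(fractions, threshold):
    """Share of simulations in which at least `threshold` of the signal PTs were found."""
    return sum(f >= threshold for f in fractions) / len(fractions)


# Hypothetical signal of eight PTs and two toy simulation outputs (illustration only).
SIGNAL = {f"PT{i}" for i in range(1, 9)}
sim_1 = [{"PT1", "PT2", "PT3"}, {"PT3", "PT4", "noise_a"}]
sim_2 = [{"PT1", "noise_b", "noise_c"}]

results = [summarize_clusters(sim, SIGNAL) for sim in (sim_1, sim_2)]
fractions = [fraction for fraction, _ in results]

print(results)                                 # per-simulation summaries
print(detection_probability(fractions, 0.25))  # share of runs with at least 2 of 8 signal PTs
```

Repeating `summarize_clusters` over all simulated databases and tabulating the results by threshold (for the fraction detected) or by count (for the non-signal PTs) would give summaries of the kind shown in the two tables.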

algorithm to include one or more non-signal adverse events in the clusters.

DISCUSSION

The Step-ARM algorithm presented in this paper is sensitive to potentially complex signals, which may be particularly important when monitoring novel medical countermeasure products such as pandemic influenza vaccines. Step-ARM provides a way to detect and study potential multi-symptom events in a medically coherent context. In the MMRV example, Step-ARM was more successful than univariate methods in identifying the known risk of febrile convulsion.

The computationally efficient Step-ARM algorithm improves upon simple ARM by placing no limit on the number of items on the RHS of the rules. In some cases, it may be of interest to preferentially investigate rules with smaller numbers of terms. The Step-ARM algorithm makes it possible to study rules with a specific number of PTs on the RHS. It would also be possible to modify the ranking metric to preferentially promote rules with fewer terms, for instance by dividing lift by a factor proportional to the number of terms in the rule.
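One way to read the suggested modification of the ranking metric is sketched below. The function name `length_adjusted_lift` and the `penalty` constant are assumptions introduced for illustration; the rules and lift values are the four highest-lift MMRV rules from the rule table in the MMRV example.

```python
def length_adjusted_lift(lift, n_terms, penalty=1.0):
    """Divide lift by a factor proportional to the number of RHS terms.

    `penalty` is a hypothetical proportionality constant; it rescales the
    adjusted values but does not change the ranking.
    """
    if n_terms < 1:
        raise ValueError("a rule needs at least one term on the RHS")
    return lift / (penalty * n_terms)


# (RHS terms, lift) for the four highest-lift MMRV rules
rules = [
    (("Incorrect storage of drug", "wrong drug administered"), 56.10),
    (("Incorrect storage of drug", "no adverse event", "wrong drug administered"), 56.10),
    (("Cold compress therapy", "injection-site erythema",
      "injection-site swelling", "injection-site warmth"), 27.08),
    (("Overdose", "wrong drug administered"), 21.04),
]

# Rank by adjusted lift, largest first.
ranked = sorted(
    rules,
    key=lambda rule: length_adjusted_lift(rule[1], len(rule[0])),
    reverse=True,
)

for terms, lift in ranked:
    print(len(terms), lift, round(length_adjusted_lift(lift, len(terms)), 2))
```

Under this adjustment the two rules with equal lift are separated in favour of the shorter one, and the two-term overdose rule moves ahead of the four-term injection-site rule. Other penalties (for example, a sublinear function of rule length) would be equally compatible with the text.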


The proposed post-processing algorithm for association rules significantly reduces the number of redundant association rules found by Step-ARM and can be used to identify a set of patterns that may merit further investigation. Graph-based visualization of the association rules and clusters can give physicians a clear picture of the associations among adverse events, providing context for understanding simpler bivariate associations that may be identified with other signal detection approaches. In addition, the approach can be easily extended to introduce covariates such as gender and age, or to study vaccine–vaccine interactions, by placing the corresponding covariates or items on the RHS or LHS of the rule.

As future work, we plan further refinements of the post-processing algorithm to group adverse events in a medically coherent context based on measures of rule distance or similarity. Bayesian hierarchical models can also be introduced into ARM to “borrow strength” using conditions of similar individuals.⁴¹ Lastly, we believe that the methodology could be applied to larger SRS datasets such as FAERS.

The computation time required to apply Step-ARM for the TIV-ID example (44 759 reports) was approximately one hour, using 100 cores in parallel on FDA high-performance computing clusters. For the MMRV example (243 476 reports), Step-ARM required approximately three hours using 500 cores in parallel on FDA high-performance computing clusters.
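The covariate extension mentioned in the discussion above can be sketched by encoding each report as a set of items in which covariates sit alongside vaccines and PTs. This is a minimal illustration under stated assumptions: the item prefixes (`VAX:`, `PT:`, `SEX:`, `AGE:`), the report fields, and the four toy reports are hypothetical and do not describe the VAERS data format or the authors' implementation.

```python
def to_itemset(report):
    """Encode one report as a set of prefixed items (the prefixes are an assumed convention)."""
    items = {f"VAX:{v}" for v in report["vaccines"]}
    items |= {f"PT:{pt}" for pt in report["pts"]}
    items.add(f"SEX:{report['sex']}")
    items.add(f"AGE:{report['age_group']}")
    return frozenset(items)


def lift(itemsets, lhs, rhs):
    """Lift of the rule lhs => rhs over a list of itemsets."""
    lhs, rhs = frozenset(lhs), frozenset(rhs)
    n = len(itemsets)
    n_lhs = sum(lhs <= s for s in itemsets)
    n_rhs = sum(rhs <= s for s in itemsets)
    n_both = sum((lhs | rhs) <= s for s in itemsets)
    if n_lhs == 0 or n_rhs == 0:
        return float("nan")  # rule is undefined when either side never occurs
    return (n_both / n_lhs) / (n_rhs / n)


# Four toy reports (hypothetical, for illustration only).
reports = [
    {"vaccines": ["MMRV"], "pts": ["pyrexia"], "sex": "F", "age_group": "<2y"},
    {"vaccines": ["MMRV"], "pts": ["pyrexia"], "sex": "M", "age_group": "<2y"},
    {"vaccines": ["MMRV"], "pts": ["rash"], "sex": "F", "age_group": "<2y"},
    {"vaccines": ["TIV"], "pts": ["rash"], "sex": "F", "age_group": "adult"},
]
itemsets = [to_itemset(r) for r in reports]

# Vaccine alone on the LHS, then vaccine plus a covariate on the LHS.
print(lift(itemsets, {"VAX:MMRV"}, {"PT:pyrexia"}))
print(lift(itemsets, {"VAX:MMRV", "SEX:F"}, {"PT:pyrexia"}))
```

Placing two `VAX:` items on the LHS in the same way would give a rule for a vaccine–vaccine combination; whether covariates belong on the LHS or the RHS depends on the question being asked.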

CONFLICT OF INTEREST

The authors have no conflicts of interest to disclose.

KEY POINTS

• Typical approaches to data-mining spontaneous adverse event reporting systems rely on bivariate product–adverse event associations and may not be sensitive to more complex signals.
• Our novel stepwise association rule mining approach makes multivariate signal detection in adverse event spontaneous reporting systems computationally feasible.
• By post-processing the association rules into clusters, we can remove redundancy and place overlapping patterns of adverse events into context.
• In two data examples and in simulations, we demonstrate that the Step-ARM approach is able to detect medically coherent potential vaccine-symptom group associations and clusters.


ETHICS STATEMENT

The authors state that no ethical approval was needed.

ACKNOWLEDGEMENTS

This project was supported by the FDA Medical Countermeasures Initiative via an appointment to the Research Participation Program at the Center for Biologics Evaluation and Research administered by the Oak Ridge Institute for Science and Education through an interagency agreement between the U.S. Department of Energy and the U.S. Food and Drug Administration. We are especially grateful to an anonymous referee and to Dr. Martin Kulldorff for their helpful suggestions, which significantly improved our manuscript.

REFERENCES

1. Wysowski DK, Swartz L. Adverse drug event surveillance and drug withdrawals in the United States, 1969–2002: the importance of reporting suspected reactions. Arch Intern Med 2005; 165(12): 1363–1369.
2. Chen RT, Rastogi SC, Mullen JR, et al. The Vaccine Adverse Event Reporting System (VAERS). Vaccine 1994; 12(6): 542–550.
3. Lindquist M. VigiBase, the WHO global ICSR database system: basic facts. Drug Inf J 2008; 42(5): 409–419.
4. Hauben M. Signal detection in the pharmaceutical industry. Drug Saf 2007; 30(7): 627–630.
5. Huang L, Guo T, Zalkikar JN, Tiwari RC. A review of statistical methods for safety surveillance. TIRS 2014; 48(1): 98–108.
6. Evans SJW, Waller PC, Davis S. Use of proportional reporting ratios (PRRs) for signal generation from spontaneous adverse drug reaction reports. Pharmacoepidemiol Drug Saf 2001; 10(6): 483–486.
7. Rothman KJ, Lanes S, Sacks ST. The reporting odds ratio and its advantages over the proportional reporting ratio. Pharmacoepidemiol Drug Saf 2004; 13(8): 519–523.
8. Greenwood PE. A Guide to Chi-squared Testing (Vol. 280). John Wiley & Sons: New York, 1996.
9. Yates F. Contingency tables involving small numbers and the χ² test. J Roy Stat Soc Suppl 1934; 1(2): 217–235.
10. Huang L, Zalkikar J, Tiwari RC. A likelihood ratio test based method for signal detection with application to FDA’s drug safety data. J Am Stat Assoc 2011; 106(496): 1230–1241.
11. Bate A, Lindquist M, Edwards IR, et al. A Bayesian neural network method for adverse drug reaction signal generation. Eur J Clin Pharmacol 1998; 54(4): 315–321.
12. Norén GN, Bate A, Orre R, Edwards IR. Extending the methods used to screen the WHO drug safety database towards analysis of complex associations and improved accuracy for rare events. Stat Med 2006; 25(21): 3740–3757.
13. DuMouchel W. Bayesian data mining in large frequency tables, with an application to the FDA spontaneous reporting system. Am Stat 1999; 53(3): 177–190.
14. Huang L, Zalkikar J, Tiwari RC. Likelihood ratio test based method for signal detection in drug classes using FDA’s AERS database. J Biopharm Stat 2013; 21: 178–200.
15. Harpaz R, DuMouchel W, Shah NH, Madigan D, Ryan P, Friedman C. Novel data-mining methodologies for adverse drug event discovery and analysis. Clin Pharmacol Therapeut 2012; 91(6): 1010–1021.
16. Tse A, Tseng H, Greene S, Vellozzi C, Lee G, VSD Rapid Cycle Analysis Influenza Working Group. Signal identification and evaluation for risk of febrile seizures in children following trivalent inactivated influenza vaccine in the Vaccine Safety Datalink Project, 2010–2011. Vaccine 2012; 30(11): 2024–2031.
17. Ball R, Botsis T. Can network analysis improve pattern recognition among adverse events following immunization reported to VAERS? Clin Pharmacol Therapeut 2011; 90(2): 271–278.
18. Genkin A, Lewis DD, Madigan D. Large-scale Bayesian logistic regression for text categorization. Technometrics 2007; 49(3): 291–304.
19. Caster O, Norén GN, Madigan D, Bate A. Large-scale regression-based pattern discovery: the example of screening the WHO global drug safety database. Stat Anal Data Min 2010; 3(4): 197–208.


20. Harpaz R, Chase HS, Friedman C. Mining multi-item drug adverse effect associations in spontaneous reporting systems. BMC Bioinformatics 2010; 11(Suppl 9): S7.
21. Harpaz R, Perez H, Chase HS, Rabadan R, Hripcsak G, Friedman C. Biclustering of adverse drug events in the FDA’s spontaneous reporting system. Clin Pharmacol Therapeut 2011; 89(2): 243–250.
22. Zhang C, Zhang S. Association Rule Mining: Models and Algorithms. Springer-Verlag, 2002.
23. Geng L, Hamilton HJ. Interestingness measures for data mining: a survey. ACM Comput Surv 2006; 38(3): 9.
24. Tan PN, Kumar V, Srivastava J. Selecting the right objective measure for association analysis. Inform Syst 2004; 29(4): 293–313.
25. Brown EG, Wood L, Wood S. The medical dictionary for regulatory activities (MedDRA). Drug Saf 1999; 20(2): 109–117.
26. Rouane-Hacene M, Toussaint Y, Valtchev P. Mining Safety Signals in Spontaneous Reports Database Using Concept Analysis. Springer: Berlin, Germany, 2009.
27. Fan K, Sun X, Tao Y, et al. High-performance signal detection for adverse drug events using MapReduce paradigm. AMIA Annual Symposium Proceedings, p. 902, 2010.
28. Pool V, Chen R. Association rule discovery as a signal generation tool for the Vaccine Adverse Event Reporting System. Pharmacoepidemiol Drug Saf 2003; 11: S57.
29. Agrawal R, Imieliński T, Swami A. Mining association rules between sets of items in large databases. ACM SIGMOD 1993; 22(2): 207–216.
30. Agrawal R, Srikant R. Fast algorithms for mining association rules. In Proc. 20th Int. Conf. Very Large Data Bases, VLDB, 1994.
31. FDA, May 9, 2011 approval letter - Fluzone Intradermal, 2011. [Online]. Available at: http://www.fda.gov/BiologicsBloodVaccines/Vaccines/ApprovedProducts/ucm255160.htm [Accessed 21 May 2015].
32. FDA, Sanofi Pasteur 390 Fluzone Intradermal, 2014. [Online]. Available at: http://www.fda.gov/downloads/BiologicsBloodVaccines/Vaccines/ApprovedProducts/UCM305080.pdf [Accessed 21 May 2015].


33. Moro PL, Harrington T, Shimabukuro T, et al. Adverse events after Fluzone® Intradermal vaccine reported to the Vaccine Adverse Event Reporting System (VAERS), 2011–2013. Vaccine 2013; 31: 4984–4987.
34. Szarfman A, Machado SG, O’Neill RT. Use of screening algorithms and computer systems to efficiently signal higher-than-expected combinations of drugs and events in the US FDA’s spontaneous reports database. Drug Saf 2002; 25(6): 381–392.
35. FDA, September 6, 2005 approval letter - ProQuad, 2005. [Online]. Available at: http://www.fda.gov/BiologicsBloodVaccines/Vaccines/ApprovedProducts/ucm188806.htm [Accessed 11 March 2015].
36. Centers for Disease Control and Prevention (CDC). Use of combination measles, mumps, rubella, and varicella vaccine: recommendations of the Advisory Committee on Immunization Practices. Department of Health and Human Services, Centers for Disease Control and Prevention. Morbidity and Mortality Weekly Report (MMWR) 2010; 59(RR03): 1–12.
37. CDC, MMRV vaccine information statement, 2010. [Online]. Available at: www.cdc.gov/vaccines/hcp/vis/vis-statements/mmrv.html [Accessed 11 March 2015].
38. Klein NP, Fireman B, Yih WK, et al. Measles-mumps-rubella-varicella combination vaccine and the risk of febrile seizures. Pediatrics 2010; 126(1): e1–e8.
39. Jacobsen SJ, Ackerson BK, Sy LS, et al. Observational safety study of febrile convulsion following first dose MMRV vaccination in a managed care setting. Vaccine 2009; 27(34): 4656–4661.
40. Scott J, Botsis T, Ball R. Simulating adverse event spontaneous reporting systems as preferential attachment networks: application to the Vaccine Adverse Event Reporting System. ACI 2014; 5(1): 206–218.
41. McCormick TH, Rudin C, Madigan D. Bayesian hierarchical rule modeling for predicting medical conditions. Ann Appl Stat 2012; 6(2): 652–668.
42. Almenoff JS, DuMouchel W, Kindman LA, Yang X, Fram D. Disproportionality analysis using empirical Bayes data mining: a tool for the evaluation of drug interactions in the post-marketing setting. Pharmacoepidemiol Drug Saf 2003; 12(6): 517–521.
