Journal of Biomedical Informatics xxx (2013) xxx–xxx

Contents lists available at ScienceDirect

Journal of Biomedical Informatics journal homepage: www.elsevier.com/locate/yjbin

Using aggregated, de-identified electronic health record data for multivariate pharmacosurveillance: A case study of azathioprine Vishal N. Patel a,b,⇑, David C. Kaelber a,c,d,e,f a

Center for Clinical Informatics Research and Education, The MetroHealth System, Case Western Reserve University, Cleveland, OH, United States Center for Proteomics and Bioinformatics, The MetroHealth System, Case Western Reserve University, Cleveland, OH, United States c Departments of Information Services, The MetroHealth System, Case Western Reserve University, Cleveland, OH, United States d Department of Internal Medicine, The MetroHealth System, Case Western Reserve University, Cleveland, OH, United States e Department of Pediatrics, The MetroHealth System, Case Western Reserve University, Cleveland, OH, United States f Departments of Epidemiology and Biostatistics, The MetroHealth System, Case Western Reserve University, Cleveland, OH, United States b

a r t i c l e

i n f o

Article history: Available online xxxx Keywords: Clinical research informatics De-identified data Systems biology Postmarketing drug surveillance Pharmacosurveillance Azathioprine

a b s t r a c t Objective: To demonstrate the use of aggregated and de-identified electronic health record (EHR) data for multivariate post-marketing pharmacosurveillance in a case study of azathioprine (AZA). Methods: Using aggregated, standardized, normalized, and de-identified, population-level data from the Explore platform (Explorys, Inc.) we searched over 10 million individuals, of which 14,580 were prescribed AZA based on RxNorm drug orders. Based on logical observation identifiers names and codes (LOINC) and vital sign data, we examined the following side effects: anemia, cell lysis, fever, hepatotoxicity, hypertension, nephrotoxicity, neutropenia, and neutrophilia. Patients prescribed AZA were compared to patients prescribed one of 11 other anti-rheumatologic drugs to determine the relative risk of side effect pairs. Results: Compared to AZA case report trends, hepatotoxicity (marked by elevated transaminases or elevated bilirubin) did not occur as an isolated event more frequently in patients prescribed AZA than other anti-rheumatic agents. While neutropenia occurred in 24% of patients (RR 1.15, 95% CI 1.07–1.23), neutrophilia was also frequent (45%) and increased in patients prescribed AZA (RR 1.28, 95% CI 1.22–1.34). After constructing a pairwise side effect network, neutropenia had no dependencies. A reduced risk of neutropenia was found in patients with co-existing elevations in total bilirubin or liver transaminases, supporting classic clinical knowledge that agranulocytosis is a largely unpredictable phenomenon. Rounding errors propagated in the statistically de-identified datasets for cohorts as small as 40 patients only contributed marginally to the calculated risk. Conclusion: Our work demonstrates that aggregated, standardized, normalized and de-identified population level EHR data can provide both sufficient insight and statistical power to detect potential patterns of medication side effect associations, serving as a multivariate and generalizable approach to post-marketing drug surveillance. Ó 2013 Elsevier Inc. All rights reserved.

1. Introduction Azathioprine (AZA), a purine analog widely used in solid organ transplants and autoimmune disorders, is known to induce a spectrum of toxicities. While bone marrow suppression [1,2] and hepatotoxicity [3] – thought to be dose-dependent effects due to the accumulation of 6-thioguanine metabolites – are the most widely recognized side effects, dermatologic [4] and renal effects [5] have Abbreviations: AZA, Azathioprine; EHR, electronic health record.

⇑ Corresponding author. Address: Center for Proteomics and Bioinformatics, 10900 Euclid Ave., BRB 9th Floor, Case Western Reserve University, Cleveland, OH 44106, United States. E-mail address: [email protected] (V.N. Patel).

also been reported. Predictions of adverse reactions can be guided by thiomercaptopurine methyltransferase (TPMT) activity [6,7], though TPMT activity has poor sensitivity as a test for toxicity [8]. As a supplement to measuring TPMT activity, case reports of AZA’s toxicities help document the clinical course of side effects, yet the likelihood of a particular organ system’s involvement is often established on an ad hoc basis from a survey of the literature. Consequently, side effects with a poor showing in the literature are presumed to be rare, though they may, in fact, be common. More generally, tracking side effects via case reports or reporting databases provides inaccurate and incomplete estimates of incidence. While case reports and TPMT activity help to substratify patients into cohorts likely to develop adverse reactions, clinical

1532-0464/$ - see front matter Ó 2013 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/j.jbi.2013.10.009

Please cite this article in press as: Patel VN, Kaelber DC. Using aggregated, de-identified electronic health record data for multivariate pharmacosurveillance: A case study of azathioprine. J Biomed Inform (2013), http://dx.doi.org/10.1016/j.jbi.2013.10.009

2

V.N. Patel, D.C. Kaelber / Journal of Biomedical Informatics xxx (2013) xxx–xxx

awareness can be further heightened by assembling the frequent sets of clinical patterns that arise in patients prescribed AZA. Studying multivariate side effects allows for a syndromic approach to identifying adverse drug reactions, which can point to clinical patterns for gauging a subgroup’s risk of developing complications. While knowledge of disease presentations and syndromes was classically acquired through prolonged clinical observations, the emergence of electronic health data sources now allow for the possibility of mining such information through retrospective data analysis, based on data captured as part of routine clinical care. In particular, network models have proven useful in mining and organizing patterns of clinical phenomena, with recent applications in the assembly of disease comorbidity networks based on frequencies of ICD-9 (International Classification of Diseases, ninth revision) code co-occurrence [9,10]. These networks of side effects provide a global overview of disease-disease associations and may be useful for lifetime risk calculations, though they have limited utility in informing clinical practice as they lack temporal associations and specific clinical variables. Additionally, there may be significant issues with the accuracy of ICD-9 codes. In the area of pharmacosurveillance, research has classically focused on signal detection from voluntary reporting databases, which are limited by the biases inherent in voluntary submissions of adverse drug event [11]. Methodologically, pharmacosurveillance has largely focused on the extraction of bivariate – i.e. drug-event – signals [11,12], though studies have recently emerged that focus on detecting multivariate patterns through association rule mining [13,14]. In this work, we aim to (1) quantify the incidence of size effects associated with AZA, (2) compare the incidence of side effects associated with AZA to other similar drugs, and (3) combine networkbased approaches with traditional pharmacosurveillance signal detection methods to discover patterns of multiple events associated with AZA that can be clinically informative. As opposed to discovering events in reporting databases, we used a database of aggregated, standardized, normalized, and de-identified population level EHR data from over 10 million patients. Rather than using ICD-9 codes, we use laboratory and vital sign measurements – reliable fields in most EHR systems [15] – to assess the presence of side effects. The use of laboratory and vital sign values from aggregated EHR data avoids the biases of spontaneous reporting systems and the problems with provider-entered ICD-9 codes, while simultaneously increasing our power to detect rare event associations.

2. Materials and methods 2.1. Database description De-identified data was obtained using the Explore application of the Explorys platform (Explorys, Inc.), which places a health data gateway (HDG) server behind the firewall of each participating healthcare organization. After collecting data from a variety of health information systems – electronic health records (EHRs), billing systems, laboratory systems, etc. – the HDG maps the data to informatics ontologies, standardizing and normalizing measurements. Next, the data from each participating healthcare organization is passed into a data grid. A web application allows each healthcare organization to search and analyze the aggregated, standardized, normalized, and de-identified population level data. At the time of this study, the aggregated data grid contained information on more than 10 million patients from multiple, distinct healthcare systems with different EHRs; the EHR serves as the primary medical record within participating institutions. All data used were de-identified to meet Health Insurance Portability

and Accountability Act (HIPAA) and Health Information Technology for Economic and Clinical Health (HITECH) Act standards. Therefore, this work was deemed not to be human studies research by the Institutional Review Board of the MetroHealth System. Business affiliation agreements were in place between all participating healthcare systems and Explorys Inc. regarding contribution of EHR data and the use of these de-identified data. Unified Medical Language System (UMLS) ontologies were used to map EHR data to facilitate searching and indexing. Diagnoses, findings, and procedures were mapped into the systematized nomenclature of medicine-clinical terms (SNOMED-CT) hierarchy [16]. Prescription medication orders were mapped to RxNorm [17]. Laboratory test observations were mapped to logical observation identifiers names and codes (LOINC), established by the Regenstrief Institute [18]. At the time of analysis, the application contained 14,580 records of patients who had ever been prescribed AZA. 2.2. Side effect network analysis Our analysis focused on end-organ dysfunction known to be implicated with AZA, as well as non-specific side effects, such as hypertension, fever, and cell lysis. To study side effect patterns of AZA associated with particular organ systems, we used key reference ranges for lab values as proxies of organ function (Table 1) [19]. Patients were selected who had normal lab values within 90 days prior to being prescribed AZA, and an abnormal measurement within 90 days after being prescribed AZA. As the actual administration or consumption of a medication is often not recorded in EHRs, especially for outpatient medications, we used medication orders as a proxy for drug administration (i.e. the term ‘‘prescribed’’ refers to the date at which the order for ‘‘azathioprine’’ appeared in the patient’s EHR). For neutropenia and neutrophilia, we extended the selection window to patients with a normal neutrophil count within 365 days prior to AZA prescription. To avoid inflating the significance of creatinine measurements due to subgroup effects from pre-existing renal dysfunction, we excluded all patients with an ICD-9 or American Medical Association Common Procedural Terminology (CPT) code mapped to any of the following SNOMED terms: renal impairment, renal failure syndrome, history of kidney transplant, or renal transplant (procedure). 2.3. Control cohort Identification of patients suffering from various side effects is subject to the myriad biases of dealing with EHR data. For instance, an elevation in blood pressure observed after prescribing AZA may be due to external factors (e.g. environment, other drugs) or errors in measurement. Similarly, an elevation in liver enzymes may simply be a phenomenon associated with the underlying autoimmune disease rather than AZA. To address these issues and to calculate the significance of AZA-induced side effects, we assembled a control cohort of patients who experienced abnormal values when administered one of 11 other anti-rheumatic drugs (Table 2). The control drugs were identified by first selecting those drugs tagged with the SNOMED (Systemized Nomenclature of Medicine) code ‘‘anti-rheumatic agents.’’ This unfiltered list of 42 drugs contains agents with both frequent and infrequent side effect profiles, and the inclusion of drugs with infrequent side effects would artificially inflate the significance of side effects associated with AZA. Apart from identifying statistically significant side effects, we also sought to identify clinically relevant side effects. In this regard, it was important to account for the prevalence of ‘‘common’’ side effects, e.g. headache, to judge the relevance of side effects associated with AZA. Thus, to avoid statistical bias and produce clinically relevant

Please cite this article in press as: Patel VN, Kaelber DC. Using aggregated, de-identified electronic health record data for multivariate pharmacosurveillance: A case study of azathioprine. J Biomed Inform (2013), http://dx.doi.org/10.1016/j.jbi.2013.10.009

V.N. Patel, D.C. Kaelber / Journal of Biomedical Informatics xxx (2013) xxx–xxx Table 1 Side effects assessed after AZA prescription. Side effect

Lab value

Abnormal range

Anemia Cell lysis Fever Hepatotoxicity

190 IU/L >37.8 °F AST > 40 IU/L and ALT > 40 IU/L

Hepatotoxicity Hypertension

Hemoglobin (Hgb) Lactate dehydrogenase (LDH) Temperature Aspartate aminotransferase (AST), Alanine aminotransferase (ALT) Total bilirubin (Bili) Blood pressure (BP)

Nephrotoxicity Neutropenia

Creatinine (Cr) Neutrophil count

Neutrophilia

Neutrophil count

>1 mg/dL Systolic > 140 mm Hg or Diastolic > 90 mm Hg >1.5 mg/dL Count < 57% or 70%

Table 2 Distribution of anti-rheumatic agents among the control and AZA cohorts. Drug name (RxCUI)

Control cohort (% Total)

AZA cohort (% Total)

Abatacept (614391) Adalimumab (327361) Azathioprine (1256) Clioquinol (5942) Etanercept (214555) Hydroxychloroquine (5521) Infliximab (191831) Iodoquinol (3435) Leflunomide (27169) Methotrexate (6851) Oxyquinoline (110) Sulfasalazine (9524)

180 3100 3330 100 2920 27,670 3100 5510 1620 21,000 280 6230

80 750 14,270 0 280 2120 1320 50 520 1880 0 630

Total

72,290

(0.2%) (4.3%) (4.6%) (0.1%) (4.0%) (38.3%) (4.3%) (7.6%) (2.2%) (29.0%) (0.4%) (8.6%)

(0.5%) (5.1%) (97.9%)a (0.0%) (1.9%) (14.5%) (9.1%) (0.3%) (3.6%) (12.9%) (0.0%) (4.3%)

14,580

a

The fraction of patients in the AZA cohort prescribed AZA is not 100% since this analysis was performed at a different time than the side effect network analysis. The analysis tool maps patients to the most current instantiation of the patient database, and a discrepancy indicates a change in the underlying database between analyses.

results, we focused on those drugs most likely to produce side effects. Namely, we selected the subset of anti-rheumatic drugs for which at least 5% of all patients prescribed the drug were tagged with the SNOMED code ‘‘Adverse Reaction to Drug’’ and at least 50% were tagged with a SNOMED ‘‘Allergy to Drug’’ code. This approach does not guarantee that a particular drug was responsible for an adverse reaction or a drug allergy; rather, it serves to increase the likelihood of a drug-event association. We also included sulfasalazine, for whom 49.4% of patients prescribed the drug had an ‘‘Allergy to Drug’’ code. Through this process, we arrived at a concise list of 12 anti-rheumatic agents that are prone to inducing toxicity. After manual inspection of this list, homatropine was excluded because it is applied topically, rather than taken orally, to treat rheumatologic sequelae (i.e. uveitis). Overlap is evident between the ‘‘AZA’’ and ‘‘non-AZA’’ cohorts since controlling the AZA cohort for the absence of the other 11 drugs was computationally intractable and yielded cohorts of near-null size, as most patients prescribed AZA have also been prescribed other antirheumatic agents at some point. We performed database searches for subsets of patients experiencing each of the 9 individual side effects (Table 1) and the 36 non-redundant side effect pairs. As over-constraining the search resulted in too few patients, searches were not designed to be mutually exclusive, i.e. a search for patients with abnormal creatinine and abnormal bilirubin was not controlled to rule out those patients with abnormal hemoglobin, abnormal blood pressure,

3

etc. The total number of patients experiencing each individual side effect and each side effect pair is shown in Supplemental Table 1. For the AZA group experiencing two side effects, cohort sizes ranged from 10 to 310 patients. Control cohort sizes ranged from 70 to 1400 patients. We also calculated the number of patients with normal laboratory and vital sign values before and after drug prescription. This value was used in calculating the total number of patients in whom the value was measured, which was necessary for estimating the proportion of patients experiencing a single side effect. 2.4. Side effect network As with previous network models of disease co-occurrence [9,10], we began construction of our network by compiling the frequencies of side effect pairs. To help identify causal relationships between side effects, we then calculated the conditional probability of developing a side effect, as per Bayes’ theorem:

PðAjBÞ ¼

PðA \ BÞ PðBÞ

ð1Þ

where A and B represent two abnormal values for two side effects. To reduce the number of pairwise searches required, we simplified the probability as follows:

PðAjBÞ ¼

#ðA \ BÞ=#ððA \ BÞ [ ðA \ BÞÞ #ðA \ BÞ   #B #B=#ðB [ BÞ

ð2Þ

where #ðA \ BÞ represents the number of patients with abnormal  represents the total number values for both A and B, and #ðB [ BÞ of patients with either an abnormal or normal value for B. The right-hand side approximation is possible if the total number of patients in the denominator of P(B) is similar to the total number for the numerator, PðA \ BÞ. We found this to be true for our data, where both denominators were of similar order. To normalize the probability of AZA event associations by an appropriate background, we calculated the relative risk of a side effect association in the AZA cohort relative to the control cohort, assessing significance using the 95% confidence interval. This approach is akin to the proportional reporting ratio commonly used in pharmacovigilance [11]. Cells in Table 3 are color-coded according to the relative risk induced by AZA relative to the control drugs. 2.5. Error propagation analysis Since cohort numbers reported were rounded to the nearest ten (for full statistical de-identification), we calculated the uncertainty propagated by this rounding error. Since the relative risk is based on divisions of proportions, we estimated the error carried forth in division (or multiplication) of two proportions, z = x/y, as:

 2  2  2 Dz Dx Dy ¼ þ z x y

ð3Þ

where Dx and Dy are the uncertainty in the proportions x and y, respectively, and Dz is the uncertainty in the relative risk [20]. The uncertainties of x and y are both equal to 5 in this case (as cohort sizes were rounded to the nearest 10). The contribution of rounding error to the relative risk is reported in Table 4. 3. Results 3.1. Incidence of individual side effects The proportion of patients experiencing side effects 90 days after prescription of AZA appears in Table 3. Side effect pairs with an increased risk of co-occurrence under AZA are highlighted, with

Please cite this article in press as: Patel VN, Kaelber DC. Using aggregated, de-identified electronic health record data for multivariate pharmacosurveillance: A case study of azathioprine. J Biomed Inform (2013), http://dx.doi.org/10.1016/j.jbi.2013.10.009

4

V.N. Patel, D.C. Kaelber / Journal of Biomedical Informatics xxx (2013) xxx–xxx

Table 3 Frequencies and conditional probabilities of side effects induced by AZA.

a

The diagonal represents the proportion of patients experiencing a single side effect. The off-diagonal elements represent the conditional probability, P(column|row). The relative risk of developing a side effect pair (relative to any one of 12 anti-rheumatic drugs) is indicated by the cell color; only those side effect pairs with a statistically significant (95% confidence interval) increased risk are highlighted.

b

the risk calculated relative to the proportion of patients prescribed one of the 11 other anti-rheumatic drugs. Proportions were calculated relative to the side effect on the row, P(column|row). For each row, the proportion of AZA patients suffering from that individual side effect (along the diagonal) is calculated relative to the total number of patients given AZA that had this particular laboratory or vital sign value measured, i.e. normal and abnormal combined. While a single laboratory or vital sign measurement may not have been identified as occurring more frequently in AZA patients, the frequency may be increased when viewed in conjunction with other laboratory or vital sign values, and these pairings are found in the off-diagonal elements of Table 3. Among isolated lab or vital sign values along the diagonal of Table 3, we first note that renal dysfunction (measured by elevated creatinine) is infrequent, with only 7.9% of patients developing a measured creatinine having an abnormal value (i.e. >1.5 mg/dL). The relative risk of nephroxicity in patients prescribed AZA was not statistically significantly greater than patients prescribed other anti-rheumatic drugs (RR 1.19, 95% CI 0.99–1.44). The proportion of patients suffering from hepatotoxicity as measured by either elevated transaminases or bilirubin

was 14.1%. However, neither elevated transaminases (RR 0.99, CI 0.86–1.14) nor an elevated bilirubin (RR 1.01, CI 0.88–1.16) occur as isolated events more frequently in patients prescribed AZA than other anti-rheumatic agents in our data. AZA-associated fever, defined in our study as a temperature >37.8 °C, occurred at a rate of 13.1% of AZA patients (RR 1.31, CI 1.18– 1.44), more than three times the previously reported incidence of fever of 4.2% [21]. Given AZA’s well-known side effect of bone marrow suppression, we successfully identified the significantly increased risk of neutropenia in 24% of AZA users (RR 1.15, CI 1.07–1.23). Anemia is highly prevalent (28%) in our cohort, but AZA does not demonstrate an increased risk relative to other anti-rheumatic drugs (RR 0.97, CI 0.90–1.05). Neutrophilia, thought to arise either in direct response to the drug or from bone marrow stimulation due to excessive hemolysis [4], was also frequent (45.2%) and significant (RR 1.28, CI 1.22– 1.34). Among the non-specific laboratory and vital sign values (temperature, blood pressure, and lactate dehydrogenase), over a quarter (30%) of patients experienced a spike in blood pressure after being prescribed AZA, equivalent to controls (RR 1.01, CI

Table 4 Contribution of rounding error to uncertainty in relative risk.

a

Uncertainties greater than 0.10 are highlighted in gray; this cutoff was chosen because the smallest cohort size was two significant digits. b Two cells marked as ‘‘not a number’’ (NaN) had cohorts too small to report and were estimated as zero. c Neutrophilia and neutropenia are mutually exclusive.

Please cite this article in press as: Patel VN, Kaelber DC. Using aggregated, de-identified electronic health record data for multivariate pharmacosurveillance: A case study of azathioprine. J Biomed Inform (2013), http://dx.doi.org/10.1016/j.jbi.2013.10.009

V.N. Patel, D.C. Kaelber / Journal of Biomedical Informatics xxx (2013) xxx–xxx

5

4. Discussion

Fig. 1. AZA-induced side effect network showing links with significantly increased risk relative to other anti-rheumatic drugs. Lab measurements highlighted in green have an increased risk for occurrence in patients prescribed AZA; gray nodes have a decreased or non-significant risk. The size of a node corresponds to the proportion of patients experiencing that side effect. Edge widths are scaled to the proportion of AZA patients experiencing the side effect pair; edge color corresponds to relative risk. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

0.96–1.05). As expected from in vitro experiments [22], a majority (59%) of patients prescribed AZA experience a rise in LDH, for which the risk may be marginally more significant than other anti-rheumatic agents (RR 1.15, CI 1.00–1.34). 3.2. Side effect network The off-diagonal proportions in Table 3 approximate conditional probabilities, and, thus, the side effect matrix can be viewed as a directed graph, or network (Fig. 1). In the side effect network, the preponderance of inputs into LDH characterizes it as a downstream ‘‘sink’’: a clinical phenomenon whose existence is conditional on several priors. As expected based on the large proportion of patients experiencing neutrophilia, this non-specific side effect also acts as a sink in the network. Neutropenia and elevated transaminases are both disconnected from the other side effects in the network, indicating that, by this approach, these two side effects have no reliable priors upon which clinical predictions can be made. In particular, neutropenia had a significantly reduced risk of occurrence in association with several network variables: elevated transaminases (RR 0.34, CI 0.22–0.52); elevated bilirubin (RR 0.33, CI 0.21–0.51); fever (RR 0.54, CI 0.39–0.74); elevated blood pressure (RR 0.74, CI 0.58–0.94); and anemia (RR 0.49, CI 0.38–0.62). 3.3. Error propagation We calculated how rounding of cohort sizes – part of the statistical de-identification process – may propagate error throughout the analysis, and these results are shown in Table 4, with uncertainties greater than 0.1 highlighted. With the large cohort sizes available for single abnormal laboratory and vital sign searches (e.g. a creatinine elevation alone), the uncertainties introduced by rounding error are small and noncontributory along the diagonal. As expected, the small size of cohorts in whom an abnormal LDH was measured resulted in a high degree of uncertainty in the relative risk calculated. For each side effect pair, the confidence interval based on sampling error was larger than the interval spanned by rounding error.

In this work, we illustrate how exploration of a side effect network using aggregated, standardized, normalized, and deidentified population level EHR data can yield clinically useful patterns for detecting post-marketing drug-side effect associations. In our case study of AZA, we assembled a subset of laboratory and vital sign measurements from aggregated EHR data, and, after calculating conditional probabilities of co-occurrence, compiled a subnetwork of side effects more frequently associated with AZA than with other anti-rheumatologic drugs. In quantifying overall side effects associated with AZA, we found that nephrotoxicity is a rare event (7.9%) that does not occur more frequently in AZA patients than in patients prescribed other anti-rheumatologic agents studied. We also found that our estimated rate of neutropenia is comparable to the rate reported by Salix Pharmaceuticals, Inc. for leukopenia (28%) in rheumatoid arthritis [23]. Neutropenia and/or leukopenia have been previously reported at much lower rates, from 2.9% [24] to 5.5% [25] to 10.5% [21], and the higher incidence of this side effect in our study may be due, in part, to the conservative cutoff used to define the clinical event. In addition, the total number of patients (i.e. the denominator) used for our calculations is based upon patients in whom neutrophils were measured but were found to be normal. Since a neutrophil count was ordered in these patients as part of routine clinical care, the clinician may have been suspecting an abnormality of this laboratory value, and, thus, this subgroup of patients with a ‘‘normal’’ neutrophil count may artificially underestimate the true proportion of patients whose neutrophils were unaffected by AZA. An elevated neutrophil fraction in AZA has received mixed attention over the years, playing a prominent role in a recent dermatologic review [4], while receiving no mention in other case reports [5,26,27]. As a downstream clinical phenomenon, or sink, this finding may have received scant attention [28] for its lack of specificity and ambiguous biological role. However, our findings clearly show that both neutrophilia and neutropenia are relatively frequent entities with distinct patterns uniquely associated with AZA prescriptions. We also found that elevated liver transaminases are no more frequent in patients prescribed AZA (14.1%) than in patients prescribed other anti-rheumatic agents, contrary to reporting trends [26]. This is similar to the rate (13.7%) reported in a pediatric population based on an AST greater than twice normal limits [21], though our incidence is much greater than the rate (5.2%) found by Hindorf et al. [25]. The use of a side effect network allowed us to examine the interplay between organ systems that show differential involvement in the process of AZA toxicity. In particular, we recapitulated the classic clinical knowledge that neutropenia is, indeed, an unpredictable event, as it has no dependencies in our side effect network. Though a lack of statistically significant associations alone does not prove the absence of such associations, neutropenia also had a decreased and statistically significant relative risk of association with several side effects, supporting our claim that patterns of neutropenia are isolated and distinct from other variables examined in our study. Thus, our analysis has the power to segregate side effects by our ability to predict and prevent them. In the future analysis of drug side effects, our work would allow biomedical researchers to limit their scope to clinically unpredictable side effects in order to identify molecular bases for prevention and therapy (e.g. the role of TPMT in predicting bone marrow toxicity in AZA). As mentioned, there may be confounders in the underlying data set that may systematically bias our AZA results. Importantly, the

Please cite this article in press as: Patel VN, Kaelber DC. Using aggregated, de-identified electronic health record data for multivariate pharmacosurveillance: A case study of azathioprine. J Biomed Inform (2013), http://dx.doi.org/10.1016/j.jbi.2013.10.009

6

V.N. Patel, D.C. Kaelber / Journal of Biomedical Informatics xxx (2013) xxx–xxx

analysis of individual edges – i.e. pairwise side effect associations – must proceed with caution, as such correlations are significantly influenced by selection bias. For instance, we found that inclusion of renal transplant patients in our analysis – approximately 10% of our AZA cohort – produced a significant interaction between creatinine elevation and anemia in our side effect network, which may be attributable to hemolytic uremic syndrome following renal transplant [29] rather than AZA prescription per se. These kinds of issues reflect underlying clinical heterogeneities that may be difficult to deduce at the aggregate level [30]. In another example, LDH is not a routine laboratory order; it is more frequently ordered in intensive care units and by physicians who suspect a hemolytic process. Consequently, patient cohorts exhibiting correlations between LDH and other side effects may significantly differ in their clinical characteristics from patient cohorts exhibiting correlations between routine measurements (e.g. blood pressure and temperature). While individual correlations, or edges, are prone to such selection bias, the global structure of the side effect network reflects generalizable processes, reflected in the network’s ability to recapitulate classic clinical knowledge. Our careful choice of controls based on patients’ prescribed medications of the same pharmacological class also serves to minimize concerns of selection bias. The appropriate selection of controls and the reduction of cohort heterogeneity remain open questions in the analysis of aggregated, de-identified, population-level data. Additionally, we recognize that there is cross-contamination between the AZA and control cohorts. While this cross-contamination biases the results toward the null hypothesis, we continued to find statistically significant results between the two groups. Finally, while the rounding error introduced for statistical de-identification appears formidable, we show that the error introduced for cohorts as small as 40 patients is often only a marginal contribution to the calculated relative risk (Table 4). Our work lays the foundation for future explorations of side effect networks using pooled, standardized, normalized, and deidentified population level EHR data. As opposed to the ‘‘global’’ networks assembled in recent studies [9,10] from ICD-9 codes, we illustrate the power of ‘‘local’’ networks focusing on a few, well-defined clinical parameters, based on LOINC or vital sign data. Using the methodology outlined herein, we show that side effect networks can be leveraged to increase post-marketing surveillance of drugs and associated side effects. While this work examined the effects of AZA retrospectively over more than a decade, periodic assessments of EHR data would allow the AZA side effect network (or side effect networks for other drugs) to proactively track clinical phenomena, with increasing statistical power as more EHR data becomes available. The approach presented here identifies associations between prescribed medications and side effects and points to areas to investigate further to understand if true cause and effect exist. The tools and methodologies used here are also very scalable to many other medications and side effects that could be explored for post-market drug surveillance. This case study of AZA demonstrates the growing potential of clinical research informatics tools and methodologies to address important clinical questions. As digital clinical data becomes increasingly available through the proliferation of EHRs, the possibilities for the secondary use of such data are numerous, though contingent upon the development of new tools and methods. Our example for post-marketing drug surveillance demonstrates that clinical research informatics combining the right clinical data (large aggregated, standardized, normalized, and de-identified EHR data), the right tools (a fast, easy-to-use, and HIPAA/HITECH compliant interface), and the right methods (side effect analyses driven by case reports and biological hypotheses) has the ability to transform certain types of clinical research, in this case postmarketing drug surveillance.

Conflict of Interest Neither DCK or VNP or the MetroHealth System have any direct financial ties to Explorys Inc. In exchange for contributing de-identified data to the Explorys network, the MetroHealth System received access to the Explorys Population Explorer tool, which was used to conduct this study. Acknowledgments The authors would like to thank Michael Matheny, John Sedor, Thomas E. Love, Anil Jain, and Jason Gilder for their helpful comments and suggestions on the manuscript. This publication was made possible in part by the Clinical and Translational Science Collaborative of Cleveland, UL1TR000439 from the National Center for Advancing Translational Sciences (NCATS) component of the National Institutes of Health and NIH roadmap for Medical Research. Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the NIH. Appendix A. Supplementary material Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.jbi.2013.10.009. References [1] De Boer NKH, Mulder CJJ, van Bodegraven AA. Myelotoxicity and hepatotoxicity during azathioprine therapy. Neth J Med 2005;63:444–6. [2] Anstey A, Lennard L, Mayou SC, Kirby JD. Pancytopenia related to azathioprine–an enzyme deficiency caused by a common genetic polymorphism: a review. J R Soc Med 1992;85:752–6. [3] Sterneck M, Wiesner R, Ascher N, Roberts J, Ferrell L, Ludwig J, et al. Azathioprine hepatotoxicity after liver transplantation. Hepatology 1991;14: 806–10. [4] Bidinger JJ, Sky K, Battafarano DF, Henning JS. The cutaneous and systemic manifestations of azathioprine hypersensitivity syndrome. J Am Acad Dermatol 2011;65:184–91. [5] Sloth K, Thomsen AC. Acute renal insufficiency during treatment with azathioprine. Acta Med Scand 1971;189:145–8. [6] Colombel J, Ferrari N, Debuysere H, Marteau P, Gendre J, Bonaz B, et al. Genotypic analysis of thiopurine S-methyltransferase in patients with Crohn’s disease and severe myelosuppression during azathioprine therapy. Gastroenterology 2000;118:1025–30. [7] Lennard L. TPMT in the treatment of Crohn’s disease with azathioprine. Gut 2002;51:143–6. [8] Kader HA, Wenner WJ, Telega GW, Maller ES, Baldassano RN. Normal thiopurine methyltransferase levels do not eliminate 6-mercaptopurine or azathioprine toxicity in children with inflammatory bowel disease. J Clin Gastroenterol 2000;30:409–13. [9] Hidalgo CA, Blumm N, Barabási A-L, Christakis NA. A dynamic network approach for the study of human phenotypes. PLoS Comput Biol 2009;5: e1000353. [10] Folino F, Pizzuti C, Ventura M. A comorbidity network approach to predict disease risk. In: Khuri S, Lhotská L, Pisanti N, editors. Inf. technol. bio- med. informatics ITBAM 2010. Berlin Heidelberg: Springer; 2010. p. 102–9. [11] Almenoff JS, Pattishall EN, Gibbs TG, DuMouchel W, Evans SJW, Yuen N. Novel statistical tools for monitoring the safety of marketed drugs. Clin Pharmacol Ther 2007;82:157–66. [12] Harpaz R, DuMouchel W, LePendu P, Bauer-Mehren A, Ryan P, Shah NH. Performance of pharmacovigilance signal-detection algorithms for the FDA adverse event reporting system. Clin Pharmacol Ther 2013;93:539–46. [13] Harpaz R, Chase HS, Friedman C. Mining multi-item drug adverse effect associations in spontaneous reporting systems. BMC Bioinformatics 2010;11: S7. [14] Wang C, Guo X-J, Xu J-F, Wu C, Sun Y-L, Ye X-F, et al. Exploration of the association rules mining technique for the signal detection of adverse drug events in spontaneous reporting systems. PLoS ONE 2012;7:e40561. [15] Navaneethan SD, Jolly SE, Schold JD, Arrigain S, Saupe W, Sharp J, et al. Development and validation of an electronic health record-based chronic kidney disease registry. Clin J Am Soc Nephrol 2010;6:40–9. [16] International health terminology standards development organisation. SNOMED-CT. n.d. [17] Nelson SJ, Zeng K, Kilbourne J, Powell T, Moore R. Normalized names for clinical drugs: RxNorm at 6 years. J Am Med Inform Assoc 2011;18:441–8. [18] McDonald CJ. LOINC, a universal standard for identifying laboratory observations: a 5-year update. Clin Chem 2003;49:624–33.

Please cite this article in press as: Patel VN, Kaelber DC. Using aggregated, de-identified electronic health record data for multivariate pharmacosurveillance: A case study of azathioprine. J Biomed Inform (2013), http://dx.doi.org/10.1016/j.jbi.2013.10.009

V.N. Patel, D.C. Kaelber / Journal of Biomedical Informatics xxx (2013) xxx–xxx [19] Kratz A, Ferraro M, Sluss PM, Lewandrowski KB. Case records of the Massachusetts general hospital. Weekly clinicopathological exercises. Laboratory reference values. N Engl J Med 2004;351:1548–63. [20] Lindberg V. Uncertainties. Error propagation. Graphing, and the Vernier Caliper; 2000. [21] Kirschner BS. Safety of azathioprine and 6-mercaptopurine in pediatric patients with inflammatory bowel disease. Gastroenterology 1998;115: 813–21. [22] Tapner MJ, Jones BE, Wu WM, Farrell GC. Toxicity of low dose azathioprine and 6-mercaptopurine in rat hepatocytes. Roles of xanthine oxidase and mitochondrial injury. J Hepatol 2004;40:454–63. [23] Kuhn M, Campillos M, Letunic I, Jensen LJ, Bork P. A side effect resource to capture phenotypic effects of drugs. Mol Syst Biol 2010;6. [24] Connell WR, Kamm MA, Ritchie JK, Lennard-Jones JE. Bone marrow toxicity caused by azathioprine in inflammatory bowel disease: 27 years of experience. Gut 1993;34:1081–5.

7

[25] Hindorf U, Lindqvist M, Hildebrand H, Fagerberg U, Almer S. Adverse events leading to modification of therapy in a large cohort of patients with inflammatory bowel disease. Aliment Pharmacol Ther 2006;24:331–42. [26] Jeurissen ME, Boerbooms AM, van de Putte LB, Kruijsen MW. Azathioprine induced fever, chills, rash, and hepatotoxicity in rheumatoid arthritis. Ann Rheum Dis 1990;49:25–7. [27] Saway PA, Heck LW, Bonner JR, Kirklin JK. Azathioprine hypersensitivity. Case report and review of the literature. Am J Med 1988;84:960–4. [28] Collison DW, Dahlberg PJ, Hogan JD. Azathioprine hypersensitivity in bullous pemphigoid. J Am Acad Dermatol 1990;23:125–7. [29] Van Buren D, Van Buren CT, Flechner SM, Maddox AM, Verani R, Kahan BD. De novo hemolytic uremic syndrome in renal transplant recipients immunosuppressed with cyclosporine. Surgery 1985;98:54–62. [30] Hripcsak G, Knirsch C, Zhou L, Wilcox A, Melton GB. Using discordance to improve classification in narrative clinical databases: an application to community-acquired pneumonia. Comput Biol Med 2007;37:296–304.

Please cite this article in press as: Patel VN, Kaelber DC. Using aggregated, de-identified electronic health record data for multivariate pharmacosurveillance: A case study of azathioprine. J Biomed Inform (2013), http://dx.doi.org/10.1016/j.jbi.2013.10.009

Using aggregated, de-identified electronic health record data for multivariate pharmacosurveillance: a case study of azathioprine.

To demonstrate the use of aggregated and de-identified electronic health record (EHR) data for multivariate post-marketing pharmacosurveillance in a c...
655KB Sizes 0 Downloads 0 Views