Review

doi: 10.1111/joim.12159

Combining multiple healthcare databases for postmarketing drug and vaccine safety surveillance: why and how?

G. Trifirò1,2, P. M. Coloma1, P. R. Rijnbeek1, S. Romio1,3, B. Mosseveld1, D. Weibel1, J. Bonhoeffer4,5, M. Schuemie1,6,7, J. van der Lei1 & M. Sturkenboom1

From the 1Department of Medical Informatics, Erasmus Medical Center, Rotterdam, the Netherlands; 2Department of Clinical and Experimental Medicine, University of Messina, Messina; 3Department of Clinical and Preventive Medicine, Università Milano-Bicocca, Milan, Italy; 4Brighton Collaboration Foundation; 5University Children’s Hospital Basel, University of Basel, Basel, Switzerland; 6Janssen Research and Development LLC, Titusville, NJ, USA; and 7Observational Medical Outcomes Partnership, Foundation for the National Institutes of Health, Bethesda, MD, USA

Abstract. Trifirò G, Coloma PM, Rijnbeek PR, Romio S, Mosseveld B, Weibel D, Bonhoeffer J, Schuemie M, van der Lei J, Sturkenboom M (Erasmus Medical Center, Rotterdam, the Netherlands; University of Messina, Messina, Italy; Università Milano-Bicocca, Milan, Italy; Brighton Collaboration Foundation, Basel, Switzerland; University Children’s Hospital Basel, University of Basel, Basel, Switzerland; Janssen Research and Development LLC, Titusville, NJ, USA; Foundation for the National Institutes of Health, Bethesda, MD, USA). Combining multiple healthcare databases for postmarketing drug and vaccine safety surveillance: why and how? (Review) J Intern Med 2014; 275: 551–561.

A growing number of international initiatives (e.g. EU-ADR, Sentinel, OMOP, PROTECT and VAESCO) are based on the combined use of multiple healthcare databases for the conduct of active surveillance studies in the area of drug and vaccine safety. The motivation behind combining multiple healthcare databases is the earlier detection and validation, and hence earlier management, of potential safety issues. Overall, the combination of multiple healthcare databases increases statistical sample size and heterogeneity of exposure for postmarketing drug and vaccine safety surveillance, despite posing several technical challenges. Healthcare databases generally differ by underlying healthcare systems, type of information collected, drug/vaccine and medical event coding systems and language. Therefore, harmonization of medical data extraction through homogeneous coding algorithms across highly different databases is necessary. Although no standard procedure is currently available to achieve this, several approaches have been developed in recent projects. Another main challenge involves choosing the work models for data management and analyses whilst respecting country-specific regulations in terms of data privacy and anonymization. Dedicated software (e.g. Jerboa) has been produced to deal with privacy issues by sharing only anonymized and aggregated data using a common data model. Finally, storage and safe access to the data from different databases requires the development of a proper remote research environment. The aim of this review is to provide a summary of the potential, disadvantages, methodological issues and possible solutions concerning the conduct of postmarketing multidatabase drug and vaccine safety studies, as demonstrated by several international initiatives.

Keywords: claims database, drug monitoring, electronic health records, product surveillance, postmarketing, vaccine.

Overview of current multidatabase safety studies

Knowledge of the safety profile of drugs/vaccines prior to marketing is limited because of the small and selective groups of individuals included in clinical trials. Surveillance (passive and/or active) therefore needs to be continued postmarketing to learn from the effects of drugs in ‘real-world’ practice. The prevailing system of postmarketing surveillance is passive, relying on spontaneous reporting systems (SRSs). These systems are based on suspected adverse drug reactions that are reported to national authorities by a variety of individuals, including physicians and other healthcare practitioners as well as patients and even lawyers. However, the reliance of SRSs on voluntary information makes the system susceptible to various limitations, including underreporting, lack of information on the user population and patterns of drug use and reporting bias from excessive media attention or class lawsuits [1–3].

© 2014 The Association for the Publication of the Journal of Internal Medicine

The rise in safety-related warnings and market withdrawals of widely used products in the first decade of the new millennium has fuelled efforts to explore other data sources and develop new methodologies. An important resource that has been proposed to have enormous potential for active surveillance is the electronic healthcare record (EHR). Therefore, public and/or private initiatives have been launched worldwide to investigate the secondary use of EHRs for this purpose. These potential resources include detailed clinical information such as patients’ symptoms, physical examination findings, specialist care referrals, diagnostic tests, prescribed medications and other interventions. In addition, data on pharmacy dispensations, diagnostic procedures, hospitalizations and other healthcare services are now routinely recorded electronically by healthcare delivery systems for audit and reimbursement purposes. Data from EHRs reflect actual clinical practice and as such have been employed to characterize healthcare utilization patterns, monitor patient outcomes and carry out formal drug safety studies [4–6]. As routine by-products of the healthcare delivery system, the use of these databases offers the advantage of efficiency in terms of time necessary to conduct a study, manpower and financial costs. Since the 1980s, these types of databases have been used for the evaluation of drug safety issues, but mostly in isolation (each database on its own). In recent years, and building on the ground-breaking example of the Vaccine Safety Datalink (VSD) in the USA [7], there has been growing awareness that collaboration and pooling of data may yield information more rapidly and provide additional advantages. Therefore, several networks of multiple EHR databases have been established to support traditional drug surveillance systems. 
The Sentinel Initiative

The Sentinel Initiative was established in 2008 after the US Food and Drug Administration (FDA) Amendments Act mandated the creation of a new postmarketing surveillance system utilizing EHRs to prospectively monitor the safety of marketed medical products [8]. Two pilot projects were initiated. First, the Mini-Sentinel (http://minisentinel.org/), which is coordinated by Harvard University and the Federal Partners’ Collaboration, will enable the FDA to examine privately held electronic healthcare data representing over 100 million individuals [9, 10]. Data sources currently available include administrative claims with pharmacy dispensing data, but information from outpatient and inpatient medical records and disease-/treatment-specific patient registries will be added subsequently. The second pilot project comprises the vaccine safety activities that together constitute the Post-Licensure Rapid Immunization Safety Measurement (PRISM) programme [11].

OMOP

The Observational Medical Outcomes Partnership (OMOP, http://omop.fnih.org/) is a US public–private partnership between the FDA, academia, data owners and the pharmaceutical industry and is administered by the Foundation for the National Institutes of Health. It was initiated to identify the needs of an active drug safety surveillance system and to develop the necessary methodologies to enhance secondary use of observational data for maximizing the benefit and minimizing the risk of pharmaceutical agents. Observational Medical Outcomes Partnership’s network of databases consists of commercially licensed databases, as well as university- or practice-based and federal healthcare databases, and contains both administrative claims and medical records [12].

EU-ADR

The recently concluded EU-ADR project (Exploring and Understanding Adverse Drug Reactions by Integrative Mining of Clinical Records and Biomedical Knowledge, http://www.euadr-project.org/) was launched in 2008 and was funded by the Directorate General Information Society from the European Commission (EC) under its Seventh Framework Programme [13]. A computerized integrated framework using EHR and biomedical data was developed in EU-ADR for the detection and substantiation of drug safety signals. The network originally comprised eight population-based administrative databases and general practitioner databases in four European countries (Denmark, Italy, the Netherlands and the UK) that were working together in a distributed approach.
Other collaborative networks

EC-funded projects

Recent publicly launched initiatives partly focusing on improving methods for pharmacovigilance and pharmacoepidemiology include the Pharmacoepidemiological Research on Outcomes of Therapeutics by a European Consortium (PROTECT) project, which is funded by the Innovative Medicines Initiative and coordinated by the European Medicines Agency (EMA) (http://www.imi-protect.eu/). The Global Research Initiative in Paediatrics network of excellence (GRIP, http://www.grip-network.org/), which is funded by the Directorate General Research and Innovation (DG Research), focuses on paediatric clinical pharmacology and the effects of drugs and vaccines in children. Within the GRIP project, a global paediatric pharmacoepidemiology platform is being created and will be tested. This will involve EU databases, Mini-Sentinel and databases from many other countries. There are several other recently concluded or ongoing projects funded by the EC at the request of the EMA in which multiple healthcare databases are combined to evaluate specific safety issues such as: (i) nonsteroidal anti-inflammatory drug-related gastrointestinal and cardiovascular risks (SOS, http://www.sos-nsaids-project.org/); (ii) the arrhythmogenic risk of drugs (ARITMO, http://www.aritmo-project.org); (iii) the cardio/cerebrovascular and pancreatic safety of blood glucose-lowering agents (SAFEGUARD, http://www.safeguard-diabetes.org/); (iv) the risk of congenital anomalies related to new anti-epileptic agents, insulin analogues, anti-asthmatic drugs and antidepressants (EUROmediCAT, http://www.euromedicat.eu); (v) the long-term adverse effects of methylphenidate in attention deficit and hyperactivity disorder (ADDUCE, http://www.adhdadduce.org); (vi) the safety of biological agents in patients with juvenile idiopathic arthritis (PHARMACHILD); (vii) the safety of epoetins (EPOCAN, http://www.epocan.com); and (viii) risk of cancer associated with insulin analogues (CARING). Each of these projects has organized the collaborative database efforts differently, ranging from fragmented to harmonized approaches.
Projects funded by the European Centre for Disease Prevention and Control

The European Centre for Disease Prevention and Control (ECDC) has funded two vaccine-related projects that involve collaborative database approaches: the Influenza - Monitoring Vaccine Effectiveness (I-MOVE) consortium coordinated by EpiConcept [14] and the Vaccine Adverse Event Surveillance and Communication project (VAESCO, http://www.vaesco.net) coordinated by the Brighton Collaboration Foundation on the safety of vaccines. Data were derived from eight countries to study the association between pandemic influenza vaccine and Guillain–Barré syndrome (GBS) [15] as well as narcolepsy [16].

Sustainable platforms

Recognizing the need to pursue collaboration beyond the lifetime of the EC project, two initiatives were started: the EU-ADR Alliance and VACCINE.GRID.

The EU-ADR Alliance

The EU-ADR Alliance is a network of databases from the institutions that have been participating in the EU-ADR, SOS, ARITMO and SAFEGUARD projects and have gained experience of collaborating within these distributed networks. Databases other than those involved in these projects may contribute data to the network. The EU-ADR Alliance has conducted several EMA-requested projects (see www.encepp.eu) and provides a single point of access for conducting postauthorization studies that are requested for regulatory purposes and need to include multiple countries.

VACCINE.GRID

A Swiss foundation (www.vaccinegrid.org) was established by the Brighton Collaboration Foundation in recognition of the need to quantify vaccine coverage, effectiveness and safety throughout the vaccine’s life cycle. VACCINE.GRID is a federated network of leading academic and public health organizations experienced in and dedicated to implementing innovative and distributed vaccine effect research for commercial and public funders.

Other initiatives

In Canada, the Drug Safety and Effectiveness Network (DSEN, http://www.cihr-irsc.gc.ca/e/40269.html) was established by the government to augment available evidence on drug safety and effectiveness by leveraging existing resources from ‘real-world’ settings such as the National Prescription Drug Utilization System. The DSEN established a collaborating centre, the Canadian Network for Observational Drug Effect Studies (CNODES), which is a distributed network of investigators and linked databases in British Columbia, Alberta, Saskatchewan, Manitoba, Ontario, Quebec and Nova Scotia. The aim of CNODES is to collect data on around 40 million subjects [17].

Value of collaboration

Strength in numbers

The impact of data abundance has become important in fields as varied as science and sport, as well as in public health. In many fields, there has been a drift towards data-driven discovery and decision-making [18]. In the context of healthcare and medicine, the availability of more data holds the promise of more information and more statistical power. For active drug surveillance, for example, timely assessment of safety issues requires monitoring of large populations that are representative of the entire spectrum of medication users as well as an extensive observation period, particularly for events that are rare or have a long latency [19, 20]. Examples of the advantages of combining data across countries were provided in the VAESCO project whilst monitoring the risk of GBS and narcolepsy associated with the pandemic influenza vaccine [15, 16, 21]. Moreover, new drugs (or infrequently used drugs for rare diseases) that slowly penetrate the market will require a greater amount of patient data to obtain a significant user population within a reasonable time frame. Although it took 5 years for rofecoxib to be voluntarily withdrawn from the market, it has been suggested that if the medical records of 100 million patients had been available for safety monitoring, the adverse cardiovascular effect would have been discovered in just 3 months, given the drug utilization patterns in the USA [22]. This explains the reason for the Sentinel-mandated population size for surveillance. Database size (variously measured as total population, total follow-up time or total exposure time to drugs) is thus important in understanding the capability of large-scale surveillance [23]. In a previous study, we demonstrated how much leverage the EU-ADR network of databases can provide for monitoring the safety of medicines [24].
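
As a rough illustration of how the required exposure scales with background incidence and relative risk, the following Python sketch computes the expected number of events, and from that the person-years of exposure, needed for a one-sided Poisson test. It relies on a square-root normal approximation chosen here for simplicity; this is an assumption and not necessarily the method used in the EU-ADR analysis [24]. The function names, the default significance level of 0.05 and power of 80%, and the example incidence are all illustrative.

```python
from math import sqrt
from statistics import NormalDist


def expected_events_needed(relative_risk, alpha=0.05, power=0.80):
    """Expected events under the null needed to detect `relative_risk`.

    Assumes sqrt(X) is approximately Normal(sqrt(mu), 1/4) for a Poisson
    count X; this approximation is a choice made for the sketch.
    """
    if relative_risk <= 1:
        raise ValueError("relative_risk must be greater than 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # one-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    return ((z_alpha + z_beta) / (2 * (sqrt(relative_risk) - 1))) ** 2


def person_years_needed(relative_risk, incidence_rate, alpha=0.05, power=0.80):
    """Exposed person-years at which the expected count reaches the target."""
    return expected_events_needed(relative_risk, alpha, power) / incidence_rate


if __name__ == "__main__":
    # Hypothetical background incidence of 1 event per 1000 person-years.
    for rr in (2, 4, 6):
        print(rr, round(person_years_needed(rr, 1 / 1000)))
```

Under these assumptions, weaker associations and rarer events both push the required exposure up, which is the practical argument for pooling databases.
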
We provided estimates of the number and types of drugs that can be monitored in such a system as a function of actual drug use, minimal detectable relative risk and incidence rates (IRs) of outcomes of interest. Given the pooled population-based IRs of six events of interest, estimated directly within EU-ADR, we calculated the total amount of person-years of exposure that would be required to detect an association between a particular drug and a particular event over varying magnitudes of relative risks of 2, 4 and 6 using a one-sided significance level of α = 0.05 and a power of 80% (β = 0.2). The results of the project showed that the database network may be most useful for the surveillance of more frequently used classes of drugs (>5%) and for outcomes with a high background incidence in the general population (>1/1000). Extrapolating from the drug utilization patterns observed within the databases, we hypothesized that the EU-ADR network would have had enough exposure data to detect an association between rofecoxib and acute myocardial infarction within 2 years after being marketed. On the premise that an increase in the size of the database network would translate to an increase in the power to detect safety signals, a simulation was also performed with the aim of determining how the percentage of drugs that can be monitored would change if more data become available. Simulated analyses showed that even with expansion of the EU-ADR network to 10 times its current size of over 21 million individuals and ~154 million person-years of follow-up, there would still be unmet needs in signal detection and that further collaboration is needed, particularly for special populations such as children [24]. In the SOS project which assembled data on 1.3 million children using nonsteroidal anti-inflammatory drugs (NSAIDs) from four different countries, enough exposure information was available only for ibuprofen (the most frequently used drug) to investigate a weak association (i.e. relative risk of 2) between exposure and asthma exacerbation (the most common serious event evaluated in children). Acknowledging that especially in paediatric therapeutics, much can and needs to be learned from real-life data on drug effects, the GRIP network is currently setting up a global pharmacoepidemiology infrastructure to pool data from various sources.

Opportunity in diversity

The predisposition towards, and manifestations of, disease also often vary in certain populations, because of ethnicity or exposures that are peculiar to a group.
Moreover, different patients react differently to healthcare interventions, including drug therapy and vaccination. One of the presumed benefits of combining disparate healthcare databases for active safety surveillance is the ability to assess exposures to a larger variety of drugs and to characterize use of drugs within a wider range of the population. There is much knowledge and understanding to be gained regarding disease progression and health management from networks of databases that are diverse not only in physical location (i.e. different healthcare systems), but also in structure and content (e.g. outpatient vs. inpatient care, medical records vs. insurance claims). Previously, we showed that patterns of use of NSAIDs varied amongst different countries but were similar amongst different databases in the same country [13]. Moreover, diversity in the implementation of studies and healthcare structures or databases allows investigation of consistency and the effect of different potential biases on the study results. This was demonstrated in the VAESCO project which studied the association between pandemic influenza vaccine and Guillain–Barré syndrome (GBS). In this project, we were able to look at the effect of different pandemic influenza vaccines due to the different vaccination strategies used in different countries. Whereas the unadjusted estimates in all countries were consistently elevated, adjustment for infections and seasonal influenza vaccine consistently reduced the risk to an overall null result and allowed the results of countries in which these data were not available to be put into perspective [15]. We believe that if one aims to create a comprehensive platform that would allow for monitoring the largest number of drugs possible, the greatest benefit would be achieved by combining databases from many different countries with diverse utilization patterns.

Avoiding redundancies in research

The increasing complexity of research in healthcare and medicine demands new forms of collaboration to enable cooperation and exchange of expertise across disciplines. Beyond the expected increase in data heterogeneity and statistical power, database networking is also about capacity building and performance efficiency. This was demonstrated in VAESCO by using different models and approaches for data sharing. Andrews et al.
utilized different approaches to running the analyses and concluded that collaboration is possible in Europe but that performing separate workup is laborious [25]. Concerted efforts in data retrieval and management as well as method development avoid many redundancies in the conduct of research. Collaboration amongst research institutions and healthcare databases is desirable and is becoming more feasible with the establishment of both the funded networks described previously and the European Network of Centres for Pharmacoepidemiology and Pharmacovigilance [26]. These developments have paved the way towards a different paradigm of collaborative studies and have changed the procedures of conducting active surveillance studies.

How to work together?

Traditionally, pharmaco-epidemiological studies were conducted independently according to different protocols and definitions and later brought together through a meta-analysis of the published effect estimates. Although meta-analyses of different and independent studies in principle allow for pooling of results, much information is lost. The heterogeneity in design and conduct of the studies may limit our ability to combine the studies and interpret results. A more harmonized approach would be to use a common protocol for the different databases that standardizes the definitions and designs and then proceed with complete local elaboration of the data and subsequent analysis. Statistical models or scripts would be implemented locally in each database and only the outcome parameter estimates (e.g. rates and coefficients) would be shared. This approach reduces the heterogeneity between the individual studies and facilitates the interpretation of the combined results. Moreover, the local governance rules are easy to adhere to as data are processed locally. On the other hand, it has the disadvantage that all sites need to be capable of handling and analysing the data and, most importantly, there may be different interpretations and programming of the common protocol. It is often unavoidable that different programming decisions are taken in working up the data, despite a common protocol. This working model was used by Andrews et al. [25], as well as in the TEDDY [27] and PROTECT projects. A further level of harmonization is achieved when not only the protocol, but also the local data elaboration, is harmonized.
Because database structures are different, this approach requires a common data model from which the standardized scripts can be launched in a distributed manner. The Mini-Sentinel, VSD, EU-ADR, SOS, ARITMO, SAFEGUARD, GRIP, OMOP and VAESCO projects have taken this approach. Data sharing can be performed at different levels of granularity (ranging from sharing detailed data on individual patients to sharing only aggregated counts) and is often limited by governance and ethical guidelines and principles. The most conservative approach is to share highly aggregated tables and model coefficients, which is performed for example in the Mini-Sentinel and CNODES projects. VSD, OMOP and several projects in the EU (SOS, ARITMO, SAFEGUARD and GRIP) have applied a more liberal sharing of de-identified aggregated person-level data in a secure remote research environment. The advantage of this approach is that final analyses and modelling can be shared and rotated amongst participants, which promotes the collaborative nature of the work and limits political issues that may arise in this new socio-technological framework. In addition, there is maximum freedom to explore and analyse. In summary, the concept of bringing data together within and across countries with the purpose of addressing a drug safety issue can be approached in several ways with respect to: (i) harmonization of protocols as well as outcome, exposure and covariate definitions; (ii) harmonization of local elaboration of data; and (iii) sharing of data via a common platform for shared pooled analyses.
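
The estimate-sharing model described above, in which each site runs the analysis locally and shares only parameter estimates, can be sketched as follows. The example combines site-level log relative risks by fixed-effect inverse-variance weighting. The pooling method, the function name `pool_fixed_effect` and the input values are assumptions made for illustration; the projects discussed here do not prescribe a single way of combining estimates.

```python
import math


def pool_fixed_effect(site_estimates):
    """Combine per-database estimates by inverse-variance weighting.

    site_estimates: list of (log_relative_risk, standard_error) tuples,
    one per database. Returns (pooled_rr, ci_lower, ci_upper) with an
    approximate 95% confidence interval.
    """
    weights = [1.0 / se ** 2 for _, se in site_estimates]
    total_weight = sum(weights)
    pooled_log = sum(
        w * log_rr for w, (log_rr, _) in zip(weights, site_estimates)
    ) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return (
        math.exp(pooled_log),
        math.exp(pooled_log - 1.96 * pooled_se),
        math.exp(pooled_log + 1.96 * pooled_se),
    )


if __name__ == "__main__":
    # Hypothetical estimates from three databases (not real results).
    shared = [(math.log(1.8), 0.30), (math.log(2.2), 0.25), (math.log(1.5), 0.40)]
    print(pool_fixed_effect(shared))
```

Only two numbers per site cross the institutional boundary in this model, which is why it is the easiest to reconcile with local governance rules, at the cost of less freedom for later exploration.
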

Distributed and common data models

Because local governance rules as well as national law limit sharing of patient-level data, the most practical option is to perform (part of) the analysis locally at each database site then merge the results across databases. As previously mentioned, ideally, this would be performed not only by sharing the same protocol, but also by sharing the analysis code itself, thereby guaranteeing that the analysis is performed uniformly everywhere. This requires that each database represents data in a similar format: a common data model (CDM). Every database network has developed its own CDM. Exploring and Understanding Adverse Drug Reactions (EU-ADR) has the simplest model, requiring the data to be presented in several tables [13], with information on patients and their observation time, exposures, outcomes of interest and potential covariates. Drugs are coded using Anatomical Therapeutic Chemical (ATC) classification, and outcomes and covariates are identified using customized codes that are mapped to different diagnostic coding systems in the databases (see below).

Fig. 1 Distributed analysis architecture in Mini-Sentinel and Observational Medical Outcomes Partnership as compared with Exploring and Understanding Adverse Drug Reactions (EU-ADR).

Mini-Sentinel has a more elaborated model [28], with many tables including information on patient demographics, drug dispensing, diagnoses, procedures and laboratory values. Drugs are coded using the National Drug Code (NDC), and diagnoses and procedures are coded using the International Classification of Diseases, 9th, 10th or 11th revision (ICD-9, 10 or 11). This CDM also includes precomputed tables of aggregates, such as counts of patients enrolled in a certain year, and other data redundancies. Observational Medical Outcomes Partnership has an even more elaborated CDM [29], with the latest version also including tables for storing information on healthcare providers and costs [30]. It describes a fully normalized database, and all data elements are coded using the OMOP standardized terminology [31], which allows automated mapping between different coding systems including ATC, NDC and ICD-9 and ICD-10. However, customized entries have to be added to the standardized terminology to facilitate identification of the health
outcomes of interest with definitions different from those in existing coding systems.

Once the data are in a CDM, the analysis can be performed. In both Mini-Sentinel and OMOP, the complete analysis is performed at the local site and the resulting effect estimates are combined across databases. As shown in Fig. 1, each analysis is divided into two parts in EU-ADR: the first part is executed locally, and data are aggregated to a level at which individual patients cannot be identified. For simple incidence rate ratios, this could be counts of exposed and unexposed events in various strata of the population, and for other analyses, such as case–control studies, it could be sets of cases and controls with exposure status on the index dates. Aggregated data are then transmitted to a central location, where the second part of the analysis can take place. This division into two parts has the advantage that it is easier to pool the analysis, and that it allows more freedom afterwards to explore the data.

Central to EU-ADR is the Java application Jerboa (Fig. 2) [13] that can be configured using a scripting language. The reason for coding Jerboa in Java was to limit demands on local infrastructure and optimize the range of hardware that can be used to run the program. As most epidemiologists will not be able to understand Java, Jerboa produces files in human-readable format that local staff can use for verification purposes. Jerboa is run locally and produces outputs (e.g. incidence rates of events and prevalence rate of prescriptions) as needed, using tables containing information on patients, drug exposures, outcomes and potential covariates as input files. Both Mini-Sentinel and OMOP have developed SAS scripts that can be configured using settings files. These scripts typically require more powerful computers and expensive SAS licenses but have the advantage that epidemiologists are often experienced users of SAS. Both OMOP and EU-ADR have implemented a wide array of analysis designs, including cohort methods using propensity scores, case–control designs and self-controlled case series.

Fig. 2 Distributed data processing in Exploring and Understanding Adverse Drug Reactions (EU-ADR) and other EU-funded projects.

How to harmonize data extraction for a CDM?

Different coding schemes for medical events [e.g. ICD-9-Clinical Modification (CM) and ICD-10, the International Classification of Primary Care (ICPC) and the READ Code (RCD) classification] and different sources of information (e.g. general practitioners’ records, hospital discharge diagnoses, death registries and laboratory values) are available in various healthcare databases. For this reason, it is not possible to construct a single, completely reusable data extraction algorithm for the medical event search in all the databases. We therefore need to develop a strategy for the translation of codes (or unstructured text) to enable extraction of the same medical events from heterogeneous databases. This translation refers to the adaptation of a common data extraction algorithm based on the coding scheme and type of
information available in each database. This challenge was faced in EU-ADR and other EU-funded multidatabase projects (i.e. ARITMO, SAFEGUARD and SOS) as databases from different European countries differ substantially from each other due to differences in healthcare systems, disease coding schemes and natural languages. On the other hand, US multidatabase projects include data sources that are much more comparable with respect to the type of collected information and coding schemes. Specifically, to reconcile differences across terminologies (READ, ICD-9-CM, ICD-10 and ICPC plus free text), the EU-ADR team of researchers built a shared semantic foundation for the definition of events under study [32] by selecting disease concepts from the Unified Medical Language System (UMLS, V.2008AA) [33], and setting up a multistep and iterative process for the harmonization of event data extraction [34]. The sequential steps of this process are described below.

Identification of UMLS concepts and projection into different terminologies

First, a UMLS concept is identified by a concept unique identifier and describes a single medical concept that can be expressed using different synonyms (terms). In EU-ADR, for each event, a medical definition was first created and, based on that, relevant UMLS concepts describing the event were subsequently identified and projected into different database-specific terminologies. In addition, the labels of the codes were identified and considered by databases for free text search of the events.

Definition of data extraction algorithm

Secondly, once the relevant diagnostic codes and keywords were identified, for each event, a data extraction algorithm was constructed based on previous studies and on the data providers’ expertise. In addition to the database-specific codes and keywords, additional criteria were considered as needed (e.g. laboratory values for acute liver injury and acute renal failure).
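
The first two steps can be pictured with a small Python sketch: one event definition is projected into code prefixes for each terminology, and a single extraction function is then adapted per database by choosing the terminology and whether free text is searched. Everything in it is hypothetical: the `Record` structure, the function `extract_events`, the keyword list and the code prefixes (shown for acute myocardial infarction) are illustrative stand-ins and not the validated EU-ADR code lists.

```python
from dataclasses import dataclass

# Illustrative projection of one event into terminology-specific code
# prefixes. These are examples only, not the project's actual mappings.
EVENT_CODES = {
    "ICD9CM": ("410",),
    "ICD10": ("I21",),
    "ICPC": ("K75",),
    "READ": ("G30",),
}
# Code labels reused as keywords for free-text search.
KEYWORDS = ("myocardial infarction",)


@dataclass
class Record:
    patient_id: str
    code: str
    free_text: str = ""


def extract_events(records, terminology, use_free_text=False):
    """Return sorted unique patient ids whose records match the event."""
    prefixes = EVENT_CODES[terminology]
    matched = set()
    for record in records:
        by_code = record.code.startswith(prefixes)
        by_text = use_free_text and any(
            keyword in record.free_text.lower() for keyword in KEYWORDS
        )
        if by_code or by_text:
            matched.add(record.patient_id)
    return sorted(matched)


if __name__ == "__main__":
    demo = [
        Record("p1", "410.9"),
        Record("p2", "250.0"),
        Record("p3", "", "Old myocardial infarction noted"),
    ]
    print(extract_events(demo, "ICD9CM"))
    print(extract_events(demo, "ICD9CM", use_free_text=True))
```

The free-text match on "p3" also shows why local expertise matters: a keyword hit may describe a historical rather than an incident event, so a real algorithm would need further criteria.
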
The data extraction algorithm was subsequently adapted and used by each database according to the available information.

Event data extraction

Thirdly, using a common data model, each database created locally standardized input files (patient, drug and event files), linkable via a patient unique identifier. Subsequently, these input files were managed locally by the purpose-built software Jerboa [13], which generated aggregated and de-identified results (e.g. IRs of the events); these results were then sent in encrypted format to a central repository for evaluation and further analyses.

Benchmarking of IRs of events

Finally, for each event, benchmarking of database-specific IRs was conducted. In particular, overall age- and sex-standardized IRs were calculated using Jerboa and compared across different databases. IRs were compared in particular amongst databases with similar structure (i.e. general practice and claims databases) and amongst databases of the same country. The observed IRs were likewise compared with those estimated in previous database studies. Outliers were identified through discussion, and important differences with respect to other databases or with previous studies were further investigated. In general, benchmarking of IRs was considered an indirect validation of the translation of the data extraction algorithms into the different databases.

Lessons learned

Experience from EU-ADR and other European multidatabase projects indicates that translation of data extraction algorithms involves harmonization of different terminologies and disease coding systems as well as customizing queries and benchmarking of IRs of events. This multiple-step process requires expertise from local data providers, as they have the best knowledge of the characteristics and processes underlying the data collection in their own database. At the moment, a consensus on the best approach for the translation of data extraction algorithms is not available and should be sought in the near future, in the light of the increasing need to combine multiple, international and heterogeneous databases for the conduct of epidemiological investigations, especially in the area of drug safety.
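The benchmarking step rests on age- and sex-standardized IRs. The sketch below shows direct standardization in Python under stated assumptions: it is not Jerboa code, and the stratum labels, reference weights and the `ratio_to_median` helper are hypothetical. In the projects described, outliers were identified through discussion, so the ratio is only a possible aid to that discussion, not a decision rule.

```python
from statistics import median

# Hypothetical reference-population weights per (age band, sex) stratum; sum to 1.
WEIGHTS = {
    ("0-49", "F"): 0.3,
    ("0-49", "M"): 0.3,
    ("50+", "F"): 0.2,
    ("50+", "M"): 0.2,
}


def standardized_ir(events, person_years, weights, per=100_000):
    """Directly standardized incidence rate per `per` person-years.

    events and person_years are dicts keyed by stratum; each stratum-specific
    rate is weighted by the share of that stratum in the reference population.
    """
    total = 0.0
    for stratum, weight in weights.items():
        py = person_years.get(stratum, 0.0)
        if py <= 0:
            raise ValueError(f"no person-time in stratum {stratum}")
        total += weight * events.get(stratum, 0) / py
    return total * per


def ratio_to_median(irs):
    """Ratio of each database's standardized IR to the median across databases."""
    mid = median(irs.values())
    return {db: ir / mid for db, ir in irs.items()}
```

Because every database is weighted to the same reference population, differences that remain between standardized IRs are not explained by age and sex structure alone, which is what makes them useful as an indirect check on how the extraction algorithm was translated locally.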
How to store and handle data?

The next step in building a database network is ensuring secure storage and use of data. We believe that in large collaborative projects the data contributors in the distributed network approach should not only be data providers but also need to be involved in each of the subsequent steps. To ensure the best possible quality of research, we want to keep data providers engaged in the entire process so that at all stages of a study we can benefit from their input.

To facilitate active involvement of a collective of data sources, researchers from the Department of Medical Informatics of Erasmus Medical Center (EMC) in Rotterdam, the Netherlands, developed the OCTOPUS infrastructure, which allows geographically widespread research sites to collaborate. OCTOPUS is a remote research environment (RRE) that provides secure access to the data to ensure a high level of protection of the stored data. The infrastructure offers several analytical tools, word processing software and utilities in a remote desktop. It can host multiple research projects, each with its own secured area to share data as well as results.

OCTOPUS consists of a Windows application server (RRE-APP) and a database server (RRE-DB) located at EMC (see Fig. 3). For security reasons, there are no direct connections between the RRE-APP and RRE-DB. The data sets of a specific study are thus archived on the database server and are not directly accessible to OCTOPUS users. The system administrator extracts the data from the centralized MySQL database and prepares a data set on RRE-APP that is accessible only to the partners involved in the specific study and work package. Access to the application server is implemented by remote desktop protocol and is only allowed from the pool of Internet Protocol (IP) addresses of the participating institutes. The users are authenticated with a USB token and a unique PIN code. The remote desktop of the users is restricted by disabling the option to copy/paste files between the remote session and the partners’ local computer. Furthermore, all project partners’ computer devices (e.g. printers and storage) are disabled in the remote session. Any misconduct or violation of security principles is reported immediately to the project coordinator and project manager. All users sign a confidentiality agreement for the use of OCTOPUS.

Fig. 3 The OCTOPUS remote research environment.

OCTOPUS is currently used in a number of drug safety projects (e.g. SOS, ARITMO and SAFEGUARD). For example, in the covariate harmonization step in the ARITMO project, the Jerboa output was uploaded to OCTOPUS by each of the data providers. Subsequently, each member of the research group was assigned a set of covariates to analyse and had to create a summary report in the shared folder. After numerous iteration steps, the final mapping of the covariates was agreed upon and a case–control study was initiated using the same pipeline. The OCTOPUS infrastructure has proven its value as an RRE. In the future, the server-based version of OCTOPUS might be moved to a cloud environment to allow more flexibility in the amount of available resources and to run more projects in parallel.

Future perspectives and challenges

Several international initiatives have shown the great potential of combining multiple healthcare databases for the postmarketing assessment of drug and vaccine safety in clinical practice. Nevertheless, integration of single databases with differences in type of collected data and drug and disease terminologies as well as underlying population and healthcare systems is extremely challenging and requires local expertise. Recent experience in the USA and Europe has provided examples of how to extract, handle and store harmonized data from different databases, which


work models should be adopted for the analyses and how to interpret results when carrying out multidatabase safety studies. The combination of multiple databases is particularly promising for the assessment of vaccine safety, which requires rapid results, as well as for the study of rare diseases or rare drug exposure, which necessitates very large study populations to gain enough statistical power. On the other hand, further efforts are needed to optimize and standardize the whole process of data extraction, handling, analysis and interpretation, whilst taking care to adhere to national privacy legislation, when a multidatabase safety study is conducted.

Projects such as EU-ADR, Sentinel, OMOP and PROTECT have also demonstrated the great potential of combining multiple healthcare databases not only for drug safety studies but also in the area of signal detection and refinement, complementing traditional SRSs. In the near future, completion of the ongoing multidatabase studies will generate a large body of scientific evidence on the benefits of multidatabase safety studies, as well as on their methodological problems and the related solutions.

Conflict of interest statement

Miriam Sturkenboom is running a research group that occasionally performs studies for pharmaceutical companies according to unconditional grants. These companies include AstraZeneca, Pfizer, Lilly and Boehringer. She has also been a consultant to Pfizer, Novartis Consumer Health, Servier, Celgene and Lundbeck on issues not related to the study. The other authors have no conflicts of interest to declare.

Acknowledgements

We thank all the donor agencies and participants of the above-mentioned projects for the extraordinary collaborative achievements.

References

1 Goldman SA. Limitations and strengths of spontaneous reports data. Clin Ther 1998; 20(Suppl. C): C40–4.
2 Stephenson WP, Hauben M. Data mining for signals in spontaneous reporting databases: proceed with caution. Pharmacoepidemiol Drug Saf 2007; 16: 359–65.
3 Molokhia M, Tanna S, Bell D. Improving reporting of adverse drug reactions: systematic review. Clin Epidemiol 2009; 1: 75–92.
4 Hennessy S. Use of health care databases in pharmacoepidemiology. Basic Clin Pharmacol Toxicol 2006; 98: 311–13.
5 Garcia Rodriguez LA, Perez Gutthann S. Use of the UK General Practice Research Database for pharmacoepidemiology. Br J Clin Pharmacol 1998; 45: 419–25.
6 Suissa S, Garbe E. Primer: administrative health databases in observational studies of drug effects–advantages and disadvantages. Nat Clin Pract Rheumatol 2007; 3: 725–32.
7 Chen RT, Glasser JW, Rhodes PH et al. Vaccine Safety Datalink project: a new tool for improving vaccine safety monitoring in the United States. The Vaccine Safety Datalink Team. Pediatrics 1997; 99: 765–73.
8 Platt R, Wilson M, Chan KA, Benner JS, Marchibroda J, McClellan M. The new Sentinel Network–improving the evidence of medical-product safety. N Engl J Med 2009; 361: 645–7.
9 Robb MA, Racoosin JA, Worrall C, Chapman S, Coster T, Cunningham FE. Active surveillance of postmarket medical product safety in the Federal Partners’ Collaboration. Med Care 2012; 50: 948–53.
10 Platt R, Carnahan RM, Brown JS et al. The U.S. Food and Drug Administration’s Mini-Sentinel program: status and direction. Pharmacoepidemiol Drug Saf 2012; 21(Suppl. 1): 1–8.
11 Nguyen M, Ball R, Midthun K, Lieu TA. The Food and Drug Administration’s Post-Licensure Rapid Immunization Safety Monitoring program: strengthening the federal vaccine safety enterprise. Pharmacoepidemiol Drug Saf 2012; 21(Suppl. 1): 291–7.
12 Stang PE, Ryan PB, Racoosin JA et al. Advancing the science for active surveillance: rationale and design for the Observational Medical Outcomes Partnership. Ann Intern Med 2010; 153: 600–6.
13 Coloma PM, Schuemie MJ, Trifiro G et al. Combining electronic healthcare databases in Europe to allow for large-scale drug safety monitoring: the EU-ADR Project. Pharmacoepidemiol Drug Saf 2011; 20: 1–11.
14 Valenciano M, Ciancio B. I-MOVE: a European network to measure the effectiveness of influenza vaccines. Euro Surveill 2012; 17: 1–2.
15 Dieleman J, Romio S, Johansen K, Weibel D, Bonhoeffer J, Sturkenboom M. Guillain-Barré syndrome and adjuvanted pandemic influenza A (H1N1) 2009 vaccine: multinational case-control study in Europe. BMJ 2011; 343: d3908.
16 Wijnans L, Lecomte C, de Vries C et al. The incidence of narcolepsy in Europe: before, during, and after the influenza A(H1N1)pdm09 pandemic and vaccination campaigns. Vaccine 2013; 31: 1246–54.
17 Suissa S, Henry D, Caetano P et al. CNODES: the Canadian Network for Observational Drug Effect Studies. Open Med 2012; 6: e134–40.
18 Schadt EE. The changing privacy landscape in the era of big data. Mol Syst Biol 2012; 8: 612.
19 Meyboom RH, Egberts AC, Gribnau FW, Hekster YA. Pharmacovigilance in perspective. Drug Saf 1999; 21: 429–47.
20 Re-use of Mini-Sentinel data following rapid assessments of potential safety signals using customizable modular programs. Available from: http://www.mini-sentinel.org/work_products/Statistical_Methods/Mini-Sentinel_Methods_Reuse-of-Mini-Sentinel-Data.pdf.


21 VAESCO Consortium. Narcolepsy in association with pandemic influenza vaccination – a multi-country European epidemiological investigation. September 2012. Available from: http://www.ecdc.europa.eu/en/publications/publications/vaesco%20report%20final%20with%20cover.pdf.
22 McClellan M. Drug safety reform at the FDA–pendulum swing or systematic improvement? N Engl J Med 2007; 356: 1700–2.
23 Hammond IW, Gibbs TG, Seifert HA, Rich DS. Database size and power to detect safety signals in pharmacovigilance. Expert Opin Drug Saf 2007; 6: 713–21.
24 Coloma PM, Trifiro G, Schuemie MJ et al. Electronic healthcare databases for active drug safety surveillance: is there enough leverage? Pharmacoepidemiol Drug Saf 2012; 21: 611–21.
25 Andrews N, Stowe J, Miller E et al. A collaborative approach to investigating the risk of thrombocytopenic purpura after measles-mumps-rubella vaccination in England and Denmark. Vaccine 2012; 30: 3042–6.
26 Blake KV, Devries CS, Arlett P, Kurz X, Fitt H. Increasing scientific standards, independence and transparency in postauthorisation studies: the role of the European Network of Centres for Pharmacoepidemiology and Pharmacovigilance. Pharmacoepidemiol Drug Saf 2012; 21: 690–6.
27 Sturkenboom MC, Verhamme KM, Nicolosi A et al. Drug use in children: cohort study in three European countries. BMJ 2008; 337: a2245.
28 Mini-Sentinel. Overview and Description of the Common Data Model v2.1. 2012; Available from: http://www.mini-sentinel.org/work_products/Data_Activities/Mini-Sentinel_CommonData-Model_v2.1.pdf.
29 Overhage JM, Ryan PB, Reich CG, Hartzema AG, Stang PE. Validation of a common data model for active safety surveillance research. J Am Med Inform Assoc 2012; 19: 54–60.
30 OMOP. Observational Medical Outcomes Partnership Common Data Model Specifications Version 4.0. Available from: http://omop.fnih.org/CDMvocabV4.
31 Reich C, Ryan PB, Stang PE, Rocca M. Evaluation of alternative standardized terminologies for medical conditions within a network of observational healthcare databases. J Biomed Inform 2012; 45: 689–96.
32 Trifiro G, Pariente A, Coloma PM et al. Data mining on electronic health record databases for signal detection in pharmacovigilance: which events to monitor? Pharmacoepidemiol Drug Saf 2009; 18: 1176–84.
33 Lindberg DA, Humphreys BL, McCray AT. The unified medical language system. Methods Inf Med 1993; 32: 281–91.
34 Avillach P, Coloma PM, Gini R et al. Harmonization process for the identification of medical events in eight European healthcare databases: the experience from the EU-ADR project. J Am Med Inform Assoc 2013; 20: 184–92.

Correspondence: Gianluca Trifirò, MD, PhD, Department of Medical Informatics, Erasmus University Medical Center, Dr Molewaterplein 50, 3015 GE Rotterdam, the Netherlands. (fax: +31-10-7044722; e-mail: [email protected]).
