Theory and methods

Use of administrative medical databases in population-based research

Natalie Gavrielov-Yusim, Michael Friger

Department of Epidemiology and Biostatistics, Ben-Gurion University of the Negev, Beer-Sheva, Israel

Correspondence to Natalie Gavrielov-Yusim, Department of Epidemiology and Biostatistics, Ben-Gurion University of the Negev, P.O.B 653, Beer-Sheva 8410501, Israel; [email protected]

Received 15 April 2013; Revised 17 October 2013; Accepted 23 October 2013; Published Online First 18 November 2013

ABSTRACT

Administrative medical databases are massive repositories of data collected in healthcare for various purposes. Such databases are maintained in hospitals, health maintenance organisations and health insurance organisations. Administrative databases may contain medical claims for reimbursement, records of health services, medical procedures, prescriptions and diagnostic information. Such systems may provide a valuable variety of clinical and demographic information, as well as an on-going process of data collection. In general, information gathering in these databases is not initially planned for research purposes. Nonetheless, administrative databases may be used as a robust research tool. In this article, we address the subject of public health research that employs administrative data. We discuss the biases and the limitations of such research, as well as other important epidemiological and biostatistical key points specific to administrative database studies.

INTRODUCTION

To cite: Gavrielov-Yusim N, Friger M. J Epidemiol Community Health 2014;68:283–287.

Observational epidemiologic studies seek to acquire data on a sufficiently large and representative sample of subjects, which can be analysed to provide meaningful, valid and generalisable findings. In most cases, a field study that fulfils all these criteria requires considerable infrastructure, including participants’ recruitment, examination, follow-up and the storage of specimens or other study material. Since scientific research is often limited in resources, cost-effective alternatives to traditional observational studies are needed.

Administrative databases are massive repositories of data collected in healthcare for various purposes. Such databases are mainly maintained in hospitals, health maintenance organisations and health insurance organisations. Administrative data may include claims for reimbursement, records of health services, medical procedures, prescriptions and diagnostic information. Administrative databases therefore provide a variety of already stored data with an on-going collection process. In addition, they may provide an infrastructural basis for new data collection, which was not originally planned in the system, with minimal investment of logistics and time.

In general, information gathering in such databases is not intentionally planned for research purposes. Most administrative databases were originally set up as monitoring tools for health policymakers’ use. Their primary use was to track healthcare systems’ activity from an administrative and financial point of view. Consequently, administrative databases differ from other medical data repositories, such as electronic health records. Whereas the former are chiefly intended to store financial and administrative information for medical insurers’ and providers’ use, the latter are mainly used by clinicians to document patients’ clinical condition. In the upcoming sections we will explain which research fields can benefit from administrative databases and what the key points and pitfalls of such studies are.

RESEARCH FIELDS MAKING USE OF ADMINISTRATIVE DATABASES

In the last two decades many epidemiology-related fields have adopted administrative databases as their main choice of data source (see examples in table 1). Pharmacoepidemiology is one of the fields that seem to have greatly benefitted from this trend. Pharmacoepidemiology deals with low-frequency or long-term adverse events of drugs and vaccines and requires considerable, sometimes enormous, sample sizes, as well as extended follow-up periods. In general, case-control or cohort designs are applicable in this field. However, case-control studies in pharmacoepidemiology have been strongly criticised,1 due to their high susceptibility to selection and recall biases. Moreover, case-control studies do not estimate the absolute risk or the incidence of adverse outcomes in the population.2 Using administrative databases allows a longitudinal design, incidence calculation, a large sample size and robust power even for very rare events, along with a relatively short and inexpensive study.

Two other research types that can benefit from using administrative databases are studies requiring massive data collection, such as large epidemiologic surveys and epidemiologic surveillance. Surveys, also called cross-sectional studies, reveal the point prevalence of risk factors and outcomes in the population. When a research question requires an investigation of a disease’s trend, the surveys must be repeated consecutively at several points in time. This demands an elaborate setup, including facilities, as well as the recruitment and training of a special team. In view of their time-consuming and costly nature, surveys are often applied to investigate the most prevalent health problems of greater public interest (eg, obesity, smoking, education, violence).3–5 Accordingly, it might be difficult to justify and dedicate similar resources for surveying less prioritised outcomes and diseases.
Furthermore, survey results may be strongly affected by self-report and non-participation, which can lead to information and selection bias, respectively. Another major shortcoming of point prevalence studies is that they detect associations between epidemiologic factors and do not reveal causative relationships between them. Consecutive point prevalence studies may provide the desired causality insight. However, large surveys, as a rule, are not able to collect data from the same participants, and cover a different representative sample of the population each time. Therefore, this longitudinal perspective is ecologic and not individual-based. Using administrative databases as a data source can resolve most of these issues, since they provide a variety of medical and personal data, with individual follow-up, and do not require the personal involvement of study participants.

Active surveillance is carried out to study dynamically changing epidemiologic trends based on longitudinal data. This design requires active ongoing data collection at prespecified time intervals. It also demands considerable resource dedication and is usually undertaken for a selected group of specially prioritised epidemiologic issues. Most often this tool is used to investigate infectious diseases’ trends.6–10 Administrative databases can provide a suitable cost-effective alternative to these resource-demanding studies.

Although administrative databases have many advantages that serve epidemiologic research well, they also have limitations, which must be considered in the process of planning, executing and interpreting research findings. The next sections describe these limitations and the ways they can be avoided or treated.

Table 1 Examples of medical databases used in research

Database, country | Database description | Population covered | Example publications
Medicaid, USA | Administrative database of medical insurer | Recipients of social welfare | 11; reviewed in 2, 12
Veteran Affairs Clinical Database, USA | Administrative database of medical insurer and provider | US veterans | 13; reviewed in 12
Kaiser Permanente (KP), USA | Administrative database of health maintenance organisation | Members of the KP health plan (largely representative of their communities) | 14; reviewed in 12, 15
General Practice Research Database (GPRD), now Clinical Practice Research Datalink (CPRD), UK | Clinical database of general practitioners | 8% of the UK population (representative of the source population) | 16; reviewed in 2, 12, 15
Hospital Episode Statistics (HES), UK | Clinical database of hospitals | All hospital admissions funded by the National Health Service in England | 17
Saskatchewan Health Services Database, Canada | Administrative database of provincial health plan | 99% of the Saskatchewan population | 18; reviewed in 2, 12
Clalit Health Services Database, Israel | Administrative database of health maintenance organisation | 53% of Israel’s population | 19
National Prescription Databases, Nordic countries (Finland, Denmark, Sweden, Iceland, Norway) | Pharmaceutical national databases | Nationwide coverage | Reviewed and exemplified in 2, 20

Gavrielov-Yusim N, et al. J Epidemiol Community Health 2014;68:283–287. doi:10.1136/jech-2013-202744

INFORMATION BIAS IN ADMINISTRATIVE DATABASES RESEARCH

In general, administrative data research suffers from the same biases as field studies. These biases may be broadly classified as belonging to either the information or the selection bias family. This section deals with information bias, which occurs due to imperfect data collection within an administrative database and is mainly expressed in misclassification of the research exposure, outcome or both.21

To simplify, we will first address the issue of outcome misclassification. The general formulation of this problem is: when using administrative data to study a certain disease as an outcome, how do we make sure that the condition documented in patients’ records is indeed a suitable representation of the disease under investigation? Misclassification of the outcome may result from erroneous or unclear clinical documentation, as well as from misdiagnosis. The most straightforward example of this is identifying influenza cases using administrative data. The clinical definition of influenza-like illness is very wide and may include a range of ailments caused by various respiratory or other viruses.22 To maximise validity in such settings, meticulous case ascertainment must precede data analysis. Specifically, in the influenza example, in addition to retrieving explicit ‘influenza’ diagnoses, other conditions listed in the standardised case definition of influenza, as defined by WHO,22 must be retrieved and used in the outcome definition.

Exposure misclassification, for example, misclassified use of medication, has been previously addressed in the literature.23 Medication use is improperly documented in most administrative databases if the drug purchase is not reimbursed. Such is the case with privately purchased medications, drugs under restrictive coverage policies23 and over-the-counter drugs.20 Another challenge in pharmaceutical exposure is the issue of treatment adherence and compliance. Pharmaceutical databases provide information regarding whether a patient received a drug prescription or purchased the drug. However, this gives little indication of whether the patient indeed ingested the drug and in what dosage.20 These issues may be resolved either by using an additional data source or by running a pilot study on a group of participants, from whom relevant information may be obtained through personal interviews. Provided that the pilot includes a sufficiently large and representative group, it should provide a sufficient basis for assumptions about missing data in the entire dataset.

Information bias may often arise while using administrative databases in countries with coexisting universal public and private healthcare systems.
In Israel, where all citizens are covered by national insurance, there is a growing trend of insured people preferring private clinics over public medicine.24 However, private visits are not always registered in the main healthcare system, depending on the type of clinic and consultation. As a result, retrieving data on medical visits will lead to erroneous measurement of the number of visits: even though the participants are sampled correctly, a certain subset of them supplies partial or flawed information.

The challenges listed here may not always be easy to resolve, since administrative databases are not initially designed for research. The quality of information in these systems depends greatly on specific incentives for data reporting, the most prevalent of which is financial. In other words, information in administrative databases is most accurately represented when it has important administrative or financial implications. As a result, expensive medical procedures are documented better than less costly, but nonetheless clinically important, health interventions. Bearing this in mind provides a perspective on which studies may be performed based on administrative data alone, and which studies require complementary data sources, such as a pilot-scale field investigation performed on a subset of study participants. Such a pilot will provide an insight into the validity of variables retrieved from the database and into their usefulness in the investigation.

Alternatively, researchers may try to construct surrogate variables, which validly represent the study outcome or exposure but, unlike it, have a stronger administrative or financial implication and therefore have a better chance of being well represented in the database. For example, the study population may sometimes be captured more easily and more accurately using the number of patients ‘treated for the disease’ (data retrieved from a reimbursed prescription database) rather than the number of ‘diagnosed patients’ (data retrieved from administratively reported diagnoses). Depending on the purpose and process of the diagnoses report, these data may vary by reporting clinic, year and type of healthcare insurer and provider. However, the documentation of drug purchase and reimbursement is fairly stable and uniform. It is important to stress that such a method of variable definition should always be preceded by a preliminary validation study, which must demonstrate the degree of correlation between the variable and its surrogate. This issue is further complicated by the fact that clinical guidelines and definitions may change, as may the clinical coding system, which may be transformed with time (eg, ICD9 substituted with ICD10 codes).
Such complications cannot be avoided; the only remedy is to state accurately, in the methods section, the coding system and the definitions used for case identification in the study. In view of this problem, much of the recent scientific work on administrative databases has been dedicated to developing uniform clinical algorithms, which are intended to identify patients with a certain diagnosis with maximal accuracy.25–27

Another type of information bias detected in many database studies, and thoroughly discussed in this context by Prof Suissa,12 28 is called ‘immortal time bias’. As explained in the referenced publications, this bias occurs because a certain initial interval of the follow-up period is erroneously classified as exposed while in fact being unexposed. This interval, called immortal time, therefore adds guaranteed protected survival time to the exposed group and systematically distorts causal associations. This bias inflates the survival of the treated group and overestimates the protective effect of treatment.28 Although this bias is not intrinsic to administrative data studies, it often appears in them. Several solutions have been proposed for dealing with this problem. First, immortal time bias can be avoided by carefully defining the follow-up period during data retrieval from the database. Specifically, the definition of index time for the exposed and unexposed cohorts must be equivalent.12 Correcting for immortal time bias is also possible during the statistical analysis, using person-time modelling in which the immortal period is classified as an unexposed period.28 Alternatively, a Cox proportional hazards model may be used, in which the exposure is modelled in a time-dependent manner.
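The person-time correction for immortal time bias can be illustrated with a minimal sketch. The three-patient cohort, the `Patient` fields and the `death_rates` function below are hypothetical and invented purely for illustration; they do not come from any of the cited studies. The sketch compares death rates when the waiting time before the first prescription is wrongly counted as exposed with the rates obtained when that waiting time is classified as unexposed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Patient:
    followup_days: int           # days from cohort entry to death or censoring
    first_rx_day: Optional[int]  # day of first prescription; None if never treated
    died: bool


def death_rates(patients, count_waiting_time_as_exposed):
    """Return (exposed_rate, unexposed_rate) in deaths per person-day."""
    person_days = {"exposed": 0, "unexposed": 0}
    deaths = {"exposed": 0, "unexposed": 0}
    for p in patients:
        if p.first_rx_day is None:
            # Never treated: all follow-up is unexposed.
            person_days["unexposed"] += p.followup_days
            deaths["unexposed"] += int(p.died)
            continue
        if count_waiting_time_as_exposed:
            # Biased: the wait before the first prescription is credited to treatment.
            person_days["exposed"] += p.followup_days
        else:
            # Corrected: the wait before the first prescription is unexposed time.
            person_days["unexposed"] += p.first_rx_day
            person_days["exposed"] += p.followup_days - p.first_rx_day
        # In this toy cohort every death among treated patients occurs after treatment start.
        deaths["exposed"] += int(p.died)
    return (deaths["exposed"] / person_days["exposed"],
            deaths["unexposed"] / person_days["unexposed"])


# Hypothetical cohort, constructed so that treatment has no true effect.
cohort = [
    Patient(followup_days=200, first_rx_day=100, died=True),
    Patient(followup_days=300, first_rx_day=100, died=False),
    Patient(followup_days=100, first_rx_day=None, died=True),
]

biased_exposed, biased_unexposed = death_rates(cohort, True)
correct_exposed, correct_unexposed = death_rates(cohort, False)
biased_rate_ratio = biased_exposed / biased_unexposed
corrected_rate_ratio = correct_exposed / correct_unexposed
print(biased_rate_ratio, corrected_rate_ratio)
```

In this toy cohort the misclassified analysis yields a rate ratio of roughly 0.2, suggesting a strong protective effect, whereas the corrected analysis yields a rate ratio of 1. A time-dependent Cox model pursues the same goal within a regression framework.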

AREA-LEVEL DATA

Area-level data, otherwise named group-level or aggregate data, are an ecologic type of information collected for individuals in administrative databases. The reason that some data are only provided in aggregated form is patients’ privacy. In particular, area-level data maintenance is applied to data of a personal character, such as income, race and ethnicity. Aside from ethical and privacy issues, this type of personal data is not always practically attainable.29

To explain the nature of area-level data, we will use a patient’s personal income as an example. This parameter is not routinely collected in administrative databases. However, patients’ socioeconomic status (SES) may be inferred from the general sociodemographic composition of their area of residence. The method of data imputation that is used to produce an SES proxy based on patients’ area of residence is called ‘geocoding’.29 This technique has been proposed and practised for the imputation of ethnicity and SES in administrative databases.29 30 Essentially, geocoding is used to link two sets of data. The first is the list of patients and their addresses, derived from the administrative database. The second is census-derived information on geographically defined areas of residence, such as rates of poverty, levels of education and employment, and ethnic and racial composition. Based on the combination of these data, the patients’ SES is inferred. The smaller the chosen area of residence (city, zip code area, census tract, neighbourhood), the more homogeneous it is in terms of its sociodemographic composition.

Area-level data is considered to be a major limitation of administrative database studies, mainly due to the possibility of ecological fallacy and risk factor misclassification introduced into the analysis by aggregate statistics.31 Nonetheless, in the absence of individual-level data, aggregate data is considered an acceptable and valid proxy.32 Moreover, in certain study types, area-level information is more useful and valuable than individual data. For example, in studies focusing on health disparities, area-level characteristics provide not only an estimate of participants’ SES but also the sociodemographic context in which they reside. This type of information includes the concentration of poverty in the given neighbourhood, the accessibility of medical services and other environmental attributes that cannot be reduced to an individual level.32–34 Additionally, statistical methodology that systematically mitigates the ecologic fallacy has been proposed by several authors.35 36 These methods combine ecological data with small samples of individual-level exposures and outcomes. The biggest shortcoming of this methodology is that linking ecological and individual data may be very challenging in practice.35 36 Currently, attempts are being made to improve this methodology and facilitate its application in practice.

In the context of area-level data it is also important to mention the significance of a multilevel analytical approach. Multilevel or hierarchical modelling allows the simultaneous analysis of higher-level and lower-level data units, such as area-level and individual variables, respectively.37 Administrative databases provide an abundance of information that differs in type and nature. In order to model such data correctly, it is important to distinguish between the group and individual sources of variability. Models that combine individual units (eg, patients, students) nested within group units (eg, neighbourhoods, schools) should be constructed using a multilevel approach. For example, when modelling the variation of treatment compliance in the population, patient-level and physician-level (or clinic-level) variation should be accounted for.

GENERALISABILITY

One of the common claims made against administrative data concerns the generalisability, or the external validity, of study results. Often, researchers believe that in order to be considered credible and meaningful, cohort study results must be applicable to large geographically defined populations (ie, residents of a country from which the study group had been drawn). However, this demand is somewhat misleading. Geographical residence is only one of the many characteristics by which study populations may be defined. Conceptually, ethnicity or SES may serve as population-defining factors as well. As long as the internal validity of the study has been optimised, the results of the analysis will be valid and applicable to the population under investigation. Accordingly, if the researchers are aware of the characteristics defining the population captured by their databases, there should be no difficulties or mistakes in the interpretation of findings. Therefore, prior to conducting research on administrative data, investigators should find out the sociodemographic composition of the population described by their database, and carefully define the group to which study conclusions will be applicable.

STATISTICAL ISSUES IN ADMINISTRATIVE DATABASES RESEARCH

As was previously mentioned, administrative databases provide massive sample sizes. This section focuses on the specific features of statistical inference in large datasets. The first methodological issue characteristic of large samples is related to the Hosmer–Lemeshow (HL) test, which is used to assess goodness-of-fit in logistic regression.38 This test divides the sample into a number of groups, with most statistical packages using 10 groups as a default. The null hypothesis of the test states that the model fits the data. Significant departures from the tested model indicate that the regression is poorly fitted. As with any statistical test, the power of the HL procedure increases with sample size. Therefore, in large samples even small departures from the proposed model will be considered significant, demanding rejection of the regression model. In this case, the effect of high power is undesirable, since the likelihood of model rejection should be independent of sample size. In view of this problem, the authors of the original HL test have recently provided a set of recommendations, which explain how to account for sample size in the HL test and how to eliminate its influence on the model fit estimation.39

Another subject that has to be considered when working with large datasets is that of statistical significance and its interpretation. The sole use of statistical significance in hypothesis testing and results interpretation has been criticised in the past.40 41 At present, the idea that clinical relevance must be demonstrated along with the statistical significance of an association has already become a consensus. However, in large-scale database studies, this issue is even more noteworthy than in clinical field research. In general, given a large enough dataset, all comparisons may yield statistically significant differences, even those of the lowest magnitude. Therefore, using statistical significance as a discriminatory factor in model construction becomes impractical in many cases. As a result, the clinical relevance of the effect size is extremely important in outcome modelling within large datasets, and may sometimes altogether replace the bivariate pretesting of risk factors against the outcome. In view of the fact that clinical importance cannot always be prespecified, there are several methodological options that are designed to help investigators assess and define it.42–44
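The dependence of statistical significance on sample size can be illustrated with a short calculation. The sketch below applies a standard pooled two-proportion z-test to a hypothetical risk difference of 0.1 percentage points (10.0% vs 10.1%); the function name and the numbers are assumptions chosen only for illustration and are not taken from any of the cited studies.

```python
import math


def two_proportion_p_value(p1, p2, n_per_group):
    """Two-sided p-value of a pooled two-proportion z-test with equal group sizes."""
    pooled = (p1 + p2) / 2
    standard_error = math.sqrt(2 * pooled * (1 - pooled) / n_per_group)
    z = abs(p1 - p2) / standard_error
    # For a standard normal variable, P(|Z| > z) equals erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2))


# Hypothetical outcome risks in two groups: the same tiny difference in both scenarios.
risk_a, risk_b = 0.100, 0.101

p_field_study = two_proportion_p_value(risk_a, risk_b, n_per_group=1_000)
p_database_study = two_proportion_p_value(risk_a, risk_b, n_per_group=10_000_000)

print(f"n = 1,000 per group:      p = {p_field_study:.3f}")
print(f"n = 10,000,000 per group: p = {p_database_study:.2e}")
```

The risk difference is identical in both scenarios, yet it is far from significant in the smaller sample and highly significant in the larger one. Whether a difference of this size matters clinically is a separate question that the p-value does not answer.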

CONFOUNDING IN ADMINISTRATIVE DATABASES RESEARCH

Confounding in administrative database research is largely similar to that found in other epidemiologic investigations. Nonetheless, there is an issue related to residual confounding that deserves special attention in the context of database studies. Due to their large sample sizes, the statistical estimates of association derived from database studies tend to have very narrow CIs. Theoretically, this indicates high precision of the revealed associations. However, a dataset retrieved from an administrative database often lacks covariates that may be critical for model adjustment in a given investigation. Although administrative data may hold valuable information on the main risk factors and outcomes of the investigated subject, they will not necessarily provide all the required confounders. In administrative databases, where data collection by definition is not fine-tuned for any specific research, we use whatever data is available rather than the data required. In such cases, statistical models may be insufficiently adjusted and suffer from considerable residual confounding. This combination of highly precise but confounded results poses a special hazard of a biased interpretation of findings. Sometimes surrogate variables found in administrative databases may effectively substitute for the missing covariates. For example, the parameter of health-mindedness may be represented by a combination of participants’ use of preventive medical services, such as immunisation, participation in community health programmes, visits to a dietician and so on.
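The hazard of precise but confounded estimates can be shown with a small numerical sketch. The counts below are hypothetical and were constructed so that the exposure has no effect within either stratum of an assumed confounder (here labelled health-mindedness). The Mantel-Haenszel estimator and the Wald interval are standard textbook formulas used here as one possible illustration; they are not a method prescribed by the article.

```python
import math

# Each stratum: (exposed cases, exposed non-cases, unexposed cases, unexposed non-cases).
# Hypothetical counts, scaled up to mimic a large administrative database.
SCALE = 1000
strata = {
    "not health-minded": (80 * SCALE, 320 * SCALE, 20 * SCALE, 80 * SCALE),
    "health-minded": (5 * SCALE, 95 * SCALE, 20 * SCALE, 380 * SCALE),
}


def odds_ratio_with_ci(a, b, c, d):
    """Odds ratio with a 95% Wald confidence interval for one 2x2 table."""
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - 1.96 * se_log)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se_log)
    return odds_ratio, lower, upper


def mantel_haenszel_or(tables):
    """Mantel-Haenszel summary odds ratio across strata."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator


# Crude analysis: the confounder is unavailable, so the strata are collapsed.
crude_table = [sum(cells) for cells in zip(*strata.values())]
crude_or, crude_lower, crude_upper = odds_ratio_with_ci(*crude_table)

# Adjusted analysis: possible only if the confounder (or a surrogate) is recorded.
adjusted_or = mantel_haenszel_or(strata.values())

print(f"crude OR = {crude_or:.2f} (95% CI {crude_lower:.2f} to {crude_upper:.2f})")
print(f"adjusted OR = {adjusted_or:.2f}")
```

With these counts the crude odds ratio is about 2.36 with a very narrow interval that excludes 1, while the stratum-adjusted odds ratio is 1. The narrow interval reflects the sample size only and says nothing about whether the estimate is confounded.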

DATA LINKAGE IN ADMINISTRATIVE DATABASE RESEARCH

As was previously mentioned, many of the methodological issues in database studies may be resolved by joining administrative data to another data source by means of record linkage. Such data linkage creates a more comprehensive and integrated dataset. In some studies, this step is not merely beneficial but crucial. For instance, in the USA, the low-income elderly and disabled may be covered by both Medicare and Medicaid. Thus, data on some outcomes and exposures have to be obtained from both sources. Record linkage must be anonymous and individual. It requires the presence of personal identifiers in the original databases, which must be removed after linkage. In some databases, for example, in the Nordic countries or in Israel, this task is fairly straightforward, because patients are registered using a unique personal identification number, which every citizen receives at birth or upon immigration.20 In databases not employing a uniform individual identifier, this task is much more challenging and may require an elaborate linkage algorithm.

To summarise, administrative databases may serve as a potent tool in public health research. Being aware of the pitfalls specific to this data source will assist researchers in attaining a valid and effective study design. Additional research is required to adjust epidemiologic and statistical methodology to administrative database investigations.
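The identifier-based linkage described in this section can be sketched as follows. This is a minimal illustration under stated assumptions: the field names (`national_id`, `diagnosis`, `drug`), the records and the function `link_and_deidentify` are all hypothetical, and real linkage projects involve governance and security steps that are not shown. The sketch joins two sources on a shared personal identifier and then replaces that identifier with an arbitrary study number.

```python
def link_and_deidentify(claims, prescriptions, id_field="national_id"):
    """Join two record sets on a shared identifier, then drop the identifier."""
    # Index prescriptions by personal identifier for individual-level linkage.
    drugs_by_id = {}
    for rx in prescriptions:
        drugs_by_id.setdefault(rx[id_field], []).append(rx["drug"])

    study_ids = {}  # personal identifier -> arbitrary sequential study number
    linked = []
    for claim in claims:
        personal_id = claim[id_field]
        study_id = study_ids.setdefault(personal_id, len(study_ids) + 1)
        # Copy every field except the personal identifier.
        record = {key: value for key, value in claim.items() if key != id_field}
        record["study_id"] = study_id
        record["drugs"] = list(drugs_by_id.get(personal_id, []))
        linked.append(record)
    return linked


# Hypothetical records from two sources that share a unique personal identifier.
claims = [
    {"national_id": "A1", "diagnosis": "I10"},
    {"national_id": "B2", "diagnosis": "E11"},
    {"national_id": "A1", "diagnosis": "J45"},
]
prescriptions = [
    {"national_id": "A1", "drug": "drug_x"},
    {"national_id": "A1", "drug": "drug_y"},
]

linked = link_and_deidentify(claims, prescriptions)
for record in linked:
    print(record)
```

A sequential study number is used rather than a hash of the identifier, since an unsalted hash of a short identifier could be reversed by enumeration. Where no uniform identifier exists, this exact-match join would need to be replaced by a more elaborate linkage algorithm.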

What is already known on this subject
▸ Nowadays, in the era of computerisation, most systems, medical and others, are being set up for digital information collection and storage.
▸ Due to this data availability, using massive databases in epidemiological research is becoming increasingly popular.
▸ The usefulness of administrative databases in epidemiological study has been recognised and demonstrated during the past years.

What this study adds
We attempted to construct a framework of epidemiological and statistical methodological highlights specific to database research.

Policy implications
Using this framework may help investigators to avoid the common pitfalls of administrative database research and benefit from the many advantages that this tool can offer.

Funding This work was supported by a stipend from the Israel National Institute for Health Policy Research.

Contributors NG-Y conceptualised, designed, drafted the initial manuscript and approved the final manuscript as submitted. MF critically reviewed and revised the manuscript, and approved the final manuscript as submitted. Both authors are responsible for the overall content of the manuscript.

Competing interests None.

Provenance and peer review Not commissioned; externally peer reviewed.

REFERENCES

1 Farrington CP, Nash J, Miller E. Case series analysis of adverse reactions to vaccines: a comparative evaluation. Am J Epidemiol 1996;143:1165–73.
2 Ehrenstein V, Sorensen HT, Bakketeig LS, et al. Medical databases in studies of drug teratogenicity: methodological issues. Clin Epidemiol 2010;2:37–43.
3 Margerison-Zilko C, Cubbin C. Socioeconomic disparities in tobacco-related health outcomes across race/ethnic groups in the United States: National Health Interview Survey 2010. Nicotine Tob Res 2013;15:1161–5.
4 Vozoris NT, O’Donnell DE. Prevalence, risk factors, activity limitation and health care utilization of an obese, population-based sample with chronic obstructive pulmonary disease. Can Respir J 2012;19:e18–24.
5 Comino EJ, Hermiz O, Flack J, et al. Using population health surveys to provide information on access to and use of quality primary health care. Aust Health Rev 2006;30:485–95.
6 Cowling CS, Povovic G, Liu BC, et al. Australian trachoma surveillance annual report, 2010. Commun Dis Intell 2012;36:E242–50.
7 Zhou W, Pool V, Iskander JK, et al. Surveillance for safety after immunization: Vaccine Adverse Event Reporting System (VAERS) - United States, 1991–2001. MMWR Surveill Summ 2003;52:1–24.
8 Falkenhorst G, Simonsen J, Ceper TH, et al. Serological cross-sectional studies on salmonella incidence in eight European countries: no correlation with incidence of reported cases. BMC Public Health 2012;12:523.
9 Simonsen J, Teunis P, van Pelt W, et al. Usefulness of seroconversion rates for comparing infection pressures between countries. Epidemiol Infect 2011;139:636–43.
10 Iwane MK, Edwards KM, Szilagyi PG, et al. Population-based surveillance for hospitalizations associated with respiratory syncytial virus, influenza virus, and parainfluenza viruses among young children. Pediatrics 2004;113:1758–64.
11 Bobo WV, Cooper WO, Stein CM, et al. Antipsychotics and the risk of type 2 diabetes mellitus in children and youth. JAMA Psychiatry 2013;70:1067–75.
12 Suissa S, Garbe E. Primer: administrative health databases in observational studies of drug effects - advantages and disadvantages. Nat Clin Pract Rheumatol 2007;3:725–32.
13 Virani SS, Woodard LD, Wang D, et al. Correlates of repeat lipid testing in patients with coronary heart disease. JAMA Intern Med 2013;173:1439–44.
14 Richards JL, Hansen C, Bredfeldt C, et al. Neonatal outcomes after antenatal influenza immunization during the 2009 H1N1 influenza pandemic: impact on preterm birth, birth weight, and small for gestational age birth. Clin Infect Dis 2013;56:1216–22.
15 Strom BL. Pharmacoepidemiology. 4th edn. Chichester: John Wiley & Sons Ltd, 2005.
16 Tsang C, Bottle A, Majeed A, et al. Cancer diagnosed by emergency admission in England: an observational study using the general practice research database. BMC Health Serv Res 2013;13:308.
17 Neuburger J, Macneill F, Jeevan R, et al. Trends in the use of bilateral mastectomy in England from 2002 to 2011: retrospective analysis of hospital episode statistics. BMJ Open 2013;3:e003179.
18 Meng X, D’Arcy C, Tempier R, et al. Survival of patients with incident dementia who had a pre-existing psychiatric disorder: a population-based 7-year follow-up study. Int J Geriatr Psychiatry 2012;27:683–91.
19 Gavrielov-Yusim N, Battat E, Neumann L, et al. Birth order and private voluntary immunization - a study of 110,902 children. Vaccine 2012;30:442–7.
20 Furu K, Wettermark B, Andersen M, et al. The Nordic countries as a cohort for pharmacoepidemiological research. Basic Clin Pharmacol Toxicol 2010;106:86–94.
21 Szklo M, Nieto J. Epidemiology: beyond the basics. Sudbury, MA: Jones and Bartlett Publishers, 2004.
22 WHO global technical consultation: global standards and tools for influenza surveillance. 2011. http://whqlibdoc.who.int/hq/2011/WHO_HSE_GIP_2011.1_eng.pdf
23 Gamble JM, McAlister FA, Johnson JA, et al. Restrictive drug coverage policies can induce substantial drug exposure misclassification in pharmacoepidemiologic studies. Clin Ther 2012;34:1379–86 e3.
24 Greenfield G, Pliskin JS, Wientroub S, et al. Orthopedic surgeons’ and neurologists’ attitudes towards second opinions in the Israeli healthcare system: a qualitative study. Isr J Health Policy Res 2012;1:30.
25 Goldberg DS, Lewis JD, Halpern SD, et al. Validation of a coding algorithm to identify patients with hepatocellular carcinoma in an administrative database. Pharmacoepidemiol Drug Saf 2013;22:103–7.
26 Goldberg D, Lewis J, Halpern S, et al. Validation of three coding algorithms to identify patients with end-stage liver disease in an administrative database. Pharmacoepidemiol Drug Saf 2012;21:765–9.
27 Curtis JR, Chen SY, Werther W, et al. Validation of ICD-9-CM codes to identify gastrointestinal perforation events in administrative claims data among hospitalized rheumatoid arthritis patients. Pharmacoepidemiol Drug Saf 2011;20:1150–8.
28 Suissa S. Immortal time bias in pharmaco-epidemiology. Am J Epidemiol 2008;167:492–9.
29 Fiscella K, Fremont AM. Use of geocoding and surname analysis to estimate race and ethnicity. Health Serv Res 2006;41(4 Pt 1):1482–500.
30 Creation of new race-ethnicity codes and socioeconomic status indicators for medicare beneficiaries. 2008. http://www.ahrq.gov/research/findings/final-reports/medicareindicators/medicareindicators.pdf
31 Greenland S, Robins J. Invited commentary: ecologic studies–biases, misconceptions, and counterexamples. Am J Epidemiol 1994;139:747–60.
32 Diez-Roux AV. Bringing context back into epidemiology: variables and fallacies in multilevel analysis. Am J Public Health 1998;88:216–22.
33 Krieger N, Chen JT, Waterman PD, et al. Choosing area based socioeconomic measures to monitor social inequalities in low birth weight and childhood lead poisoning: the Public Health Disparities Geocoding Project (US). J Epidemiol Community Health 2003;57:186–99.
34 Schwartz S. The fallacy of the ecological fallacy: the potential misuse of a concept and the consequences. Am J Public Health 1994;84:819–24.
35 Jackson C, Best N, Richardson S. Improving ecological inference using individual-level data. Stat Med 2006;25:2136–59.
36 Salway R, Wakefield J. A hybrid model for reducing ecological bias. Biostatistics 2008;9:1–17.
37 Diez Roux AV. A glossary for multilevel analysis. J Epidemiol Community Health 2002;56:588–94.
38 Hosmer D, Lemeshow S. Applied logistic regression. Hoboken, New Jersey: John Wiley & Sons, 2000.
39 Paul P, Pennell ML, Lemeshow S. Standardizing the power of the Hosmer-Lemeshow goodness of fit test in large data sets. Stat Med 2013;32:67–80.
40 Borenstein M. The case for confidence intervals in controlled clinical trials. Control Clin Trials 1994;15:411–28.
41 Borenstein M. A note on the use of confidence intervals in psychiatric research. Psychopharmacol Bull 1994;30:235–8.
42 Kieser M, Rohmel J, Friede T. Power and sample size determination when assessing the clinical relevance of trial results by ’responder analyses’. Stat Med 2004;23:3287–305.
43 Snapinn SM, Jiang Q. Responder analyses and the assessment of a clinically relevant treatment effect. Trials 2007;8:31.
44 Kieser M, Friede T, Gondan M. Assessment of statistical significance and clinical relevance. Stat Med 2013;32:1707–19.
