Journal of Clinical Epidemiology (2015)

ORIGINAL ARTICLE

Propensity score matching and randomization

George Mnatzaganian (a,*), David C. Davidson (b), Janet E. Hiller (c,d), Philip Ryan (d)

(a) School of Allied Health, Faculty of Health Sciences, Australian Catholic University, Level 2, Daniel Mannix Building, 17 Young Street, Fitzroy, Victoria 3065, Australia
(b) Royal Adelaide Hospital, North Terrace, Adelaide, South Australia 5000, Australia
(c) School of Health Sciences, Faculty of Health, Arts and Design, Swinburne University of Technology, John Street, Hawthorn, Victoria 3122, Australia
(d) Discipline of Public Health, School of Population Health, The University of Adelaide, North Terrace, Adelaide, South Australia 5000, Australia

Accepted 5 January 2015; Published online xxxx

Abstract

Objectives: We used elective total joint replacement (TJR) as a case study to demonstrate selection bias toward offering this procedure to younger and healthier patients.

Study Design and Setting: Longitudinal data from 2,202 men were integrated with hospital data and mortality records. Study participants were followed from recruitment (1996–1999) until TJR, death, or 2007 (end of follow-up). A propensity score (PS) was constructed to quantify each subject’s likelihood of undergoing TJR. TJR recipients were later matched to their non-TJR counterparts by PS and year of hospitalization. Ten-year mortality from index admission was compared between cases and controls.

Results: Overall, 819 (37.2%) had TJR. Those were younger, healthier, and belonged to higher socioeconomic classes compared with those who were not proposed for surgery. Of the TJR recipients, 718 were matched to 1,109 controls. Cases and controls had similar characteristics and similar years of follow-up from recruitment till index admission. Nonetheless, controls were more likely to die (39.5%) compared with 14.5% in TJR cases (P < 0.001).

Conclusion: Selection for elective procedures may introduce bias in prognostic features not accounted for by PS matching. Caution must be exercised when long-term outcomes are compared between surgical and nonsurgical groups in a population at risk for that surgical procedure.

© 2015 Elsevier Inc. All rights reserved.

Keywords: Elective total joint arthroplasty; Selection bias; Channeling bias; Propensity score matching; Confounding; Internal and external validity; Randomization

Funding: This study was supported by the University of Adelaide.
* Corresponding author. Tel.: +61-3-9953-3637; fax: +61-3-9953-3385. E-mail address: [email protected] (G. Mnatzaganian).
http://dx.doi.org/10.1016/j.jclinepi.2015.01.002
0895-4356/© 2015 Elsevier Inc. All rights reserved.

What is new?

Key findings
• Propensity score (PS) matching can balance baseline characteristics of the PS-matched groups that are being compared for a study outcome.
• PSs can only account for known covariates; unknown covariates cannot be accounted for.
• PS matching is not equivalent to randomization.
• With PS matching, we used elective total joint arthroplasty as a case study to demonstrate selection bias toward offering this procedure to younger and healthier patients.
• This is the first study to demonstrate selection bias toward offering TJR to younger, healthier patients, and those belonging to higher socioeconomic classes.
• Selection for elective procedures may introduce bias in prognostic features not accounted for by PS matching.
• Caution must be exercised when long-term outcomes are compared between surgical and nonsurgical groups in a population at risk for that surgical procedure.

1. Introduction

In nonrandomized studies, except by chance, the characteristics of patients constituting comparison groups of interest are different. The differences in characteristics between the compared nonrandomized groups (eg, surgical vs. nonsurgical) may be large, systematic, and statistically significant. Often such differences arise from clinically motivated patient selection that is not documented [1,2].

In 1983, Rosenbaum and Rubin [3] proposed a method, propensity score (PS) analysis, as an alternative tool to adjust for confounding and reduce selection bias in such nonrandomized studies. This calculated numerical score describes the expected likelihood for a study participant to receive a treatment (or an exposure of interest) conditional on the participant’s observed pretreatment covariates. After its construction, the score can be used to control for confounding in the comparison of outcomes between treatment groups. This can be done through regression adjustment, in the form of adding the PS as a covariate or a weight to an outcome model, or through stratification or matching [3–5]. In stratification, those with and without the treatment are stratified by the score’s categories (eg, quintiles), and then comparisons of outcome are run for each stratum [4]. Rosenbaum and Rubin [4] reported that stratifying on the quintiles of a PS will remove approximately 90% of the bias due to measured confounders and risk factors when estimating a linear treatment effect. Within each stratum, if the PS has been correctly specified, those with and without a treatment of interest will have a similar distribution of measured baseline covariates [6]. In the third method that adjusts for measured confounding, subjects with and without the treatment of interest are matched by their respective PS, thus avoiding stratified analyses with too many strata. However, often not everyone can be matched, which in turn reduces the sample size and power.

Matching by a PS can provide balance in baseline characteristics between the PS-matched groups that are being compared [3–6]. Nonetheless, such scores only account for observed pretreatment covariates. Unobserved factors that influence selection of patients for the treatment or intervention of interest cannot be accounted for [5,7]. Indeed, it has been argued that PS matching on observed variables can potentially increase the bias caused by unobserved confounders [7].

A recent PS-matched observational study compared cardiovascular outcomes in a small sample of patients with osteoarthritis (OA) who had and did not have an elective total hip or knee arthroplasty, showing that such elective surgery was associated with a significant reduction in subsequent cardiovascular events [8]. The authors argued that their findings could be explained by the improved physical activity in their 153 patients undergoing arthroplasty and also by the anticipated reduced use of potentially cardiotoxic nonsteroidal anti-inflammatory drugs [8]. However, these findings could have resulted from the initial selection of healthier patients for this elective procedure. Using elective primary total joint replacement (TJR) as a case study, we conducted the current analysis to explore this issue.

The objectives of the present study were not only to show the positive salient features of a constructed PS by demonstrating its prominent balancing properties but also to demonstrate the main limitation of such scores, namely their inability to account for unmeasured confounders, and to show that PS matching is not equivalent to randomization. Using TJR as a case study, we show how younger and healthier patients, and those belonging to higher socioeconomic classes, are more likely to be proposed for surgery compared with older, sicker, and more socioeconomically disadvantaged patients.
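The quintile stratification described in the Introduction can be illustrated with a minimal sketch. It is not the analysis used in this study (which relied on matching); the function names, the rank-based quintile assignment, and the size-weighted averaging of stratum differences are all assumptions made for illustration.

```python
from statistics import mean

def quintile_labels(scores):
    """Assign each score to a quintile (0-4) by rank so strata are near-equal in size."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    labels = [0] * len(scores)
    for rank, i in enumerate(order):
        labels[i] = rank * 5 // len(scores)
    return labels

def stratified_difference(scores, treated, outcome):
    """Size-weighted average of the treated-minus-control outcome difference
    within each PS quintile. Strata missing either group are skipped."""
    labels = quintile_labels(scores)
    total, weight = 0.0, 0
    for q in range(5):
        t = [outcome[i] for i in range(len(scores)) if labels[i] == q and treated[i]]
        c = [outcome[i] for i in range(len(scores)) if labels[i] == q and not treated[i]]
        if t and c:
            n = len(t) + len(c)
            total += n * (mean(t) - mean(c))
            weight += n
    return total / weight if weight else None

# Hypothetical toy data: ten subjects, one treated and one control per quintile.
toy_scores = [0.1 * i for i in range(10)]
toy_treated = [1, 0] * 5
toy_outcome = [2 if t else 1 for t in toy_treated]
toy_estimate = stratified_difference(toy_scores, toy_treated, toy_outcome)
```

Skipping strata that lack either group is one possible design choice; it mirrors the loss of subjects that also occurs when not every case can be matched.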

2. Methods

2.1. Ethics statement

Ethical approval for the study was obtained from the Human Research Ethics Committees of the University of Adelaide (H-106-2009) and WA Department of Health (AHEC EC004220). All analyses used deidentified data. The ethics committees waived the need for informed consent because only deidentified data were used.

2.2. Data sources and study population

The study population was described previously [9–11]. Briefly, it was drawn from the Health In Men Study (HIMS) [11], which arose from a randomized population-based trial of ultrasound screening for abdominal aortic aneurysm in men aged 65–83 years living in Perth, Western Australia (WA). In 1996–1999, a total of 12,203 men attended the baseline screening and provided detailed health and other information including information on diet, alcohol consumption, a comprehensive smoking history, medications used, presence of chronic diseases, and information on physical activity and exercise. In addition, study nurses recorded blood pressure, weight, height, and waist and hip circumferences. These clinical data were then integrated with WA hospital morbidity data (HMD) [12], Cancer Registry, Mental Health Services System, and mortality records. Linkage with HMD was used to identify total hip or knee replacement procedures, presence of morbidity, and readmission [10]. All-cause mortality was ascertained through linkage with WA mortality records.

The hospital morbidity database includes demographic, diagnostic, and procedural information on all patients discharged from all public and private hospitals in WA. This database allows the inclusion of up to 21 diagnoses and 11 procedure codes for each hospitalization. For each participant, any morbidity or health-related outcome was retrieved from the linked data in the period 1970 through 2007, and this enabled us to better account for patient comorbidities. Because our cohort was relatively elderly, all study participants had at least one admission to hospital, and because a TJR procedure always required admission to hospital, our retrospective follow-up was complete with no loss to follow-up.

The HMD have 21 quality-of-data checks that are built into the provision of data from all hospitals, and there are periodic audits of random selections of hospital-assigned codes to ensure quality and validity of the data [12]. This database has also been validated [13]. In an earlier analysis, we showed that the sensitivity and positive predictive value of the TJR procedure as recorded in the electronic medical hospital records were both 0.92 and the specificity was 0.98 [13].

2.3. Inclusion and exclusion criteria

Of the 12,203 HIMS subjects, only those with a recorded diagnosis of OA were eligible to participate in this study. Attending men who had had a lower limb TJR before baseline screening were excluded. Because the focus of this analysis was elective total hip or knee replacement, we also excluded all those with a recorded diagnosis of fracture in the hip, femur, tibia, fibula, and knee. Similarly, we excluded those with traumatic multiple fractures of the lower limb and nontraumatic osteoporotic or other (ie, cancer associated) pathologic fractures. For the remaining eligible subjects with OA, all admissions to both public and private hospitals in WA were captured from baseline screening (1996–1999) till 2007.

2.4. Construction of the PS

Of the initially recruited 12,203 participants, 2,202 were eligible to participate in this study and were followed from baseline screening (1996–1999) until they experienced their first elective primary total hip or total knee replacement (TKR) or died or were right censored at the end of follow-up (March 2007) [9,10]. Using a Cox proportional hazards model, we calculated a PS for undergoing an elective TJR.
The PS calculated for each participant was based on the following baseline covariates: age, nurse-measured body weight and height, years of smoking, vigorous and nonvigorous exercise, socioeconomic status defined by the Socio-Economic Index For Areas (SEIFA) [14], and Elixhauser’s comorbid conditions [15] that included congestive heart failure, cardiac arrhythmias, valvular disease, pulmonary circulation disorders, hypertension (with and without complications), paralysis, other-than-paralysis neurologic disorders, chronic pulmonary disease, diabetes (with and without complications), hypothyroidism, renal failure, liver disease, peptic ulcer disease excluding bleeding, AIDS/human immunodeficiency virus, lymphoma, metastatic cancer, solid tumor, rheumatoid arthritis and collagen vascular disease, coagulopathy, fluid and electrolyte disorder, blood loss anemia, deficiency anemia, substance abuse, psychoses, and depression. The PS for each study participant was calculated by multiplying the coefficient (derived from the multivariate model) by the subject-specific value for each variable in the model and then summing the resulting products [5]. The proportional hazards assumption was tested in the multivariate model using Schoenfeld residuals [16].

2.5. PS balance diagnostics

The comparability of TJR cases and non-TJR controls in the initial unmatched and PS-matched samples was assessed by calculating standardized differences. The standardized differences, which are not influenced by sample sizes, compared continuous and binary covariates [6]. For continuous variables, the standardized difference is defined as

$$ d = \frac{\bar{X}_t - \bar{X}_c}{\sqrt{\dfrac{S_t^2 + S_c^2}{2}}} $$

where $\bar{X}_t$ and $\bar{X}_c$ denote the sample mean of the covariate in treated subjects (ie, with TJR) and controls (ie, without TJR), respectively, whereas $S_t^2$ and $S_c^2$ denote the sample variance of the covariate in the treated subjects (ie, with TJR) and controls (ie, without TJR), respectively. For dichotomous variables, the standardized difference is defined as

$$ d = \frac{\hat{P}_t - \hat{P}_c}{\sqrt{\dfrac{\hat{P}_t(1 - \hat{P}_t) + \hat{P}_c(1 - \hat{P}_c)}{2}}} $$

where $\hat{P}_t$ and $\hat{P}_c$ denote the prevalence of the dichotomous variable in those with and without TJR, respectively. A standardized difference with an absolute value of less than 0.1 has been taken to indicate a negligible difference in the mean or prevalence of a covariate between the compared groups [17].

2.6. Other definitions

2.6.1. Socioeconomic disadvantage

The SEIFA, which was based on data from the 1996 census for residential postcodes [14], is a composite index that ranks geographic areas across Australia in terms of their relative socioeconomic advantage and disadvantage based on census data, where lower scores indicate more disadvantaged areas and higher scores indicate more advantaged areas. The score, which has been validated by the Australian Bureau of Statistics, is constructed using a number of different variables that indicate both advantage (eg, high income, having a degree qualification) and disadvantage (eg, unemployment status, low income, not enough bedrooms). This index is frequently used in Australian epidemiologic studies where individual measures of socioeconomic status are not available.

2.6.2. Comorbidity

Presence of comorbidity relied on all conditions recorded in the various linked data sources. Both ICD-9-CM (Clinical Modification) and ICD-10-AM (Australian Modification) coding algorithms were used to capture all conditions recorded in the linked data sets in the period 1970 through 2007. The coding algorithms [18] used to form Elixhauser’s comorbidities are presented in Appendix A at www.jclinepi.com. For descriptive purposes, we also calculated the Charlson comorbidity index as shown in Appendix B at www.jclinepi.com. The Charlson index has had many adaptations. For this analysis, we used the latest adaptation introduced by Quan et al. [18], while applying the original Charlson weights to build the final score [19].

The definition of OA (also known as degenerative arthritis or degenerative joint disease or osteoarthrosis) was based on a recorded diagnosis in the linked data sets in the period 1970 through 2007. This condition was detected in the linked databases using ICD-10-AM codes (M15.-, M16.-, M17.-, M18.-, and M19.-) and ICD-9-CM codes (715.-).

2.7. Nested case–control design

After constructing the PS, TJR cases were randomly matched to their non-TJR counterparts belonging to the same 2,202 sample, up to three controls to one case. The matching was conducted by PS quintile and by year of admission to hospital (±3 years). Index admission was defined as admission to undergo TJR for the cases and other-than-TJR admission for the controls. These matching criteria ensured that cases and controls not only had similar baseline characteristics as defined by the PS but also similar time of follow-up from baseline (1996–1999) till index admission.
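The matching rule described above (same PS quintile, index admission within ±3 years, up to three randomly chosen controls per case) could be sketched as follows. This is an illustrative sketch only: the record layout, the function name, and in particular the choice to match without replacement are assumptions, since the text does not state how controls were drawn.

```python
import random

def match_controls(cases, controls, max_controls=3, year_window=3, seed=0):
    """Match each case to up to `max_controls` controls sharing its PS quintile
    and admitted within `year_window` years of the case's index admission.

    cases, controls: lists of dicts with keys 'id', 'quintile', 'year'.
    Returns {case_id: [control_id, ...]}; cases with no eligible control are omitted.
    Assumption: each control is used at most once (matching without replacement).
    """
    rng = random.Random(seed)
    available = list(controls)
    rng.shuffle(available)  # shuffling first makes the selection random
    matches = {}
    for case in cases:
        eligible = [c for c in available
                    if c["quintile"] == case["quintile"]
                    and abs(c["year"] - case["year"]) <= year_window]
        chosen = eligible[:max_controls]
        if chosen:
            matches[case["id"]] = [c["id"] for c in chosen]
            used = {c["id"] for c in chosen}
            available = [c for c in available if c["id"] not in used]
    return matches

# Hypothetical records for illustration only.
example_cases = [{"id": "A", "quintile": 1, "year": 2000},
                 {"id": "B", "quintile": 2, "year": 2000}]
example_controls = [{"id": "c1", "quintile": 1, "year": 1999},
                    {"id": "c2", "quintile": 1, "year": 2001},
                    {"id": "c3", "quintile": 1, "year": 2003},
                    {"id": "c4", "quintile": 1, "year": 2004},
                    {"id": "c5", "quintile": 3, "year": 2000}]
example_matches = match_controls(example_cases, example_controls)
```

In this toy example case A receives the three controls admitted within three years of 2000, and case B stays unmatched because no control shares its quintile, which illustrates why some cases drop out of a matched sample.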
Rates of all-cause 5-year admission to hospital and all-cause 10-year mortality from index admission were then compared between the cases and their matched controls using McNemar’s test. A conditional logistic regression model was run to investigate study outcomes while also accounting for level of education, occupation, and hospital type (private vs. public), which were not accounted for in the PS. Statistical significance was set at a P-value of <0.05. We hypothesized that TJR cases would have better future outcomes compared with their PS-matched controls despite similar baseline characteristics among cases and controls. We also hypothesized that this disparity in future outcomes would not be explained by the measured risk factors and confounders that had been accounted for.
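For readers unfamiliar with McNemar’s test, a minimal sketch of its exact (binomial) form for 1:1 matched pairs is given here. It assumes the data have been reduced to two discordant-pair counts, `b` (only the case had the event) and `c` (only the control had the event); how the test was applied to sets with up to three controls is not described in the text, and the counts in the example are hypothetical.

```python
from math import comb

def mcnemar_exact(b, c):
    """Two-sided exact McNemar P-value from the discordant pair counts b and c.

    Under the null hypothesis each discordant pair is equally likely to fall
    in either direction, so min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * lower_tail)

# Hypothetical discordant counts, for illustration only.
p_value = mcnemar_exact(0, 10)
```

Only discordant pairs carry information in this test; pairs in which the case and the control share the same outcome do not enter the calculation.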


3. Results

Of the total 2,202 participants, 819 (37.2%) had an elective TJR. The baseline characteristics of these 819 men differed significantly from those who never had a TJR after baseline screening. The former were younger, healthier with a lower prevalence of comorbidities, smoked less, and were more likely to belong to a higher socioeconomic status (Table 1). The imbalance in baseline characteristics is also demonstrated by the standardized difference for 22 of the 31 covariates (71%). As reported previously, major independent risk factors for TJR included heavier weight, fewer years of smoking, and engaging in vigorous exercise [9]. Besides these, patients with diabetes, peripheral vascular disorders, valvular disease, pulmonary circulation disorders, liver disease, and malignant disease were less likely to be selected for surgery (results not shown).

Based on the multivariate model, a PS was constructed for each of the 2,202 subjects to quantify each man’s chances of undergoing TJR. The calculated score ranged from 10.5 to 4.8 and had a median of 6.0. The selection for surgery of younger, healthier participants, and those belonging to higher socioeconomic classes can be seen in Figs. 1–3.

Of the 819 TJR cases, 718 (88%) were matched to 1,109 controls who did not undergo TJR based on up to 1:3 matching by PS quintiles and year of admission to hospital. No significant differences were observed in the baseline characteristics of cases who were and were not matched. Matched cases and controls had similar age, Charlson comorbidity index, weight, height, body mass index, years of past exposure to smoking, socioeconomic disadvantage score, and self-reported physical activity. The balancing properties of our constructed PS can be seen in Table 2. Cases and their PS-matched controls had similar baseline characteristics among 29 of 31 covariates (94%) included in the construction of this score. The balance was also shown by the standardized differences.

Two covariates remained different between the cases and their matched controls: cardiac arrhythmia and depression, whose prevalence was higher among the controls. However, no systematic difference in baseline covariates between the cases and their matched controls could be detected. Years of follow-up from baseline (1996–1999) till index admission were also similar in both groups. Nonetheless, 5-year hospitalization following index admission was higher in the controls, who were also more likely to die within 10 years of the index admission (39.5%) compared with 14.5% in the TJR cases (P < 0.001). A multivariate conditional logistic regression that also accounted for level of education, occupation, and hospital type showed that TJR cases had 75% lower odds of dying within 10 years of the index admission compared with their PS-matched controls (odds ratio = 0.25, 95% confidence interval: 0.17, 0.36, P < 0.001).
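As a rough plausibility check on the reported odds ratio, the two 10-year mortality proportions quoted above (14.5% in cases, 39.5% in controls) can be converted into a crude odds ratio and a crude risk ratio. This is simple arithmetic on the published percentages, not a reanalysis: it ignores the matched-set structure and the additional covariates in the conditional logistic model.

```python
def odds(p):
    """Convert a proportion into odds."""
    return p / (1 - p)

p_cases, p_controls = 0.145, 0.395  # 10-year mortality reported in the Results

crude_odds_ratio = odds(p_cases) / odds(p_controls)
crude_risk_ratio = p_cases / p_controls

print(round(crude_odds_ratio, 2), round(crude_risk_ratio, 2))  # prints: 0.26 0.37
```

The crude odds ratio of about 0.26 is close to the adjusted estimate of 0.25. The crude risk ratio of about 0.37 shows that, because death was common in the controls, a 75% reduction in odds corresponds to a smaller relative reduction in risk.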


Table 1. Baseline characteristics included in the calculation of the propensity score by TJR among unmatched initial participants with osteoarthritis who had and did not have a TJR

| Baseline screening (1996–1999) characteristics of participants | With TJR (N = 819) | Without TJR (N = 1,383) | P-value | d |
|---|---|---|---|---|
| Age, mean (SD) | 71.6 (4.2) | 72.6 (4.6) | <0.001 | 0.227 |
| Body weight in kg, mean (SD) | 83.0 (11.5) | 80.2 (12.6) | <0.001 | 0.232 |
| Body height in cm, mean (SD) | 171.6 (6.4) | 170.7 (6.8) | 0.002 | 0.136 |
| Congestive heart failure, % | 3.2 | 8.5 | <0.001 | 0.227 |
| Cardiac arrhythmia, % | 8.9 | 17.3 | <0.001 | 0.251 |
| Valvular disease, % | 1.3 | 5.4 | <0.001 | 0.229 |
| Pulmonary circulation disorders, % | 0.7 | 2.6 | 0.002 | 0.149 |
| Peripheral vascular disease, % | 2.4 | 7.5 | <0.001 | 0.237 |
| Hypertension uncomplicated, % | 23.8 | 32.3 | <0.001 | 0.190 |
| Hypertension complicated, % | 0.9 | 1.7 | 0.1 | 0.071 |
| Paralysis, % | 0.9 | 3.1 | 0.001 | 0.158 |
| Other-than-paralysis neurologic disorders, % | 1.5 | 3.1 | 0.02 | 0.107 |
| Chronic pulmonary disease, % | 7.5 | 17.4 | <0.001 | 0.303 |
| Diabetes without chronic complications, % | 3.4 | 8.5 | <0.001 | 0.217 |
| Diabetes with chronic complications, % | 0.9 | 1.9 | 0.04 | 0.085 |
| Hypothyroidism, % | 1.0 | 1.6 | 0.2 | 0.053 |
| Renal failure, % | 1.0 | 2.2 | 0.03 | 0.096 |
| Liver disease, % | 0.5 | 1.7 | 0.01 | 0.115 |
| Peptic ulcer disease excluding bleeding, % | 6.8 | 12.4 | <0.001 | 0.191 |
| Solid tumor/metastatic cancer/lymphoma/AIDS/HIV, % | 10.1 | 11.6 | 0.3 | 0.048 |
| Rheumatic arthritis/collagen vascular disease, % | 1.6 | 7.1 | <0.001 | 0.272 |
| Coagulopathy, % | 0.9 | 3.2 | <0.001 | 0.163 |
| Fluid and electrolyte disorder, % | 1.6 | 3.0 | 0.04 | 0.093 |
| Chronic blood loss anemia, % | 0.4 | 0.8 | 0.2 | 0.052 |
| Deficiency anemia, % | 1.3 | 2.3 | 0.1 | 0.075 |
| Alcohol or drug substance abuse, % | 2.3 | 4.6 | 0.01 | 0.126 |
| Psychosis, % | 0.4 | 0.5 | 0.6 | 0.015 |
| Depression, % | 1.3 | 4.8 | <0.001 | 0.205 |
| Years of smoking, mean (SD) | 21.9 (19.7) | 25.0 (19.9) | <0.001 | 0.157 |
| SES disadvantage SEIFA score, mean (SD) | 1033.6 (84.8) | 1023.4 (92.2) | 0.01 | 0.115 |
| Self-reported weekly exercise, % | 76.8 | 72.3 | 0.02 | 0.103 |

Abbreviations: TJR, total joint replacement; d, standardized difference; SD, standard deviation; HIV, human immunodeficiency virus; SES, socioeconomic status as defined by the distribution of the Socio-Economic Index For Areas; SEIFA, Socio-Economic Index For Areas disadvantage score.
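The standardized differences in Table 1 can be reproduced from the published summary statistics using the two formulas given in the Methods. The sketch below does so for three rows (age, body weight, congestive heart failure). The function names are illustrative, and taking the absolute value is an assumption based on the table reporting only magnitudes.

```python
from math import sqrt

def std_diff_continuous(mean_t, sd_t, mean_c, sd_c):
    """Absolute standardized difference for a continuous covariate."""
    return abs(mean_t - mean_c) / sqrt((sd_t ** 2 + sd_c ** 2) / 2)

def std_diff_binary(p_t, p_c):
    """Absolute standardized difference for a dichotomous covariate (proportions)."""
    return abs(p_t - p_c) / sqrt((p_t * (1 - p_t) + p_c * (1 - p_c)) / 2)

# Summary statistics taken from Table 1 (with TJR vs. without TJR).
d_age = std_diff_continuous(71.6, 4.2, 72.6, 4.6)
d_weight = std_diff_continuous(83.0, 11.5, 80.2, 12.6)
d_chf = std_diff_binary(0.032, 0.085)

print(round(d_age, 3), round(d_weight, 3), round(d_chf, 3))  # prints: 0.227 0.232 0.227
```

All three values agree with the d column of Table 1 and exceed the 0.1 threshold used to indicate a non-negligible imbalance.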

Fig. 1. Scatter plot of the distributions of the propensity score vs. age.

Fig. 3. Scatter plot of the distributions of the propensity score vs. Socio-Economic Index For Areas disadvantage score.

4. Discussion

Using a population-based sample, in a retrospective cohort analysis, we compared the characteristics of those who had and did not have an elective TJR due to OA and showed that relatively healthier and younger men, and those belonging to higher socioeconomic groups, are more likely to be proposed for this elective procedure. In a nested case–control design that matched TJR cases with non-TJR controls by a PS, we have shown that although those who did and did not have a TJR had similar age, comorbidity, and other baseline characteristics, those who had undergone the elective procedure were less likely to be admitted to hospital and to die within 10 years of index admission. Our findings suggest that selection for this elective procedure may have had prognostic features potentially making one group less or more susceptible to adverse outcomes.

Channeling bias, which is a form of selection bias [20], occurs when patients are assigned to an intervention on the basis of prognostic factors [20–22]. This can arise when, for example, patients with clinical indications for an elective TJR are never proposed for elective surgery because of medical concerns regarding worse outcomes [23–25]. This potential selection for elective surgery may be present if, for instance, factors such as advanced age, psychiatric and mental problems, substance abuse, morbid obesity, or comorbidities exclude patients from undergoing the surgical procedure. Other than age or health-related factors, eligible candidates may never undergo the procedure because of long waiting lists, lack of postoperative assistance, and patient unwillingness to undergo surgery [25,26]. This willingness, which has been shown to predict TJR [26], is often related to patient perceptions regarding indications for, and risks associated with, total hip or knee arthroplasty. If these and other selection criteria have prognostic features that unequally increase or decrease the risk of one group, making one group more or less susceptible to adverse outcomes, these criteria introduce a selection bias that undermines both the internal and external validity of the findings [20,21,27]. For this reason, patients with OA who have an elective TJR may considerably differ from, and may not represent, those with OA in the target population [25,26,28–31].

Such selection can also be observed in other elective procedures, as reported by Ryynanen et al. [2], whose study investigated doctors’ willingness to refer elderly persons (ages 65–85 years) to different elective surgical procedures. These authors showed that physicians were less likely to refer older patients and those with more comorbidities to elective procedures such as hip prosthesis, cataract surgery, and coronary bypass. In a more recent prospective study, McNulty et al. [1] compared the characteristics of patients who underwent coronary artery bypass grafting (CABG) and patients deemed ineligible for CABG who were instead referred to percutaneous coronary intervention. The authors noted that most of the clinical conditions cited as reasons for CABG ineligibility, such as frailty, dementia, or severe aortic calcifications, were not captured by a database that listed all major traditional risk factors for CABG. The authors concluded that the unmeasured clinical conditions identified in the chart review of their 101 patients, as well as other unknown factors that have not been identified, could have contributed to worse outcomes among patients deemed ineligible for CABG surgery. Unlike in the study by McNulty et al., limited data did not allow us to review the clinical charts of our study population. However, our findings indicate a probable selection bias. The worse clinical outcomes observed in the controls, that is, more admissions and more deaths despite baseline characteristics similar to those of their PS-matched cases, suggest the presence of unmeasured confounders that have not been accounted for.

Currently, except in traumatic fractures of the joint, there is little agreement on indications for a total lower limb arthroplasty [23–25,28–30]. The current criteria for elective TJR are based on consensus-style panel approaches among the clinical experts in the field, which may partially explain the geographic variation in TJR rates reported in various countries [32–36]. Analyses of the American Medicare data have shown variations in the rates of TKR and total hip replacement (THR) across the United States, with nearly a threefold difference between the highest and lowest rate states [32]. Inpatient TKR procedure rates per 1,000 Medicare enrollees from 2008 to 2010 varied from 4.3 in San Francisco, California, to 11.9 in Salt Lake City, Utah, with a national average of 8.8. Similar variations were reported for THR, ranging from 2.3 to 5.8. Such geographic variation in TJR rates has also been reported in other countries such as Australia [33], the United Kingdom [34], Finland [35], and Germany [36]. These differences, which often remain even after adjusting for age [36], have been attributed to various factors, including a lack of consensus regarding the indications for these procedures [24,32]. Individuals with end-stage joint disease may suffer from pain, stiffness, and limitation of movement of joints, but they may also present with radiographic evidence of joint disease [37]. However, not all subjects with radiographic confirmation show signs and symptoms of a clinical illness and not all those with joint pain demonstrate radiographic illness [38].
This in turn makes it harder to categorize patients to various levels of urgency in terms of their need for TJR. This uncertainty together with lack of clarity regarding indications for

Fig. 2. Scatter plot of the distributions of the propensity score vs. Charlson comorbidity index.

G. Mnatzaganian et al. / Journal of Clinical Epidemiology

-

(2015)

-

7

Table 2. Baseline characteristics included in the calculation of the propensity score by TJR among PS-matched participants with osteoarthritis who had and did not have a TJR a

Baseline screening (1996e1999) characteristics of participants

With TJR N [ 718

Without TJR N [ 1,109

P-value

d

71.6 (4.1) 81.5 (9.9) 171.9 (6.3) 2.9 8.5 1.2 0.8 2.4 22.8 1.0 0.7 1.5 7.4 3.2 0.9 0.8 1.1 0.5 7.0 9.1 1.7 0.9 1.1 0.4 1.1 2.3 0.4 1.2 21.2 (19.8) 1035.9 (84.8) 76.7

71.5 (4.3) 81.3 (11.3) 172.4 (6.7) 4.6 11.7 2.5 1.0 3.7 24.0 0.7 0.9 1.5 7.8 3.4 0.7 1.4 0.8 0.7 7.4 7.2 2.4 1.2 2.3 0.6 1.1 3.7 0.2 3.2 21.1 (19.7) 1036.1 (84.3) 79.8

0.9 0.7 0.1 0.07 0.03 0.06 0.9 0.1 0.6 0.6 0.6 0.9 0.7 0.8 0.6 0.3 0.5 0.7 0.7 0.2 0.3 0.7 0.07 0.2 0.9 0.1 0.3 0.01 0.9 0.9 0.1

0.005 0.017 0.077 0.089 0.106 0.096 0.021 0.075 0.028 0.032 0.022 0.000 0.015 0.011 0.022 0.057 0.031 0.025 0.015 0.069 0.049 0.029 0.093 0.028 0.003 0.082 0.036 0.137 0.005 0.002 0.075

Age, mean (SD) Body weight in kg, mean (SD) Body height in cm, mean (SD) Congestive heart failure, % Cardiac arrhythmia, % Valvular disease, % Pulmonary circulation disorders, % Peripheral vascular disease, % Hypertension uncomplicated, % Hypertension complicated, % Paralysis, % Other-than-paralysis neurologic disorders, % Chronic pulmonary disease, % Diabetes without chronic complications, % Diabetes with chronic complications, % Hypothyroidism, % Renal failure, % Liver disease, % Peptic ulcer disease excluding bleeding, % Solid tumor/metastatic cancer/lymphoma/AIDS/HIV, % Rheumatic arthritis/collagen vascular disease, % Coagulopathy, % Fluid and electrolyte disorder, % Chronic blood loss anemia, % Deficiency anemia, % Alcohol or drug substance abuse, % Psychosis, % Depression, % Years of smoking, mean (SD) SES disadvantage SEIFA score, mean (SD) Self-reported weekly exercise, %

Abbreviations: TJR, total joint replacement; d, standardized difference; SD, standard deviation; HIV, human immunodeficiency virus; SES, socioeconomic status as defined by the distribution of the Socio-Economic Index For Areas; SEIFA, Socio-Economic Index For Areas disadvantage score. a Matching was based on propensity score quintiles and 6 3-year period of hospital admission for TJR in cases and other-than-TJR admission in controls. Index admission was defined as admission to undergo TJR procedure in cases and other-than-TJR admission reason in controls.

this elective procedure may contribute to the selection process, potentially letting nonclinical factors influence how a clinical decision is made or how a treatment strategy is planned. In such clinical settings, it is not possible to account for all factors that may have swayed the decision for or against surgery. PSs do provide some degree of balance in baseline characteristics between those who have and those who do not have an elective arthroplasty, as also demonstrated in our study. Nonetheless, PSs account only for observed covariates and cannot adjust for unmeasured selection factors. Because PS matching is not equivalent to randomization, comparisons between surgical and nonsurgical patient groups, such as that conducted by Ravi et al. [8], appear less valid.

Strengths of our study include its population-based provenance, its longitudinal design, and the linkage of multiple data sources. For each participant, any morbidity or health-related outcome recorded between 1970 and 2007 was retrieved from the linked data, which enabled us to better account for patient comorbidities. Using a PS enabled us to match on multiple variables. The balancing properties of our constructed PS, which accounted for more than 30 covariates, are well demonstrated.

Nonetheless, our study has limitations. The prevalence of OA in our study was based on the presence of a recorded diagnosis in the hospital electronic-linked records in the period 1970 through 2007. Available data did not permit us to validate the presence of OA in our cohort; however, the detected prevalence of OA in our cohort was 20.2%, similar to that reported for adult Australians [39]. We also had no information on the severity of OA, the clinical indications for TJR, access to surgery, or patient willingness to undergo the procedure.
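The point made above, that a PS accounts only for observed covariates, can be illustrated with a small deterministic example. The sketch below is hypothetical and is not derived from the study data: `x` stands for an observed covariate, `u` for an unobserved prognostic factor, and all probabilities are invented. Treatment is assumed to have no effect on death, yet within each stratum of `x` (and hence of a PS built from `x` alone) the untreated have a higher expected death rate because they carry more of the unobserved risk.

```python
# Hypothetical cohort: 1,000 people in each (x, u) cell.
N_PER_CELL = 1000

# Assumed probability of receiving treatment; people with u = 1 are selected less often.
P_TREAT = {(0, 0): 0.6, (0, 1): 0.2, (1, 0): 0.4, (1, 1): 0.1}


def death_risk(x, u):
    """Assumed death risk; it depends on x and u but not on treatment."""
    return 0.1 + 0.1 * x + 0.3 * u


def stratum_death_rates(x):
    """Expected death rates (treated, untreated) within one stratum of the observed covariate."""
    treated_n = treated_deaths = control_n = control_deaths = 0.0
    for u in (0, 1):
        n_treated = N_PER_CELL * P_TREAT[(x, u)]
        n_control = N_PER_CELL - n_treated
        treated_n += n_treated
        control_n += n_control
        treated_deaths += n_treated * death_risk(x, u)
        control_deaths += n_control * death_risk(x, u)
    return treated_deaths / treated_n, control_deaths / control_n


rates_x0 = stratum_death_rates(0)
rates_x1 = stratum_death_rates(1)
```

Under these assumptions the expected death rates are 17.5% in the treated versus 30.0% in the untreated for `x = 0`, and 26.0% versus 38.0% for `x = 1`, even though the groups are perfectly balanced on `x` and treatment itself changes nothing.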

5. Conclusion

To our knowledge, our analysis is the first to demonstrate selection bias toward offering elective total hip or knee replacement to younger and healthier patients. These findings support the hypothesis that patients selected for an elective procedure such as TJR are likely to be initially healthier than the remaining eligible candidates who are not proposed for surgery. The comparison data between the cases and their PS-matched controls presented in Table 2 have most of the appearances of a randomized trial. However, this study demonstrates that, unlike randomization, PS matching cannot account for hidden selection biases and confounding by unknown factors. Therefore, caution must be exercised when long-term outcomes are compared between surgical and nonsurgical groups in a population at risk for that surgical procedure. More research is required to better understand the pathways for selection of patients for elective procedures such as total hip or knee arthroplasty.

Acknowledgments

The authors acknowledge the use of information provided by the Department of Health of Western Australia (DOHWA) and the use of linkages provided by WA Data Linkage Branch. The authors thank the staff and investigators of the original Western Australian Abdominal Aortic Aneurysm screening trial and the Health In Men Study. Special thanks to Professor Paul Norman for providing the data and for supporting the concept of this analysis.

Supplementary data

Supplementary data related to this article can be found at http://dx.doi.org/10.1016/j.jclinepi.2015.01.002.

References

[1] McNulty EJ, Ng W, Spertus JA, Zaroff JG, Yeh RW, Ren XM, et al. Surgical candidacy and selection biases in nonemergent left main stenting: implications for observational studies. JACC Cardiovasc Interv 2011;4:1020–7.
[2] Ryynanen O-P, Myllykangas M, Kinnunen J, Takala J. Doctors’ willingness to refer elderly patients for elective surgery. Fam Pract 1997;14:216–9.
[3] Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika 1983;70:41–55.
[4] Rosenbaum PR, Rubin DB. Reducing bias in observational studies using subclassification on the propensity score. J Am Stat Assoc 1984;79:516–24.
[5] Guo S, Fraser MW. Propensity score analysis. Statistical methods and applications. California: SAGE Publications; 2010.
[6] Austin PC. Balance diagnostics for comparing the distribution of baseline covariates between treatment groups in propensity-score matched samples. Statist Med 2009;28:3083–107.
[7] Pearl J. Understanding propensity scores. In: Causality: models, reasoning, and inference. 2nd ed. New York: Cambridge University Press; 2009.
[8] Ravi B, Croxford R, Austin PC, Lipscombe L, Bierman AS, Harvey PJ, et al. The relation between total joint arthroplasty and risk for serious cardiovascular events in patients with moderate-severe osteoarthritis: propensity score matched landmark analysis. BMJ 2013;347:f6187.


[9] Mnatzaganian G, Ryan P, Norman PE, Davidson DC, Hiller JE. Smoking, body weight, physical exercise, and risk of lower limb total joint replacement in a population-based cohort of men. Arthritis Rheum 2011;63:2523–30.
[10] Mnatzaganian G, Ryan P, Norman PE, Davidson DC, Hiller JE. Total joint replacement in men: old age, obesity and in-hospital complications. ANZ J Surg 2013;83:376–81.
[11] Norman PE, Flicker L, Almeida OP, Hankey GJ, Hyde Z, Jamrozik K. Cohort profile: The Health In Men Study (HIMS). Int J Epidemiol 2009;38:48–52.
[12] Holman CD, Bass AJ, Rouse IL, Hobbs MS. Population-based linkage of health records in Western Australia: development of a health services research linked database. Aust N Z J Public Health 1999;23:453–9.
[13] Mnatzaganian G, Ryan P, Norman PE, Hiller JE. Accuracy of the hospital morbidity data and performance of co-morbidity scores as predictors of mortality. J Clin Epidemiol 2012;65:107–15.
[14] Australian Bureau of Statistics. SEIFA: Socio-Economic Indexes for Areas. Canberra, Australia: 2011: Available at http://www.abs.gov.au/ausstats/abs@.nsf/mf/2033.0.55.001/. Accessed December 20, 2013.
[15] Elixhauser A, Steiner C, Harris DR, Coffey RM. Comorbidity measures for use with administrative data. Med Care 1998;36:8–27.
[16] Grambsch PM, Therneau TM. Proportional hazards tests and diagnostics based on weighted residuals. Biometrika 1994;81:515–26.
[17] Normand SLT, Landrum MB, Guadagnoli E, Ayanian JZ, Ryan TJ, Cleary PD, et al. Validating recommendations for coronary angiography following acute myocardial infarction in the elderly: a matched analysis using propensity scores. J Clin Epidemiol 2001;54:387–98.
[18] Quan H, Sundararajan V, Halfon P, Fong A, Burnand B, Luthi JC, et al. Coding algorithms for defining comorbidities in ICD-9-CM and ICD-10 administrative data. Med Care 2005;43:1130–9.
[19] Charlson ME, Pompei P, Ales KL, MacKenzie CR. A new method of classifying prognostic comorbidity in longitudinal studies: development and validation. J Chronic Dis 1987;40:373–83.
[20] Pannucci CJ, Wilkins EG. Identifying and avoiding bias in research. Plast Reconstr Surg 2010;126:619–25.
[21] Grimes DA, Schulz KF. Bias and causal associations in observational research. Lancet 2002;359:248–52.
[22] Delgado-Rodriguez M, Llorca J. Bias. J Epidemiol Community Health 2004;58:635–41.
[23] Wright JG, Coyte P, Hawker G, Bombardier C, Cooke D, Heck D, et al. Variation in orthopedic surgeons’ perceptions of the indications for and outcomes of knee replacement. CMAJ 1995;152:687–97.
[24] Mancuso CA, Ranawat CS, Esdaile JM, Johanson NA, Charlson ME. Indications for total hip and total knee arthroplasties. Results of orthopaedic surveys. J Arthroplasty 1996;11:34–46.
[25] Hudak PL, Grassau P, Glazier RH, Hawker G, Kreder H, Coyte P, et al. “Not everyone who needs one is going to get one”: the influence of medical brokering on patient candidacy for total joint arthroplasty. Med Decis Making 2008;28:773–80.
[26] Hawker GA, Guan J, Croxford R, Coyte PC, Glazier RH, Harvey BJ, et al. A prospective population-based study of the predictors of undergoing total joint arthroplasty. Arthritis Rheum 2006;54:3212–20.
[27] Berger VW. Selection bias and covariate imbalances in randomized clinical trials. Chichester: John Wiley & Sons; 2005.
[28] Naylor CD, Williams JI. Primary hip and knee replacement surgery: Ontario criteria for case selection and surgical priority. Qual Health Care 1996;5:20–30.
[29] Dreinhofer KE, Dieppe P, Sturmer T, Grober-Gratz D, Floren M, Gunther KP, et al. Indications for total hip replacement: comparison of assessments of orthopaedic surgeons and referring physicians. Ann Rheum Dis 2006;65:1346–50.

[30] Cross WW, Saleh KJ, Wilt TJ, Kane RL. Agreement about indications for total knee arthroplasty. Clin Orthop Relat Res 2006;446:34–9.
[31] Mnatzaganian G, Ryan P, Reid CM, Davidson DC, Hiller JE. Smoking and primary total hip or knee replacement due to osteoarthritis in 54,288 elderly men and women. BMC Musculoskelet Disord 2013;14:262. http://dx.doi.org/10.1186/1471-2474-14-262.
[32] Arora A, True A. What kind of physician will you be? Variation in health care and its importance for residency training. USA: The Dartmouth Atlas of Health Care; 2012: Available at http://www.dartmouthatlas.org/pages/residency. Accessed July 7, 2013.
[33] Dixon T, Urquhart DM, Berry P, Bhatia K, Wang Y, Graves S, et al. Variation in rates of hip and knee joint replacement in Australia based on socio-economic status, geographical locality, birthplace and indigenous status. ANZ J Surg 2011;81:26–31.
[34] Judge A, Welton NJ, Sandhu J, Ben-Shlomo Y. Geographical variation in the provision of elective primary hip and knee replacement: the role of socio-demographic, hospital and distance variables. J Public Health (Oxf) 2009;31:413–22.


[35] Mäkelä KT, Peltola M, Häkkinen U, Remes V. Geographical variation in incidence of primary total hip arthroplasty: a population-based analysis of 34,642 replacements. Arch Orthop Trauma Surg 2010;130:633–9.
[36] Schäfer T, Pritzkuleit R, Jeszenszky C, Malzahn J, Maier W, Gunther KP, et al. Trends and geographical variation of primary hip and knee joint replacement in Germany. Osteoarthritis Cartilage 2013;21:279–88.
[37] Felson DT, Lawrence RC, Dieppe PA, Hirsch R, Helmick CG, Jordan JM, et al. Osteoarthritis: new insights. Part 1: the disease and its risk factors. Ann Intern Med 2000;133:635–46.
[38] Hannan MT, Felson DT, Pincus T. Analysis of the discordance between radiographic changes and knee pain in osteoarthritis of the knee. J Rheumatol 2000;27:1513–7.
[39] ABS (Australian Bureau of Statistics). National Health Survey: summary of results, 2001. Canberra, Australia: 2001: Available at http://www.abs.gov.au/AUSSTATS/abs@.nsf/Lookup/4364.0Main+Features12001. Accessed July 7, 2013.
