AJPH PERSPECTIVES

Correction of Selection Bias in Survey Data: Is the Statistical Cure Worse Than the Bias?

James A. Hanley, PhD

This article was jointly published in the American Journal of Epidemiology (Am J Epidemiol. 2017;185(6):409–411) and the American Journal of Public Health (Am J Public Health. 2017;107(4):503–505).

In previous articles in the American Journal of Epidemiology (Am J Epidemiol. 2013;177(5):431–442) and the American Journal of Public Health (Am J Public Health. 2013;103(10):1895–1901), Masters et al. reported age-specific hazard ratios for the contrasts in mortality rates between obesity categories. They corrected the observed hazard ratios for selection bias caused by what they postulated was the nonrepresentativeness of the participants in the National Health Interview Survey, which increased with age, obesity, and ill health. However, it is possible that their regression approach to remove the alleged bias has not produced, and in general cannot produce, sensible hazard ratio estimates.

First, we must consider how many nonparticipants there might have been in each category of obesity and of age at entry, and how much higher the mortality rates would have to be in nonparticipants than in participants in these same categories. What plausible set of numerical values would convert the ("biased") decreasing-with-age hazard ratios seen in the data into the ("unbiased") increasing-with-age ratios that they computed? Can these values be encapsulated in (and can sensible values be recovered from) one additional internal variable in a regression model?

Second, one must examine the age pattern of the hazard ratios that have been adjusted for selection. Without the correction, the hazard ratios are attenuated with increasing age. With it, the hazard ratios at older ages are considerably higher, but those at younger ages are well below one.

Third, one must test whether the regression approach suggested by Masters et al. would correct the nonrepresentativeness, increasing with age and ill health, that I introduced into real and hypothetical data sets.
I found that the approach did not recover the hazard ratio patterns present in the unselected data sets: the corrections overshot the target at older ages and undershot it at lower ages. (Am J Public Health. 2017;107:503–505. doi:10.2105/AJPH.2016.303644)

See also Morabia, Szklo, and Vaughan, p. 502.

The Editors of the American Journal of Public Health (AJPH) and the American Journal of Epidemiology (AJE) asked me to comment on hazard ratio calculations in two articles by Masters et al.1,2 At issue are age-specific hazard ratios for the differences in mortality rates between obesity categories. The hazard ratio estimates in the AJE article2 were derived using Cox regression models that allowed for different hazard ratios for different age intervals; those in the subsequent AJPH article1 were obtained using smooth-in-age hazard ratio models. These estimates deserve scrutiny because they were
derived from a statistical approach that the authors claimed corrected the observed hazard ratios for selection bias,3 that is, a distortion caused by what they postulated was the nonrepresentativeness of the participants in the National Health Interview Survey (NHIS) that increased with age, obesity, and ill health. In the AJPH article, the authors used these corrected hazard ratios to arrive at population attributable fractions (a secondary topic that is addressed in Appendices 1 and 2, available as supplements to the online version of this article at http://www.ajph.org).

The way Masters et al. arrived at the age-specific hazard ratios was addressed in their response to a letter to the editor in the AJE4 and was also addressed in an unanswered letter to the editor in the AJPH.5 Appendix 3 and Figure A (available as supplements to the online version of this article at http://www.ajph.org) show my simple reanalyses of some of the NHIS data and the age-specific hazard ratios (which were very similar to those of Masters et al.) that I obtained using their correction for selection bias.

However, it is possible that the way they removed the alleged bias has not produced, and in general cannot produce, sensible hazard ratio estimates. Masters et al. postulated that the nonrepresentativeness of the NHIS participants, which increased with age and ill health, distorted the age-specific mortality gradients across the body mass index (BMI) categories: participants who were older at entry and in the higher-risk BMI categories would have been healthier than their same-age counterparts who did not participate.

Ideally, to directly and accurately adjust the observed age-specific hazard ratios for this discrepancy, one would (1) add appropriate (specific to age at entry) numbers of nonparticipants to each of the BMI categories, (2) assign them higher mortality rates than those seen in persons in the

ABOUT THE AUTHOR

James A. Hanley is with the Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montréal, Québec, Canada. Correspondence should be sent to James A. Hanley, McGill University, 1020 Pine Avenue West, Montréal, Québec H3A 1A2, Canada (e-mail: [email protected]). Reprints can be ordered at http://www.ajph.org by clicking the "Reprints" link. This commentary was accepted December 20, 2016. doi: 10.2105/AJPH.2016.303644
FIGURE 1—Attempt to Recover the True Hazard Ratio Pattern After Some Data Have Been Selectively Removed

Note. Panel a: In a purely mathematical population, the true hazard ratios for exposed versus not exposed participants are approximately 1.1 at the lower end of the age category and 2.5 at the upper age (black line). From this population, 19 sample waves were simulated in which, increasingly with age at potential selection, exposed persons who did participate were less likely to die in the next six years than were their sampled exposed peers who did not participate. This attenuated the observed hazard ratio curve (red line). Panel b: The selectivity that increases with age excludes some of the exposed persons who are so ill that they will die within six years. For example, the six red dots beginning at age 65 years (a potential age at entry) indicate what fractions of these frail 65-year-old individuals who die in the next six years were excluded from the survey wave. Applied to these selective data, the approach proposed by Masters et al. (panel a, blue line) was not able to recover the hazard pattern present in the unselected data.
same category who did participate, and (3) calculate a new (higher) rate for each category. It is difficult to see how estimates of these additional (age-at-entry–specific sets of) quantities implicated in the selection bias can be extracted from the NHIS data simply by using a regression model.

Masters et al. claim to have removed the bias merely by adding a centered version of age at entry as an effect modifier in their regression model and then (after fitting) setting it to that central value (see Appendix 3 for details). Epidemiologists often set a known confounder to a typical central value and use a statistical model to standardize the comparison of interest. In the present case, Masters et al. relied on a piece of information that was recorded only for the participants to somehow fill in (impute) some never-estimated number of participants who were selectively missing. This new statistical cure for selection bias appears to be too simple to be true.

In this commentary, I focus on three concerns.
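The arithmetic of this ideal three-step adjustment can be sketched for a single cell. Every number below (cell sizes, rates, the nonparticipant rate multiplier) is hypothetical, chosen only to illustrate the weighted-average calculation, not an estimate from the NHIS:

```python
# Hypothetical ideal adjustment for one BMI category at one age at entry.
# All numbers are illustrative; none are estimates from the NHIS.

rate_participants = 0.020    # observed annual mortality rate in participants
n_participants = 850         # participants in this BMI-by-age cell
n_nonparticipants = 150      # step 1: add the (unknown) nonparticipants
rate_multiplier = 1.5        # step 2: assume nonparticipants die 1.5x as fast

rate_nonparticipants = rate_multiplier * rate_participants

# Step 3: recompute the category rate as a person-weighted average.
corrected_rate = (n_participants * rate_participants
                  + n_nonparticipants * rate_nonparticipants) \
                 / (n_participants + n_nonparticipants)

print(round(corrected_rate, 4))  # 0.0215, only 7.5% above the observed rate
```

Even with 15% nonparticipation and a 50% higher rate among nonparticipants, the corrected rate moves only modestly, which is why the required inputs to any correction deserve explicit scrutiny.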
The first arises when one considers the plausible values to be used in the ideal adjustment described above; this sensitivity analysis would provide bounds for the correct age-specific hazard ratio estimates. For example, might one plausibly double the hazard ratios for those missing from the higher-risk BMI categories? It is unlikely that the majority of the 10% to 15% who chose not to participate in the 19 NHIS waves6 were in these older and higher-risk categories. Moreover, not all of the missing minority are necessarily at higher risk than those in the same category who did participate. With all of these constraints, can an uncorrected hazard ratio of 1.2 at, for example, 80 years of age realistically become a corrected hazard ratio of 2.4?

A second and very concrete concern is the age pattern of the "adjusted-for-selection" hazard ratios shown in Figure 2 of the article by Masters et al. in the AJE2 and Figure 2 of the article by Masters et al. in the AJPH.1
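To put rough numbers on the first concern: under the weighted-average logic of the ideal adjustment, one can solve for the mortality multiplier that the missing would need for an observed 1.2 to become 2.4. The participation fraction and the assumption of an unbiased reference category are mine, for illustration only:

```python
# How much higher would nonparticipants' mortality need to be for the
# observed hazard ratio of 1.2 at age 80 to become a corrected 2.4?
# Hypothetical back-of-envelope: assumes the reference (normal-weight)
# category is unbiased and that fully 15% of the higher-risk category
# did not participate.

p = 0.85                                 # participation fraction
hr_observed, hr_corrected = 1.2, 2.4
inflation = hr_corrected / hr_observed   # corrected rate / observed rate

# corrected/observed = p + (1 - p) * m; solve for the multiplier m:
m = (inflation - p) / (1 - p)
print(round(m, 2))  # 7.67
```

Under these generous assumptions, the 15% of nonparticipants would need roughly 7.7 times the mortality of their participating counterparts, which is hard to reconcile with the constraints just listed.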

Without their correction, the hazard ratios are attenuated with age. With it, the hazard ratios at older ages are considerably higher, but those at younger ages are well below one. The media coverage of these articles missed an important implication for persons younger than 50 or even 55 years of age: all other things being equal, should life insurance premiums be lower for those in an obese BMI category than for those in the normal-weight category?

The third, and equally worrying, concern stems from two simulations in which the regression approach of Masters et al. did not fix the nonrepresentativeness, increasing with age and ill health, that I deliberately introduced into samples from known populations. In one scenario, the correct hazard ratios were higher at older ages; in the other, they were lower at older ages. The simulations and results are described in detail in Appendix 4, and the R code is provided in Appendix 5 (available as supplements to the online version of this article at http://www.ajph.org).

The simulation for hazard ratios that are higher at older ages, shown in black in Figure 1, is based on a purely mathematical population in which the true hazard ratios for exposed versus not exposed are approximately 1.1 at the lower end of the age category and 2.5 at the upper end. From it, I simulated 19 sample waves in which (increasingly with age at selection) exposed persons who did participate were less likely to die in the next six years than were their sampled exposed peers who did not participate, thereby attenuating the observed hazard ratio curve. Applied to these selective data, the approach proposed by Masters et al. was not able to recover the hazard ratio pattern present in the unselected data: the corrected age-specific hazard ratios overshot the target at older ages and undershot it at lower ages.
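The attenuation mechanism in this first simulation can be illustrated deterministically. The numbers below are my own simplified choices, not the Appendix 4/5 setup: excluding, increasingly with age, a fraction of the exposed who are destined to die within six years lowers their observed risk and pulls the risk ratio toward one.

```python
# Simplified, deterministic illustration of selection-induced attenuation.
# All inputs are hypothetical; the actual simulation code is in Appendix 5.

def observed_risk(true_risk, excluded_frac):
    """Six-year death risk among exposed participants when a fraction
    `excluded_frac` of those destined to die is kept out of the sample."""
    deaths_seen = (1 - excluded_frac) * true_risk
    people_seen = 1 - excluded_frac * true_risk
    return deaths_seen / people_seen

for age in (50, 60, 70, 80):
    q_unexposed = 0.03 * 2 ** ((age - 50) / 10)   # risk doubles per decade
    true_rr = 1.1 + 1.4 * (age - 50) / 30         # rises from 1.1 to 2.5
    q_exposed = true_rr * q_unexposed
    f = 0.5 * (age - 50) / 30                     # selectivity grows with age
    obs_rr = observed_risk(q_exposed, f) / q_unexposed  # unexposed fully seen
    print(f"age {age}: true RR {true_rr:.2f}, observed RR {obs_rr:.2f}")
```

With these illustrative numbers, the observed ratio at age 80 falls from a true 2.5 to about 1.8, mimicking the attenuation of the red curve in Figure 1.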
The second simulation was based on hazard ratio patterns seen in actual populations, in which the hazard ratio for cohort-type mortality rates in men versus women is typically greater than two at age 60 years, 1.5 at age 80 years, and near one at age 100 years (see Figures B and C and Appendix 4, available as supplements to the online version of this article at http://www.ajph.org). Using the published cohort mortality rates for Finland, I simulated 19 sample waves in which, increasingly with age at selection, men who did participate were less
likely to die in the next five years than were their sampled male peers who did not participate. Again, applied to these selective data, the approach of Masters et al. was not able to recover the "lower-at-older-ages" hazard ratio pattern present in the full data (Figure C and Appendix 4). The greater the selectivity I introduced, the more the corrected age-specific male-to-female hazard ratios overshot the target at older ages and undershot it at lower ages.

Appendices 1, 2, and 4 contain several implicit pleas to authors. I end with two explicit ones: that new ways to directly correct heretofore uncorrectable biases first be tested on simulated data generated by known parameter values, and that reported model-based corrections not be so drastic that (as with the life insurance premiums) they seemingly correct the problem at one end of the age scale by creating one at the other end. I also make a plea to editors: instead of asking authors to report what software (and what version) they used to prepare the data and derive the reported results, might they ensure instead that the computer code used is publicly available?

REFERENCES
1. Masters RK, Reither EN, Powers DA, Yang YC, Burger AE, Link BG. The impact of obesity on US mortality levels: the importance of age and cohort factors in population estimates. Am J Public Health. 2013;103(10):1895–1901.
2. Masters RK, Powers DA, Link BG. Obesity and US mortality risk over the adult life course. Am J Epidemiol. 2013;177(5):431–442.
3. Kleinbaum DG, Kupper LL, Morgenstern H. Selection bias. In: Epidemiologic Research: Principles and Quantitative Methods. Belmont, CA: Lifetime Learning Publications; 1982:194–219 (chapter 11).
4. Masters RK, Powers DA, Link BG. The authors reply [letter]. Am J Epidemiol. 2014;179(4):530–532.
5. Wang Z, Liu M. Obesity-mortality association with age: wrong conclusion based on calculation error [letter]. Am J Public Health. 2014;104(7):e3–e4.
6. Galea S, Tracy M. Participation rates in epidemiologic studies. Ann Epidemiol. 2007;17(9):643–653.

Masters et al. Respond

Ryan K. Masters, PhD, Daniel A. Powers, PhD, Eric N. Reither, PhD, Y. Claire Yang, PhD, and Bruce G. Link, PhD

This article was jointly published in the American Journal of Epidemiology (Am J Epidemiol. 2017;185(6):412–413) and the AJPH (Am J Public Health. 2017;107(4):505–506).

A valuable role that journal editors can play is to create a level playing field where debates can be aired and the scientific merits of the issues can be judged by the full scientific community. It is rare for editors to bypass this step and render their own judgments, which is what happened in this case. Editors of the AJPH and the American Journal of Epidemiology have decided our approach is seriously flawed.1,2 This decision was based largely on Dr. Hanley's assessment,3,4 which they solicited and which was neither externally nor anonymously peer reviewed. The Editors have allowed us only 600 words to defend our peer-reviewed articles5,6 in response to their editorial, Dr. Hanley's solicited commentary, and his extensive Appendices. We do not have space here to fully describe our position in response to these statements, and we therefore invite interested readers to examine our responses in our Appendix (available as a supplement to the online version of this article at http://www.ajph.org).

Our articles represented earnest efforts to address selection biases in survey-based estimates of the obesity–mortality association. We proposed an alternative to the standard method because that method completely ignores these biases. Dr. Hanley's simulations convinced him that our approach creates more problems than it
solves. We respect Dr. Hanley's effort and acknowledge that it is an important part of the scientific enterprise: a strong and well-reasoned challenge.

In response to Hanley's challenge, we wrote an evidence-based rebuttal that the Editors of the AJPH (then the American Journal of Public Health) declined to publish. The essence of our response was that Hanley's simulation assumptions inaccurately reflected the full scale of the selection biases that affect the obesity–mortality association in data from the National Health Interview Survey (NHIS).

Nonetheless, we refitted our survival models, taking into account Dr. Hanley's concerns. Results from these new analyses were consistent with those from our original articles, namely, that apparent age-related declines in the obesity–mortality association strongly reflect selection bias.

Furthermore, we showed that the approach used in our articles corrected this bias in NHIS data and provided accurate estimates of true male–female mortality hazard ratios in official US mortality data. We did this to counter Dr. Hanley's test of our approach, in which he used known male–female hazard ratios but simulated a selection pattern that was not observed in the NHIS or the National Health and Nutrition Examination Surveys (NHANES).

Taken together, our analyses show that (1) Hanley's simulation bears little resemblance to real survey data and (2) our approach provides accurate estimates of known hazard ratios using data from the NHIS and NHANES. For our complete response, please see the Appendix.

For us, the most critical issue remains the strong likelihood that uncorrected survey

ABOUT THE AUTHORS

Ryan K. Masters is with the Department of Sociology, University of Colorado Boulder, Boulder. Daniel A. Powers is with the Department of Sociology, College of Liberal Arts, University of Texas at Austin, Austin. Eric N. Reither is with the Department of Sociology, Social Work, and Anthropology, Utah State University, Logan. Y. Claire Yang is with the Department of Sociology and Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill. Bruce G. Link is with the Department of Sociology and Public Policy, University of California, Riverside. Correspondence should be sent to Ryan K. Masters, Department of Sociology, UCB 327, Ketchum Hall 264, University of Colorado Boulder, Boulder, CO 80309 (e-mail: [email protected]). Reprints can be ordered at http://www.ajph.org by clicking the "Reprints" link. This editorial was accepted January 17, 2017. doi: 10.2105/AJPH.2017.303715