Letters

Intraclass Correlation Coefficient for Grouped Data

This work was supported by the Ministry of Science, Education, and Sports of the Republic of Croatia, grant no. 022-0222411-2410 (project "Allergotoxic effects of general and occupational environment"; project leader: Jelena Macan). The authors have no conflicts to report. Supplemental digital content is available through direct URL citations in the HTML and PDF versions of this article (www.epidem.com). This content is not peer-reviewed or copy-edited; it is the sole responsibility of the author.


To the Editors: In questionnaires applied in epidemiologic surveys, respondents often answer questions about continuous variables in terms of a few predefined categories. Examples of continuous variables commonly treated as categorized (grouped) data are yearly household income, frequency of food intake during 1 week, and hours of intensive physical activity per week. The reproducibility of grouped data is usually estimated by the intraclass correlation coefficient (ICC)1 calculated on the midpoints of the predefined categories, or by weighted kappa.2 Both methods, however, depend on the choice of categories (cut-off points and their number) and seem to underestimate the continuous-data ICC.3–5 This behavior complicates their interpretation and hinders comparison of questionnaires with differently defined categories. To our knowledge, maximum likelihood estimation of the ICC has not been suggested in reproducibility studies with grouped data. We compare it with the midpoint ICC on simulated datasets and on a real-life example, food frequency questionnaire (FFQ) data.

Our simulations mimicked situations in which 1000 respondents answer a question with 5 predefined categories on 2 occasions. We performed 1000 simulations for each of 99 ICC values (0.01–0.99).

To investigate the influence of the number of categories, we further experimented with 3, 5, 10, 25, and 50 categories. These experiments included 1 low (0.2), 1 medium (0.5), and 1 high (0.8) ICC value (1000 simulations for each experiment). We separately analyzed cases of equal and unequal category widths. Data were simulated according to the one-way random-effects model and grouped into the predefined categories afterwards. The detailed methodology of the simulations and the derivation of the log-likelihood for grouped data are given in the eAppendix (http://links.lww.com/EDE/A807); a brief sketch follows.
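For orientation only, a sketch of the idea in our own notation (the full derivation is in the eAppendix): under the one-way random-effects model, a respondent's two measurements are bivariate normal with correlation equal to the ICC, and grouping means that only the category interval containing each measurement is observed, so each respondent contributes a rectangle probability to the likelihood.

```latex
% One-way random-effects model for test-retest data (notation ours):
%   X_{ij} = \mu + a_i + e_{ij}, \quad a_i \sim N(0,\sigma_a^2), \quad e_{ij} \sim N(0,\sigma_e^2),
% so that
\[
\mathrm{ICC} = \frac{\sigma_a^2}{\sigma_a^2 + \sigma_e^2} .
\]
% With grouped data, only the category interval (l_{ij}, u_{ij}] containing X_{ij}
% is observed; each respondent contributes a bivariate-normal rectangle probability:
\[
\ell\!\left(\mu, \sigma_a^2, \sigma_e^2\right)
  = \sum_{i=1}^{n} \log \Pr\!\left( l_{i1} < X_{i1} \le u_{i1},\;
                                    l_{i2} < X_{i2} \le u_{i2} \right),
\]
% where (X_{i1}, X_{i2}) has common mean \mu, common variance \sigma_a^2 + \sigma_e^2,
% and correlation ICC. Maximizing \ell over the three parameters yields the
% maximum likelihood ICC.
```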

The maximum likelihood estimator showed low bias, with a median value of 0.001 (range, 0.000–0.011) when category widths were equal and 0.002 (0.000–0.017) when category widths were unequal (simulations with 1000 respondents and 5 categories; eTable 1 and eFigure 1, http://links.lww.com/EDE/A807). In contrast, the midpoint ICC underestimated the ICC by a median value of 0.067 (0.002–0.111) when category widths were equal and 0.133 (0.002–0.182) otherwise.

The Figure shows the results of the simulations with different numbers of categories (described in eTables 2–4, http://links.lww.com/EDE/A807). The maximum likelihood ICC was unaffected by the number of categories and the choice of cut-off points: its mean estimates for data with the same underlying ICC value differed by at most 0.005. In comparison, midpoint estimates for data with the same ICC value differed by up to 0.24. The midpoint method underestimated the ICC more when the number of categories was lower. Furthermore, its bias was higher when the ICC was higher and the category widths were unequal. Its bias was low, mainly below 0.01, only when the number of categories was large (25 or more). The FFQ example showed a similar pattern, with maximum likelihood estimates on average 0.09 higher than midpoint estimates (eTable 5, http://links.lww.com/EDE/A807).
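To make the midpoint bias concrete, here is a minimal R sketch. It is ours, not the simulation code behind the tables; it assumes equal-width categories spanning the observed range and the one-way Shrout–Fleiss ICC(1,1) formula from reference 1.

```r
## Illustrative sketch (ours): one simulated test-retest dataset under the
## one-way random-effects model, then ICC on raw values vs. category midpoints.
set.seed(1)
n   <- 1000                                   # respondents
icc <- 0.8                                    # true ICC; total variance fixed at 1
subject <- rnorm(n, 0, sqrt(icc))             # between-subject effects
x1 <- subject + rnorm(n, 0, sqrt(1 - icc))    # occasion 1
x2 <- subject + rnorm(n, 0, sqrt(1 - icc))    # occasion 2

## Group both measurements into 5 equal-width categories (assumed cut-offs)
## and replace each response by its category midpoint.
cuts <- seq(min(c(x1, x2)), max(c(x1, x2)), length.out = 6)
mids <- (head(cuts, -1) + tail(cuts, -1)) / 2
m1 <- mids[cut(x1, cuts, include.lowest = TRUE, labels = FALSE)]
m2 <- mids[cut(x2, cuts, include.lowest = TRUE, labels = FALSE)]

## One-way ANOVA ICC, ie, Shrout-Fleiss ICC(1,1), for 2 occasions.
icc_oneway <- function(y1, y2) {
  k <- 2; n <- length(y1)
  msb <- k * var((y1 + y2) / 2)               # between-subject mean square
  msw <- sum((y1 - y2)^2) / (2 * n)           # within-subject mean square
  (msb - msw) / (msb + (k - 1) * msw)
}
icc_oneway(x1, x2)  # close to the true value of 0.8
icc_oneway(m1, m2)  # midpoint ICC: typically noticeably lower
```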


FIGURE.  Impact of the number of categories on the estimators. For each of the 3 ICC values and a fixed number of categories, the plot shows the means of ICCMID and ICCMLE over 1000 simulations with equal category widths and 1000 simulations with unequal category widths. The true ICC value is shown as a horizontal line at 0.2, 0.5, and 0.8. ICC indicates intraclass correlation coefficient; ICCMID, ICC calculated on categories' midpoints; ICCMLE, maximum likelihood estimator.

Although the bias of the midpoint ICC and its dependence on the number of categories have been noted previously, maximum likelihood estimation has not been used in reproducibility studies with grouped data, probably because of the lack of user-friendly software. We therefore provide the R package iRepro (available from http://www.imi.hr/~jkovacic/irepro.html, including installation and usage guidelines).

To conclude, researchers should be aware of the bias related to the commonly used midpoint approach when estimating the ICC from grouped continuous data. The maximum likelihood estimator is a better choice, as it showed almost no bias under all tested conditions. Furthermore, it was unaffected by the choice of categories. This enables comparison of questionnaires with different grouping schemes, including questionnaires with data that are not grouped (ie, continuous data). Unless the number of categories is large, such comparisons based on the midpoint method should be avoided.

ACKNOWLEDGMENT
We thank Jelena Macan for her valuable comments.

Jelena Kovačić
Veda Marija Varnai
Institute for Medical Research and Occupational Health
Zagreb, Croatia
[email protected]

REFERENCES
1. Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull. 1979;86:420–428.
2. Cohen J. Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit. Psychol Bull. 1968;70:213–220.
3. Maclure M, Willett WC. Misinterpretation and misuse of the kappa statistic. Am J Epidemiol. 1987;126:161–169.
4. Brenner H, Kliebsch U. Dependence of weighted kappa coefficients on the number of categories. Epidemiology. 1996;7:199–202.
5. de Mast J, van Wieringen W. Measurement system analysis for bounded ordinal data. Qual Reliab Eng Int. 2004;20:383–395.


Quantitative Falsification of Instrumental Variables Assumption Using Balance Measures

To the Editor: Instrumental variable analysis has been used to control for unmeasured confounding in nonrandomized studies.1–4

An instrumental variable (1) is associated with the exposure, (2) affects the outcome only through the exposure, and (3) is independent of confounders.1–4 If these key assumptions are satisfied (together with additional assumptions, such as homogeneity),1,3,4 instrumental variable analysis can consistently estimate the average causal effect of the exposure.1,4 However, if one of the assumptions is violated, the estimate can be severely biased.1,3,4 Several methods are available for checking the first assumption,2,4 but there is no well-established method for checking the second and third assumptions. Some authors1,3 have argued that these assumptions are untestable, as they involve unmeasured confounding. Glymour et al5 suggested several approaches (eg, leveraging prior causal assumptions) for evaluating the validity of an instrumental variable, although in certain situations these might fail to identify a biased instrumental variable or might inappropriately suggest that a valid instrumental variable is biased. In addition, balance of measured confounders between instrumental variable categories has been used as supportive evidence for the third assumption.2,6 Conversely, an imbalance of measured confounders can falsify this assumption.

We propose the standardized difference (SDif), a robust balance measure used in propensity score methods,7,8 to falsify the third assumption by checking independence between an instrumental variable and measured confounders; its form is sketched below.
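For reference, the SDif in its usual propensity-score form7,8 for a continuous confounder X and a binary instrumental variable Z (the notation here is ours):

```latex
% Standardized difference of a continuous confounder X between the
% Z = 1 and Z = 0 instrumental variable categories:
\[
\mathrm{SDif}
  = \frac{\bar{x}_1 - \bar{x}_0}
         {\sqrt{\left(s_1^2 + s_0^2\right)/2}},
\]
% where \bar{x}_z and s_z^2 denote the sample mean and variance of X within
% instrumental variable category z. Values near zero indicate balance;
% |SDif| > 0.10 is a common rule of thumb for meaningful imbalance.
```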

If measured confounders are insufficiently balanced between instrumental variable categories, indicated by SDif values deviating from zero (eg, >0.10),7 this may also imply an imbalance of unmeasured confounders, even after conditioning on measured confounders (depending on the associations among the instrumental variable and the measured and unmeasured confounders). In that case, the third assumption is violated; hence, (un)adjusted instrumental variable analysis is inappropriate. However, if measured confounders are balanced, investigators should rely on background knowledge to argue that such balance carries over to unmeasured confounders.2,6

In a simulation study, we assessed the performance of SDif in quantitatively falsifying the third assumption. In addition, we applied this measure in an empirical study of the relation between β2-agonist use and myocardial infarction, using physician preference as an instrumental variable. For details, we refer to the eAppendix (http://links.lww.com/EDE/A815). Key findings are summarized below.

Data were generated with a binary instrumental variable and exposure, continuous confounders (3 measured and 1 unmeasured), and a continuous outcome, based on the causal diagram shown in the Figure (panel A). SDif was calculated for the measured confounders. Panel B shows the results of instrumental variable analysis without adjustment for measured confounders. The magnitude of bias in the instrumental variable estimate increased with decreasing balance of measured confounders between instrumental variable categories (eg, for an instrumental variable that was independent of the unmeasured confounder, the bias ranged from 0.0 to 6.3 for corresponding SDif values of 0.05–0.60). When the instrumental variable was independent of the measured confounders but associated with the unmeasured confounder, instrumental variable estimates were biased even though the SDif was close to zero.
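A minimal R sketch of the mechanism, not the authors' simulation (the design, coefficients, and variable names here are hypothetical and only loosely mirror the setup described above): an instrument Z that is associated with a confounder violates the third assumption, and the unadjusted Wald estimate drifts from the true effect while SDif flags the imbalance.

```r
## Hypothetical sketch (ours): instrument Z linked to a measured confounder
## violates assumption 3; SDif detects the imbalance and the Wald estimate
## is biased away from the true exposure effect of 1.
set.seed(1)
n <- 1e5
c_meas <- rnorm(n)                            # measured confounder
u      <- rnorm(n)                            # unmeasured confounder
z <- rbinom(n, 1, plogis(0.4 * c_meas))       # instrument associated with c_meas
x <- rbinom(n, 1, plogis(1.5 * z + 0.5 * c_meas + 0.5 * u))  # binary exposure
y <- 1.0 * x + 0.5 * c_meas + 0.5 * u + rnorm(n)             # true effect = 1

## Standardized difference of the measured confounder across Z categories.
sdif <- (mean(c_meas[z == 1]) - mean(c_meas[z == 0])) /
        sqrt((var(c_meas[z == 1]) + var(c_meas[z == 0])) / 2)

## Unadjusted Wald instrumental variable estimator.
iv_est <- (mean(y[z == 1]) - mean(y[z == 0])) /
          (mean(x[z == 1]) - mean(x[z == 0]))

round(c(SDif = sdif, IV.estimate = iv_est), 2)
## SDif clearly exceeds 0.10, and the IV estimate is biased away from 1.
```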

M.S.A. and M.J.U. contributed equally to this work and are listed in alphabetical order. The research leading to these results was conducted as part of the PROTECT consortium (Pharmacoepidemiological Research on Outcomes of Therapeutics by a European ConsorTium), which is a public-private partnership coordinated by the European Medicines Agency. The PROTECT project is supported by the Innovative Medicines Initiative Joint Undertaking (www.imi.europa.eu) under Grant Agreement no. 115004, resources of which are composed of a financial contribution from the European Union's Seventh Framework Programme (FP7/2007-2013) and EFPIA companies' in-kind contribution. In the context of the IMI Joint Undertaking (IMI JU), the Department of Pharmacoepidemiology, Utrecht University, also received a direct financial contribution from Pfizer. The views expressed are those of the authors only and not of their respective institutions or companies. O.H.K. has received unrestricted funding for pharmacoepidemiological research from the Dutch private-public funded Top Institute Pharma (TI Pharma Grant T6.101 Mondriaan). Supplemental digital content is available through direct URL citations in the HTML and PDF versions of this article (www.epidem.com). This content is not peer-reviewed or copy-edited; it is the sole responsibility of the author.





