Psychology & Health

ISSN: 0887-0446 (Print) 1476-8321 (Online) Journal homepage: http://www.tandfonline.com/loi/gpsh20

Panning for the gold in health research: Incorporating studies’ methodological quality in meta-analysis

Blair T. Johnson, Robert E. Low & Hayley V. MacDonald

To cite this article: Blair T. Johnson, Robert E. Low & Hayley V. MacDonald (2015) Panning for the gold in health research: Incorporating studies’ methodological quality in meta-analysis, Psychology & Health, 30:1, 135-152, DOI: 10.1080/08870446.2014.953533

To link to this article: http://dx.doi.org/10.1080/08870446.2014.953533


Accepted author version posted online: 19 Aug 2014. Published online: 27 Oct 2014.



Panning for the gold in health research: Incorporating studies’ methodological quality in meta-analysis Blair T. Johnsona*, Robert E. Lowa,b and Hayley V. MacDonaldb,c


aDepartment of Psychology, Center for Health, Intervention, & Prevention, University of Connecticut, Storrs, CT, USA; bCenter for Health, Intervention & Prevention, University of Connecticut, Storrs, CT, USA; cDepartment of Kinesiology, University of Connecticut, Storrs, CT, USA

(Received 30 August 2013; accepted 6 June 2014)

Background: Systematic reviews now routinely assess methodological quality to gauge the validity of the included studies and of the synthesis as a whole. Although trends from higher quality studies should be clearer, it is uncertain how often meta-analyses incorporate methodological quality in models of study results either as predictors or, more interestingly, in interactions with theoretical moderators. Objective and methods: We survey 200 meta-analyses in three health promotion domains to examine when and how meta-analyses incorporate methodological quality. Results: Although methodological quality assessments commonly appear in contemporary meta-analyses (usually as scales), they are rarely incorporated in analyses, and still more rarely analysed in interaction with theoretical determinants of the success of health promotions. The few meta-analyses (2.5%) that did include such an interaction analysis showed that moderator results either remained significant in higher quality studies or were present only among higher quality studies. We describe how to model quality interactively with theoretically derived moderators and discuss strengths and weaknesses of this approach, both in itself and in relation to current meta-analytic practice. Conclusions: In large literatures exhibiting heterogeneous effects, meta-analyses can incorporate methodological quality and generate conclusions that enable greater confidence not only about the substantive phenomenon but also about the role that methodological quality itself plays.

Keywords: Cochrane Collaboration; data quality; health promotion; meta-analysis; meta-regression; methodological studies; scoring methods

Research on important issues relevant to human health has veritably exploded in recent decades. Although a wealth of data might promise ever-clearer answers, studies often fail to replicate each other’s findings, and meta-analyses of these findings exhibit statistically significant heterogeneity. In addition, the increasing volume of literature and variety of methods is usually much more than can be accurately and efficiently summarised with narrative review strategies. In response, scholars have increasingly undertaken systematic reviews of all available qualifying evidence, harnessing the law of large numbers to reduce the play of chance. Simultaneously, there is increasing sophistication in the standards for the meta-analyses used to aggregate study results.

*Corresponding author. Email: [email protected]
© 2014 Taylor & Francis



One purpose of the current article is to focus on how much and how well health-related meta-analyses have incorporated the methodological quality (MQ) of the studies they review. The remainder of this introduction briefly overviews the history of scholarship on this question. In order to document what strategies reviews use to ensure that higher quality evidence is given greater credence, we survey three domains where meta-analyses have been published frequently: (a) behavioural interventions to reduce risk of human immunodeficiency virus (HIV) infection; (b) exercise interventions to improve mental health; and (c) exercise interventions to lower blood pressure. A second purpose of this article is to elaborate a view that, under optimal circumstances, MQ should be incorporated interactively in meta-analytic modelling as a systematic sensitivity analysis. If MQ indeed corresponds to less error, then we should see clearer patterns emerging from higher quality studies and murkier patterns in lower quality studies. For example, dose-response functions should be most visible for higher quality studies. In short, higher study quality should maximise the signal-to-noise ratio. After illustrating such results, we conclude by enumerating potential strengths and weaknesses of considering MQ in this fashion.

The role of methodological quality in systematic reviewing

Fundamentally, scientists strive to maximise validity and minimise bias – to use tools that lead to the truth, and to avoid systematic error. Researchers have learned to anticipate certain sources of bias, but threats to validity are multitudinous, often stealthy and specific to particular domains. Conceptually, the measurement of study quality helps to remedy this problem by gauging the confidence a reviewer may have in the findings of a given study, and by extension, the findings of the resultant review (Shadish, 1989).
Although experts recommend different processes for measuring ‘quality,’ they consistently invoke the conceptual framework found in classic sources on the subject: Higher study quality connotes lower bias and higher validity (Cook & Campbell, 1976; Shadish, Cook, & Campbell, 2002). MQ scales have proliferated in response, aiming to quantify sources of bias in a study and construct a standard of comparison by which to make relative judgements. Deeks and colleagues’ (2003) survey located nearly 200 instruments and, based on qualitative comparisons, recommended two as particularly comprehensive: the Downs and Black Checklist (1998) and the Newcastle–Ottawa Scale (Wells et al., 2000). Published standards and best-practice recommendations for systematic reviews increasingly emphasise that gauging MQ of the sampled studies is indispensable. For example, the popular AMSTAR scale incorporates two items evaluating whether the quality of studies included was assessed (Shea et al., 2007). Thus, reviewers must determine whether quality affects the review’s inferences. Unfortunately, as Valentine (2009) concluded, ‘in most areas of inquiry, we simply do not know enough to be able to predict the direction of bias associated with any quality indicator’ (p. 138). He also noted that MQ scales ‘generally lack operational specificity and are based on criteria that have no empirical support’ (p. 137). At present, therefore, the validity of MQ scales themselves is still in question, and leaves reviewers in a quandary as to courses of action. Advice on the role of study quality in meta-analysis methods is often conditional, controversial and inconsistent (cf. Conn & Rantz, 2003; Shadish, 1989; Valentine, 2009). Lipsey and Wilson (2001) recommended conducting sensitivity analyses using MQ items. Card (2012) advised correcting for some biases (such as artificial dichotomisation) and considering the implications for


generalisability of selecting samples restricted in certain respects. Khan, Kunz, Kleijnen, and Antes (2011) concluded that analysts should use MQ checklists to code and rank studies. Borenstein, Hedges, Higgins, and Rothstein (2011) considered whether particular dimensions of quality might increase or decrease observed effect sizes. The Cochrane handbook explicitly recommended against both quality checklists and quality rankings; instead, consistent with earlier Cochrane guidance, ‘a Cochrane review would typically seek all rigorous studies (e.g. randomised controlled trials [RCTs]) of a particular comparison of interventions’ (O’Connor, Green, & Higgins, 2008, p. 84, emphasis added). Thus, Cochrane reviews are a form of best-evidence synthesis (Slavin, 1986), restricting reviews to those that meet stringent a priori criteria. In essence, therefore, non-qualifying studies have zero weight in the formal review conclusions, as they are excluded from the data-extraction process. Still, recognising that quality may vary even among the ‘best’ studies, the Cochrane handbook recommends a risk of bias assessment that overlaps considerably with the Downs and Black (1998) and the Newcastle–Ottawa (Wells et al., 2000) inventories in its emphases on individual biases due to selection, performance, attrition, detection and reporting. It omits many other characteristics that usually serve as selection criteria instead (e.g. RCTs). The handbook further encourages analysts ‘to perform sensitivity analyses showing how conclusions might be affected if studies at high risk of bias were included’ (p. 209, emphasis in original). Most of these authorities addressed the simple situation in which a meta-analysis attempts to determine whether a treatment in fact works. At least two authorities explicitly addressed the fact that theoretical moderators may well be confounded with MQ dimensions (Lipsey & Wilson, 2001; Valentine, 2009).
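The sensitivity analysis that the Cochrane handbook encourages can be reduced to a simple comparison: pool the effect sizes with and without the studies judged at high risk of bias. The sketch below is a minimal, hypothetical illustration (it is not the procedure of any review cited here); the study values and the fixed-effect inverse-variance pooling are assumptions for demonstration only.

```python
# Illustrative risk-of-bias sensitivity analysis: fixed-effect
# (inverse-variance) pooling with and without high risk-of-bias studies.
# Each tuple is (effect size, variance, high_risk_of_bias) -- hypothetical.
from math import sqrt

def pooled_effect(studies):
    """Inverse-variance weighted mean effect and its standard error."""
    weights = [1.0 / v for _, v, _ in studies]
    mean = sum(w * es for w, (es, _, _) in zip(weights, studies)) / sum(weights)
    se = sqrt(1.0 / sum(weights))
    return mean, se

studies = [
    (0.30, 0.02, False),
    (0.25, 0.04, False),
    (0.60, 0.05, True),   # judged at high risk of bias
    (0.55, 0.03, True),   # judged at high risk of bias
]

all_mean, _ = pooled_effect(studies)
low_risk_mean, _ = pooled_effect([s for s in studies if not s[2]])
```

If `all_mean` and `low_risk_mean` diverge, as in these made-up numbers, the conclusion is sensitive to the inclusion of the high risk-of-bias studies.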
None of these authorities suggested an interactive strategy to examine whether effects apparently due to theoretical moderators may in fact depend on the inclusion of studies with lower quality. In summary, influential treatises on systematic reviewing provide varied advice on the use of MQ in meta-analysis, suggesting that greater attention to the potential roles of MQ in research is merited. Nonetheless, no surveys have documented contemporary trends in the use of MQ in synthesising health research. We therefore conducted our own research for this purpose, surveying meta-analyses in three areas where they have been published frequently. We expected to see that (a) meta-analyses have become increasingly likely to incorporate MQ; (b) MQ is unlikely to be used in analyses and especially rare as an interactive factor; and (c) MQ is very rarely used as weights in analyses. We suspected quality weighting is rare because, as Valentine (2009) concluded, meta-analytic authorities have long argued against its use; a recent Monte Carlo simulation concurred (Ahn & Becker, 2011). In examining the surveys, we also examine the role of using best-evidence criteria for selecting studies. After presenting the results, we address how modelling MQ interactively can be of considerable value under optimal circumstances.

Methods

Literature search

Three domains were surveyed: (a) behavioural interventions to reduce risk of HIV infection; (b) exercise interventions to improve mental health; and (c) exercise and blood pressure (see Supplementary material for search terms). The first two domains were searched with PubMed and PsycINFO on 6 and 7 August 2013; the last domain was searched on 14 June 2013 as part of a separate review (Johnson et al., 2014).


Qualifying reports, which could be unpublished, were required to use systematic literature searches and meta-analytic statistics. In the first domain, our sample included meta-analyses of HIV prevention interventions – studies with a primary focus on using a psychosocial strategy to reduce risk for transmitting or acquiring HIV in human samples. Excluded were meta-analyses focused on outcomes that were indirectly related to risk reduction (e.g. identifying factors associated with HIV risk). In the second domain, the sample included meta-analyses using exercise interventions – defined as physical movement of some sort – to improve mental health outcomes, including depression, fatigue, anxiety and quality of life. Those that combined exercise with other treatments such as lifestyle change (Janssen, Gucht, Dusseldorp, & Maes, 2013) were omitted, as were programmes that indirectly aimed to increase physical activity (e.g. Conn, Hafdahl, Moore, Nielsen, & Brown, 2009). In the third domain, our sample included meta-analyses of controlled exercise interventions (independently or combined with other lifestyle interventions) with blood pressure as a primary outcome. Each qualifying meta-analysis was coded for its (a) year of publication, (b) number of authors, (c) number of studies included in the meta-analysis, and whether the meta-analysis (d) was registered with the Cochrane Collaboration, (e) cited its Handbook as guiding the investigation, and (f) used ‘best-evidence synthesis’ methodology (such that it excluded studies lacking a minimum high-quality design). This definition excludes Slavin’s (1986) emphasis on more qualitative aspects of the studies in a review, but captures that methodological criteria were used for inclusion (e.g. RCT). For each meta-analysis, we also coded (g) whether MQ was mentioned in the text of the report; and (h) whether standardised MQ scales were applied to studies in the sample. Any MQ scale or item that was applied to the studies qualified, except GRADE (Grades of Recommendation, Assessment, Development and Evaluation), which, although quality relevant, focuses more broadly on the literature as a whole (Schunemann et al., 2011). Meta-analyses with MQ were also coded for (i) which standardised MQ scale was applied; and whether MQ was incorporated into the results in a (j) qualitative fashion, (k) quantitative fashion, (l) in a sensitivity analysis, (m) quantitatively as effect sizes weighted by MQ scale score (Detsky, Naylor, O’Rourke, McGeer, & L’Abbé, 1992), or (n) interactively with other coded moderators. We also coded (o) whether non-MQ moderators were examined and (p), if so, whether any models used them as simultaneous predictors; these dimensions were coded in relation to the dependent variables relevant to this sample (some meta-analyses addressed other dimensions as well). Finally, we (q) gauged scholarly impact as the number of Google Scholar citations per year as of August 2013. Inter-judge reliability was high (l = 32 randomly sampled meta-analyses; M agreement = 91%; M κ = .72). We analysed each resulting sample of meta-analyses using Stata version 13.1 and calculated descriptive summaries and correlations. For our comparisons between Cochrane reviews and non-Cochrane reviews in terms of mean numbers of studies (k) sampled, we used t-tests assuming unequal variances (alpha = .05).

Results and discussion

Descriptive summary of meta-analyses

Table 1 summarises the coded features of the meta-analyses that qualified for the survey in each of the three study domains. Overall, meta-analyses appeared between 1993 and


2013, and in all three samples, more meta-analyses have been published in recent years. Included meta-analyses were cited between 0 and 123.3 times per year since publication, suggesting that some have had considerable scholarly impact. Most of the meta-analyses were best-evidence syntheses (85%), selecting studies with ostensibly superior methodologies; all exercise and blood pressure meta-analyses were best-evidence syntheses. Written by one to 20 authors, these meta-analyses reviewed between two and 194 studies. A minority of meta-analyses followed Cochrane conventions (21%) and/or registered as Cochrane reviews (15%). All Cochrane-registered reviews were best-evidence syntheses, and among the meta-analyses that were unregistered but reported following Cochrane conventions (l = 13), 12 (92%) were best-evidence syntheses. Meta-analyses that followed Cochrane conventions sampled fewer studies, a statistically significant difference within the domains of HIV prevention and exercise and mental health (ts = 3.03 and 2.90, ps ≤ .01). (There were only two Cochrane reviews in the exercise and blood pressure domain.)

Use of methodological quality in the meta-analyses

Across all three domains, incorporating MQ items or scales has become more common with passing years, as has discussing MQ. In general, analysts use them in conformity with the most common and consistent recommendations (see Table 1). Specifically, quantitative interpretation of study quality was common, and analyses weighted by MQ scale score did not appear at all. The three most popular MQ instruments were the risk of bias (27%; Higgins & Green, 2008), PEDro (20%; Verhagen et al., 1998) and Jadad et al. (1996) (14%) inventories. Some recently developed MQ scales did not appear in the survey (e.g. DIAD; Valentine & Cooper, 2008). Many meta-analyses applied one or more investigator-generated MQ items or instruments (14%), with content overlapping greatly with popular inventories.
Meta-analyses usually listed individual items in MQ instruments for each included study and also formed indexes using the sum or average of items satisfied; those using risk of bias nearly always left the items unscaled, consistent with Cochrane advice. None used both risk of bias and another instrument. Quantitative use of MQ scales was common in HIV prevention meta-analyses but rare in the two exercise domains (Table 1). Sensitivity analyses using MQ scales appeared in a small minority of meta-analyses in the HIV prevention and exercise and mental health domains; no meta-analysis examining exercise and blood pressure used this technique. When MQ scales were used quantitatively, it was usually to determine whether the success of the health promotion intervention hinged on studies’ MQ. (Of course, it is possible that meta-analysis teams performed but did not report quantitative analyses involving MQ). For example, Heiwe and Jacobson (2011) meta-analysed 45 trials evaluating the effects of exercise for older adults, finding numerous benefits on outcomes including health-related quality of life. Their sensitivity analyses revealed that removing trials with higher risk of bias left their conclusions intact. Using MQ in this fashion is sensible when trials exhibit heterogeneity in findings and when the primary interest is in determining whether a treatment’s effects only appear in studies believed to have lower study quality (O’Connor et al., 2008). When a treatment’s effects hinge on moderators such as the dose of a treatment, more sophisticated analyses dealing with MQ are in order, as we next show.
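The index-forming step described above, summing or averaging the checklist items each study satisfies and then, optionally, flagging studies above a cut-off, can be sketched as follows. The checklist items, study names and the threshold of 6 are all hypothetical, for illustration only (PEDro-style 10-item scoring is assumed).

```python
# Minimal sketch of forming an MQ summary index from binary checklist
# items (sum or mean of items satisfied), as many surveyed meta-analyses
# did. Items, studies and threshold are hypothetical.

def mq_index(items, as_mean=False):
    """Sum (or mean) of satisfied quality items, e.g. PEDro-style scoring."""
    score = sum(items)
    return score / len(items) if as_mean else score

study_items = {
    "Trial A": [1, 1, 1, 1, 1, 1, 0, 1, 0, 1],  # 8 of 10 items satisfied
    "Trial B": [1, 0, 1, 0, 0, 1, 0, 0, 1, 0],  # 4 of 10 items satisfied
}

# Flag studies meeting an (assumed) higher-quality threshold of 6 items.
higher_quality = [s for s, items in study_items.items() if mq_index(items) >= 6]
```

Leaving the items unscaled, as risk-of-bias users typically did, simply means reporting `study_items` itself rather than the summary score.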


Table 1. Features of meta-analyses in three domains and how methodological quality was used with the studies meta-analysed.

Feature | HIV prevention (l = 56) | Exercise and mental health (l = 111) | Exercise and blood pressure (l = 33)
Publication year, M ± SD (ra) | 2006.65 ± 4.55 (.55) | 2009.90 ± 3.08 (.85) | 2003.73 ± 6.63 (.39)
  Followed Cochrane handbook (l) | 2009.80 ± 2.70 (10) | 2009.55 ± 2.49 (29) | 2006.5 ± 6.86 (4)
Impact (citations per year), M ± SD | 11.58 ± 14.31 | 12.54 ± 17.21 | 15.17 ± 20.82
Best-evidence synthesis, l (%) (rb) | 38 (68%) (−.35) | 98 (88%) (.47) | 33 (100%) ( – )
Cochrane registered, l (%) | 8 (14%) | 20 (18%) | 2 (6%)
Followed Cochrane handbook, l (%) | 10 (18%) | 29 (26%) | 4 (12%)
k meta-analysed studies, M ± SD (rb) | 42.02 ± 45.99 (.41) | 24.40 ± 23.84 (−.02) | 26.45 ± 23.76 (.27)
  Followed Cochrane handbook (rb) | 19.10 ± 18.19 (−.03) | 15.86 ± 15.08 (.52) | 53.25 ± 34.91 (.10)
  Did not follow Cochrane handbook (rb) | 47.00 ± 48.76 (.57) | 27.41 ± 25.64 (.09) | 22.76 ± 19.95 (.15)
Included k ≥ 30 studies, l (%) (rb) | 27 (48%) (.49) | 30 (27%) (−.03) | 12 (36%) (.25)
Mentioned MQ in text, l (%) (rb) | 39 (70%) (.58) | 99 (89%) (.49) | 21 (64%) (.36)
Used MQ, l (%) (rb) | 25 (45%) (.65) | 95 (86%) (.64) | 17 (52%) (.66)
  Best-evidence synthesis, l (%) | 22 (88%) | 91 (93%) | 17 (100%)
  Not best-evidence synthesis, l (%) | 3 (12%) | 4 (4%) | 0 (0%)
MQ used qualitatively, l (%) | 19 (76%) | 85 (89%) | 16 (48%)
MQ used in sensitivity analyses, l (%) | 6 (11%) | 21 (22%) | 0 (0%)
MQ used in analyses, l (%) | 14 (56%) | 14 (15%) | 1 (6%)
MQ used in interaction, l (%) | 1 (2%) | 4 (4%) | 0 (0%)
Used other moderators, l (%) | 45 (80%) | 52 (47%) | 23 (70%)
Used simultaneous moderators, l (%) | 19 (34%) | 11 (10%) | 1 (3%)
  With M k meta-analysed studies (range) | 63.05 (9–184) | 51.55 (10–90) | 21 (NA)

Note: k = Number of studies (within a meta-analysis), l = Number of meta-analyses, MQ = Methodological quality, and NA = Not available. a = Number of cases correlated with year of publication. b = Proportion of cases ‘yes’ in each year correlated with year of publication.

Meta-analyses taking an interactive approach to methodological quality

The surveys showed that examining theoretical moderators was common, especially in the HIV prevention and exercise and blood pressure literatures. Use of simultaneous moderators was much rarer; HIV prevention meta-analyses were the most likely to use them, with few examples of this practice in either exercise domain. Only five of the 31 meta-analyses that used simultaneous multi-moderator analyses (16%) evaluated MQ interactively (i.e. Theoretical Moderator × MQ), a mere 2.5% of the entire meta-analytic sample. The five meta-analyses (a) appeared only since 2008; (b) had relatively large samples of studies (Mk = 77.2; SD = 33.16; range = 44–119); (c) usually were best-evidence syntheses; (d) had effect sizes with significant heterogeneity; (e) used


moderators based on theories about the phenomenon in an effort to explain the heterogeneity; and (f) evaluated whether these moderation patterns hinged on MQ, gauged as summary scores. In the HIV prevention sample, the Johnson, Scott-Sheldon, Huedo-Medina, and Carey (2011) meta-analysis showed that, in 98 qualifying HIV prevention interventions for adolescents, condom use increased most when interventions provided (a) more condom skills training or (b) more motivational training. The former moderation pattern did not depend significantly on the studies’ MQ, but the latter pattern did, where MQ was a compilation of 12 items drawn from two scales (Jadad et al., 1996; Miller et al., 1995). Specifically, the impact of motivational training was more pronounced in higher quality studies than in lower quality studies. Johnson et al. also found that skills and motivational training explained variation in effect sizes far better for higher than for lower quality studies. No meta-analyses used the interactive strategy in relation to exercise and blood pressure, but four used it in relation to exercise and mental health. Two of these used MQ scales as a blocking strategy. Ferrer, Huedo-Medina, Johnson, Ryan, and Pescatello (2011) meta-analysed 81 exercise trials that evaluated changes in quality of life among cancer survivors. In longer aerobic exercise trials (≥26 weeks of training), higher exercise intensity was associated with improvements in quality of life. This pattern remained significant when the meta-analysts restricted the sample to only those trials that scored at least six (out of 10 possible) on the PEDro scale (Verhagen et al., 1998). Blocking for MQ represents an incomplete form of the interactive strategy; the complete form would model the extent to which the relationship between intensity and quality of life depended on MQ. Nonetheless, this strategy showed that conclusions held up in higher quality studies.
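The blocking strategy amounts to estimating the moderator's dose-response slope within the higher quality block (e.g. PEDro ≥ 6) and checking that it survives the restriction. The sketch below is a simplified, hypothetical illustration, not the analysis of any meta-analysis cited here: the trial data are invented, and a full analysis would use proper meta-regression with standard errors rather than a bare weighted slope.

```python
# Sketch of the MQ blocking strategy: estimate an inverse-variance
# weighted slope of effect size on a moderator (e.g. exercise intensity),
# restricted to higher-quality trials (assumed cut-off: PEDro >= 6).
# All data are hypothetical.

def weighted_slope(x, y, w):
    """Weighted least-squares slope of y on x."""
    xw = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)
    yw = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    num = sum(wi * (xi - xw) * (yi - yw) for wi, xi, yi in zip(w, x, y))
    den = sum(wi * (xi - xw) ** 2 for wi, xi in zip(w, x))
    return num / den

# (intensity, effect size, variance, PEDro score) per trial -- hypothetical
trials = [(1, 0.10, 0.02, 8), (2, 0.20, 0.03, 7), (3, 0.30, 0.02, 6),
          (1, 0.40, 0.05, 4), (3, 0.05, 0.06, 3)]

high = [(x, y, v) for x, y, v, p in trials if p >= 6]
slope_high = weighted_slope([x for x, _, _ in high],
                            [y for _, y, _ in high],
                            [1 / v for _, _, v in high])
```

Repeating the calculation on the full sample and comparing the two slopes is the informal version of the Moderator × MQ test that the complete interactive strategy formalises.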
Kangas, Bovbjerg, and Montgomery (2008) followed a similar strategy examining cancer-related fatigue as a function of exercise and other interventions. The other two meta-analyses used MQ as a fully interactive factor in relation to exercise and mental health. Brown et al. (2012) meta-analysed 40 exercise trials focused on depression among cancer survivors. A dose-response function emerged only in higher quality trials, again as defined by PEDro (Verhagen et al., 1998): The amount of aerobic exercise, gauged in minutes per week, was linked to reductions in depression in higher (but not lower) quality studies. For the higher quality studies, the magnitude of difference estimated for trials in which cancer survivors performed at least three hours of exercise per week was large; a significant difference favouring exercise emerged for samples performing at least two hours per week (i.e. the confidence bands no longer include zero at that point), but at shorter durations, the average difference is not significant. For lower quality studies, the confidence intervals across the entire observed range never differed significantly from zero. In other words, in lower quality studies, there is no evidence that aerobic exercise has any benefit. In contrast, within higher quality studies, it is quite clear that exercise matters, and in particular, that more exercise produces more benefit. Brown and colleagues evaluated these effects using an Exercise Intensity × MQ interaction term in their models of effect size magnitude. This interactive modelling approach picks up variation in the magnitude of effects that the theoretical moderators alone – intensity of exercise or duration of exercise – could not detect. Indeed, in Brown et al.’s meta-analysis, the dose-response function was not present in the data unless MQ was also controlled. A similar dose-response pattern appeared for higher quality studies in the other meta-analysis (again assessed with PEDro), except


that the outcome was cancer-related fatigue and the theoretical moderator was intensity of resistance exercise (Brown et al., 2011). In sum, when meta-analyses reported Theoretical Moderator × MQ results, either (a) there was less noise and clearer signal (dose-response patterns) in the higher quality studies, or else (b) moderator findings generalised across lower and higher quality studies. Because all five meta-analyses used summary scores, it is unclear which specific MQ dimensions are most associated with increased or decreased precision in modelling effects.

Interactively incorporating methodological quality in meta-analytic models

Results from models that evaluate Theoretical Moderator × MQ interactions can take many forms, but we illustrate four possibilities that may, in application, be quite important. Although these examples treat MQ as a scaled (and scalable) continuous factor, this treatment should not be taken as an endorsement of the omnibus strategy. Indeed, MQ may instead be modelled in terms of a specific MQ item of interest (e.g. RCT) or as clusters of related MQ items, similar to the Downs and Black Checklist’s (1998) five subscales (reporting, external validity, internal validity, internal validity – confounding, and power). As Figure 1, model A (1st row) illustrates, a moderator may show the same relation to the effect sizes for both lower and higher quality studies. In this case, there would be no significant Theoretical Moderator × MQ interaction, only a significant theoretical moderator main effect. Yet note that, if there is less noise for higher quality studies, then it should be possible to detect a significant effect size at a lower level – or dosage – of the theoretical moderator. Because there is no MQ interaction, modelling the dose-response pattern in relation to the theoretical moderator can take advantage of the additional statistical power accruing from including the lower quality studies.
Assuming there is no main effect of the MQ factor (as in the present example), MQ may be omitted from the analysis and interpretation can proceed focussing only on the theoretical factor. The implication is that the treatment effect is bigger or smaller in relation to the theoretical moderator’s level and that these effects generalise across levels of MQ. Naturally, as in any correlational research, we would be wary of confounders that may explain away any moderator’s effects. Model B in Figure 1 (2nd row) shows that the theoretical moderator relates markedly to the effect sizes under higher but not lower quality conditions, which presents as a statistically significant Theoretical Moderator × MQ interaction term. (There is also a main effect for MQ, such that higher quality studies are more likely to show a treatment effect, but it is qualified by the presence of the higher order interaction.) The fact that the relation is present only for higher quality studies increases confidence that the theoretical moderator actually is related to the success of the treatment. Perhaps other factors, such as particular methodological deficits, help explain the variability of effect sizes for lower quality studies. Model C in Figure 1 (3rd row) represents what may be critics’ greatest fear: The theoretical moderator relates to effect sizes only for lower quality studies and there is no relation visible for higher quality studies. (Again, there is a main effect for MQ, this time showing that lower quality studies are more likely to show a treatment effect, but again, it is qualified by the presence of the higher order interaction.) In such a circumstance, we would conclude that the weight of evidence does not support the


Figure 1 (panels A–D). Potential joint patterns of methodological quality and theoretical moderators across four models, depicting how intervention success may hinge on one, both, or neither factor. (Each moderator is modelled as a continuous variable without artificially dichotomising distributions – see Supplementary material.) The solid line is the meta-regression line (estimate of the treatment effect), the dashed lines are confidence bands for the meta-regression line, and the dotted line indicates the value of a null effect (no difference between treatment and control conditions). Implications are spelled out in the comments in the right column and in the text.

veracity of the theoretical moderator, even though there are suggestions it works as predicted in lower quality studies. Note that in this example, the treatment failed on average for higher quality studies, relative to the comparison, whereas on average for lower quality studies the treatment succeeded. This example thus presents a further complication: one would conclude that neither the treatment itself nor a theoretical moderator has any


bearing on results under the conditions instantiated by higher quality trials. Nonetheless, exploring other factors might help explain when the treatment succeeds or fails. Model D in Figure 1 (4th row) illustrates another pattern with no significant Theoretical Moderator × MQ interaction term and no main effect for either term. Yet notice that the treatment effect attains significance only for higher quality studies. On average, the treatment effect is significant, and the highest quality evidence is more likely to exhibit a statistically significant trend, whereas there is more noise for lower quality studies. In this case, averaging across all research seems likely to show a significant treatment effect, though analysts might investigate other potential sources of heterogeneity – such as alternative definitions of MQ. These examples help to show why, after significance testing is completed, evaluating the magnitude of regression coefficients is far less interesting than showing how mean effect sizes vary in response to levels of theoretical moderators and their interactions. Under optimal conditions, these strategies can greatly improve the interpretability of meta-analytic models. To better evaluate the role that MQ plays in study results, we advocate a meta-analytic modelling strategy whereby MQ appears as an interactive factor with theoretical moderators in standard meta-regression procedures. Because treatises on meta-analysis offer little on the subject of decomposing interactions between moderators (e.g. Borenstein et al., 2011; Lipsey & Wilson, 2001; Thompson & Higgins, 2002), including simple-slopes analyses, we provide a technical summary to illustrate how one may do so using aggregate meta-regression with either continuous or categorical moderators (see Supplementary material).
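The advocated strategy, meta-regression with a Theoretical Moderator × MQ interaction followed by simple slopes, can be sketched as below. This is a hedged, pure-Python illustration under stated assumptions: the study data are synthetic, weights are plain inverse-variance (a fixed-effect formulation; a full analysis would also estimate between-study variance and standard errors, e.g. in Stata or R), and the helper names are ours, not the article's.

```python
# Sketch of a Theoretical Moderator x MQ interaction in weighted
# meta-regression (inverse-variance weights), then a simple slope.
# Synthetic data; pure-Python normal-equations solve for illustration.

def solve(A, b):
    """Gaussian elimination with partial pivoting for Ax = b."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def meta_regress(rows):
    """rows: (moderator, mq, effect size, variance). Returns b0..b3 for
    effect = b0 + b1*mod + b2*mq + b3*mod*mq, inverse-variance weighted."""
    X = [[1.0, m, q, m * q] for m, q, _, _ in rows]
    y = [es for _, _, es, _ in rows]
    w = [1.0 / v for _, _, _, v in rows]
    XtWX = [[sum(wi * xi[a] * xi[b] for wi, xi in zip(w, X)) for b in range(4)]
            for a in range(4)]
    XtWy = [sum(wi * xi[a] * yi for wi, xi, yi in zip(w, X, y)) for a in range(4)]
    return solve(XtWX, XtWy)

# Synthetic studies in which the dose-response slope grows with MQ:
rows = [(m, q, 0.2 + 0.05 * m + 0.10 * m * q, 0.02 + 0.01 * ((m + q) % 3))
        for m in (0, 1, 2) for q in (0, 1, 2)]
b0, b1, b2, b3 = meta_regress(rows)

# Simple slope of the moderator at a high level of MQ (here MQ = 2):
simple_slope_high_mq = b1 + b3 * 2
```

A significant `b3` corresponds to the Moderator × MQ interaction of Models B and C; the simple slope at high versus low MQ then shows in which quality stratum the dose-response relation actually holds.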
Strengths and weaknesses of pursuing an interactive strategy

The primary strength of this interactive strategy is that it allows studies that do not follow so-called ‘gold-standard’ methods to offer valuable information about health or other outcomes. Siding with those who wish to treat it as an empirical question (e.g. Lipsey & Wilson, 2001; Valentine, 2009), we believe it is better to determine whether studies presumed to have lower quality actually do provide different results than to exclude them out of pure prejudice. Although RCTs routinely offer the advantage of permitting causal inferences about the treatment, intervention dosages delivered in the control arm can complicate interpretations (e.g. Bishop, Fenge-Davies, Kirby, & Geraghty, 2015), interventions may have differential attrition from control groups (e.g. Crutzen, Viechtbauer, Spigt, & Kotz, 2015), and other sorts of methodological designs may be just as useful (Shadish et al., 2002), especially when they evaluate complete interventions (see Tarquinio, Kivits, Minary, Coste, & Alla, 2015). Repeated-measures comparisons, for example, show how much change is possible following a treatment, something that is particularly important for clinical implementation.

The interactive framework can evaluate whether results generalise across levels of MQ, an approach that affords a number of other advantages, as we detail in the remainder of this section. We also detail some of the more important limitations of the strategy and additional considerations when examining MQ, many of which mirror more general concerns in regular meta-analytic modelling (e.g. Borenstein et al., 2011; Harbord & Higgins, 2008; Lipsey & Wilson, 2001; Thompson & Higgins, 2002). Thus, we are not introducing these elements so much as explaining their relevance to pursuing interactions involving MQ and other moderators.


The interactive strategy profits from large literatures

An interactive strategy incorporating MQ depends on a sample of studies that is large enough to power the necessary analyses. In conventional, aggregate-level meta-analysis, a larger sample of studies is better because statistical power increases as more studies enter a meta-analytic database (Borenstein et al., 2011; Hedges & Pigott, 2004). Interactions typically have lower statistical power than main effects (Cohen, 1988), which only exacerbates the problem. Any particular rule is of course arbitrary, but as a preliminary guideline for aggregate-level meta-analysis, we suggest at least 10 studies for each moderator in the model. As noted, the meta-analyses in our survey that used an interactive strategy usually had many more studies than this criterion. In addition, the use of more restrictive Type I error criteria (e.g. Bonferroni correction) is advised for such analyses (e.g. Bland & Altman, 1995). Overall, assuming that at least 30 studies in the sample are necessary, 68% of the meta-analyses in the present survey were too small for the interactive strategy to be applied. Table 1 shows that only the HIV prevention meta-analyses are routinely large enough to use the strategy and that only a minority of meta-analyses in the exercise domains are large enough. Curiously, it was only in the HIV prevention domain that samples of included studies grew larger over time, perhaps because researchers who launch new meta-analyses focused on exercise deliberately narrow their selection criteria to focus on particular types of studies (e.g. a specific type of exercise as well as a focused outcome) and populations (e.g. people with hypertension, cancer survivors). Furthermore, many best-evidence meta-analyses and those following Cochrane conventions have very small samples because they use methodological deficits as exclusion criteria.
In effect, such reviews restrict the range of MQ and make it more difficult to evaluate the role of MQ in study results. The interactive strategy implies that using broader selection criteria is generally a better method because it maximises the sample size and provides diversity in MQ (followed by application of the interactive strategy). Unless the larger sample is taken, the review is more limited in the conclusions it can make about the role of MQ in results. Importantly, therefore, embracing an interactive strategy enables a review to make more definitive conclusions about both theoretical and methodological aspects of the research, consonant with the perspective that meta-analysis is effect-size surface estimation (Rubin, 1992). Use of the interactive strategy may well help to optimise future research, improving the quality of the science and accelerating the advancement of knowledge, although it is clear that investigators face many practical challenges in improving MQ (see de Bruin, McCambridge, & Prins, 2015; Evers, Hiligsmann, & Adarkwah, 2015; McCambridge, 2015).

The interactive strategy depends on heterogeneous effects

Literatures marked by homogeneity offer little chance of finding moderators, as Hedges and Olkin (1985) implied and others elaborated (e.g. Higgins & Thompson, 2004; Johnson & Turco, 1992). If the population variance (known as tau²) is zero, no bivariate moderator could achieve statistical significance, let alone the Theoretical Moderator × MQ interaction that we suggest evaluating for sensitivity analyses. With the growing numbers of trials available for practically all health phenomena, the interactive strategy deserves serious consideration in future meta-analyses.
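Whether a literature is heterogeneous enough for moderator analyses can be checked before any modelling; one common estimator of tau² is the DerSimonian-Laird moment estimator. The following self-contained sketch uses fabricated effect sizes and illustrates the general technique rather than any procedure specific to this article.

```python
# Q statistic and DerSimonian-Laird moment estimate of tau^2, the
# between-studies variance. When tau^2 is ~0, moderator analyses have
# essentially no heterogeneity left to explain.

def dl_tau2(effects, variances):
    w = [1.0 / v for v in variances]                       # fixed-effect weights
    sw = sum(w)
    mean = sum(wi * e for wi, e in zip(w, effects)) / sw   # weighted mean effect
    q = sum(wi * (e - mean) ** 2 for wi, e in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)          # truncated at zero
    return tau2, q

# Fabricated homogeneous literature: nothing for moderators to explain
print(dl_tau2([0.3, 0.3, 0.3], [0.05, 0.04, 0.02]))

# Fabricated heterogeneous literature: moderator analyses become worth pursuing
tau2, q = dl_tau2([0.1, 0.5], [0.04, 0.04])
print(f"tau2 = {tau2:.3f}, Q = {q:.1f}")
```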


The meaning of MQ and theoretical moderators

As implied above, it is entirely possible that specific MQ dimensions (or scales) and theoretical moderators are confounded (e.g. Lipsey & Wilson, 2001). For instance, RCTs of exercise might be more likely than uncontrolled trials to use longer training programmes with greater supervision, increasing the likelihood that the former will detect exercise effects on the key outcomes. The generally correlational nature of meta-analysis implies that multi-moderator analyses should become the norm, so that moderators explaining unique variance in treatment effects can be isolated, and those that only appear to be related to treatment effects receive less prominence in conclusions. In parallel, both MQ dimensions and theoretical moderator information can go unreported in particular trials, decreasing the sensitivity of analyses to show whether they matter (Abraham, Johnson, de Bruin, & Luszczynska, 2014; Ayling, Brierley, Johnson, Heller, & Eiser, 2015; Johnson, Michie, & Snyder, 2014). Scientific reporting standards continue to proliferate and to be adopted, suggesting that more recent studies are better reported, including on many dimensions that appear in MQ scales. Thus, early research might often appear to be of lower quality but may instead simply be more poorly reported (Higgins & Green, 2008). The upshot is that analyses involving MQ might be more meaningful for more recent studies within a particular literature.

Sufficient coverage in the moderators

The interactive strategy also depends on having sufficient coverage in moderators, including both the theoretical and the MQ items. In other words, if there is restriction of range in either the observed theoretical or MQ variables, then modelling is similarly restricted.
If the qualifying studies are all of low quality (Figure 1, Model C, left panel), then the interactive strategy reduces to a qualitative strategy such that methodological shortcomings represent a serious limitation for interpreting the phenomenon in question. Many Cochrane reviews offer just such conclusions. Adequate coverage across theoretical moderators implies that observations for the combinations of moderators are also well dispersed, which is a subtler but no less important point. A theoretical moderator might have better coverage in lower quality studies than in their higher quality counterparts, or vice versa. Regardless of study quality, at points where coverage is thin, estimates of effects will be much less precise. Poor coverage can also be revealed by producing estimates at the combinations or levels of interest implied by Theoretical Moderator × MQ interactions; confidence intervals around an estimate will be wider where fewer studies exist.

Potentially more work in data extraction

Because the interactive strategy incorporates MQ, it necessitates including more evidence; therefore, future meta-analyses that follow this approach will need to perform more work during data extraction. We argue that the additional work is important to undertake because it will ultimately permit more accurate generalisations about the phenomenon. Naturally, in very large literatures, random sampling may reduce effort (Card, 2012), but there are suggestions that larger samples of studies are not necessarily more time-consuming to code, for instance because coders become more efficient with experience (Allen & Olkin, 1999). Unless all literature is included, or at least a


representative sample, the review cannot draw conclusions about the role of MQ or its interactions with substantive moderators. Therefore, when a large number of studies are omitted for methodological reasons, these best-evidence syntheses summarise only the veritable tip of the iceberg.
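Before committing to the extra extraction work, the coverage concerns raised above can be screened with a simple cross-tabulation of the coded studies. In this sketch the moderator labels and study counts are entirely hypothetical:

```python
from collections import Counter

# Hypothetical coded studies: (theoretical moderator level, MQ level)
studies = (
    [("low dose", "low MQ")] * 6 + [("low dose", "high MQ")] * 5 +
    [("high dose", "low MQ")] * 7 + [("high dose", "high MQ")] * 1
)

coverage = Counter(studies)
for dose in ("low dose", "high dose"):
    for mq in ("low MQ", "high MQ"):
        n = coverage[(dose, mq)]
        note = "  <- thin cell: estimates here will be imprecise" if n < 3 else ""
        print(f"{dose:9s} x {mq:7s}: {n:2d} studies{note}")
```

A near-empty cell such as the high-dose/high-MQ combination above signals that any Theoretical Moderator × MQ estimate at that combination would rest on very few studies.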


The role of purely narrative systematic reviewing

We do not discount the importance of qualitative aspects of research. Nonetheless, the nuances that we have described in the interactive strategy are unlikely to prove very useful in a systematic review that lacks meta-analysis. Similarly, in order to detect interactions with quality in a narrative review, the interaction effects would have to be larger than what is typically encountered. In essence, and by Cohen’s (1992) definition, no moderator relation smaller than a medium effect size (e.g. a standardised regression weight of .30) is likely to be clearly visible to an observer without the use of statistics. An interaction effect involving quality (or otherwise) would have to be even larger in order for systematic reviews to discern it unaided by statistics. Thus, analysts who wish to follow this strategy are most likely to succeed when using meta-analytic modelling, and, in turn, they are more likely to succeed when they define the literature domain more broadly.

Individual vs. sample-level aggregation

Our surveyed examples and other illustrations have used aggregate-level meta-analysis, but there is no reason that individual-level, integrative meta-analyses cannot follow the same strategy. Using this approach, single-case studies (Hedges, Pustejovsky, & Shadish, 2012) could be incorporated, as well as studies with multiple observations. Although the notion of including smaller studies has faced stark criticism (Coyne, Thombs, & Hagedoorn, 2010), the interactive strategy provides a means to determine whether, in fact, smaller studies’ results differ from those of larger ones, when controlling for other MQ dimensions.

The role of unpublished literature

Some researchers assume that grey literature is of lower quality; yet, unless MQ is coded and examined across the entire literature, this claim remains a presupposition.
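Relatedly, heterogeneity can masquerade as small-study effects. In the fabricated example below, nothing is suppressed, yet mixing two effect-size subpopulations produces a large Egger-type regression intercept, the usual signal of funnel-plot asymmetry; all numbers are invented purely for illustration.

```python
# Fabricated illustration: no studies are missing, yet a mixture of two
# effect-size subpopulations triggers an Egger-style asymmetry signal.
pop_a = [(0.10, 0.05)] * 3   # (d, SE): larger trials, smaller-effect subpopulation
pop_b = [(0.60, 0.25)] * 3   # smaller trials, larger-effect subpopulation
studies = pop_a + pop_b

x = [1.0 / se for _, se in studies]   # precision
y = [d / se for d, se in studies]     # standardized effect
n = len(studies)
mx, my = sum(x) / n, sum(y) / n
slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
intercept = my - slope * mx           # Egger's test inspects this intercept
print(f"Egger-style intercept: {intercept:.2f}  (0 expected under no asymmetry)")
```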
Many apparent patterns of publication bias may in fact be better attributed to combinations of MQ interacting with theory-driven moderators. Nearly all publication bias tests assume a single-population effect size (Johnson, Low, & LaCroix, 2012), and it is known that heterogeneous distributions that appear to exhibit publication bias can disguise sub-populations of effect sizes (Johnson et al., 2012; Lau, Ioannidis, Terrin, Schmid, & Olkin, 2006). Indeed, it appears safe to assume that even literatures with marked publication bias may still yield significant, interesting patterns involving theoretical or methodological factors.

Capable statistical software (and the ability to use it)

The interactive strategy requires software that can incorporate multiple simultaneous predictors (see Supplementary material); some popular software lacks this ability, such as the Cochrane Collaboration’s (2008) Review Manager, and another package only recently added it


(Borenstein, 2013). Moreover, the interactive strategy requires deftness in manipulating variables in meta-regression programmes and/or using matrix algebra in order to generate estimates at interesting levels of the moderators (see Supplementary material).

Theory to make predictions about moderators

Tacit to our approach so far is the presumption that one needs theories about what factors relate to a particular phenomenon, that is, the factors that make health promotion efforts succeed or fail (Johnson, Dove, & Boynton, 2011; Michie, Johnson, & Johnston, in press; Rothman, Hertel, Baldwin, & Bartels, 2008). In the examples from our surveys, health behaviour theory largely informed decisions about what moderators to use. In the HIV prevention example, it was the theory that dosage of skills training and motivational instruction would help reduce risk (Fisher & Fisher, 1992; Johnson et al., 2010). In the exercise and depression example, it was the theory that level of exertion (i.e. intensity of exercise) should matter for mental health (Luepker et al., 1996). Theories, of course, are often much more complex than these examples, involving clusters of factors such as behaviour change techniques (Michie et al., 2013) and/or patient features (Pescatello, Arena, Riebe, Thompson, & Williams, 2014). As the complexity of the theory mounts, the size of the literature necessary to model the essential variables increases proportionally, which again makes modelling trends using individual-participant-level data an attractive approach.

Conclusion: Divining the truth about the factors underlying health phenomena – including methodological quality

In conclusion, this article advances the view that MQ should not only be examined more commonly in a quantitative fashion but, more importantly, be tested in interaction with other moderators of interest.
Our surveys in three health-based literatures suggest that although meta-analyses using the interactive strategy are presently rare, they nonetheless illustrate how valuable the strategy can be in advancing science both substantively and methodologically. Although the strategy has certain limitations, the sheer accumulation of scientific evidence helps to mitigate them. An interactive approach using MQ can show whether trends replicate across studies that exhibit great diversity, thereby giving studies’ results a better chance to accrue in support of the greater good, rather than being discarded merely on the presumption of bias. We assert that all such results are valuable contributions to the development of knowledge. The interactive strategy may produce some of the first evidence-based support for such claims, and provide better guidance to researchers about what factors constitute ‘higher quality.’ Until that time, the interactive strategy can show whether trends support a theoretical expectation only in higher quality studies, or whether only lower quality studies display the trend (Figure 1, 3rd row). Importantly, fully embracing the interactive strategy for exploring MQ strongly implies that methodological restrictions in selection criteria should become more lenient. Having more lenient selection criteria in meta-analyses need not imply neglecting qualitative aspects of the literature. In fact, the interactive strategy prescribes more careful attention to MQ and its role in results. As an ethos, we advance that science offers no absolute gauges of the truth, merely imperfect tools that help one divine it. There is no perfect measure, technique, or study.


Scholars focused on these issues routinely reach conclusions in the vein of ‘there is no gold standard for MQ.’ Nevertheless, because analysing MQ requires operationalising it in some fashion, we propose that the best way to determine a gold standard is by increasingly studying MQ in its varying instantiations, much as miners pan for gold (signal) from among other stones (noise). If the interactive strategy is more broadly applied in literatures of studies that afford its use, then greater knowledge should result not only about interesting conceptual phenomena but also about the behaviour of methodological factors. These developments can foster a sophisticated theory of MQ and its role in research results. As future investigations reveal parallel trends across operationalisations of MQ, a broader gold standard may well emerge.

Supplemental material

All Supplemental Material is available alongside this article on http://dx.doi.org/10.1080/08870446.2014.953533.

Acknowledgements

This research was facilitated by United States Public Health Service grant R01-MH58563 to Blair T. Johnson. For serving as a coder of meta-analyses in our survey, we thank Lauren Lamberti; for determining the relevance to our survey of a review written in Greek, we thank Cleo Protogerou; for comments on previous drafts of this article, we thank Elizabeth Schifano and two anonymous reviewers; for comments on the technical summary and Stata syntax, we thank Natasza Marrouch; finally, for comments on a presentation of this paper, we thank Hart Blanton, David A. Kenny, Colin W. Leach, and Adam Sheya.

References

Abraham, C., Johnson, B. T., de Bruin, M., & Luszczynska, A. (2014). Enhancing reporting of behavior change intervention evaluations. Journal of Acquired Immune Deficiency Syndromes, 66, S293–S299.
Ahn, S., & Becker, B. J. (2011). Incorporating quality scores in meta-analysis. Journal of Educational and Behavioral Statistics, 36, 555–585.
Allen, I. E., & Olkin, I. (1999). Estimating time to conduct a meta-analysis from number of citations retrieved. The Journal of the American Medical Association, 282, 634–635.
Ayling, K., Brierley, S., Johnson, B., Heller, S., & Eiser, C. (2015). How standard is standard care? Exploring control group outcomes in behaviour change interventions for young people with type 1 diabetes. Psychology & Health, 30(1), 85–103.
Bishop, F. L., Fenge-Davies, A. L., Kirby, S., & Geraghty, A. W. A. (2015). Context effects and behaviour change techniques in randomized trials: A systematic review using the example of trials to increase adherence to physical activity in musculoskeletal pain. Psychology & Health, 30(1), 104–121.
Bland, J. M., & Altman, D. G. (1995). Multiple significance tests: The Bonferroni method. British Medical Journal, 310, 170.
Borenstein, M. (2013). Comprehensive meta-analysis version 3. Englewood Cliffs, NJ: Biostat.
Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. R. (2011). Introduction to meta-analysis. New York, NY: Wiley.
Brown, J. C., Huedo-Medina, T. B., Pescatello, L. S., Pescatello, S. M., Ferrer, R. A., & Johnson, B. T. (2011). Efficacy of exercise interventions in modulating cancer-related fatigue among


adult cancer survivors: A meta-analysis. Cancer Epidemiology, Biomarkers and Prevention, 20, 123–133.
Brown, J. C., Huedo-Medina, T. B., Pescatello, L. S., Ryan, S. M., Pescatello, S. M., Moker, E., … Johnson, B. T. (2012). The efficacy of exercise in reducing depressive symptoms among cancer survivors: A meta-analysis. PLoS ONE, 7, e30955.
Card, N. A. (2012). Applied meta-analysis for social science research. New York, NY: Guilford.
Cochrane Collaboration. (2008). Review Manager (RevMan), Version 5.0. Copenhagen: Author.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155–159.
Conn, V. S., Hafdahl, A. R., Moore, S. M., Nielsen, P. J., & Brown, L. M. (2009). Meta-analysis of interventions to increase physical activity among cardiac subjects. International Journal of Cardiology, 133, 307–320.
Conn, V. S., & Rantz, M. J. (2003). Research methods: Managing primary study quality in meta-analyses. Research in Nursing & Health, 26, 322–333.
Cook, T. D., & Campbell, D. T. (1976). The design and conduct of quasi-experiments and true experiments in field settings. In M. D. Dunnette (Ed.), Handbook of industrial and organizational psychology (pp. 223–326). Chicago, IL: Rand McNally.
Coyne, J. C., Thombs, B. D., & Hagedoorn, M. (2010). Ain’t necessarily so: Review and critique of recent meta-analyses of behavioral medicine interventions in health psychology. Health Psychology, 29, 107–116.
Crutzen, R., Viechtbauer, W., Spigt, M., & Kotz, D. (2015). Differential attrition in health behaviour change trials: A systematic review and meta-analysis. Psychology & Health, 30(1), 122–134.
Deeks, J. J., Dinnes, J., D’Amico, R., Sowden, A., Sakarovitch, C., Song, F., … Altman, D. (2003). Evaluating non-randomised intervention studies. Health Technology Assessment, 7(27), 1–179.
Detsky, A. S., Naylor, C. D., O’Rourke, K., McGeer, A. J., & L’Abbé, K. A. (1992).
Incorporating variations in the quality of individual randomized trials into meta-analysis. Journal of Clinical Epidemiology, 45, 255–265.
de Bruin, M., McCambridge, J., & Prins, J. M. (2015). Reducing the risk of bias in health behaviour change trials: Improving trial design, reporting, or bias assessment criteria? A review and case study. Psychology & Health, 30(1), 8–34.
Downs, S. H., & Black, N. (1998). The feasibility of creating a checklist for the assessment of the methodological quality both of randomised and non-randomised studies of health care interventions. Journal of Epidemiology and Community Health, 52, 377–384.
Evers, S. M. A. A., Hiligsmann, M., & Adarkwah, C. C. (2015). Risk of bias in trial-based economic evaluations: Identification of sources and bias-reducing strategies. Psychology & Health, 30(1), 52–71.
Ferrer, R. A., Huedo-Medina, T. B., Johnson, B. T., Ryan, S. M., & Pescatello, L. S. (2011). Exercise interventions for cancer survivors: A meta-analysis of quality of life outcomes. Annals of Behavioral Medicine, 41, 32–47.
Fisher, J. D., & Fisher, W. A. (1992). Changing AIDS-risk behavior. Psychological Bulletin, 111, 455–474.
Harbord, R. M., & Higgins, J. P. T. (2008). Meta-regression. Stata Journal, 8, 493–519.
Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis. Orlando, FL: Academic Press.
Hedges, L. V., & Pigott, T. D. (2004). The power of statistical tests for moderators in meta-analysis. Psychological Methods, 9, 426.
Hedges, L. V., Pustejovsky, J. E., & Shadish, W. R. (2012). A standardized mean difference effect size for single case designs. Research Synthesis Methods, 3, 224–239.


Heiwe, S., & Jacobson, S. H. (2011). Exercise training for adults with chronic kidney disease. Cochrane Database of Systematic Reviews (10), CD003236.
Higgins, J. P. T., & Green, S. (Eds.). (2008). Cochrane handbook for systematic reviews of interventions (Vol. 5). West Sussex: Wiley Online Library.
Higgins, J. P. T., & Thompson, S. G. (2004). Controlling the risk of spurious findings from meta-regression. Statistics in Medicine, 23, 1663–1682.
Jadad, A. R., Moore, R. A., Carroll, D., Jenkinson, C., Reynolds, D. J. M., Gavaghan, D. J., & McQuay, H. J. (1996). Assessing the quality of reports of randomized clinical trials: Is blinding necessary? Controlled Clinical Trials, 17, 1–12.
Janssen, V., Gucht, V. D., Dusseldorp, E., & Maes, S. (2013). Lifestyle modification programmes for patients with coronary heart disease: A systematic review and meta-analysis of randomized controlled trials. European Journal of Preventive Cardiology, 20, 620–640.
Johnson, B. T., Dove, N. L., & Boynton, M. H. (2011). On being basic and applied at the same time: Intersections between social and health psychology. In M. Mark, S. Donaldson, & B. A. Campbell (Eds.), The intersection of social psychology and program evaluation (pp. 321–341). New York, NY: Guilford.
Johnson, B. T., Low, R. E., & LaCroix, J. M. (2012). Systematic reviews to support evidence-based medicine (2nd edition) by Khalid Khan, Regina Kunz, Jos Kleijnen and Gerd Antes: A review. Research Synthesis Methods, 4, 102–108.
Johnson, B. T., MacDonald, H. V., Bruneau, M. L., Jr, Goldsby, T. U., Brown, J. C., Huedo-Medina, T. B., & Pescatello, L. S. (2014). Methodological quality of meta-analyses on the blood pressure response to exercise: A review. Journal of Hypertension, 32, 706–723.
Johnson, B. T., Michie, S., & Snyder, L. B. (2014). Effects of behavioral intervention content on HIV prevention outcomes: A meta-review of meta-analyses. Journal of Acquired Immune Deficiency Syndromes, 66, S259–S270.
Johnson, B. T., Redding, C.
A., DiClemente, R. J., Mustanski, B. S., Dodge, B., Sheeran, P., … Fishbein, M. (2010). A network-individual-resource model for HIV prevention. AIDS and Behavior, 14, 204–221.
Johnson, B. T., Scott-Sheldon, L. A. J., Huedo-Medina, T. B., & Carey, M. P. (2011). Interventions to reduce sexual risk for human immunodeficiency virus in adolescents: A meta-analysis of trials, 1985–2008. Archives of Pediatric and Adolescent Medicine, 165, 77–84.
Johnson, B. T., & Turco, R. M. (1992). The value of goodness-of-fit indices in meta-analysis: A comment on Hall and Rosenthal. Communication Monographs, 59, 388–396.
Kangas, M., Bovbjerg, D. H., & Montgomery, G. H. (2008). Cancer-related fatigue: A systematic and meta-analytic review of non-pharmacological therapies for cancer patients. Psychological Bulletin, 134, 700–741.
Khan, K., Kunz, R., Kleijnen, J., & Antes, G. (2011). Systematic reviews to support evidence-based medicine (2nd ed.). London: CRC Press.
Lau, J., Ioannidis, J. P. A., Terrin, N., Schmid, C. H., & Olkin, I. (2006). The case of the misleading funnel plot. British Medical Journal, 333, 597–600.
Lipsey, M. W., & Wilson, D. B. (2001). Practical meta-analysis (Vol. 49). Thousand Oaks, CA: Sage.
Luepker, R., Johnson, S., Breslow, L., Chobanian, A., Davis, C., Duling, B., … McBride, P. (1996). Physical activity and cardiovascular health. Journal of the American Medical Association, 276, 241–246.
McCambridge, J. (2015). From question-behaviour effects in trials to the social psychology of research participation. Psychology & Health, 30(1), 72–84.
Michie, S., Johnson, B. T., & Johnston, L. (in press). Advancing cumulative evidence on behaviour change techniques and interventions: A comment on Peters, de Bruin, and Crutzen. Health Psychology Review.


Michie, S., Richardson, M., Johnston, M., Abraham, C., Francis, J., Hardeman, W., … Wood, C. E. (2013). The behavior change technique taxonomy (v1) of 93 hierarchically clustered techniques: Building an international consensus for the reporting of behavior change interventions. Annals of Behavioral Medicine, 46, 81–95.
Miller, W. R., Brown, J. M., Simpson, T. L., Handmaker, N. S., Bien, T. H., Luckie, L. F., … Tonigan, J. S. (1995). A methodological analysis of the alcohol treatment outcome literature. In R. K. Hester & W. R. Miller (Eds.), Handbook of alcoholism treatment approaches: Effective alternatives (2nd ed., pp. 12–24). Boston, MA: Allyn & Bacon.
O’Connor, D., Green, S., & Higgins, J. P. T. (2008). Defining the review question and developing criteria for including studies. In J. P. T. Higgins & S. Green (Eds.), Cochrane handbook for systematic reviews of interventions: Cochrane book series (pp. 83–94). Chichester: Wiley-Blackwell.
Pescatello, L. S., Arena, R., Riebe, D., Thompson, P. D., & Williams, L. (2014). ACSM’s guidelines for exercise testing and prescription. Baltimore, MD: Wolters Kluwer Health/Lippincott Williams & Wilkins.
Rothman, A. J., Hertel, A. W., Baldwin, A. S., & Bartels, R. (2008). Integrating theory and practice: Understanding the determinants of health behavior change. In J. Shah & W. Gardner (Eds.), Handbook of motivation science (pp. 494–507). New York, NY: Guilford.
Rubin, D. B. (1992). Meta-analysis: Literature synthesis or effect-size surface estimation? Journal of Educational and Behavioral Statistics, 17, 363–374.
Schünemann, H. J., Oxman, A. D., Vist, G. E., Higgins, J. P. T., Deeks, J. J., & Glasziou, P. (2011). Interpreting results and drawing conclusions. In Cochrane handbook for systematic reviews of interventions (pp. 359–388). West Sussex: Wiley-Blackwell.
Shadish, W. R. (1989). The perception and evaluation of quality in science. In B. Gholson, W. R. Shadish, R. A. Neimeyer, & A. C.
Houts (Eds.), Psychology of science: Contributions to metascience (pp. 383–426). New York, NY: Cambridge University Press.
Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston, MA: Houghton Mifflin.
Shea, B. J., Grimshaw, J. M., Wells, G. A., Boers, M., Andersson, N., Hamel, C., … Bouter, L. M. (2007). Development of AMSTAR: A measurement tool to assess the methodological quality of systematic reviews. BMC Medical Research Methodology, 7, 10.
Slavin, R. E. (1986). Best-evidence synthesis: An alternative to meta-analytic and traditional reviews. Educational Researcher, 15, 5–11.
Tarquinio, C., Kivits, J., Minary, L., Coste, J., & Alla, F. (2015). Evaluating complex interventions: Perspectives and issues for health behaviour change interventions. Psychology & Health, 30(1), 35–51.
Thompson, S. G., & Higgins, J. (2002). How should meta-regression analyses be undertaken and interpreted? Statistics in Medicine, 21, 1559–1573.
Valentine, J. C. (2009). Judging the quality of primary research. In H. Cooper, L. V. Hedges, & J. C. Valentine (Eds.), Handbook of research synthesis and meta-analysis (2nd ed., pp. 129–146). New York, NY: Russell Sage Foundation.
Valentine, J. C., & Cooper, H. (2008). A systematic and transparent approach for assessing the methodological quality of intervention effectiveness research: The Study Design and Implementation Assessment Device (Study DIAD). Psychological Methods, 13, 130–149.
Verhagen, A. P., de Vet, H. C., de Bie, R. A., Kessels, A. G., Boers, M., Bouter, L. M., & Knipschild, P. G. (1998). The Delphi list: A criteria list for quality assessment of randomized clinical trials for conducting systematic reviews developed by Delphi consensus. Journal of Clinical Epidemiology, 51, 1235–1241.
Wells, G. A., Shea, B., O’Connell, D., Peterson, J., Welch, V., Losos, M., & Tugwell, P. (2000).
The Newcastle-Ottawa Scale (NOS) for assessing the quality of nonrandomised studies in meta-analyses. Retrieved from http://www.ohri.ca/programs/clinical_epidemiology/oxford.asp
