medical education in review

How much evidence does it take? A cumulative meta-analysis of outcomes of simulation-based education

David A Cook1,2,3

CONTEXT Studies that investigate research questions that have already been resolved represent a waste of resources. However, the failure to collect sufficient evidence to resolve a given question results in ambiguity.

OBJECTIVES The present study was conducted to reanalyse the results of a meta-analysis of simulation-based education (SBE) to determine: (i) whether researchers continue to replicate research studies after the answer to a research question has become known, and (ii) whether researchers perform enough replications to definitively answer important questions.

METHODS A systematic search of multiple databases to May 2011 was conducted to identify original research evaluating SBE for health professionals in comparison with no intervention or any active intervention, using skill outcomes. Data were extracted by reviewers working in duplicate. Data synthesis involved a cumulative meta-analysis to illuminate patterns of evidence by sequentially adding studies according to a variable of interest (e.g. publication year) and re-calculating the pooled effect size with each addition. Cumulative meta-analysis by publication year was applied to 592 comparative studies using several thresholds of ‘sufficiency’, including: statistical significance; stable effect size classification and magnitude (Hedges’ g ± 0.1), and precise estimates (confidence intervals of less than ± 0.2).

RESULTS Among studies that compared the outcomes of SBE with those of no intervention, evidence supporting a favourable effect of SBE on skills existed as early as 1973 (one publication) and further evidence confirmed a quantitatively large effect of SBE by 1997 (28 studies). Since then, a further 404 studies were published. Among studies comparing SBE with non-simulation instruction, the effect initially favoured non-simulation training, but the addition of a third study in 1997 brought the pooled effect to slightly favour simulation, and by 2004 (14 studies) this effect was statistically significant (p < 0.05) and the magnitude had stabilised (small effect). A further 37 studies were published after 2004. By contrast, evidence from studies evaluating repetition continued to show borderline statistical significance and wide confidence intervals in 2011.

CONCLUSIONS Some replication is necessary to obtain stable estimates of effect and to explore different contexts, but the number of studies of SBE often exceeds the minimum number of replications required.

Medical Education 2014; 48: 750–760 doi: 10.1111/medu.12473 Discuss ideas arising from the article at www.mededuc.com ‘discuss’

1 Division of General Internal Medicine, Mayo Clinic College of Medicine, Rochester, Minnesota, USA
2 Center for Online Learning, Mayo Clinic College of Medicine, Rochester, Minnesota, USA
3 Mayo Multidisciplinary Simulation Center, Mayo Clinic College of Medicine, Rochester, Minnesota, USA


Correspondence: David A Cook, MD, MHPE, Division of General Internal Medicine, Mayo Clinic College of Medicine, Mayo 17, 200 First Street SW, Rochester, Minnesota 55905, USA. Tel: 00 1 507 266 4156; E-mail: [email protected]


INTRODUCTION

Education, like clinical medicine, is an art informed by science. In recent years, growing numbers of scholars have sought to generate and apply increased evidence to the art of health professions education.1–3 Recent discussions have emphasised the complex challenges educators face in establishing what works, for whom and in what context,4–6 the need for greater methodological and conceptual rigour,7–9 the value of evidence derived from varied complementary sources,10,11 and the imperative to apply what we know in practice (i.e. to translate knowledge into practice).12–15 In all of these arguments is a call, express or implied, for more evidence.

Of course, the need for evidence is not infinite, and it often appears that investigators conduct a study without considering prior work16 or how much the new study will contribute to the field. This raises the question of how much evidence is sufficient. This is an important issue because conducting a study investigating a research question to which the answer is already known has at least four important negative consequences: (i) it represents an inefficient use of researcher resources that could be better spent elsewhere; (ii) it introduces ethical uncertainty (because the field is no longer in equipoise); (iii) it represents an inefficient use of the publication system (in terms of editors’ and reviewers’ time, and journal space), and (iv) it delays the implementation in educational practice of interventions that are known to be effective (or the avoidance of those known to be ineffective).

The converse is also true: failure to collect sufficient evidence to resolve a given question is also problematic. A single study is rarely sufficient to definitively answer a question. Failure to replicate the findings of an earlier study is common in both clinical medicine17 and education.18 Thus, several replications are typically desirable, both to determine how results generalise across different contexts (learners, topics and educational settings), and to achieve a sample sufficient to allow for statements about statistical significance and a reasonably precise estimate of the magnitude of effect.

Previous work in medical education has explored the quality of reporting7,16 and quality of methods.7,19–21 In an earlier essay, I also suggested that studies that compare the outcomes of educational interventions with those of no intervention do less to advance the science of education than do theory-guided comparisons of two active interventions (comparative effectiveness research).22 However, I am not aware of work in health professions education that attempts to empirically determine the sufficiency of evidence. Such an exploration would be useful to the research community because it would underscore the need to prioritise questions and to comprehensively seek existing evidence.

One useful technique with which to explore issues related to the sufficiency of evidence is cumulative meta-analysis.23,24 This technique uses a series of meta-analyses (rather than a single analysis) to show how patterns emerge in a body of evidence as a given variable (e.g. date of publication or study size) changes. For example, to see how evidence evolves over time, a cumulative meta-analysis might begin with the first known study on a topic and then sequentially add each subsequent study, one at a time, re-calculating the pooled effect size with each addition.

Lau et al. pioneered the use of cumulative meta-analysis in a study analysing the outcomes of treatments for cardiovascular disease.23 They showed that evidence supporting the effectiveness of streptokinase was available much earlier (i.e. with fewer studies) than had been recognised and, likewise, that the deleterious effects of several cardiovascular drugs could have been identified sooner had all available evidence been pooled.23 Lau et al.23 also showed that some interventions (with smaller magnitude of benefit, such as the use of beta-blockers) required further investigation before their effects could be established with confidence. Although this publication23 did little to immediately affect clinical practice (as the answers to the clinical questions were already known), it highlighted an important message to the research community.

In a similar vein, in the present study I use cumulative meta-analysis in the focused area of simulation-based education (SBE) to address two questions related to the sufficiency of evidence:

1 Are researchers wasting resources on unnecessary replications after the answer to a particular question has been established?
2 Are researchers performing enough replications to definitively answer important questions?

Cumulative meta-analysis can answer these questions by determining whether the evidence is sufficient that further study will not substantially alter conclusions (i.e. there will be no further change in educational practice) and, if so, the time-point at which the threshold of sufficiency was first reached (such that any subsequent studies were unnecessary).


METHODS

This study is a reanalysis of previously published data from a comprehensive review of technology-enhanced simulation-based medical education (TES-ME).25–27 The project overall and each original report adhered to PRISMA (preferred reporting items for systematic reviews and meta-analyses) standards.28 Figure 1 contains definitions of several key statistical terms.

Original data collection

The data for the present study are derived from a dataset of Hedges’ g standardised mean difference (SMD) effect sizes for 985 original research studies of TES-ME. Details of procedures for identifying studies and extracting SMDs, including trial flow diagrams and full search strategies, have been published previously.25–27 Very briefly, with the assistance of a research librarian, we searched multiple literature databases for potentially relevant articles. Two independent reviewers then used predefined inclusion criteria to select articles for full review. For each included study, a Hedges’ g SMD was calculated using standard methods. For studies making more than one comparison (e.g. a three-arm study), an SMD was calculated for each comparison. Only process skill outcomes were included in the present dataset. The last date of search was 11 May 2011.

Cumulative meta-analysis

For the present analyses, I conducted a series of random-effects meta-analyses for each of several different comparisons. For studies comparing TES-ME with no intervention, I analysed all studies together, and also looked at subgroups based on the clinical topic of minimally invasive surgery and the simulation modality of computerised virtual reality. For studies comparing TES-ME with non-simulation instruction, I analysed all studies together and also looked at subgroups based on the clinical topic of resuscitation and the non-simulation comparator of the lecture. For studies comparing TES-ME with alternative simulation approaches, I looked at studies comparing different levels of feedback, cognitive interactivity, range of difficulty and repetition (full definitions of these terms have been published previously26).

For each comparison defined above, I conducted a cumulative meta-analysis similar to that described by Lau et al.23 Thus, I performed successive random-effects meta-analyses for each year from 1966 to 2011, updating the results from the previous year to include all studies published in the current year (i.e. the meta-analysis for 2002 included all studies published in years up to and including 2002).
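To make this procedure concrete, the following minimal Python sketch (an illustration only, not the author’s original analysis code; the study years, effect sizes and variances are hypothetical placeholders) re-pools Hedges’ g values with a DerSimonian–Laird random-effects model after adding all studies published up to each successive year:

```python
import math

def pool_random_effects(effects, variances):
    """DerSimonian-Laird random-effects pool: returns (pooled g, 95% CI low, 95% CI high, p-value)."""
    w_fixed = [1.0 / v for v in variances]
    fixed_mean = sum(w * g for w, g in zip(w_fixed, effects)) / sum(w_fixed)
    q = sum(w * (g - fixed_mean) ** 2 for w, g in zip(w_fixed, effects))
    df = len(effects) - 1
    c = sum(w_fixed) - sum(w ** 2 for w in w_fixed) / sum(w_fixed)
    tau2 = max(0.0, (q - df) / c) if df > 0 and c > 0 else 0.0  # between-study variance
    w_rand = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * g for w, g in zip(w_rand, effects)) / sum(w_rand)
    se = math.sqrt(1.0 / sum(w_rand))
    z = pooled / se
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))  # two-sided normal p-value
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se, p

# Hypothetical input: (publication year, Hedges' g, variance of g) for each comparison.
studies = [(1973, 0.90, 0.20), (1980, 1.30, 0.15), (1997, 1.10, 0.10), (2000, 1.00, 0.08)]

# Cumulative meta-analysis by publication year: re-pool after each year's additions.
for year in sorted({s[0] for s in studies}):
    subset = [s for s in studies if s[0] <= year]
    g, lo, hi, p = pool_random_effects([s[1] for s in subset], [s[2] for s in subset])
    print(f"{year}: k={len(subset)}, pooled g={g:.2f} (95% CI {lo:.2f} to {hi:.2f}), p={p:.3g}")
```

Sorting on a different variable of interest (e.g. study size rather than publication year) before re-pooling yields the other forms of cumulative meta-analysis described above.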

Standardised mean difference (SMD) – a type of effect size; a statistical technique that converts the difference between two means to a common scale so that results from one study can be compared or pooled with results from another. The resulting unit-less value has a standard deviation of 1.0. There are several SMD metrics that vary slightly depending on the standard deviation used in the conversion; Cohen’s d and Hedges’ g are two of the most common.

Cumulative meta-analysis – a research approach using a series of meta-analyses to show how patterns emerge in a body of evidence as a given variable (e.g. date of publication or study size) changes. Studies are sorted according to the variable of interest and each study is entered into the analysed dataset in sequence, with the pooled effect size being recalculated with each new entry.

Pooled effect size – a weighted average of results (effect sizes) of several studies.

Confidence interval (CI) – a range of values that encompasses the true value of the population at a given level of probability. For example, if a study were repeated 100 times, the resulting 100 95% CIs would contain the true population value 95 times.

Statistical significance – the probability that an observed effect is not likely due to chance alone.

Clinical/educational significance – the meaningfulness of an effect in clinical or educational practice. For example, a trivial effect could be statistically significant but have limited or no practical implications.

Figure 1 Definitions of key statistical terms.
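For readers who prefer formulas, the quantities defined in Figure 1 can be written as follows (a standard formulation provided here for convenience rather than reproduced from the original reports; subscripts 1 and 2 denote the two comparison groups):

```latex
% Hedges' g: the between-group mean difference divided by the pooled standard
% deviation s_p, with the usual small-sample correction factor J.
\[
  g \;=\; J \cdot \frac{\bar{x}_1 - \bar{x}_2}{s_p}, \qquad
  s_p \;=\; \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}, \qquad
  J \;=\; 1 - \frac{3}{4(n_1 + n_2 - 2) - 1}
\]
% Random-effects pooled effect size over k studies: an inverse-variance weighted
% average, where v_i is the within-study variance and tau^2 the between-study variance.
\[
  \hat{\mu} \;=\; \frac{\sum_{i=1}^{k} w_i\, g_i}{\sum_{i=1}^{k} w_i}, \qquad
  w_i \;=\; \frac{1}{v_i + \hat{\tau}^{2}}, \qquad
  95\%\ \mathrm{CI} \;=\; \hat{\mu} \pm 1.96\sqrt{\frac{1}{\sum_{i=1}^{k} w_i}}
\]
```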


The fundamental questions in these analyses concern if and when evidence had been collected sufficient that further study would not alter conclusions. As there is no single threshold that defines ‘sufficient’ evidence, I defined a priori four different (and progressively more stringent) thresholds, each with a slightly different implication for interpretation (a worked check of these thresholds is sketched after the list):

1 A statistically significant effect: the pooled SMD was statistically significantly different from zero (no effect) with an α-value of 0.05 (i.e. p < 0.05) and a more stringent α-value of 0.01.
2 A clinically or educationally significant effect, criterion A: stable classification of effect, whereby the pooled SMD enters an effect size range (using ranges defined by Cohen: 0–0.19 = negligible; 0.20–0.49 = small; 0.50–0.79 = moderate, and ≥ 0.80 = large29) within which it remains thereafter.
3 A clinically or educationally significant effect, criterion B: stable magnitude of effect, whereby the pooled SMD approaches the final pooled SMD (the best estimate of true effect) within ± 0.1 and thereafter remains within that range.
4 A precise estimate of effect: the 95% confidence interval (CI) around the pooled SMD no longer varies by more than ± 0.2, thus excluding anything greater than ‘negligible’ vacillations in the magnitude of effect.
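As an illustration only (the cumulative results below are hypothetical placeholders, not values from the present dataset), the following Python sketch shows one way these four thresholds could be checked against a sequence of yearly cumulative results, treating a threshold as met from the first year at which it holds and continues to hold in every later analysis:

```python
# Hypothetical illustration of the four sufficiency checks (values are placeholders).
# Each entry: (year, pooled SMD, lower 95% CI limit, upper 95% CI limit, p-value).
cumulative = [
    (1997, 0.85, 0.40, 1.30, 0.0100),
    (2000, 1.02, 0.70, 1.34, 0.0010),
    (2006, 1.08, 0.93, 1.23, 0.0001),
    (2011, 1.10, 1.03, 1.16, 0.0001),
]
final_smd = cumulative[-1][1]  # best estimate of the true effect (final analysis)

def cohen_class(smd):
    """Cohen's classification: negligible / small / moderate / large."""
    a = abs(smd)
    return "negligible" if a < 0.2 else "small" if a < 0.5 else "moderate" if a < 0.8 else "large"

def first_year_maintained(condition):
    """Earliest year from which the condition holds in every subsequent analysis."""
    for i, row in enumerate(cumulative):
        if all(condition(r) for r in cumulative[i:]):
            return row[0]
    return None  # threshold not achieved with the available evidence

print("1. p < 0.05 reached and maintained:", first_year_maintained(lambda r: r[4] < 0.05))
print("2. Stable Cohen classification:", first_year_maintained(lambda r: cohen_class(r[1]) == cohen_class(final_smd)))
print("3. Within +/-0.1 of final SMD:", first_year_maintained(lambda r: abs(r[1] - final_smd) <= 0.1))
print("4. CI half-width <= 0.2:", first_year_maintained(lambda r: (r[3] - r[2]) / 2 <= 0.2))
```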

RESULTS

A total of 592 studies reporting process skill outcomes are included in the present analyses. These include 432 studies comparing outcomes of SBE with those of no intervention, 51 studies comparing outcomes of SBE with those of non-simulation instruction, and 124 studies comparing outcomes of different simulation-based approaches (some studies reported more than one comparison).

As reported previously, the pooled SMD was 1.10 for the 432 studies comparing outcomes of TES-ME with those of no intervention. Figure 2 presents the results of the cumulative meta-analysis and shows that the first study (published in 1973) found a statistically significant benefit to training (p = 0.05), and that the accumulation of additional evidence (additional studies) never changed this conclusion. However, the magnitude of effect varied somewhat, reaching a high of 1.23 and a low of 0.66 (a moderate effect) before the classification stabilised at a large effect in 1997 (28 studies). The magnitude itself stabilised within 0.1 of the final pooled SMD in 2000 (52 studies). Not until 2006 (199 studies) was a precise 95% CI around the pooled SMD achieved. These results are summarised in Table 1. Between-study inconsistency, measured using I², was high (> 50%) in all analyses (Fig. 1). These findings suggest that evidence to support a favourable effect of TES-ME on process skills existed as early as 1973 (with one publication) and that the effect of TES-ME could have been known to be quantitatively large as early as 1997.

Table 1 also shows sub-analyses for studies using a no-intervention comparator on the clinical topic of minimally invasive surgery and the simulation modality of virtual reality. The overall picture is similar to the analysis of all studies using a no-intervention comparator, although the analyses refer to fewer studies and the timeline is accordingly somewhat delayed. However, all evidence sufficiency thresholds had been met by 2008 for minimally invasive surgery and by 2006 for virtual reality simulation.

Figure 3 and Table 1 show the results of a cumulative meta-analysis of 51 studies comparing TES-ME with non-simulation instruction. In this analysis, the effect initially favoured non-simulation training, but the inclusion of the third study (1997) brought the pooled SMD to slightly favour simulation, and by 2004 (14 studies) this direction of effect was statistically significant (p < 0.05) and both the classification and absolute magnitude of the SMD had stabilised at a small effect. It is notable that, compared with studies using a no-intervention comparator, the studies using non-simulation comparators required more evidence to show statistical significance (which is likely to relate to the smaller effect size), but less evidence to show stability in other criteria.

Table 1 shows sub-analyses of studies on the clinical topic of resuscitation and studies using the lecture as a comparator. Neither of these subgroups has yet reached the predefined threshold for precise level of effect, but all other thresholds were met by 2009.

Figure 4(a) and Table 1 show results of a cumulative meta-analysis of studies of TES-ME in which the amount of feedback varied between intervention arms. The effect slightly favoured less feedback in the first study (1975). The second study was not published until 1994, but the inclusion of its evidence changed the direction of effect to favour the provision of more feedback, and by 2000 (12 studies) this effect had become statistically significant.


Table 1 Cumulative meta-analyses of studies investigating the skills outcomes of technology-enhanced simulation: a summary

Final analysis (May 2011)

| Comparator | Subgroup | Trials, n (participants, n) | Pooled SMD (95% CI) | p-value |
| No intervention | All | 432 (20 934) | 1.10 (1.03–1.16) | < 0.001 |
| No intervention | Task: minimally invasive surgery | 114 (3313) | 1.23 (1.10–1.37) | < 0.001 |
| No intervention | Simulation: virtual reality | 121 (3770) | 1.11 (0.99–1.24) | < 0.001 |
| Non-simulation instruction | All | 51 (2341) | 0.37 (0.23–0.51) | < 0.001 |
| Non-simulation instruction | Task: resuscitation | 15 (808) | 0.35 (0.12–0.59) | 0.003 |
| Non-simulation instruction | Comparator: lecture | 13 (546) | 0.64 (0.25–1.03) | 0.001 |
| Simulation, feedback | All | 81 (4551) | 0.44 (0.30–0.58) | < 0.001 |
| Simulation, interactivity | All | 90 (4425) | 0.65 (0.50–0.81) | < 0.001 |
| Simulation, range of difficulty | All | 20 (580) | 0.70 (0.31–1.10) | < 0.001 |
| Simulation, repetitions | All | 7 (685) | 0.71 (0.01–1.41) | 0.048 |

SMD = standardised mean difference; 95% CI = 95% confidence interval; NA = not achieved; U = effect not yet clearly stable. p < 0.05 and p < 0.01: this p-value threshold reached and maintained. Stable classification: SMD did not cross into a different Cohen’s classification (< 0.20 = negligible; 0.20–0.49 = small; 0.50–0.79 = moderate; ≥ 0.80 = large). Stable effect: SMD did not vary more than 0.1 from the final analysis. Precise estimate: confidence limits were ± 0.2 or less. Results may show minor discrepancies from originally published results because minor corrections were made in data analysis procedures.

By 2008 (57 studies) all thresholds of evidence sufficiency had been met for the feedback comparison. Table 1 shows similar results for studies comparing different levels of cognitive interactivity and range of difficulty, although the predefined threshold for precise level of effect has not yet been achieved for the latter. By contrast, there is continued instability in the results of studies comparing different intensities of repetition between groups (Fig. 4b), which suggests that further evidence may yet add value to understanding of the educational effect of increasing the number of repetitions in TES-ME.

DISCUSSION

In this evidence-informed essay, I have used the technique of cumulative meta-analysis to explore the temporal unfolding of evidence in TES-ME. Most of the comparisons reported herein have more than ample evidence to support their most salient interpretations. In particular, studies comparing the outcomes of TES-ME with those of no intervention showed statistical significance very early, and conclusions regarding the magnitude and precision of effect did not change appreciably over many years and hundreds of subsequent studies. Studies comparing the outcomes of TES-ME with those of other interventions, including non-simulation instruction and other simulation-based approaches, resulted in pooled SMDs of lower magnitude and therefore required more time to reach stable evidence of statistical significance, but even here there is ample evidence to draw robust conclusions about the field generally. However, some comparisons – most notably those of non-simulation and lecture-based approaches, and those of simulation-based approaches in which the number of repetitions is varied – appear to require additional evidence before stable conclusions can be drawn.

This study employed four different thresholds of evidence sufficiency, addressing questions of statistical significance, educational significance (classification and magnitude of effect), and precision of effect. Each of these lends itself to slightly different applications.


Table 1 (continued) Year in which sufficient evidence was achieved (studies, n)

| Comparator | Subgroup | Year of first trial | p < 0.05 | p < 0.01 | Stable classification | Stable effect | Precise estimate |
| No intervention | All | 1973 | 1973 (n = 1) | 1980 (n = 2) | 1997 (n = 28) | 2000 (n = 52) | 2006 (n = 199) |
| No intervention | Task: minimally invasive surgery | 1998 | 1998 (n = 2) | 1998 (n = 2) | 1998 (n = 2) | 2008 (n = 75) | 2008 (n = 75) |
| No intervention | Simulation: virtual reality | 1998 | 1998 (n = 1) | 1998 (n = 1) | 1998 (n = 1) | 2004 (n = 35) | 2006 (n = 61) |
| Non-simulation instruction | All | 1978 | 2004 (n = 14) | 2005 (n = 17) | 2004 (n = 14) | 2004 (n = 17) | 2007 (n = 25) |
| Non-simulation instruction | Task: resuscitation | 2000 | 2002 (n = 2) | 2009 (n = 9) | 2009 (n = 9) | 2009 (n = 9) | NA |
| Non-simulation instruction | Comparator: lecture | 1997 | 2004 (n = 5) | 2010 (n = 12) | 2008 (n = 9) | 2008 (n = 9) | NA |
| Simulation, feedback | All | 1975 | 2000 (n = 12) | 2001 (n = 14) | 2007 (n = 43) | 2004 (n = 24) | 2008 (n = 57) |
| Simulation, interactivity | All | 1975 | 1998 (n = 9) | 2000 (n = 14) | 1999 (n = 12) | 2000 (n = 14) | 2006 (n = 41) |
| Simulation, range of difficulty | All | 1997 | 2001 (n = 4) | 2009 (n = 17) | 2006 (n = 12) | 2006 (n = 12) | NA |
| Simulation, repetitions | All | 1966 | 2011 (n = 7) | NA | U | U | NA |

NA = not achieved; U = effect not yet clearly stable (see footnote to Table 1 above).

For example, in many educational contexts the magnitude of effect may matter less than its classification (e.g. for practical purposes it may not matter whether an intervention has an SMD of 0.9 or 1.2 because both are considered large effects). The precision of the estimate of effect is probably of greater interest to researchers than to teachers.

Why do we have an excess of evidence?

At least some of the ‘excess’ studies included in these analyses were designed to address questions other than those of the meta-analysis in which they were included. For example, most of the studies looking at cognitive interactivity were designed to explore a specific element of interactivity, such as active debriefing, repetition or pre-learning. Thus, what amounts to an ‘excess’ of evidence informing the broad phenomenon (e.g. interactivity) may simultaneously yield novel insights into a particular aspect of that phenomenon (e.g. debriefing).

Additional redundancy arises from replications that use different learners, tasks or educational settings. Such replications are justified to the degree that the contextual variations may plausibly impact results. However, the number of permutations is nearly infinite, and researchers must judge whether the results of prior work can be generalised or whether study in a new context is required. Learning theories and conceptual frameworks are invaluable in making such decisions. For example, if ample evidence supports the effectiveness of training for adult flexible bronchoscopy compared with no intervention, it seems reasonable to conclude that similar training would work for paediatric populations or for rigid bronchoscopy techniques. The specific sequence, duration or activities of training may need to be adjusted for optimal effectiveness, but such training decisions are best informed by the findings of studies that make comparisons among active interventions, which would not constitute a replication of the no-intervention comparison.


Figure 2 Cumulative meta-analysis of skills outcomes of technology-enhanced simulation in comparison with no intervention. Filled squares indicate the pooled standardised mean difference effect size; bars indicate 95% confidence intervals (95% CI), Cum. = cumulative

More often, however, authors appear to contribute to this excess unintentionally. Judging by the documented gaps in reporting7,16 and broad patterns in the literature,22 it seems authors are often unaware of the extent of the evidence in a field (i.e. existing publications), publish evidence collected as a matter of convenience (e.g. analysing and reporting routine course evaluation data) without considering the need for such studies, emulate the work of others (so-called ‘me too’ studies), or poorly conceive their questions (e.g. fail to clearly define the comparison group16 or fail to properly link the question to a relevant conceptual framework30). Examples of these phenomena include the huge number of studies that have evaluated the impacts of educational technologies such as Internet-based instruction (130 studies31) and simulation-based training (609 studies27) in comparison with no intervention, in which findings almost universally favour instruction.

The remedy for such deficiencies will involve at least three activities. Firstly, education researchers need training (i.e. faculty development) in how to search for relevant work, invoke appropriate conceptual frameworks, and ask questions that advance the field. Secondly, editors and peer reviewers need to be sufficiently familiar with the field to distinguish work that advances the field from redundant or duplicative studies. Thirdly, comprehensive knowledge syntheses (especially, but not exclusively, systematic reviews32) can identify and bring together studies published in disparate journals and fields to offer a true picture of the evidence on a given topic. Moreover, reviews that focus on a narrow topic or that use restrictions based on study design (e.g. by excluding non-randomised trials) or scientifically irrelevant study features (such as language of publication) contribute far less to our understanding than do reviews that cover a broader sampling of relevant work, which both provide a broad picture of the lay of the land and permit between-study contrasts to further clarify key issues.


Figure 3 Cumulative meta-analysis of skills outcomes of technology-enhanced simulation in comparison with non-simulation training. Filled squares indicate the pooled standardised mean difference effect size; bars indicate 95% confidence intervals (95% CI), Cum = cumulative

Strengths and weaknesses of this study

This analysis refers to a convenience sample of studies in a focused field of health professions education and results strictly apply only to this field (although the bottom-line conclusions can probably be generalised across education). This study is limited by all of the weaknesses noted in the reports from which its data are derived,25–27 including large statistical inconsistency, heterogeneity in interventions and outcomes, and variable quality in the original studies. Moreover, the four levels of evidence sufficiency, although grounded in commonly accepted statistical practices, were developed de novo for the purposes of this study. This study also emphasises quantitative evidence to the exclusion of other equally important sources of evidence, such as qualitative research.

Limitations of meta-analysis in general

Meta-analysis is a powerful tool, but has limitations, especially in fields such as education in which interventions and outcomes are not standardised.33 Firstly, meta-analysis requires that all included studies address a conceptually similar question (comparison). Secondly, even if studies address a similar question, between-study variation in learner populations, the operationalisation of interventions, outcomes and design flaws may lead to inconsistent or spurious answers. Thirdly, standard meta-analysis assumes that all other things are equal, but in educational practice we know that this assumption is rarely true.34 Statistical techniques such as meta-regression can adjust (in a limited fashion) for known covariates, but not for unreported study features. Despite these limitations, meta-analysis can offer insightful perspectives on broad bodies of evidence and help to define areas of both strength and weakness in the evidence in a given field.

Conclusions

I do not mean to suggest that none of the ‘excess’ studies merited publication. As noted above, several such studies addressed specific questions beyond those evaluated in the present meta-analysis, and a given study can contribute to its field in many ways. Rather, my intent is to draw attention to the broader issue of unnecessary replication and to suggest that we often establish answers to questions much sooner than we realise. Once the evidence adequately answers a question, the continued conduct and reporting of studies exploring that question represent a waste of author time and journal publication space, and unnecessarily delay the application to educational practice of what is known.


Figure 4 Cumulative meta-analyses of skills outcomes of technology-enhanced simulation in comparison with alternative simulation training, in conditions of (a) high versus low feedback, and (b) many versus few repetitions. Filled squares indicate the pooled standardised mean difference effect size; bars indicate 95% confidence intervals (95% CI) Cum = cumulative


This exploration of evidence supports five bottom-line conclusions:

1 Some replication is necessary. For example, both non-simulation comparison studies and studies comparing different levels of feedback showed unfavourable effects in the first published study, but replication evidence favoured the opposite conclusion. Stopping after the first study would have misrepresented true effects.
2 At some point, further replication is no longer needed, at which time we should: (i) put into practice what we know, and (ii) perform different studies to investigate different questions (or more nuanced aspects of the phenomenon).
3 The required amount of evidence varies. Fewer studies will be required when the effect is uniform (as might be found with standardised interventions and assessments) or large. Conversely, more evidence will be needed when the effect size is small (a common occurrence in studies making comparisons of effectiveness in education22) or when we desire to know the precise effect size (an aspiration that is probably infrequent, unnecessary and perhaps a little ingenuous in education, given the lack of standardisation in interventions and assessments).
4 Studies comparing the outcomes of instruction with those of no intervention contribute little to the science of education. For such studies in TES-ME, the pooled effects in 1980 (after two studies had been published) and in 2011 were essentially the same. The 430 studies published in the intervening years added only precision (i.e. a narrower CI, which, as noted above, is probably not very important) to this analysis. Programmatic research6 aimed at clarifying mechanisms of action13 and guided by the principle of ‘what’s next?’35 will do far more to advance this art-informed science.
5 The existing evidence base in a given field may not be readily apparent to those engaged in scholarly work. Rigorous systematic reviews and syntheses of evidence are thus required to bring to light unappreciated sources of evidence.

Acknowledgements: the author thanks Ryan Brydges, PhD, Patricia J Erwin, MLS, Stanley J Hamstra, PhD, Rose Hatala, MD, Jason H Szostek, MD, Amy T Wang, MD, and Benjamin Zendejas, MD, for their assistance in the original literature search and data acquisition.
Funding: this work was supported by intramural funds, including an award from the Division of General Internal Medicine, Mayo Clinic.
Conflicts of interest: none.
Ethical approval: not required.

REFERENCES

1 Dauphinee WD, Wood-Dauphinee S. The need for evidence in medical education: the development of best evidence medical education as an opportunity to inform, guide, and sustain medical education research. Acad Med 2004;79:925–30.
2 Harden RM, Grant J, Buckley G, Hart IR. BEME Guide No. 1: Best evidence medical education. Med Teach 1999;21:553–62.
3 van der Vleuten CPM, Dolmans DHJM, Scherpbier AJJA. The need for evidence in education. Med Teach 2000;22:246–50.
4 Thistlethwaite J, Davies H, Dornan T, Greenhalgh T, Hammick M, Scalese R. What is evidence? Reflections on the AMEE symposium, Vienna, August 2011. Med Teach 2012;34:454–7.
5 Wong G, Greenhalgh T, Westhorp G, Pawson R. Realist methods in medical education research: what are they and what can they contribute? Med Educ 2012;46:89–96.
6 Regehr G. It’s NOT rocket science: rethinking our metaphors for research in health professions education. Med Educ 2010;44:31–9.
7 Cook DA, Levinson AJ, Garside S. Method and reporting quality in health professions education research: a systematic review. Med Educ 2011;45:227–38.
8 Cook DA. Avoiding confounded comparisons in education research. Med Educ 2009;43:102–4.
9 Gruppen LD. Improving medical education research. Teach Learn Med 2007;19:331–5.
10 Lingard L. Qualitative research in the RIME community: critical reflections and future directions. Acad Med 2007;82 (10 Suppl):129–30.
11 Bordage G. Moving the field forward: going beyond quantitative-qualitative. Acad Med 2007;82 (10 Suppl):126–8.
12 Shea JA, Arnold L, Mann KV. A RIME perspective on the quality and relevance of current and future medical education research. Acad Med 2004;79:931–8.
13 Cook DA, Bordage G, Schmidt HG. Description, justification, and clarification: a framework for classifying the purposes of research in medical education. Med Educ 2008;42:128–33.
14 Albert M, Hodges B, Regehr G. Research in medical education: balancing service and science. Adv Health Sci Educ Theory Pract 2007;12:103–15.
15 Norman G. Fifty years of medical education research: waves of migration. Med Educ 2011;45:785–91.
16 Cook DA, Beckman TJ, Bordage G. Quality of reporting of experimental studies in medical education: a systematic review. Med Educ 2007;41:737–45.
17 Prasad V, Vandross A, Toomey C et al. A decade of reversal: an analysis of 146 contradicted medical practices. Mayo Clin Proc 2013;88:790–8.
18 van de Wiel MW, Schmidt HG, Boshuizen HP. A failure to reproduce the intermediate effect in clinical case recall. Acad Med 1998;73:894–900.
19 Reed DA, Cook DA, Beckman TJ, Levine RB, Kern DE, Wright SM. Association between funding and quality of published medical education research. JAMA 2007;298:1002–9.
20 Baernstein A, Liss HK, Carney PA, Elmore JG. Trends in study methods used in undergraduate medical education research, 1969–2007. JAMA 2007;298:1038–45.
21 Todres M, Stephenson A, Jones R. Medical education research remains the poor relation. BMJ 2007;335:333–5.
22 Cook DA. If you teach them, they will learn: why medical education needs comparative effectiveness research. Adv Health Sci Educ Theory Pract 2012;17:305–10.
23 Lau J, Antman EM, Jimenez-Silva J, Kupelnick B, Mosteller F, Chalmers TC. Cumulative meta-analysis of therapeutic trials for myocardial infarction. N Engl J Med 1992;327:248–54.
24 Borenstein M, Hedges LV, Higgins JPT, Rothstein HR. Cumulative meta-analysis. In: Introduction to Meta-Analysis. Chichester: Wiley 2009;371–6.
25 Cook DA, Brydges R, Hamstra SJ, Zendejas B, Szostek JH, Wang AT, Erwin PJ, Hatala R. Comparative effectiveness of technology-enhanced simulation versus other instructional methods: a systematic review and meta-analysis. Simul Healthc 2012;7:308–20.
26 Cook DA, Hamstra SJ, Brydges R, Zendejas B, Szostek JH, Wang AT, Erwin PJ, Hatala R. Comparative effectiveness of instructional design features in simulation-based education: systematic review and meta-analysis. Med Teach 2013;35:e867–98.
27 Cook DA, Hatala R, Brydges R, Zendejas B, Szostek JH, Wang AT, Erwin PJ, Hamstra SJ. Technology-enhanced simulation for health professions education: a systematic review and meta-analysis. JAMA 2011;306:978–88.
28 Moher D, Liberati A, Tetzlaff J, Altman DG. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. Ann Intern Med 2009;151:264–9.
29 Cohen J. Statistical Power Analysis for the Behavioral Sciences, 2nd edn. Hillsdale, NJ: Lawrence Erlbaum Associates 1988.
30 Bordage G. Conceptual frameworks to illuminate and magnify. Med Educ 2009;43:312–9.
31 Cook DA, Levinson AJ, Garside S, Dupras DM, Erwin PJ, Montori VM. Internet-based learning in the health professions: a meta-analysis. JAMA 2008;300:1181–96.
32 Cook DA. Narrowing the focus and broadening horizons: complementary roles for non-systematic and systematic reviews. Adv Health Sci Educ Theory Pract 2008;13:391–5.
33 Cook DA. Randomised controlled trials and meta-analysis in medical education: what role do they play? Med Teach 2012;34:468–73.
34 Cronbach LJ. Beyond the two disciplines of scientific psychology. Am Psychol 1975;30:116–27.
35 Eva KW, Lingard L. What’s next? A guiding question for educators engaged in educational research. Med Educ 2008;42:752–4.

Received 23 December 2013; editorial comments to author 15 February 2014; accepted for publication 17 February 2014

