Medical Teacher, 2015, 37: 146–152
DOI: 10.3109/0142159X.2014.932902

Medical school benchmarking – From tools to programmes

TIM J. WILKINSON1, JUDITH N. HUDSON2, GEOFFREY J. MCCOLL3, WENDY C. Y. HU4, BRIAN C. JOLLY2 & LAMBERT W. T. SCHUWIRTH5,6

1University of Otago, New Zealand; 2University of Newcastle, Australia; 3University of Melbourne, Australia; 4University of Western Sydney, Australia; 5Flinders University, Australia; 6Maastricht University, The Netherlands

Correspondence: Prof. Tim J. Wilkinson, Medical Education Unit, University of Otago, Christchurch, PO Box 4345, Christchurch 8140, New Zealand. Tel: +64 3 3643600; Fax: +64 3 3640525; E-mail: [email protected]


Abstract

Background: Benchmarking among medical schools is essential, but may result in unwanted effects.

Aim: To apply a conceptual framework to selected benchmarking activities of medical schools.

Methods: We present an analogy between the effects of assessment on student learning and the effects of benchmarking on medical school educational activities. A framework by which benchmarking can be evaluated was developed and applied to key current benchmarking activities in Australia and New Zealand.

Results: The analogy generated a conceptual framework based on five questions to be considered in relation to benchmarking: what is the purpose? what are the attributes of value? what are the best tools to assess the attributes of value? what happens to the results? and what is the likely "institutional impact" of the results? If the activities were compared against a blueprint of desirable medical graduate outcomes, notable omissions would emerge.

Conclusion: Medical schools should benchmark their performance on a range of educational activities to ensure quality improvement and to assure stakeholders that standards are being met. Although benchmarking potentially has positive benefits, it could also result in perverse incentives with unforeseen and detrimental effects on learning if it is undertaken using only a few selected assessment tools.

Practice points

- We draw an analogy between the effects of assessment on student learning and the potential effects of benchmarking on medical school activities.
- Benchmarking should be undertaken with a variety of carefully selected tools, and the results judiciously synthesised, just as assessment has moved towards a more programmatic approach.
- Benchmarking, like assessment, is a powerful driver of behaviour. We need to ensure that the decisions that are made result in behaviours that improve quality, share good practice and foster innovation.

Background

Medical education, as one of many branches of professional education, continuously strives to improve the quality of its educational offerings. To support this, informative benchmarking among medical schools is essential, but it may also result in unforeseen and unwanted effects. This has been partially captured in Goodhart's law, most often stated as: "When a measure becomes a target, it ceases to be a good measure" (Goodhart 1981). In this article, we discuss benchmarking from the viewpoint of how the process may influence the behaviour of organisations, drawing an analogy with how assessment drives the learning behaviour of students. Benchmarking is defined (Meade 1998) as

the formal and structured process of searching for those practices which lead to excellent performance, the observation and exchange of information about them, their adaptation to meet the needs of one's own organisation, and the implementation of the amended practice.

Although benchmarking often compares performance against a standard (criterion-referenced), it also aims to search for areas of good practice, so it is often comparative (norm-referenced).

In this article, we refer to benchmarking as it applies to medical school performance and to assessment as it applies to student performance. These distinctions are important because assessment results (how students from a medical school perform) can also be used in the benchmarking process (how medical schools help students to perform). We treat assessment and benchmarking as separate concepts because our aim is to draw the analogy between their effects on, respectively, student and organisational behaviours. We argue that the similarities are also important: assessment of students is a powerful driver of learning behaviour (Newble & Jaeger 1983; Frederiksen 1984; Wilkinson et al. 2007; Cilliers et al. 2010, 2012a, 2012b, 2013), and it is therefore important to theorise and understand how benchmarking of medical schools could drive organisational behaviours. Such behaviours could relate to curriculum development activities and other aspects of programme delivery. This article therefore aims to develop and apply a conceptual model to selected benchmarking activities of medical schools.


Method

This article has three stages: (1) to draw an analogy between the effects of assessment on student learning and the potential effects of benchmarking on medical school activities; (2) to present a framework by which benchmarking can be evaluated; and (3) to apply this framework to some current activities in Australia and New Zealand.

Results

Stage 1: Conceptual analogy

The analogy we wish to draw is not far-fetched. Both assessment and benchmarking are processes by which information about performance is collected and judged against standards and which, in turn, have high-stakes consequences. Table 1 outlines two hypothetical case studies. These illustrate the parallels between assessment and benchmarking, show how lack of clarity of purpose can distort behaviours, and highlight similarities between the educational impact of assessment and the "institutional impact" of benchmarking.

Table 1. Hypothetical case studies.

Hypothetical case study 1. John is a medical student with good clinical skills as judged by assessments and supervisors. His research skills are not as well developed, but he has a plan to improve them. He is keen to know how his learning of core knowledge is progressing. His supervisor suggests he take a formative MCQ test and that he avoid the subsections where he already knows his strengths and weaknesses, answering instead questions where he is unsure of his competence. John is pleasantly surprised to get 62% right. More importantly, it has helped identify areas to improve. His supervisor then produces a ranked list of the MCQ results of the other students, and John is ranked near the bottom. John is called to see the Dean and told to "pull your socks up as you're clearly a weak student in general". One month later he plans to sit a repeat MCQ test in his areas of strength. He therefore neglects any learning about clinical skills and forgets there are weaknesses in his research skills. His repeat score is 85% and his ranking has gone up. John has learnt nothing and has got worse in some areas. The Dean is very pleased.

Hypothetical case study 2. Johnstone is a medical school with strengths in social accountability as judged by accreditations and other activities. The school's research capacity is not as well developed, but there is a plan to address this weakness. The school is keen to know how its students' core knowledge learning is progressing. The accrediting body suggests the medical school participate in a benchmarking MCQ test. The school decides to avoid the subsections of known strength and known weakness, and to focus instead on areas where the competence of its students is less clear. Johnstone does this and is pleasantly surprised when the mean student score is 62%. More importantly, it has helped identify areas to improve. The media then produce a ranked list of the MCQ results of the participating schools, and Johnstone is ranked near the bottom. Johnstone is called by the media and asked why it is "a weak school in general". One year later, the school neglects social accountability, forgets there are weaknesses in research capacity and asks its students to sit MCQ tests in areas of known strength. The repeat mean score is 85% and the school's ranking has gone up. Johnstone has learnt nothing and has got worse in some areas. The media ignore them.

In our analogy it is helpful to describe current developments in the assessment literature. Assessment was originally strongly based on "testing", with an emphasis on the psychometric properties of individual tools, thereby focusing more on easy-to-measure capabilities, and on their quantitative and reliable measurement, than on validity. To rebalance these effects, more programmatic approaches have been developed and successfully implemented. These programmatic approaches use a suite of assessment tools over time to provide a more comprehensive picture of competence, carefully balancing formative and summative, and quantitative and qualitative, assessments (van der Vleuten & Schuwirth 2005; Wilkinson et al. 2011). The choice of tools is made by deliberately considering validity, reliability, feasibility, acceptability and educational impact (van der Vleuten 1996), and thus their contributions to the purpose of the whole assessment programme. From this it is clear that, notwithstanding its psychometric properties, no single instrument can fulfil all of these requirements. With a broader approach, purpose and content determine the methods, and not the other way around.
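One way to picture how these criteria interact, in the spirit of van der Vleuten's (1996) utility model, is as a conceptual (not arithmetic) product of the qualities listed above; the notation here is simply shorthand for those criteria:

\[
\text{Utility} \;=\; \text{Reliability} \times \text{Validity} \times \text{Feasibility} \times \text{Acceptability} \times \text{Educational impact}
\]

Because the factors multiply, a tool that fails badly on any one criterion contributes little overall, however strong its psychometric properties; this is the intuition behind the claim that no single instrument can fulfil all of these requirements.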

Before any assessments are conducted, however, there should be a clear statement of purpose: what is the process meant to achieve? The purpose can be purely decision-oriented (such as for selection, or for progression and graduation decisions) or, alternatively, it can be informative: to optimise the information about strengths and weaknesses and how to improve. If these purposes are not clear, then data collection and its interpretation will logically become distorted. For example, if the purpose is to help identify areas for improvement, then a student will be more likely to reveal their weaknesses and thereby guide how their performance can be improved. If the purpose is merely to make a pass-fail decision, then students will prefer to conceal their weaknesses. As the case studies illustrate, collecting data for one purpose and using it for another will only create tensions, distortions and difficulties in data interpretation.

A key component of good assessment is validity – ensuring that what is actually assessed is adequately representative of the characteristics being sought in the assessment. Validity is not an easy concept and there have been numerous theories concerning how best to establish the validity of a measure (Kane 2001). Current theories concur that assessors need a clear focus on what they want to assess, which student attributes they value and want to capture, and how the assessment influences learning behaviours. Problems arise when student attributes that are valued are not easily measured. It can be tempting to measure only the easily measurable attributes (because this is easier, more convenient and a feasible starting point) and to ignore the important ones.


As an example of how this can go wrong, consider the use of MCQs versus aggregated narrative staff judgements to assess a student's professional behaviour. The MCQ result will be numerical, easy to calculate, easy to summarise (in the form of means, confidence intervals, etc.) and easy to rank. But one can seriously doubt whether students' responses to written MCQs are the most valid approach for assessing their actual professional behaviour. In contrast, staff observations of student professionalism are often narrative and descriptive. They may be highly informative, but such information cannot be summarised easily without careful reading and thoughtful collation.

It is a common misconception to equate subjectivity with unreliability. The process of assessment requires expert judgement, which is intrinsically subjective as it requires assessors to assign a global value. Such judgements may still be reliable, because reliability is a matter of sampling, and many well-sampled judgements will produce reliable – and valid – results (van der Vleuten et al. 1991). Thus, more useful interpretations of assessment results arise from the considered choice of tools and from the considered synthesis of all available information. Likewise, we suggest that benchmarking of medical schools requires a similar approach.
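The claim that reliability is a matter of sampling can be illustrated with the classical Spearman–Brown relationship, a general psychometric result used here purely as an illustration and assuming independent judgements of comparable reliability: if a single judgement has reliability \(\rho_1\), the mean of \(k\) such judgements has reliability

\[
\rho_k = \frac{k\,\rho_1}{1 + (k-1)\,\rho_1}
\]

so, for example, a modest single-judgement reliability of 0.3 rises to roughly 0.8 once ten independent judgements are combined. Broad sampling, rather than the elimination of subjectivity, is what makes aggregated expert judgements dependable.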

Implications for benchmarking

We argue that discussions on benchmarking should move forward in a manner similar to the recent changes in assessment focus described above (van der Vleuten & Schuwirth 2005), namely a move from individual testing methods to a programmatic approach. We have observed that benchmarking discussions are often aimed at choosing which benchmarking tool might be best. This is reminiscent of discussions around assessment, which used to focus predominantly on individual assessment tools. It has become apparent that assessment can have multiple purposes, but if those purposes are not clear then the processes by which data are gathered, the choice of which data to collect and the interpretation of those data can become misleading (Dijkstra et al. 2010). Similarly, if the purpose of benchmarking is not clear, the information will become useless for quality improvement and may be manipulated by stakeholders for strategic gain. Organisational responses to a benchmarking process will depend on whether it is undertaken in the context of quality improvement, competition for limited ongoing training opportunities, or reassurance of the public and professional accreditation bodies that students are attaining competency standards.

University and medical school norm-referenced "league tables" are already rife in the UK (Note 1) and the USA (Note 2), yet for medicine one could ask whether these rankings really reflect the attributes that medical schools and the community value most highly. A recent report on medical school benchmarking used tests of knowledge as the only tool (Wilkinson et al. 2014). Although the authors pointed out that such a test of knowledge has a limited place, they also highlighted its utility. Understanding and anticipating that unwanted consequences may occur is vital if we are to avoid pitfalls that could adversely affect the quality of a medical education system.

Stage 2: A framework for evaluating benchmarking

In applying these considerations to the benchmarking of medical schools, which can be regarded as a form of "assessment" of a school's performance, three areas of potential confusion should be highlighted:

(1) Student assessment data can be a source of data for benchmarking, but only if these data were designed for, and are interpreted in terms of, comparisons between different cohorts of students, not decisions about individual students (Glaser 1963).
(2) It is important to clarify whether the benchmarking activity relates to the quality of the process of medical education, the outcomes of that process, or both.
(3) Collaborative assessment activities among medical schools that aim to draw on economies of scale (such as sharing MCQ items or objective structured clinical examination (OSCE) stations) can contribute to benchmarking, but they are neither necessary nor sufficient for it.

To better anticipate unwanted consequences, the purpose, validity, reliability, acceptability and impact of any proposed benchmarking activity can be examined with the following questions:

(1) What is the purpose? Is it to guide quality improvement or to produce league tables of schools? Are there political, financial or status-related consequences arising from the outcome of the benchmarking exercise (which might create the conditions that would catalyse Goodhart's law [Goodhart 1981])? An explicit statement of aims and scope should be agreed upon before the activity starts. The resulting and necessary buy-in is essential for the validity of the collected data and any conclusions. Just as in scientific research: "invalid data, invalid conclusions" and "decide the purpose before deciding the method".

(2) What are the attributes of value? A clear description of the expected outcomes of the programmes being benchmarked is needed, acknowledging that these may differ among medical schools. Furthermore, the pathways to these outcomes are likely to differ. Examples of useful anchors for such statements of purpose can be found within the World Federation for Medical Education standards for medical education (World Federation for Medical Education 2012), the standards within the General Medical Council's publication Tomorrow's Doctors (General Medical Council 2009), the Medical Deans of Australia and New Zealand (MDANZ) competencies project (Medical Deans of Australia and New Zealand 2011), and the Australian Medical Council (AMC) competence-based medical education consultation paper (Australian Medical Council 2010).

(3) What are the best tools to assess the attributes of value? The tools and measures chosen for benchmarking medical schools do not need to be the same as (and ideally would be different from) those used for assessing medical students. For example, comparisons of student career intentions, and of actual later career choices, will give indications of how well a medical school's programme is achieving some of its goals, but they are different from student assessment and therefore require different tools. Optimal approaches start with purpose and content before choosing methods. This is analogous to good research, in which the methodology should not drive the research question, and to good clinical care, in which the choice of investigations should not drive the differential diagnosis. Instead, the differential diagnosis drives the diagnostic plan, and the research question drives the methodology. Also, single tools never give comprehensive answers to the complex question of the quality of a curriculum, so a combination of tools is needed. The choice of tools should also be weighed against desirable measures of reliability (the reproducibility with which they provide information), validity (how well they measure what we want them to measure) and feasibility.

(4) What happens to the results? Even with the best intentions, good results in the wrong hands can have unwanted negative effects. As in good project planning, a clear description of ownership, publication, sharing and archiving of information needs to be made.

(5) What is the likely "institutional impact" of the results? The "educational impact" of assessment on student learning behaviour is well known (Newble & Jaeger 1983; Frederiksen 1984; Wilkinson et al. 2007; Cilliers et al. 2010, 2012a, 2012b, 2013). For example, assessment that focuses only on rote factual knowledge, and not on communication, will lead students to memorise facts rather than learn communication skills; tests that focus only on cognitive skills will not instil better professional behaviour. From a benchmarking perspective, similar effects can result, driving institutional behaviour. In one such scenario, perverse incentives created by national exit examinations could result in universities spending money allocated for good education on test preparation courses, and dissuading weaker students from participating.

Stage 3: Applying the framework: some examples

Table 2 provides some worked examples, within an Australian and New Zealand context, of applying the questions raised in this framework to five benchmarking exercises:

(1) Medical Schools Outcomes Database.
(2) Australian Medical Schools Assessment Collaboration.
(3) Collaborative progress testing.
(4) Sharing OSCE stations (ACCLAiM).
(5) Sharing a large assessment bank for medical education on an international scale (IDEAL).

Table 2. Examples of benchmarking exercises mapped against key questions.

Medical Schools Outcomes Database (a)
Description: A research collaboration of all medical schools in Australia and New Zealand to explore predictors of, and influences on, the career choices and practice locations of graduating doctors.
What is the purpose? To inform which aspects of curriculum design influence career choices and practice locations.
What are the attributes of value? Career choices and practice locations of graduating doctors.
What are the best tools to assess the attributes of value? Questionnaire items measure career intentions. In due course, actual career choices and locations of practice will become available.
What happens to the results? The results are "owned" by the medical deans of Australia and New Zealand, but research projects are led by individual investigators, by medical schools, or by the collaboration.
What is the likely "institutional impact" of the results? Positive: peer-led and peer-reviewed analyses will share aspects of good practice that medical schools can use to inform decisions about aligning their curricula to their school's mission. Negative: if there were explicit ranking of medical schools by graduate career outcomes (e.g. general practice or research compared with medicine and surgery), this could create perverse curriculum incentives.

Australian Medical Schools Assessment Collaboration (b)
Description: A consortium of medical schools who share multiple choice and extended matching question items for use in final summative assessments.
What is the purpose? To compare students' achievements in the bioscience-focused phases of medical programmes.
What are the attributes of value? Knowledge of the biosciences and its application.
What are the best tools to assess the attributes of value? Multiple choice and extended matching question items are appropriate for testing bioscience knowledge. Well-worded questions can test application of bioscience knowledge.
What happens to the results? The results of the benchmarking are sent to the partner medical schools. Academic and other staff from the participating schools with responsibility for bioscience teaching have access to the results. De-identified data are also prepared for publication in academic journals.
What is the likely "institutional impact" of the results? Positive: the results could form part of the evidence for ongoing revision of the bioscience component of a medical programme. Negative: as this exercise measures only one attribute of value, there is the risk that, by not measuring the other areas that are valued by medical schools, these unaddressed areas could suffer.

Collaborative progress testing (c)
Description: A comparison of medical schools on the basis of their students' achievements on shared progress tests of knowledge and its application. Test development and the collaboration are "bottom-up", as they are school-instigated.
What is the purpose? To compare students' achievements in medical knowledge and its application, and thereby compare the effectiveness of the curricula in ensuring the knowledge base of their graduates.
What are the attributes of value? Basic sciences, behavioural sciences, and clinical knowledge and its application.
What are the best tools to assess the attributes of value? Multiple choice question items are one form of assessment for medical knowledge and its application. Other forms of assessment would be required to ensure that the domain of clinical knowledge is appropriately tested.
What happens to the results? The results of the benchmarking are made available to all medical schools in a de-identified fashion. Results are used by the schools to identify and remedy weaknesses in their curriculum.
What is the likely "institutional impact" of the results? Positive: the results have been, and still are, used to inform review and revision of curricula; this leads to an iterative process of increasing commonalities in the curricula and/or a clearer understanding of the differences. Negative: external bodies could use these data (if permission is given), among other data, to inform summative decisions about medical schools' accreditation, or for public disclosure or funding decisions. As this exercise measures only one attribute of value, there is the risk that, by not measuring the other areas that are valued by medical schools, these unmeasured areas could suffer. The results could also be over-interpreted, as there is evidence that one-off comparisons between schools in countries such as Australia produce differences in quality that are within the error bands.

Sharing objective structured clinical examination (OSCE) stations (Australian Collaboration for Clinical Assessment in Medicine, ACCLAiM) (b)
Description: A consortium of medical schools who collaborate on sharing selected OSCE stations, and on peer review of OSCE process and content, and of assessor training, at participating schools.
What is the purpose? To compare students' achievements in clinical competency development: communication skills (giving and receiving information), physical examination, clinical reasoning, and data interpretation skills.
What are the attributes of value? Clinical competency (as above).
What are the best tools to assess the attributes of value? Selected ACCLAiM OSCE stations.
What happens to the results? Peer feedback reports and sharing of examiner training, aiming to share innovation and improve the quality of assessment.
What is the likely "institutional impact" of the results? Positive: the results form part of the evidence for ongoing improvement of the assessment of clinical skills and competency. Negative: although this exercise has a more integrated approach to assessment, given that the stations assess practical skills and many also require application of knowledge (e.g. clinical reasoning tasks) and interpersonal skills, it does not measure competency and professional behaviour in the workplace.

International Database for Enhanced Assessments and Learning (IDEAL) consortium (d)
Description: A consortium to share a large assessment bank for medical education on an international scale.
What is the purpose? To encourage communication among medical schools concerning quality standards in assessment and to promote research for developing international standards in assessment of medical competence.
What are the attributes of value? Clinical and basic science knowledge and its application.
What are the best tools to assess the attributes of value? Predominantly multiple choice and extended matching question items, plus some OSCE stations.
What happens to the results? Partner schools are required to contribute 150 high-performance items annually and to provide feedback on the performance of database items used.
What is the likely "institutional impact" of the results? Positive: continuous quality improvement through workshops in member institutions to enhance item-writing expertise and the quality of items contributed by members; regular culling of banked items by international experts; and annual augmentation of formative and summative item banks (more than 3400 new items each year from members). Negative: this exercise mainly measures only one attribute of value; there is the risk that, by not measuring the other areas that are valued by medical schools, these unmeasured areas could suffer.

Notes to Table 2: (a) http://www.msod.org.au/; (b) http://www.acer.edu.au/amac; (c) Arnold & Willoughby 1990; van der Vleuten et al. 1996; (d) www.idealmed.org.

It can be seen that if the results from applying the framework were compared against a blueprint of desirable medical graduate outcomes, notable omissions would emerge. For example, there are no current exercises that benchmark schools on professionalism, critical thinking or research skills, and only limited activities to benchmark patient communication and clinical skills. If benchmarking were limited to the exercises shown here, there would be incentives to narrow and restrict curricula to only what is measured.

Discussion

We suggest that, as assessment has done, benchmarking should move towards a programmatic approach with a dominant focus on formative purposes and improvement. This would foster quality improvement and the sharing of good practice by focusing on areas (1) that are of value to all medical schools and (2) about which there is insufficient existing information. Benchmarking should be informative, with full disclosure of results to participants only, and should not be used for league tables, although the wider public should know that such exercises are occurring and for what purpose.

We encourage benchmarking of curricula to extend beyond aggregated student performance results, as other measures can also provide important and valuable information. For example, documenting the attributes on which a school failed its last 20 students, or awarded its top results, could provide insights into what that school values and the processes that support those values. Other, more sophisticated indices could be developed, such as demographic and descriptive data on the success of students from rural, regional and lower socio-economic areas. Such data are an example of information that may be highly valid but not always numerical.

Medical schools should be proactive in determining the areas to benchmark, and should relate these areas to their missions and areas of interest. If all schools were benchmarked on the same attributes, then curriculum evolution would be likely to move towards better performance in those specific areas. This may well be a desirable outcome if it is based on agreed competencies that all medical graduates must attain and on the activities that medical programmes value. However, it could also create curriculum convergence and be a driver away from diversity and innovation. Instead, each medical school should benchmark itself on the attributes that the school values. For example, some may wish to benchmark against social accountability, others against the research skills of graduates, while many may wish to benchmark against core skills and knowledge.

In summary, benchmarking, like assessment, is a powerful driver of behaviour. We need to ensure that the decisions that are made result in behaviours that improve quality, share good practice and foster innovation.

Notes on contributors

TIM WILKINSON, MBChB, MClinEd, PhD, MD, FRACP, FRCP, is the Professor of Medicine, Associate Dean (Medical Education) and MBChB Programme Director at the University of Otago, New Zealand.

JUDITH NICKY HUDSON, BMBS, MSc, PhD, is the Professor and Director of Rural Health at the University of Newcastle, Australia, and previously Chair of Assessment in the Joint Medical Program (Universities of Newcastle and New England) and the Graduate School of Medicine, University of Wollongong.


GEOFF McCOLL, MB, BS, BMedSc, MEd, PhD, FRACP, is the Professor of Medical Education and Training in the Melbourne Medical School, University of Melbourne, Australia.

WENDY HU, MBBS, MHA, PhD, FRACGP, is the Professor of Medical Education, School of Medicine, University of Western Sydney, Australia.

BRIAN JOLLY, BSc(Hons), MA(Ed), PhD, is the Professor of Medical Education in the Faculty of Health and Medicine at the University of Newcastle, Australia.

LAMBERT SCHUWIRTH, MD, PhD, is the Professor of Medical Education and Director of the Prideaux Centre for Health Professions Education Research at Flinders University in Adelaide, Australia, and Professor for Innovative Assessment at Maastricht University, The Netherlands.

Declaration of interest: The authors declare no conflicts of interest. The authors alone are responsible for the content and writing of this article.

Notes

1. http://www.thecompleteuniversityguide.co.uk/league-tables/rankings?s=medicine
2. http://www.topuniversities.com/university-rankings/university-subject-rankings/2013/medicine

References

Arnold L, Willoughby TL. 1990. The quarterly profile examination. Acad Med 65(8):515–516.
Australian Medical Council. 2010. Competence-based medical education: AMC consultation paper. [Accessed 27 April 2013] Available from http://www.amc.org.au/images/publications/CBEWG_20110822.pdf
Cilliers FJ, Schuwirth LWT, Adendorff HJ, Herman N, Van der Vleuten CPM. 2010. The mechanisms of impact of summative assessment on medical students' learning. Adv Health Sci Educ 15(5):695–715.
Cilliers FJ, Schuwirth LWT, Herman N, Adendorff HJ, Van der Vleuten CPM. 2012a. A model of the pre-assessment learning effects of summative assessment in medical education. Adv Health Sci Educ 17(1):39–53.
Cilliers FJ, Schuwirth LWT, Van der Vleuten CPM. 2012b. A model of the pre-assessment learning effects of assessment is operational in an undergraduate clinical context. BMC Med Educ 12:9.
Cilliers FJ, Schuwirth LWT, Van der Vleuten CPM. 2013. Modeling the pre-assessment learning effects of assessment: Evidence in the validity chain. Med Educ 46(11):1087–1098.
Dijkstra J, Van der Vleuten C, Schuwirth L. 2010. A new framework for designing programmes of assessment. Adv Health Sci Educ 15(3):379–393.


Frederiksen N. 1984. The real test bias: Influences of testing on teaching and learning. Am Psychol 39(3):193–202.
General Medical Council. 2009. Tomorrow's doctors. London: General Medical Council.
Glaser R. 1963. Instructional technology and the measurement of learning outcomes: Some questions. Am Psychol 18(7):519–521.
Goodhart C. 1981. Problems of monetary management: The U.K. experience. In: Courakis AS, editor. Inflation, depression, and economic policy in the west. Maryland: Rowman & Littlefield, pp 111–146.
Kane MT. 2001. Current concerns in validity theory. J Educ Measure 38(4):319–342.
Meade PH. 1998. A guide to benchmarking. Dunedin: University of Otago Press.
Medical Deans of Australia and New Zealand. 2011. Developing a framework of competencies for medical graduate outcomes. [Accessed 27 April 2013] Available from http://www.medicaldeans.org.au/wp-content/uploads/Competencies-Project-Final-Report.pdf
Newble DI, Jaeger K. 1983. The effect of assessments and examinations on the learning of medical students. Med Educ 17(3):165–171.
van der Vleuten CPM. 1996. The assessment of professional competence: Developments, research and practical implications. Adv Health Sci Educ 1(1):41–67.
van der Vleuten CPM, Norman GR, De Graaff E. 1991. Pitfalls in the pursuit of objectivity: Issues of reliability. Med Educ 25(2):110–118.
van der Vleuten CPM, Schuwirth LWT. 2005. Assessing professional competence: From methods to programmes. Med Educ 39(3):309–317.
van der Vleuten CPM, Verwijnen GM, Wijnen W. 1996. Fifteen years of experience with progress testing in a problem-based learning curriculum. Med Teach 18(2):103–109.
Wilkinson D, Schafer J, Hewett D, Eley D, Swanson D. 2014. Global benchmarking of medical student learning outcomes? Implementation and pilot results of the International Foundations of Medicine Clinical Sciences Exam at The University of Queensland, Australia. Med Teach 36(1):62–67.
Wilkinson TJ, Tweed M, Egan T, Ali A, McKenzie J, Moore M, Rudland J. 2011. Joining the dots: Conditional pass and programmatic assessment enhances recognition of problems with professionalism and factors hampering student progress. BMC Med Educ 11(1):29.
Wilkinson TJ, Wells JE, Bushnell JA. 2007. What is the educational impact of standards-based assessment in a medical degree? Med Educ 41(6):565–572.
World Federation for Medical Education. 2012. Basic medical education WFME global standards for quality improvement: The 2012 revision. [Accessed December 2013] Available from http://www.wfme.org/standards/bme
