International Journal of Technology Assessment in Health Care, 8:3 (1992), 479-489. Copyright © 1992 Cambridge University Press. Printed in the U.S.A.

QUALITY ASSESSMENT OF MEDICAL RESEARCH AND EDUCATION

Hakan Eriksson
Karolinska Hospital and the Swedish Medical Research Council

Abstract

Different aspects of the process of evaluating research and education are discussed, using the discipline of medicine as a model. The focus is primarily on potential problems in the design of an evaluation. The most important aspects of an assessment are: to create confidence in the evaluation, before it begins, among the scientists and/or teachers who are being assessed; to find experts for whom the scientists and/or teachers have professional respect; to choose assessment methods in relation to the focus, level, and objectives of the evaluation; and to make the report of the evaluation's findings short and explicit.

One of the most vital questions in the university today is how to develop methods to evaluate the quality, relevance, and accessibility of research in an efficient and objective manner. With science's growing demands on resources and its increased importance, the evaluation of research has also become a central question for politicians and administrators. By necessity, the process of evaluation has long been an obvious and integral activity in research carried out by the industrial sector. Within the university, however, the situation has been quite different. Science and education have always been considered the responsibility of society, which treats them as a long-term investment to generate knowledge for the future; thus, universities have not been required to meet the same demands for efficiency that industry faces. Academia has a long tradition of built-in systems for self-evaluation, including referee systems for scientific papers, peer review of grant applications, departmental seminars, etc.; these systems are designed more to guarantee the quality of the scientific work than to establish the efficient use of the resources that are allocated to the university. When more autonomous universities are given extended authority to control their resources, society makes greater demands for evaluation in order to monitor the efficiency of its investment in scientific and educational activities. In this situation it is critical for the university to oversee its own systems for internal evaluation.

AIMS AND POLICIES

As has been discussed by Quade (21), "evaluation" can be defined as: ". . . an attempt to assess and analyse how well the actual accomplishment of an activity or program matches the anticipated accomplishment as formulated in the overall policies and objectives set by society."


Figure 1. The evaluation process rests upon three parameters: the different levels of aggregation of the data that are collected; the different levels in the research and teaching process; and the different compositions of the assessment groups responsible for the evaluation process. By balancing these three parameters, the complexity of the objectives of the evaluation can be increased toward the final aim, that is, to monitor the effect of research and education on the health status of the population.

The purpose of the evaluation process is to guarantee that the policies and objectives that have been set by society are achieved. From the governmental point of view, these overall policies regarding science and education are:

• to initiate relevant research and education;
• to maintain and enhance research and teaching quality;
• to advance knowledge and technological capability;
• to increase economic and social returns;
• to achieve better management;
• to produce qualified personnel at the graduate, postgraduate, and professional levels; and
• to achieve economic, social, and cultural benefits.

Focusing on medical research and education, the aims and objectives can be formulated as:

• to promote and maintain the good health of the population;
• to enlarge understanding of the causes of disease;
• to provide better means of preventing, diagnosing, and treating illness; and
• to improve quality of life.


Figure 2. Different levels of aggregation of collected data in the evaluation process. The basic output in the research/teaching process is generated by the individual scientist/teacher. His/her results can then be aggregated on several different levels: that of the research group, the department, the faculty, the nation, and finally the world. Each level adds new possibilities of interdisciplinary integration of the data and new dimensions in the evaluation. (The levels shown in the figure range from the individual scientist or teacher, through the research group, department, and university, to sector bodies, industry, and the national and international levels.)

THE EVALUATION PROCESS

The development of a method for evaluation is a key part of obtaining efficient research and education (8). The assessment of a whole discipline, such as medicine, is a very complex process that demands a comprehensive evaluation strategy. The overall aim is to verify the maintenance and improvement of human health in relation to the resources that are allocated for medical research and education. Thus, the most logical approach would be to design an assessment system that accurately measures the end product: human health. In order to achieve this measurement, the evaluation process must be looked at in a reductionist manner.

The different levels of complexity in the objectives are depicted in Figure 1. The process rests upon three parameters: the different levels of aggregation in the data that have been collected for the evaluation; the different levels in the research and teaching process; and the different compositions of the groups who are responsible for the evaluation. By balancing these three parameters, the complexity of the objectives of the evaluation can be increased until they reach the final aim: monitoring the effect of research and education on the health of the population. An imbalance in the choice of parameters will produce results that reflect a false interpretation or an overinterpretation and will thus be essentially useless as a basis for decisions. It is therefore extremely important first to define the specific objectives of any particular evaluation, and then to choose the appropriate level of each parameter.

Figure 2 shows the different levels of aggregation in the evaluation process. The basic output in the research/teaching process is generated by the individual scientists or teachers. Their results can then be aggregated on several different levels: that of the research group, the department, the faculty, the nation, and finally the world. Each level adds new possibilities for the interdisciplinary integration of the data and new dimensions in the evaluation.
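To illustrate what aggregating such data might look like in practice, the sketch below rolls a single indicator (publication counts) up from the individual scientist to higher organizational levels, in the spirit of Figure 2. It is a minimal sketch only; the hierarchy, names, and counts are hypothetical and are not taken from the article.

```python
# Hypothetical hierarchy: faculty -> department -> research group -> scientist.
publication_counts = {
    ("Faculty of Medicine", "Dept. of Physiology", "Group A", "Scientist 1"): 12,
    ("Faculty of Medicine", "Dept. of Physiology", "Group A", "Scientist 2"): 7,
    ("Faculty of Medicine", "Dept. of Physiology", "Group B", "Scientist 3"): 9,
    ("Faculty of Medicine", "Dept. of Surgery", "Group C", "Scientist 4"): 5,
}

def aggregate(counts, level):
    """Sum an indicator collected per scientist up to a chosen level.

    level 0 = faculty, 1 = department, 2 = research group, 3 = scientist.
    """
    totals = {}
    for path, n in counts.items():
        key = path[: level + 1]
        totals[key] = totals.get(key, 0) + n
    return totals

print(aggregate(publication_counts, level=1))  # totals per department
print(aggregate(publication_counts, level=2))  # totals per research group
```

The same roll-up could in principle be applied to any indicator discussed below (citations, grants, dissertations), which is why the choice of aggregation level is a design decision rather than a technicality.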



Figure 3. Different levels in the research and teaching processes in the medical discipline. The research ranges from basic investigator-initiated research, through clinical and applied research, to broadly integrated projects within health service, education, and care. In the basic research process, facts are collected and concepts are formulated without specific applications in sight. Clinical research is directed toward patient-oriented studies on the understanding of normal and disordered functions, clinical investigations and observations, diagnosis, therapy, and prevention, all of which are of relevance to the maintenance of good health and provide social and economic benefits. Applied research is goal-oriented and aims at solving defined problems. Finally, health service research includes projects that integrate medicine, sociology, and economics.

When considering the levels of aggregation in the evaluation process, the different types of research systems within medicine must be taken into account. In universities, most of the research is initiated by an individual investigator, whereas in industry, research is primarily motivated by the special needs of a certain sector of society or formulated by the needs of a particular industry. The different research systems interact, and thus the data that are obtained in the evaluation must be related to the type of research system in which they are generated.

The different levels in the research and teaching processes in medicine are shown in Figure 3. The research ranges from basic research that is initiated by the investigator to broadly integrated projects within health service, education, and care (14). Basic research collects facts and formulates concepts without necessarily having specific applications in sight. This type of research is mainly initiated by its investigators and develops by its own internal force; the results of this work will, it is hoped, bear fruit in the future. Clinical research is directed toward patient-oriented studies that aim at understanding normal and disordered functions, clinical investigations and observations, diagnosis, therapy, and prevention, all of which are relevant to the maintenance of good health and provide social and economic benefits.


Figure 4. Different steps in the evaluation process: identification of the activity and decision to evaluate; creation of confidence in the evaluation; evaluation design; data collection and data analysis; report and recommendations; implementation of changes; and follow-up study. In the first step, the activity that is to be the target for the evaluation is identified, and a decision about the evaluation is made. In the next phase, an atmosphere of confidence in the evaluation is created among those who are to be evaluated. A design for that particular evaluation is made and implemented. Data are collected and analyzed according to the strategy in the design. The evaluation group writes its assessment report. The suggested changes are implemented. After a defined period of time, a follow-up study is performed.

Applied research is goal-oriented and aims at solving defined problems. Finally, health service research includes projects that integrate medicine, sociology, and economics. These projects are of potential relevance to those who use and manage health care services.

Examples of the different levels of competence of the group that should assess research and teaching are:

• peer review group (international or national);
• expert group (international or national);
• parliamentary group;
• consultants;
• research and development (R&D) funding bodies (e.g., research councils, private foundations, etc.);
• R&D user groups (e.g., industry, county councils, etc.); and
• university administrators.

The most complex and competent group is a peer review committee that consists of international experts representing the different fields that are related to various aspects of medical sciences and health services.



The next level is a group that includes experts from specific areas of a single medical discipline. A parliamentary group has a very different composition and usually lacks specific competence within the medical area; this group is better suited to evaluate the importance of the research and teaching from a social point of view. A consultant can be appointed to assess a specific, more limited area of the field of medicine. Different bodies that fund research and development, such as the research councils, undertake assessment of the use of resources that they have allocated to specific research projects; usually, these organizations do not evaluate the medical discipline in an integrated fashion. Users of the fruits of research and development (e.g., industry, county councils, etc.) perform evaluations of the areas that are of particular interest to their activities. Finally, university administrators evaluate different activities on the basis of statistical data on the performance of the university.

Figure 4 depicts the different steps that can be identified in the evaluation process. In the first step, the activity that is to be the target for the evaluation is identified and a decision about the evaluation is made. In the next phase, a feeling of confidence in the evaluation has to be created among those who are to be evaluated. A design for that particular evaluation is made and implemented. Data are collected and analyzed according to the strategy that was laid out in the design. The evaluation group writes its report of the results of the assessment, which should also contain recommendations for change. The next step is the implementation of these changes. After a defined period of time, a follow-up study should be carried out to verify how the recommendations have been followed and what effect they have had on the activity.

For an evaluation to be successful, several specific questions must be asked before it begins:

• What focus should it have?
• On which level should it be implemented?
• Who is the target group for the report?
• What design is to be used?
• Who is responsible for the evaluation?
• Which data should be collected?
• How shall these data be collected?
• How shall the report be presented?

ASSESSMENT PARAMETERS AND PERFORMANCE INDICATORS

In the design phase of an evaluation, it is critical to decide which parameters and indicators should be used to achieve the most relevant assessment of the particular activity. The choice must be based on the complexity of the objectives of the evaluation. Some performance indicators for a whole university include (14):

• types of research activities;
• output and impact of the research;
• types of educational activities;
• concentration and selectivity in the allocation of resources;
• personnel;
• international activity;
• management and administration; and
• flexibility (e.g., the number of staff who are on fixed-term contracts).



Figure 5. Relative reflection of quality (peer-reviewed grants; scientific publications filtered by impact factor; registered patents) and quantity (numbers of students; numbers of tutors) of different assessment parameters. The parameters shown in the figure are: project grants (peer-reviewed); scientific publications (quality filtered by impact factor); registered patents; citations; esteem indicators; visiting scientists at the PhD level and above; PhD dissertations; postdoctoral training programs; graduate student training programs; number of tutors; and number of students.

For a specific research project, several questions can be asked to generate a basis for the assessment of the project:

• What is the intrinsic scientific merit of the work and the expectation of scientific advance in the long and short term?
• What is the relationship of the project to other work in the same area that is being carried out by other research teams?
• What is the project's applicability and exploitability?
• What is the significance of the project for education and training?
• What is the priority of the work and its relation to other projects that are currently being considered for support?

Certain parameters must be used as the basis for the evaluation of research and teaching; a few examples of such parameters are shown in Figure 5. Some of these parameters primarily reflect the quality (peer-reviewed grants; scientific publications filtered by impact factor; registered patents) and others the quantity (numbers of students; numbers of tutors) of the activity that is being assessed. Two of the more popular methods that are used to evaluate research projects are citation analysis and peer-review analysis. These methods can be defined as follows: "citation analysis refers to the use of citation counts in international journals to produce quantitative assessments of the impact of the research performed"; "peer-review analysis is a subjective method based on individual scientists' perceptions of the contributions made by others to scientific projects."
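As a concrete illustration of the citation-based side of this distinction, the sketch below computes a mean-citations indicator for a set of publications, with and without an impact filter of the kind discussed later in connection with Figure 6. The data structure, threshold, and numbers are hypothetical and are not drawn from the article.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Publication:
    # Hypothetical record for one paper produced by the unit being assessed.
    title: str
    citations: int               # citation count in international journals
    journal_impact_factor: float

def citation_indicator(pubs, impact_threshold=2.0):
    """Mean citations per paper, with and without a simple 'impact filter'.

    The filter keeps only papers in journals whose impact factor exceeds a
    (hypothetical) threshold, so that heavy citation in low-impact journals
    does not inflate the indicator.
    """
    if not pubs:
        return {"mean_citations": 0.0, "mean_citations_filtered": 0.0}
    filtered = [p for p in pubs if p.journal_impact_factor >= impact_threshold]
    return {
        "mean_citations": mean(p.citations for p in pubs),
        "mean_citations_filtered": mean(p.citations for p in filtered) if filtered else 0.0,
    }

# Example: a small (fictitious) research group's output.
group_output = [
    Publication("Paper A", citations=40, journal_impact_factor=5.1),
    Publication("Paper B", citations=35, journal_impact_factor=0.8),
    Publication("Paper C", citations=3, journal_impact_factor=4.2),
]
print(citation_indicator(group_output))
```

Even this toy example shows how the filtered and unfiltered figures can diverge, which is why the choice of filter belongs in the design phase of the evaluation.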


PEER-REVIEW ANALYSIS

One of the most widely accepted assessment methods is peer-review analysis. The pros and cons of this method have been debated intensively (2;4;8;9;17). Some of the problems with this type of evaluation include:

• possible lack of objectivity;
• a tendency to favor the continuation of ongoing projects;
• a tendency to give the more visible scientists higher ratings (the halo effect);
• disagreement about assessment criteria among the peers (see the sketch after this list);
• difficulties in deciding what constitutes scientific quality; and
• high costs for this kind of evaluation.
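To make the disagreement problem tangible, here is a minimal sketch that measures the spread of reviewer scores per grant application and flags low-consensus cases. The 1-5 scale, the application names, and the threshold are illustrative assumptions rather than anything prescribed by the article.

```python
from statistics import mean, pstdev

# Hypothetical peer-review scores (1-5 scale) for three applications.
scores = {
    "Application A": [4, 5, 4, 5],
    "Application B": [2, 5, 1, 4],   # reviewers clearly disagree
    "Application C": [3, 3, 2, 3],
}

def consensus_report(review_scores, disagreement_threshold=1.0):
    """Flag applications where the spread of reviewer scores is large."""
    report = {}
    for application, s in review_scores.items():
        spread = pstdev(s)
        report[application] = {
            "mean_score": round(mean(s), 2),
            "spread": round(spread, 2),
            "low_consensus": spread > disagreement_threshold,
        }
    return report

for application, stats in consensus_report(scores).items():
    print(application, stats)
```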

One of the key questions for peer-review analysis is how to define "good science." The quality of scientific work is very complex, and some of the objective properties of the research (e.g., how it is planned and carried out; how methods have been chosen and used; how conclusions have been established and argued, etc.) are related to standards that are based on social conventions (16). In this context, some key terms must be defined:

• Quality: describes how well the research has been done;
• Impact: expresses the actual influence of a specific result on surrounding research activities at a given time; and
• Importance: refers to the potential influence of a specific result on surrounding research activities.

Good science is characterized by high quality, impact, and importance.

There are several examples of great scientific discoveries that were rejected by so-called experts at their first presentation, and of outstanding papers that were refused by referees of highly recognized scientific journals. One example is the case of Rosalyn Yalow, who won the Nobel Prize in 1977 for her work on radioimmunoassay: her key article was initially rejected by the most influential journals. She later made the pertinent comment that "the truly imaginative are not being judged by their peers. They have none." Thus, there is an obvious risk that peer-review evaluation might reinforce present trends in research and teaching and discourage scientific projects and curricula that include new, imaginative ideas and proposals. It should also be observed that the most creative ideas are usually generated at the interface between traditional disciplines.

CITATION ANALYSIS

The other most popular way to assess scientific quality is citation analysis. In this method, citation scores are used to measure the penetration, visibility, utility, or impact of a particular result. In Figure 6, citation patterns have been categorized into four groups (19). Of these, the categories high citation rate in high-impact journals and low citation rate in low-impact journals are very clear in the evaluation. The category high citation rate in low-impact journals, however, will give a false impression of high research quality if an "impact filter" is not used.




Figure 6. In citation analysis, four main categories of journals can be identified (19), defined by the expected citation rate (horizontal axis) and the mean citation impact (vertical axis): high citation rate in high-impact journals; high citation rate in low-impact journals; low citation rate in high-impact journals; and low citation rate in low-impact journals. Of these, the two categories high citation rate in high-impact journals and low citation rate in low-impact journals are very clear in the evaluation. The category high citation rate in low-impact journals, however, will give a false impression of high research quality if an "impact filter" is not used. The group low citation rate in high-impact journals, on the other hand, will not sufficiently reflect the high quality of the research published.

The group low citation rate in high-impact journals, on the other hand, will not sufficiently reflect the high quality of the research.

Citation analysis has been used extensively as a valuable instrument to assess the performance of medical research. However, the method has certain limitations, and it has been criticized for not accurately measuring scientific quality (1;3;5;6;10;11;12;13;15;18). It may even be abused and drive research in a wrong direction. Some of the problems with citation analysis include:

• self-citation and citation cartels (see the sketch after this list);
• serial-story publications;
• negative citations;
• printing errors;
• homonyms;
• the citation tradition within a journal;
• the language in which the paper is written;
• the fact that methodological papers are cited more frequently than applied papers;
• the dynamics of the research field;
• different time lags in different fields; and
• the question of who did the work (citations are credited to the first author).
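The sketch below ties two of the preceding points together: it drops self-citations before computing a paper's citation rate and then places the paper in one of the four categories of Figure 6 by comparing that rate with the journal's expected rate. The cut-off values and example names are hypothetical assumptions, not figures from the article.

```python
def classify_citation_pattern(citing_authors, paper_authors,
                              journal_expected_rate, years_since_publication):
    """Assign a paper to one of the four categories of Figure 6.

    citing_authors: first authors of the papers citing this one
    paper_authors: authors of the cited paper (used to drop self-citations)
    journal_expected_rate: expected citations per year for the journal,
        used here as a rough proxy for journal impact
    """
    # Drop self-citations, one of the distortions listed above.
    external = [a for a in citing_authors if a not in set(paper_authors)]
    actual_rate = len(external) / max(years_since_publication, 1)

    # Hypothetical cut-offs; in practice these would be field-normalized.
    high_impact_journal = journal_expected_rate >= 10.0
    high_citation_rate = actual_rate >= journal_expected_rate

    if high_citation_rate and high_impact_journal:
        return "high citation rate in high-impact journal"
    if high_citation_rate:
        return "high citation rate in low-impact journal (needs impact filter)"
    if high_impact_journal:
        return "low citation rate in high-impact journal"
    return "low citation rate in low-impact journal"

# Example: three external citations over three years in a low-impact journal.
print(classify_citation_pattern(
    citing_authors=["Smith", "Jones", "Nilsson", "Eriksson"],
    paper_authors=["Eriksson"],          # the self-citation is dropped
    journal_expected_rate=0.5,
    years_since_publication=3,
))
```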

Experience with large-scale citation analyses leads to the conclusion that the method should not be used to assess individual scientists or small research groups over a short period of time.



QUALITY ASSESSMENT OF MEDICAL EDUCATION

Besides its research activities, the educational programs of the university must also be evaluated continually. Most of the considerations described above regarding the assessment of research are also applicable to the evaluation of education. Some of the parameters that can be used in the assessment of undergraduate and graduate programs include:

• continuous recording of students' performance and examination results;
• repeated evaluation of every course by standardized questionnaires given to all students (see the sketch after this list);
• assessment of the accomplishment of different courses in relation to the aims of the curriculum;
• assessment of students' clinical skills after clinical courses;
• assessment of students' abilities to integrate knowledge from different courses;
• hearings with students and teachers regarding their understanding of a specific course; and
• evaluation of programs with elective courses.
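For the questionnaire-based parameter, a minimal sketch of how standardized course questionnaires might be aggregated into per-course summaries is shown below; the rating scale, item names, and courses are hypothetical illustrations only.

```python
from collections import defaultdict
from statistics import mean

def summarize_course_questionnaires(responses):
    """Aggregate standardized questionnaire scores per course.

    responses: iterable of (course, item, score) tuples, where score is,
    e.g., a 1-5 rating. Returns the mean score per questionnaire item for
    each course, so that repeated course evaluations can be compared over
    time.
    """
    by_course_item = defaultdict(list)
    for course, item, score in responses:
        by_course_item[(course, item)].append(score)

    summary = defaultdict(dict)
    for (course, item), item_scores in by_course_item.items():
        summary[course][item] = round(mean(item_scores), 2)
    return dict(summary)

# Example: two courses, two questionnaire items.
responses = [
    ("Clinical Medicine I", "teaching quality", 4),
    ("Clinical Medicine I", "teaching quality", 5),
    ("Clinical Medicine I", "relevance to curriculum", 3),
    ("Basic Physiology", "teaching quality", 4),
    ("Basic Physiology", "relevance to curriculum", 4),
]
print(summarize_course_questionnaires(responses))
```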

In the existing literature on evaluation, a clear distinction is usually made between comprehensive and partial evaluations (20). The latter are by far the most common, and the majority of them focus on either research or teaching. However, a very important aspect of the function of the university is the interaction between research and teaching. In order to guarantee a high quality of education, an efficient network of exchange between the scientific and teaching activities must exist. Thus, it is very important for the university to undertake comprehensive evaluations in which the flow of information from research to teaching is assessed.

CONCLUSIONS

The process of quality assessment is very complicated; it is extremely important to begin by identifying the purpose of the evaluation, the level on which it should take place, and the methods that are to be used. The conclusions from this article can be summarized as a recommendation to:

• Create confidence in the evaluation, before it starts, among the scientists or teachers who are to be evaluated and among the different target groups;
• Find experts for whom the scientists or teachers have professional respect;
• Choose assessment methods in relation to the focus, level, and objectives of the evaluation; and
• Make the report of the evaluation's findings short and explicit.

REFERENCES

1. Anderson, A. No citation analyses please, we're British. Science, 1991, 252, 639.
2. Ceci, S. J., & Peters, D. P. Peer review: A study of reliability. Change, 1982, 44-48.
3. Cole, J. R., & Cole, S. Measuring the quality of sociological research: Problems in the use of the Science Citation Index. American Sociologist, 1971, 6, 23-29.
4. Cole, S., Cole, J. R., & Simon, G. A. Chance and consensus in peer review. Science, 1981, 214, 881-86.
5. Garfield, E. Citation analysis as a tool in journal evaluation. Science, 1972, 178, 471-79.
6. Garfield, E. The impact of citation counts. Times Higher Education, 1988 (suppl. 15), 12.
7. Hansen, H. F. Effective research: A project concerning the development of evaluation methodology. In Nordic Science Policy Council, Evaluation of research: Nordic experiences. FPR-publication, 1986, 5, 99-116.



8. Harnad, S. (ed.). Peer commentary on peer review: A case study in scientific quality control. Cambridge: Cambridge University Press, 1983.
9. Harnad, S. Rational disagreement in peer review. Science, Technology, and Human Values, 1985, 10, 55-62.
10. King, J. A review of bibliometric and other science indicators and their role in research evaluation. Journal of Information Science, 1987, 13, 261-76.
11. Luukkonen-Gronow, T. Bibliometrics as a tool for evaluation. In Nordic Science Policy Council, Evaluation of research: Nordic experiences. FPR-publication, 1986, 5, 127-52.
12. MacRoberts, M. H., & MacRoberts, B. R. The negational reference or the art of dissembling. Social Studies of Science, 1984, 14, 91-94.
13. MacRoberts, M. H., & MacRoberts, B. R. Quantitative measures of communication in science: A study of the formal level. Social Studies of Science, 1986, 16, 151-72.
14. Medical Research Council. Corporate strategy 1989. London: Medical Research Council, 1989.
15. Merton, R. K. The Matthew effect in science. Science, 1968, 159, 56-63.
16. Montgomery, H., & Hemlin, S. Conceptions of scientific quality. In Nordic Science Policy Council, Evaluation of research: Nordic experiences. FPR-publication, 1986, 5, 117-26.
17. Niiniluoto, I. Peer review: Problems and prospects. In Nordic Science Policy Council, Evaluation of research: Nordic experiences. FPR-publication, 1986, 5, 7-29.
18. Persson, O. Bibliographies and bibliometrics: Some problems of measurement. In Nordic Science Policy Council, Evaluation of research: Nordic experiences. FPR-publication, 1986, 5, 186-202.
19. Persson, O. Personal communication, 1989.
20. Premfors, R. Evaluating basic units in higher education. Stockholm: University of Stockholm, GSHR Report no. 33, 1985.
21. Quade, E. Analysis for public decisions. New York: North Holland, 1982, 262.


