European Journal of Radiology, 14 (1992) 147-156
© 1992 Elsevier Science Publishers B.V. All rights reserved. 0720-048X/92/$05.00

EURRAD 00256

Technology assessment - an American view

John R. Thornbury¹ and Dennis G. Fryback²

¹Department of Radiology and ²Department of Industrial Engineering, University of Wisconsin, Madison, WI, U.S.A.

(Accepted 6 November 1991)

Key words: Technology assessment, MRI; Technology assessment, imaging; Magnetic resonance, technology assessment

Address for reprints: Dr J.R. Thornbury, Department of Radiology, University of Wisconsin, Madison, WI 53792-3252, U.S.A.
Abstract

Technology assessment of an imaging method such as magnetic resonance (MR) is a complicated concept that includes aspects of epidemiology, biostatistics, clinical efficacy determination, outcomes assessment, and knowledge of the technical and medical bases of the imaging method under study. To enhance understanding of the interrelations of the different aspects of technology assessment, a hierarchical model is proposed. This extends from the basic imaging physics domain, through clinical applications in diagnosis and treatment decisions, to patient outcomes and, ultimately, societal considerations. This overview paper presents the conceptual continuum of the hierarchical model and then describes the interrelationships among efficacy, cost-effectiveness, and outcomes research as components embedded within the context of technology assessment. It also points out how the scientific quality of research in MR imaging assessment can be enhanced through improved research design that takes into account the basic concepts of the model.

Introduction

The purpose of this overview paper is to present a description of how our multidisciplinary research team (at the University of Wisconsin and University of Rochester), plus other US groups, have attempted to define, model, and pursue technology assessment. Specifically, we have been evaluating an emerging, expensive, high technology imaging method, magnetic resonance imaging (MR). Some of these concepts we have described before in the literature, as cited in the references. Technology assessment is a complicated, often ambiguous, challenging research area in which definitions, methodology, research design, and analysis have gradually become better defined and understood. The US Congress' Office of Technology Assessment defines technology as 'the techniques, drugs, equipment, and procedures used by health care professionals in delivering medical care to individuals and the systems within which such care is delivered' [1].

The US Institute of Medicine's definition of technology assessment is 'any process of examining and reporting properties of a medical technology used in health care, such as safety, efficacy, feasibility, and the indications for use, cost, and cost effectiveness, as well as social, economic, and ethical consequences, whether intended or unintended' [2].

These definitions are broad and sweeping. The many areas of knowledge and expertise involved in doing technology assessment include epidemiology, biostatistics, clinical decision making, efficacy determination, outcomes assessment, and knowledge of the technical and medical bases of the technology under consideration. Additionally, financial and health services management aspects include evaluation of cost-effectiveness, productivity, prioritization of health systems' financial investments, and ultimately ethical and social impact considerations [3].

Basic tenets of technology assessment in MR imaging


First, it is apparent that some basic conceptual model structure would be helpful in approaching the assessment of the use and impact of MR at multiple levels in the health care system. Within the broad scope of technology assessment in general is embedded the subject of efficacy (or usefulness) of diagnostic imaging examinations, such as MR. Recently, Fryback and Thornbury have described a hierarchical model (Table 1) for appraisal of the literature on efficacy of imaging [4]. This model can also serve as a structure within which more scientifically sound research designs can be achieved for assessing the impact and usefulness of MR imaging.

TABLE 1

A hierarchical model of efficacy: typical measures of analyses

Level 1: Technical efficacy
  Resolution of line pairs. Modulation transfer function change. Gray-scale range. Amount of mottle. Sharpness.

Level 2: Diagnostic accuracy efficacy
  Yield of abnormal or normal diagnoses in a case series. Diagnostic accuracy (percentage correct diagnoses in case series). Sensitivity and specificity in a defined clinical problem setting. Measures of area under the ROC curve.

Level 3: Diagnostic thinking efficacy
  Number (percentage) of cases in a series in which image judged 'helpful' to making the diagnosis. Entropy change in differential diagnosis probability distribution. Difference in clinicians' subjectively estimated diagnosis probabilities pre- to post-test information. Empirical subjective log-likelihood ratio for test positive and negative in a case series.

Level 4: Therapeutic efficacy
  Number (percentage) of times image judged helpful in planning management of the patient in a case series. Percentage of times medical procedure avoided due to image information. Number (percentage) of times therapy planned pretest changed after the image information was obtained (retrospectively inferred from clinical records). Number (percentage) of times clinicians' prospectively stated therapeutic choices changed after test information.

Level 5: Patient outcome efficacy
  Percentage of patients improved with test compared with/without test. Morbidity (or procedures) avoided after having image information. Change in quality-adjusted life expectancy. Expected value of test information in quality-adjusted life years (QALYs). Cost per QALY saved with image information.

Level 6: Societal efficacy
  Benefit-cost analysis from societal viewpoint. Cost-effectiveness analysis from societal viewpoint.

Second, it is natural that the emergence of a new technology (like MR) goes through stages before it becomes generally used by physicians in daily clinical practice. Sox et al. have described this progressive process very clearly in their book [5]. The methods of research design and the expectations from research differ for the earlier versus later stages following the introduction of MR for use in particular clinical problems. Finally, it has been clearly established in the literature that high quality, scientifically sound, published efficacy studies were relatively few in the first half of the 1980s, when the number and distribution of MR machines in the United States first accelerated exponentially. Kent and Larson's benchmark paper in 1988 thoroughly analyzed and pointed out these scientific limitations in studies of MR use in the brain and spine reported in the U.S. literature through 1986 [6]. Subsequently, papers have appeared describing ways to at least approach, if not solve, these research design limitations [7-9]. It is not surprising that early studies of a new technology often are quite vulnerable to bias and limited in the generalizability of their results when one understands the Sox et al. description of the stages of introduction of a new diagnostic technology [5].

The six-tiered model - a conceptual continuum for efficacy

This model, recently described by Fryback and Thornbury [4], is not new. Parts of the levels began to be described in the early 1970s [10-12]. Lusted incorporated some levels of the model in the design of the American College of Radiology Efficacy Study of the early 1970s [13]. Fineberg described in 1978 some elements of all six levels in his introductory overview to a journal issue devoted to the efficacy of imaging, particularly computed tomography (CT) [14]. The origin of this six-tiered model began to be developed in 1982 by the work of a scientific committee of the National Council on Radiation Protection and Measurements (NCRP) (see Acknowledgements). Its earliest form was published in a paper by Fryback in 1983 [12].


The efficacy model is based on a conceptual framework that has evolved along a path similar to the development of conceptual thinking about medical decision making. Concomitant with the initial work of the NCRP in the early 1980s was the American College of Physicians' initiative to study and determine the efficacy of tests and procedures (some of which were imaging methods) [15]. Subsequently, Brook and Lohr at the Rand Corporation [16] and Guyatt et al. at McMaster University [17] began processes at their institutions that resulted in further conceptual approaches to the assessment of technologies. For this six-tiered model of efficacy, the basic generic definition of efficacy is 'the probability of benefit to individuals in a defined population from a medical technology applied for a given medical problem under ideal conditions of use' [16]. Efficacy, as defined here, overlaps partly with the idea of 'effectiveness', which reflects the performance of a medical technology (e.g., imaging) under ordinary, rather than ideal, conditions [16].

This six-tiered hierarchical model extends from the basic laws of the physics of imaging, through practical clinical use in decisions about diagnosis and treatment, to more general patient outcome and societal issues (Table 1). It takes into account that the production and use of information from diagnostic imaging is a continuum. An imaging device records images of a patient in an imaging medium or system (e.g., MR). These images are interpreted by an observer, and diagnostic and prognostic statements are made. The physician managing the patient then takes this information, puts it together with the patient's clinical presentation and laboratory results, and makes diagnostic estimates and treatment choices. This process is initiated by the physician selecting a particular imaging examination for a specific patient. The patient then has the examination, and the physician uses the results of the imaging examination in the process of clinical decision making. This individual, patient-based process is embedded in a larger health care system environment, which ultimately influences what diagnostic imaging resources are available for the physician to select for a particular patient's clinical problem.

The model is a hierarchical one. A key feature is that, for an imaging examination to be efficacious at a higher level, it must be efficacious at lower levels. However, the reverse is not necessarily true: an increase in efficacy at a lower level (e.g., improved imaging technical quality, level 1) does not guarantee improvement in efficacy at a higher level (e.g., patient outcome, level 5). Following is a brief expansion on each level.

Level 1: Technical efficacy

This area is the purview of the diagnostic imaging physicist. The properties of a specific imaging system are measured in physical terms such as resolution of line pairs, sharpness, gray-scale range, etc. (Table 1). These measurements permit comparison of one system with another on the basis of physics attributes. Other influences include the potential for artifacts from the system or the patient, and the ability of the technologist to operate the imaging system optimally.

Level 2: Diagnostic accuracy efficacy

Calculations here attempt to measure the performance of the imaging system in providing diagnoses. It is important to keep in mind that a human observer is required to interpret and extract the diagnostic information from the images. The measures listed in Table 1 typically are used to express diagnostic accuracy. ROC analysis has become a common method for reflecting diagnostic accuracy [18-21]. While diagnostic accuracy efficacy measures the joint function of image quality and interpretation by a human observer, there is another important aspect to keep in mind: diagnostic accuracy efficacy is also a function of the clinician requesting the examination. It is this selection process that controls both the sensitivity and specificity of the imaging examination, to the extent that these vary with the spectrum of the disease under study. While there are acknowledged problems of spectrum bias in control patients [22] and problems of establishing the true (or gold standard) diagnosis [23], ROC analysis is feasible and of practical value. However, showing that an imaging examination has relatively high diagnostic accuracy does not necessarily result in improved patient management and/or improved patient outcome.
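As an illustration of the level 2 measures in Table 1, the following sketch computes sensitivity, specificity, and the empirical area under the ROC curve from the 5-point confidence-rating format commonly used in radiology ROC studies [18-21]. All ratings are invented for the example; they do not come from any study cited here.

```python
# Hypothetical level 2 illustration: sensitivity, specificity, and ROC area
# from 5-point confidence ratings (1 = definitely normal ... 5 = definitely
# abnormal). All numbers are invented for the sake of the example.

diseased     = [5, 5, 4, 4, 4, 3, 3, 2, 5, 4]   # ratings for 10 diseased patients
non_diseased = [1, 1, 2, 2, 3, 1, 2, 4, 1, 2]   # ratings for 10 non-diseased patients

def sens_spec(threshold):
    """Call 'abnormal' when the rating is >= threshold."""
    tp = sum(r >= threshold for r in diseased)
    tn = sum(r < threshold for r in non_diseased)
    return tp / len(diseased), tn / len(non_diseased)

# Empirical ROC curve: one (FPR, TPR) operating point per threshold, plus corners.
points = sorted({(0.0, 0.0), (1.0, 1.0)}
                | {(1 - sens_spec(t)[1], sens_spec(t)[0]) for t in range(1, 6)})

# Trapezoidal area under the empirical ROC curve.
auc = sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))

sens, spec = sens_spec(4)
print(f"At threshold 4: sensitivity {sens:.2f}, specificity {spec:.2f}")
print(f"Empirical ROC area: {auc:.2f}")
```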


Level 3: Diagnostic thinking efficacy

At this level, there is a basic assumption that, in a particular patient's problem, the patient's outcome cannot be affected by the image information unless the physician is led to do something different than would have been done without the imaging information. Likewise, the physician's treatment choice should not change unless something changed in the physician's diagnostic thinking. The imaging information may change the differential diagnosis, strengthen a competing diagnostic hypothesis, or simply reassure the physician that there is not an occult, unexpected serious disease (such as malignancy). There have been some studies in which physicians estimated, before receiving imaging results, the probabilities of the most likely diagnosis and the most important diagnosis. These probabilities were then re-estimated after receipt of the imaging results. Owing to the subjective nature of such probability estimates, there is controversy about the difference in impact for differing magnitudes of change. However, when no change in probability (pre- versus post-imaging) occurs, the imaging information is very unlikely to have been useful (or 'efficacious') [10]. There is another aspect of the effect of imaging on diagnostic thinking. Physicians commonly place great value on results that rule out unlikely and rare, but serious, diseases. This is not necessarily reflected by measuring pre- versus post-imaging diagnostic probabilities. Nevertheless, if no change in probabilities occurs, then the imaging information is very unlikely to have had an impact on treatment choice and, therefore, no impact on patient outcome [4]. The hierarchical nature of the model accounts for this conclusion.
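Two of the level 3 measures in Table 1, entropy change in the differential-diagnosis distribution and the pre- to post-test shift in the leading diagnosis probability, reduce to simple arithmetic. The sketch below uses invented probabilities purely for illustration.

```python
import math

# Hypothetical level 3 illustration: how much did imaging sharpen the
# clinician's differential diagnosis? All probabilities are invented.

pre  = {"herniated disk": 0.50, "spinal stenosis": 0.30, "neoplasm": 0.15, "other": 0.05}
post = {"herniated disk": 0.85, "spinal stenosis": 0.10, "neoplasm": 0.03, "other": 0.02}

def entropy(dist):
    """Shannon entropy (bits) of a differential-diagnosis distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# A fall in entropy means the post-imaging differential is more concentrated.
print(f"Entropy pre: {entropy(pre):.2f} bits, post: {entropy(post):.2f} bits")
print(f"Change in leading-diagnosis probability: {post['herniated disk'] - pre['herniated disk']:+.2f}")
```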

Level 4: Therapeutic efficacy

As noted above, an imaging examination may influence diagnostic thinking and still not affect patient treatment. The greatest impact occurs when treatment is changed or averted between pre- and post-receipt of the imaging results. Conversely, imaging results that have no impact on treatment choice are very unlikely to benefit the patient, except through reassurance of the physician. While logistically difficult to conduct, studies collecting pre- and post-imaging treatment choices and probabilities have been done. For example, it is possible to show that imaging results caused cancellation of previously scheduled surgery [24]. Level 4 studies of CT likewise showed an impact on treatment [25,26].

Level 5: Patient outcome efficacy

Studies at this level are the first point in the hierarchical model where the expected costs (e.g., radiation, monetary, pain, risk to life, etc.) of an imaging examination may be directly weighed against expected benefits. Benefits include improving life expectancy, improving quality of life, avoiding other tests, etc. With information from level 5 studies, a physician may be able to make more informed decisions about whether or not to request the examination, based on these outcome variables. A level 5 study traditionally requires a prospective, randomized, controlled trial (RCT). This requires withholding the examination from one subgroup of patients, which often raises serious ethical problems that make the RCT impossible to carry out. Nevertheless, some major studies have been done that demonstrated major impact on patient outcomes, most notably those involving mammography in the detection of breast cancer in women [27].

In the absence of the direct RCT approach, another avenue of assessment is possible: the use of decision analysis based methodology [4]. One advantage of this approach is that each patient can have both competing examinations. This results in lower research cost, a smaller required patient sample, and often less time needed for the study. Good examples of this approach are the studies on imaging in the diagnosis of pulmonary embolus and its impact on treatment choices [28] and on the use of CT versus MR in dementia [29].
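The decision-analytic arithmetic behind such level 5 assessments can be sketched in a few lines. The following toy comparison of 'image, then treat test-positives' versus 'treat everyone' uses invented probabilities, QALY values, and costs; published analyses such as [28,29] use far richer trees, but the expected-value computation has this form.

```python
# Toy decision-analytic comparison with invented numbers: is imaging
# worth it compared with treating everyone empirically?

p_disease = 0.30          # pretest probability of disease (hypothetical)
sens, spec = 0.90, 0.85   # hypothetical accuracy of the imaging test

# Quality-adjusted life expectancy for each (true state, action) pair (hypothetical).
QALY = {("disease", "treated"): 9.0, ("disease", "untreated"): 6.0,
        ("well", "treated"): 9.5, ("well", "untreated"): 10.0}

def expected_qalys_with_test():
    # Treat if the test is positive; positives arise from TPs and FPs.
    ev  = p_disease * (sens * QALY[("disease", "treated")]
                       + (1 - sens) * QALY[("disease", "untreated")])
    ev += (1 - p_disease) * ((1 - spec) * QALY[("well", "treated")]
                             + spec * QALY[("well", "untreated")])
    return ev

def expected_qalys_treat_all():
    return p_disease * QALY[("disease", "treated")] + (1 - p_disease) * QALY[("well", "treated")]

gain = expected_qalys_with_test() - expected_qalys_treat_all()
test_cost = 900.0  # hypothetical incremental cost of imaging, in dollars
print(f"Expected QALY gain from testing: {gain:.3f}")
print(f"Cost per QALY gained: ${test_cost / gain:,.0f}")
```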

Level 6: Societal efficacy

Studies at this level go beyond the level of individual risks and benefits. These studies address the question of whether the cost borne by society as a whole for the use of a specific technology is acceptable, given that the technology has been found (or is anticipated to be) efficacious in individual applications at other levels. In this model, at this highest level, an imaging examination is efficacious to the extent that it is an efficient use of societal resources to provide medical benefits to society. The widespread advocacy and use of mammography for screening and detection of breast cancer, the introduction of very expensive nonionic low-osmolar radiographic contrast agents, and the now widespread use of magnetic resonance imaging are examples of areas for current analysis at the level of societal efficacy. The issues and policies involved in facing such level 6 problems were well stated by Fineberg [14] when he was writing about CT: 'Evaluation of CT or of any dynamic medical technology will never provide final answers. Findings will be open to interpretation. Individual values and judgements will always play a role. Decisions about the development, reimbursement, and use of new technologies will continue to be made, however imperfectly. The great challenge, embodied in CT but embracing all of medicine, is to bring policy and practice into line with knowledge' (p. 3).

The relation of efficacy to cost-effectiveness

From a relatively narrow viewpoint, many of the types of measurements listed at the levels in Table 1 could be used in a cost-effectiveness analysis. Basically, cost-effectiveness studies compute a cost per unit of output for a medical technology. For example, one could compute the cost per surgical operation avoided, cost per treatment choice changed, or cost per correct diagnosis achieved.


From a broader viewpoint (now more commonly the case), cost-effectiveness analysis can be used to compare resources invested in a technology with different investments in other methods of health care. For this purpose, a more general measure of output (as at levels 5 and 6) is required. Commonly used measures are life years saved or quality-adjusted life years (QALYs) saved. Formal cost-effectiveness and cost-benefit analysis requires the creation of a complex decision tree that includes a variety of clinical and statistical data as well as many types of cost data. The data needed to complete the tree in an imaging cost-effectiveness analysis often include results from scientifically sound efficacy studies at levels 2 and 4, plus statistical information from the published literature about treatment choices and consequences, as well as patient outcomes. Thus, the hierarchical model for efficacy is closely related to cost-effectiveness analysis. The model also provides a structure within which to assess the limitations and strengths of research study designs in the context of the continuum of the efficacy domain.

Emergence and maturation of a new technology

A new imaging technology, such as MR, goes through several stages of development. When use in human subjects and patients begins, one could characterize this as the 'discovery' phase. The basic process is well described by Sox et al. in their book about medical decision making [5]. At first, it is critical to determine whether or not the new imaging method provides information for making the diagnosis of severe, clinically obvious disease. Thus, initially, image features are correlated with the actual anatomic/pathologic characteristics of the disease being studied. This has been the foundation of initial clinical trials of new imaging methods since the discovery of X-rays by Roentgen in 1895. Going on from initial results indicating a high level of imaging/pathologic correlation, it is then important to demonstrate that the imaging method will not show falsely positive features in patients who do not have the disease under study ('normal volunteer controls'). Thus, as Sox et al. put it, the new imaging method needs at first to be studied in the 'sickest of the sick' and the 'wellest of the well' [5]. One therefore expects in these early 'discovery' studies that the true positive rate will be quite high in patients with a high probability of disease. Likewise, in the 'normal' control subjects, the true negative rate should be high. Substantial spectrum bias (i.e., referral bias) is naturally inherent in these 'discovery' studies.

Once the new imaging method has proven highly predictive of 'disease' and 'no disease', its use is extended to patients with a less clinically obvious, lower probability of a particular disease. Then the true positive and true negative rates begin to decline, and the false-positive and false-negative rates begin to rise. As the spectrum of patients imaged becomes more generalized, the diagnostic accuracy, expressed as sensitivity and specificity, begins to stabilize. One could call this the 'spreading out' or 'diffusion' phase. However, most of the studies done as clinical use expands are conducted in tertiary care academic centers, where referral bias is inherent in the patient study samples. Generalization of these results to the use of the new imaging method in more general, primary care type patient populations is quite limited. Nevertheless, these initial, biased results provide a benchmark of accuracy that can be aimed for, but usually is not achieved, in more generalized patient populations. Through clinical trial and error in this fashion, there emerges some general sense (often anecdotal) of the efficacy (or 'usefulness') of the new imaging method. When a meta-analysis of published results indicates a reasonable return at levels 2 (diagnostic accuracy) and 4 (treatment impact) of the model, then more formal technology assessment studies may be considered. These studies usually will include features and measures of level 5, and even level 6, from the hierarchical model of efficacy.

Recently, Phelps and Mushlin of our research group proposed a linkage between level 5 (patient outcome) and level 2 (diagnostic accuracy) data collection [30]. Their aim was to devise a method for identifying appropriate clinical uses of a new technology as candidates for expensive, large scale, 'large ticket' formal technology assessment. They devised a model, based on medical decision analysis, that assumes there is a societal threshold price, expressed in dollars per QALY. Above this threshold, society is unwilling to invest resources; below it, investment of resources is considered reasonable and unquestioned. For the sake of discussion, this threshold could be placed at about $50,000 per QALY. As a practical example, one could determine whether or not MR could replace an existing imaging method (such as CT) for a particular medical diagnosis and management problem. For example (as in our own research at the University of Wisconsin), could MR replace CT/myelography or plain CT in the diagnosis of disk-caused nerve compression [31]? Given that the ROC parameters are known from level 2 diagnostic accuracy studies reported in the literature, Phelps and Mushlin compute a 'challenge region' in the ROC diagram.


The ROC curve constructed for the new imaging method (e.g., MR) must fall within this 'challenge region' if its incremental cost per QALY is to be less than the societal threshold. The Phelps/Mushlin approach to addressing issues of incremental patient outcome efficacy, in effect, reduces the empirical task to that of conducting a level 2 study. This is a much more feasible task than conducting a prospective randomized trial.
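Reduced to arithmetic, the societal-threshold idea is simply a comparison of the incremental cost-effectiveness ratio against the threshold price. The numbers below are hypothetical, and the $50,000 per QALY figure is the discussion value used above, not an endorsed standard.

```python
# The societal-threshold logic from the text, reduced to arithmetic.
# Values are hypothetical.

THRESHOLD = 50_000.0  # dollars per QALY society will pay (discussion value)

def acceptable(incremental_cost, incremental_qalys):
    """Is the new method's incremental cost-effectiveness ratio
    at or below the societal threshold?"""
    if incremental_qalys <= 0:
        return False  # no health gain: not acceptable at any positive cost
    return incremental_cost / incremental_qalys <= THRESHOLD

# Hypothetical example: MR replacing CT/myelography for one indication.
print(acceptable(incremental_cost=400.0, incremental_qalys=0.02))    # $20,000/QALY -> True
print(acceptable(incremental_cost=2_000.0, incremental_qalys=0.02))  # $100,000/QALY -> False
```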

Increasing the scientific quality of imaging research

As Kent and Larson well documented, there have been limitations in the scientific quality of imaging research studies in the literature [6]. They pointed out shortcomings in sample size, referral bias, and three diagnostic reference standard biases: (1) work-up bias; (2) test review bias; and (3) diagnosis review bias. Very few papers reviewed by Kent and Larson had sufficient sample size for statistical validity. The majority of papers had substantial referral bias: patient samples often were highly selected, drawn from tertiary care settings, and concentrated upon patients seen by one or a very limited spectrum of subspecialty physicians. Thus, generalization of results to less specialized patient populations is limited. The three reference standard biases were defined by Kent and Larson as follows:
1. Work-up bias occurs when the test result contributed to the inclusion or exclusion of a case from a study. This may increase true positive rates by excluding potentially diseased patients whose imaging results are 'negative' (simulated in the sketch below).
2. Test review bias occurs when the final (true) diagnosis or the results of the comparison test are used in the planning or interpretation of the images under study. This bias may increase true positive rates.
3. Diagnosis review bias occurs when the diagnosis was affected by the test results under study. If the final (true) diagnosis is derived primarily from the imaging results under study, this may increase the true positive rate.
In addition to their effect on true positive rates, these biases may also cause underestimation of false-positive rates. When imaging research studies are done in the course of efficacy and technology assessment projects, we, and others [7-9], have recommended that some basic experimental design features be used to offset the above problems.
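To see how strongly work-up bias can distort accuracy estimates, consider a small simulation with invented parameters, in which image-positive patients are far more likely to receive the gold standard work-up than image-negative ones.

```python
import random

# Simulation with invented parameters: work-up (verification) bias.
# When image-positive patients are much more likely to get the gold
# standard work-up, false negatives are under-sampled and the apparent
# sensitivity is inflated, as described for bias (1) above.

random.seed(1)
TRUE_SENS, TRUE_SPEC, PREVALENCE = 0.75, 0.90, 0.20

def patient():
    diseased = random.random() < PREVALENCE
    test_pos = random.random() < (TRUE_SENS if diseased else 1 - TRUE_SPEC)
    return diseased, test_pos

cohort = [patient() for _ in range(100_000)]

def sensitivity(sample):
    tp = sum(d and p for d, p in sample)
    fn = sum(d and not p for d, p in sample)
    return tp / (tp + fn)

# Biased study sample: every positive is verified, but only 10% of negatives.
biased = [(d, p) for d, p in cohort if p or random.random() < 0.10]

print(f"Sensitivity, full cohort:        {sensitivity(cohort):.3f}")   # ~0.75
print(f"Sensitivity, biased work-up set: {sensitivity(biased):.3f}")   # inflated
```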

Clinical problem definition and hypotheses

The clinical problem should be defined as clearly and precisely as possible. The hypotheses to be tested should likewise be stated as clearly as possible and relate logically, statistically, and epidemiologically to the problem definition. Patient inclusion and exclusion criteria should be stated clearly.

Patient sample size

Sufficient numbers of patients must be recruited to provide adequate statistical power to ensure valid conclusions from the research. Recently, Kent estimated that 30 to 70 patients would be required to compare the diagnostic accuracy of MR with a gold standard [32]: thirty allows rough estimates, while 70 would permit estimation within 7-10%. For comparing the diagnostic impact of MR (level 2) versus traditional imaging, 10-150 patients could be required; major differences could be identified with 10-20 patients, while minor differences could require up to 150. For comparing the therapeutic impact (level 4) or patient outcome (level 5) of two imaging methods, 20-500 cases might be required. The large numbers reflect how heavily sample sizes depend upon the expected magnitude of the effect. In 1990, Arkin and Wachtel presented a thorough discussion of sample size requirements [33].
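Estimates of this kind typically rest on the normal approximation for a proportion. The sketch below shows the generic calculation; it is not necessarily the exact method used by Kent [32] or by Arkin and Wachtel [33].

```python
import math

# Normal-approximation sample size for estimating a proportion (e.g.,
# sensitivity) to within a given margin at 95% confidence. This is the
# generic textbook formula, not necessarily the method used in [32,33].

def n_for_proportion(expected_p, margin, z=1.96):
    """Smallest n such that a 95% CI for a proportion near expected_p
    has half-width <= margin."""
    return math.ceil(z**2 * expected_p * (1 - expected_p) / margin**2)

# E.g., estimating an accuracy expected to be ~0.85 to within +/- 0.09:
print(n_for_proportion(0.85, 0.09))   # about 61 patients
```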

Patient referral sources

Referral bias (or spectrum bias) arises from differences among patient populations in the spectrum of disease presentation and severity [34]. Offsetting referral bias requires inclusion of patients from differing medical practice environments across a broad spectrum of specialty areas, including general internal medicine, pediatrics, and family medicine. This increases the likelihood that patients presenting with low, intermediate, and high probability of disease will be included. With this type of balanced patient sample, generalization of results is enhanced. It is further helpful to collect the physicians' initial diagnostic probability estimates prior to use of the competing imaging methods under study. Describing this spectrum of diagnostic probabilities permits physician readers of the resulting research papers to determine how closely the patient sample resembles their own practice environments. Another advantage is that subgroups of patients with relatively higher and lower probability of disease can be analyzed separately; the sensitivity and specificity of the competing methods will differ between the two subgroups [35]. It is important to keep in mind that adequate sample size must be retained to preserve statistical validity for subgroup analysis.

All patients have the competing examinations

If all patients have both of the competing imaging examinations and are followed for a reasonable time to determine patient outcomes with some assurance, work-up bias should be reduced.


This avoids, for example, the bias that would result from patients with 'normal' or 'negative' examination results not being followed further. This approach also avoids the ethical problem, inherent in a traditional randomized controlled trial (RCT), of withholding examinations from some portion of the patients. If the decision analytic approach is used in the experimental design, the patient sample size required for statistical validity is usually smaller than in an RCT. Furthermore, the research project will be less costly and may well require less time than an RCT type of project.

Randomized blinded readings

Interpretation of the imaging examinations by observers blinded to all clinical information about the patients and to the results of the competing examinations reduces test review bias. Studies have shown that knowledge of the results of one examination when interpreting the competing examination has a measurable effect, usually higher false-positive rates [36,37]. Blinded reading may be done prospectively [37] or retrospectively [31], and by single or multiple observers. Using multiple readers makes it easier to determine interobserver variation; however, this complicates the prospective approach in particular. The blinded reading format also requires standardization of the terms and grading methods used. Short pilot reading exercises help orient the image interpreters and decrease the effect of differing calibration levels among observers.
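Interobserver variation between blinded readers is commonly summarized with Cohen's kappa. The following sketch, with invented ratings for two hypothetical readers, shows the computation.

```python
from collections import Counter

# Cohen's kappa for two blinded readers, a common summary of the
# interobserver variation mentioned above. Ratings are invented;
# 1 = abnormal, 0 = normal, one entry per patient.

reader_a = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0]
reader_b = [1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0]

n = len(reader_a)
observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n

# Chance agreement from each reader's marginal rating rates.
pa, pb = Counter(reader_a), Counter(reader_b)
expected = sum(pa[k] * pb[k] for k in (0, 1)) / n**2

kappa = (observed - expected) / (1 - expected)
print(f"Observed agreement {observed:.2f}, chance {expected:.2f}, kappa {kappa:.2f}")
```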

Expert gold standard panel and determination of true diagnosis

Having an interdisciplinary expert panel determine the true diagnosis usually provides the least biased opinion. This is particularly helpful in decreasing biases between 'surgical' and 'nonsurgical' specialties. Again, standardization of terms and grading methods is important to decrease variation among the experts. Having competing specialties represented also tends to make the results of the research more acceptable and believable to a wider spectrum of physicians. If the expert panel makes its judgement of the 'true diagnosis' in each patient based on all information except the specific imaging results, then another usually vexing bias can be avoided: 'incorporation' bias, in which knowledge of the result of the imaging examination may strongly influence the expert physician's probability estimate of the true diagnosis. This is particularly true where neither biopsy nor autopsy results provide the traditional clinical/pathological confirmation. If the panel subsequently is provided the imaging results in randomized fashion, it may revise its estimate of the probability of disease. Comparing the diagnostic probabilities before and after receipt of the imaging results permits calculation of the extent of incorporation bias.

Cost-effectiveness and cost-benefit analysis

These analyses require the creation of a complex decision tree that includes different types of clinical and statistical data. The data relating to the diagnostic accuracy of the competing methods can be derived from the research project data. Data needed to determine the consequences of diagnostic and treatment choices on patient outcomes are usually obtained from the literature. New computer software is available to facilitate aggregation of the data into a comprehensive model [38].

Relation of outcome research and technology assessment

Outcome research was defined in the U.S. Omnibus Reconciliation Act of 1986 as 'research with respect to patient outcomes of selected medical treatments and surgical procedures for the purpose of assessing their appropriateness, necessity, and effectiveness' [39]. Four pioneering research programs were funded in the areas of prostate disease, cataracts, myocardial infarction, and low back pain. Simultaneously, in 1986, the Institute of Medicine established the Council on Health Care Technology. As outlined by Rettig [2], the Council made important contributions to the conceptual development of health care technology assessment. These: (1) broadened the endpoints of technology assessment by including patient well-being and outcomes as criteria; (2) expanded outcomes to include functional and health status assessment; (3) refocused technology assessment on a specific clinical condition (e.g., low back pain) rather than the specific technology or procedures used by physicians, in effect extending the conceptual approach beyond simple determination of accuracy (of imaging methods, for example) to assessing physicians' choices of procedures and technologies in treating a given condition; and (4) asserted that diagnostic assessments had to include the effect of a diagnostic test on patient management and outcome as a central criterion for judging the utility of that test.

In the private insurance sector, the Blue Cross/Blue Shield Association developed a Technology Evaluation Program, which established criteria for assessing a technology.


Among these criteria was the requirement that scientific evidence must permit conclusions concerning the effect on health outcomes. In the medical sector in the 1980s, the American College of Physicians' Clinical Efficacy Assessment project conducted studies of the diagnostic and therapeutic impact of technologies by analyzing the literature for evidence of these impacts. Likewise, the American Medical Association, during this time, established its Diagnostic and Therapeutic Technology Assessment Program.

Recent large scale funding of health care technology assessment, outcomes research, and related areas by the U.S. government was accomplished within the Omnibus Budget Reconciliation Act of 1989. This Act created from the former National Center for Health Services Research a much broader scope entity, the Agency for Health Care Policy and Research (AHCPR). This legislation (per Rettig's summary [2]) preserved continuity in health services research and technology assessment and added new efforts in information dissemination, data base development, and practice guideline development. It refocused the AHCPR efforts on evaluation of medical practice. It also recognized that there is an evolutionary and educational process yet to be gone through; additionally, new experimental methodologies need to be developed to flesh out these ambitious, broad scope programs. Funds on the order of $100 million were initially dedicated to this opening phase. Rettig observed that, as a result of focusing on medical practice assessment in a multidimensional fashion, this potentially integrates traditional clinical research, epidemiology, and health services research. Further, the clinical trial declines from the premier position it has held. While clinical trial data are still of value, nonexperimental data drawn from other sources (e.g., insurance claims data bases and self-administered patient health status utility assessment questionnaires) assume increasing importance.

Currently, the AHCPR continues to fund many smaller research projects in technology assessment. However, its major funding now supports research by multidisciplinary patient outcomes research teams (PORT projects). Specific clinical conditions and procedures include back pain, total knee replacement, acute myocardial infarction, cataracts, benign prostatic hypertrophy, ischemic heart disease, biliary tract disease, diabetes, pneumonia, and Cesarean section and other obstetrical procedures. The Agency conducts a multifaceted program to enhance the quality of health care by improving the scientific basis for medical practice. An additional important aim is to increase the involvement of patients and consumers in decision making regarding their health care.

In the midst of all this large scale funding and conceptual rhetoric for AHCPR programs, much basic methodologic work needs to be developed and refined. Also, the use of imaging technology (such as MR) is inherent in the PORT projects. Decisions by physicians about the use of imaging methods require experimental data from well designed imaging efficacy (as a type of technology assessment) research studies. And, as many critics have pointed out, much better quality studies are needed.

Discussion

At the Nijmegen MR Symposium, 28 March 1992, addressing the topic of MR technology assessment, this paper is presented to try to sort out the complicated concepts, definitions, methods of research, and interrelationships of efficacy, technology assessment, and outcomes research. Further work needs to be done in understanding these interrelationships. Nevertheless, in the clinical practice area, the use of magnetic resonance imaging needs a sound research based foundation of data that optimizes physicians' decisions about MR use in clinical diagnosis and treatment. It is important that many ongoing technology assessment research programs of high quality be pursued at both the initial 'discovery' and later 'diffusion' phases of MR clinical use. It is appropriate in the early 'discovery' phase of a new use of MR for a specific clinical problem that research projects not be too expensive or too sophisticated, yet still provide data containing the least bias possible. After the 'diffusion' phase of a new clinical application of MR, a more sophisticated, larger scale, high scientific quality, often multi-institutional project is in order. Phelps and Mushlin have proposed a way to select clinical uses of MR for specific large scale research projects [30]. They describe a linkage between levels 5 and 2 of the hierarchical model; decision analysis and medical decision making research concepts are implicit in their approach. Likewise, use of the hierarchical model that has been proposed for efficacy [4] permits experiments to be designed to address diagnostic impact (level 2 efficacy), therapeutic impact (level 4 efficacy), and patient outcomes (level 5 efficacy). Hopefully, some order will emerge from what to many outside observers is a confusing situation. As we continue to face and deal with the basic concepts and interrelationships of efficacy, cost-effectiveness, technology assessment, outcomes research, and clinical practice assessment, future high quality research should result in better understanding and more clinically useful, valid experimental data and results.


Acknowledgements

The model described in this paper was formulated in discussions of Scientific Committee Number 69 of the National Council on Radiation Protection and Measurements. Portions of the text draw on draft materials produced by that working group. The group was chaired by one of the authors (J.R.T.), and the other (D.G.F.) was a member. The authors gratefully acknowledge the contributions of the other members, Robert A. Goepp, D.D.S., Lee B. Lusted, M.D., Keith I. Marton, M.D., Barbara J. McNeil, M.D., Ph.D., and Milton C. Weinstein, Ph.D., who were involved in the initial model development meeting. Others subsequently involved in helping to elaborate aspects of the model were Kunio Doi, Ph.D., Charles E. Metz, Ph.D., Harvey Rudolph, Ph.D., Alvin I. Mushlin, M.D., and Charles E. Phelps, Ph.D. This study was supported by Grant #R01HS05427 from the Agency for Health Care Policy and Research.

References

1 Quoted in Banta HD, Behney CJ, Willems JS. Toward rational technology in medicine. New York, NY: Springer Publishing Company, 1981; 5.
2 Rettig RA. Technology assessment - an update. Invest Radiol 1991; 26: 165-173.
3 Issues in technology assessment for health administration education. An unpublished white paper prepared by the PEW Technology Assessment Task Force. Presented at the annual meeting of the Association of University Programs in Health Administration, Washington, DC, 23 March 1991.
4 Fryback DG, Thornbury JR. The efficacy of diagnostic imaging. Med Decis Making 1991; 11: 88-94.
5 Sox HC Jr, Blatt MA, Higgins MC, Marton KI. Medical Decision Making. Boston, MA: Butterworths, 1988; 117-119.
6 Kent DL, Larson EB. Magnetic resonance of the brain and spine: is clinical efficacy established after the first decade? Ann Intern Med 1988; 108: 402-424.
7 Begg CB, McNeil BJ. Assessment of radiologic tests: control of bias and other design considerations. Radiology 1988; 167: 565-569.
8 McNeil BJ. Clinical research in radiology. Decisions in Imaging Economics 1990; 3: 24-29.
9 Thornbury JR, Kido DK, Mushlin AI, Phelps CE, Mooney C, Fryback DG. Increasing the scientific quality of clinical efficacy studies of magnetic resonance imaging. Invest Radiol 1991; 26: 829-835.
10 Thornbury JR, Fryback DG, Edwards W. Likelihood ratios as a measure of diagnostic usefulness of excretory urogram information. Radiology 1975; 114: 561-565.
11 McNeil BJ, Adelstein SJ. Determining the value of diagnostic and screening tests. J Nucl Med 1976; 17: 439-448.
12 Fryback DG. A conceptual model for output measures in cost-effectiveness evaluation of diagnostic imaging. J Neuroradiol 1983; 10: 94-96.
13 Lusted LB, Roberts HV, Edwards W, Wallace DL, Lahiff M, Loop JW, Bell RS, Thornbury JR, Seale DL, Steele JP, Fryback DG. Efficacy of diagnostic X-ray procedures. Available from the American College of Radiology, Reston, VA, 1980.
14 Fineberg HV. Evaluation of computed tomography: achievement and challenge. AJR 1978; 131: 1-4.
15 Schwartz JS, Ball JR, Moser RH. Safety, efficacy and effectiveness of clinical practices: a new initiative (editorial). Ann Intern Med 1982; 96: 246-247.
16 Brook RH, Lohr K. Efficiency, effectiveness, variations and quality: boundary crossing research. Med Care 1985; 23: 710-722.
17 Guyatt G, Drummond M, Feeny D, Tugwell P, Stoddart G, Haynes RB, Bennett K, Labelle R. Guidelines for the clinical and economic evaluation of health care technologies. Soc Sci Med 1986; 22: 393-408.
18 Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982; 143: 29-36.
19 Swets JA. Measuring the accuracy of diagnostic systems. Science 1988; 240: 1285-1293.
20 Hanley JA, McNeil BJ. Statistical approaches to the analysis of receiver operating characteristic (ROC) curves. Med Decis Making 1984; 4: 137-150.
21 Tosteson AN, Begg CB. A general regression methodology for ROC curve estimation. Med Decis Making 1988; 8: 204-215.
22 Gur D, King JL, Rockette HE, Britton CA, Thaete FL, Hoy RJ. Practical issues of experimental ROC analysis: selection of controls. Invest Radiol 1990; 25: 583-586.
23 Henkelman RM, Kay I, Bronskill MJ. Receiver operator characteristic (ROC) analysis without truth. Med Decis Making 1990; 10: 24-29.
24 Farnsworth PB, Steiner E, Klein RM, San Filippo JT. The value of routine preoperative chest roentgenograms in infants and children. JAMA 1980; 244: 582-583.
25 Wittenberg J, Fineberg HV, Black IB, Kirkpatrick RH, Schaffer DL, Ikeda MK, Ferrucci JT Jr. Clinical efficacy of computed body tomography. Am J Roentgenol 1978; 131: 5-14.
26 Wittenberg J, Fineberg HV, Ferrucci JT, Simone FJ. Clinical efficacy of computed body tomography II. Am J Roentgenol 1980; 134: 1111-1120.
27 Mushlin AI. Diagnostic tests in breast cancer: clinical strategies based on diagnostic probabilities. Ann Intern Med 1985; 103: 79-85.
28 The PIOPED Investigators. Value of the ventilation/perfusion scan in acute pulmonary embolism. Results of the Prospective Investigation of Pulmonary Embolism Diagnosis (PIOPED). JAMA 1990; 263: 2753-2759.
29 Simon DG, Lubin MF. Cost-effectiveness of computerized tomography and magnetic resonance imaging in dementia. Med Decis Making 1986; 5: 335-354.
30 Phelps CE, Mushlin AI. Focusing technology assessment. Med Decis Making 1988; 8: 279-289.
31 Thornbury JR, Fryback DG, Turski PA, Javid M, McDonald JV, Bienlich BR, Gentry LR, Sackett JR, Martin PA. Comparison of MR vs CT myelography and plain CT in the diagnosis of disk-caused nerve compression in acute low back pain patients. Presented at the 77th Scientific Meeting of the Radiological Society of North America, Chicago, IL, 2 December 1991.
32 Kent DL. Clinical efficacy of MR needs rigorous study. Diagn Imaging 1990; 12: 69, 71, 161.
33 Arkin CF, Wachtel MS. How many patients are necessary to assess test performance? JAMA 1990; 263: 275-278.
34 Ransohoff DF, Feinstein AR. Problems of spectrum and bias in evaluating the efficacy of diagnostic tests. N Engl J Med 1978; 299: 926-930.
35 Hlatky MA, Pryor DB, Harrell FE Jr, Califf RM, Mark DB, Rosati RA. Factors affecting sensitivity and specificity of exercise electrocardiography. Am J Med 1984; 77: 64-71.
36 Eldevik OP, Dugstad G, Orrison WW, Haughton VM. The effect of clinical bias on the interpretation of myelography and spinal computed tomography. Radiology 1982; 145: 85-89.
37 Rifkin MD, Zerhouni EA, Gatsonis CA, Quint LE, Paushter DM, Epstein JI, Hamper U, Walsh PC, McNeil BJ. Comparison of magnetic resonance imaging and ultrasonography in staging early prostate cancer. N Engl J Med 1990; 323: 621-626.
38 Hollenberg JP. SMLTREE, Version 2.9. New York, NY, 1986.
39 Report to Congress: Progress of Research on Outcomes of Health Care Services and Procedures. Introduction and Overview, p. 1. Agency for Health Care Policy and Research. AHCPR Pub. No. 91-0004, May 1991.
