Downloaded from www.ajronline.org by 117.244.21.45 on 10/05/15 from IP address 117.244.21.45. Copyright ARRS. For personal use only; all rights reserved
Perspective

How to Evaluate the Radiology Literature

William C. Black1
We, as radiologists, have acquired much influence over the way medicine is practiced because of the dramatic technological progress in diagnostic imaging. Many more radiologic tests are performed than ever before, and the scope of our specialty continues to broaden. In addition, as clinicians become less familiar with the rapidly evolving imaging technology, they increasingly rely on our interpretations of radiologic findings and our advice on whom, how, and when to test. As our influence grows, so does our need for information about the tests we perform and the diseases we diagnose. Because our individual experience is limited, we each must rely on the radiologic literature for information relevant to the increasing scope of our work.

However, we must read this literature critically for two reasons. First, evaluations of radiologic tests published in either the radiologic or nonradiologic literature almost invariably contain methodologic flaws [1]. These flaws can result from poor training, carelessness, or bias of clinical investigators; unavoidable logistical, financial, and ethical constraints on clinical research; and the inadequacy of the established methodology itself, which is still evolving to meet the unique needs of diagnostic imaging. This is not to say that most radiologic reports are false. Rather, these reports and the individual claims made in any one report vary in their degree of validity, which we can assess better with an understanding of proper study design.

There is a second reason for reading critically. To make use of any study, irrespective of its quality, we must be able to relate its results to the care of patients. To do this, we must recognize the assumptions that underlie concluding remarks about clinical applications. For instance, a study that provides estimates of a test's accuracy does not necessarily tell us anything about the clinical usefulness or cost-effectiveness of the test. Such conclusions would have to be based on additional information or assumptions about other pertinent variables, such as the pretest probability of disease and the effectiveness of therapy for the patient under consideration. This is not to say that a study of accuracy is useless, but that it is limited in scope. Its usefulness is partly determined by the availability of information from other sources. Thus, even if journals could guarantee that published papers were methodologically flawless, someone would still have to determine how and to whom these papers pertained. This is the difficult job of the radiologist.

In this article, I provide guidelines for evaluating both the quality and the clinical relevance of radiologic reports (Table 1). Most such reports pertain to diagnostic accuracy, that is, the ability of a test to distinguish different disease possibilities (including normal) in a particular clinical setting [2]. Therefore, I begin by describing the proper study design of accuracy assessment, paying particular attention to the problems of selection of patients and bias. Next, I consider the presentation of results and elaborate on the concept of accuracy. Finally, I discuss how accuracy should be integrated with other factors to determine the appropriateness of clinical testing.

Received July 10, 1989; accepted September 7, 1989.
1 Diagnostic Radiology Department, Warren G. Magnuson Clinical Center, Bldg. 10, Rm. 1C660, National Institutes of Health (NIH), Bethesda, MD 20892, and Department of Radiology, Georgetown University Hospital, Washington, DC 20007. Address reprint requests to W. C. Black at NIH.
AJR 154:17-22, January 1990 0361-803X/90/1541-0017 © American Roentgen Ray Society

Methodology of Accuracy Assessment

Selection of Patients

Perhaps the most important question about a radiologic study is "to whom do the results apply?" Unless we know
something about how the diseased and normal control subjects in the study were selected, we cannot answer this question. Ideally, patients are described in such a fashion that similar patients can be identified easily for future application of the study results.

Disease spectrum and other patient variables.-The clinical manifestations and anatomic extent of a disease are vital determinants of accuracy [3, 4]. For example, MR imaging may not distinguish asymptomatic patients with and without early (stage IA) cervical carcinoma [5]. On the other hand, MR imaging may distinguish more advanced cervical cancer (stages IIB-IVA) from a normal cervix in all cases (no overlap between columns 0/IA and IIB-IVA in Table 2). Because diseases are generally dynamic processes rather than static entities, accuracy must be qualified according to the anatomic extent of disease. Other patient variables, such as age, sex, body habitus, and other illnesses, may contribute greatly to determinations of accuracy. For example, the presence of excess body fat increases the accuracy of CT and MR imaging, but decreases the accuracy of sonography. Such factors should be taken into account when study results are related to prospective patients for testing.

TABLE 2: Correlation Between MR and Surgical/Pathologic Staging of Cervical Carcinoma

MR Stage      Surgical/Pathologic Stage
              0/IA    IB    IIA    IIB-IVA
0/IA            27     3      0        0
IB               0    12      1        5
IIA              0     2      0        0
IIB-IVA          0     1      2       13

Note.-4 x 4 table condensed from the 7 x 7 table originally published by Togashi et al. [5].

Retrospective vs prospective selection.-The problem with retrospective studies of accuracy is that the patients are selected on the basis of events that lead to a diagnostic procedure. Such events may not be typical occurrences in patients outside the study with the same disease. For this reason, the patients in a retrospective study may not be a representative sample of the diseased patients who are ordinarily tested. For example, a study comparing ventilation-perfusion lung scan and autopsy findings of pulmonary embolism might not be applicable to the prospective evaluation of postoperative patients with mild shortness of breath. The study patients, all of whom would have died, would probably have had more extensive embolic disease or more severe underlying illnesses than patients whose only clinical evidence of embolism was mild postoperative shortness of breath. In addition, the control subjects in a retrospective study may not closely reflect patients outside the study who are tested because of clinical findings suggestive of disease. A more representative sample of diseased and control subjects can be selected in a prospective study, where characteristics of patients are designated before testing, for example, mammograms on all asymptomatic women between the ages of 40 and 50 years. This is the major advantage of a prospective over a retrospective study. Of course, even in a prospective study there may be other variables outside the control or knowledge of the investigator, such as access of the patient to the medical system, which have an effect on patient selection.
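The dependence of accuracy on disease spectrum can be made concrete with a short calculation over the counts in Table 2. The following is a minimal sketch, not part of the original analysis: it treats surgical/pathologic stage 0/IA as "disease absent" and any MR stage of IB or higher as a positive test, a binary grouping chosen here purely for illustration.

```python
# Illustrative sketch: MR staging accuracy by anatomic extent of disease,
# using the 4 x 4 counts condensed from Table 2.
# Rows: MR stage; columns: surgical/pathologic stage.
# Binary grouping (an assumption for this sketch): disease "present" means
# surgical stage IB or higher; the MR "test" is positive when MR stage >= IB.

stages = ["0/IA", "IB", "IIA", "IIB-IVA"]
counts = [
    [27, 3, 0, 0],   # MR 0/IA
    [0, 12, 1, 5],   # MR IB
    [0, 2, 0, 0],    # MR IIA
    [0, 1, 2, 13],   # MR IIB-IVA
]

def sensitivity_for_stage(col):
    """Fraction of patients with a given surgical stage whom MR calls >= IB."""
    total = sum(counts[row][col] for row in range(4))
    positive = sum(counts[row][col] for row in range(1, 4))
    return positive / total

# Sensitivity rises with the anatomic extent of disease:
for col in range(1, 4):
    print(f"surgical stage {stages[col]}: sensitivity = {sensitivity_for_stage(col):.2f}")

# Specificity: patients with surgical stage 0/IA whom MR also calls 0/IA.
specificity = counts[0][0] / sum(counts[row][0] for row in range(4))
print(f"specificity = {specificity:.2f}")
```

Under this grouping, sensitivity is lowest for stage IB disease and reaches 1.0 for the advanced stages, which is the spectrum effect described above: a single "sensitivity" figure would obscure this variation.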
TABLE 1: Checklist for Critical Evaluation

Abstract
  What are the objectives, findings, and conclusions of the study?
Introduction
  What is the purpose of the diagnostic test?
Materials and Methods
  How are the patients selected? Are they representative of those who are ordinarily tested?
  How is the test(s) performed and interpreted? Are the interpretation criteria well defined and reproducible?
  What is the gold standard for diagnosis? Is it appropriate?
  Are the radiologists blinded from the final diagnosis, and is the diagnostician (pathologist) blinded from the radiologic interpretation?
  Is the gold standard applied uniformly?
  In a comparison study, are the tests evaluated fairly?
Results
  How is accuracy reported? Are important covariates, such as spectrum of disease, comorbidity, age, sex, and body habitus, accounted for in tabular presentation of data?
  Is the statistical analysis clearly described and appropriate?
Discussion
  Are the deficiencies in the methodology of accuracy assessment acknowledged and discussed?
  Are other relevant factors, such as disease prevalence, therapeutic effectiveness, and cost, adequately accounted for in the clinical recommendations?

Bias
The patient-selection process determines to whom the study results might apply. How these selected patients are evaluated in the study determines how well the results represent the "true" accuracy of the test for the selected group. These assessments, regardless of whether accuracy is reported explicitly with numeric values for sensitivity and specificity or implicitly through images, are almost always biased to some degree by the process of diagnostic evaluation [3, 6]. We illustrate the effects of these biases with the aid of the binary table.

Test review bias.-The test should be interpreted without knowledge of the final diagnosis or other contributory information. If the radiologist is not so "blinded," his observations and/or his interpretation of those observations may not be based solely on test results. He will be inclined to miscategorize false-negative cases as true-positive and false-positive cases as true-negative (Fig. 1). This leads to an overestimation of both sensitivity and specificity. For example, suppose an investigator is trying to determine how well MR imaging discriminates hemangiomas from liver metastases using certain criteria. If he were to have prior knowledge of the diagnosis, his assessment of certain ambiguous features, such as contour, shape, and relative signal intensity, and his categorization of the MR findings could be influenced by his expectation of how the lesion should appear.
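The direction and size of this bias can be sketched numerically. The following is a hypothetical example with invented counts: starting from an unbiased binary table, an unblinded reader "corrects" half of the discordant cases, moving false-negatives to true-positives and false-positives to true-negatives as in Fig. 1.

```python
# Hypothetical sketch of test review bias; all counts are invented.
# An unblinded reader reclassifies a fraction of discordant cases
# (FN -> TP, FP -> TN), inflating both sensitivity and specificity.

def accuracy(tp, fp, fn, tn):
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity

tp, fp, fn, tn = 80, 10, 20, 90          # blinded (unbiased) reading
sens0, spec0 = accuracy(tp, fp, fn, tn)

shift = 0.5                               # half of discordant cases "corrected"
moved_fn = fn * shift                     # false-negatives recategorized as TP
moved_fp = fp * shift                     # false-positives recategorized as TN
sens1, spec1 = accuracy(tp + moved_fn, fp - moved_fp, fn - moved_fn, tn + moved_fp)

print(f"blinded:   sensitivity {sens0:.2f}, specificity {spec0:.2f}")
print(f"unblinded: sensitivity {sens1:.2f}, specificity {spec1:.2f}")
```

With these invented numbers, sensitivity rises from 0.80 to 0.90 and specificity from 0.90 to 0.95; both are overestimated, as the text predicts.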
[Fig. 1.-Effect of test review bias on the binary table of test result (positive/negative) vs disease (present/absent), with cells TP, FP, FN, and TN. Arrows indicate recategorization of false-negative cases as true-positive and false-positive cases as true-negative.]
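As noted in the introduction, an accuracy estimate becomes clinically useful only when combined with other variables, such as the pretest probability of disease. A minimal sketch of that combination via Bayes' theorem follows; all numbers are invented for illustration.

```python
# Hypothetical sketch: post-test probability from sensitivity, specificity,
# and pretest probability (Bayes' theorem). All numbers are invented.

def post_test_probability(pretest, sensitivity, specificity, result_positive=True):
    if result_positive:
        p_result_d = sensitivity            # P(T+ | disease)
        p_result_nd = 1 - specificity       # P(T+ | no disease)
    else:
        p_result_d = 1 - sensitivity        # P(T- | disease)
        p_result_nd = specificity           # P(T- | no disease)
    joint_d = pretest * p_result_d
    joint_nd = (1 - pretest) * p_result_nd
    return joint_d / (joint_d + joint_nd)

# The same positive test means very different things at different
# pretest probabilities:
for pretest in (0.01, 0.30):
    p = post_test_probability(pretest, sensitivity=0.90, specificity=0.90)
    print(f"pretest {pretest:.2f} -> post-test {p:.2f}")
```

With a 90%-sensitive, 90%-specific test, a positive result raises a 1% pretest probability to only about 8%, but raises a 30% pretest probability to about 79%, which is why a study of accuracy alone cannot establish clinical usefulness.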