Downloaded from www.ajronline.org by 117.244.21.45 on 10/05/15 from IP address 117.244.21.45. Copyright ARRS. For personal use only; all rights reserved
Perspective

How to Evaluate the Radiology Literature

William C. Black1
We, as radiologists, have acquired much influence over the way medicine is practiced because of the dramatic technological progress in diagnostic imaging. Many more radiologic tests are performed than ever before, and the scope of our specialty continues to broaden. In addition, as clinicians become less familiar with the rapidly evolving imaging technology, they increasingly rely on our interpretations of radiologic findings and our advice on whom, how, and when to test. As our influence grows, so does our need for information about the tests we perform and the diseases we diagnose. Because our individual experience is limited, we each must rely on the radiologic literature for information relevant to the increasing scope of our work.

However, we must read this literature critically for two reasons. First, evaluations of radiologic tests published in either the radiologic or nonradiologic literature almost invariably contain methodologic flaws [1]. These flaws can result from poor training, carelessness, or bias of clinical investigators; unavoidable logistical, financial, and ethical constraints on clinical research; and the inadequacy of the established methodology itself, which is still evolving to meet the unique needs of diagnostic imaging. This is not to say that most radiologic reports are false. Rather, these reports and the individual claims made in any one report vary in their degree of validity, which we can assess better with an understanding of proper study design.

There is a second reason for reading critically. To make use of any study, irrespective of its quality, we must be able to relate its results to the care of patients. To do this, we must recognize the assumptions that underlie concluding remarks about clinical applications. For instance, a study that provides estimates of a test's accuracy does not necessarily tell us anything about the clinical usefulness or cost-effectiveness of the test. Such conclusions would have to be based on additional information or assumptions about other pertinent variables, such as the pretest probability of disease and the effectiveness of therapy for the patient under consideration. This is not to say that a study of accuracy is useless, but that it is limited in scope. Its usefulness is partly determined by the availability of information from other sources. Thus, even if journals could guarantee that published papers were methodologically flawless, someone would still have to determine how and to whom these papers pertained. This is the difficult job of the radiologist.

In this article, I provide guidelines for evaluating both the quality and the clinical relevance of radiologic reports (Table 1). Most such reports pertain to diagnostic accuracy, that is, the ability of a test to distinguish different disease possibilities (including normal) in a particular clinical setting [2]. Therefore, I begin by describing the proper study design of accuracy assessment, paying particular attention to the problems of selection of patients and bias. Next, I consider the presentation of results and elaborate on the concept of accuracy. Finally, I discuss how accuracy should be integrated with other factors to determine the appropriateness of clinical testing.

Received July 10, 1989; accepted September 7, 1989.
1 Diagnostic Radiology Department, Warren G. Magnuson Clinical Center, Bldg. 10, Rm. 1C660, National Institutes of Health (NIH), Bethesda, MD 20892, and Department of Radiology, Georgetown University Hospital, Washington, DC 20007. Address reprint requests to W. C. Black at NIH.
AJR 154:17-22, January 1990 0361-803X/90/1541-0017 © American Roentgen Ray Society

Methodology of Accuracy Assessment

Selection of Patients

Perhaps the most important question about a radiologic study is "to whom do the results apply?" Unless we know
something about how the diseased and normal control subjects in the study were selected, we cannot answer this question. Ideally, patients are described in such a fashion that similar patients can be identified easily for future application of the study results.

Disease spectrum and other patient variables.-The clinical manifestations and anatomic extent of a disease are vital determinants of accuracy [3, 4]. For example, MR imaging may not distinguish asymptomatic patients with and without early (stage IA) cervical carcinoma [5]. On the other hand, MR imaging may distinguish more advanced cervical cancer (stages IIB-IVA) from a normal cervix in all cases (no overlap between columns 0/IA and IIB-IVA in Table 2). Because diseases are generally dynamic processes rather than static entities, accuracy must be qualified according to the anatomic extent of disease. Other patient variables, such as age, sex, body habitus, and other illnesses, may contribute greatly to determinations of accuracy. For example, the presence of excess body fat increases the accuracy of CT and MR imaging, but decreases the accuracy of sonography. Such factors should be taken into account when study results are related to prospective patients for testing.

TABLE 2: Correlation Between MR and Surgical/Pathologic Staging of Cervical Carcinoma

MR Stage      Surgical/Pathologic Stage
              0/IA    IB    IIA    IIB-IVA
0/IA            27     3      0        0
IB               0    12      1        5
IIA              0     2      0        0
IIB-IVA          0     1      2       13

Note.-4 x 4 table condensed from the 7 x 7 table originally published by Togashi et al. [5].

Retrospective vs prospective selection.-The problem with retrospective studies of accuracy is that the patients are selected on the basis of events that lead to a diagnostic procedure. Such events may not be typical occurrences in patients outside the study with the same disease. For this reason, the patients in a retrospective study may not be a representative sample of the diseased patients who are ordinarily tested. For example, a study comparing ventilation-perfusion lung scan and autopsy findings of pulmonary embolism might not be applicable to the prospective evaluation of postoperative patients with mild shortness of breath. The study patients, all of whom would have died, would probably have had more extensive embolic disease or more severe underlying illnesses than patients whose only clinical evidence of embolism was mild postoperative shortness of breath. In addition, the control subjects in a retrospective study may not closely reflect patients outside the study who are tested because of clinical findings suggestive of disease. A more representative sample of diseased and control subjects can be selected in a prospective study, where characteristics of patients are designated before testing, for example, mammograms on all asymptomatic women between the ages of 40 and 50 years. This is the major advantage of a prospective over a retrospective study. Of course, even in a prospective study there may be other variables outside the control or knowledge of the investigator, such as access of the patient to the medical system, which have an effect on patient selection.
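The dependence of accuracy on disease spectrum can be made concrete with a short calculation over the counts in Table 2. The following is a minimal sketch, not part of the original analysis: it treats surgical/pathologic stage 0/IA as "disease absent" and any MR stage of IB or higher as a positive test, a binary grouping chosen here purely for illustration.

```python
# Illustrative sketch: MR staging accuracy by anatomic extent of disease,
# using the 4 x 4 counts condensed from Table 2.
# Rows: MR stage; columns: surgical/pathologic stage.
# Binary grouping (an assumption for this sketch): disease "present" means
# surgical stage IB or higher; the MR "test" is positive when MR stage >= IB.

stages = ["0/IA", "IB", "IIA", "IIB-IVA"]
counts = [
    [27, 3, 0, 0],   # MR 0/IA
    [0, 12, 1, 5],   # MR IB
    [0, 2, 0, 0],    # MR IIA
    [0, 1, 2, 13],   # MR IIB-IVA
]

def sensitivity_for_stage(col):
    """Fraction of patients with a given surgical stage whom MR calls >= IB."""
    total = sum(counts[row][col] for row in range(4))
    positive = sum(counts[row][col] for row in range(1, 4))
    return positive / total

# Sensitivity rises with the anatomic extent of disease:
for col in range(1, 4):
    print(f"surgical stage {stages[col]}: sensitivity = {sensitivity_for_stage(col):.2f}")

# Specificity: patients with surgical stage 0/IA whom MR also calls 0/IA.
specificity = counts[0][0] / sum(counts[row][0] for row in range(4))
print(f"specificity = {specificity:.2f}")
```

Under this grouping, sensitivity is lowest for stage IB disease and reaches 1.0 for the advanced stages, which is the spectrum effect described above: a single "sensitivity" figure would obscure this variation.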
TABLE 1: Checklist for Critical Evaluation

Abstract
  What are the objectives, findings, and conclusions of the study?
Introduction
  What is the purpose of the diagnostic test?
Materials and Methods
  How are the patients selected? Are they representative of those who are ordinarily tested?
  How is the test(s) performed and interpreted? Are the interpretation criteria well defined and reproducible?
  What is the gold standard for diagnosis? Is it appropriate?
  Are the radiologists blinded from the final diagnosis, and is the diagnostician (pathologist) blinded from the radiologic interpretation?
  Is the gold standard applied uniformly?
  In a comparison study, are the tests evaluated fairly?
Results
  How is accuracy reported? Are important covariates, such as spectrum of disease, comorbidity, age, sex, and body habitus, accounted for in tabular presentation of data?
  Is the statistical analysis clearly described and appropriate?
Discussion
  Are the deficiencies in the methodology of accuracy assessment acknowledged and discussed?
  Are other relevant factors, such as disease prevalence, therapeutic effectiveness, and cost, adequately accounted for in the clinical recommendations?

Bias
The patient-selection process determines to whom the study results might apply. How these selected patients are evaluated in the study determines how well the results represent the "true" accuracy of the test for the selected group. These assessments, regardless of whether accuracy is reported explicitly with numeric values for sensitivity and specificity or implicitly through images, are almost always biased to some degree by the process of diagnostic evaluation [3, 6]. We illustrate the effects of these biases with the aid of the binary table.

Test review bias.-The test should be interpreted without knowledge of the final diagnosis or other contributory information. If the radiologist is not so "blinded," his observations and/or his interpretation of those observations may not be based solely on test results. He will be inclined to miscategorize false-negative cases as true-positive and false-positive cases as true-negative (Fig. 1). This leads to an overestimation of both sensitivity and specificity. For example, suppose an investigator is trying to determine how well MR imaging discriminates hemangiomas from liver metastases using certain criteria. If he were to have prior knowledge of the diagnosis, his assessment of certain ambiguous features, such as contour, shape, and relative signal intensity, and his categorization of the MR findings could be influenced by his expectation of how the lesion should appear.
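The direction and size of this bias can be sketched numerically. The following is a hypothetical example with invented counts: starting from an unbiased binary table, an unblinded reader "corrects" half of the discordant cases, moving false-negatives to true-positives and false-positives to true-negatives as in Fig. 1.

```python
# Hypothetical sketch of test review bias; all counts are invented.
# An unblinded reader reclassifies a fraction of discordant cases
# (FN -> TP, FP -> TN), inflating both sensitivity and specificity.

def accuracy(tp, fp, fn, tn):
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity

tp, fp, fn, tn = 80, 10, 20, 90          # blinded (unbiased) reading
sens0, spec0 = accuracy(tp, fp, fn, tn)

shift = 0.5                               # half of discordant cases "corrected"
moved_fn = fn * shift                     # false-negatives recategorized as TP
moved_fp = fp * shift                     # false-positives recategorized as TN
sens1, spec1 = accuracy(tp + moved_fn, fp - moved_fp, fn - moved_fn, tn + moved_fp)

print(f"blinded:   sensitivity {sens0:.2f}, specificity {spec0:.2f}")
print(f"unblinded: sensitivity {sens1:.2f}, specificity {spec1:.2f}")
```

With these invented numbers, sensitivity rises from 0.80 to 0.90 and specificity from 0.90 to 0.95; both are overestimated, as the text predicts.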
[Fig. 1.-Effect of test review bias on the binary table of test result (positive/negative) vs disease (present/absent), with cells TP, FP, FN, and TN. Arrows indicate recategorization of false-negative cases as true-positive and false-positive cases as true-negative.]
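As noted in the introduction, an accuracy estimate becomes clinically useful only when combined with other variables, such as the pretest probability of disease. A minimal sketch of that combination via Bayes' theorem follows; all numbers are invented for illustration.

```python
# Hypothetical sketch: post-test probability from sensitivity, specificity,
# and pretest probability (Bayes' theorem). All numbers are invented.

def post_test_probability(pretest, sensitivity, specificity, result_positive=True):
    if result_positive:
        p_result_d = sensitivity            # P(T+ | disease)
        p_result_nd = 1 - specificity       # P(T+ | no disease)
    else:
        p_result_d = 1 - sensitivity        # P(T- | disease)
        p_result_nd = specificity           # P(T- | no disease)
    joint_d = pretest * p_result_d
    joint_nd = (1 - pretest) * p_result_nd
    return joint_d / (joint_d + joint_nd)

# The same positive test means very different things at different
# pretest probabilities:
for pretest in (0.01, 0.30):
    p = post_test_probability(pretest, sensitivity=0.90, specificity=0.90)
    print(f"pretest {pretest:.2f} -> post-test {p:.2f}")
```

With a 90%-sensitive, 90%-specific test, a positive result raises a 1% pretest probability to only about 8%, but raises a 30% pretest probability to about 79%, which is why a study of accuracy alone cannot establish clinical usefulness.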