Downloaded from www.ajronline.org by 117.253.98.211 on 11/20/15 from IP address 117.253.98.211. Copyright ARRS. For personal use only; all rights reserved


Commentary


Understanding Receiver-Operating-Characteristic Curves: A Graphic Approach

Jan Brismar1

Received February 26, 1991; accepted after revision June 28, 1991.

1Department of Radiology, King Faisal Specialist Hospital and Research Centre, P.O. Box 3354, Riyadh 11211, Saudi Arabia. Address reprint requests to J. Brismar.

AJR 157:1119-1121, November 1991 0361-803X/91/1575-1119 © American Roentgen Ray Society

Receiver-operating-characteristic (ROC) curves describe the relationship between signal and noise and were developed to compare different radar devices [1, 2]. ROC curves have gained increasing popularity in radiology for comparing different test and tester combinations [3]. In an earlier issue of AJR [4], a graphic definition of the commonly used efficacy terms sensitivity, specificity, accuracy, positive predictive value, and negative predictive value was presented. A similar graphic approach can be used to explain ROC curves. The diagram in Figure 1 defines sensitivity and specificity. The horizontal axis (x-axis) denotes the outcome of a test from normal to abnormal. The outcome in the diseased group is plotted above the x-axis, and the outcome in the normal group is plotted below the x-axis.

Decision Threshold and Efficacy Terms

If only the two alternatives normal or abnormal are given, the diagnostician can place the decision threshold, which separates normal from abnormal, anywhere along the x-axis. All tests may be called normal (Fig. 1, point A) or all may be called abnormal (Fig. 1, point B); usually a decision threshold is selected somewhere in between (Fig. 1, point C). An “overreader” tends to place the threshold too far to the left, whereas an “underreader” places the threshold too far to the right. The decision threshold divides the normal population into a true-negative (TN) and a false-positive (FP) group, and the diseased population into a true-positive (TP) and a false-negative (FN) group.

Fig. 1.-Graphic explanation of efficacy terms. Decision threshold separates normal from abnormal test results; it can be placed anywhere along test outcome axis (e.g., points A, B, and C). This threshold divides group with disease (above horizontal axis) into true-positives (TP) and false-negatives (FN) and group without disease (below axis) into true-negatives (TN) and false-positives (FP).

[Figure 1 labels: Decision Threshold; Increasing Normality; Increasing Test Abnormality; Test negative; Test positive; TP; FN; TN; FP]

As the decision threshold moves to the right along the x-axis, sensitivity (defined as TP/[TP + FN]) falls from one, when all tests are read as abnormal (no false negatives), to zero, when all are called normal (no true positives) (Fig. 2). Maximal sensitivity is achieved when all tests are reported as abnormal. However, specificity, defined as TN/(TN + FP), rises in concert from zero (all tests called abnormal, no true negatives) to one (all tests read as normal, no false positives). Maximal specificity consequently is gained by blindly reporting all tests as normal. Obviously, a sensitivity value given without a corresponding specificity value (or the reverse) is meaningless.

Fig. 2.-Decision threshold can be placed anywhere along outcome axis. At position A, only definitely abnormal test results (A!) are classified as abnormal; at position B, both probably (A?) and definitely abnormal results are read as abnormal; at position C, also equivocal results (?) are called abnormal, whereas in position D, only definitely normal outcomes are read as normal. Lower part of diagram shows how sensitivity and specificity change as decision threshold is moved. Each position of threshold gives a unique pair of sensitivity and specificity values.

Test Outcome Not Only Normal or Abnormal

When evaluating a test (e.g., interpreting a chest radiograph), the radiologist works with more possibilities than just normal and abnormal, such as definitely abnormal, probably abnormal, equivocal, probably normal, and definitely normal. Depending on the consequences of a positive test, radiologists move their decision threshold (consciously or unconsciously). Consider a chest radiograph in a 20-year-old boy with a “vague, nodular opacity projecting over the left apex.” Now consider two possible clinical histories:

History of osteosarcoma with previously resected pulmonary metastasis; recent hemoptysis. In this case, the prior probability of a new metastasis is high, and the consequences of missing a lesion may be significant. Therefore, the rational radiologist adjusts the decision threshold to emphasize sensitivity (i.e., overreads the film). Interpretation of the chest film: “nodular pulmonary opacity; must exclude metastatic lesion; suggest CT with possible biopsy.”

Inguinal hernia, routine preoperative chest radiograph. In this case, the prior probability of serious chest disease is low, as is the consequence of a false-negative interpretation. Therefore, the decision threshold should be adjusted to emphasize specificity (i.e., the film should be underread). Interpretation of the chest film: “vague nodular opacity of unknown chronicity, may represent confluence of normal structures; suggest comparison with prior films and attention on follow-up films.”

For tests that have consequences when either positive or negative, the choice probably should be a threshold in the vicinity of equivocal test findings (Fig. 2, points B or C).

The Sensitivity-Specificity Curve

As seen from Figure 2, a unique combination of sensitivity and specificity values is obtained for each position of the decision threshold. Each sensitivity value can be plotted against its corresponding specificity value to create the diagram in Figure 3. The points with specificity 1.0, sensitivity 0.0 (left lower corner) and specificity 0.0, sensitivity 1.0 (right upper corner) in this diagram represent positions of the decision threshold at the extreme right and the extreme left on the outcome axis in Figure 2, respectively.

Fig. 3.-If, for each of four positions (A, B, C, and D) of decision threshold in Figure 2, each sensitivity value is plotted against its corresponding specificity value and points are connected, a sensitivity-specificity curve (receiver-operating-characteristic [ROC] curve) is obtained. Direction of x-axis is chosen to correspond to usual presentation of ROC curves. Size of shadowed area, covered by curve, is a measure of efficacy of test.

[Figure 3 axes: Sensitivity (vertical); Specificity (horizontal, running from 1.0 at left to 0.0 at right)]

The Receiver-Operating-Characteristic Curve

The traditional ROC axes have been altered, with the horizontal axis pointing to the left (Fig. 3). Normally ROC curves are presented with the horizontal axis representing false-positive fraction, that is (1 - specificity), whereas the vertical axis denotes true-positive fraction (i.e., sensitivity). The sensitivity-specificity curve presented in this way is thus, except for the names of the axes, identical to the ROC curve. As the phrase ROC curve to most readers is an acronym without meaning, it perhaps should be discarded. A term such as sensitivity-specificity curve (S-S curve) better describes the curve.
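As a minimal sketch of the threshold movement and the axis conversion described in the text, the following Python fragment computes sensitivity and specificity while a decision threshold is moved from left to right along a numeric outcome scale, and reports the false-positive fraction (1 - specificity) used on the conventional ROC axis. The scores, the function names, and the rule that scores above the threshold are called abnormal are illustrative assumptions, not taken from the article.

```python
def sensitivity(tp, fn):
    """TP / (TP + FN): fraction of the diseased group called abnormal."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """TN / (TN + FP): fraction of the normal group called normal."""
    return tn / (tn + fp)


def efficacy_at_threshold(diseased_scores, normal_scores, threshold):
    """Call every score above `threshold` abnormal and return
    (sensitivity, specificity) for that threshold position."""
    tp = sum(1 for s in diseased_scores if s > threshold)
    fn = len(diseased_scores) - tp
    fp = sum(1 for s in normal_scores if s > threshold)
    tn = len(normal_scores) - fp
    return sensitivity(tp, fn), specificity(tn, fp)


# Hypothetical outcome scores; a higher score means a more abnormal test.
diseased_scores = [3, 4, 4, 5, 5, 5]
normal_scores = [1, 1, 2, 2, 3, 4]

sweep = []
for threshold in range(0, 6):  # move the threshold from far left to far right
    sens, spec = efficacy_at_threshold(diseased_scores, normal_scores, threshold)
    sweep.append((threshold, sens, spec))
    print(f"threshold {threshold}: sensitivity {sens:.2f}, "
          f"specificity {spec:.2f}, false-positive fraction {1 - spec:.2f}")
```

With the threshold at the far left every test is called abnormal (sensitivity 1, specificity 0); at the far right every test is called normal (sensitivity 0, specificity 1). Neither number is informative without the other.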


Construction of a Receiver-Operating-Characteristic Curve

In order to construct an ROC curve, for example, for comparing CT and MR studies to detect liver metastases, patients are examined with both techniques, and the true diagnosis (metastasis or no metastasis) is established by a separate gold standard. The observer is asked to interpret each study as either definitely abnormal, probably abnormal, equivocal, probably normal, or definitely normal. Sensitivity and specificity are then calculated for each of four alternatives (Fig. 2): (1) only those read as definitely abnormal are regarded as positive tests; (2) those read as probably abnormal or definitely abnormal are classified as positive; (3) those read as equivocal, probably abnormal, or definitely abnormal are regarded as positive; and finally (4) all but the definitely normal tests are considered positive. With these four sensitivity-specificity value pairs and the two corners (sensitivity = 1 for specificity = 0, and specificity = 1 for sensitivity = 0), an ROC curve is plotted.
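The construction just described can be sketched in Python. The fragment below takes the number of studies placed in each of the five reading categories, separately for patients with and without disease according to the gold standard, and returns the four sensitivity-specificity pairs plus the two corners. The counts and the function name `roc_points` are hypothetical and serve only as an illustration.

```python
# Reading categories, ordered from most to least abnormal.
CATEGORIES = [
    "definitely abnormal",
    "probably abnormal",
    "equivocal",
    "probably normal",
    "definitely normal",
]


def roc_points(diseased_counts, normal_counts):
    """Return (specificity, sensitivity) pairs: the corner where every study
    is called normal, the four cut points between categories, and the corner
    where every study is called abnormal."""
    n_diseased = sum(diseased_counts)
    n_normal = sum(normal_counts)
    points = [(1.0, 0.0)]  # nothing called positive
    tp = fp = 0
    # Each pass moves the cut one category further toward "normal":
    # alternatives (1) to (4) in the text.
    for d, n in zip(diseased_counts[:-1], normal_counts[:-1]):
        tp += d
        fp += n
        tn = n_normal - fp
        points.append((tn / n_normal, tp / n_diseased))
    points.append((0.0, 1.0))  # everything called positive
    return points


# Hypothetical counts per category for 50 diseased and 50 normal patients.
diseased_counts = [20, 15, 8, 5, 2]
normal_counts = [2, 5, 8, 15, 20]

points = roc_points(diseased_counts, normal_counts)
for spec, sens in points:
    print(f"specificity {spec:.2f}, sensitivity {sens:.2f}")
```

Running the same function on the readings of a second technique, or of a second observer, gives a second set of points that can be drawn in the same diagram for comparison.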

Interpreting the Receiver-Operating-Characteristic Curve

The curve in Figure 3 gives the sensitivity-specificity combination for every possible placement of the decision threshold. An ideal test, cleanly separating normal subjects from diseased ones (Fig. 4, A), gives a sensitivity-specificity (i.e., ROC) curve at the left upper corner of the diagram. A test/tester combination without any discrimination between normal and abnormal (Fig. 4, C) (e.g., reading mammograms blindfolded) would give a curve running straight from the left lower to the right upper corner. In most situations, the curve would run somewhere in between (Fig. 4, B); the better the test/tester combination is, the closer to the left upper corner the curve would run, or, more correctly, the larger area the curve would cover (shadowed in Fig. 3). The ROC curve thus can be used to compare the efficacy of different tests (or different equipment) used by the same tester, as well as the skill of different testers that use the same test material. Construction of an ROC curve requires extra effort from the researcher, as compared with simply calculating sensitivity and specificity values. However, for a serious comparison of different diagnostic tests, no approach is available to replace the ROC curve. For readers who wish to pursue a more in-depth analysis of ROC curves, several excellent articles are available [5, 6].

Fig. 4.-A-C, Three different tests. An ideal test (A), which clearly separates those with disease from those without disease, gives a sensitivity-specificity curve passing close to upper left-hand corner of diagram. Test without any discrimination (C) gives a curve running straight from lower left-hand to upper right-hand corner. Usually, curve runs somewhere in between (B).
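The area covered by the curve can be estimated numerically. The sketch below uses the trapezoidal rule over the plotted points, which is one simple choice assumed here for illustration and not a method prescribed by the article; the function name and the example points are likewise hypothetical.

```python
def area_under_curve(points):
    """Estimate the area covered by a sensitivity-specificity curve.

    `points` holds (specificity, sensitivity) pairs, corners included.
    The area is computed with the trapezoidal rule over the
    false-positive fraction (1 - specificity)."""
    xy = sorted((1 - spec, sens) for spec, sens in points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(xy, xy[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Ideal test: the curve passes through the left upper corner.
ideal = [(1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

# Test without discrimination: straight line between the two corners.
no_discrimination = [(1.0, 0.0), (0.0, 1.0)]

# Hypothetical intermediate test with four cut points.
intermediate = [(1.0, 0.0), (0.96, 0.4), (0.86, 0.7),
                (0.7, 0.86), (0.4, 0.96), (0.0, 1.0)]

for name, pts in [("ideal", ideal),
                  ("no discrimination", no_discrimination),
                  ("intermediate", intermediate)]:
    print(f"{name}: area {area_under_curve(pts):.4f}")
```

Under these assumptions the ideal test covers the whole diagram (area 1.0), the test without discrimination covers half of it (area 0.5), and the intermediate example falls in between.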

REFERENCES

1. Van Meter D, Middleton D. Modern statistical approaches to reception in communication theory. IRE Trans 1954;PGIT-4:119-141

2. Peterson WW, Birdsall TG, Fox WC. The theory of signal detectability. IRE Trans 1954;PGIT-4:171-212

3. Metz CE. ROC methodology in radiological imaging. Invest Radiol 1986;21:720-733

4. Brismar J, Jacobsson B. Definition of terms used to judge the efficacy of diagnostic tests: a graphic approach. AJR 1990;155:621-623

5. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982;143:29-36

6. Hanley JA. Receiver operating characteristic (ROC) methodology: the state of the art. Crit Rev Diagn Imaging 1989;29:307-334
