Downloaded from www.ajronline.org by 117.253.98.211 on 11/20/15 from IP address 117.253.98.211. Copyright ARRS. For personal use only; all rights reserved
Commentary
Understanding Receiver-Operating-Characteristic Curves: A Graphic Approach
Jan Brismar¹
Received February 26, 1991; accepted after revision June 28, 1991.
¹Department of Radiology, King Faisal Specialist Hospital and Research Centre, P.O. Box 3354, Riyadh 11211, Saudi Arabia. Address reprint requests to J. Brismar.
AJR 157:1119-1121, November 1991 0361-803X/91/1575-1119 © American Roentgen Ray Society

Receiver-operating-characteristic (ROC) curves describe the relationship between signal and noise and were developed to compare different radar devices [1, 2]. ROC curves have gained increasing popularity in radiology for comparing different test and tester combinations [3]. In an earlier issue of AJR [4], a graphic definition of the commonly used efficacy terms sensitivity, specificity, accuracy, positive predictive value, and negative predictive value was presented. A similar graphic approach can be used to explain ROC curves. The diagram in Figure 1 defines sensitivity and specificity. The horizontal axis (x-axis) denotes the outcome of a test from normal to abnormal. The outcome in the diseased group is plotted above the x-axis, and the outcome in the normal group is plotted below the x-axis.

Decision Threshold and Efficacy Terms

If only the two alternatives normal or abnormal are given, the diagnostician can place the decision threshold, which separates normal from abnormal, anywhere along the x-axis. All tests may be called normal (Fig. 1, point A) or all may be called abnormal (Fig. 1, point B); usually a decision threshold is selected somewhere in between (Fig. 1, point C). An "overreader" tends to place the threshold too far to the left, whereas an "underreader" places the threshold too far to the right. The decision threshold divides the normal population into a true-negative (TN) and a false-positive (FP) group, and the diseased population into a true-positive (TP) and a false-negative (FN) group.

As the decision threshold moves to the right along the x-axis, sensitivity (defined as TP/[TP + FN]) ranges from one, when all tests are read as abnormal (no false negatives), to zero, when all are called normal (no true positives) (Fig. 2). Maximal sensitivity is achieved when all tests are reported as abnormal. However, specificity, defined as TN/(TN + FP), moves in concert from zero (all tests called abnormal, no true negatives) to one (all tests read as normal, no false positives). Maximal specificity consequently is gained by blindly reporting all tests as normal. Obviously, a sensitivity value given without a corresponding specificity value (or the reverse) is meaningless.

[Figure 1 labels: Decision Threshold; Increasing Normality; Increasing Test Abnormality; Test negative; Test positive; TP, FN, TN, FP]
Fig. 1. Graphic explanation of efficacy terms. Decision threshold separates normal from abnormal test results; it can be placed anywhere along test outcome axis (e.g., points A, B, and C). This threshold divides group with disease (above horizontal axis) into true-positives (TP) and false-negatives (FN) and group without disease (below axis) into true-negatives (TN) and false-positives (FP).

Test Outcome Not Only Normal or Abnormal

When evaluating a test (e.g., interpreting a chest radiograph), the radiologist works with more possibilities than just normal and abnormal, such as definitely abnormal, probably abnormal, equivocal, probably normal, and definitely normal. Depending on the consequences of a positive test, radiologists move their decision threshold (consciously or unconsciously). Consider a chest radiograph in a 20-year-old boy with a "vague, nodular opacity projecting over the left apex." Now consider two possible clinical histories:

History of osteosarcoma with previously resected pulmonary metastasis; recent hemoptysis. In this case, the prior probability of a new metastasis is high, and the consequences of missing a lesion may be significant. Therefore, the rational radiologist adjusts the decision threshold to emphasize sensitivity (i.e., overreads the film). Interpretation of the chest film: "nodular pulmonary opacity; must exclude metastatic lesion; suggest CT with possible biopsy."

Inguinal hernia, routine preoperative chest radiograph. In this case, the prior probability of serious chest disease is low, as is the consequence of a false-negative interpretation. Therefore, the decision threshold should be adjusted to emphasize specificity (i.e., the film should be underread). Interpretation of the chest film: "vague nodular opacity of unknown chronicity, may represent confluence of normal structures; suggest comparison with prior films and attention on follow-up films."

For tests that have consequences when either positive or negative, the choice probably should be a threshold in the vicinity of equivocal test findings (Fig. 2, points B or C).

Fig. 2. Decision threshold can be placed anywhere along outcome axis. At position A, only definitely abnormal test results (A!) are classified as abnormal; at position B, both probably (A?) and definitely abnormal results are read as abnormal; at position C, also equivocal results (?) are called abnormal, whereas in position D, only definitely normal outcomes are read as normal. Lower part of diagram shows how sensitivity and specificity change as decision threshold is moved. Each position of threshold gives a unique pair of sensitivity and specificity values.

The Sensitivity-Specificity Curve

As seen from Figure 2, a unique combination of sensitivity and specificity values is obtained for each position of the decision threshold. Each sensitivity value can be plotted against its corresponding specificity value to create the diagram in Figure 3. The points with specificity 1.0 and sensitivity 0.0 (left lower corner) and with specificity 0.0 and sensitivity 1.0 (right upper corner) in this diagram represent positions of the decision threshold at the extreme right and the extreme left on the outcome axis in Figure 2, respectively.

[Figure 3 axes: Sensitivity (vertical); Specificity (horizontal, running from 1.0 to 0.0)]
Fig. 3. If, for each of four positions (A, B, C, and D) of decision threshold in Figure 2, each sensitivity value is plotted against its corresponding specificity value and points are connected, a sensitivity-specificity curve (receiver-operating-characteristic [ROC] curve) is obtained. Direction of x-axis is chosen to correspond to usual presentation of ROC curves. Size of shadowed area, covered by curve, is a measure of efficacy of test.

The Receiver-Operating-Characteristic Curve

In Figure 3, the horizontal (specificity) axis points to the left, so the traditional ROC axes appear altered. Normally, ROC curves are presented with the horizontal axis representing the false-positive fraction, that is, (1 - specificity), whereas the vertical axis denotes the true-positive fraction (i.e., sensitivity). The sensitivity-specificity curve presented in this way is thus, except for the names of the axes, identical to the ROC curve. As the phrase ROC curve is, to most readers, an acronym without meaning, it perhaps should be discarded. A term such as sensitivity-specificity curve (S-S curve) better describes the curve.
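How sensitivity and specificity trade off as the decision threshold moves, and how specificity maps onto the false-positive fraction of the traditional ROC axes, can be illustrated with a minimal sketch. The scores, the 0-10 abnormality scale, and the function name `sensitivity_specificity` are hypothetical and chosen only for illustration; they do not come from the article.

```python
def sensitivity_specificity(diseased, normal, threshold):
    """Return (sensitivity, specificity) for one decision threshold.

    Assumption: a score at or above `threshold` is read as abnormal.
    """
    tp = sum(score >= threshold for score in diseased)  # true positives
    fn = len(diseased) - tp                             # false negatives
    fp = sum(score >= threshold for score in normal)    # false positives
    tn = len(normal) - fp                               # true negatives
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical test outcomes on an arbitrary 0-10 abnormality scale.
diseased = [3, 5, 6, 7, 8, 9]
normal = [1, 2, 3, 4, 5, 6]

print("threshold sensitivity specificity false-positive fraction")
for threshold in (0, 4, 6, 11):
    sens, spec = sensitivity_specificity(diseased, normal, threshold)
    # The traditional ROC x-axis is the false-positive fraction, 1 - specificity.
    print(f"{threshold:>9} {sens:>11.2f} {spec:>11.2f} {1 - spec:>23.2f}")
```

A threshold below every score calls all tests abnormal (sensitivity 1, specificity 0), and a threshold above every score calls all tests normal (sensitivity 0, specificity 1), which are the two corners of the curve.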
Construction of a Receiver-Operating-Characteristic Curve
In order to construct an ROC curve, for example, for comparing CT and MR studies to detect liver metastases, patients are examined with both techniques, and the true diagnosis (metastasis or no metastasis) is established by a separate gold standard. The observer is asked to interpret each study as either definitely abnormal, probably abnormal, equivocal, probably normal, or definitely normal. Sensitivity and specificity are then calculated for each of four alternatives (Fig. 2): (1) only those read as definitely abnormal are regarded as positive tests; (2) those read as probably abnormal or definitely abnormal are classified as positive; (3) those read as equivocal, probably abnormal, or definitely abnormal are regarded as positive; and finally (4) all but the definitely normal tests are considered positive. With these four sensitivity-specificity value pairs and the two corners (sensitivity = 1 for specificity = 0, and specificity = 1 for sensitivity = 0), an ROC curve is plotted.
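The construction described above can be sketched in a few lines. The reading counts below are invented for illustration (50 patients with and 50 without metastasis), and the function name `operating_points` is hypothetical; only the five rating categories and the four cumulative cutoffs follow the text.

```python
# Rating categories, ordered from most to least abnormal.
CATEGORIES = [
    "definitely abnormal",
    "probably abnormal",
    "equivocal",
    "probably normal",
    "definitely normal",
]


def operating_points(diseased_counts, normal_counts):
    """Return (specificity, sensitivity) pairs: two corners plus four cutoffs.

    Each list holds the number of studies read in each category, in the
    order of CATEGORIES, for patients with and without disease.
    """
    n_diseased = sum(diseased_counts)
    n_normal = sum(normal_counts)
    points = [(1.0, 0.0)]  # corner: every study called normal
    tp = fp = 0
    # Cutoffs 1-4: successively include one more category as "positive".
    for d, n in zip(diseased_counts[:-1], normal_counts[:-1]):
        tp += d
        fp += n
        specificity = (n_normal - fp) / n_normal
        sensitivity = tp / n_diseased
        points.append((specificity, sensitivity))
    points.append((0.0, 1.0))  # corner: every study called abnormal
    return points


# Hypothetical readings, not data from the article.
diseased_counts = [20, 10, 10, 5, 5]
normal_counts = [2, 3, 10, 15, 20]

for specificity, sensitivity in operating_points(diseased_counts, normal_counts):
    print(f"specificity={specificity:.2f}  sensitivity={sensitivity:.2f}")
```

Repeating the calculation with the readings from a second technique (e.g., MR instead of CT) would give a second set of six points, and the two curves could then be compared on the same axes.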
Fig. 4. A-C, Three different tests. An ideal test (A), which clearly separates those with disease from those without disease, gives a sensitivity-specificity curve passing close to upper left-hand corner of diagram. Test without any discrimination (C) gives a curve running straight from lower left-hand to upper right-hand corner. Usually, curve runs somewhere in between (B).

Interpreting the Receiver-Operating-Characteristic Curve
The curve in Figure 3 gives the sensitivity-specificity combination for every possible placement of the decision threshold. An ideal test, cleanly separating normal subjects from diseased ones (Fig. 4, A), gives a sensitivity-specificity (i.e., ROC) curve at the left upper corner of the diagram. A test/tester combination without any discrimination between normal and abnormal (Fig. 4, C) (e.g., reading mammograms blindfolded) would give a curve running straight from the left lower to the right upper corner. In most situations, the curve would run somewhere in between (Fig. 4, B); the better the test/tester combination is, the closer to the left upper corner the curve would run, or, more correctly, the larger the area the curve would cover (shadowed in Fig. 3). The ROC curve thus can be used to compare the efficacy of different tests (or different equipment) used by the same tester, as well as the skill of different testers that use the same test material. Construction of an ROC curve requires extra effort from the researcher, as compared with simply calculating sensitivity and specificity values. However, for a serious comparison of different diagnostic tests, no approach is available to replace the ROC curve. For readers who wish to pursue a more in-depth analysis of ROC curves, several excellent articles are available [5, 6].
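The area covered by the curve can be estimated from a handful of operating points. The sketch below joins the points with straight lines (the trapezoidal rule), which is an assumption made here for simplicity and not a method prescribed in the text. The function name `area_under_curve` and the "typical" points are hypothetical.

```python
def area_under_curve(points):
    """Trapezoidal area under (false_positive_fraction, sensitivity) points."""
    pts = sorted(points)  # order by false-positive fraction, then sensitivity
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# False-positive fraction is 1 - specificity; all points are illustrative.
ideal = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]   # passes through upper left corner
chance = [(0.0, 0.0), (1.0, 1.0)]              # no discrimination (diagonal)
typical = [(0.0, 0.0), (0.04, 0.4), (0.1, 0.6),
           (0.3, 0.8), (0.6, 0.9), (1.0, 1.0)]  # hypothetical test

for name, pts in (("ideal", ideal), ("chance", chance), ("typical", typical)):
    print(f"{name:>8}: area = {area_under_curve(pts):.3f}")
```

Under this sketch the ideal test covers an area of 1.0 and the test without discrimination covers 0.5; a real test/tester combination falls in between, and the larger area indicates the better combination.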
REFERENCES
1. van Meter D, Middleton D. Modern statistical approaches to reception in communication theory. IRE Trans 1954;PIGT-4:119-141
2. Peterson WW, Birdsall TG, Fox WC. The theory of signal detectability. IRE Trans 1954;PIGT-4:171-212
3. Metz CE. ROC methodology in radiological imaging. Invest Radiol 1986;21:720-733
4. Brismar J, Jacobsson B. Definition of terms used to judge the efficacy of diagnostic tests: a graphic approach. AJR 1990;155:621-623
5. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982;143:29-36
6. Hanley JA. Receiver operating characteristic (ROC) methodology: the state of the art. Crit Rev Diagn Imaging 1989;29:307-334