Receiver Operating Characteristic Curves: A Basic Understanding
David J. Vining, MD
Gregory W. Gladish, BS
Receiver operating characteristic (ROC) is one form of an objective measurement that can be used to compare newer imaging technologies against human observer performance (the ability of the expert radiologist). Use of ROC curves allows one to account for a continuum of radiologic interpretations when calculating sensitivity and specificity for an imaging modality and avoids the inaccuracies that arise from assuming that imaging findings are absolutely normal or abnormal. An ROC curve is generated by plotting sensitivity on the y axis as a function of [1 - specificity] on the x axis for a continuum of diagnostic criteria. ROC curves allow visual analysis of the trade-offs between the sensitivity and the specificity of a test with regard to the variable diagnostic criteria used by radiologists. Because ROC curve analysis is gaining wide acceptance in medical literature, an explanation of ROC methods with the use of simple examples is necessary to increase the knowledge and understanding of practicing radiologists.
INTRODUCTION
As imaging technologies rapidly emerge, radiologists need an objective means to compare the efficacy of these newer techniques with more established methods. Receiver operating characteristic (ROC) is one form of an objective measurement in which the newer techniques are compared with human observer performance (the ability of the expert radiologist). Since ROC curves were first applied to radiologic modalities in 1960, their application has continually expanded (1-4).
For radiologists, ROC curves provide information for an objective comparison of imaging techniques to determine, for example, whether computed tomography (CT) or magnetic resonance (MR) imaging is better for the detection of liver metastases. For hospital administrators, ROC curves allow, for example, a comparison to be made of the performance of imaging systems before making major investments. For radiology residents, questions regarding ROC curves are included in the written examinations for board certification (5). Examples of ROC curve utilization are certain to be found in the current literature and at scientific assembly presentations. When radiologists are confronted with two or more ROC curves representing different diagnostic tests or imaging modalities (Fig 1), the better modality must be determined. This article presents the anatomy, construction, use, and interpretation of ROC curves with the idea of helping radiologists understand their meaning and their limitations.

Abbreviations: FNF = false-negative fraction, FPF = false-positive fraction, ROC = receiver operating characteristic, TNF = true-negative fraction, TPF = true-positive fraction
Index terms: Images, interpretation • Receiver operating characteristic curve (ROC)
RadioGraphics 1992; 12:1147-1154
From the Department of Radiology, Louisiana State University Medical Center, Shreveport, La. Recipient of a Certificate of Merit for a scientific exhibit at the 1991 RSNA scientific assembly. Received March 17, 1992; revision requested April 24 and received June 9; accepted June 30. Supported in part by a Louisiana Education Quality Support Fund grant (1990-92)-RD-A-16 from the State of Louisiana Board of Regents. Address reprint requests to D.J.V., Department of Radiology, The Johns Hopkins Hospital, 600 N Wolfe St, Baltimore, MD 21205.
© RSNA, 1992
USE OF ROC CURVES
In everyday medical practice, mistakes occur when clinicians attempt to determine the presence or absence of a disease on the basis of the findings from a single imaging study. Similar inaccuracies occur in radiology practice when sensitivity and specificity are used to describe the findings from a radiologic examination with an assumption that there are only two possible interpretations: normal or abnormal. Sensitivity refers to the percentage of patients with disease who have abnormal findings at examination, whereas specificity describes the percentage of patients without disease who have normal findings. Quoting single sensitivity and specificity values for a diagnostic test (or imaging modality) to determine its overall usefulness can be ambiguous by failing to account for the wide variability of interpretations used by radiologists.

Figure 1. Typical ROC curves (true-positive fraction plotted against false-positive fraction). Which curve represents the better diagnostic modality?

For example, the thallium stress test is used to distinguish between healthy people and those with heart disease. To illustrate the problem of applying a single criterion, thallium stress test results will be considered abnormal if a ventricular wall defect is greater than 50% and normal if the defect is less than 50% (Fig 2). If the results of testing each member in the diseased and healthy groups are compared with the actual occurrence of heart disease, the thallium stress test might have a sensitivity of 84% and a specificity of 94%. These terms signify that 84% of people with disease will have positive findings and 94% of people without disease will have negative findings (Fig 3). As these results show, the thallium stress test is not a perfect test to use for determining if a patient does or does not have heart disease.

Figure 2. A single criterion is used to judge the thallium stress test. A ventricular wall defect greater than 50% is considered abnormal, whereas one less than 50% is normal. (Panel title: Thallium Stress Test, Short Axis View of the Left Ventricle. Labels: Normal, 0% wall defect; Borderline, 50% wall defect; Definitely Abnormal, 100% wall defect.)

For the particular diagnostic criterion used by the radiologist, the healthy population is divided into a true-negative fraction (TNF) and a false-positive fraction (FPF), whereas the population with disease is divided into a true-positive fraction (TPF) and a false-negative fraction (FNF) (Fig 4). If a radiologist “underreads” findings and considers them abnormal only if they are definitely abnormal, the examination will have low sensitivity but high specificity. Those who
“overread” findings use less strict criteria and consider more findings abnormal, thus resulting in high sensitivity but low specificity. In reality, radiologists report findings with a continuum of responses ranging from definitely abnormal to equivocal to definitely normal due to the subjectivity and the bias of the individual radiologist. Often this is referred to as the radiologist's “hedge.” As the diagnostic criteria (biases) of the radiologist vary, sensitivity and specificity of the test or imaging modality for a particular diagnostic criterion also vary. The sensitivity and specificity of an imaging modality in relation to this continuum of interpretations are difficult to describe. ROC curves provide help by allowing visual analysis of the trade-offs between the sensitivities and specificities of a variety of diagnostic criteria used by radiologists.

Figure 3. With one criterion to determine the outcome of a thallium stress test, findings will have a sensitivity of 84% and a specificity of 94%. (Normal group: TNF = 94%, FPF = 6%. Heart disease group: TPF = 84%, FNF = 16%.)

Figure 4. Chart shows the divisions of the population according to diagnostic criteria used by radiologists. FN = number of examinations with false-negative findings, FP = false-positive findings, TN = true-negative findings, TP = true-positive findings.

\[
\begin{aligned}
\text{True-positive fraction} &= \text{sensitivity} = \frac{TP}{TP + FN} \\
\text{True-negative fraction} &= \text{specificity} = \frac{TN}{TN + FP} \\
\text{False-positive fraction} &= 1 - \text{specificity} = \frac{FP}{TN + FP} \\
\text{False-negative fraction} &= 1 - \text{sensitivity} = \frac{FN}{TP + FN}
\end{aligned}
\]

ANATOMY OF AN ROC CURVE
The general form of the ROC curve is shown in Figure 5. In a mathematic sense, an ROC curve is a curvilinear graph that plots the TPF on the y axis as a function of the FPF on the x axis for a range of diagnostic criteria. The TPF equals the percentage of patients with disease who have positive findings, whereas the FPF equals the percentage of patients without disease who have positive findings. A TPF and an FPF are calculated for each diagnostic criterion (range, definitely abnormal to definitely normal) and are plotted to form an ROC curve. A move toward the right on an ROC curve is equivalent to the use of less stringent diagnostic criteria (overreading) for determining the outcome of an examination, thereby increasing sensitivity but decreasing specificity. A move toward the left on an ROC curve, or the use of more strict criteria (underreading), increases specificity but decreases sensitivity of the diagnostic test. The upper right corner of an ROC curve (sensitivity, 100%; specificity, 0%) is formed when all findings are read as
abnormal. The lower left corner of an ROC curve (sensitivity, 0%; specificity, 100%) is formed when all findings are read as normal. By realizing that the FPF equates to [1 - specificity] and by reversing the direction of increasing values on the x axis (1.0 at the origin and 0.0 at the right corner), the x axis becomes specificity. The y axis label, TPF, is the same as sensitivity. With new labels, these are now called “sensitivity-specificity curves,” as proposed by Brismar (6). Except for the labels, the ROC curves are unchanged (Fig 6).

Figure 5. Anatomy of an ROC curve (true-positive fraction plotted against false-positive fraction). A move toward the right on an ROC curve is equivalent to overreading the findings, thereby increasing sensitivity but decreasing specificity. A move toward the left (underreading) increases specificity but decreases sensitivity of the diagnostic test.

Figure 6. Plots show classic ROC (a) and “new age” ROC (b) curves. The x axis of a conventional ROC curve can be reoriented (1.0 at the origin and 0.0 at the right corner) to create a sensitivity-specificity curve. Except for labels, the ROC curves are unchanged.
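The relationships among the four fractions, and the relabeling of the x axis described above, can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original article: the `Counts` container, the `fractions` helper, and the group sizes of 100 patients each are assumptions chosen so that the results reproduce the single-criterion values of Figure 3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Reading outcomes for one diagnostic criterion (hypothetical container)."""
    tp: int  # patients with disease read as abnormal
    fn: int  # patients with disease read as normal
    tn: int  # patients without disease read as normal
    fp: int  # patients without disease read as abnormal


def fractions(c: Counts) -> dict:
    """Return the four fractions defined in Figure 4."""
    diseased = c.tp + c.fn
    healthy = c.tn + c.fp
    return {
        "TPF": c.tp / diseased,  # sensitivity
        "FNF": c.fn / diseased,  # 1 - sensitivity
        "TNF": c.tn / healthy,   # specificity
        "FPF": c.fp / healthy,   # 1 - specificity
    }


# Assumed group sizes of 100 each, chosen to reproduce Figure 3.
single_criterion = Counts(tp=84, fn=16, tn=94, fp=6)
f = fractions(single_criterion)

# Conventional ROC point: (FPF, TPF).
roc_point = (f["FPF"], f["TPF"])
# Sensitivity-specificity point: same y value, x axis relabeled as specificity.
sens_spec_point = (1 - f["FPF"], f["TPF"])

print(roc_point)
print(sens_spec_point)
```

Only the x coordinate changes between the two points, which is why the curve itself is unchanged when the axis is relabeled.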
CONSTRUCTION OF AN ROC CURVE
To conduct an ROC experiment, one starts with two populations of people (determined by some “gold” standard): one group with known disease and the other without disease. In the case of imaging modalities, there will be two sets of images for each modality: one set of images with a known radiologic finding and the other without the finding. In the example of a thallium stress test, a radiologist might apply the following continuum of diagnostic criteria to determine the outcome of a patient's examination: (a) definitely abnormal findings show a greater than 50% wall defect, (b) probably abnormal findings show a 30%-50% wall defect, (c) equivocal findings reflect a 10%-30% wall defect, (d) probably normal findings show a 1%-10% wall defect, and (e) definitely normal findings demonstrate a 0% wall defect. The probably abnormal criterion incorporates all findings that are probably abnormal plus all those that are definitely abnormal to classify the results of an examination as abnormal. Similarly, the equivocal and probably normal criteria incorporate their own collection of abnormal findings plus those in the more abnormal categories.

In an ROC experiment, a set of images from patients with known diagnoses is read by experts (radiologists) as repeated series. In the first series, only a definitely abnormal criterion is applied to the set of images from which a TPF and an FPF are extracted. For the second series, both definitely abnormal and probably abnormal criteria are applied and TPF and FPF are calculated. This process is repeated for each of the subsequent levels. Finally, the collection of TPF and FPF values is plotted to form an ROC curve. To illustrate, assume the following values for the thallium stress test: (a) definitely abnormal findings have a TPF of 0.57 and an FPF of 0.02 (Fig 7), (b) probably abnormal findings have a TPF of 0.84 and an FPF of 0.06 (Fig 8), (c) equivocal findings have a TPF of 0.90 and an FPF of 0.27 (Fig 9), and (d) probably normal findings have a TPF of 0.95 and an FPF of 0.54 (Fig 10). Values for TPF and FPF are not calculated for the definitely normal criterion (0% wall defect), since the radiologist blindly considers all examination findings abnormal at this level. This point is located at the upper right corner on the ROC curve (TPF, 1.0; FPF, 1.0). Plotting these values produces the ROC curve shown in Figure 11.

Figure 7. Definitely abnormal criterion. Findings are considered abnormal with a greater than 50% wall defect. (Normal group: FPF = 2%. Heart disease group: TPF = 57%.)

Figure 8. Probably abnormal criterion. Findings are considered abnormal if they are either definitely abnormal or probably abnormal (a greater than 30% wall defect). (Normal group: FPF = 6%. Heart disease group: TPF = 84%.)

Figure 9. Equivocal criterion. Findings are considered abnormal if they are definitely abnormal, probably abnormal, or equivocal (a greater than 10% wall defect). (Normal group: FPF = 27%. Heart disease group: TPF = 90%.)

Figure 10. Probably normal criterion. Findings are considered abnormal if they are definitely abnormal, probably abnormal, equivocal, or even probably normal (a greater than 1% wall defect). (Normal group: FPF = 54%. Heart disease group: TPF = 95%.)
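The repeated-series procedure amounts to accumulating counts across rating categories, from the strictest criterion to the loosest. The sketch below illustrates this under stated assumptions: the per-category counts are hypothetical (100 patients per group), chosen only so that the cumulative fractions match the TPF and FPF values quoted in the text, and `roc_points` is an illustrative helper rather than a method taken from the article.

```python
# Rating categories ordered from most to least abnormal.
CATEGORIES = [
    "definitely abnormal",
    "probably abnormal",
    "equivocal",
    "probably normal",
    "definitely normal",
]

# Hypothetical counts per category for 100 patients in each group,
# chosen so the cumulative fractions match the values quoted in the text.
diseased_counts = [57, 27, 6, 5, 5]
healthy_counts = [2, 4, 21, 27, 46]


def roc_points(diseased, healthy):
    """Return (FPF, TPF) pairs, one per progressively looser criterion."""
    n_diseased = sum(diseased)
    n_healthy = sum(healthy)
    points = []
    tp = fp = 0
    for d, h in zip(diseased, healthy):
        # Each looser criterion adds one more category to "abnormal".
        tp += d
        fp += h
        points.append((fp / n_healthy, tp / n_diseased))
    return points


points = roc_points(diseased_counts, healthy_counts)
for name, (fpf, tpf) in zip(CATEGORIES, points):
    print(f"{name:20s} FPF={fpf:.2f} TPF={tpf:.2f}")
```

The last pair is always (1.0, 1.0), the upper right corner, because under the loosest criterion every examination is read as abnormal.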
INTERPRETATION OF ROC CURVES
To compare two or more imaging modalities, an ROC curve is produced for each modality (Fig 12). If a radiologist resorts to a system of flipping a coin to make diagnoses (random guessing), 50:50 distributions of true positive/false negative and true negative/false positive are found in the groups with and without disease, respectively (Fig 13). An ROC curve for random guessing consists of a straight diagonal line (curve C). If a perfect diagnostic test (or imaging modality) exists that completely separates two groups of people (with and without disease) without any false-positive or false-negative findings (Fig 14), an ROC curve resembling curve A is created. Sensitivity and specificity equal 100% for all diagnostic criteria. In reality, an ROC curve for a diagnostic test or imaging modality more likely resembles curve B. As an ROC curve approaches the y axis (in the direction of curve A), the test or modality better approximates a perfect test and better enables one to distinguish between patients with disease and those without (finding is present or absent on image), with fewer false-positive and false-negative findings over a continuum of diagnostic criteria. Therefore, the answer to the question of which modality is better is system A in Figure 1.

Figure 11. Plotting TPF and FPF values yields the thallium stress test ROC curve. (Plotted points: TPF = 0.57, FPF = 0.02; TPF = 0.84, FPF = 0.06; TPF = 0.90, FPF = 0.27; TPF = 0.95, FPF = 0.54.)

Figure 12. Curve A represents an ROC curve obtained from a “perfect” diagnostic test, whereas curve C is produced from random guessing. Curve B resembles a more typical ROC curve.
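One way to make the visual comparison concrete is to ask whether one curve has a TPF at least as high as another at every FPF. The sketch below is an illustration under assumptions, not the article's method: the helpers `tpf_at` and `lies_above` are hypothetical, straight-line interpolation between measured points is assumed, and the thallium values from the text stand in for a typical curve B.

```python
def tpf_at(curve, fpf):
    """Linearly interpolate the TPF of a curve at a given FPF.

    `curve` is a list of (FPF, TPF) points sorted by increasing FPF and
    spanning (0, 0) to (1, 1).
    """
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        if x0 <= fpf <= x1:
            if x1 == x0:
                # Vertical segment: take its upper end.
                return max(y0, y1)
            return y0 + (y1 - y0) * (fpf - x0) / (x1 - x0)
    raise ValueError("fpf outside the range covered by the curve")


def lies_above(curve_a, curve_b, steps=20, tol=1e-9):
    """True if curve_a is at least as high as curve_b at every sampled FPF."""
    grid = [i / steps for i in range(steps + 1)]
    return all(tpf_at(curve_a, x) >= tpf_at(curve_b, x) - tol for x in grid)


# Curve A: perfect test.
perfect = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
# A curve like B: thallium stress test values from the text.
thallium = [(0.0, 0.0), (0.02, 0.57), (0.06, 0.84),
            (0.27, 0.90), (0.54, 0.95), (1.0, 1.0)]
# Curve C: random guessing.
guessing = [(0.0, 0.0), (1.0, 1.0)]

print(lies_above(perfect, thallium))
print(lies_above(thallium, guessing))
print(lies_above(guessing, thallium))
```

When two real curves cross, neither lies above the other over the whole range, and this simple check cannot rank them.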
SUMMARY
Important points regarding ROC curves include the following: (a) An ROC curve is generated by plotting sensitivity on the y axis as a function of [1 - specificity] on the x axis for a continuum of diagnostic criteria. If the name is confusing, the x axis can be reoriented and an ROC curve thought of as a sensitivity-specificity curve. (b) A move toward the right on an ROC curve is equivalent to using less stringent diagnostic criteria (overreading), thereby increasing sensitivity but decreasing specificity. A move toward the left (underreading) increases specificity but decreases sensitivity of the diagnostic test. (c) The ROC curve closest to the y axis represents the best diagnostic test because it approaches a perfect test, with fewer false-positive and false-negative findings.

Figure 13. Random guessing produces 50:50 TPFs/FNFs and TNFs/FPFs in the two populations. (Normal group: FPF = 50%. Heart disease group: TPF = 50%.)

Figure 14. A perfect test completely separates the two groups without any false-positive or false-negative findings. (Normal group: FPF = 0%. Heart disease group: TPF = 100%.)
REFERENCES
1. Goodenough DJ, Rossman K, Lusted LB. Radiographic applications of receiver operating characteristic (ROC) curves. Radiology 1974; 110:89-95.
2. Lusted LB. Logical analysis in roentgen diagnosis. Radiology 1960; 74:178-193.
3. Metz CE. ROC methodology in radiologic imaging. Invest Radiol 1986; 21:720-733.
4. Turner DA. An intuitive approach to receiver operating characteristic curve analysis. J Nucl Med 1978; 19:213-220.
5. McNeil BJ. Case 15: decision analysis I. In: Siegel BA, ed. Nuclear radiology IV test and syllabus. Reston, Va: American College of Radiology, 1990; 299-307.
6. Brismar J. Understanding receiver-operating-characteristic curves: a graphic approach. AJR 1991; 157:1119-1121.