Glendon G. Cox, MD • Larry T. Cook, PhD • John H. McMillan, MD • Stanton J. Rosenthal, MD • Samuel J Dwyer III, PhD
Chest Radiography: Comparison of High-Resolution Digital Displays with Conventional and Digital Film¹
This study was performed to compare the performance of observers using three display formats for chest radiography. The display formats were conventional radiographs, digitized radiographs (2,048 X 2,048 X 12 bits) printed on laser film, and digitized radiographs (2,048 X 2,048 X 12 bits) displayed on a high-resolution (2,560 X 2,048 X 12-bit) gray-scale display. The test set for the study consisted of 163 cases. Sixty-four of the cases were normal, whereas the 99 remaining cases demonstrated one or more common radiographic abnormalities. Nine abnormalities were selected for analysis: costophrenic angle blunting, interstitial disease, atelectasis, pneumothorax, parenchymal mass, consolidation, obstructive disease, hilar/mediastinal mass, and apical scarring. Six experienced general radiologists participated in the evaluation. Receiver operating characteristic curves were generated for each abnormality and display format. The results indicate that, while the three display formats are equivalent for the detection of some abnormalities, detectable differences in observer performance may be seen even at 2,048 X 2,048 X 12 bits for the detection of obstructive disease, pneumothorax, interstitial disease, and parenchymal masses.
Index terms: Diagnostic radiology, observer performance • Radiography, comparative studies • Radiography, computer-assisted • Radiography, digital • Thorax, radiography, 60.11, 60.1215
Radiology 1990; 176:771-776
Abbreviations: ROC = receiver operating characteristic, SD = standard deviation, TPF.185 = true-positive fraction assuming a false-positive fraction of 0.185.
¹ From the Department of Diagnostic Radiology, University of Kansas Medical Center, 39th and Rainbow Blvd, Kansas City, KS 66103. From the 1989 RSNA scientific assembly. Received December 1, 1989; revision requested January 24, 1990; final revision received April 16; accepted April 19. Address reprint requests to G.G.C.
© RSNA, 1990
In conventional radiography, the radiographic film serves as the means for image acquisition, archival, and display. Although film continues to be the medium of choice for chest radiography, the successful applications of interactive display and image processing techniques in other areas of medical imaging have stimulated an interest in the design and development of digital acquisition and display systems that could ultimately replace traditional screen-film techniques. In theory, the separation of image acquisition functions from the functions related to image display and interpretation offers improved control of either process. Acquisition parameters can be optimized independently of display parameters, and it becomes possible to correct for technical errors in image acquisition by adjusting various display parameters. Furthermore, the interactive display of images may improve diagnostic performance by effectively matching the characteristics of the display to those of the observer’s visual system and by allowing certain features in the image to be enhanced or suppressed.
A primary concern in the design and construction of an interactive display system for chest radiography is the assumption that the performance of observers using the system must be equivalent to the performance of observers using conventional techniques. A number of studies have been performed in an effort to determine the spatial and contrast resolution necessary for adequate reproduction of chest radiographs. Most of these investigations have been concerned with determining the minimum number of pixel elements required for displaying digitized chest images (1-5). These studies have shown that the smaller the pixel size, or the larger the number of pixels in the image, the better the observer performance. Chest images displayed in a 512 X 512 format provide more diagnostic information than those displayed in a 256 X 256 format, but both of these formats are much less satisfactory in terms of observer performance than are conventional film images. Observer performance continues to improve when the matrix size is increased from 512 X 512 to 1,024 X 1,024 pixels. While observer performance when using images displayed at 1,024 X 1,024 pixels approximates that when using plain film for many detection tasks, there is a measurable deterioration of observer performance for the detection of subtle chest lesions.
Only recently have display systems become available that are capable of displaying images of greater than 1,024 X 1,024 X 12 bits. Primarily as a result of improvements in the design of solid-state memory units, it is now possible to construct high-resolution frame buffers that can support images on a 2,560 X 2,048 X 12-bit matrix at refresh rates of 72 Hz. These frame buffers are the basis for the present generation of 2,048 X 2,048 displays. Because of the high data rates that must be maintained between the frame buffer and the display monitor, the high-resolution video monitors in these systems are generally driven by 8-bit digital-to-analog converters. The use of high-speed look-up tables and interactive window/level functions allows the entire contrast range of the image to be viewed despite the limitations imposed by the 8-bit digital-to-analog converters.
There are few data documenting the performance of observers using 2,560 X 2,048 interactive displays relative to conventional film. One recent study did indicate that observer performance for the detection of septal lines and pulmonary nodules with a 2,048 X 2,048 interactive display was comparable to that with conventional film and suggested that a 2,048-line
completed
a study
comparing
pixel ages
using
laser-printed displayed
x 2,048 graphs
images interactively
This article our study.
presents
the
MATERIALS
radioen-
of the
chest.
results
of
AND
during
the
first
3 weeks
reviewed
all
tween
1 and
5 was
dence
value
of 1 indicating
dition
was
“definitely
condition fidence
was values
“definitely of 2 or
type
present”
A confidence the presence
agnostic mality
criteria for were strictly
any overlap in vations. Where
for the were
each type of abnordefined to eliminate
the classification necessary, specific
classification For
example,
was defined opacification
volume
loss,
terstitial
with
component. category of increased
chymal mation.
destruction, Because
The included lung
of interstitial airway disease,
structive
pattern to the
gory
rather
category.
cluded or
previously
in the
described
making
their
tions
of the
were
then
decision
two
Concordance
defined
either
The
reviewing
evaluated
nator.
by the study
as exact
assigned
“obstructive
loss
areas
of linear
focal The
was
defined
Because granubomas such lesions
unless
they
772
were
Up to four
Radiology
#{149}
in one
level at either For example,
extreme if the first
a particular
rating
of 4 by
the
first
nodule” of the in were
lobes category
larger
nodules
prevalence
of
were
category
this
tween
joint
in
1 or 2 was
4 or 5 by the
which
all
di-
were
could not reference
were
subjected
excluded
from
in
Dunby a
able-
set.
nor the the receiver
operating characteristic (ROC) studies. The resulting test set consisted of 163 chest typical
radiographs of the
tered
in
our
that patient
and
were regarded as population encoun-
institution.
none
Of
of the would
these,
abnormalities have
been
99 remaining one or more
presented
64 dem-
being
in
Ta-
This
was
of abnormal were multiple of cases.
The
by test cases
greater
than
radiographs abnormalities
be-
abnormalities
ranged
in difficulty
to very subtle. radiographs
and
examinations
were included in the test set. The posteroanterior radiographs were obtained in an automated chest room by using Lanex medium screens, OC film (Eastman Kodak, Rochester, NY), and a 12:1 grid. These examinations were phototimed at 140 kVp with an average of 6 mAs. The portable examinations were obtained without a grid by using Lanex Regular screens with OC film (Eastman Kodak).
tons were
and
80 kVp,
was
per per
view
be-
the test
202.
num-
of
of abnormalities
anteroposterior
ples ples to
be satisfactorily to the diagnostic
are
Technical
an average
with
inch, line
of the
in the test set were X 4,096 X 12-bit digi-
a laser
film
digitizer
base
The
pixel
in
were
The
the
a one-on-one
To
record
particular
a 2,048
laser
interpolation
to map
were
format
X 2,048
printer
uses the
by
recorded image
images
43-cm Ektascan laser-sensitive man Kodak) by using a laser er.
with
X 12 bits 2,048 X then
to create
study.
size
0.08 mm. The image was
X 2,048 resulting
images tape
for the
recorded
a total of 4,096 samthe 35-cm field of
to 2,048 The
X 12-bit magnetic
(Ma-
NY). The was 0.08 mm, was 312 sam-
was therefore of each digitized
reduced averaging.
2,048 on
giving across
digitizer.
this digitizer matrix size then pixel
facexpo-
3 mAs.
tnix Instruments, Orangeburg, baser spot size for this unit and the sampling frequency
of the three parreviewer dis-
study coordinator participated
was
tab matrix
of the for any
resolved
of at beast two Cases in which
set
The
category
number
The radiographs digitized to a 4,096
discon-
discrepancies
test
total
set
sure
obser-
considered
the
the observers
consensus ticipants.
ed as normal. The tions demonstrated
of abnor-
of
and
were
session
than
categories
5 by
to be in were considered
of abnormality
onstrated tabulated
in
and
assignment
our patient popunot categorized 5 mm
a confi-
reviewer
in the
1 . The
portable
of the rating reviewer as-
Cases in which the observations two reviewers were discordant
Neither the two reviewers
scar-
cavitary
two
confi-
abnormality
reviewer
criteria
cate-
or more
the
ble
investigated.
of each
from quite obvious Both posteroanterior
vations of the reviewers were concordant were included in the test set without further review. To eliminate equivocal cases from the test set, all assignments of 3 in the face of any other assignment by the
crepancies viated with
as-
parenchymal
to include
calcified bation,
ameter.
disease”
“parenchymal
masses.
bulla forof a
were
by
disease
reflected
of the
of one
being
of examples
in a number
was
dence rating of 1 and the second reviewer assigned a rating of 2, the observations were considered concordant. Similarly, a
ing
fibrosis in obcases of an ob-
fibrosis
coordi-
agreement
normabities
the number cause there
radiologists
or differences
signed
in
observa-
of observations
ratings
dence scale.
criteria
classifications.
a joint review by the study coordinator and the two reviewing radiologists.
of eviparen-
than the “interstitial disease” The “atelectasis” category in-
volume
ring. or
with
alof
“obstructive cases volume,
re-
dant.
an in-
or bleb and of the frequency
component structive signed
or without
“probably
bers
second
“consoli-
as a process with without evidence
and
disease” dence
of obserrules
of abnormalities
formulated.
dation” veolar
di-
Cona
abnormality was “equivocal” or indeterminate. When possible, the two reviewers were asked to describe the anatomic location of the observed abnormalities. The reviewers were required to refer to the
Cases
the
that
value of 3 mdiof a given type of
reviewer.
(Table
study,
a
the
present,”
other
abnormalities
and
that
was
reviewer
of the
present”
of process
if an
of nine
a confi-
the con-
present.” 4 indicated
by one
purposes
with
or “probably
spectiveby. cated that
on a category. value be-
of 5 indicating
made
the
nadiographs
that
discordant
or more
250
not
value
of
These
assigned,
nations 1). For
20 years
and recorded their observations worksheet listing each disease For each category a confidence
were considered Observations
demonstrating
then preradiologists,
than
the second agreement.
or examinations
radio-
radiography.
certified radiologist serving as the study coordinator. The cases selected represented either radiographicably normal examione
f
of February
From these folders, selected by a board-
a given
more
radiologists
reviewers
To create the test set used in this study, the radiographic records and film folders for patients undergoing chest examina1989 were reviewed. 250 radiographs were
had
in chest
confidence
METHODS
tions
of whom
on
experience
not
and imat 2,560
abnormalities
250 radiographs were to two board-certified
both
present
particular
X 2,048-
pixels to conventional for a variety of commonly
countered
ac-
observ-
2,048
were
graph. The sented
confidence
digital display system might be an ceptable alternative to the conventional methods of displaying chest radiognaphs (6). We have recently en performances
mality
on
data
also 35
X
film (Eastfilm recordimage
a cubic
2,048
this
spbine
image
to
the 4,096 array size used to generate the one-on-one printing format. When printing the images on film, window and level settings were adjusted to create two versions of each radiograph. One version was printed with the full window width of 4,095 and a level setting of 2,048. A higher-contrast version of the image was also generated by using a window width of 1,732 and a level setting of 2,288. The study coordinator then compared these two printed images with the original radiograph and selected the image that most closely approximated the appearance of the conventional radiograph for inclusion in the test set.
September 1990
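The two window and level settings quoted above can be illustrated with a small lookup-table sketch. This is only an illustration under stated assumptions: the linear ramp, the function name `window_level_lut`, and the 8-bit output range (chosen to echo the 8-bit digital-to-analog converter of the display) are not taken from the study, and the laser printer's actual mapping to film density is not described in the text.

```python
def window_level_lut(width, level, in_max=4095, out_max=255):
    """Build a lookup table from 12-bit pixel values to output gray levels.

    Assumption: a simple linear ramp between (level - width/2) and
    (level + width/2), clipped to black below and white above.
    """
    low = level - width / 2.0
    high = level + width / 2.0
    lut = []
    for value in range(in_max + 1):
        if value <= low:
            lut.append(0)
        elif value >= high:
            lut.append(out_max)
        else:
            lut.append(round((value - low) / (high - low) * out_max))
    return lut


# Settings quoted in the text: full-range and higher-contrast versions.
full_range = window_level_lut(width=4095, level=2048)
high_contrast = window_level_lut(width=1732, level=2288)

# The narrower window spends the whole output range on pixel values
# between 1,422 and 3,154, which is what raises the displayed contrast.
print(high_contrast[1422], high_contrast[2288], high_contrast[3154])
```

Under this assumed mapping, the higher-contrast setting clips everything below 1,422 to black and everything above 3,154 to white, while the full-range setting spreads the output over nearly all 4,096 input values.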
All other laser-printed images were discarded. The interactive display system used for this study is one of the earliest 2,560 X 2,048 X 12-bit systems available (Megascan Technology, Boston). The monitor is driven by an 8-bit 500-MHz digital-to-analog converter. The frame buffer of the display is a 9-Mbyte memory capable of accommodating an entire 2,560 X 2,048 X 12-bit image.
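The frame-buffer figure can be checked with simple arithmetic. The sketch below assumes that "Mbyte" means 2^20 bytes and that pixels are either packed at 12 bits or padded to 16 bits; neither storage detail is stated in the text, and the helper name `image_bytes` is hypothetical.

```python
def image_bytes(columns, rows, bits_per_pixel):
    """Storage needed for one image, in bytes (total bits divided by 8)."""
    return columns * rows * bits_per_pixel // 8


FRAME_BUFFER_BYTES = 9 * 2**20  # assumption: 1 Mbyte = 2**20 bytes

packed_12_bit = image_bytes(2560, 2048, 12)   # 12 bits stored per pixel
padded_16_bit = image_bytes(2560, 2048, 16)   # 12 bits padded to 2 bytes

print(packed_12_bit, packed_12_bit <= FRAME_BUFFER_BYTES)
print(padded_16_bit, padded_16_bit <= FRAME_BUFFER_BYTES)
```

On these assumptions a packed 12-bit image needs about 7.5 Mbyte and fits in the 9-Mbyte buffer, whereas a 16-bit padded image would need 10 Mbyte and would not.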
Two of these displays were interfaced to our HYPERchannel (Network Systems, Minneapolis) image management network with VME multibus. The display systems were placed in controlled environments where room lighting could be varied and where no light fell directly on the screen of the display monitor. The nonglare surface of the display screen further reduced the effects of ambient lighting. Local memory for each display system was provided by a Winchester 800-Mbyte magnetic disk. At the radiologist’s request, images were transferred from the tape archive to the Winchester disk via the HYPERchannel network. Because of the nature of the interface between the Winchester disk and the display system, 15 seconds were required to load each 2,048 X 2,048 X 12-bit image into the frame buffer of the display system. In the future, this time will be reduced by using a VME bus rather than the emulator. Standard software techniques were used to provide electronic zoom and pan functions.
Six board-certified radiologists participated in the review of the test set. Each radiologist read the full set of 163 cases. Each participant saw one-third of the set presented as conventional radiographs, one-third as laser-printed film images, and one-third with the interactive display system. Each participant was allowed to see each case only once. Consequently, each image in the 489-image data base was read only twice by independent observers.
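The counts in this design can be reproduced with a simple rotation scheme. The sketch below is a hypothetical assignment rule, not the procedure the authors used (the text does not say how cases were allocated to formats); it only shows that six readers, 163 cases, and three formats can be arranged so that every reader sees each case once and every one of the 489 images is read exactly twice.

```python
from collections import Counter

FORMATS = ("conventional film", "laser-printed film", "interactive display")


def assign_format(reader, case):
    """Hypothetical rotation: shift the format by reader and case index."""
    return FORMATS[(reader + case) % len(FORMATS)]


def build_schedule(n_readers=6, n_cases=163):
    """Map every (reader, case) pair to exactly one display format."""
    return {
        (reader, case): assign_format(reader, case)
        for reader in range(n_readers)
        for case in range(n_cases)
    }


schedule = build_schedule()

# How many readers saw each (case, format) image, and how many cases
# each reader saw in each format.
reads_per_image = Counter((case, fmt) for (reader, case), fmt in schedule.items())
cases_per_format = Counter((reader, fmt) for (reader, case), fmt in schedule.items())

print(len(reads_per_image), set(reads_per_image.values()))
print(sorted(set(cases_per_format.values())))
```

Because 163 is not divisible by 3, each reader in this sketch sees 54 or 55 cases per format, so "one-third" can only hold approximately.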
The participants completed reading the test set in two sessions. During the first session, the conventional films and laser-printed images were placed on a standard film alternator. The conventional films and laser-printed images were presented randomly until each radiologist completed the hard-copy reading session. The second reading session consisted of the presentation of the remaining portion of the test set with the interactive display. A minimum of 2 weeks elapsed between the reading sessions. The interactive display sessions always followed the film/digital hard-copy sessions. No reader saw the images of the same patient twice, so there was no bias due to repeated readings of the same case in different formats. The hard-copy reading session was very much like a standard-film reading session, which these observers experience every day. Consequently, any bias in the study is probably related to the unique features of reading from a high-resolution computer display with contrast manipulation available.
Volume 176 • Number 3
During the case review, no radiologist was presented with more than one version of any given case. The order of presentation of normal and abnormal cases was randomized for each reader. The readers were supplied with a notebook containing response forms in which each case was identified only by case number. The reader was asked to make independent responses for each of the nine types of abnormality on a five-point scale identical to that used in establishing diagnostic consensus. The diagnostic criteria for each category of abnormality were reviewed at the beginning of each reading session. Completion of the test set required each observer to assign 1,467 confidence values. Consequently, the data base for analysis of observer performance comprised 8,802 observer responses.
The six participants in the study were selected to reflect a range of experience with interactive digital display systems. Only board-certified radiologists participated in this review. None of the three radiologists involved in case selection was allowed to participate in the study or to be present during the reading sessions. During the review session only the reviewing radiologists and a nonradiologist observer were present in the reading room. The role of the nonradiologist observer was to ensure that the recording of responses on the response form matched the case being reviewed. Interaction between the reviewer and the observer was limited to instructions concerning the recording of data and the operation of the interactive display system. In reviewing the conventional and laser-printed films, the observers were encouraged to maintain their usual viewing habits and reading rates, modifying them only enough to allow time for completion of the response forms. No time limit was imposed for completion of any of the reading sessions.
The observer response data were analyzed using ROC techniques (7). Because ROC techniques cannot be used to analyze multiple abnormalities or to generate a composite curve for all abnormalities, ROC curves for each of the nine disease processes were generated separately (8). For the purposes of this analysis, a case was considered negative if it was either normal or if there were abnormalities other than the one being analyzed.
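The per-abnormality definition of a negative case, and the response counts quoted above, can be written out as a short sketch. The function name `truth_labels` and the data layout (a case represented as a collection of abnormality names) are assumptions made for illustration; only the nine categories, the negative-case rule, and the counts come from the text.

```python
ABNORMALITIES = (
    "costophrenic angle blunting", "interstitial disease", "atelectasis",
    "pneumothorax", "parenchymal mass", "consolidation",
    "obstructive disease", "hilar/mediastinal mass", "apical scarring",
)


def truth_labels(case_findings):
    """Return one 0/1 truth label per abnormality for a single case.

    A case is positive for an abnormality only if it shows that
    abnormality. Normal cases, and cases showing only other
    abnormalities, are negative for it.
    """
    findings = set(case_findings)
    unknown = findings - set(ABNORMALITIES)
    if unknown:
        raise ValueError(f"unrecognized findings: {sorted(unknown)}")
    return {name: int(name in findings) for name in ABNORMALITIES}


N_CASES, N_READERS = 163, 6
values_per_reader = N_CASES * len(ABNORMALITIES)   # one rating per case per abnormality
total_responses = values_per_reader * N_READERS

print(values_per_reader, total_responses)
print(truth_labels(["pneumothorax", "atelectasis"])["consolidation"])
```

Under this rule a case with pneumothorax and atelectasis counts as positive in those two analyses and as negative in the other seven, which is why one test set can feed nine separate ROC analyses.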
The program used to perform the ROC analysis, Corroc2, uses a maximum-likelihood estimation technique to calculate binormal ROC data, taking into account correlations between pairs of responses for each given case. The program also computes a comparison of various ROC curves by using the χ² test (9). With this program, the ROC analysis describes the relationship of the decision variable to the experimentally determined noise and signal-plus-noise response distributions. The resulting ROC curves are completely characterized by two statistically defined parameters. The “a” parameter is the difference between the means of the two distributions divided by the standard deviation of the signal-plus-noise distribution. The “b” parameter is the quotient of the standard deviation (SD) of the noise distribution divided by the SD of the signal-plus-noise distribution. Given correlated data sets, the Corroc2 program uses a bivariate normal model to estimate joint probability densities. A χ² statistic with 2 df can be constructed from a covariance matrix by using parameters estimated with Corroc2. This statistic can then be used to compare the significance of apparent differences between any two ROC curves (9).
The statistical significance of the differences between the ROC curves was estimated by using three indexes. On the basis of the calculated values for the areas under the curve and their SDs, the significance of their differences was evaluated with the paired two-tailed t test (10). The second index of comparison was the bivariate χ² test just described. The final index of comparison in our study was the calculated value of the true-positive fraction assuming a false-positive fraction of 0.185, designated as TPF.185. The selection of TPF.185 as an appropriate index for comparison was based on a conventional two-by-two contingency-table analysis of observer responses. For purposes of this analysis, observer responses of 1 and 2 were assigned to the “normal” test result, while responses of 3 or greater were considered “abnormal.” Of the 8,802 total responses, 7,588 represented the negative population for each of the abnormalities tested and each display format. For this negative population, there were 1,404 false-positive responses, giving a cumulative false-positive fraction for all abnormalities and all display formats of 0.185. In all cases, the hypothesis tested is that the index for one display modality is equal to that for the modality with which it is being compared. P values of .05 or less indicate that the compared indexes are different at the 95% confidence level. It should be noted that in any ROC analysis based on data from multiple observers, the SDs for the areas under the curves and for the calculated true-positive fractions reflect both interobserver and intraobserver variations.
RESULTS
The conventional and digital film reading sessions took place in the usual reading room environment. No time limit was imposed, and the time required to complete the reading of the hard copy ranged from 70 to 120 minutes. The interactive display reading session took 50-120 minutes for completion.
The areas under the curve for each display modality and disease process are shown in Table 2, along with the SD for each value. The values for the area under the curve for conventional film ranged from 0.806 for detection of consolidation to 0.982 for the detection of pneumothorax. For the digital images printed on film, the areas under the curves ranged from 0.805 to 0.981, again for the detection of consolidation and pneumothorax, respectively. The areas for the interactive display system ranged from 0.789 for detection of consolidation to 0.951 for the detection of parenchymal masses or nodules. The P values determined by applying the paired two-tailed t test to the area indexes are shown in Table 3. The P values determined by application of the bivariate χ² test to the parameters used to fit the ROC curves are presented in Table 4. Finally, the calculated TPF.185 values and the P values obtained by applying the paired two-tailed t test to these values are shown in Tables 5 and 6, respectively.
In terms of comparative performance, the three display modalities were equivalent by all indexes of comparison for the detection of costophrenic angle blunting, atelectasis, consolidation, apical scarring, and hilar or mediastinal masses. For the detection of obstructive airway disease, the digital images recorded on film were significantly better than the interactively displayed images when these modalities were compared with the χ² test. For the detection of pneumothoraxes, comparisons of the area under the curves and the TPF.185 indexes indicated a significant decrease in observer performance with the interactive display compared with conventional and digital film images. For interstitial disease, the interactively displayed images again showed decreased performance relative to the conventional and digital film images, but the significance of this decrease depended on the index used for comparison. For the detection of parenchymal masses, the digital images, whether recorded on film or displayed, tended to outperform the conventional film images. For this abnormality, these differences were significant for digital hard copy using any index of comparison, but for the interactive display the difference was significant only when the TPF.185 index was used.
DISCUSSION
Our study was designed to evaluate the performance of radiologists using
for Each Display
Digital
Film
disease
Pneumothorax
Interstitial
disease
mass
Parenchymal
Table
.934(.020)
.907 (.022) .880(.025)
.888(.024)
.805(.053) .902(.038)
.789 (.044)
.913(.022)
.842
.857 (.034)
.892(.025)
.982(010)
.981
.797 (.052) .898(.041)
.919 (.019)
.884 (.021) .956 (.014)
(.023)
.863 (.046)
in parentheses
Note-Numbers
Interactive Display
.871
.918(.033) .826(.052)
mass
of
Class
.890(.031) .806 (.046)
HIlar/mediastinal
Obstructive
and
Film blunting
angle
Modality
Conventional
Atelectasis Consolidation Apical scarri.n
.910(.045)
(.013)
(.048)
.838 (.034) .951 (.020)
are SDs.
3 for Comparison
PValues
of the Calculated
Areas
Costophrenic
Digital
blunting
Digital Film vs Interactive
Film vs Interactive
Film
Display
Display
.65
.23
.37
Atebectasis
.79
.61
.82
Consolidation Apical scarnin
.99 .75
.79 .89
.82 .89
Hilar/mediastinal mass Obstructive disease Pneumothorax Interstitial disease Parenchymal mass
.12 .40 .95
.82 .33
.18
Table
angle
Curves
Conventional
vs Abnormality
the ROC
under
Conventional
.10 .05 .25 .84
.05 .04 .08
.21 .05
4
x2 PValues
Bivaniate
for Comparison
of Display
Formats
Conventional
Conventional
vs Digital Abnormality Costophrenic Atelectasis Consolidation
angle
Apical scarring Hilan/mediastinal Obstructive Pneumothorax
in-
tenactive display compared with conventionab and digital film images. For interstitial disease, the interactively displayed images again showed decreased performance relative to the conventional and digital
ROC Curves
the
Abnormality
area
the
2
Areas under Abnormality
the curves and the TPF185 inindicated a significant decrease
in observer
774
Table
Interstitial
Film blunting
mass
disease disease
Parenchymal
mass
Digital
Film vs Interactive
Display
Display
.39 .40
.46 .46
.36 .97
.41 .95
.86 .80
.22 .80
.12
.94
.12
.51
.39
.65
.12
.14 .05
.08 .16
.05 .14 .01 .56
three different diagnostic modalities for the detection of a spectrum of abnormalities commonly encountered in chest radiography. The detection tasks ranged from the relatively simple identification of pneumothoraxes to more difficult determinations such as the identification of areas of early consolidation. A number of normal examinations and a number of examinations with multiple abnormalities were included in the test set because such cases represent a significant population in our practice. The inclusion of cases with multiple abnormalities was also intended to prevent the observers from identifying a particular type of lesion as the object of the study. An attempt was made to include a distribution of abnormalities in terms of difficulty of detection, although no rating of the degree of difficulty of the cases was performed.
No ROC techniques are available for generating a composite curve from a test set that includes multiple abnormalities. Consequently, it is necessary to compare the performance of the various display modalities for each specific disease process. A test set including multiple abnormalities on a single examination or cases showing multiple abnormalities can be used for ROC analysis as long as the observers score each abnormality independently, and as long as the disease categories are not overlapping or interrelated. The instructions to the observers in this study were designed to meet these criteria. When performing the ROC analysis in a study of this type, particular attention must be paid to the definition of
Table
5
True-Positive
Fractions
Assuming
a False-Positive
Fraction
Conventional Abnormality
Costophrenic Atelectasis
Apical
angle
blunting
scarring
Hilan/mediastinal Obstructive Pneumothorax Interstitial
mass
disease disease
Parenchymal
mass
Note-Numbers
for Comparison
(.044) (.062)
.830 .772
(.044) (.053)
.885 .790
.651
(.066)
.688
(.063)
.613(066)
are
.822 (.089)
.839 (.084)
.684
(.079)
.861
(.064)
.717
(.075)
.724
(.068)
.802
(.064)
.654
(.062)
.974
(.028)
.969
(.026)
.865
(.045)
.857
(.041)
.781
(.054)
.723 .917
(.050) (.037)
.936 (.035)
SDs.
of TPF.185 Determined
Abnormality
Film
angle
(.036) (.052)
.861 (.081)
Conventional vs Digital
Costophrenic Atelectasis
Display
.813 .749
.789 (.055)
in parentheses
Table 6 PValues
Interactive
Film
Film
Consolidation
of 0.185
Digital
blunting
with
Two-tailed
Conventional Film vs Interactive
Display
t Test Digital Film vs Interactive Display
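Abnormality                   Conventional Film    Conventional Film         Digital Film
                              vs Digital Film      vs Interactive Display    vs Interactive Display
Costophrenic angle blunting   .62                  .12                       .25
Atelectasis                   .81                  .62                       .75
Consolidation                 .65                  .65                       .35
Apical scarring               .62                  .84                       .84
Hilar/mediastinal mass        .07                  .69                       .07
Obstructive disease           .28                  .40                       .12
Pneumothorax                  .91                  .02                       .01
Interstitial disease          .18                  .03                       .38
Parenchymal mass              .03                  .03                       .98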
phor system and displayed on 2,560 X 2,048-pixel monitors. These investigatons concluded that the interactive display of chest images on high-resolution displays offers an alternative to the viewing of computed chest nadiographs in a hand-copy format. On the basis of this study, the penformance of 2,048 X 2,048 digital hardcopy
imaging
display
true-positive and true-negative examinations. Because pathologic or surgical confirmation is not available for many abnormalities detected with chest radiography, it has become standard practice to establish the truth status of a test case by a consensus of experienced observers and then to exclude these observers from participation in any other aspect of the study. This was the approach taken in the construction of our test set.
In an ROC study based on simultaneous interpretations for multiple abnormalities, care must be taken when assigning the “truth value” to cases demonstrating one or more abnormalities. In such a study there are three ways in which to define a “negative” examination. First, an examination may be defined as negative only if it shows none of the disease processes being tested. In this analysis only examinations showing the abnormality of interest and examinations that show no abnormality of any type are considered. This approach antifactualby shifts the
ROC curve
to the left and increases
the
area under the curve. A second alternative is to define a case as negative if it demonstrates a disease process other than the one of interest. In such an analysis, examinations showing no abnormality are not considered, and the comparison is between examinations that show the abnormality and examinations that show any abnormality othen than the one being analyzed. This approach artifactually shifts the ROC
Volume
176
#{149} Number
3
curve to the night, decreasing the area index. The third approach to the definition of negative is to include both normal cases and cases that do not show the abnormality being tested. This was the option selected for the present study and results in values for the area under the ROC curves that are intermediate between the upper and lower bounds established by the other methods. Regardless of the way in which the class of true-negative results is defined, all of the analyses reveal similar trends in terms of observer performance. Reports of applications of display systems with matrix sizes of approximately 2,048 X 2,048 are just beginning to appear. Hayrapetian et al (6) cornpared observer performance for the detection of septal lines and parenchymal nodules using conventional radiographs, digital hard copy, and 2,048line digital display with and without user interaction. On the basis of their findings, these investigators concluded that 2,048-line displays might be an alternative to conventional film in chest radiography. In a follow-up of this diagnostic study, Widoff et al (1 1) reported that the detection of simulated nodules placed in an anthropomorphic chest phantom using a 2,048 interactive display was comparable to that achieved with analog film. A third study reported by Frank et al (12) examined observer performance for cornputed chest radiographs obtained using a 2,140 X 1,740 X 10-bit storage phos-
is generally equivalent to that of conventional radiography. The only exception to the trend toward statistical equivalence, as determined by means of the area index, is the detection of parenchymal masses. For this task, the digital hard copy shows significantly improved performance compared with the analog film images regardless of the index of comparison used. This improvement in observer performance for detection of parenchymal nodules is likely related to the improved rendition of contrast information by the digitized hard copy.

With regard to the performance of the interactive display relative to analog film or digital hard copy, the area and TPF.185 index comparisons show that the conventional radiographs were significantly better than the interactive display for the detection of interstitial lung disease. The finding that observer performance for the detection of interstitial disease was equivalent for digital hard copy and analog radiography, while the performance of an observer using the interactive display decreased significantly, suggests that the decrease is due to the nature of the interactive display system rather than being an effect related to the digitization of the images. We believe that this deterioration in performance is related to the unfamiliarity of our observers with soft-copy display of chest radiographs. Because no training sessions were used, the participants necessarily applied the diagnostic criteria used for conventional radiographs in making their assessments. Because of the improved contrast rendition of the digital format compared with that of conventional film, minor adjustments in the window and level parameters of the interactive display tend to accentuate the interstitial markings. Consequently, the ROC curve for the interactive display is shifted to the right and the area index decreases.

Another effect observed in this study was a twofold increase in the time required for completion of the interactive reading sessions compared with that for the hard-copy sessions. This increase was due largely to the interactive adjustment of image contrast and brightness. Effects of this sort are reduced or eliminated as observers become more experienced with the display system and as the user interface is improved.
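The way a window and level adjustment accentuates subtle markings can be illustrated with a minimal sketch. It assumes a simple linear window/level ramp applied to 12-bit stored values and a 12-bit output range; the actual transfer function of the display software used in the study is not described, and the function name and parameters below are hypothetical.

```python
def apply_window_level(pixels, window, level, in_bits=12, out_bits=12):
    """Map stored pixel values to display values with a linear window/level ramp.

    Assumption: values below (level - window/2) map to black, values above
    (level + window/2) map to white, and values in between scale linearly.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    in_max = (1 << in_bits) - 1    # 4095 for 12-bit stored data
    out_max = (1 << out_bits) - 1  # 4095 for a 12-bit display
    low = level - window / 2.0
    displayed = []
    for p in pixels:
        p = min(max(p, 0), in_max)                     # clamp to the stored range
        frac = min(max((p - low) / window, 0.0), 1.0)  # position inside the window
        displayed.append(round(frac * out_max))
    return displayed


# Two hypothetical lung pixels that differ by 40 stored gray levels.
subtle_pair = [2000, 2040]
wide = apply_window_level(subtle_pair, window=4096, level=2048)
narrow = apply_window_level(subtle_pair, window=512, level=2048)
print("full window difference:  ", wide[1] - wide[0])
print("narrow window difference:", narrow[1] - narrow[0])
```

Narrowing the window spreads the same stored difference over more display levels, which is one plausible reason why small adjustments could make normal interstitial markings look more conspicuous than they do on film.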
For the detection of pneumothoraxes,
the conventional film and digital hard copy are found to be equivalent, while comparisons of the area and TPF.185 indexes indicate a significant decrease in observer performance for the interactive display. That observer performance for the digital hard copy is equivalent to that for conventional film again suggests observer unfamiliarity with the interpretation of images when using interactive display features. Furthermore, there is some inherent loss of edge definition because of the raster scanning operation of the monitor. This loss of definition undoubtedly contributes to diagnostic errors in tasks requiring accurate definition of linear features oriented obliquely or perpendicularly to the raster lines of the video monitor. Theoretically, this effect may be at least partially eliminated by the use of image magnification and edge-enhancement algorithms.

The use of the bivariate χ² test indicated other significant differences in the observer performance with the three diagnostic systems. For the detection of obstructive airway disease, there is a significant decrease in observer performance for the interactive display compared with the digital hard copy. Two factors are likely to account for this finding. First, the uniform contrast rendition of the digital hard copy allowed greater sensitivity to regional differences in image contrast. Second, the interactive window and level features of the display tend to confound the observers' interpretation of these contrast differences. Results of the χ² test also indicated a significant decrease in performance with the interactive display compared with the digitized radiographs in the detection of interstitial disease, as well as a significant decrease in the performance with conventional film compared with digital hard copy for the detection of parenchymal masses and nodules.

Several objections to the use of the area and the χ² indexes for comparison of ROC curves can be made (13,14).
Principal among these is the contribution to the area index by the region of the curve at high false-positive fractions. For comparing ROC curves that cross or have similar configurations in the region of high false-positive fractions, the area and χ² indexes are perceived as relatively insensitive to local variations in observer performance in the range of false-positive fractions encountered in the clinical setting. For this reason, comparison of the calculated true-positive fractions at a selected
false-positive fraction or of the average true-positive fraction over a range of
clinically relevant false-positive values has been advocated by some investigators (4,7,14). As the final comparison of observer performance in our study, the selection of the TPF.185 was based on the calculated average false-positive fraction for all abnormalities and all display formats tested rather than on an arbitrary assessment of a "clinically relevant" false-positive fraction or range of false-positive fractions.

One difficulty with the display system used for this study was related to the phosphor system selected for the monitor. The original version of the 2,560 X 2,048 monitor used a white phosphor (#P167). This resulted in a noticeable orange tint that tended to distract the observers. The output of the monitor was also characterized as "dim" compared with the back-lit conventional and digital hard copy. The manufacturer has recently changed the phosphor to provide a blue tint (#P104) similar to the tint used in the base of many radiographic films. The luminance of the monitor has also been significantly increased. We believe that these improvements will significantly enhance performance with the interactive display.

Our study was designed to accentuate any differences in observer performance among conventional radiographs, digital hard copy at 2,048 X 2,048 pixels, and interactively displayed 2,560 X 2,048 X 12-bit images. Our results suggest that for certain abnormalities, the performances with the three display formats are not equivalent. Our findings do indicate that for all abnormalities tested, the digital hard copy performed as well as or better than conventional film. We also found that in some instances, the performance of the interactive display system failed to match that of digital hard copy or of conventional film (15).
Although the causes of these differences can probably be reduced or eliminated by further experience with the display system and by applying image enhancement, it is premature to conclude that the present generation of 2,560 X 2,048 displays can produce images equivalent to those on conventional film for all detection tasks. It is equally premature to conclude that the interactive display of images at 2,560 X 2,048 pixels offers no advantage over conventional film, since observers using the interactive system show significant improvements in performance for certain tasks, such as the detection of parenchymal nodules.
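The two scalar indexes discussed above, the area under the ROC curve and the true-positive fraction at a fixed false-positive fraction of 0.185, can be sketched as follows. This is only an illustration: it uses an empirical (trapezoidal) ROC curve built from confidence ratings, which is an assumption and not necessarily the curve-fitting procedure used in the study, and the function names and rating data are hypothetical.

```python
def roc_points(neg_ratings, pos_ratings):
    """Empirical ROC operating points (FPF, TPF); higher rating = more abnormal."""
    thresholds = sorted(set(neg_ratings) | set(pos_ratings), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        fpf = sum(r >= t for r in neg_ratings) / len(neg_ratings)
        tpf = sum(r >= t for r in pos_ratings) / len(pos_ratings)
        points.append((fpf, tpf))
    return points  # the lowest threshold always yields (1.0, 1.0)


def area_index(points):
    """Area under the empirical ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def tpf_at_fpf(points, fpf=0.185):
    """True-positive fraction at a chosen false-positive fraction,
    by linear interpolation between neighboring operating points."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x1 > x0 and x0 <= fpf <= x1:
            return y0 + (y1 - y0) * (fpf - x0) / (x1 - x0)
    raise ValueError("fpf must lie between 0 and 1")


# Hypothetical five-point confidence ratings (1 = definitely absent,
# 5 = definitely present); these are not data from the study.
negatives = [1, 1, 1, 2, 2, 2, 3, 3, 4, 5]
positives = [2, 3, 3, 4, 4, 4, 5, 5, 5, 5]
curve = roc_points(negatives, positives)
print("area index:", area_index(curve))
print("TPF at FPF = 0.185:", tpf_at_fpf(curve, 0.185))
```

Because the area index integrates over the whole curve, including the region of high false-positive fractions, two curves can have similar areas while differing in true-positive fraction near a false-positive fraction of 0.185; reporting both indexes makes such a difference visible.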
References
1. Chakraborty DP, Breatnach ES, Yester MV, Soto B, Barnes GT, Fraser RG. Digital and conventional chest imaging: a modified ROC study of observer performance using simulated nodules. Radiology 1986; 158:35-39.
2. Foley WD, Wilson CR, Keyes CS, et al. The effect of varying spatial resolution on the detectability of diffuse pulmonary nodules: assessment with digitized conventional radiographs. Radiology 1983; 141:25-31.
3. Goodman LR, Foley WD, Wilson CR, Rimm AA, Lawson TL. Digital and conventional chest images: observer performance with film digital radiography system. Radiology 1986; 158:27-33.
4. Lams PM, Cocklin ML. Spatial resolution requirements for digital chest radiographs: an ROC study of observer performance in selected cases. Radiology 1986; 158:11-19.
5. MacMahon H, Vyborny CJ, Metz CE, Doi K, Sabeti V, Solomon SL. Digital radiography of subtle pulmonary abnormalities: an ROC study of the effect of pixel size on observer performance. Radiology 1986; 158:21-26.
6. Hayrapetian A, Aberle DR, Huang HK, et al. Comparison of 2048-line digital display formats and conventional radiographs: an ROC study. AJR 1989; 152:1113-1118.
7. Metz CE. ROC methodology in radiologic imaging. Invest Radiol 1986; 21:720-733.
8. Rockette HE, Gur D, Cooperstein LA, et al. Effect of two rating formats in multi-disease ROC study of chest images. Invest Radiol 1990; 25:225-229.
9. Metz CE, Wang P-L, Kronman HB. A new approach for testing the significance of differences between ROC curves measured from correlated data. In: Deconinck F, ed. Information processing in medical imaging. The Hague: Martinus Nijhoff, 1984; 431-445.
10. Miller I, Freund JE. Probability and statistics for engineers. Englewood Cliffs, NJ: Prentice-Hall, 1965; 166.
11. Widoff B, Aberle DR, Brown K, et al. Hard copy versus soft copy display of 2,000 digital chest images: ROC study with simulated lung nodules (abstr). Radiology 1989; 173(P):401.
12. Frank MS, Jost RG, Blaine GJ, Moore SM, Whitman RA, Hagge R. Interpretation of mobile chest radiographs from a high-resolution CRT display (abstr). Radiology 1989; 173(P):401.
13. Habicht JP. Assessing diagnostic technologies (abstr). Science 1980; 207:1414.
14. Hanley JA. Receiver operating characteristic (ROC) methodology: the state of the art. CRC Rev Diagn Imaging 1989; 29:307-335.
15. Slasky BS, Gur D, Good WF, et al. Receiver operating characteristic analysis of chest image interpretation with conventional, laser-printed, and high-resolution workstation images. Radiology 1990; 174:775-780.
September
1990