ULTRASONIC IMAGING 14, 159-185 (1992)

SEGMENTING ULTRASOUND IMAGES OF THE PROSTATE USING NEURAL NETWORKS

James S. Prater and William D. Richard

Washington University in St. Louis
Department of Electrical Engineering
One Brookings Drive
St. Louis, MO 63130

This paper describes a method for segmenting transrectal ultrasound images of the prostate using feedforward neural networks. Segmenting two-dimensional images of the prostate into prostate and nonprostate regions is required when forming a three-dimensional image of the prostate from a set of parallel two-dimensional images. Three neural network architectures are presented as examples and discussed. Each of these networks was trained using a small portion of a training image segmented by an expert sonographer. The results of applying the trained networks to the entire training image and to adjacent images in the two-dimensional image set are presented and discussed. The final network architecture was also trained with additional data from two other images in the set. The results of applying this retrained network to each of the images in the set are presented and discussed. © 1992 Academic Press, Inc.

Key words: Neural network; prostate; segmentation; transrectal; ultrasound.

I. INTRODUCTION

The use of transrectal ultrasound in the diagnosis and management of prostate cancer has grown rapidly in recent years. Using ultrasound images of the prostate, expert physicians in the field have developed criteria to suggest the presence of disease or differentiate one disease from another [1]. Other imaging modalities have significant drawbacks. Plain and contrast radiographs have been shown to be of little benefit in the evaluation of prostate pathology [2]. Computed tomography allows for determination of the size of the gland, but in general there are no specific findings to differentiate between benign and malignant processes [3]. Magnetic resonance imaging (MRI) allows identification of the pathology of the prostate, although there is some overlap between benign and malignant prostatic disease indications [4]. In addition, MRI is expensive, relatively immobile, requires a long examination time, and may not be readily available. Transrectal ultrasound imaging can avoid these difficulties.

The transrectal ultrasound exam in its current form provides the physician with only two-dimensional imaging capabilities. Three-dimensional imaging capabilities, available with MRI, are not available to aid in the determination of prostate size or for use in monitoring the change in size of the gland during the aging process or during radiation or endocrine therapy. The availability of three-dimensional capabilities would, therefore, enhance the diagnostic process considerably.


The shape of the contour of the prostate in a two-dimensional ultrasound image of the gland is, itself, an important indicator used in staging prostate cancer [5]. By identifying the contour of the prostate in a series of two-dimensional prostate images, it is possible to form a three-dimensional image of the gland [6]. These three-dimensional images, which readily show the irregular surface typical of many cancerous glands, have been used to monitor the size and shape of the prostate during therapy [7]. The segmentation, or contour identification, required in the three-dimensional image formation process is currently performed manually. Manual image segmentation requires an expert sonographer and is both time consuming and expensive. Automatic methods have the promise of high speed and low cost.

This paper describes an automatic method for segmenting transrectal ultrasound images of the prostate using feedforward neural networks. Three neural net architectures are presented as examples and discussed. Each of these networks was trained using a small portion of a training image segmented by an expert sonographer. The results of applying the trained networks to the entire training image, as well as to adjacent images in the two-dimensional image set, are presented and discussed. The final network architecture was also retrained with additional data from two other images in the set. The results of applying this retrained network to each of the images in the set are presented and discussed.

II. OPERATION OF FEEDFORWARD NEURAL NETWORKS

The most common type of neural network used in engineering applications is the layered feedforward network. Figure 1 shows such a network, including input, hidden, and output layers [8, 9]. Generally, one input and one output layer are used with one or more hidden layers. A network with no hidden layers reduces to a linear classifier, or perceptron [10]. Input units are essentially storage registers, simply holding the value of the input data. Hidden and output units are simple model neurons. The function of each neuron is to combine input data using a weighted sum operation and pass the resulting sum through an s-shaped nonlinear function called a sigmoid. Figure 2 shows this process.


Fig. 1 Three-layer neural network.



Fig. 2 Neuron unit function.

The output (or state) of neuron (or unit) $i$, driven by $N$ inputs $z_j$, each with weight $w_{ij}$, is given by

$$s_i = f\left(\sum_{j=0}^{N} w_{ij} z_j\right), \qquad (1)$$

where $f()$ is the sigmoid function. In this case, $f()$ is given by

$$f(x) = \frac{1}{1 + e^{-x}}. \qquad (2)$$
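As a concrete illustration of Eqs. (1) and (2), the following Python sketch (not part of the original paper, which used the conjugate-gradient "opt" package described below) evaluates a layered feedforward network; the layer sizes and random weights are arbitrary placeholders.

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid of Eq. (2)."""
    return 1.0 / (1.0 + np.exp(-x))

def forward(weights, x):
    """Forward pass through a layered feedforward network.

    weights : one matrix per non-input layer; weights[k] has shape
              (units in layer k, units in previous layer + 1), with
              column 0 holding the bias weight w_i0.
    x       : 1-D input pattern.
    """
    s = np.asarray(x, dtype=float)
    for W in weights:
        s = np.concatenate(([1.0], s))  # prepend the constant bias input z_0 = 1
        s = sigmoid(W @ s)              # Eq. (1) applied to every unit in the layer
    return s

# Example: an untrained 5-6-2 network (the paper's first architecture).
rng = np.random.default_rng(0)
weights = [rng.uniform(-0.1, 0.1, (6, 5 + 1)),   # hidden layer
           rng.uniform(-0.1, 0.1, (2, 6 + 1))]   # output layer
print(forward(weights, rng.random(5)))           # two complementary outputs
```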

Other sigmoid functions are sometimes used in neural net systems, including the hyperbolic tangent function. Nonsigmoidal functions such as ramp or step functions may also be used. The choice of nonlinearity is somewhat arbitrary, although most training methods require a continuous function. Networks with continuous nonlinearities, such as sigmoids or ramps, also have the capability to implement more complex functions than networks with threshold-type nonlinearities [11].

Each layer in the network of figure 1 receives input from the previous layer. Each unit also has a bias input, the j = 0 term in Eq. (1), with constant value $z_0 = 1$. To produce output, a pattern is simply presented to the network inputs, and each unit performs the function given by Eq. (1), operating on its input and passing its output to the next layer. Thus, the network implements a nonlinear mapping from its input space to its output space.

The input-output mapping of a given neural network is determined by its architecture (connection topology) and by the values of the weights within the network. The architecture is chosen by the designer, and the weights are obtained by a training process. Training the network is usually done as follows. First, a set of small random weights is chosen. Second, an input pattern is presented to the net, and the network output is calculated. Third, the network output is compared to the desired output for the given


input pattern, and an error is calculated. Fourth, the weights of the network are adjusted in some way to reduce the output error. The second, third, and fourth steps are repeated using many input patterns and the desired outputs for those patterns. Often, a set of several thousand patterns, called the training set, is used repeatedly in this process. Eventually, a weight set is developed that produces the correct output for all or most of the input patterns (the training set). Thus, the desired nonlinear input-output mapping is learned by example, rather than being designed into the network.

The power of a neural net classifier lies in this nonlinear mapping. Linear classifiers can only segment the multi-dimensional input space into two regions using one decision boundary (a hyperplane). Neural net classifiers combine the decision boundaries of each hidden unit in the network to segment the input space into regions of arbitrary shape [9]. Note that the number of hidden units required to implement these decision boundaries is not known. While some limits and estimates of the number of hidden units needed do exist [8, 12], the number of hidden units used in a given network is usually determined by trial and error.

Many methods of adjusting weights for training exist. Much of the current interest in feedforward networks resulted from the development of backpropagation training algorithms [8, 13]. Backpropagation assigns output error contributions to units in the network by taking partial derivatives of the output error with respect to network weights, starting at the output and moving back toward the input. This is a form of gradient descent optimization [14]. Better optimization techniques have yielded significant improvements in training time and accuracy. The networks reported here were trained using one such method called conjugate gradient optimization. The conjugate gradient training program calculates network error over the entire training pattern set (instead of one pattern at a time) and then changes the network weights to reduce that error. This process constitutes one iteration of the program. Several hundred such iterations were needed to train the networks reported here. In this work, the “opt” program from the Oregon Graduate Institute (Beaverton, Oregon) was used [15]. Opt is available with documentation via internet electronic mail.

III. NEURAL NETWORK ARCHITECTURES AND RESULTS

Figure 3 is the center image of a 40-image set of the prostate of a 29-year-old man obtained using a special-purpose transrectal ultrasound unit. This unit used a 7.5 MHz probe and had an axial resolution of 1 millimeter and a lateral resolution of 1 millimeter. Figure 4 is a manually segmented version of the image in figure 3 created by an expert sonographer. This image pair was used to train each of the neural networks described below. Figure 5 and figure 6 are adjacent images in the image set. Figure 5 was obtained 1 mm closer to the apex of the gland, and figure 6 was obtained 1 mm closer to the base. Three neural network architectures were developed and trained to segment these images into prostate and nonprostate regions. The results reported below are summarized in table I.

Figure 7 shows the inputs used by the first example neural network developed for segmenting ultrasound images of the prostate.
This network uses 5 inputs: (1) the mean of a 3 by 3 window centered at the pixel of interest; and (2) the four maximum 3 by 3 window means in the four directions from the pixel of interest in the same row and column. The preprocessing performed here is designed to find the “halo” defining the edge of the prostate in a typical transrectal ultrasound image.
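A sketch of this preprocessing in Python is given below. It is a hypothetical implementation, since the paper does not specify its exact window handling; interior pixels are assumed, consistent with the paper's 25-pixel edge exclusion described later.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def five_inputs(image, r, c):
    """Five inputs for the pixel at (r, c): its local 3x3 mean plus the
    maximum 3x3 mean in each of the four directions along the same row
    and column, intended to respond to the bright "halo" around the
    prostate.  Assumes 0 < r < rows - 1 and 0 < c < cols - 1."""
    m = uniform_filter(image.astype(float), size=3)  # all 3x3 window means
    return np.array([
        m[r, c],             # (1) mean centered at the pixel of interest
        m[r, :c].max(),      # (2) max 3x3 mean to the left in the same row
        m[r, c + 1:].max(),  #     ... to the right
        m[:r, c].max(),      #     ... above in the same column
        m[r + 1:, c].max(),  #     ... below
    ])
```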


Fig. 3 Image used for training (slice 21).

Fig. 4 Manually segmented version of Fig. 3.


Fig. 5 Adjacent image closer to apex (slice 22).

Fig. 6 Adjacent image closer to base (slice 20).


Table I. Summary of network training and segmentation results.

                                      Percent Correct on Images
network        number of          --------------------------------------
architecture   training patterns  training  adjacent (apex)  adjacent (base)
5-6-2                1334           90.0         87.3             86.9
132-30-2             3000           98.8         92.8             91.1
132-30-2            15908           99.6         92.9             90.9
61-10-2              7500           99.3         95.3             94.7
61-15-2             22500           98.8         95.1             94.7

The network used with this preprocessing method had 5 inputs, 6 hidden units, and 2 complementary outputs; therefore, it is called a 5-6-2 network. One of the outputs being “on” indicates the pixel of interest is inside the prostate, and the other output being “on” indicates the pixel of interest is outside the prostate. Only one output should be “on” for each input pattern; thus, they are called complementary outputs. (While only one output is needed, the training program produces networks with both outputs.) The network was trained using 1334 input-output patterns taken from the images of figure 3 (input) and figure 4 (output). Patterns were obtained by taking samples from each image spaced ten pixels apart on a grid covering most of that image. The network classified 90.25 percent of the training data correctly after training. Training required approximately 1 CPU hour on a SUN SPARCstation 1. Each segmentation performed using the trained network required approximately 9 CPU minutes.

Figure 8 shows the results of applying the trained network to the entire training image shown in figure 3. To avoid edge effects, processing was started 25 pixels from each edge of the 501 x 333 pixel image. All performance results of networks applied to images reported here were taken over the resulting 451 x 283 pixel image region. The 5-6-2 network classified 90.0 percent of the training image correctly. This was somewhat less than its performance on the training set, since only a small part (1334 pixel sites, or 0.8 percent) of the training image was used as training data. The rest of the training image was not presented to the network during training. Figure 9 and figure 10 show the results of applying the trained network to the adjacent images shown in figures 5 and 6, respectively. The trained network correctly classified 87.3 percent of the pixels in figure 9 and 86.9 percent of the pixels in figure 10. Since it




Fig. 7 Input preprocessing for 5-6-2 network. The four arrows represent taking the maximum 3x3 window mean in each of four directions from the pixel of interest. The dot represents the 3x3 mean centered a t the pixel of interest.

(Legend: dot = mean of 3 x 3 window; arrow = maximum 3 x 3 window mean in indicated direction.)


Fig. 8 Segmentation of training image (Fig. 3) by 5-6-2 network.

Fig. 9 Segmentation of adjacent image (Fig. 5) by 5-6-2 network.


Fig. 10 Segmentation of adjacent image (Fig. 6) by 5-6-2 network.

was created by processing figure 3, part of which is training data, figure 8 is the best of the three segmentations created by the 5-6-2 network. Vertical and horizontal stripe artifacts are present in figures 8, 9, and 10. The stripe artifacts occur in areas that the network has not learned to classify correctly, which fall between training sample points. To help eliminate this effect, later networks were trained with data sampled on a finer grid. Other errors are also clearly present, one of which is the misclassified region centered near the posterior of the prostate. With so little input data, the network cannot distinguish between maxima that occur due to the bright region around the prostate and maxima that occur due to other bright regions. This causes errors in regions of the image where local bright regions are present, such as near the center of the posterior of the prostate. Later networks used more input data to help eliminate such errors.

Figure 11 shows the inputs used by the second example neural network developed for segmenting ultrasound images of the prostate. This network uses 132 inputs: (1) the mean of a 3 by 3 window centered at the pixel of interest; (2) 91 means of 3 by 9 and 9 by 3 windows in the same row and column as the pixel of interest, respectively; and (3) 40 means of 3 by 3 windows forming a rectangle centered at the pixel of interest. The preprocessing performed here is designed to detect the “halo” and some local features. The preprocessing is linear. The row and column data are input to the network according to their position in the row and column, regardless of the location of the pixel of interest within that row and column. The data taken from the center of the row containing the pixel of interest, for example, always feeds the same network input, regardless of whether the pixel of interest is near the middle of the row or near the end of the row.
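The row-and-column portion of this scheme might be sketched as follows. This is an assumption-laden illustration: the paper does not give the exact window spacing, so abutting windows sampled at a 9-pixel stride are assumed, which yields roughly the stated 91 values for a 501 x 333 image.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def row_col_means(image, r, c, stride=9):
    """Oblong window means spanning the whole row and column containing
    (r, c).  Each sampled position always feeds the same network input,
    regardless of where (r, c) lies within its row or column."""
    img = image.astype(float)
    row = uniform_filter(img, size=(3, 9))[r, ::stride]  # 3x9 means along the row
    col = uniform_filter(img, size=(9, 3))[::stride, c]  # 9x3 means along the column
    return np.concatenate([row, col])
    # The remaining 40 local inputs (3x3 means forming a rectangle around
    # the pixel of interest) would be gathered similarly from
    # uniform_filter(img, size=3).
```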


(Legend: dot = mean of 3 x 3 window; box = mean of 3 x 9 window.)

Fig. 11 Input preprocessing for 132-30-2 network. The rectangular boxes with dots are the regions over which local means are taken in the same row and column as the pixel of interest. The rectangle of dots represents the 3x3 window means included to provide local information near the pixel of interest.

The network used with this preprocessing method had 132 inputs, 30 hidden units, and 2 complementary outputs. The 132-30-2 network was trained with 3000 input-output patterns taken from the images of figure 3 and figure 4. The network classified 100 percent of the training data correctly after training. Training required approximately 3 CPU hours on a SUN SPARCstation 1. Each segmentation performed using the trained network required approximately 46 CPU minutes.

Figure 12 shows the results of applying the trained network to the entire training image shown in figure 3. The 132-30-2 network classified 98.8 percent of the pixels in the training image correctly. Figures 13 and 14 show the results of applying the trained network to the adjacent images shown in figures 5 and 6, respectively. The trained network correctly classified 92.8 percent of the pixels in figure 13 and 91.1 percent of the pixels in figure 14.

Three changes were made to improve the performance of the second (132-30-2) network over that of the first (5-6-2) network. First, more training patterns were used (3000 vs. 1334). Second, different preprocessing provided a richer variety of input data (132 inputs vs. 5). Third, more hidden units were used (30 vs. 6). These changes almost


Fig. 12 Segmentation of training image (Fig. 3) by 132-30-2 network.

Fig. 13 Segmentation of adjacent image (Fig. 5) by 132-30-2 network.


Fig. 14 Segmentation of adjacent image (Fig. 6) by 132-30-2 network.

eliminated the stripe artifacts present in the earlier images (figures 8, 9, and 10). This seems mainly due to using more training data. Two slightly larger sampling grids were used here, with 1500 pixel locations each, to obtain the 3000-pattern training set (a sketch of this two-grid sampling appears below). The grid spacing was still 10 pixels, but the second grid was offset 5 pixels in the vertical and horizontal directions from the first, so that new sample points fell in the middle of the regions where stripe artifacts previously arose. Other errors were also substantially reduced by giving the net enough input information and enough hidden units to correctly identify ambiguous parts of the image, such as those found near the edges of the prostate.

The second network was retrained using 15908 input-output patterns, rather than the 3000 patterns used originally, to evaluate the effectiveness of using a much larger amount of training data. The network classified 100 percent of the training data correctly after training. The results are shown in figures 15, 16, and 17. For the adjacent images, the trained network correctly classified 92.9 percent of the pixels in figure 16 and 90.9 percent of the pixels in figure 17. The center image, figure 15, had 99.6 percent of its pixels classified correctly and exhibited less noise than before, but the adjacent images did not show much improvement over the earlier ones in figures 13 and 14. Thus, using an extremely large number of training patterns did not improve overall network performance. This is typical of neural net applications, where overtraining on training data does not improve (and may degrade) performance on test data [16].
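The two-grid sampling described above can be made concrete with a short sketch (hypothetical code; with these parameters the base grid reproduces the 1334 sites of the first network, while the paper's grids were enlarged slightly to hold 1500 sites each):

```python
import numpy as np

def grid_sites(shape, spacing=10, margin=25):
    """Training-sample sites on two interleaved grids: a base grid with
    10-pixel spacing plus a second grid offset 5 pixels vertically and
    horizontally, so new samples fall midway between the first grid's
    points, where stripe artifacts previously arose."""
    rows, cols = shape
    sites = []
    for offset in (0, spacing // 2):
        rs = np.arange(margin + offset, rows - margin, spacing)
        cs = np.arange(margin + offset, cols - margin, spacing)
        sites += [(r, c) for r in rs for c in cs]
    return sites

print(len(grid_sites((333, 501))))  # ~2600 sites for a 501 x 333 image
```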


Fig. 15 Segmentation of training image (Fig. 3) by 132-30-2 network. Here, the 132-30-2 network was trained with a larger training set.

Fig. 16 Segmentation of adjacent image (Fig. 5) by 132-30-2 network. Here, the 132-30-2 network was trained with a larger training set.


Fig. 17 Segmentation of adjacent image (Fig. 6) by 132-30-2 network. Here, the 132-30-2 network was trained with a larger training set.

Figure 18 shows the inputs used by the third example neural network developed for segmenting ultrasound images of the prostate. This network uses 61 inputs: (1) the mean of a 3 by 3 window centered at the pixel of interest; (2) 31 means of 3 by 15 windows in the same row as the pixel of interest; (3) 21 means of 15 by 3 windows in the same column as the pixel of interest; (4) 4 means of 3 by 3 windows 15 pixels from the pixel of interest in the same row and column; and (5) 4 means of 3 by 3 windows 11 pixels diagonally from the pixel of interest. The preprocessing performed here supplies both global and local information. The preprocessing is linear, and the row and column means are symmetric about the pixel of interest and are zero padded when necessary.

The network used with this preprocessing method had 61 inputs, 10 hidden units, and 2 complementary outputs. Fewer hidden units were used here than with the second (132-30-2) architecture after analysis of the second network revealed many essentially inactive hidden units. (This type of analysis is done by deleting hidden units and running the net through the training patterns to measure changes in output.) The 61-10-2 network was trained using 7500 input-output patterns taken from the images of figure 3 and figure 4. This represents 4.5 percent of the available pixel sites. The size of the pattern set was chosen as a reasonable intermediate value between the two pattern set sizes used with the second architecture. The network classified 100 percent of the training data correctly after training. Training required approximately 8 CPU hours on a SUN SPARCstation 1. Each segmentation performed using the trained network required approximately 11 CPU minutes.

Figure 19 shows the results of applying the trained network to the entire training image. The 61-10-2 net classified 99.3 percent of the pixels correctly.
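A sketch of the third scheme's centered, zero-padded extraction appears below. It is hypothetical: the 15-pixel spacing between adjacent oblong windows is an assumption chosen so that 31 row means and 21 column means roughly tile a 501 x 333 image, and the remaining offsets follow the counts given above (1 + 31 + 21 + 4 + 4 = 61).

```python
import numpy as np
from scipy.ndimage import uniform_filter

def centered_61_inputs(image, r, c):
    """The 61 inputs of the third scheme: window means placed symmetrically
    about (r, c) and zero padded off the image edge, so the features carry
    no absolute position reference."""
    pad = 240  # covers the farthest window: 15 * 15 + 7 pixels from center
    p = np.pad(image.astype(float), pad)          # zero padding, as in the paper
    r, c = r + pad, c + pad
    m3    = uniform_filter(p, size=3)             # 3x3 means
    m3x15 = uniform_filter(p, size=(3, 15))       # 3x15 means (row windows)
    m15x3 = uniform_filter(p, size=(15, 3))       # 15x3 means (column windows)
    feats = [m3[r, c]]                                        # (1) center mean
    feats += [m3x15[r, c + 15 * k] for k in range(-15, 16)]   # (2) 31 row means
    feats += [m15x3[r + 15 * k, c] for k in range(-10, 11)]   # (3) 21 column means
    feats += [m3[r, c - 15], m3[r, c + 15],
              m3[r - 15, c], m3[r + 15, c]]                   # (4) 4 at 15 pixels
    feats += [m3[r + dr, c + dc]
              for dr in (-11, 11) for dc in (-11, 11)]        # (5) 4 diagonals
    return np.array(feats)                                    # 61 values total
```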


(Legend: dot = mean of 3 x 3 window; horizontal box = mean of 3 x 15 window; vertical box = mean of 15 x 3 window.)

Fig. 18 Input preprocessing for 61-10-2 network. Rectangular boxes are regions over which local means are taken in the same row and column as the pixel of interest. Here, local information forms a circle around the pixel of interest.

Figures 20 and 21 show the results of applying the net to the adjacent images shown in figures 5 and 6, respectively. The trained network correctly classified 95.3 percent of the pixels in figure 20 and 94.7 percent of the pixels in figure 21. This network achieved significantly better performance on the adjacent images than either of the first two network architectures. Some errors still arose near the edges of the prostate gland, but most of the outlying errors and interior errors were eliminated.

Performance comparisons between different images based solely on the percent of pixels classified correctly are somewhat misleading. The size of the prostate gland varies from image slice to image slice. For example, prostate pixels account for 26.3 percent of image slice 21 (near the middle) but only 7.5 percent of image slice 40 (at the extreme apex). To obtain a balanced measure of segmentation performance which takes the relative sizes of the prostate and nonprostate areas into account, the correlation between the network's segmentation and the correct segmentation is calculated. As noted by Matthews [17], for a two-class problem the variable (pixel) values can be taken as 0 or 1. If the classes are denoted A and B, the correlation coefficient then reduces to

$$C = \frac{A_A B_B - A_B B_A}{\sqrt{(A_A + A_B)(A_A + B_A)(B_B + A_B)(B_B + B_A)}},$$


Fig. 19 Segmentation of training image (Fig. 3) by 61-10-2 network.

Fig. 20 Segmentation of adjacent image (Fig. 5) by 61-10-2 network.


Fig. 21 Segmentation of adjacent image (Fig. 6) by 61-10-2 network.

where $A_A$ is the fraction of class A correctly labeled as A by the classifier, $A_B$ is the fraction of class A mislabeled as B by the classifier, $B_B$ is the fraction of B correctly labeled as B, and $B_A$ is the fraction of B mislabeled as A. The correlation coefficient ranges from -1 to +1, with +1 denoting perfect correlation (perfect performance), 0 denoting no correlation (classification no better than chance), and -1 denoting perfect anticorrelation (exactly wrong in all cases). Segmentations done by the 61-10-2 network have correlation coefficients of 0.98 on the training image, 0.88 on adjacent image slice 20, and 0.86 on adjacent image slice 22.

Performance of the trained 61-10-2 network on images beyond the adjacent images (farther from the training image) declined substantially. Moving five slices from the training image in each direction, the trained network correctly classified 91.9 percent of the pixels in image slice 16 and 93.6 percent of the pixels in image slice 26. Correlation coefficients were 0.78 and 0.83, respectively. Ten slices away from the training image, the network correctly classified only 86.8 percent of the pixels in image slice 11 and 84.9 percent of the pixels in image slice 31, with correlation coefficients of only 0.54 and 0.56. Segmentations of image slices 16, 26, 11, and 31 performed by this neural network are given in figures 22, 23, 24, and 25, respectively. Image slice 21 was used to train the network since it lies near the center of the prostate. As images far from the center of the prostate were segmented, the shape and size of the prostate changed substantially, and the network no longer performed well.
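Computed directly from a pair of segmented images, the coefficient might look like the following sketch (binary arrays assumed, with 1 marking prostate pixels):

```python
import numpy as np

def correlation_coefficient(predicted, truth):
    """Matthews correlation between a network segmentation and the manual
    one; both arrays hold 1 for prostate (class A) and 0 for nonprostate
    (class B).  Returns +1 for perfect agreement, 0 for chance."""
    predicted, truth = np.asarray(predicted), np.asarray(truth)
    n = truth.size
    AA = np.sum((truth == 1) & (predicted == 1)) / n  # A correctly labeled A
    AB = np.sum((truth == 1) & (predicted == 0)) / n  # A mislabeled as B
    BB = np.sum((truth == 0) & (predicted == 0)) / n  # B correctly labeled B
    BA = np.sum((truth == 0) & (predicted == 1)) / n  # B mislabeled as A
    return (AA * BB - AB * BA) / np.sqrt(
        (AA + AB) * (AA + BA) * (BB + AB) * (BB + BA))
```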

To improve performance near the apex and base of the prostate, training data from image slices 11 and 31 was introduced. A 61-15-2 network had been trained on image


Fig. 22 Segmentation of slice 16 of the 40 slice set by 61-15-2 network.

Fig. 23 Segmentation of slice 26 of the 40 slice set by 61-15-2 network.


Fig. 24 Segmentation of slice 11 of the 40 slice set by 61-15-2 network.

Fig. 25 Segmentation of slice 31 of the 40 slice set by 61-15-2 network.


Fig. 26 Percentage of pixels classified correctly by retrained 61-15-2 network (horizontal axis: image slice).

slice 21 along with the 61-10-2 network described above. The weights from this larger network trained on slice 21 were used as a starting point for retraining. Thus, the network had to learn to classify patterns from the new training slices while not “forgetting” what it had learned from image slice 21. The larger network was chosen because some of its hidden units were essentially unused (after training on image slice 21) and thus available for learning new information from the new patterns. The pattern set consisted of 7500 patterns from each of the three training images, for a total of 22500 training patterns.

The retrained 61-15-2 network learned 100 percent of this training set correctly and was used to segment all 40 images in the set. Figure 26 shows the results. Percent correct is given for each segmentation in the set where the expert sonographer could segment the image with high confidence. There are some image slices near the apex or base of the prostate where no such reliable segmentation could be made, so percent correct for the network segmentation cannot be calculated. Correlation coefficients for each image slice segmentation are given in figure 27. Network performance (percent correct) peaks at the training images and is significantly better than that of the previous network on image slices 16 and 26. Segmentations of image slices 11, 21, and 31 (from which training data was taken) and image slices 16 and 26 performed by this retrained 61-15-2 neural network are shown in figures 28, 29, 30, 31, and 32, respectively, for comparison to previous segmentation results. While the segmentations near the apex and base of the prostate are substantially worse than those near the training images, the bulk of the prostate's volume lies within the range of good performance.
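The warm-start retraining step can be sketched as follows. This is a simplified stand-in: plain gradient descent replaces the conjugate-gradient “opt” program used in the paper, and the one-hidden-layer weight layout matches the forward-pass sketch given earlier.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def retrain(W1, W2, X, T, lr=0.1, epochs=500):
    """Continue training from existing weights.  W1 (hidden) and W2 (output)
    come from the network already trained on slice 21, with bias weights in
    column 0; X and T hold the enlarged pattern set (slices 11, 21, and 31),
    so the net must fit the new patterns without "forgetting" the old."""
    W1, W2 = W1.copy(), W2.copy()
    Xb = np.hstack([np.ones((len(X), 1)), X])      # bias input z_0 = 1
    for _ in range(epochs):
        H = sigmoid(Xb @ W1.T)                     # hidden activations
        Hb = np.hstack([np.ones((len(H), 1)), H])
        Y = sigmoid(Hb @ W2.T)                     # network outputs
        dY = (Y - T) * Y * (1 - Y)                 # output-layer deltas
        dH = (dY @ W2[:, 1:]) * H * (1 - H)        # backpropagated deltas
        W2 -= lr * (dY.T @ Hb) / len(X)            # gradient step on squared error
        W1 -= lr * (dH.T @ Xb) / len(X)
    return W1, W2
```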


Fig. 27 Correlation coefficients for each slice segmented by retrained 61-15-2 network (horizontal axis: image slice, 0 to 40).

Fig. 28 Segmentation of slice 11 of the 40 slice set by retrained 61-15-2 network.


Fig. 29 Segmentation of slice 21 of the 40 slice set by retrained 61-15-2 network.

Fig. 30 Segmentation of slice 31 of the 40 slice set by retrained 61-15-2 network.


Fig. 31 Segmentation of slice 16 of the 40 slice set by retrained 61-15-2 network.

Fig. 32 Segmentation of slice 26 of the 40 slice set by retrained 61-15-2 network.


IV. THE IMPORTANCE OF PREPROCESSING

While the training process creates a nonlinear input-output mapping, the choice of input data for the network is critical. It is tempting to think about feeding the entire image to a network to maximize the information available during training. For these 501 x 333 images, such a network would have 166,833 inputs. Performing the weighted sum operation at each hidden unit over such a large input set is not practical. Such a network also becomes very difficult to train due to the high dimensionality of the input set. For practical feedforward neural net applications, it is important to constrain the network by providing the information needed to solve the problem while limiting extraneous information. This is the role of preprocessing.

The preprocessing methods applied here are based on the visual cues used by human observers to find the prostate gland in an ultrasound image. Three such cues are readily apparent. First, ultrasound images tend to have a bright halo just outside the boundary of the prostate gland. Second, the texture of the image within the prostate gland is different from that outside the gland, although this is a subtle effect that varies with distance from the transducer probe. Third, the shape of the prostate gland is somewhat uniform, particularly in its aspect ratio. Input configurations were chosen that would allow the network to use these cues if the training program found them. The training program itself is a “black box”; the user cannot tell the network what input features to extract. Rather, the designer must choose input data containing the right information so the network can find the known visual cues.

Since the bright halo around the boundary of the prostate gland is the most prominent visual cue, the first architecture used the maxima of local window means in four directions as input. The idea was similar to looking around at mountains: if large peaks are observed in all directions, the observer must be in a valley. Here, if bright regions are observed in several directions, the point (pixel) of observation is probably inside the halo around the prostate gland. This worked remarkably well for as few inputs as were used (five), verifying the usefulness of the bright halo as a visual cue. Aspect ratio and texture cues were not available to this net since it had insufficient input data to recognize them. To evaluate the importance of texture cues, networks were trained using only local information, such as a few local neighborhood means near the center pixel, but they failed to classify the image correctly. This result showed that some global context is needed to make use of local information. Note that human observers also need global image information to segment ultrasound images of the prostate.

The maxima computation in the first input architecture proved to be a very time-consuming nonlinear operation. To avoid this operation and provide enough input information to utilize all three visual cues, the second input architecture was developed. Here, rectangular region means in the same row and column as the center pixel were used as inputs, along with a rectangle of means around the center pixel. The row and column means allow a network to find the bright halo. Also, the net could interpret the row and column means to measure overall prostate width and height, and thereby look for a reasonable aspect ratio. A net trained with the row and column means alone (94 inputs) gave good classification results.
To include more local information for texture cues, the rectangle of means around the center pixel was added (for a total of 132 inputs). The


local input information improved classification accuracy by 1 to 2 percent on adjacent test images.

Good results from the second input scheme led to a variation using fewer inputs to reduce computation time. This third input scheme centers larger rectangular means about the center pixel, zero padding when the pattern goes off the edge of the image, instead of giving the net all the oblong means in the same row and column as the center pixel. The number of local means was reduced to form a roughly circular pattern around the center pixel. A total of 61 inputs were used. This input architecture provided the best performance of those implemented here.

These networks may or may not be using the known visual cues used by human observers, but designing the network input based on those visual cues was vital to obtaining a network with reasonable complexity. The three network examples described here illustrate how input data can be chosen to obtain networks which are efficient as well as accurate. The design process was as follows: A set of useful visual cues was identified. A simple input scheme was developed in the first network to find the most prominent visual cue, the bright halo. The second input scheme changed the way global information was presented, both to avoid time-intensive maxima computations and to provide access to the aspect ratio cue. It also added more local information to make use of the texture cue. The third input scheme was developed to reduce the number of network inputs and thereby save CPU time. In summary, the goal of preprocessing is to choose input data which point the neural network toward useful characteristics for data classification, without adding superfluous input data.

V. DISCUSSION

The research effort described here has concentrated on identification of possible neural network architectures for segmenting two-dimensional images of the prostate. The results presented here need to be extended before neural networks can be successfully used to segment clinical data in an everyday environment.

The networks investigated so far are all variations on one theme, using data in the same row and column as the center pixel, along with some data close to the center pixel, to obtain a mix of global and local input information. Many other input schemes are possible, and a wider range of them should be investigated in pursuit of greater accuracy and reduced computational complexity. One promising approach is the use of three-dimensional input data, instead of the two-dimensional data now used. This would allow a network to use correlations between image slices.

Another limitation of this research is that training data was taken from a maximum of three images. Prostate size and shape vary from individual to individual. Our third input architecture is relatively insensitive to size variations since the input data are presented without a known reference to position in the image. Since the row and column information may contain the halo around the prostate at widely varying distances from center, depending on the position in the image, size variation effects should be minimal. The network may also be recognizing the zero padding used when the input pattern goes off the edge of the image as a rough measure of position. In any case, size variation effects need to be characterized. Note that size- and shift-invariant neural networks can be implemented, albeit with higher complexity [18, 19]. Complications created by glands


with tumors or other irregularities also need to be considered. To do so, networks must be trained with exemplary images (with known segmentations) from individuals covering the range of likely variations.

The networks reported here have a relatively small number of hidden units. To successfully capture a wider range of information, including most of the prostate gland variations seen in clinical settings, networks with a larger number of hidden units may be required. This would increase the CPU time needed for training and segmentation. Once the neural network is trained, however, only segmentation requires CPU time.

To illustrate the potential of automatic image segmentation using neural networks, consider a hypothetical image processing system using commercially available integrated circuits. The Intel 80170NX neural network chip can process data in a feedforward network with one hidden layer in 8 microseconds [20]. The Intel chip incorporates 64 neuron units and 64 inputs per device. Thus, the last network example of table I, with 61 inputs and 15 hidden units, can be implemented with one such chip. For a 501 x 333 image, each image could be automatically segmented by the neural net chip in 1.33 seconds. Such an implementation would be faster than any human sonographer. While the Intel device is somewhat exotic, only one would be required, so the system could still be inexpensive.
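As a quick check of the timing arithmetic (using the full 501 x 333 image, as the paper does):

$$501 \times 333 \times 8\,\mu\mathrm{s} = 166{,}833 \times 8\,\mu\mathrm{s} \approx 1.33\ \mathrm{s}.$$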

VI. CONCLUSIONS

This paper presents three relatively simple neural networks that have been trained to segment ultrasound images of the prostate. Considering their simplicity, these networks perform surprisingly well. The preprocessing used with these networks attempts to capture the visual cues used by humans to classify image pixels as being inside or outside of the prostate. Use of this global information is key to the operation of these networks. Future work will naturally extend to the use of networks with more global and local input information, as well as the introduction of three-dimensional input data. The “texture” of the image near the pixel of interest may provide additional information useful for segmentation.

A great deal of additional training is needed before these networks can be applied in clinical settings. Many sets of 2-D prostate images covering a wide range of healthy and diseased prostates must be obtained and manually segmented to obtain adequate training data. Nevertheless, the performance of the relatively simple networks described here, coupled with the availability of low-cost neural net hardware, demonstrates the potential of neural networks for automatically segmenting ultrasound images of the prostate.

REFERENCES

[1] Rifkin, M.D., Ultrasound of the Prostate (Raven Press, New York, 1988).

[2] Lilienfeld, R.M., Berman, M., Khedkar, M., and Sporer, A., Comparative evaluation of intravenous urogram and ultrasound in prostatism, Urology 26, 310-312 (1985).

[3] Lee, D.J., Leibel, S., Shiels, R., Sanders, R., Siegelman, S., and Order, S., The value of ultrasonic imaging and CT scanning in planning the radiotherapy for prostatic carcinoma, Cancer 45, 724-727 (1980).

[4] Ling, D., Lee, J.K.T., Heiken, J.P., Balfe, D.M., Glazer, H.S., and McClennan, B.L., Prostatic carcinoma and benign prostatic hyperplasia: Inability of MR imaging to distinguish between the two diseases, Radiology 158, 103-107 (1986).


[5] Yoshio, A., Prostatic Cancer Staging by Ultrasound and Digital Examination, in Diagnostic Ultrasound of the Prostate (Proc. 1st International Workshop on Diagnostic Ultrasound of the Prostate), Resnick, M., Hiroki, W., and Karr, J., Eds., pp. 48-52, Elsevier, New York (1988).

[6] Nakamura, S., Three-dimensional image processing of ultrasonography, IEEE Comp. Graphics and Applications 4, 36-45 (1984).

[7] Nakamura, S., Monitoring of the Three-Dimensional Prostate Shape Under Anti-Androgen Therapy, in Diagnostic Ultrasound of the Prostate (Proc. 1st International Workshop on Diagnostic Ultrasound of the Prostate), Resnick, M., Hiroki, W., and Karr, J., Eds., pp. 137-144 (Elsevier, New York, 1988).

[8] Rumelhart, D., and McClelland, J., Parallel Distributed Processing, Vol. 1: Foundations (MIT Press, Cambridge, MA, 1986).

[9] Lippmann, R., An introduction to computing with neural nets, IEEE ASSP Magazine 5, 4-22 (1987).

[10] Rosenblatt, F., Principles of Neurodynamics (Spartan, New York, NY, 1962).

[11] Sontag, E.D., Remarks on Interpolation and Recognition Using Neural Nets, in Advances in Neural Information Processing Systems 3, R.P. Lippmann, J. Moody, and D.S. Touretzky, Eds., pp. 939-945 (Morgan Kaufmann, San Mateo, CA, 1991).

[12] Baum, E.B., and Haussler, D., What size net gives valid generalization?, Neural Computation 1, 151-160 (1989).

[13] Werbos, P.J., Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences, Ph.D. dissertation in Applied Math, Harvard University, Cambridge, MA (1974).

[14] Luenberger, D.G., Linear and Nonlinear Programming (Addison-Wesley, Reading, MA, 1984).

[15] Barnard, E., and Cole, R.A., A Neural-Net Training Program Based on Conjugate Gradient Optimization, Technical Report No. CSE 89-014, Department of Computer Science and Engineering, Oregon Graduate Institute of Science and Technology, Beaverton, OR (1989).

[16] Hecht-Nielsen, R., Neurocomputing (Addison-Wesley, Reading, MA, 1990).

[17] Matthews, B.W., Comparison of the predicted and observed secondary structure of the T4 phage lysozyme, Biochimica et Biophysica Acta 405, 442-451 (1975).

[18] Elliman, D.G., and Banks, R.N., Shift invariant neural net for machine vision, IEE Proceedings, Part I, Communications, Speech and Vision 137, 183-187 (1990).

[19] Fukushima, K., Miyake, S., and Ito, T., Neocognitron: a neural network model for a mechanism of visual pattern recognition, IEEE Trans. on Systems, Man, and Cybernetics SMC-13, 826-834 (1983).

[20] Intel 80170NX data sheet (Intel Corporation, Santa Clara, CA, 1991).

