Psychiatry Research: Neuroimaging, 45:33-51 Elsevier


Segmentation Techniques for the Classification of Brain Tissue Using Magnetic Resonance Imaging

Gregg Cohen, Nancy C. Andreasen, Randall Alliger, Stephan Arndt, James Kuan, William T.C. Yuh, and James Ehrhardt

Received November 14, 1991; revised version received February 26, 1992; accepted March 1, 1992.

Abstract. A technique is described for classifying brain tissue into three components: gray matter, white matter, and cerebrospinal fluid. This technique uses simultaneously registered proton density and T2-weighted images. Samples of each of the three types of tissue are identified on both image sets and used as “training classes”; these tissue samples are then used to generate a linear discriminant function, which is used to classify the remaining pixels in the image data set. Effects of varying the location and number of training classes have been explored; six pairs of training classes have been found to yield a suitable classification. Interrater and test-retest reliability have been examined and found to be good. Intrascanner and interscanner reproducibility has also been evaluated; classification rates are reproducible within the same individual when the same scanner is used, but in this study poor reproducibility occurs when the same individual is scanned on two different scanners. The validity of the technique has been tested by examining correlations between traced and segmented regions of interest, evaluating correlations with age, and conducting phantom studies, in addition to using visual inspection of the classified images as an indication of face validity. From all four perspectives, the method has been found to have good validity. Additional applications, strengths, and limitations are discussed.

Key Words. Gray matter, white matter, cerebrospinal fluid, image processing.

Before the advent of modern neuroimaging methods such as magnetic resonance (MR), post-mortem techniques were the only ones available to evaluate the integrity of brain structure and to attempt to make pathophysiological correlates with disease states. The disadvantages of post-mortem study are multiple and obvious. Because of its superb anatomic resolution, its capacity to image in multiple planes, and its potential for making repeated evaluations over time, MR provides an unparalleled opportunity to study diseases of the central nervous system and to observe their neuropathological evolution.

Gregg Cohen, M.S., is Research Scientist, Department of Psychiatry; Nancy C. Andreasen, M.D., Ph.D., is Professor of Psychiatry and Director, Mental Health Clinical Research Center; Randall Alliger, Ph.D., is Research Scientist, Department of Psychiatry; Stephan Arndt, Ph.D., is Research Scientist, Department of Psychiatry; and James Kuan, M.D., is Postdoctoral Fellow, Mental Health Clinical Research Center, Department of Psychiatry, The University of Iowa Hospitals and Clinics, Iowa City, IA. William T.C. Yuh, M.D., E.E., and James Ehrhardt, Ph.D., are Professors of Radiology, Department of Radiology, The University of Iowa Hospitals and Clinics, Iowa City, IA. (Reprint requests to Dr. N.C. Andreasen, University of Iowa Hospitals and Clinics, MHCRC, 2711 JPP, 200 Hawkins Dr., Iowa City, IA 52242, USA.) 0165-1781/92/$05.00

© 1992 Elsevier Scientific Publishers Ireland Ltd.

One potential application of MR is to develop techniques for measuring the volume of tissue types and structures. Because it is essentially impossible to dissect apart gray matter and white matter in neuroanatomic specimens, and because cerebrospinal fluid (CSF) is no longer present in preserved brain specimens, studies of post-mortem tissue do not permit the accurate evaluation of relative amounts of gray matter, white matter, and CSF. MR can provide a tool that serves both neuroscience and neuropathology. It can be used to study the volume of tissue distribution in the normal human brain in order to observe potential changes in tissue distribution during the aging process, and it can be used to compare tissue volumes in normal individuals with those in individuals suffering from a variety of disease states such as Alzheimer’s disease or schizophrenia. Techniques for dividing brain tissue into its component types were originally developed for computed tomography (CT; Jernigan et al., 1979). Since the resolution of CT was relatively poor, however, these techniques were limited to classifying tissue into CSF and brain. Further, techniques for manipulating CT images to enhance tissue differentiation are very limited. By contrast, MR lends itself well to the development of approaches to segment brain tissue into component compartments, since variation in repetition or echo time (TR or TE) can dramatically change and even reverse gray scale values. Several research groups have reported techniques for segmenting brain tissue using MR. The earliest approaches to anatomic segmentation relied on intensity contours, differential contours, and outline optimization. In such early work, segmentation techniques were applied to images collected with a single scanning sequence (Kennedy et al., 1989).
An alternate approach, introduced later, has emphasized the use of two different scanning sequences; typically, one is selected that highlights the differentiation between CSF and brain tissue, while a second is selected that produces a high level of contrast between gray and white matter. When scanning data are collected with a multiecho sequence, the images can be acquired simultaneously, eliminating difficulties in movement artifacts or head repositioning. After the multiple echo data have been collected, a variety of mathematical techniques can be applied, including image subtraction, image addition, or other “image math” or statistical methods. Techniques of this type have been reported by several investigators (Kennedy et al., 1989; Ashtari et al., 1990; Pfefferbaum et al., 1990; Jernigan et al., 1991a, 1991b, 1991c; Kohn et al., 1991; Rusinek et al., 1991). In most studies reported to date, these techniques have been shown to be sensitive to changes in CSF volume secondary to the aging process. Using most methods, good to excellent reliability has been reported when different individuals segment the same images. Some groups have also validated their techniques using phantom studies. We report herein on an approach to segmenting brain tissue developed in our laboratory. Since one important potential application is to study changes in individuals over time, we have paid particularly close attention to the issue of reproducibility as we have developed these segmentation algorithms. Segmentation techniques are potentially a quantitative tool for measuring the volumes of gray matter, white matter, and CSF; to be practically useful, the measurements must be shown to be reproducible within the same individual. In addition, since most imaging laboratories are subject to changes in both MR software and hardware, it is


important to determine whether segmentation algorithms are robust to such changes or are affected by them. Consequently, we have compared the values generated by this particular segmentation algorithm on the same individual scanned on different occasions and the same individual scanned using different scanning equipment.

Methods

Equipment and Software. Scanning data were obtained using two different GE scanners. The first was a 1.5 Tesla mobile Signa unit, while the second was a 1.5 Tesla Signa scanner (both from GE Medical Systems). After imaging data were collected, they were processed on a Silicon Graphics 4D 20 Personal IRIS Workstation. Software was developed locally by investigators from the Psychiatric Image Processing Laboratory. The program BRAINCLASS, written in C, is used to apply the segmentation algorithms described below and to classify tissue types. It is part of a group of locally developed programs designed to solve a variety of fundamental problems in image analysis (Andreasen et al., in press).

Identification of Appropriate Scanning Sequence. The ideal scanning sequence for accurate classification of tissue type will have a clear and clean separation between each type and permit completion of a study within an acceptable period of time. We experimented with multiecho sequences, finally deciding on a combined set of images that were either proton density (PD) or T2 weighted. The T2-weighted image gives a very bright signal for CSF, with the rest of brain appearing much darker, and is particularly useful for identifying the boundaries of the subarachnoid space and differentiating the boundaries of the brain from CSF. Thus, this image is used to “cut” the brain and CSF from the skull and to assist in the determination of the separation of the boundaries of the brain from CSF. The PD image, on the other hand, produces the most contrast between gray and white matter components in the brain. This is primarily due to the amount of myelin present in the different components. White matter has more protons per cubic volume than gray matter, primarily due to the presence of the myelin sheath in the white matter. Fig. 1 presents examples of PD and T2-weighted images, selected from a coronal slice sequence. Samples of gray matter, white matter, and CSF are outlined in Fig. 1. After trying a variety of different multiecho sequences, we determined that a multiecho sequence of TE = 30 and 90 msec and a TR = 2,700 msec appeared to yield

Fig. 1. Proton density image (left) and T2-weighted image (right)

These images were acquired coronally with a GE Signa 1.5 Tesla magnetic resonance scanner with a repetition time (TR) of 2700 msec and an echo time (TE) of 30 msec for the proton density (PD) (left) and 90 msec for the T2 (right) with a 5-mm slice thickness and a 2.5-mm interslice gap. Samples of cerebrospinal fluid (CSF), gray matter, and white matter have been outlined on these images. Note the difference in gray scale value for the 3 tissue types on the 2 different images. CSF samples are identified within the lateral ventricle, gray matter within the caudate, and white matter within the adjacent white matter areas.

the greatest contrast between gray matter, white matter, and CSF. The 30-msec echo yields a PD image, while the 90-msec echo yields a T2-weighted image. Imaging plane and slice thickness were other issues. Slices in the coronal plane were selected to cut structures of interest perpendicular to their long axis; most structures of interest (e.g., ventricular system, caudate nucleus, and hippocampus) have their longest extension in the transaxial plane and therefore are subject to the least partial voluming when sampled in the coronal plane. At the time this study was undertaken, we chose to use slice thicknesses of 5 mm with a 2.5-mm gap between the sampled volumes, in order to obtain a maximally strong signal (from the 5-mm slices) and to prevent overlap of radiofrequency (RF) and magnetic gradient signal interference during the readout phase of the image-volume excitation and acquisition (i.e., requiring a gap). Newer software permits slice interleaving and adequate signals from thinner slices. Other imaging parameters for this study were a 256 × 256 matrix, one NEX, FOV = 26.

Evaluation and Elimination of Field Artifacts. Field artifacts are present in most images, due to inhomogeneities both in the magnetic field and in the radio frequencies used to excite the tissue. While ongoing software improvements for most imaging equipment have substantially reduced field artifacts during the past several years, some still remain. These field artifacts produce shading that can sometimes be seen visually on the image and that produces a systematic shift in signal intensity values. Such artifacts may make mathematical segmentation techniques unreliable or inaccurate, and they must therefore be removed in a systematic manner. Our technique for removing field artifacts involves the inversion of the slope of the artifact. Our images showed artifacts in the horizontal and vertical directions that were linear in nature.
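A line-by-line correction for a linear shading artifact of this kind might be sketched as follows. This is only an illustrative sketch, not the BRAINCLASS implementation: the function name `remove_linear_shading`, the use of a least-squares line fit to estimate the slope, and the brain mask argument are all assumptions introduced here.

```python
import numpy as np

def remove_linear_shading(image, mask):
    """Remove a linear intensity trend along each row of a 2-D slice.

    image: 2-D array of gray levels.
    mask:  2-D boolean array, True for pixels to be corrected (assumed brain mask).
    The slope is estimated by a least-squares line fit (an assumption; the
    source does not state how the slope was identified).
    """
    corrected = image.astype(float).copy()
    cols = np.arange(image.shape[1], dtype=float)
    for r in range(image.shape[0]):
        inside = mask[r]
        if inside.sum() < 2:
            continue  # too few pixels on this line to estimate a slope
        x = cols[inside]
        y = corrected[r, inside]
        slope, _intercept = np.polyfit(x, y, 1)
        # Invert the slope about the line's centre so the line mean is unchanged.
        corrected[r, inside] = y - slope * (x - x.mean())
    return corrected
```

Under the same assumptions, the vertical direction could be handled by applying the function to the transposed slice, e.g. `remove_linear_shading(img.T, mask.T).T`.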
The correction for these artifacts involved generating a frequency distribution of pixel values for each horizontal line of data within a slice, identifying the slope of the distribution, and then inverting it; this procedure was done line by line on each slice to remove artifacts in the x/y dimension. No specific procedure was used to remove potential artifacts in the z dimension, since “training classes” for the discriminant function analysis (see below) were collected from different z locations as a technique for correcting for artifacts in this dimension.

Piloting of Image Analysis Techniques. Our initial approach to segmenting images involved the concept of “thresholding”: i.e., finding the point at which the distribution of gray levels breaks down into somewhat discrete distributions. The gray level values of an image can

Fig. 2. Gray level intensity histograms of proton density (left) and T2-weighted (right) images of the brain

[Two histograms; x-axis: gray level intensity; y-axis: number of pixels.]

These histograms represent the distribution of gray level values for the “cut” brains with number of pixels present on the y-axis and gray level intensity on the x-axis. There is a grouping of gray level values for gray matter and cerebrospinal fluid in the proton density (PD) image and a similar grouping of gray level values for the gray matter and white matter in the T2-weighted image. This illustrates the inadequacy of simple thresholding and the need to use a more complex statistical method for the separation of these components.


be displayed as a histogram or frequency distribution, with the x-axis denoting a gray level value or intensity of the pixel and the y-axis denoting the number of occurrences of the gray level in that image. Fig. 2 presents examples of frequency distributions derived from PD and T2-weighted images. The PD image shows two different distributions of gray level values; as Fig. 2 illustrates, however, there is no definite break that clearly defines the components, while the T2-weighted image displays a similar lack of a clear “point of rarity.” An alternate approach involved image subtraction. The approach was used initially to segment CSF from brain. Since CSF is brighter on the PD image than the T2-weighted image, the T2-weighted image can be rescaled so that the mean pixel values are equal in the two images; if the PD image is thereafter subtracted from the T2-weighted image, the contrast between brain and CSF is greater in the subtraction image than in either the T2-weighted or PD images. While this approach worked relatively well, visual inspection and comparison with the scans indicated some degradation in quality. Further, when subsequent “image math” techniques (e.g., image addition) were applied in an attempt to segment gray and white matter, degradation increased further. Additional experimentation suggested a segmentation algorithm based on discriminant function analysis to be preferable.

Discriminant Function Analysis Approach. This approach relies on sampling signal intensity in tissues of known type. Samples are selected from tissue regions that are optimal in the sense that partial voluming is reduced to a minimum. As described in more detail below, we have experimented with varying the number and size of the regions; examples of potential regions include: lateral ventricles (CSF); caudate, hippocampus, thalamus (gray matter); and frontal and parietal white matter. Sample size of the various regions ranges from 50 to 200 pixels.
These regions are visually identified on the images and manually traced. Thereafter, they serve as “training classes,” which are entered into a discriminant function analysis (Afifi and Azen, 1972; Bock, 1975; Finn, 1974; Green, 1978; Lunneborg and Abbott, 1983) to classify tissue type in the slice, and subsequently in the entire image. The discriminant analysis consisted of two basic phases: (1) estimation of the discriminant functions and (2) classification. The manually sampled pixels from given structures form three a priori groups (CSF, gray matter, and white matter) that were measured on two variables (PD and T2 pixel intensity). Both phases were performed on the set of slices forming a particular scan. The estimation phase drew on sampled values to generate linear functions that maximally discriminated the three groups. In particular, the discrimination was achieved by maximizing the ratio of between-group sums of squared deviations (i.e., the intergroup distances) to within-group deviations (i.e., the within-group spread) (Rao, 1952). For our application, we did not pool the within-group variance-covariance matrices and so did not assume that they were homogeneous in the groups as is often done for hypothesis-testing applications. This estimation procedure has a number of advantages. As a standard statistical technique introduced in the late 1940’s by Fisher (Wilks, 1962), it is widely known and has been in use for many years. The technique’s familiarity and wide use in a variety of applications have resulted in a substantial body of literature supporting its acceptance and generalized utility (Kleinbaum et al., 1988). In addition, as a statistical tool, discriminant analysis is readily accessible as a component of most available statistical packages (e.g., SAS Institute, 1990). While statistical packages may require special effort for an adaptation to image processing, they are specifically designed for efficient statistical calculations.
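The estimation criterion described above, the ratio of between-group to within-group sums of squared deviations, can be written out in its textbook form as follows. The notation is introduced here for illustration only and does not appear in the source; the textbook form also uses a pooled within-group matrix, whereas the analysis reported here did not pool the group matrices at the classification stage.

```latex
% Assumed notation (not from the source):
%   x_{gi}      : vector of (PD, T2) intensities for training pixel i in group g
%   \bar{x}_g   : mean vector of group g;  \bar{x} : grand mean
%   n_g         : number of training pixels in group g (g = 1, 2, 3)
B = \sum_{g=1}^{3} n_g \,(\bar{x}_g - \bar{x})(\bar{x}_g - \bar{x})^{\top},
\qquad
W = \sum_{g=1}^{3} \sum_{i=1}^{n_g} (x_{gi} - \bar{x}_g)(x_{gi} - \bar{x}_g)^{\top}

% A linear discriminant function a^{\top} x is chosen to maximize
\lambda(a) = \frac{a^{\top} B \, a}{a^{\top} W \, a}
```

The maximizing vectors are the eigenvectors of W^{-1}B; with three groups and two variables there are at most two such discriminant functions.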
After defining the discriminant function coefficients, calibrated to maximally segregate the groups of sampled pixels in a particular scan, we began the classification phase. For each pixel’s location, the PD and T2 image intensity values were used in the discriminant function. Classification proceeded for each pixel location, slice by slice, throughout the scan. The actual classification of a pixel’s location as representing CSF, gray matter, or white matter uses posterior information to adjust a priori probabilities of group membership (Afifi and Azen, 1972; Bock, 1975). This method is Bayesian in form since the posterior knowledge conditions the probability estimates for group assignment. The estimated discriminant function and the location’s two image intensity values provide the posterior information.
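The two phases can be illustrated with a minimal sketch, assuming equal a priori probabilities for the three tissue classes and a separate covariance matrix per class. The function names `fit_classes` and `classify` are hypothetical, and the rule shown assigns each pixel to the class with the smallest Mahalanobis D²; it is not the BRAINCLASS code, and a full Bayesian rule with unequal covariance matrices would also include a log-determinant term that is omitted here.

```python
import numpy as np

def fit_classes(samples):
    """Estimation phase (sketch).

    samples: dict mapping a tissue name to an (n, 2) array of
             (PD, T2) intensities from the traced training regions.
    Returns a dict of (mean vector, inverse covariance) per tissue.
    """
    model = {}
    for name, x in samples.items():
        x = np.asarray(x, dtype=float)
        mean = x.mean(axis=0)
        cov = np.cov(x, rowvar=False)  # covariance is NOT pooled across classes
        model[name] = (mean, np.linalg.inv(cov))
    return model

def classify(model, pd_img, t2_img):
    """Classification phase (sketch): smallest Mahalanobis D^2 wins."""
    names = list(model)
    feats = np.stack([pd_img, t2_img], axis=-1).astype(float)  # (..., 2)
    d2 = []
    for name in names:
        mean, inv_cov = model[name]
        diff = feats - mean
        # Quadratic form diff^T * inv_cov * diff evaluated at every pixel.
        d2.append(np.einsum("...i,ij,...j->...", diff, inv_cov, diff))
    labels = np.argmin(np.stack(d2, axis=0), axis=0)
    return labels, names
```

Here `labels` holds, for every pixel, an index into `names`; counting the pixels per label in each slice would give the kind of per-slice summary used to compute tissue volumes.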

However, we have no firm a priori knowledge of the relative proportions of CSF, gray matter, and white matter for a particular subject. In lieu of this information, we set the a priori probabilities equal. This has the effect of assigning the pixel location to the group with closest generalized distance (i.e., Mahalanobis D²) adjusted for the group’s spread (i.e., the within-groups sums of squares and cross-product deviations; Bock, 1975). While knowing the exact relative proportions of CSF, gray matter, and white matter may improve the discrimination, using equal initial probabilities makes the analysis rely more on the pixel value’s generalized distance to each group’s centroid. When a pixel value lies in an area of relative ambiguity (e.g., a D² between gray and white matter classifications), the errors will tend to be asymmetrical when the true population proportions are not in fact equal. Specifically, ambiguous values will be assigned in error more often to the truly less frequent group than vice versa. Thus, incorrect a priori probabilities affect the kind of error made more than they affect the total number of classification errors. The total number of errors will be increased only if the probabilities are grossly incorrect and the function shows poor discrimination using generalized distance.

Application of the Tissue Segmentation Technique. Actual steps involved in this technique are as follows:

1. “Cut” the brain from the skull. For this process, we used a combination of edge-detection techniques and manual tracing. The brain is then masked out of the PD and T2-weighted images. This procedure is done using the PD images, since they give the best resolution of CSF/cranial boundary.
2. Remove field artifacts by application of a slope-correction algorithm. The images displayed a general “slope” toward one direction in gray-level intensities. This was corrected through inversion of the slope in the horizontal direction of the image. The vertical direction of the image was also tested and corrected for this type of artifact.
3. Identify “training classes” by selecting tissue samples from preselected gray matter, white matter, and CSF regions. This identification is done using the PD images, since they are best for visually discriminating gray matter from white matter. Six training class areas are identified for each of the three tissue types, drawing data from four different MR slices. This actually yields a total of 12 training classes for each tissue type, since both PD and T2 data are available for each class traced on the PD images and are perfectly registered since they were acquired simultaneously. The rationale for the selection of six classes and the specific location of the classes are described below (Effects of Multiple Training Classes).
4. Apply the discriminant function technique, using BRAINCLASS, to segment the tissue and to generate a gray matter/white matter/CSF mapped image for each image slice. This program also generates summary statistics of the number of pixels in each slice map that correspond to gray matter, white matter, and CSF, thereby permitting calculations of tissue volume in the entire brain.
5. Maps for each slice can be visually displayed and checked to ensure that tissue classification appears to be correct.
6. If desired, edge-detection techniques and other semi-automated or automated approaches can be applied to generate rapid estimates of the volume of brain structures such as caudate, thalamus, or hippocampus. Gray matter in subregions (e.g., prefrontal cortex and temporal cortex) can also be differentiated, and central CSF can be differentiated from peripheral CSF to obtain a measure of sulcal enlargement.

Results

This approach to segmenting tissue was subsequently subjected to a variety of tests of reliability, reproducibility, and validity.

Interrater Reliability: Effects of the Operator on Measurement. To assess


interrater reliability, two independent tracers identified the “training class” regions on a total of 10 scans. Table 1 presents the results of this study. As shown in Table 1, the agreement between two independent raters was good for all three tissue types, ranging from 0.79 to 0.96.

Table 1. Interrater reliability for 10 subjects

                             Gray matter   White matter   Cerebrospinal fluid
Volume
  Pearson correlation           0.96          0.92              0.87
  Intraclass correlation        0.91          0.88              0.87
% volume
  Pearson correlation           0.86          0.79              0.87
  Intraclass correlation        0.86          0.79              0.86
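The two agreement statistics reported in the reliability tables can be illustrated with a short sketch. The source does not state which form of the intraclass correlation was used, so the one-way random-effects form, ICC(1,1), is assumed here purely for illustration; the function names are hypothetical.

```python
import numpy as np

def pearson_r(a, b):
    """Pearson correlation between two sets of measurements."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.corrcoef(a, b)[0, 1]

def intraclass_corr(a, b):
    """One-way random-effects ICC(1,1) for two ratings per subject.

    This form is an assumption; the source does not specify the ICC model.
    """
    data = np.column_stack([a, b]).astype(float)  # rows: subjects, cols: ratings
    n, k = data.shape
    subj_means = data.mean(axis=1)
    grand_mean = data.mean()
    ms_between = k * np.sum((subj_means - grand_mean) ** 2) / (n - 1)
    ms_within = np.sum((data - subj_means[:, None]) ** 2) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
```

Unlike the Pearson correlation, this intraclass correlation is lowered by a systematic offset between the two sets of ratings, which is one reason the two coefficients can differ for the same data.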

Intrarater Reliability. The ability of a single rater to reproduce data on two subsequent tracings, spaced several months apart, was also assessed. For this substudy, 10 scans were also used. As shown in Table 2, reliability was again good for all three tissue types.

Reproducibility: Effect of the Instrument on Reliability. Since one important application of segmentation techniques is to obtain quantitative measurements of tissue change (especially gray matter loss and CSF increase) in a given individual over time, it was deemed important to evaluate the reproducibility of the technique. Consequently, we examined the same individual on two separate occasions using the same scanner/instrument and a different scanner/instrument.

Table 2. Intrarater reliability for 10 subjects

                             Gray matter   White matter   Cerebrospinal fluid
Volume
  Pearson correlation           0.87          0.96              0.67
  Intraclass correlation        0.86          0.95              0.58
% volume
  Pearson correlation           0.88          0.94              0.61
  Intraclass correlation        0.83          0.90              0.60

Reproducibility Using the Same Scanner. Six individuals were studied on two occasions, spaced approximately 2 weeks apart, with the GE 1.5 Tesla Signa scanner, a permanent installation that has been used for the majority of our research. Table 3 presents the results of this substudy. The ratings between the two occasions were good, indicating that the measurements are reproducible when data are collected on the same scanner.

Effects of Using Different Scanners. In a second study, we studied five individuals on two separate occasions using two different scanners, but with identical


Table 3. Intrascanner reliability for 6 subjects

                             Gray matter   White matter   Cerebrospinal fluid
Volume
  Pearson correlation           0.84          0.68              0.57
  Intraclass correlation        0.83          0.68              0.50
% volume
  Pearson correlation           0.83          0.70              0.64
  Intraclass correlation        0.83          0.70              0.51

scanning sequences. One scanner was the GE 1.5 Tesla Signa mobile unit, while the second was the GE 1.5 Tesla Signa permanent installation. Agreement between the two studies was quite poor, with most reliability coefficients in an unacceptable range. This poor interscanner reliability may be due in part to the nature of the temporary GE installation, which was positioned outdoors in a parking lot with heavy construction going on nearby, a situation that probably produced substantial field artifacts and other factors that would diminish the utility of mathematical segmentation techniques. These results suggest that investigators must evaluate interscanner variance in studies that use more than one scanner.

Effects of Multiple Training Classes. Because of the possibility that inhomogeneities are not evenly distributed throughout the brain, and because of the possibility that a larger information base might enhance the accuracy and quality of segmented images, we also explored the effects of using multiple training classes. For this substudy, we compared the results when 3, 6, and 10 training classes were used. Table 4 summarizes the location of the training classes. Fig. 3 presents examples of images generated with the three different sets of training classes. Comparison of the three different sets of images shows that there is a visual advantage in using six-sample training classes over three samples; however, the addition of another four samples to the training class did not show a substantial visual improvement in image quality. Consequently, we have selected an approach that uses six training classes for our segmentation analysis of experimental data sets. This strategy provides a partial correction for artifacts occurring in the z-axis.

Validity.
While reliability places a ceiling on the inferences that can be obtained from measurements, the ultimate test of a method is its validity (i.e., the extent to which the method actually measures what it purports to measure). Determining validity of structural brain measures is, in fact, quite difficult. One obvious approach to validation is to use post-mortem specimens, but this approach has a number of inherent difficulties: (1) The process of formalin fixation may produce artifacts in the images generated. (2) It is impossible to dissect gray matter from white matter cleanly and thoroughly, and CSF has already been lost through the post-mortem brain removal process. (3) Potential artifacts may be produced through brain shrinkage and distortion. We did experiment with scanning formalin-fixed brains, slicing them, digitizing them, and performing segmentation analyses on the formalin-

Table 4. Location of training classes on coronal slices

Three sets (slices 22, 12, 3, 10)
  White matter: Anterior white matter, left; Medial white matter, right; Posterior white matter, left
  Gray matter: Anterior cortex, left; Caudate nucleus, right; Posterior cortex, left
  Cerebrospinal fluid: Extracerebral, anterior, left; Lateral ventricle, right; Lateral ventricle, left

Six sets (slices 22, 12, 3, 10)
  White matter: Anterior white matter, left and right; Medial white matter, left and right; Posterior white matter, left and right
  Gray matter: Anterior cortex, left and right; Caudate nucleus, left and right; Posterior cortex, left and right
  Cerebrospinal fluid: Extracerebral, anterior, left; Lateral ventricle, left and right; Lateral ventricle, left and right

Ten sets (slices 22, 10, 12, 14, 3)
  White matter: Anterior white matter, left and right; Medial white matter, left and right; Medial white matter, right and left; Medial white matter, left and right; Posterior white matter, left and right
  Gray matter: Anterior cortex, left and right; Caudate, left and right; Caudate nucleus, right and left; Caudate, left and right; Posterior cortex, left and right
  Cerebrospinal fluid: Extracerebral, anterior, left and right; Lateral ventricle, left and right; Lateral ventricle, left and right; Lateral ventricle, left and right; Extracerebral, posterior, left and right

Fig. 3. Sample images showing the difference in image quality when different numbers of training classes are used

[Panels: (a) 3 training regions/class; (b) 6 training regions/class; (c) 10 training regions/class. WM = white matter; GM = gray matter.]

These examples show the effects of using 3 (a), 6 (b), and 10 (c) training classes. Note that image quality appears to be good when 6 or 10 training classes are used.


fixed images, which could potentially be compared to the segmented MR images. Because of the many artifacts that occurred in the MR images, however, we chose to emphasize other indirect indicators of validity. These include reproducibility, face validity, correlations between traced and segmented regions of interest, correlations with age, and phantom studies.

Face Validity. Segmented images can be visually checked against MR images and against samples of post-mortem brain slices obtained either through anatomy/pathology laboratories or through inspection of a variety of standard atlases. To the extent that segmented images provide an excellent visual simulation of the size and shape of recognized brain structures, they can be presumed to have face validity (i.e., they conform to common knowledge). Face validity is not a quantitative measure, and therefore it cannot stand alone as an index of validity. Nevertheless, a good visual agreement between processed images and unprocessed natural images suggests that the data are accurate. Fig. 4 displays (a) an MR image, (b) a segmented image, and (c) a post-mortem brain slice of the same region. The anatomy in all images is clearly very similar.

Correlations Between Traced and Segmented Regions of Interest. In our ongoing MR research, we are obtaining estimates of a variety of brain structures and components, such as the ventricular system, the caudate, the hippocampus, and overall cerebral size. In these studies, structures are identified in all slices in which they are present by an operator who uses either edge-detection techniques or manual tracing. These measurements can be compared with those generated through segmentation techniques. Each gives an independent assessment of the size of the same structure. To the extent that the measurements agree, segmentation techniques can be considered to be valid. We are currently in the process of exploring the relationship between segmentation and manual tracing for a variety of structures. At present, we have analyzed agreement for two measurements in two separate samples of subjects: cerebral size (n = 74) and ventricular system (n = 10). Table 5 reveals that the agreement between the traced and segmented measurements is excellent, indicating that the segmented images can be considered to have a reasonably high level of accuracy, at least for brain and CSF. A subsequent report will address the agreement for subcortical structures such as caudate, hippocampus, and amygdala.

Correlations With Age. A variety of studies have indicated that CSF volume increases in the normal human brain as a consequence of the aging process (Grant et al., 1987; Pfefferbaum et al., 1990; Jack et al., 1991; Jernigan et al., 1991a; Kohn et al., 1991). We have indirectly assessed the validity of our particular segmentation algorithm by evaluating its sensitivity to detect similar increases in CSF over time. Data are now available for CSF volume, gray matter volume, and white matter volume in a sample of 52 healthy male volunteers who range in age from 20 to 70. Fig. 5 plots the changes in volume of CSF, gray matter, and white matter in relation to age. As expected, CSF increases with age, while gray matter decreases. Thus, our technique shows excellent sensitivity in detecting changes in brain structure that occur as a consequence of the aging process.

Fig. 4. Comparison of (a) T, weighted mid-coronal image, (b) a classified mid-coronal image, and (c) a post-mortem mid-coronal brain slice

Phantom Studies. To develop a phantom, appropriate materials must be found that provide imaging characteristics similar to those of the material that one wishes to simulate. Although there have been many studies of materials that would be suitable for MR phantoms under various circumstances, there was no mention in the literature of a material that provided tissue-equivalent imaging under both PD and T2-weighted imaging. Polyacrylamide, agarose, gadolinium chelates (Gd-DTPA), cupro and mangano salts, and silicone were all mentioned in the literature (Chui et al., 1985; Kneeland et al., 1986; Lutz and Schultz, 1986; Zhu et al., 1986; Bakker et al., 1987; DeLuca et al., 1987; Goldstein et al., 1987; Gray and Felmelee, 1987). However, none was used for dual-echo imaging. On the basis of the work of Kohn et al. (1991), we chose to experiment with different concentrations of graphite in agarose solutions. The first type of phantom that was designed was of the test-tube type. This

Table 5. Comparison of cerebral and ventricular system volumes from manual tracings vs. tracings from segmented images

                             Segmented (cc)       Traced (cc)          Pearson
                             Mean       SD        Mean       SD        r
Cerebral volume (n = 79)     1349.1     $49.4     1272.4     f 46.8    0.90
Ventricle volume (n = 10)    26.4       4.0       27.5       3.7       0.98

Fig. 5. Mean gray matter, white matter, and CSF volume in male normal controls as a function of age

[Plot not reproduced. Horizontal axis: decade (20-29, 30-39, 40-49, 50-59, 60+).]

Note that cerebrospinal fluid (CSF) increases with age, as does white matter, while gray matter decreases.

phantom consisted of 17 Corning 50 ml centrifuge tubes filled with varying concentrations of graphite (0-0.25 g/50 ml) in 1% agarose. The tubes are placed upright in the head coil of the scanner, and images are acquired normally (as they would be for a standard PD-T2 imaging study). The images from these studies are used to determine the appropriate concentrations of graphite in agarose (i.e., to match the magnitude of signal intensity produced by brain tissue, which is, for example, 45-70 for T2-weighted images). After determination of the appropriate concentrations of graphite to be used for simulation of gray matter, white matter, and CSF (0.08, 0.15, and 0.02 g/50 ml agarose, respectively), two phantoms were constructed. The first phantom consisted of three layers of material and was used to test the edge response of the system (see Fig. 6). This was accomplished by placing the phantom object at a 27° angle to the plane of imaging, creating regions in which each voxel contained a mixture of the materials (approximately 5 voxels per interface). The second phantom also consisted of three layers of material but had small irregular objects embedded within that matrix (see Fig. 7). These were used for testing the ability of the segmentation algorithm to detect objects, and also to permit calculation of volumes

Fig. 6. Examples of an edge phantom, as seen with a proton density image (a) and a T2-weighted image (b); the classified image corresponds closely to the edges seen with magnetic resonance (c)

for these objects (with previously known volumes). Table 6 shows the results of the volumetric determinations, while Figs. 6 and 7 show the ability of the algorithm to

Table 6. Comparison of known volumes to measured volumes with segmented images in phantom

Object     Real volume (cc)   Measured volume (cc)   Difference (cc)   % Difference
Star       18.6               18.5                   -0.1              0.5
Plus       27.2               27.0                   -0.2              0.7
Concave    30.0               29.9                   -0.1              0.3
O-cup      31.0               30.8                   -0.2              0.6
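The difference and percent-difference columns of Table 6 follow directly from the known and measured volumes. The short Python sketch below recomputes them from the tabulated values; the helper name `volume_error` and the dictionary layout are illustrative choices, and the percent difference is assumed to be the absolute difference expressed relative to the known volume.

```python
# Known and measured volumes (cc) as listed in Table 6.
phantom_objects = {
    "Star": (18.6, 18.5),
    "Plus": (27.2, 27.0),
    "Concave": (30.0, 29.9),
    "O-cup": (31.0, 30.8),
}


def volume_error(real_cc, measured_cc):
    """Return (difference in cc, absolute percent difference), rounded to 1 decimal."""
    diff = measured_cc - real_cc
    pct = abs(diff) / real_cc * 100.0  # assumed definition: relative to the known volume
    return round(diff, 1), round(pct, 1)


errors = {
    name: volume_error(real, measured)
    for name, (real, measured) in phantom_objects.items()
}

for name, (diff, pct) in errors.items():
    print(f"{name:8s} difference = {diff:+.1f} cc, {pct:.1f}%")
```

Under that assumption the recomputed values match the table: every object is underestimated by 0.1 to 0.2 cc, which is less than 1% of its known volume.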

Fig. 7. Edge phantom composed of three different magnetic resonance (MR) responsive materials with small objects embedded in it

[Images not reproduced. Panels: T2 image, PD image, classified image.]

These objects have been selected to have variable shape and a known volume. The edges of the filled object are plastic, with no MR response, and have been rejected as being one of the 3 materials present. Images are shown of a proton density (PD) image, a T2-weighted image, and a classified image.

segment the edge phantom image and reject those areas that are not discernibly one material over another.

Discussion

We have described a technique for segmenting brain tissue into its three components (gray matter, white matter, and CSF), using information derived from two MR scanning sequences that were collected simultaneously. This method has been demonstrated to have good reliability, good reproducibility (if data are collected using the same scanner), and good validity. Thus, the method compares favorably with others that have been previously reported.
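The classification step summarized here (tissue samples drawn from paired PD and T2-weighted intensities, then a linear discriminant function applied to the remaining pixels) can be illustrated with a small Python sketch. This is not the implementation used in the study: it is a generic two-feature linear discriminant with a pooled covariance matrix and equal priors, and the class centers, noise level, and function names (`fit_lda`, `classify`) are assumptions made only for the example.

```python
import random


def fit_lda(training):
    """Fit a two-feature linear discriminant (equal priors assumed).

    training maps a class name to a list of (pd_intensity, t2_intensity) samples.
    Returns a dict mapping each class to (w_pd, w_t2, bias).
    """
    means = {}
    for name, pts in training.items():
        n = len(pts)
        means[name] = (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)

    # Pooled within-class covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sxy = syy = 0.0
    for name, pts in training.items():
        mx, my = means[name]
        for x, y in pts:
            sxx += (x - mx) ** 2
            sxy += (x - mx) * (y - my)
            syy += (y - my) ** 2
    dof = sum(len(pts) for pts in training.values()) - len(training)
    sxx, sxy, syy = sxx / dof, sxy / dof, syy / dof

    # Closed-form inverse of the 2x2 covariance matrix.
    det = sxx * syy - sxy * sxy
    ixx, ixy, iyy = syy / det, -sxy / det, sxx / det

    model = {}
    for name, (mx, my) in means.items():
        w_pd = ixx * mx + ixy * my
        w_t2 = ixy * mx + iyy * my
        bias = -0.5 * (w_pd * mx + w_t2 * my)
        model[name] = (w_pd, w_t2, bias)
    return model


def classify(model, pd_intensity, t2_intensity):
    """Return the class with the highest linear discriminant score for one pixel."""
    def score(name):
        w_pd, w_t2, bias = model[name]
        return w_pd * pd_intensity + w_t2 * t2_intensity + bias
    return max(model, key=score)


# Synthetic training samples around hypothetical (PD, T2) class centers.
rng = random.Random(0)
centers = {"gray": (120.0, 60.0), "white": (100.0, 45.0), "csf": (90.0, 110.0)}
training = {
    name: [(rng.gauss(cx, 3.0), rng.gauss(cy, 3.0)) for _ in range(50)]
    for name, (cx, cy) in centers.items()
}

model = fit_lda(training)
labels = [classify(model, p, t) for p, t in [(118.0, 62.0), (101.0, 44.0), (92.0, 105.0)]]
print(labels)
```

Each pixel is assigned to whichever tissue class gives the largest score, so the decision boundaries between classes are straight lines in the PD/T2 intensity plane.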

Applications. Segmentation techniques have a number of important potential applications. Since they provide quantitative estimates of the tissue volume of

specific brain compartments, individual data can be compared to normed data to determine whether an individual is within the appropriate normative range. This application is not as yet fully developed, since it requires the collection of relatively large amounts of normative data. Nevertheless, it has potential future applications in both clinical radiology and in neuroscience research. A second application is to provide a quantitative measure of changes in tissue of a given individual over time. Since the method has been shown to have good reliability and reproducibility, it can potentially be used to monitor tissue loss in a variety of pathological states, such as Alzheimer’s disease or schizophrenia. It can also be used to study the normal aging process in older individuals or the normal developmental process in children. A third potential application involves increasing the efficiency of image-analysis techniques that seek to measure the volume of cerebral substructures. At present, such studies tend to be quite labor intensive. That is, they tend to rely heavily on manual tracing of the borders of structures in serial slices, followed by summing and generating volumetric estimates; visually guided manual tracing is required for many subcortical structures, since edge-detection techniques are not at present sufficiently powerful. As thinner MR slices are acquired, the accuracy of measuring volumes potentially increases, but on the other hand the amount of information generated is enormous, suggesting the value of finding methods that permit the application of semi-automated techniques for boundary detection. Since segmented images contain only three shades on the gray scale, they lend themselves well to edge detection. 
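Because a segmented image holds only a few discrete labels, a boundary can be found by simply checking whether neighboring pixels carry different labels, with no intensity threshold to tune. The Python sketch below illustrates this on a toy slice; the label coding (0 = CSF, 1 = gray matter, 2 = white matter) and the function name `label_boundaries` are hypothetical and are not taken from the method described in this paper.

```python
def label_boundaries(labels):
    """Mark pixels that have at least one 4-connected neighbor with a different label."""
    rows, cols = len(labels), len(labels[0])
    edges = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                inside = 0 <= rr < rows and 0 <= cc < cols
                if inside and labels[rr][cc] != labels[r][c]:
                    edges[r][c] = True
                    break
    return edges


# Toy classified slice: 0 = CSF, 1 = gray matter, 2 = white matter (hypothetical coding).
slice_labels = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 2, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

edges = label_boundaries(slice_labels)
for row in edges:
    print("".join("#" if e else "." for e in row))
```

On real slices the same test would mark only the thin interfaces between tissue classes, which is what a semi-automated boundary tracer needs as input.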
An additional advantage of segmented images is that they provide clearer boundaries for some structures, such as the thalamus, that are relatively difficult to see on PD, T1-weighted, or T2-weighted images, even when the images are of high quality.

A fourth application is in the domain of functional neuroimaging. One problem in functional imaging involves identifying the boundaries of brain structures, such as the cortical rim or subcortical nuclei. Even with high-resolution tomographs, the boundaries of these regions are not clearly identified with functional images. If MR images can be overlaid with functional images, either through the use of stereotactic head-fixation devices or through computerized methods such as the fitting of boundaries of the two images, then segmentation techniques can potentially yield a highly accurate and useful method for identifying regions of interest on functional images. Because segmentation techniques identify clear boundaries between brain and CSF, they may also reduce partial voluming problems that arise when functional imaging techniques are used to measure cortical metabolic function in conditions such as Alzheimer’s disease, since cortical atrophy may produce an intermixing of gray matter and CSF, thereby artificially reducing estimates of cortical gray matter metabolism. Segmented images can be used to identify the boundaries of the cortex and also to identify those cases where cortex and CSF are combined in functional images.

Strengths. In addition to the above applications, our proposed method for segmentation has a variety of other strengths. To our knowledge, it is the first segmentation technique that reports a rigorous exploration of the effects of modifying parameters on the classification paradigm (e.g., effects of changing the

scanner, effects of changing location and number of training classes). This technique is relatively simple to use and permits relatively rapid processing of images. Once the data are loaded on a workstation, and the brain tissue is separated from the rest of the skull, steps 2-5 (see Methods) can be performed in approximately 10 minutes.

Limitations. The segmented images so closely resemble MR images and postmortem brain slices that it is tempting to surmise that they represent actual measures of gray matter, white matter, and CSF. Nevertheless, they are best conceived of as representing gray-like matter, white-like matter, and CSF-like material, particularly given that there is no direct validation of this method. They must in fact be considered to be approximations of tissue volume, not measurements. They are derived from MR images, and the information used to generate the segmented estimates inevitably contains some mixture of tissues in particular samples. This is especially likely to be the case when gray matter is sampled from the cortex. We have not yet assessed the various limitations that may arise in this method if it is applied more broadly. For example, it is possible that reproducibility might be diminished if a 1-year time interval were allowed to elapse between scanning sequences. Although we have assessed the effects of changing scanners, we have not assessed the effects of upgraded software. Software improvements could potentially affect the reproducibility of the method as well. Finally, the effects of tissue pathology have not been assessed; these might potentially affect the accuracy of classification of gray matter (particularly if cortical gray matter is more prominently partially volumed in individuals with disorders such as Alzheimer’s disease) or if other abnormalities that are visible to the naked eye (e.g., vascular disease and infectious processes) occur in tissue.

Future Directions. We have reported on a technique that draws on two scanning sequences, one of which is PD-weighted and the other of which is T2-weighted. These two sequences have the advantage of being complementary images of one another, of being readily collected simultaneously with multiecho sequences, and of therefore having perfect simultaneous image registration. Additional information and accuracy might be added if T1-weighted images were introduced as well. Such an addition is technically difficult, however, since it typically cannot be collected at the same time as the PD and T2-weighted data, and it therefore might have diminished accuracy due to slight changes in head position in the two separate studies. Alternatively, the application of discriminant analysis techniques for tissue classification needs further exploration with a single T1-weighted sequence. Other future directions to be explored include the effects of a variety of disease states on the segmentation data, long-term stability of the method, and collection of a relatively large normative sample to enhance its broad clinical utility.

Acknowledgments. This research was supported in part by National Institute of Mental Health grants MH-31593, MH-40856, and MHCRC-43271; The Nellie Ball Trust Fund, Iowa State Bank & Trust Company, Trustee; and Research Scientist Award MH-00625.

References

Afifi, A.A., and Azen, S.P. Statistical Analysis. New York: Academic Press, 1972.
Andreasen, N.C.; Cohen, G.; Harris, G.; Cizadlo, T.; Parkkinen, J.; Rezai, K.; and Swayze, V.W. II. Image processing for the study of brain structure and function: Problems and programs. Journal of Neuropsychiatry and Clinical Neurosciences, in press.
Ashtari, M.; Zito, J.L.; Gold, B.I.; Lieberman, J.A.; Borenstein, M.T.; and Herman, P.G. Computerized volume measurement of brain structure. Investigative Radiology, 25:798-805, 1990.
Bakker, C.J.G.; de Graaf, C.N.; and van Dijk, P. Derivation of quantitative information in NMR imaging: A phantom study. Physics in Medicine and Biology, 29(12):1511-1525, 1984.
Bock, R.D. Multivariate Statistical Methods in Behavioral Research. New York: McGraw-Hill, 1975.
Chui, M.; Bakesley, D.; and Mohapatra, S. Test method for MR images slice profile. Journal of Computer Assisted Tomography, 9(6):1150-1152, 1985.
DeLuca, F.; Maraviglia, F.; and Mercurio, B.A. Biological tissue simulation and standard testing material for MRI. Magnetic Resonance in Medicine, 4:189-192, 1987.
Finn, J.D. A General Model for Multivariate Analysis. New York: Holt, Rinehart and Winston, 1974.
Goldstein, D.C.; Kundel, H.L.; Daube-Witherspoon, M.E.; Thibault, L.E.; and Goldstein, E.G. A silicone gel phantom suitable for multimodality imaging. Investigative Radiology, 22:153-157, 1987.
Grant, R.; Condon, B.; Lawrence, A.; Hadley, D.M.; Patterson, J.; Bone, I.; and Teasdale, G.M. Human cranial CSF volumes measured by MRI: Sex and age influences. Magnetic Resonance Imaging, 5:465-468, 1987.
Gray, J.E., and Felmelee, J.P. Section thickness and contiguity phantom for MR imaging. Radiology, 164:193-197, 1987.
Green, P.E. Mathematical Tools for Applied Multivariate Ana
