Longitudinal Patch-Based Segmentation of Multiple Sclerosis White Matter Lesions.

HHS Public Access Author manuscript Author Manuscript

Mach Learn Med Imaging. Author manuscript; available in PMC 2016 August 24. Published in final edited form as: Mach Learn Med Imaging. 2015 October ; 9352: 194–202. doi:10.1007/978-3-319-24888-2_24.

Longitudinal Patch-Based Segmentation of Multiple Sclerosis White Matter Lesions Snehashis Roy1,★, Aaron Carass2, Jerry L. Prince2, and Dzung L. Pham1 1Center

for Neuroscience and Regenerative Medicine, Henry Jackson Foundation

2Department

of Electrical and Computer Engineering, The Johns Hopkins University

Author Manuscript

Abstract

Author Manuscript

Segmenting T2-hyperintense white matter lesions from longitudinal MR images is essential in understanding progression of multiple sclerosis. Most lesion segmentation techniques find lesions independently at each time point, even though there are different noise and image contrast variations at each point in the time series. In this paper, we present a patch based 4D lesion segmentation method that takes advantage of the temporal component of longitudinal data. For each subject with multiple time-points, 4D patches are constructed from the T1-w and FLAIR scans of all time-points. For every 4D patch from a subject, a few relevant matching 4D patches are found from a reference, such that their convex combination reconstructs the subject’s 4D patch. Then corresponding manual segmentation patches of the reference are combined in a similar manner to generate a 4D membership of lesions of the subject patch. We compare our 4D patch-based segmentation with independent 3D voxel-based and patch-based lesion segmentation algorithms. Based on ground truth segmentations from 30 data sets, we show that the mean Dice coefficients between manual and automated segmentations improve after using the 4D approach compared to two state-of-the-art 3D segmentation algorithms.

1 Introduction

Author Manuscript

Magnetic resonance imaging (MRI) is a widely used noninvasive modality to image the human brain. Segmentations of T2-hyperintense lesions from serial MR images provide quantitative information about lesion evolution, which is associated with the progression and prognosis of MS. Longitudinal information, such as the appearance of new lesions, as well as enlargement or reduction of lesions over time are routinely used as a clinical measures for disease progression and response to treatment. Although manual delineations of lesions are considered as ground truth, they are time consuming, prone to extensive inter-rater variability, and impractical for large clinical data sets [1]. Therefore accurate automatic lesion segmentations from longitudinal MR scans are desirable. There are multiple ways to longitudinally segment lesions from serial MR images. The simplest way is to segment lesions at each time-point independently. Many lesion segmentation techniques are available for segmenting lesions from a single time-point (cf. ★Support for this work included funding from the Department of Defense in the Center for Neuroscience and Regenerative Medicine and by the grants NIH/NINDS R01NS070906, NIH/NIBIB R21EB012765.

Roy et al.

Page 2

Author Manuscript

[2,3,4]). However, applying 3-D algorithms to 4-D data results in inconsistencies due to differences in noise or intensity variations in lesions between scans. Particularly, in the detection of chronic lesions, it is advantageous to consider multiple time-points to inform the segmentation at a single time-point.

Author Manuscript

Registration based approaches [5] first register all the time-points to an atlas (or baseline). Once registered, new appearing or shrinking lesions are found by directly comparing voxelwise intensity changes. Although this is a true “4D” approach, its accuracy depends on that of the registrations, which may not be robust in presence of lesions. Another type of segmentation method relies on intensity models of pairwise subtraction images. Images from two consecutive time-points are first registered. Then a logistic regression model or a statistical threshold is applied on the subtraction image to find significant intensity variations, which leads to incident lesions in the follow-up scan. Several approaches based on time series have also been reported for automatic segmentation of MS lesions. In these methods, temporal derivatives [6] or principal component analysis of the temporal profiles of voxel-wise intensity features are used to characterize the lesion progression in MS. However, they typically need more time-points for stable results, which may not be feasible in a clinical scenario. In this paper, we propose a supervised 4D method for segmenting MS lesions from serial MR images. We do not employ any pair-wise difference images, but use features from all the time-points simultaneously, thereby producing a longitudinally consistent complete lesion map at every time-point. Instead of using voxel-wise features, we use patch based features, which have been shown to be preferable in many image processing tasks [7,8,9].

Author Manuscript

2 Materials and Method 2.1 Dataset The dataset contains ten subjects, each with three time-points. Each time-point contains a T1-w MPRAGE and a FLAIR image, scanned at 0.83 × 0.83 × 1.1 mm3 and 0.83 × 0.83 × 2.2 mm3 resolution, respectively. Two consecutive scans are separated by approximately one year for all the subjects. Among the 10 subjects, 8 of them were diagnosed with relapsingremitting MS (RRMS), one with secondary-progressive MS (SPMS), and one with primaryprogressive MS (PPMS). Lesions were manually delineated on the 30 time-points to generate binary reference segmentations. 2.2 Method The algorithm requires at least one subject to be withheld as the training data, called a

Author Manuscript

reference, defined as a (m + 1)-tuple of T images, where

to

,

denotes m-channel input MR images, such as T1-w, T2-w, PD-w, or FLAIR,

for the tth time-point, t = 1, …, T. denotes the manual binary segmentations of lesions. T denotes the total number of time-points. Although we use only MPRAGE and FLAIR sequences in all our experiments (i.e., m = 2), any number of sequences can be accommodated in the methodology. Similarly, any number of reference images can be used,

Mach Learn Med Imaging. Author manuscript; available in PMC 2016 August 24.

Roy et al.

Page 3

Author Manuscript

as shown in Sec. 3.1. However, we define the notation with just one reference for clarity. Images and scalars are denoted by lowercase variables, while vectors are denoted by bold lowercase variables. Matrices are represented by uppercase variables. The subject is denoted as an m-tuple of T images,

.

Note that the subject and reference need not be registered. MR images and , k = 1, …, m, are intensity normalized so that the modes of their white matter (WM) intensities are unity. The WM intensity modes were found automatically based on a smooth kernel density estimator of the histograms.

Author Manuscript

For every reference MR image, 3D patches are extracted around every voxel, and are transformed into 1D vectors of dimension d. For example, for 3 × 3 × 3 patches, d = 27. Patches from multiple time-points are concatenated to form a subject feature vector. Subject features are denoted by b(j) ∈ ℝmTd×1, where d × 1 patches from m sequences and T timepoints are concatenated. Similarly, a reference MR feature aMR(i) ∈ ℝmTd×1 is the

Author Manuscript

, t = 1, …, T. concatenation of patches from m MR images of T time-points Corresponding manually segmented patches are concatenated to form segmentation features aS(i) ∈ ℝTd×1. The elements of aS(i) are ∈ {0, 1}. i and j denote the voxels in the reference and subject space, i = 1, …, M, j = 1, …, N, where M and N are the number of non-zero voxels in the reference and subject, respectively. From now on, we also refer to the MR (mTd × 1) or segmentation (Td × 1) feature vectors as a “patch”. All MR features are normalized so that their ℓ2 norms are unity [7]. A reference MR dictionary of patches is defined as AMR ∈ ℝmTd×M, whose columns are the MR patches aMR(i). Similarly a reference segmentation dictionary is defined as AS ∈ ℝTd×M, whose columns are the segmentation patches aS(i). In Sec. 3.1, we will explain a strategy to modify the dictionaries AMR and AS in order to improve the accuracy and speed of the algorithm. Since the subject MR contrasts are co-registered, a subject patch b(j) contains information about temporal trajectories of the lesion evolution around the center voxel j. Given a rich dictionary of patches AMR, we assume that it contains similar looking examples like b(j). Therefore it is likely that for a given b(j), a few patches from AMR can be found whose convex combination is b(j). More specifically,

(1)

Author Manuscript

where x(j) contains sparse weights for each of the M reference patches in the dictionary AMR. Minimizing the ℓ0 norm of x(j) enforces sparsity, which results in only a few patches in AMR contributing to b(j). The non-negativity constraints in x(j) enforces similarity in the texture between the subject patch and the contributing reference patches. Eqn. 1 can be efficiently solved by minimizing theℓ1 norm of x(j). We use an elastic net regularization, which generates x(j) by minimizing the following objective function,


Roy et al.

Page 4

Author Manuscript

(2)

Author Manuscript

The first term is a least squares fitting term enforcing the similarity between the subject patch b(j) and its reconstruction using the dictionary AMR. The second term is an ℓ1 regularization enforcing sparsity in x. The third term is a ℓ2 ridge regularization. Combination of the ℓ1 and ℓ2 penalties together enforces a grouping effect, where similar looking patches in the dictionary are chosen while keeping a certain level of sparsity in x̂(j). λ1 and λ2 are two weights balancing the contributions of ℓ1 and ℓ2 terms. We empirically chose λ1 = λ2 = 0.01. Once x̂(j) is estimated for every subject patch, the lesion membership can be obtained at the jth voxel by combining the segmentation patches aS(i) using the same weights applied to the segmentation patches, μ(j) = ASx̂(j), where μ(j) is a Td × 1 4D lesion membership at the jth voxel. Since we are only interested in the WM lesions, the false positives due to FLAIR hyperintensities in the gray matter are removed by multiplying the memberships with WM masks obtained from LesionTOADS [3]. The memberships are thresholded at a threshold to generate hard segmentations, empirically determined in Sec. 3. 2.3 Dictionary Modification

Author Manuscript

The sparse coefficient vector x̂(j) is an M × 1 vector, where M is the number of reference voxels, with typically M ~ 107. It is computationally expensive to solve Eqn. 2 for N subject patches, where N ~ 107 as well. It is common practice to subsample the dictionary to reduce its size and thereby reduce computational expense. However, in this case, special consideration must be paid to the way the subsampling is performed in order to sufficiently represent the different ways that lesion voxels may evolve. For T timepoints at a single voxel, there are 2T possible combinations for a lesion mask to evolve. We therefore create 2T sub-dictionaries to ensure that every possible lesion evolution time series contains a sufficient number of samples. This is performed as follows. For every Td × 1 reference manual segmentation feature vector aS(i), T center voxels from the T time-points are considered. We first characterize a reference feature aS(i) (and aMR(i)) according to the labels of those T center voxels of the T reference manual segmentation patches. Since each element in aS(i) ∈ {0, 1}, there are (2T − 1) combinations for at least one of the T center voxels being a lesion (i.e., label 1), and one where all center voxels are zero, i.e., normal tissue. Therefore, we divide AMR and AS into 2T sub-dictionaries AMR(k) ∈ ℝmTd×Lk, AS(k) ∈ ℝTd×Lk, k = 1, …, 2T, according to the label types of the center voxels of the manual segmentation patches. Lk is the number of patches in each sub-dictionary, and

Author Manuscript

. Instead of using the whole reference dictionary AMR for every subject patch b(j) in Eqn. 2, we use a kd-tree to find n similar looking MR reference patches from each of the 2T number of AMR(k) and combine them in a new subject patch specific dictionary BMR(j), having n×2T reference patches. Corresponding segmentation patches are combined to form a smaller segmentation dictionary BS(j) as well. With T = 3 in our case, we choose n = 200, reducing the number of eligible reference patches from M ~ 107 to 1600. Eqn. 2 is solved for


Roy et al.

Page 5

Author Manuscript

each subject patch b(j) using the patch specific dictionary BMR(j). Then the memberships μ(j) is obtained with BS(j). By choosing a small number (n = 200) of similar looking patches from each of the sub-dictionaries, the computational burden in Eqn. 2 is reduced by including all possible types of reference lesion patches, therefore keeping the richness of the dictionary.

3 Results

Author Manuscript

We compare our 4D method with two other methods, a voxel based method LesionTOADS [3] and a 3D patch-based method [4], both of which segment each 3D volume independently of the other longitudinally acquired volumes. Fig. 1 shows segmentations obtained from these three methods. Although the lesion intensity change across time-points are visually imperceptible, both LesionTOADS and the 3D patch method sometimes miss or underestimate (yellow arrows) lesions as shown in Fig. 1. In comparison, our 4D patch method produces a more accurate and temporally consistent segmentation. Quantitative comparisons between automatic and manual segmentations are carried out using sensitivity, Dice coefficient, lesion true positive rate (LTPR), lesion false positive rate (LFPR), and absolute volume difference (VD), as defined in [2]. Voxel-based overlap measures such as sensitivity and Dice are affected if the lesion volume is small [2], because lesion boundaries are often significantly different between raters. Hence, LTPR and LFPR are lesion count based measures indicating if a lesion is detected in the automated segmentations.

Author Manuscript

After the membership images (μ(j) at every voxel) are obtained, they are thresholded to generate a hard segmentation. The optimal threshold is obtained using a cross-validation on two randomly chosen subjects. One of them is treated as test data, and is segmented using the other one as the reference, and vice versa. The maximum average Dice is observed to be 0.734 at a threshold of 0.44. This threshold is used for the remainder of the experiments. 3.1 Number of Reference Images

Author Manuscript

As mentioned earlier, multiple references can be included in our framework. To include more than one reference image in Eqn. 2, AMR and AS are populated with patches from all the reference images. Like any supervised method, the performance of our lesion segmentation method depends on the variability of the reference patches such that matching example patches can always be found for any subject patch. Therefore it is expected that the segmentation accuracy will improve by increasing the number of reference images. In this section, we describe an empirical procedure to select the number of reference images. Five subjects are randomly chosen for this experiment. Four subjects are again randomly chosen as potential reference images, and then all five subjects are segmented using one or more of the four references. Then Dice coefficients are averaged over all 15 time-points in each case. Median Dices are 0.468, 0.539, 0.549, and 0.549 for 1–4 references, respectively. Therefore, we used three references to segment all the 10 subjects.


Roy et al.

Page 6

3.2 Quantitative evaluation

Author Manuscript

In this section, we compare our 4D method with three references against LesionTOADS and a 3D patch-based method on 10 subjects. The 3D patch-based method also uses the same three references. Lesion volumes are often used as clinical measures in MS progression. Hence we have plotted automatically detected lesion volumes (in cc) with respect to the ground truth in Fig. 2 (a). Solid blue, magenta, red, and black lines indicate linear fits of LesionTOADS, 3D and 4D patch-based methods, and the unit slope line. The slopes are 0.594, 0.748, and 0.951 for LesionTOADS, 3D and 4D patch-based method, respectively, while the R2 for the linear fits are 0.488, 0.732, and 0.889, respectively. The intercepts are 6.75, 2.88, and 1.75 cc, respectively. A t-test for the null hypothesis that the intercepts are zero gives p-values 5 × 10−4, 0.060 and 0.079 for LesionTOADS, the 3D and the 4D patchbased methods, indicating that LesionTOADS has a significant bias (6.75 cc) in segmentation, while the patch-based methods do not.

Author Manuscript

Boxplots of sensitivity, Dice coefficients, LTPR, and LFPR, for the 30 time-points, between automatic and manual segmentations are shown in Fig. 2(b). Median sensitivities for LesionTOADS, 3D and 4D patch-based method are 0.546, 0.526, 0.575, respectively, while median Dice coefficients are 0.458, 0.507 and 0.545. Median LTPR are 0.465, 0.379, and 0.532, while median LFPR are 0.795, 0.582, and 0.532, respectively. A Wilcoxon signedrank test shows significant improvement of the 4D method over LesionTOADS in Dice coefficient and over 3D patch-based method on LTPR (p < 0.01). The LFPR for the 4D method is significantly lower than LesionTOADS (p < 0.01), but not the 3D patch-based method. The median absolute volume differences are 0.405, 0.539 and 0.284, respectively. Non-parametric testing gives the p-values for VD as 0.03 and 0.07 comparing the 4D method with the LesionTOADS and 3D patch-based method.

Author Manuscript

4 Discussion A limitation of the current implementation is that the number of reference time-points must be the same as the number of subject time-points. This may not always be feasible in a clinical setting. If a subject has fewer scans than the reference, then a part of the reference time-points can be used. However, if the subject has more scans than the reference, one possible way is to divide the subject time-points into components having the same number of time-points as the reference. Then each component are segmented separately using the reference, and the corresponding segmentations can be fused.

Author Manuscript

Like any supervised algorithm, the accuracy of our 4D method depends on the choice of the references. For a subject patch b(j), a few matching reference patches are extracted from the dictionary AMR in Eqn. 1. We made the underlying assumption that the references are rich enough such that matching patches can always be found. If this assumption is violated, suboptimal results are obtained. An illustration is shown in Sec. 3.1. Among the randomly chosen four references, one of them had PPMS, while the others had RRMS. Example patches corresponding to relapsing-remitting types of lesions generally observed in RRMS were not abundant in the reference with PPMS. Therefore the Dice coefficient obtained from a single reference (the subject with PPMS) is significantly lower than two references (0.468


Roy et al.

Page 7

Author Manuscript

vs 0.539, p = 0.026), where one of the references was a subject with RRMS. Hence, the references should be chosen carefully so as to include relevant example patches. Large inter-rater variability are usually observed in MS lesion segmentation [2]. Since lesions are small objects with ambiguous boundaries, overlap measures are generally lower when lesion load is small. We observed a minimum Dice of 0.08 for a subject with very low lesion load. Our Dice and sensitivities are of similar range in magnitude with previously reported 3D lesion segmentation methods [10,4], although the results are not directly comparable because the validations are done on different datasets.

References

Author Manuscript Author Manuscript

1. Garcia-Lorenzo D, Francis S, Narayanan S, Arnold DL, Collins DL. Review of automatic segmentation methods of multiple sclerosis white matter lesions on conventional magnetic resonance imaging. Med Image Anal. 2013; 17(1):1–18. [PubMed: 23084503] 2. Geremia E, Clatz O, Menze BH, Konukoglu E, Criminisi A, Ayache N. Spatial decision forests for MS lesion segmentation in multi-channel magnetic resonance images. NeuroImage. 2011; 57(2): 378–390. [PubMed: 21497655] 3. Shiee N, Bazin PL, Ozturk A, Reich DS, Calabresi PA, Pham DL. A Topology-Preserving Approach to the Segmentation of Brain Images with Multiple Sclerosis Lesions. NeuroImage. 2009; 49(2): 1524–1535. [PubMed: 19766196] 4. Roy S, He Q, Carass A, Jog A, Cuzzocreo JL, Reich DS, Prince JL, Pham DL. Example based lesion segmentation. Proc of SPIE. 2014; 9034:90341Y. 5. Ganiler O, Oliver A, Diez Y, Freixenet J, Vilanova JC, Beltran B, Ramio-Torrenta L, Rovira A, Llado X. A subtraction pipeline for automatic detection of new appearing multiple sclerosis lesions in longitudinal studies. Neuroradiology. 2014; 56(5):363–374. [PubMed: 24590302] 6. Gerig G, Welti D, Guttmann CRG, Colchester ACF, Szekely G. Exploring the discrimination power of the time domain for segmentation and characterization of active lesions in serial MR data. Med Image Anal. 2000; 4(1):31–42. [PubMed: 10972319] 7. Roy S, Carass A, Prince JL. Magnetic resonance image example based contrast synthesis. IEEE Trans Med Imag. 2013; 32(12):2348–2363. 8. Roy S, Carass A, Prince JL. Synthesizing MR contrast and resolution through a patch matching technique. Proc of SPIE. 2010; 7263:76230j. 9. Wang L, Shi F, Gao Y, Li G, Gilmore JH, Lin W, Shen D. Integration of sparse multi-modality representation and anatomical constraint for isointense infant brain MR image segmentation. NeuroImage. 2014; 16(1):152–164. [PubMed: 24291615] 10. Weiss N, Rueckert D, Rao A. Multiple sclerosis lesion segmentation using dictionary learning and sparse coding. Med Image Comp and Comp Asst Intervention (MICCAI). 2013; 8149:735–742.

Author Manuscript Mach Learn Med Imaging. Author manuscript; available in PMC 2016 August 24.

Roy et al.

Page 8

Author Manuscript Author Manuscript Fig. 1.

Author Manuscript

A comparison of our 4D segmentation method with LesionTOADS [3] and a 3D patch-based method [4] is shown. Individual 3D segmentations sometimes miss or underestimate (yellow arrows) lesions, while our 4D method produces temporally more consistent segmentations.

Author Manuscript Mach Learn Med Imaging. Author manuscript; available in PMC 2016 August 24.

Roy et al.

Page 9

Author Manuscript Fig. 2.

Author Manuscript

(a) Scatter plot of manual lesion volumes (in cc) against automated ones are shown for LesionTOADS (blue), 3D patch based (magenta) and 4D patch-based method (red). (b) Boxplots of voxelwise sensitivity, Dice coefficients, lesion true positive rate (LTPR), and lesion false positive rate (LFPR) are shown.

Author Manuscript Author Manuscript Mach Learn Med Imaging. Author manuscript; available in PMC 2016 August 24.

Evaluating the effects of white matter multiple sclerosis lesions on the volume estimation of 6 brain tissue segmentation methods.

Longitudinal multiple sclerosis lesion segmentation data resource.

Multiple deep white matter lesions mimic multiple sclerosis as an unusual complication of left atrial myxoma.

A pilot MRI study of white and grey matter involvement by multiple sclerosis spinal cord lesions.

Fully automatic detection of deep white matter T1 hypointense lesions in multiple sclerosis.

Phenotypic and functional characterization of T cells in white matter lesions of multiple sclerosis patients.

The impact of isolated lesions on white-matter fiber tracts in multiple sclerosis patients.

Gadolinium-Enhancing Lesions Lead to Decreases in White Matter Tract Fractional Anisotropy in Multiple Sclerosis.

Multiple Factors Involved in the Pathogenesis of White Matter Lesions.

Fos RNA accumulation in multiple sclerosis white matter tissue.

LONGITUDINAL INTENSITY NORMALIZATION IN THE PRESENCE OF MULTIPLE SCLEROSIS LESIONS.

Imaging of white matter lesions.

[Gray matter lesions and cognitive impairment in multiple sclerosis].

Quantitative susceptibility mapping (QSM) of white matter multiple sclerosis lesions: Interpreting positive susceptibility and the presence of iron.

Patterns of regional gray matter and white matter atrophy in cortical multiple sclerosis.

A Sensitive and Automatic White Matter Fiber Tracts Model for Longitudinal Analysis of Diffusion Tensor Images in Multiple Sclerosis.

Discrepancy in CCL2 and CCR2 expression in white versus grey matter hippocampal lesions of Multiple Sclerosis patients.

FLAIR* to visualize veins in white matter lesions: A new tool for the diagnosis of multiple sclerosis?

Early-Stage White Matter Lesions Detected by Multispectral MRI Segmentation Predict Progressive Cognitive Decline.

Automatic segmentation and volumetry of multiple sclerosis brain lesions from MR images.

Evaluating the effect of multiple sclerosis lesions on automatic brain structure segmentation.

Evaluation of Atlas-Based White Matter Segmentation with Eve.

Neural Network-Based Learning Kernel for Automatic Segmentation of Multiple Sclerosis Lesions on Magnetic Resonance Images.

Improved segmentation of white matter tracts with adaptive Riemannian metrics.