1164

IEEE TRANSACTIONS ON MEDICAL IMAGING, VOL. 34, NO. 5, MAY 2015

Detecting Statistically Significant Differences in Quantitative MRI Experiments, Applied to Diffusion Tensor Imaging Dirk H. J. Poot and Stefan Klein*

Abstract—In this work we present a framework for reliably detecting significant differences in quantitative magnetic resonance imaging and evaluate it with diffusion tensor imaging (DTI) experiments. As part of this framework we propose a new spatially regularized maximum likelihood estimator that simultaneously estimates the quantitative parameters and the spatially-smoothlyvarying noise level from the acquisitions. The noise level estimation method does not require repeated acquisitions. We show that the amount of regularization in this method can be set a priori to achieve a desired coefficient of variation of the estimated noise level. The noise level estimate allows the construction of a CramérRao-lower-bound based test statistic that reliably assesses the significance of differences between voxels within a scan or across different scans. We show that the regularized noise level estimate improves upon existing methods and results in a substantially increased precision of the uncertainty estimates of the DTI parameters. It enables correct specification of the null distribution of the test statistic and with it the test statistic obtains the highest sensitivity and specificity. The source code of the estimation framework, test statistic and experiment scripts are made available to the community. Index Terms—Cramér-Rao lower bound, diffusion tensor imaging (DTI), quantitative magnetic resonance imaging (qMRI), regularization, statistical noise analysis, statistical testing.

I. INTRODUCTION

R

ECENT developments in quantitative magnetic resonance imaging (qMRI) methods have enabled the objective measurement of tissue properties such as and (anisotropic) diffusion of water with diffusion tensor imaging (DTI) [1]. Since actual tissue properties are measured, the values should be less dependent on scanner hardware and exact scanning protocol than standard weighted MR scans. Thereby, qMRI enables a whole new type of MRI analysis: it allows meaningful comparison of the measured values, within a scanning session, between different scanning sessions, between subjects, and between scanners. For example, it allows Manuscript received June 26, 2014; revised November 07, 2014; accepted December 02, 2014. Date of publication December 18, 2014; date of current version April 29, 2015. Asterisk indicates corresponding author. D. H. J. Poot is with the Biomedical Imaging Group Rotterdam, Departments of Medical Informatics and Radiology, Erasmus MC, 3015 GE Rotterdam, The Netherlands, and also with the Quantitative Imaging Group, Delft University of Technology, 2628 CJ Delft, The Netherlands. *S. Klein is with the Biomedical Imaging Group Rotterdam, Departments of Medical Informatics and Radiology, Erasmus MC, 3015 GE Rotterdam, The Netherlands (e-mail: [email protected]). Digital Object Identifier 10.1109/TMI.2014.2380830

comparing healthy to diseased tissues within a single subject, it allows identifying voxelwise changes due to disease progression or therapy effects, and allows quantitatively comparing the tissue properties of a subject to reference values obtained from healthy control subjects. In such comparisons the statistical significance of any differences that are found should be reliably assessed. This is not straightforward since the precision of the obtained quantitative values varies across scanners and examinations and is spatially varying, due to the fitting of a, typically nonlinear, model to the acquired images as well as due to spatial variability of the noise level in the acquired images. Quantitative MRI experiments acquire a series of magnitude MR images, in which some acquisition settings, such as diffusion weighing, change to obtain tissue property dependent image contrast [1]. To these images, a parametric model that describes the relation between image intensity and the underlying tissue properties, potentially extended with nuisance parameters, is fitted. In magnitude MR images the image noise introduces a bias in the image intensity [2]–[5]. Thus, the noise level should also be known to correct for the noise induced bias in the fitting process. Traditionally, the noise level in a MR image is estimated from the background area of that image since there the signal is Rayleigh distributed [6]–[8]. This is a valid approach for traditional acquisitions (nonaccelerated, single coil), where the noise level can be assumed to be constant throughout the image. However, such methods cannot be expected to provide a reliable noise level estimate for modern accelerated acquisitions in which multiple coils and parallel imaging are used [9]. In such acquisitions the noise level is spatially varying over the subject and the background area might even be suppressed [8], [10], invalidating any noise measure obtained from that region. In this work, we propose to estimate the noise level field within the object of interest. In contrast to most previous work on this topic [11]–[14], we propose to estimate the tissue property maps and noise level simultaneously, without requiring repeated acquisitions. To fit the parametric model to the images we employ a maximum likelihood (ML) or, when regularization is used, a maximum a posteriori (MAP) estimator. The regularization is used to obtain spatially smooth noise level fields, as it is known a priori that the noise level varies smoothly over the image [12]. The strength of the regularization can be set a priori by employing the derived explicit relation between coefficient of variation (CV) of the noise level and the regularization strength. This regularized noise level estimate is used in

0278-0062 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

POOT AND KLEIN: DETECTING STATISTICALLY SIGNIFICANT DIFFERENCES IN QUANTITATIVE MRI EXPERIMENTS

the construction of a Cramér Rao Lower Bound (CRLB) based test statistic. This statistic evaluates the statistical significance of differences between two voxels. In contrast to a conventional statistical test on the difference between means (or medians) of two regions of interest, our method allows true voxelwise comparisons between, for example, baseline and follow-up scans or between voxels located at different positions along a white matter fiber tract. The a priori known CV of the noise level estimate allows the derivation of the null distribution of the test statistic. In the experiments we check the validity of the derived expressions and compare the new noise level estimation method to previously presented methods [11]–[13]. The experiments are performed with simulated and real DTI data [15]. DTI allows the study of tissue microstructure, including the orientation and integrity of brain white matter and muscle tissue, by acquiring at least six diffusion weighted images [16], [17]. Measures derived from DTI, such as fractional anisotropy (FA) and mean diffusivity (MD), are known to be related to neurological diseases, see for example [18]–[22]. In this work the DTI model is used as it is one of the most widely used forms of quantitative MRI and it has a nonlinear model which, compared to other quantitative MR applications, has a relatively large number of parameters. However, all of the presented methods can directly be applied in other quantitative MRI applications, such as and mapping and dynamic contrast enhanced MRI as well as to more complex diffusion models. In summary, the contribution of this work is three-fold: 1) a regularized estimate of the noise level field; 2) explicit control over the CV of this estimate; 3) a test statistic that uses the estimated noise level and CRLB to optimally detect statistically significant differences. II. METHODS This section describes A) the estimation method, B) an explicit relation between regularization strength and coefficient of variation (CV) of the noise level estimate, C) the Cramér Rao lower bound and D) how it can be used in statistical tests, and, finally, E) reference methods to which our method will be compared. Assumptions and approximations made in the derivations are labeled by #. The source code implementating the methods described here, as well as the simulation experiments, are available on our website. A. Estimating the Tissue Parameters and Noise Level Let be the field with tissue property and nuisance parameters. Let be the noise level field, and let be a vector containing all images acquired in the experiment. Let denote the measured signal of voxel in image . Similarly let denote a vector of length containing the tissue parameters in voxel , and the noise level in voxel . To distinguish between the real (ground truth) values and the estimated we use the notation and , respectively. The estimation procedure uses a model that predicts the intensity of voxel for all images . Specifically, for DTI this model for the noise free intensity of is given 1Available

online: http://bigr.nl/people/DirkPoot

1165

by [23] (1) (2) the diffusion gradient direction where is the b-value and of image is the intensity without diffusion weighting, and the symmetric 3 3 diffusion tensor, where we omitted the subscript for clarity of notation. For a more extensive introduction to DTI acquisition and postprocessing, we refer the reader to [23]–[25]. The MAP estimation procedure is defined as (3) (4) a log prior on where is the likelihood function of the data, , and a log prior on . When and are zero, the MAP estimator reduces to a maximum likelihood estimator [2]. Assuming that the different voxels and the different images are statistically independent measurements , the cost function can be expanded to

(5) where is the spatial domain in which the fit is performed. The domain can be the entire image or a region selected by a mask. It is known that nonaccelerated and SENSE accelerated magnitude MR images are Rice distributed [3], [26]–[29]. For diffusion weighted images, k-space reconstruction methods that give Rice distributed magnitude images, such as SENSE are preferable [5]. Therefore, unless noted otherwise, we use the Rice distribution for in (5). It is possible to use other distribution functions in (5), such as the noncentral chi distribution that models GRAPPA reconstructions [4]. Evaluating to what extent the relations we find below hold for those distributions is beyond the scope of this paper. In the current work we focus on estimating the noise level parameter field . Therefore, for simplicity, we set the prior , which can be used to spatially regularize , to zero. Thus there is no direct coupling between the of different voxels. To obtain a precise and smooth estimate of we use a Gauss-Markov random field (MRF) as regularization prior (6) where is a scalar parameter that controls the strength of the regularization. In the experiments in this study we use the (graph) Laplacian for (7) is the set of direct neighbors of voxel . That is, where contains all for which where is the integer vector containing the image grid location of voxel . Note that at the image borders, the set of neighboring voxels is reduced.

1166

IEEE TRANSACTIONS ON MEDICAL IMAGING, VOL. 34, NO. 5, MAY 2015

When evaluated on a Cartesian grid the Laplacian of voxel is the sum of the second derivatives in every dimension. The kernel of Laplacian with these border constraints contains only the constant field. Thus, only when for any positive scalar . The Laplacian is evaluated on , the logarithm of , since relative differences of are important and a priori the magnitude of is unknown. There are two noteworthy special cases of the regularization strength : and . When is constant since that is the only field in the kernel of the regularizer. The constant will be some (weighted) average of the noise level in the entire region . As a single noise level is estimated, this approach compares to methods in which the noise level is estimated from a background region [6], [30], [31]. However, as the noise level is estimated from the region of interest, it can be expected to be closer to the “true” noise level when the noise level is not constant, or when physiological effects introduce signal fluctuations that can be regarded as noise. Note that in our implementation for , the global noise level is estimated by alternating optimization of and the scalar . In the second special case, , the noise level is estimated for each voxel independently. In this case the estimation of the different voxels is completely decoupled [13]. Since the estimate of is obtained without any smoothing, the value can be expected to be accurate. However, due to the small number of measurements per voxel in typical experiments, the precision of will be low in this case. The optimization of (5) is nontrivial as it is a nonlinear function containing millions of variables. However, the only coupling of the optimization between different voxels is through (and possibly when it would specify a regularizing prior for ). Since we use the Laplacian which has a spatially local effect, the optimization in (5) is performed by iteratively optimizing the cost function in small, partially overlapping, 3-D patches within with size . Each patch is optimized with a trust region dogleg method where the Newton steps are approximated by a preconditioned conjugate gradient method [32]. The fitting procedure was implemented in MATLAB with custom routines. It is well known that the ML estimate is biased due to the reduced degrees-of-freedom of the residual [33]. To approximately correct for this bias in a postprocessing step we introduce a modified estimate

ments these bias correction methods differ. Experiments (not shown) indicate that MODIF is preferable in these cases. B. Predicted Coefficient of Variation and Degrees-of-Freedom In this section we derive a relation between the coefficient of variation (CV) of and and derive the effective degrees-offreedom from which is estimated. In this theoretical derivation we assume a linear model for and normal distribution of the noise . In the experiments section (Sections III-A and III-B) we will investigate the errors introduced by these assumptions and by the approximations made in the derivation below. In this section we correct the bias in by adding the log prior on (9). With these assumptions, the cost function (5) reduces to (10) with (11) is the th row of the matrix with size where , which fully specifies the linear model of voxel . In (10) and originate from the logarithm of the normal distribution. In this case does not depend on and thus it can be estimated separately and for each voxel independently with a least squares algorithm that minimizes . In the following derivation we will use that , i.e., the squared 2-norm of the residual divided by the ground truth noise variance is chi square distributed with degrees-offreedom [34]. From this it follows that without regularization scaled by is distributed. When we obtain the cost function as function of only

(12) where the second line is a second-order Taylor approximation of around in which

(8)

(13)

which will be referred to as MODIF estimate in contrast to the MAP estimate obtained with (4). This correction exactly removes the bias in the variance estimate for linear models with Gaussian noise [34]. Alternatively, a prior on can be introduced that can be shown to be equivalent to (8) for linear models and Gaussian noise

(14) (15) (16)

(9) These estimates will be referred to as obtained with PRIOR. For nonlinear models and/or non-Gaussian distributed measure-

With this Taylor approximation the optimum of approximated by a single Newton step

can be (17)

POOT AND KLEIN: DETECTING STATISTICALLY SIGNIFICANT DIFFERENCES IN QUANTITATIVE MRI EXPERIMENTS

Obviously ables. Since

, and therefore also

are random vari, the is in good approximation given by the standard deviation of [35], . Thus we need to derive the covariance of

(22)

and are multiplied by a factor is multiplied by . This implies that when is fixed, the (blurring) effect of the regularization reduces when the number of measurements increases. Note that , the assumption of statistical independence of neighboring voxels, is relevant. When neighboring voxels are not statistically independent, for example due to interpolation (e.g., by zero filling in k-space) or smoothing, the actual degrees-of-freedom from which the regularized noise level field is estimated is reduced. This will not influence however, is not influenced by spatial correlations. By acas counting for the reduced effective degrees-of-freedom due to spatial correlation in (21) and (25) the correct CV can be found. For zero filling in k-space, this amounts to multiplying by the ratio of the number of sampled k-space positions to the total number of k-space positions after zero filling.

(23)

C. Cramér Rao Lower Bound

(24) (25)

The Cramér Rao lower bound (CRLB) is a lower bound on the covariance of unbiased estimates [38], [2]. It is based on the likelihood model of the measurements, which models the acquisition process and the measurement noise. The CRLB allows propagating the modeled measurement noise to uncertainty in and is given by

(26)

(29)

(18) (19) (20) where is the identity matrix of the correct size, the approximation from (18) to the next line is , and (21)

and thus also is independent of the Note that ground truth noise level . Additionally, for any spatial regularization term that is shift invariant and has a local effect, such as the Laplacian, the CV is the same for each voxel. This implies that the diagonal of is constant. The value of this constant can easily be found with the conjugate gradient method [36], [37], avoiding the need to explicitly construct and/or its inverse. With this derivation (27) When a target CV is pre-subscribed for voxels that are not affected by border effects, (27) can be solved with this value to obtain a regularization strength that approximately attains this. The pre-subscribed value should balance the attained precision and the smoothing effect that might be causing a biased estimate. The number of voxels over which effectively is smoothed linearly scales with the degrees-of-freedom of . By assuming is also distributed as scaled distribution when , an (approximate) number of degrees-of-freedom of can be obtained by finding the value of for which the distribution has the same CV as the prediction . distribution is given by with The CV of the . For this CV reduces to . Since even for is within 10% of the actual CV of , we can solve from and obtain (28) Note that from (23) it follows that for each , the is identical. Thus when both

1167

with

the Fisher information matrix

(30) where is the Fisher information in a Rice distributed variable, which depends on the noise free intensities and noise level . When a (vector) function is evaluated, the CRLB of is given by (31) In DTI an example of such function is the FA [23] or a function that selects the diffusion tensor elements from . are not known In practice the ground truth values and and the CRLB is evaluated at and instead. Using instead of has an effect on the CRLB since scales as (for ) [39], . So each term in (30) scales approximately as and thus the CRLB scales, in a good approximation, with . Due to this relation, has to be accurately and precisely known for the CRLB to reliably assess the uncertainty of . D. Testing for Differences With the CRLB In this subsection we derive a test statistic based on the CRLB. With this test the statistical significance of the difference between vectors can be evaluated,

1168

IEEE TRANSACTIONS ON MEDICAL IMAGING, VOL. 34, NO. 5, MAY 2015

where the are two arbitrarily selected voxels of the same or even of different experiments. The null hypothesis is that the two test vectors are different noisy measurements of the same underlying ground truth parameter vector. The alternative hypothesis is that the parameter vectors are different. The test vectors may be acquired in different experiments or with different SNR, therefore the CRLB of these vectors typically will be different. Therefore we compute the test statistic by

We first derive , for which we need the distribution of . The uncertainty in is mainly due to the uncertainty in , which is approximately distributed as scaled (see Subsection II-B, ). Since the CRLB scales in good approximation as is in good approximation a realization of with distributed as . Note that and . When we recall that for any random vector thus with mean and covariance matrix [45], [46], we can derive that

(32) (33) test. However, since which is similar to a Hotelling's is not known a priori, not identical for all , and not a sample covariance matrix, it is not exactly equivalent to any previously derived test statistics [40]–[43]. When would be evaluated with it would describe the covariance of each estimate very well. Then in (32) would be distributed, where is the degrees-of-freedom in , which, except for degenerate cases, is the length of . However, since is estimated with a finite precision, only the expected value of provides an accurate estimate of the covariance of . This uncertainty in has an effect on the distribution of the test statistic . Similarly to [40]–[43] we approximate the null distribution of the test statistic by a distribution , where is the effective degrees-of-freedom of the covariance estimate of the inverse CRLB weighted mean of the . For our test, as derived below

(36)

(37) where the approximation is a first-order Taylor expansion of around . With this approximation, the expected value of and the covariance is given by (38)

(34) where is the degrees-of-freedom in the estimate of (28) and the trace of a matrix . When deriving the effective degrees-of-freedom we encounter the Behrens-Fisher problem. This is the problem of testing vectors for equality without assuming equality of the covariance matrices. To the best of our knowledge, only approximate solutions for the Behrens-Fisher problem have been described in the literature [43]. This problem is already present in the scalar case of the Welch test [44]. To obtain a good estimate of the degrees-of-freedom , we use the definition of the distribution as ratio of two independent distributions, similar to e.g., [43]. A random variable is distributed when

(39) we use a very similar first-order Taylor approxTo compute imation of around . Although in this case the Taylor approximation is not required, we still use it to introduce similar approximation errors in as in . With this approximation we obtain (40) (41) from which it is easy to derive that

and

(35)

(42)

and independent distributed variables with with and degrees-of-freedom, respectively. Note that the stochastic part of is described by and the stochastic part of is described by . By taking the expectation of over and of over we obtain new statistics and whose distributions only depend on and , respectively. By solving for , a good value for can be extracted.

When we now substitute for in (39) , we have all parts required to solve from and obtain (34). Note that and that for the scalar case reduces to the number used in the Welch test [44] when the CRLB is substituted for the observed variances. With these degrees-of-freedom the distribution of the test statistic under the null hypothesis is fully specified, and thus we are able to compute the -values of the test statistic.

POOT AND KLEIN: DETECTING STATISTICALLY SIGNIFICANT DIFFERENCES IN QUANTITATIVE MRI EXPERIMENTS

1169

E. Reference Methods

A. Regularization Strength

In the experiments we compare our proposed noise level estimate to several existing methods. The first reference method is the pooling-of- (POS) method in Andersson et al. [3], where 6 neighbors of voxel are used to refine . Note that this reference method is computationally expensive as essentially the diffusion tensor parameters have to be fitted on seven voxels to obtain . Thus the computational load is approximately multiplied by a factor 7. Next, we used reference methods presented in Landman et al. [10]. From their methods, we use the “standard deviation of uniform voxel intensities” (SDU), “robust variability of model-based leave-one-out residuals” (SRV1), and regularized SRV1 (SRR1) methods and when possible also the “robust variability spatially robust estimation with many repetitions” (SRVN), regularized SRVN (SRRN), and “Generalized maximum likelihood of Rician distributed images” (GML) methods. In the SDU method, for each voxel, the standard deviation in a 5 5 sliding window is evaluated for all images and is then the median over all standard deviations per voxel. The SRV1 method is their proposed method when a model of the measurements is available, but no repeated acquisitions are available. In this method the robust scale estimate is computed on the leave one out residuals, where voxels with a low signal-to-noise ratio (SNR) are left out. The SRR1 method spatially regularizes this SRV1 estimate with a third-order polynomial (see [10] for details). The SRVN method is their proposed method when repeated acquisitions, that is multiple images with the same acquisition settings, are available. The median is computed over from each set of repeated images. The SRRN method is a spatially regularized version of SRVN. The GML method provides a model-free maximum likelihood estimate of the noise level when repeated acquisitions are available. Note that our simulation experiments do not require the robustness provided by the methods of Landman et al. [10] as no outliers were generated in the simulations. For each of the Landman et al. [10] noise level estimates , corresponding were estimated with

The relation between and (27), is validated with estimated by PRIOR from simulated isotropic 100 100 100 3-D images. Validating this relation implicitly tests –5. For this experiment we use a logarithmically spaced range of values for between 0.001 and 100, several between 1 and 50, and the most simple linear model possible, with parameters, as any linear model will have the same relation. With this model and a ground truth Gaussian noise level , images are simulated from which will be computed by the different methods. Image boundary effects are avoided by making the Laplacian, and thereby , periodic over the image domain. Due to the translation invariance of the Laplace operator, the linear signal model, and the constant can be obtained from a spatial sample: . The subscript s is used to indicate that the spatial std of is divided by the spatial mean of , instead of evaluating these over statistically independent measurements. Approximate 95% confidence intervals of the for each pair of and are obtained by repeating each experiment times. The observed will be compared to .

(43) Another class of noise level estimation methods is based on residue resampling methods [47]. Although these methods can have the advantage of being model free, they do not spatially regularize and thus are bound to the precision limits dictated by . Therefore, we do not consider these estimators in the experiments. III. EXPERIMENTS The accuracy of the derived relation between the precision of , and is evaluated in Section III-A. Next, in Section III-B the accuracy and precision of , and the CRLB are studied in a diffusion tensor simulation experiment. Together, these subsections test the (approximate) expressions derived in Section II-B. Next, in Section III-C the estimators are compared on real MRI data and finally, in Section III-D the effect of the different estimators of on the detection power in real MRI data is studied.

B. Diffusion Tensor Simulation Experiment In the DTI experiment ground truth images were constructed with the Numerical Fiber Generator software package (Brain Research Institute, Melbourne, Australia) [48], see Fig. 2. A smoothly varying ground truth noise level field with a mean magnitude of 5.4% of the magnitude of the images was generated [see Fig. 3(a)]. Inside the phantom this resulted in mean standard deviation of the SNR of the diffusion weighted images of . In the optimization procedure the block size was roughly optimized to maximize overall execution speed while reaching the optimum with sufficient precision. The ranged from 3 to 25 over the different iterations. Note that in this experiment . A Monte Carlo experiment was performed with Rician distributed datasets simulated from the noise free ground truth images and . From each dataset and were computed with MODIF and . Additionally, the reference methods were evaluated. For each method, maps of were computed by dividing the std and mean computed over the different realizations. Also maps of the bias and the root mean square error (RMSE) of and were computed. To evaluate how closely the predicted variability (CRLB) of the diffusion tensor elements and FA correspond to the observed variability (and bias), the RMSE over the entire experiment was compared to the square root of the CRLB evaluated on the first simulation. This experiment tests the effect of (slight) violations of and hence also the effect of the violation of on the validity of the approximations –5. C. Noise Field in DTI Next, the noise level estimation method was evaluated on images acquired from a custom-made hardware DTI phantom [49]. This phantom consisted of a pair of crossing fibers, a pair of kissing fibers and set of parallel fibers in a salted water filled clear plastic spherical phantom with a diameter of 160 mm. repetitions of a DTI protocol were acquired to

1170

IEEE TRANSACTIONS ON MEDICAL IMAGING, VOL. 34, NO. 5, MAY 2015

obtain a total of diffusion weighted ( s/mm ), and s/mm images. The images were acquired with a Trio Scanner (3T; Siemens AG, Siemens Medical Solutions, Erlangen, Germany) with a 32-channel head coil. They were multislice single-shot EPI sequences without slice gap, no averaging, in-plane resolution 2 mm 2 mm and slice thickness of 2 mm, 80 slices with an acquisition matrix 128 128 with 95 phase encoding steps, 100% sampling, ms. The FOV was 170 170 166 mm ms, pixel bandwidth 1260 Hz. For each repetition and were computed by each method and the CV was evaluated over the repeats. This experiment tests as well as –5 in acquired data. To investigate potential bias in the methods was computed from all data simultaneously by each estimator, including the SRVN and GML methods, which are estimators designed for in [12]. Since in this case much more data per voxel is available, in most methods any potential bias will be substantially reduced. D. Significance Tests To demonstrate the relevance of improving the noise level estimate for detection of significant differences we performed an experiment. In this experiment we try to detect differences in diffusion properties along a fiber of the hardware phantom using the acquisition described in Section III-C. The differences in the FA and diffusion tensors along the fiber are investigated by, for each repeat, computing the test statistic between all possible pairs of 90 voxels on the fiber. To compare two voxels, the -value of the test statistic (33) was computed using the distribution with evaluated by (34) and given by (28). See Fig. 7(b) for the 90 selected voxels. The accuracy of the null distribution was evaluated by, for each voxel in the fiber, computing the -values between the 12 11/2 different pairs of repeated scans. This experiment implicitly tests –12. The resulting distributions of -values are inspected with the area under the receiver operator characteristic (ROC) curve, assuming the different repeats are equal and the different locations are unequal. Also some high specificity points on the ROC curve are separately inspected and the observed test results are compared to the derived null distribution. As additional reference a null distribution of the FA test statistic is obtained by permutation testing (PERMUTE) [50]. In the permutation test of permutations of with are voxel and all fixed to the value obtained by the profitted (3) with and posed fit with . The -value of the observed absolute FA difference is given by the fraction of absolute FA differences in the permutation test that are larger than the observed difference. Note that this permutation test assumes identical diffusion weighting gradients for both voxels. This is true in this experiment, but in general the two tested voxels may be from acquisitions acquired with different settings. Also note that the “weak distributional assumptions” ([50]) required to justify this permutation are equal and in both voxels (under the null hypothesis). The permutation test is only performed for the FA measure since, due to the comparisons, this permutation test only supports scalar measures and not the full tensor .

Fig. 1. Mean , mean different values of

, and as function of for . The shaded areas indicate 95% confidence bounds.

IV. RESULTS A. Regularization Strength , and Fig. 1 shows the mean , mean . Fig. 1(a) shows that even with the bias correcting prior is biased. The main cause for this bias is the nonlinear relation between standard deviation and variance as is shown by Fig. 1(b), where the mean value for low is closer to one. For and is an unbiased estimator for the variance, so is a biased estimator for the standard deviation. From Fig. 1(c) it is clear that for low and low is very high; i.e., the uncertainty in is so large that is not very useful, underscoring the importance of regularizing the noise level field. Fig. 1(c) also shows the improvement in the precision of when increases. Fig. 1(d) shows for the different values of and . Note that only for low values of and low , the prediction deviates from the observed value by up to 10%. That the deviation is largest in these cases makes sense since the quadratic approximation of the cost function (12) is least valid in these cases as is large and will deviate the most from . Although the difference between the and is largest when is large, the predicted can still be used to detect the cases with unusably high CV. B. Diffusion Tensor Simulation Experiment Fig. 2 shows the effect of the spatially varying noise on the direction of main eigenvector encoded FA map, where only the ground truth and the results for are shown. The images corresponding to the other estimators are visually nearly identical to the images. Fig. 3 shows estimated from the first simulation in the Monte Carlo experiment. This figure clearly shows the superior noise level field obtained by regularizing the estimate. Note that the SDU estimate is quite bad in this

POOT AND KLEIN: DETECTING STATISTICALLY SIGNIFICANT DIFFERENCES IN QUANTITATIVE MRI EXPERIMENTS

Fig. 2. FA map colored by the main eigenvector direction of the ground truth and estimate of the first simulation.

Fig. 3. One slice of the 3-D noise level field used in the simulations and the noise fields fitted for the different values of .

simulation example since only in very few regions the images are homogeneous. Also the SRV1 estimate is very noisy, while the SRR1 estimate is strongly regularized. Additionally note the substantial upward bias in the SRV1 and SRR1 estimates. This upward bias can be noticed in the results in [10] as well. It is caused by the assumption in (3) in [10] that is distributed as . However it only has that distribution when contains asymptotically many measurements. When the number of measurements is finite, the variance of is larger and thus the noise level field overestimated in the SRV1 method. We observed in our experiments an overestimation by a factor 1.66. Therefore, in the remainder of this subsection we corrected the SRV1 and SRR1 noise level estimate by this factor. Note that this factor depends on the used diffusion weighting gradients and number of diffusion weighted images, thus it should be re-evaluated for each acquisition setup separately. Fig. 4A shows the RMSE of as obtained by the Monte Carlo experiment. Observe that estimating the noise level for each voxel independently leads to a high RMSE due to high variability, while estimating a single noise level results in a high RMSE due to bias. By regularizing , better estimates are obtained. The results of the POS method of Andersson et al. [3] are between the results of our method with and . Note that the SRV1 method has a RMSE that is higher than the regularization, while the SRR1 method has a suboptimal RMSE due to bias introduced by the polynomial regularization that is used.

1171

Quantitatively, Table I shows that for the proposed estimator with as well as POS the RMSE is mainly due to the statistical variation of , while for the other estimators a large fraction of the RMSE is due to the bias. The minimum RMSE shows that in all cases there are some voxels in which the RMSE is due to noise only, as the minimum is close to the CV. Note that in this simulation experiment CV is derived from Rice distributed data and the nonlinear diffusion tensor model. Even so, Table I shows that the CV in this experiment is very similar to the of experiment A that has a linear and model and Gaussian distributed measurents. Also, again we see that the CV is predicted well by . Fig. 4B shows the relative bias of the (1, 1) element of the estimated diffusion tensors. Although we only show one tensor element, it was verified that the conclusions derived below also hold for the other elements. As is visible in Fig. 4B(e), using a constant noise level within the ML estimator may cause bias in the estimated parameters. By improving the noise level estimate, this bias is reduced, as is visible by comparing with Fig. 4B(a)–(d). The peak in the bias visible in Fig. 4B(a) coincides with the peak in the noise level field. The increased bias is caused by the low SNR in combination with the low precision of the noise level estimate. This causes incomplete correction of the Rice induced intensity bias. Fig. 4C shows of , thus investigating how closely the CRLB evaluated on the first fit in the Monte Carlo experiment predicts the MSE evaluated over all experiments. Fig. 4C(e) clearly shows poor correspondence when a constant noise level is estimated while the actual noise field is spatially varying. In this experiment with substantial spatial variation in the noise level, estimating the noise level for each voxel independently, as shown in Fig. 4C(a), already brings the CRLB closer to the MSE. Fig. 4C(b)–(d) shows that the correspondence improves further by regularizing the noise level. As additional measure, we also investigated the fractional anisotropy (FA) of the diffusion tensors in Fig. 4D, which shows of the FA. Again this figure shows that the CRLB evaluated on the first fit predicts the MSE best when the noise level is spatially regularized, although even then the MSE is higher than the CRLB when the FA is very low, as in the corners of the presented image. This is due to the well-known upward bias in FA for low FA values [51]. Such bias is not predicted by the CRLB. In this experiment, the mean computation time of a single run on a single core of a (busy) AMD Opteron 6172 CPU was 5:19, 4:00, 5:18, 6:59 min for , respectively and 6:50 min for the POS method. However, note that, especially for the proposed method, computation time can be improved further. C. Noise Field in DTI Fig. 5 shows the noise field estimated from the hardware phantom acquisition with the different methods evaluated on the full data ( images). Although a ground truth is lacking, the noise level is clearly not constant. Fig. 6 shows evaluated over the repeats. The CV itself is noisy due to the still low number of repeats. However, it is important to observe that no structure is visible

1172

IEEE TRANSACTIONS ON MEDICAL IMAGING, VOL. 34, NO. 5, MAY 2015

Fig. 4. (A) , (B) bias in divided by ground truth mean diffusivity, (C) of , (D) of the FA. In C and D, a value 0 indicates that the CRLB evaluated on the first fit is equal to the MSE of the entire Monte Carlo experiment. In columns (a)–(e), the results are shown for five different values of .

TABLE I OBSERVED MINIMUM

AND MEDIAN OF AS WELL AS AND OF THE DIFFUSION TENSOR SIMULATION EXPERIMENT. FOR THE PROPOSED METHOD THESE ARE COMPARED TO AND WITH FROM EXPERIMENT A. NO VALUE FOR IS AVAILABLE FROM EXPERIMENT A FOR

in the CV maps, except for the SDU method. For evaluated on the full data and the mean and CV of across the repeats Table II shows the mean and standard deviation inside a 20 25 1 voxel region shown in blue in Fig. 7(b). For comparison Table II also gives . This table shows that only the and the SDU methods give substantially different noise level estimates when the full data is used. However, when the fit is performed on each repeat separately, the SRV1 and SRR1 methods also give biased noise level estimates. The table also shows that the CV is slightly higher than for . This may be due to some blurring in the acquisition, possibly due to decay, which reduces the effective degrees-of-freedom from which the noise level is estimated. Since the CV appears to be insensitive to the SNR and the imaged structures (see Fig. 6), the elevated CV is probably not due to the Rice distribution nor the nonlinearity of . The next subsection investigates if the deviation of deteriorates the test statistic. CV from

Fig. 5. of the DTI hardware phantom acquisition as obtained by the different methods evaluated on the full data.

Fig. 6. CV of

of the DTI hardware phantom acquistition.

POOT AND KLEIN: DETECTING STATISTICALLY SIGNIFICANT DIFFERENCES IN QUANTITATIVE MRI EXPERIMENTS

TABLE II FOR EACH OF THE ESTIMATORS THE MEAN AND STANDARD DEVIATION IN A ROI IN THE HARDWARE PHANTOM. THE ROI IS BLUE IN FIG. 7(B). FULL: THE COMPUTED ON THE FULL REPEATED DATASET. REP: THE MEAN COMPUTED ON EACH OF THE REPETITIONS INDIVIDUALLY. CV: THE OF OBSERVED COEFFICIENT OF VARIATION OVER THE DIFFERENT REPETITIONS. : THE PREDICTED CV OVER THE REPETITIONS

Fig. 7. DEC-FA and FA map of the hardware phantom. In the FA map the selected voxels are colored. 7(c) shows the mean FA and 95% confidence interval of the FA measurements along the selected line as estimated by the different estimators.

D. Significance Tests Fig. 7 shows the direction of main diffusion encoded FA (DEC-FA) map of the phantom and FA map with a selection overlaid. The 78 selected voxels in one of the nearly straight fibers in this phantom are used for the significance tests. The experimentally observed null distribution of the test statistic (33) is obtained from all selected voxels of the different repetition pairs in the acquisition. Since there is no motion or physiological effect, the only difference between the different repeats of the scan should be due to thermal noise of which the level in each repetition is estimated by the different methods. The -values of the observed test statistics are computed with the F distribution

1173

with the appropriate and given by (34). The fraction of -values below the thresholds (false positive rate) both for the FA and the diffusion tensor is presented in the columns “null” in Tables III and IV. For the null distribution, the observed fraction should be . The tables show that the observed null distribution is reasonably close to the theoretical distribution for with MODIF for moderate values of and as well as for the POS estimator of Andersson et al. [3] and very close for the permutation test. This demonstrates that, for these values of , the effect of all approximations that were used to derive the theoretical null distribution, the slight bias of the estimates of and , and unmodelled parts of the acquisition process in this experiment together do not cause large differences between the actual distribution of differences in estimated parameters and the theoretically derived distribution. For the other estimators, the null distribution differs. To investigate the sensitivity of the test to small changes in the FA and the diffusion tensor, the test statistic is computed between all pairs of selected voxels in all repetitions, where each pair in the test is selected from a single repetition. The small changes of the FA and diffusion tensor along the fiber cause some pairs of voxels to be detected as significantly different from each other. The area under the ROC curve between the observed null and test distributions is given in the column “AUC” of Tables III and IV. The fraction of detected differences from the theoretical null distribution is presented in the columns “test” in these tables. To more clearly see the sensitivity differences induced by the differences in precision of the columns “cal” show the sensitivity of the test statistic when the specificity, as evaluated with the observed null distribution, was calibrated to the nominal values. That is, the -value threshold was adjusted such that the false positive rate was . These columns show that the sensitivity of the test is maximal for the proposed method when is regularized with . The permutation test is unable to detect changes in FA in this experiment, with and sensitivity not exceeding the false positive rate . V. DISCUSSION AND CONCLUSION Our results demonstrate a substantial reduction of the coefficient of variation (CV) of the noise level when regularization is applied. The regularized noise level estimation also improves the accuracy of the diffusion tensor estimates , but does not substantially modify their precision. Potentially, the precision of can be improved by also regularizing [with in (4)]. This will be a topic of further research. The main benefit of the simultaneous estimation of and lies in the subsequent analyses that require an accurate and precise to statistically compare . The proposed noise level estimation method contains two traditional methods in which a single global noise level or a per voxel estimate are obtained as limit cases. Both are shown to be suboptimal compared to intermediate regularization strengths (bias versus variance tradeoff). Also, our new method compares favorably to other recently proposed noise level estimation methods, both in computation time as well as precision. The presented relation between the CV of and the

1174

IEEE TRANSACTIONS ON MEDICAL IMAGING, VOL. 34, NO. 5, MAY 2015

TABLE III FOR DIFFERENCES. AUC: AREA UNDER THE ROC CURVE. THEN FOR THREE SIGNIFICANCE TESTING THE DIFFUSION TENSOR LEVELS , THE FRACTION OF SIGNIFICANT DIFFERENCES IN THE OBSERVED NULL DISTRIBUTION (NULL), TEST DISTRIBUTION (TEST), AND FOR THE TEST DISTRIBUTION CALIBRATED WITH THE NULL DISTRIBUTION (CAL)

TABLE IV TESTING FA

FOR DIFFERENCES. AUC: AREA UNDER THE ROC CURVE. THEN FOR THREE SIGNIFICANCE LEVELS THE FRACTION OF SIGNIFICANT DIFFERENCES IN THE OBSERVED NULL DISTRIBUTION (NULL), TEST DISTRIBUTION AND FOR THE TEST DISTRIBUTION CALIBRATED WITH THE NULL DISTRIBUTION (CAL)

regularization strength allows an informed a priori choice of this regularization strength. The real MR acquisitions demonstrated the spatial variations of the noise level that are present in modern MR acquisitions. The experiments showed that the proposed methods provide accurate and precise , even when the number of acquired images is small. Also, it was shown that the predicted can be reliably used to derive the null distribution of a test statistic that is based on the Cramér Rao lower bound (CRLB). As was shown, the -value derived from this test quite accurately describes the chance of occurrence in the observed null distribution. Thus, the developed test can be reliably used to detect statistically significant differences between voxels within or across examinations. Regularizing the noise level that is used in this test improves its sensitivity. The test enables true voxelwise comparisons between baseline and follow-up examinations as well as, for example, detection of differences in tissue properties along a white matter fiber tract. In patient examinations the motion compensation hinders application of the test statistic since it requires interpolating the images. All commonly used interpolation methods have a nonzero blurring effect, of which the effect on the effective degrees-of-freedom is typically difficult to compute. A pragmatic solution is to first zero extend the k-space of the original images by a known factor and subsequently use an interpolation method which does not introduce any significant additional blurring. Then, the correct value of the effective degrees-of-freedom can be derived by multiplying (28) with the ratio of the number of sampled k-space position to the total number of k-space positions after zero filling.

, (TEST),

We compared our noise level estimation method to several published noise level estimation methods. For a specific value , the results of , in this application between and of the method of Andersson et al. [3] are similar to our results. However, the results show that, in the current application, higher values of provide a better estimate of the noise level and more sensitive tests. Also, the relations derived in Section II-B allow choosing a good value of by trading off CV and bias. Additionally, our method is computationally slightly less expensive than the method of Andersson et al. While the methods of Landman et al. are robust against outliers, in our experiments they did not succeed to accurately estimate the noise level when was small. This introduced a bias in the number of images the diffusion tensor estimates and made the -values computed from the test statistic unreliable. The sensitivity and AUC of the permutation test were low. This is due to a large spread of FA values in the null distribution created by the permutation test, which might be due to violation of the “weak distributional assumptions” by differences in , MD, or orientation of the diffusion tensor. There are other methods that also quantify the uncertainty, such as the bootstrap and Wild bootstrap methods [47], [52]. These methods will behave similarly to the methods of Landman et al. since the noise level is estimated only from the different measurements within a voxel. Thus, while bootstrap and Wild bootstrap results are accurate, they will also display the large variability that is observed in our experiments when setting or in the SRV1 method. The derivation of was based on Gaussian noise and linear models. However, as the results show, the derived relation also

POOT AND KLEIN: DETECTING STATISTICALLY SIGNIFICANT DIFFERENCES IN QUANTITATIVE MRI EXPERIMENTS

accurately predicts the observed coefficient of variation even when a nonlinear model and a sufficient number of Rician distributed measurements are used. The Rician distribution was used as example of the non-Gaussian noise that is present in MR images. Future work may investigate if similar relations can be derived for noise models such as the noncentral chi distribution of GRAPPA reconstructions. In conclusion, we proposed a novel method to estimate the noise level and diffusion tensor parameters simultaneously, with spatial regularization of the noise level field. Additionally, we provide a test that uses this noise level estimate to test the statistical significance of differences between any pair of voxels. The experiments demonstrated that with proper regularization the CRLB of the parameter estimates accurately and precisely corresponds to the mean squared error evaluated over many experiments, thereby validating the use of the CRLB to determine confidence intervals. This is made explicit in the proposed test statistic which can be used to obtain the statistical significance of differences between voxels.

REFERENCES [1] Quantitative MRI of the Brain: Measuring Changes Caused by Disease, P. Tofts, Ed.. New York: Wiley, 2004. [2] J. Sijbers, A. J. den Dekker, E. Raman, and D. Van Dyck, “Parameter estimation from magnitude MR images,” Int. J. Imag. Syst. Technol., vol. 10, no. 2, pp. 109–114, 1999. [3] J. L. R. Andersson, “Maximum a posteriori estimation of diffusion tensor parameters using a Rician noise model: Why, how and but,” NeuroImage, vol. 42, pp. 1340–1356, 2008. [4] S. Aja-Fernández, A. Tristán-Vega, and W. S. Hoge, “Statistical noise analysis in GRAPPA using a parametrized noncentral chi approximation model,” Magn. Reson. Med., vol. 65, no. 4, pp. 1195–1206, 2011. [5] S. N. Sotiropoulos et al., “Advances in diffusion MRI acquisition and processing in the human connectome project,” NeuroImage, vol. 80, pp. 125–143, 2013. [6] J. Sijbers, D. H. J. Poot, A. J. den Dekker, and W. Pintjens, “Automatic estimation of the noise variance from the histogram of a magnetic resonance image,” Phys. Med. Biol., vol. 52, pp. 1335–1348, Feb. 2007. [7] J. Sijbers, A. J. den Dekker, M. Verhoye, J. Van Audekerke, and D. Van Dyck, “Estimation of noise from magnitude MR images,” Magn. Reson. Imag., vol. 16, no. 1, pp. 87–90, 1998. [8] S. Aja-Fernández, A. Tristán-Vega, and C. Alberola-López, “Noise estimation in single- and multiple-coil magnetic resonance data based on statistical models,” Magn. Reson. Imag., vol. 27, no. 10, pp. 1397–1409, 2009. [9] O. Dietrich, J. G. Raya, S. B. Reeder, M. F. Reiser, and S. O. Schoenberg, “Measurement of signal-to-noise ratios in MR images: Influence of multichannel coils, parallel imaging, and reconstruction filters,” J. Magn. Reson. Imag., vol. 26, no. 2, pp. 375–385, 2007. [10] B. Landman, P. L. Bazin, S. A. Smith, and J. L. Prince, “Robust estimation of spatially variable noise fields,” Magn. Reson. Med., vol. 62, pp. 500–509, 2009. [11] I. I. Maximov, E. Farrher, F. Grinberg, and N. J. Shah, “Spatially variable Rician noise in magnetic resonance imaging,” Med. Image Anal., vol. 16, pp. 536–548, 2012. [12] B. A. Landman, P.-L. Bazin, and J. L. Prince, “Estimation and application of spatially variable noise fields in diffusion tensor imaging,” Magn. Reson. Imag., vol. 27, no. 6, pp. 741–751, 2009. [13] R. Salvador et al., “Formal characterization and extension of the linearized diffusion tensor model,” Human Brain Mapp., vol. 24, no. 2, pp. 144–155, 2005. [14] H. Boisgontier, V. Noblet, F. Heitz, L. Rumbach, and J.-P. Armspach, “Generalized likelihood ratio tests for change detection in diffusion tensor images: Application to multiple sclerosis,” Med. Image Anal., vol. 16, no. 1, pp. 325–338, 2012. [15] P. J. Basser, J. Mattiello, and D. LeBihan, “Estimation of the effective self-diffusion tensor from the NMR spin echo,” J. Magn. Reson. Imag., vol. 103, pp. 247–254, 1994.

1175

[16] P. J. Basser and D. K. Jones, “Diffusion-tensor MRI: Theory, experimental design and data analysis—A technical review,” NMR Biomed., vol. 15, pp. 456–467, 2002. [17] D. LeBihan, “Looking into the functional architecture of the brain with diffusion MRI,” Nature Rev., vol. 4, pp. 469–480, 2003. [18] M. M. Mielke et al., “Regionally-specific diffusion tensor imaging in mild cognitive impairment and Alzheimer's disease,” NeuroImage, vol. 46, no. 1, pp. 47–55, 2009. [19] B. D. Peters, J. Blaas, and L. de Haan, “Diffusion tensor imaging in the early phase of schizophrenia: What have we learned?,” J. Psychiatric Res., vol. 44, no. 15, pp. 993–1004, 2010. [20] M. Langen et al., “Fronto-striatal circuitry and inhibitory control in autism: Findings from diffusion tensor imaging tractography,” Cortex, vol. 48, no. 2, pp. 183–193, 2012. [21] M. M. van der Graaff et al., “Upper and extra-motoneuron involvement in early motoneuron disease: A diffusion tensor imaging study,” Brain, vol. 134, no. 4, pp. 1211–1228, 2011. [22] C. A. Sage et al., “Quantitative diffusion tensor imaging in amyotrophic lateral sclerosis: Revisited,” Human Brain Mapp., vol. 30, no. 11, pp. 3657–3675, 2009. [23] P. J. Basser and C. Pierpaoli, “Microstructural and physiological features of tissues elucidated by quantitative-diffusion-tensor MRI,” J. Magn. Reson. B, vol. 111, no. 3, pp. 209–219, June 1996. [24] S. Mori, Introduction to Diffusion Tensor Imaging. : Elsevier, 2007. [25] D. C. Alexander, “Chapter Modelling, Fitting and Sampling in Diffusion MRI,” in Visualization and Processing of Tensor Fields. Berlin, Germany: Springer, Mar. 2009, pp. 3–20. [26] K. P. Pruessmann, M. Weiger, M. B. Scheidegger, and P. Boesiger, “Sense: Sensitivity encoding for fast MRI,” Magn. Reson. Med., vol. 42, no. 5, pp. 952–962, 1999. [27] J. Rajan, J. Veraart, J. Van Audekerke, M. Verhoye, and J. Sijbers, “Nonlocal maximum likelihood estimation method for denoising multiple-coil magnetic resonance images,” Magn. Reson. Imag., vol. 30, no. 10, pp. 1512–1518, 2012. [28] R. M. Henkelman, “Measurement of signal intensities in the presence of noise in MR images,” Med. Phys., vol. 12, no. 2, pp. 232–233, 1985. [29] H. Gudbjartsson and S. Patz, “The Rician distribution of noisy MRI data,” Magn. Reson. Med., vol. 34, pp. 910–914, 1995. [30] D. H. J. Poot, J. Sijbers, A. J. den Dekker, and W. Pintjens, “Automatic estimation of the noise variance from the histogram of a magnetic resonance image,” in Proc. IEEE/EBMS Benelux Symp., Dec. 2006, pp. 135–138. [31] S. Aja-Fernández, C. Alberola-López, and C. F. Westin, “Noise and signal estimation in magnitude MRI and Rician distributed images: A LMMSE approach,” IEEE Trans. Image Process., vol. 17, no. 8, pp. 1383–1398, Aug. 2008. [32] J. Nocedal and S. J. Wright, “Springer Series in Operations Research,” in Numerical Optimization, 2nd ed. New York: Springer, 2000. [33] D. R. Cox and E. J. Snel, “A general definition of residuals,” J. R. Stat. Soc. Ser. B Methodol., vol. 30, no. 2, pp. 248–275, 1968. [34] R. D. Cook and S. Weisberg, Residuals and Influence in Regression. New York: Chapman Hall, 1982. [35] R. C. Lewontin, “On the measurement of relative variability,” Systematic Zool., vol. 15, no. 2, pp. 141–142, Jun. 1966. [36] M. R. Hestenes and E. Stiefel, “Methods of conjugate gradients for solving linear systems,” J. Res. Nat. Bur. Stand, vol. 49, no. 6, pp. 409–436, 1952. [37] J. R. Shewchuk, An introduction to the conjugate gradient method without the agonizing pain Carnegie Mellon Univ., Pittsburgh, PA, USA, Tech. Rep., 1994. [38] A. van den Bos, Parameter Estimation for Scientists and Engineers. New York: Wiley, Aug. 2007. [39] M. W. A. Caan et al., “Estimation of diffusion properties in crossing fiber bundles,” IEEE Trans. Med. Imag., vol. 29, no. 8, pp. 1504–1515, Aug. 2010. [40] Y. Yao, “An approximate degrees of freedom solution to the multivariate Behrens Fisher problem,” Biometrika, vol. 52, no. 1/2, pp. 139–147, 1965. [41] S. Johansen, “The Welch-James approximation to the distribution of the residual sum of squares in a weighted linear regression,” Biometrika, vol. 67, no. 1, pp. 85–92, 1980. [42] D. G. Nel and C. A. Van Der Merwe, “A solution to the multivariate Behrens-Fisher problem,” Commun. Stat. Theory Methods, vol. 15, no. 12, pp. 3719–3735, 1986. [43] K. Krishnamoorthy and J. Yu, “Modified Nel and Van der Merwe test for the multivariate Behrens-Fisher problem,” Stat. Probabil. Lett., vol. 66, no. 2, pp. 161–169, 2004.

1176

[44] B. L. Welch, “The generalization of ‘student's’ problem when several different population variances are involved,” Biometrika, vol. 37, no. 1/2, pp. 28–35, Jan. 1947. [45] K. B. Petersen and M. S. Pedersen, Nov. 2012, The Matrix Cookbook [Online]. Available: http://www2.imm.dtu.dk/pubdb/views/publication_details.php?id=3274 [46] G. Seber and A. Lee, “Wiley series in probability and statistics,” in Linear Regression Analysis. New York: Wiley, 2002. [47] B. Whitcher, D. S. Tuch, J. J. Wisco, A. G. Sorensen, and L. Wang, “Using the wild bootstrap to quantify uncertainty in diffusion tensor imaging,” Human Brain Mapp., vol. 29, no. 3, pp. 346–362, Mar. 2008. [48] T. G. Close et al., “A software tool to generate simulated white matter structures for the assessment of fibre-tracking algorithms,” NeuroImage, vol. 47, no. 4, pp. 1288–1300, 2009.

IEEE TRANSACTIONS ON MEDICAL IMAGING, VOL. 34, NO. 5, MAY 2015

[49] P. Pullens, A. Roebroeck, and R. Goebel, “Ground truth hardware phantoms for validation of diffusion-weighted MRI applications,” J. Magn. Reson. Imag., vol. 32, pp. 482–488, 2010. [50] T. E. Nichols and A. P. Holmes, “Nonparametric permutation tests for functional neuroimaging: A primer with examples,” Human Brain Mapp., vol. 15, pp. 1–25, 2001. [51] J. A. D. Farrell et al., “Effects of signal-to-noise ratio on the accuracy and reproducibility of diffusion tensor imaging-derived fractional anisotropy, mean diffusivity, and principal eigenvector measurements at 1.5T,” Journal of Magnetic Resonance Imaging, vol. 26, no. 3, pp. 756–767, 2007. [52] B. Efron and R. Tibshirani, An Introduction to the Bootstrap. New York: Chapman Hall, 1993.

Detecting statistically significant differences in quantitative MRI experiments, applied to diffusion tensor imaging.

In this work we present a framework for reliably detecting significant differences in quantitative magnetic resonance imaging and evaluate it with dif...
3MB Sizes 0 Downloads 6 Views