Magnetic Resonance Imaging 32 (2014) 132–149


Maximum likelihood estimation for second level fMRI data analysis with expectation trust region algorithm Xingfeng Li ⁎, Damien Coyle, Liam Maguire, Thomas Martin McGinnity Intelligent Systems Research Centre, University of Ulster, Magee Campus, Derry, BT487JL Northern Ireland, UK

Article history: Received 18 June 2012; Revised 6 August 2013; Accepted 11 October 2013

Keywords: Mixed effect model; Second level fMRI data analysis; Variance analysis; Trust region algorithm; Maximum Log Likelihood (LL) estimation

Abstract

The trust region method, which originated from the Levenberg–Marquardt (LM) algorithm, is considered for mixed effect model estimation in the context of second level functional magnetic resonance imaging (fMRI) data analysis. We first present the mathematical and optimization details of the method for the mixed effect model analysis; we then compare the proposed methods with the conventional expectation-maximization (EM) algorithm on a series of datasets (synthetic and real human fMRI datasets). From simulation studies, we found that a higher damping factor for the LM algorithm is better than a lower damping factor for fMRI data analysis. More importantly, in most cases, the expectation trust region algorithm is superior to the EM algorithm in terms of accuracy if the random effect variance is large. We also compare these algorithms on real human datasets which comprise repeated measures of fMRI in phase-encoded and random block experimental designs. We observed that the proposed method is faster in computation and robust to Gaussian noise for fMRI analysis. The advantages and limitations of the suggested methods are discussed. © 2014 Elsevier Inc. All rights reserved.

1. Introduction

The mixed model or mixed effect model [1] is commonly applied for second level functional magnetic resonance imaging (fMRI) data analysis [2–4]. This method has the advantage of accounting for the fixed effect and the (individual) random effect within one model; therefore, the fMRI response effects and the covariance of the responses from different runs/subjects can be estimated simultaneously. In theory, second level data analysis alone is not possible because the first level and second level must be combined within one mixed model, and the model parameters (fixed effect variance, random effect variance, and predictors) can be encompassed and estimated within the framework of the generalized linear mixed model (GLMM) [5,6]. Because of the hierarchical nature of the fMRI dataset, both first and higher levels of the model parameters can be estimated within one GLMM using iterative schemes. However, this is not practical for fMRI data analysis because the computational burden of fitting a GLMM at every voxel is very high. As a result, most second/higher level fMRI data analyses [2–4,7,8] are separated from the first level or only consider the first level variance and its projection onto the second level. In the implementation of the

⁎ Corresponding author. Tel.: +44 28 71 67 51 95; fax: +44 28 71 37 52 54. E-mail address: [email protected] (X. Li). 0730-725X/$ – see front matter © 2014 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/j.mri.2013.10.007

second level analysis, expectation-maximization (EM) [9,10] and Newton–Raphson (NR) algorithms are often employed [3,4,11,12] to estimate the variance parameters in the (log) likelihood (LL) function [13,14]. EM methods often have fairly simple forms and hence are easy to program; however, they do not take into account the second order gradient (Hessian matrix). Furthermore, EM offers computational advantages over NR-type methods, as it does not have to directly evaluate the full likelihood function. Previous studies have compared EM with NR methods for the mixed effect model [15], and concluded that in most situations a well-implemented NR algorithm is preferable to the EM algorithm. However, the NR method is sensitive to initial parameter values, although trust region optimization approaches such as the Levenberg–Marquardt (LM) method [16–18] have been proposed in numerical analysis [19] to circumvent this limitation. To the best of the authors' knowledge, the effectiveness of this method has not previously been tested for second level fMRI data analysis. In addition, mixed model estimation can be considered as an inverse problem, and we may also wish to regularize the variance parameter to avoid an ill-posed inverse problem which may affect the accuracy of the estimation. Though this problem can be dealt with using LM iteration [20,21], there is no report of the LM algorithm's performance for variance regularization in fMRI data analysis. The objectives of this study are three-fold. First, we apply the maximum likelihood estimation method to second level fMRI analysis. The second purpose is to evaluate the feasibility of the trust


region and expectation trust region algorithms for fMRI data analysis. Thirdly, we compare EM with the trust region and expectation trust region algorithms for second level analysis. Finally, we report results from the different algorithms (EM, trust region, and expectation trust region) based on both synthetic datasets and real human fMRI datasets.

2. Method

2.1. First level data analysis

There are three datasets used in this study. The first and second datasets were produced in a previous study [22–24]. Briefly, two experiments were run with three visual stimuli at different relative contrasts. Both experiments were conducted binocularly (for details of the data collection, see Appendix A or [22]). For the first experiment/dataset, the stimuli were matched in contrast appearance (i.e., scaled in multiples of their own detection thresholds, so the contrasts presented were: Achromatic (Ach) = 11%, red-green (RG) = 4%, blue-yellow (BY) = 30%). The only difference between the second experiment and the first is that the second experiment/dataset had the three stimuli (Ach, RG and BY) at approximately the same cone contrasts (5–6%). For the first level fMRI data analysis, a two-gamma function

g(t) = (t/d1)^a1 exp(−(t − d1)/b1) − c (t/d2)^a2 exp(−(t − d2)/b2)

with typical parameters (a1 = 6, a2 = 12, b1 = b2 = 0.9, c = 0.35) was employed to model the hemodynamic function in the general linear model (GLM) [2,25]. Linear drift and baseline were excluded by the zero and first order polynomials within the GLM. Taking the response to the Ach stimuli as an example, the contrast vector within the GLM was set to [1 0 0 0 0] for the detection of the brain response to the Ach stimulus, where 1 corresponds to the model of the hemodynamic function in response to the Ach stimulus, and the 0s correspond to the RG, BY, and drift terms. Similarly, the contrast vector for the RG/BY stimulus response was set to [0 1 0 0 0]/[0 0 1 0 0] in this study. For the third dataset, the experiment was carried out monocularly (for details of the data collection, see Appendix A or [26]). In the first level fMRI data analysis of this dataset, online sinusoidal models obtained from the Fast Fourier transform (FFT) of the time series were adopted to model the hemodynamic response [26,27]. This approach overcomes the limitation of the GLM, which requires a predefined model for all regions. Linear drifts were excluded by zero and first order polynomials. The contrast vector within the GLM was set to [1 0 0] for detection of activation in response to the stimuli, where 1 corresponds to the model of the hemodynamic response function. For all the datasets in the first level data analysis, first order autoregression (AR) models were applied to model the error term.

2.2. Variance estimation in the second level data analysis

For each voxel, the second level model for fMRI data analysis is [2]:

y = Xβ + ε    (1)

where y is the estimated effect from the first level with E(y) = Xβ and var(y) = Σ, X is the n × p design matrix, and Σ = R + Q. If we average different subjects, i.e., we set X = [1 1 ⋯ 1]ᵀ, the covariance component is Q = σ²random I [2], where I is the n × n identity matrix and n is the total number of runs/subjects to be combined within the mixed model, and

Σ = R + σ²random I    (2)

where R is the matrix with the variances from the first level analysis on the diagonal and 0 elsewhere, and σ²random is the random effects variance from the second level fMRI data analysis. If we compare different groups, i.e., controls with patients, we can set the design matrix to X = [1 1 ⋯ 0 0; 0 0 ⋯ 1 1]ᵀ, where the 1s in the top row correspond to the total number of controls (n1), and the 1s in the bottom row correspond to the total number of patients (n2). In this case, and considering an unstructured covariance, the variance has the following form:

Σ = R + Q = [R1 0; 0 R2] + [σ1² σ12; σ12 σ2²] ⊗ A    (3)

where R1 and R2 are the control and patient groups' fixed effect variance matrices from the first level analysis, with the variances on the diagonal and 0 elsewhere; ⊗ is the Kronecker product; A is the relationship matrix (in our study we set A = I with appropriate dimensions, see Appendix B); σ1², σ2², and σ12 are the random effects variances/covariance of the two groups for the second level fMRI data analysis. These are the parameters that need to be identified.

2.3. Hessian and information matrices for second level data analysis

From multivariate statistical analysis [28], we know that the probability density of the data y in Eq. (1) has the form of the multivariate normal distribution [29], i.e.,

f(y; β, Σ) = (2π)^(−n/2) |Σ|^(−1/2) exp(−(1/2)(y − Xβ)ᵀ Σ⁻¹ (y − Xβ))    (4)

Taking the natural logarithm of the right side of Eq. (4) yields the LL of β and Σ given the observed data (y, X). Because the logarithm is a monotonic strictly increasing function, maximizing the likelihood is equivalent to maximizing the LL [6,10,28]:

ℓ(β, Σ; y) = −(n/2) ln(2π) − (1/2) ln|Σ| − (1/2)(y − Xβ)ᵀ Σ⁻¹ (y − Xβ)    (5)

From matrix theory [6] and using:

∂ln|Σ|/∂θ = tr(Σ⁻¹ ∂Σ/∂θ)    (6)

∂Σ⁻¹/∂θ = −Σ⁻¹ (∂Σ/∂θ) Σ⁻¹    (7)

∂[(y − Xβ)ᵀ Σ⁻¹ (y − Xβ)]/∂β = −2Xᵀ Σ⁻¹ (y − Xβ)    (8)

and writing ∂Σ/∂θ = Σθ, the score function is expressed (from Eq. (5)) as:

S1 = ∂ℓ/∂θ = −(1/2) tr(Σ⁻¹ Σθ) + (1/2)(y − Xβ)ᵀ Σ⁻¹ Σθ Σ⁻¹ (y − Xβ)    (9)

For example, if we average different runs/subjects, we substitute Eq. (2) into Eq. (9) to estimate the random effect in the second level analysis, i.e., θ = σ²random, and we have

∂Σ/∂θ = ∂Σ/∂σ²random = Σθ = I    (10)

134

X. Li et al. / Magnetic Resonance Imaging 32 (2014) 132–149

Therefore, the score function S1 in Eq. (9) for averaging different runs/subjects becomes:

S1 = (1/2)[−tr(Σ⁻¹) + (y − Xβ)ᵀ Σ⁻¹ Σ⁻¹ (y − Xβ)]    (11)

From Eq. (5), taking the partial derivative of the LL with respect to β yields

S2 = ∂ℓ/∂β = Xᵀ Σ⁻¹ (y − Xβ)    (12)

The corresponding Hessian matrix is (using Eqs. (2), (10), and (11)):

H11 = ∂S1/∂σ²random = (1/2)[tr(Σ⁻¹ Σ⁻¹) − 2(y − Xβ)ᵀ Σ⁻³ (y − Xβ)]    (13)

and we obtain

H22 = ∂S2/∂β = −Xᵀ Σ⁻¹ X    (14)

We may take the expectation of the Hessian matrix H by substituting E(y − Xβ) = 0 into Eq. (13), which leads to the information matrix

F = E(H) = (1/2) tr((Σ⁻¹)²)    (15)
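For the averaged-runs model of Eq. (2), the quantities in Eqs. (11), (13), and (15) are straightforward to compute numerically. The following is a minimal NumPy sketch (our illustration, not the authors' MATLAB code; the data values are made up) which also makes visible that the Hessian of Eq. (13) reduces to the information matrix of Eq. (15) when the residual y − Xβ is zero:

```python
import numpy as np

def score_hessian_info(y, X, beta, R, sigma2):
    """Score S1 (Eq. 11), Hessian H11 (Eq. 13), and information F (Eq. 15)
    for the averaged-runs model Sigma = R + sigma2 * I (Eq. 2)."""
    n = len(y)
    Sigma = R + sigma2 * np.eye(n)
    Si = np.linalg.inv(Sigma)            # Sigma^{-1}
    r = y - X @ beta                     # residual y - X*beta
    S1 = 0.5 * (-np.trace(Si) + r @ Si @ Si @ r)                  # Eq. (11)
    H11 = 0.5 * (np.trace(Si @ Si) - 2.0 * r @ Si @ Si @ Si @ r)  # Eq. (13)
    F = 0.5 * np.trace(Si @ Si)                                   # Eq. (15)
    return S1, H11, F

# toy check: 3 runs, group-average design X = [1 1 1]^T
y = np.array([1.2, 0.8, 1.1])
X = np.ones((3, 1))
R = np.diag([0.05, 0.04, 0.06])   # first level (fixed effect) variances
S1, H11, F = score_hessian_info(y, X, np.array([1.0]), R, 0.1)
```

Here `score_hessian_info`, the toy design, and the variance values are all hypothetical, chosen only to exercise the formulas.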

Similarly, the Hessian and information matrices for different group comparisons can be calculated as detailed in Appendix B.

2.4. T statistics for the estimation

From Eq. (12), we can solve S2 = 0 to get the optimal estimate of β [6,29]:

β̂ = (Xᵀ Σ⁻¹ X)⁻¹ Xᵀ Σ⁻¹ y    (16)

and its estimated variance matrix is:

V̂ar(β̂) = −E(∂²ℓ/∂β∂βᵀ)⁻¹ = −(H22)⁻¹ = (Xᵀ Σ⁻¹ X)⁻¹    (17)

Finally, the effect defined by a contrast vector C in β can be estimated as E = Cβ̂, and the T statistic is [2]:

T = E / sqrt(V̂ar(β̂) CᵀC)    (18)

with a nominal v = n − rank(X) df.

2.5. Iterative trust region method for ML estimation

The NR algorithm has been used to solve maximum likelihood estimation problems [10]; the iteration can be expressed as

θ̂^(k+1) = θ̂^(k) − H(θ̂^(k))⁻¹ S(θ̂^(k))    (19)

where H is the Hessian matrix from Eq. (13) and S(θ̂^(k)) is the score function (Eq. (11)). In the literature relating to trust region algorithms, the LM algorithm is often employed; the method provides a solution to the problem of nonlinear least squares estimation [16,17,19] and has been applied in mixed model analysis [6,30,31]. The LM algorithm is more robust than the NR method, which means that in many cases it finds a solution even if it starts very far from the final maximum. The idea of LM is to introduce a damping factor λ to adjust the Hessian matrix at each iteration in Eq. (19), as in Eq. (20):

θ̂^(k+1) = θ̂^(k) − (H + λ·diag(H))⁻¹ S(θ̂^(k))    (20)

where diag(H) is the diagonal matrix of H, and λ > 0 (λ < 0 if we want to maximize the LL) is sufficiently large that it reduces the size of the step θ̂^(k) and rotates it slightly in the direction of S(θ̂^(k)). Because we need to vary the LM parameter as the iteration progresses, the method is called a trust region algorithm in numerical analysis [17,18]. For the expectation trust region method, we replace the Hessian matrix (Eq. (13)) with the information matrix (Eq. (15)). It is easy to see that the LM algorithm is a combination of the gradient descent method and the NR method, since it is well known that the gradient method has the following form:

θ̂^(k+1) = θ̂^(k) − (λ·diag(H))⁻¹ S(θ̂^(k))    (21)

We can see that the LM algorithm in Eq. (20) is the average between Eqs. (19) and (21). In the trust region method, λ is also called the trust region parameter, which we found is better increased in each iteration for second level fMRI data analysis. From function optimization theory, if we want to maximize the LL function (Eq. (5)), we must ensure that the matrix H + λ·diag(H) in Eq. (20) is negative definite. This can be realized by choosing λ properly, because of the flexibility in the choice of λ [18]: we set the initial value of λ using the rule that if H > 0, λ < −max(diag(H)); else λ > abs(max(diag(H))), where abs denotes the absolute value of a number. Then the following update algorithm is used:

θ̂^(k+1) = θ̂^(k) − (H + λ·diag(H))⁻¹ S(θ̂^(k)) + V    (22)

where V is a vector that satisfies Vᵀ(H + λI)⁻¹S(θ) ≥ 0. This condition ensures that ‖Vᵀ(H + λI)⁻¹S(θ)‖ ≥ ‖θ^(k+1) − θ^(k)‖. In fMRI analysis, setting V = 0 (i.e., Eq. (22) with proper selection of λ) is enough to ensure that the LL function increases in each iteration.

2.6. Numerical implementation details for the trust region and EM algorithms

MATLAB (www.mathworks.com) was employed to implement the expectation trust region and EM algorithms. For ease of comparison with the EM algorithm, we adopted the fixed effect variance baseline removal method suggested previously [2], i.e., we subtracted min(R) = Rmin from R in Eqs. (2) and (3), which gives an equivalent model for y with variance:

(R − Rmin) + (σ²random + Rmin)    (23)

then we subtract Rmin from R to get the random effect. After the variance baseline has been removed, we apply the (expectation) trust region algorithm. The first step when using the algorithm is to estimate the initial values for the parameters vector θ =

[σ²random, β]ᵀ for the group average (group comparison can be done in the same way). We set λ0 = 1000 and start with an initial value, i.e.,

σ²random^(0) = yᵀ RI y / v    (24)

where v = n − rank(X) is the df and

RI = I − X(XᵀX)⁻¹Xᵀ    (25)

From Eq. (16), we then get the initial value for β:

β^(0) = (Xᵀ Σ0⁻¹ X)⁻¹ Xᵀ Σ0⁻¹ y    (26)

The algorithm for implementing the expectation trust region method can be described as follows (the same algorithm can be applied for group comparison):

1. β^(1) = β^(0); λ0 = 1000; from σ²random^(0) and Eq. (2) get Σ0; and k = 0;
2. While (k < 100)
   a) k = k + 1;
   b) Calculate the score function S1 and the Hessian matrix H11 according to Eqs. (11)–(13), or F = E(H) from Eq. (15) for the expectation trust region algorithm;
   c) Estimate the predictor β from Eq. (16);
   d) If k = 1, select λ by applying the rule that ensures H11 + λ·diag(H11) < 0; i.e., if H11 > 0, use λ = −λ0·max(abs(diag(H11))); otherwise (H11 < 0), set λ = λ0·diag(H11);
   e) Apply Eq. (22) to calculate σ̂²random^(k+1);
   f) Calculate the variance change, i.e., st = σ̂²random^(k+1) − σ̂²random^(k);
   g) Update the trust region using λ = 2λ in each iteration, i.e., change the region size;
   h) Compute the vector norm of st; if norm(st) < 10e−100, convergence has been secured, so break the loop; otherwise set σ̂²random^(k) = σ̂²random^(k+1), i.e., repeat until convergence.
3. end
4. Output σ̂²random = σ̂²random^(k) and β̂ = β^(k).

We compare these methods with the EM algorithm for the mixed model [2,32] to assess the trust region methods for variance estimation. To implement the EM algorithm of a previous fMRI study [2], we define a weighted residual matrix as:

RΣ = Σ⁻¹ − Σ⁻¹ X (Xᵀ Σ⁻¹ X)⁻¹ Xᵀ Σ⁻¹    (27)

We start with an initial value as described in Eq. (24), which is based on the assumption that the fixed effect variances are zero. The updated estimate of the random effect variance is (Eq. (15) in Appendix A of [2]):

σ²random^(k+1) = [σ²random^(k) (p + tr(R RΣ)) + (σ²random^(k))² yᵀ RΣ² y] / n    (28)

where p = rank(X), R is the fixed effect variance from the first level defined in Eqs. (2)–(3), and n is the total number of runs/subjects to be averaged.

3. Results

3.1. Simulation study using synthetic data

To demonstrate the (expectation) trust region algorithm, we begin with a very simple case. We take the simple form X = 1 and Σ = σ² in Eq. (5) and omit the constant term in the LL, which leads to:

ℓ(y; β, σ²) = −(1/2) log(σ²) − (1/2)(y − β)²/σ²    (29)

where y is the observed data (the same as in Eq. (1)), i.e., the effect from the first level fMRI analysis; σ² is the random effect variance; and β is the predictor (mean value). These are the parameters to be estimated. To apply the proposed method, we need to calculate the score functions:

S1 = ∂ℓ/∂σ² = −1/(2σ²) + (y − β)²/(2σ⁴)    (30)

S2 = ∂ℓ/∂β = (y − β)/σ²    (31)

Setting S2 = 0, we obtain the optimal estimate β = y. The next step is to calculate the Hessian matrix for the iterative estimation of the variance:

H11 = ∂S1/∂σ² = 1/(2σ⁴) − (y − β)²/σ⁶    (32)

If we use the expectation trust region method, the information matrix F11 = E(H11) = 1/(2σ⁴) is used to replace H11 in Eq. (32). Applying the LM algorithm, i.e., Eq. (22), we have

σ²^(k+1) = σ²^(k) − (H11 + λ·diag(H11))⁻¹ S1    (33)

The effect y is y = 2 + 0.01·randn and the variance is σ² = 0.1 + 0.01·randn, where randn is the MATLAB function that produces N(0,1) Gaussian noise. Fig. 1 shows one example of this simulation study. We use λ0 to represent the scale factor of λ in the LM algorithm. With the initial value λ0 = 1000, the LL function increases in each iteration, as shown in Fig. 1A, as the variance σ² increases (Fig. 1B); the score function, however, decreases over the iterations (Fig. 1C). The λ value is doubled in each iteration (Fig. 1D), and we break the loop if (σ²^(k+1) − σ²^(k)) < 10e−100. One drawback of the simulation in Fig. 1 is that setting the observed value to y = 2 plus Gaussian noise of magnitude 0.01 is unrealistic if the observed value is close to its mean value, i.e., y = 1. We therefore conducted a simulation study with the observed effect parameter y = 1 + 0.01·randn and σ² = 0.1 + 0.01·randn, with the other initial values the same as in Example 1 (λ0 = 1000). These results are displayed in Fig. 2. From Fig. 2, we can see that the LL function (Fig. 2A) still increases in each iteration, while the variance decreases (Fig. 2B). Because the score function (Fig. 2C) is negative, which gives the gradient direction for the variance change in Eq. (33), and because (H11 + λ·diag(H11)) is also negative definite, the variance decreases in Fig. 2B. In contrast, in Fig. 1B the variance increases in each iteration, because the score function is positive (Fig. 1C). However, in both cases (Figs. 1 and 2), the LL always increases in each iteration, and λ takes negative values in both examples. To further study the behaviour of the method, we repeated the calculation with Gaussian noise added to the variance and mean value (the same as Fig. 2, i.e., y = 1 + 0.01·randn with σ² = 0.1 + 0.01·randn, λ0 = 1000). The initial value for λ is the same as in Figs. 1 and 2. Since Eq. (29) is a special case of Eq.
(5), which derives from Eq. (1), we can get the ideal variance if we have obtained the predictor β. As we set the fixed effect to 0, the random


Fig. 1. Example 1 of the simulation study using the trust region algorithm. LL (A), random effect variance (B), score function (C), and λ change in each iteration (D).

effect variance can be estimated directly from the residual, once β in Eq. (1) has been obtained, using σ² = var(ε), where var is the MATLAB function to estimate variance. These variance values with random noise added are denoted by the solid curve in Fig. 3A, and the variance estimation results are indicated by the red dotted curve. Fig. 3B

displays the predictor value with noise added, and Fig. 3C exhibits the initial value of the Hessian matrix used to estimate the variance. It is obvious that the variance can be estimated well using the trust region method (the expectation trust region method produces similar results), although the mean value is slightly higher than the ideal

Fig. 2. Example 2 of simulation study from the trust region algorithm with different β values. LL function (A), random effect variance (B), score function (C), and λ change in each iteration (D).


Fig. 3. Simulation study with 100 random samples using the trust region algorithm. A, variance changes; B, β value with random noise added; C, initial value of Hessian matrix.

variance values (Fig. 3A). We also applied λ0 = 10 and λ0 = 10e + 10 in this simulation study, and we found a larger estimation error when using λ0 = 10, while if λ0 is too high (small step size from Eq. (22)) the algorithm restricts the LL and parameter changes in each iteration, leading to slow convergence.
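The scalar iteration of Eqs. (29)–(33) can be sketched as follows. This is our Python transcription of the procedure described above, not the authors' MATLAB implementation: for the scalar case the λ selection rule collapses to a single sign choice (λ = ±λ0, with the sign that makes H11 + λ·H11 negative), and the Gaussian noise is omitted so the run is deterministic.

```python
import numpy as np

def loglik(y, beta, s2):
    # Eq. (29): LL with the constant term omitted
    return -0.5 * np.log(s2) - 0.5 * (y - beta) ** 2 / s2

def trust_region_scalar(y, s2_0, lam0=1000.0, max_iter=100, expectation=False):
    """LM/trust region iteration for the scalar model (Eqs. 30-33).
    With expectation=True, H11 is replaced by F11 = 1/(2*sigma^4)."""
    beta = y                  # S2 = 0 gives beta = y (Eq. 31)
    s2 = s2_0
    lam = None
    for _ in range(max_iter):
        S1 = -1.0 / (2 * s2) + (y - beta) ** 2 / (2 * s2 ** 2)   # Eq. (30)
        H11 = 1.0 / (2 * s2 ** 2) - (y - beta) ** 2 / s2 ** 3    # Eq. (32)
        if expectation:
            H11 = 1.0 / (2 * s2 ** 2)                            # F11 = E(H11)
        if lam is None:
            # pick the sign of lambda so that H11 + lam*H11 < 0
            lam = -lam0 if H11 > 0 else lam0
        step = S1 / (H11 + lam * H11)                            # Eq. (33)
        s2_new = s2 - step
        lam *= 2.0                                               # grow the trust region
        if abs(s2_new - s2) < 1e-12:                             # convergence check
            s2 = s2_new
            break
        s2 = s2_new
    return beta, s2
```

With y = 1 and σ²(0) = 0.1, the variance shrinks slightly in each step while the LL of Eq. (29) increases, mirroring the behaviour described for Fig. 2.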

To test the robustness of the algorithms, we changed the magnitude of the variance (a in Tables 1 and 2). The results for averaging 2 runs are given in Table 1. The estimated variance is also given. The three algorithms share the same ideal variance estimate, which is obtained using σ² = var(ε) from Eq. (1). From Table 1, we can see that if the variance is larger (i.e., a = 10 and a = 1), the (expectation) trust region methods give better results than the EM algorithm, since they are closer to the ideal random effect variance. However, if the variance is small (a = 0.1 and a = 0.01), EM gives the best variance estimation result in terms of closeness to the ideal variance values. Furthermore, comparing the results obtained with λ0 = 1000 and λ0 = 10, it is evident that the variance with λ0 = 1000 is closer to the ideal variance than with λ0 = 10 for both the trust region and expectation trust region methods. From these results, we conclude that λ0 = 1000 produces better results than λ0 = 10 in this simulation study, because the estimated variance is closer to the ideal variance value, especially when the input variance variation is large (Table 1). We also studied the case of averaging 100 runs, using the same methods and parameters as in the previous example. If we increase the total number of runs, the error introduced by the second term in the LL (Eq. (29)) will be increased. This may lead to score function sign changes; as a result, it may cause numerical instability

3.2. Comparison with the EM algorithm and λ0 selection for the trust region algorithm

In this simulation study, we first compare the trust region and expectation trust region algorithms with the EM algorithm, to study how λ0 affects the variance estimation in these algorithms. We implemented the EM algorithm using Eqs. (27) and (28), while Eqs. (30)–(33) were employed for the trust region and expectation trust region algorithms. We designed an experiment to average 2 runs, setting X = [1 1]ᵀ, y = [1 1]ᵀ + [randn randn]ᵀ, and σ = a([1 1]ᵀ + [randn randn]ᵀ) in all algorithms, where a = 10, 1, 0.1, and 0.01 controls the magnitude of the variance variation, as shown in the first row of Tables 1 and 2. Again, we set the fixed effect variance in Eq. (2) to 0, and produced the random noise and added it to the system (i.e., y and σ) with 100 repetitions. Table 1 summarizes the simulation results for λ0 = 1000 and λ0 = 10.
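The EM side of this comparison (Eqs. (27) and (28)) can be sketched in a few lines. This is our NumPy transcription, not the authors' MATLAB code, and the two-run data are illustrative; under the simulation's assumption of zero fixed effect variance (R = 0), the update of Eq. (28) converges to the residual-based estimate yᵀ RI y / (n − p), with RI from Eq. (25).

```python
import numpy as np

def em_variance(y, X, R, s2_0, n_iter=100):
    """EM update for the random effect variance (Eqs. 27-28)
    in the model y ~ N(X*beta, R + sigma^2 * I)."""
    n = len(y)
    p = np.linalg.matrix_rank(X)
    s2 = s2_0
    for _ in range(n_iter):
        Sigma = R + s2 * np.eye(n)
        Si = np.linalg.inv(Sigma)
        # weighted residual matrix, Eq. (27)
        RS = Si - Si @ X @ np.linalg.inv(X.T @ Si @ X) @ X.T @ Si
        # variance update, Eq. (28)
        s2 = (s2 * (p + np.trace(R @ RS)) + s2 ** 2 * (y @ RS @ RS @ y)) / n
    return s2

# two-run group average with zero fixed effect variance, as in the simulation
X = np.ones((2, 1))
y = np.array([1.8, 0.6])      # illustrative first level effects from two runs
R = np.zeros((2, 2))
s2_em = em_variance(y, X, R, s2_0=1.0)
```

The function name `em_variance` and the starting value `s2_0` are our choices; the loop count of 100 matches the iteration cap used for the trust region algorithm above.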

Table 1
Comparison of different methods based on synthetic data (two runs/subjects to average), λ0 = 1000 (top) and λ0 = 10 (bottom); mean ± standard deviation.

Variance            a = 10            a = 1             a = 0.1           a = 0.01
λ0 = 1000
Estimated (ideal)   0.8284 ± 1.4104   0.9118 ± 1.1362   0.9222 ± 1.4486   0.9149 ± 1.3401
EM                  0.3374 ± 0.4365   0.5023 ± 0.9101   0.9125 ± 1.4480   0.9148 ± 1.3402
Trust region        0.8280 ± 1.1406   0.9112 ± 1.1362   0.9202 ± 1.4494   0.9114 ± 1.3419
Expectation         0.8284 ± 1.4106   0.9113 ± 1.1362   0.9219 ± 1.4486   0.9146 ± 1.3401
λ0 = 10
Estimated (ideal)   0.8583 ± 1.6760   1.0235 ± 1.3155   0.8483 ± 1.1831   0.9408 ± 1.1651
EM                  0.3902 ± 0.4692   0.5891 ± 1.0375   0.8372 ± 1.1847   0.9406 ± 1.1651
Trust region        0.8061 ± 1.1656   0.9603 ± 1.3233   0.7440 ± 1.2501   0.7197 ± 1.3194
Expectation         0.8061 ± 1.1656   0.9731 ± 1.3169   0.8147 ± 1.1778   0.9011 ± 1.1587


Table 2
Comparison of different methods based on synthetic data, averaging 100 runs (mean ± standard deviation), λ0 = 1000 (top) and λ0 = 10 (bottom).

Variance            a = 10                     a = 1                       a = 0.1           a = 0.01
λ0 = 1000
Estimated (ideal)   0.9926 ± 0.1399            0.9994 ± 0.1310             0.9669 ± 0.1441   1.0100 ± 0.1448
EM                  0.4527 ± 0.2773            0.6400 ± 0.1420             0.9487 ± 0.1452   1.0098 ± 0.1448
Trust region        8.9177e+28 ± 7.2249e+29    −2.8692e+29 ± 2.0186e+30    0.9668 ± 0.1441   1.0100 ± 0.1448
Expectation         0.9909 ± 0.1397            0.9985 ± 0.1309             0.9668 ± 0.1441   1.0100 ± 0.1448
λ0 = 10
Estimated (ideal)   0.9875 ± 0.1360            1.0117 ± 0.1417             0.9915 ± 0.1294   1.0245 ± 0.1448
EM                  0.4743 ± 0.3448            0.6675 ± 0.1780             0.9744 ± 0.1289   1.0243 ± 0.1448
Trust region        1.115e+29 ± 1.0633e+30     −1.8080e+29 ± 1.8080e+30    0.9865 ± 0.1291   1.0226 ± 0.1445
Expectation         0.8125 ± 0.1683            0.9189 ± 0.1428             0.9862 ± 0.1291   1.0225 ± 0.1445

for the analysis, as the variance may change in the opposite direction (compare Fig. 1B with Fig. 2B). Fig. 4 gives one example of the LL, random effect variance, and predictor changes using the different methods with the same variance and β as input. With λ0 = 1000, the input variance variation is the same as for the two runs in Table 1. In Fig. 4, graphs A, D, G are obtained using the EM algorithm; B, E, H using the trust region method; and C, F, I using the expectation trust region method. We found that EM produces the smallest variance estimate, followed by the trust region method, while the expectation trust region method produces the largest variance. β has a very small range of change for the trust region (Fig. 4H) and expectation trust region methods (Fig. 4I), whereas EM has the largest change range (Fig. 4G). Furthermore, comparing Fig. 4B, E, H with Fig. 4C, F, I, it can be seen that the LL

(Fig. 4A–C), random effect variance (Fig. 4D–F), and β values (Fig. 4G–I) change in the opposite direction for all methods. In addition, using λ0 = 1000 and λ0 = 10 with 100 runs, we repeated the process 100 times. Table 2 summarizes the results produced by the EM, trust region, and expectation trust region methods. We found the expectation trust region method to be the best in terms of stability and accuracy, although in certain cases (a = 0.1 and a = 0.01 in Table 2) the trust region method did give variance estimates close to the ideal value. Also, the standard deviation of these 100 tests from the trust region method is large when the input variance is large, especially for a = 10 and a = 1. Due to numerical instability, negative variance values were obtained with the trust region method. In addition, we compared the computational time for each method, and found that the time to run the EM algorithm

Fig. 4. One example of the numerical process for averaging 100 runs in the simulation study.


(0.0843 s) is more than 4 times longer than for the trust region (0.0209 s) and expectation trust region (0.0199 s) methods. Comparing Tables 1 and 2, it is clear that more runs/subjects should be used with the expectation trust region methods to obtain more accurate results. Furthermore, we found that the size of λ0 controls the accuracy of the trust region algorithms: a larger λ0 produces higher accuracy but converges more slowly and requires relatively more time, while a smaller λ0 produces lower estimation accuracy but requires less computational time. In this study, we set λ0 = 1000, which is 1000 times larger than the diagonal of the Hessian matrix (from Eq. (33)), suggesting that the algorithm behaves more like a gradient method than an NR method.

3.3. Combining different runs within and across subjects using human data

In the first human dataset analysis, we study the case of combining different runs within the mixed effect model. In the first level, the contrast vector was set to [1 0 0 0 0] to study the brain response to the Ach stimulus. In the second level, the design matrix in the mixed effect model (Eq. (1)) was set to [1 1]ᵀ or [1 1 1]ᵀ to combine two or three functional runs for the first experiment. The contrast vector in Eq. (18) was set to 1. We investigated the random effect variance change in each iteration at one randomly selected voxel from subject JS in the first experiment. Different methods were compared, and Fig. 5 shows the LL (Fig. 5A–C), variance (Fig. 5D–F), and predictor β (Fig. 5G–I) change (y axis) against each numerical


iteration (x axis) using the EM, trust region, and expectation trust region algorithms. For the EM algorithm, although the LL and β increase in each iteration (Fig. 5A and G), the variance decreases (Fig. 5D). The numerical process for the (expectation) trust region algorithm at the same voxel is displayed in Fig. 5B, E, H and Fig. 5C, F, I. As with the EM algorithm, the LL (Fig. 5B and C) increases rapidly at the beginning of the numerical iteration, the rate of increase slows after the first 10 iterations, and convergence is reached in fewer than 50 iterations. In the meantime, the random effect variance decreases from its initial value (Fig. 5E, F) to convergence for these trust region algorithms. Comparing Fig. 5G with Fig. 5H/I, we can see that β changes over a wider range with EM than with the expectation trust region algorithms. It is worth mentioning that although the trust region and expectation trust region methods show the same parameter change pattern, the final values are different. For example, for the trust region method the LL converges to 2.283 (Fig. 5B), while for the expectation trust region method it converges to 2.2778 (Fig. 5C). The expectation trust region method (Fig. 5F) gives a larger random effect variance estimate than the trust region method (Fig. 5E) in this example. One example of the combination of two functional runs for subject JS from the first experiment is presented in Fig. 6. Because the results using the EM algorithm are very noisy, we smoothed them with an 8 mm FWHM Gaussian kernel for comparison. Fig. 6A, B, and C display the results using the EM, trust region, and expectation trust region algorithms. Comparing the different methods, the largest T value from EM is almost the same as

Fig. 5. Average 2 runs of subject JS from the first experiment. LL (A–C), variance change (D–F), β (G–I) change in each iteration. A,D,G are obtained using the EM algorithm. B, E, H using the trust region algorithm. C, F, I, using the expectation trust region algorithm.


X. Li et al. / Magnetic Resonance Imaging 32 (2014) 132–149

Fig. 6. Activation maps from average 2 runs of subject JS from the first experiment. A, EM algorithm; B, trust region algorithm; C, expectation trust region algorithm. False discovery rate (FDR) threshold T > 3.414 (P < 0.05).

Fig. 7. Activation maps from average 8 subjects in the first experiment. A, EM algorithm; B, trust region algorithm; C, expectation trust region algorithm. FDR threshold T > 3.8354 (P < 0.05).



Fig. 8. Activation maps from average 8 subjects in the second experiment. A, EM algorithm; B, trust region algorithm; C, expectation trust region algorithm. FDR threshold T > 3.9445 (P < 0.05).

Fig. 9. One example of numerical iteration from average 11 normal subjects in retinotopic mapping experiment. LL (A–C), variance change (D–F), β (G–I) change in each iteration.



the trust region and expectation trust region algorithms after smoothing; however, EM has the largest activation region size. The trust region algorithm (Fig. 6B) and expectation trust region algorithm (Fig. 6C) produce very similar results. We applied the different methods to combine 8 subjects from the first experiment (Fig. 7). Fig. 7A shows the results obtained using the EM algorithm, Fig. 7B the trust region algorithm, and Fig. 7C the expectation trust region algorithm. Again, it can be clearly seen that the trust region algorithm (Fig. 7B) produces results similar to the expectation trust region algorithm (Fig. 7C). Compared with the trust region methods (Fig. 7B and C), the EM algorithm (Fig. 7A) produces a larger T value and activation region size. Fig. 8 shows the results from the second experiment. Fig. 8A is produced using the EM algorithm, while Fig. 8B and C are the results of the trust region and expectation trust region algorithms. Again, the trust region and expectation trust region methods produce very similar results. Consistent with Fig. 7 from the first experiment, EM shows the largest T value and activation size. Furthermore, the first experiment (Fig. 7) shows a larger activation size and T value than the second experiment (Fig. 8) for all algorithms, suggesting that different brain regions are involved in colour perception.

3.4. Third dataset results

The third dataset was used to evaluate the algorithms for group averaging, noise robustness, and group comparison within the framework of the mixed effect model. One randomly selected voxel from the 11 normal subjects was used to study the behaviour of the different algorithms for group averaging. The LL, random effect variance, and β changes in each iteration are displayed in Fig. 9A–C, D–F, and G–I, respectively. The results from the EM, trust region, and expectation trust region algorithms are shown in Fig. 9A, D, G; B, E, H; and C, F, I, respectively. As in Fig. 5A, the LL increases in each iteration using the EM algorithm (Fig. 9A), although the increase rate is slower than in Fig. 5A. However, the range of the LL change (Fig. 9A) is much wider than in Fig. 5A, suggesting this value is very sensitive to the changes of variance and β in each iteration. Fig. 9D shows the random effect variance, which decreases dramatically at the beginning of the iterations, with the rate of change becoming much slower after a few iterations. From Fig. 9A and G, it is obvious that the LL and β had not converged after 100 iterations, and the iteration was terminated on exceeding the predefined maximum number of iterations. Again, the results from the trust region and expectation trust region methods are similar for both the LL and the parameter changes during numerical processing (comparing Fig. 9B, E, H with Fig. 9C, F, I). The LL increases in each iteration, the variance decreases, and the predictor increases, as shown in Fig. 9. The difference is that the variance from the expectation trust region method is slightly smaller than that from the trust region method (comparing Fig. 9E with Fig. 9F), while β is larger (comparing Fig. 9H with Fig. 9I).
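The damped update scheme the trust region algorithms use throughout these experiments (enlarge λ when a proposed step fails to raise the LL, relax it when the step succeeds) can be sketched with a generic damped ascent loop. This is a minimal illustration on a 1-D toy objective with hypothetical function names, not the authors' mixed model implementation:

```python
import numpy as np

def damped_step(grad, hess, lam):
    """Levenberg-Marquardt-style step (H + lam*diag(H))^{-1} g.
    A large lam shrinks the step toward a scaled gradient step;
    lam -> 0 recovers the Newton-Raphson step."""
    H = np.atleast_2d(hess)
    damped = H + lam * np.diag(np.diag(H))
    return np.linalg.solve(damped, np.atleast_1d(grad))

def maximize_ll(ll, grad, hess, theta0, lam0=1000.0, max_iter=100, tol=1e-8):
    """Damped ascent: halve lam after a successful step (growing the
    trust region), double it after a rejected one (shrinking it)."""
    theta, lam = np.atleast_1d(np.asarray(theta0, dtype=float)), lam0
    for _ in range(max_iter):
        step = damped_step(grad(theta), hess(theta), lam)
        cand = theta - step  # subtract: hess is negative definite near a maximum
        if ll(cand) > ll(theta):
            theta, lam = cand, lam / 2.0
        else:
            lam *= 2.0
        if np.linalg.norm(step) < tol:
            break
    return theta
```

With a toy objective such as ll(x) = −(x − 3)², the loop climbs toward x = 3: starting from λ0 = 1000 the early moves are tiny gradient-like steps, and the step approaches a full NR step as λ decays.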

Table 3
Gaussian noise simulation study from 100 repetitions. T values (mean ± standard deviation) produced using the EM, trust region, and expectation trust region algorithms. Time is the computation time in seconds.

Noise        EM                                 Trust                             Expectation
N(0,0.5)     9.0093 ± 7.5717 (0.0266 s)         4.5639 ± 6.7669 (0.0208 s)        5.1416 ± 1.0009 (0.0127 s)
N(0,0.05)    24.1428 ± 15.1104 (0.0295 s)       8.5425 ± 1.5847 (0.0149 s)        8.6043 ± 1.3447 (0.0139 s)
N(0,0.005)   99.7123 ± 57.1525 (0.0288 s)       14.1707 ± 0.1837 (0.0139 s)       14.1687 ± 0.1836 (0.0135 s)
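A Monte Carlo study of the kind summarized in Table 3 can be sketched as follows. This is a hypothetical illustration only: the T statistic here is a simple inverse-variance-weighted (fixed effects) stand-in for Eq. (18), the noise level is interpreted as a standard deviation, and the toy effects/variances are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

def t_statistic(effects, variances):
    """Hypothetical second-level T: inverse-variance-weighted mean effect
    over its standard error -- a fixed-effects stand-in for Eq. (18)."""
    w = 1.0 / variances
    beta = np.sum(w * effects) / np.sum(w)
    return beta * np.sqrt(np.sum(w))  # se = 1/sqrt(sum of weights)

def noise_robustness(effects, variances, sigma, n_rep=100):
    """Re-run the analysis n_rep times with i.i.d. N(0, sigma^2) noise
    added to the first-level effects and variances; report mean/std of T."""
    ts = []
    for _ in range(n_rep):
        e = effects + rng.normal(0.0, sigma, size=effects.shape)
        v = np.abs(variances + rng.normal(0.0, sigma, size=variances.shape))
        ts.append(t_statistic(e, v))
    return float(np.mean(ts)), float(np.std(ts))

effects = np.full(11, 1.5)    # toy first-level effects (11 subjects)
variances = np.full(11, 0.2)  # toy first-level variances
for sigma in (0.5, 0.05, 0.005):
    m, s = noise_robustness(effects, variances, sigma)
    print(f"N(0,{sigma}): T = {m:.2f} +/- {s:.2f}")
```

Even in this toy setting, large noise on the variances inflates the spread of T, mirroring the sensitivity-to-noise comparison reported in Table 3.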

3.5. Robustness to Gaussian noise simulation study

To study the behaviour of the different algorithms in response to noise, we carried out another simulation study using the data from Fig. 9. We produced three levels of i.i.d. noise, i.e., N(0,0.005), N(0,0.05), and N(0,0.5), and added these noises to the effect y from the first level analysis and the variance Σ in Eq. (2). We repeated the process 100 times and calculated the mean, standard deviation, and computation time of the T statistics (using Eq. (18)) (Table 3). From Table 3 we can see, first, that EM has larger T values, especially for smaller noise in the variance: the mean T value obtained from the EM algorithm is more than 3 times larger than that from the trust region methods. We also found a very large T value standard deviation for the EM algorithm (Eq. (28)) compared with the trust region approaches when N(0,0.005) noise was added (Table 3), suggesting the EM method is very sensitive to noise. Second, the EM algorithm requires more than twice the computation time of the expectation trust region method in all cases. A paired t test was employed to compare the methods; there was a significant difference between the EM and trust region methods in terms of T statistics, but no significant difference between the trust region and expectation trust region methods. Finally, the larger standard deviation of the EM algorithm (Table 3) suggests that EM shows wider variability when estimating T statistics. To further compare the algorithms for group averaging, we applied them to average the 11 normal subjects from the third dataset. Fig. 10 shows the results using the EM algorithm (Fig. 10A) and the expectation trust region algorithm (Fig. 10B). Because the trust region algorithm produced visually similar results to the expectation trust region algorithm, they are not shown here. Again, we can see that EM (Fig.
10A) gives T values more than 2 times larger than the (expectation) trust region algorithms (Fig. 10B). Because of this, we used different colour bars for the activation maps of the two algorithms. The trust region and expectation trust region algorithms again produced similar results, and this pattern is consistent across our data, as shown in Figs. 6–8 for averaging different runs/subjects. It should be mentioned that the T values from the third dataset (Fig. 10) are much larger than those from the first and second datasets (Figs. 7 and 8) when combining different subjects. One reason is that the third dataset includes more subjects, i.e., 11 subjects to be averaged, more than in the first and second datasets. Second, the first level analysis produces a large T value (large effect value and small standard deviation) because only 2nd order slow drifts were included. Third, in the first level activation detection we included only one condition, which makes the df higher than when several conditions are included, as was the case for the first and second dataset analyses. Lastly, because the design matrix for the first level analysis adapts voxel by voxel, leading to much better fits, it produces larger T values at the first level. In addition to averaging different runs/subjects, we compared the normal subject group with the amblyopic subject group within the framework of the mixed model using the different algorithms on the third dataset. Again we applied the EM, trust region, and expectation trust region methods to the same voxel. We cannot show the LL numerical iteration process for the EM algorithm visually, because it contains imaginary numbers. Therefore, we only show one randomly selected voxel analysed with the trust region and expectation trust region methods (Fig. 11). Fig.
11A and G are the LL changes during the numerical processing using the trust region and expectation trust region algorithms, respectively. Fig. 11B, C, and D (H, I, J) display the random effect variance changes obtained from the



Fig. 10. Activation maps from average 11 normal subjects in the retinotopic mapping experiments. A, EM algorithm; B, expectation trust region algorithm.

(expectation) trust region methods. The predictors β for the control and amblyopic patient groups are presented in Fig. 11E and F (K and L) for the two methods. In general, both methods have similar numerical processing properties and produce similar results, but there are some differences. For instance, the final LL value is larger for the trust region method (Fig. 11A) than for the expectation trust region method (Fig. 11G). Furthermore, the random effect variance of the control/patient group from the trust region method (Fig. 11B/C) is larger than that from the expectation trust region method (Fig. 11H/I). Both methods (Fig. 11D and J) give a negative covariance between the two groups, suggesting a negative correlation between them. Comparing Fig. 11E with Fig. 11K, it is very interesting to see that the predictor value from the trust region method oscillates more than that from the expectation method. This suggests that the trust region method can be numerically unstable compared with the expectation method, which could explain the numerical results in Table 2, where the trust region method gives very large values (Table 2, 2nd and 3rd columns, λ0 = 10). However, both methods give very similar predictor estimates β for the patient group (Fig. 11F and L). Finally, we show the brain maps from these methods in Figs. 12–14, which display the effect, variance, and T statistics maps, respectively. Figs. 12A, 13A, and 14A are obtained from the EM algorithm, while Figs. 12B, 13B, and 14B are from the expectation trust region algorithm. Although the effect and variance maps are different, the trust region method produces a very similar T statistics map to the

expectation trust region algorithm visually (we have not shown the results here). Comparing Fig. 12A with Fig. 12B, it is easy to see that the effect values obtained from EM are larger than those estimated by the expectation trust region algorithm. In addition, the pattern indicating the structure of the brain is clearer from the expectation trust region algorithm than from the EM algorithm. For the variance map (Fig. 13), we also observe larger variance values from the EM algorithm than from the expectation trust region algorithm. As the colour bar indicates, the EM algorithm yields a wider range of variance values than the expectation trust region algorithm. However, when we use Eq. (18) to calculate the T statistical map, which is the ratio of the effect map (Fig. 12) to the variance map (Fig. 13), we find that the expectation trust region algorithm produces higher T statistic values than the EM algorithm (Fig. 14). This differs from the group average results shown in Figs. 6–10, where the EM algorithm yields larger T values than the expectation trust region algorithm. Fig. 14A and B exhibit the group difference T statistical maps from the EM and trust region algorithms. In addition, the result produced by the trust region method is smoother than that of the EM algorithm. Lastly, we did not find a significant difference between the control and amblyopic groups using any of the algorithms in this study. This result is consistent with our previous study [26], which found no significant difference between normal and amblyopic cortex, although there is a reduction of brain activation in the amblyopic cortex.



Fig. 11. LL and parameter changes using the trust region and expectation trust region methods. A–F obtained using the trust region method, G–L obtained using the expectation trust region method.

4. Discussion

We have applied maximum likelihood estimation methods for the mixed effect model to second level fMRI analysis. To estimate the model parameters, we developed the expectation trust region algorithms to optimize the LL. Based on three human datasets and simulated datasets, we found that it is feasible to use trust region algorithms for second level fMRI analysis. However, the damping factor for the trust region algorithms should be large (λ0 > 1000) to ensure estimation accuracy, which makes the trust region methods behave more like a gradient algorithm [33,34] than an NR type optimization algorithm. We also found that, with a fixed large damping factor, the mixed effect model algorithm is slow to converge. To overcome

this limitation, we adopt the trust region idea of changing the trust region size in each iteration. This method is faster and easier to implement. To ensure that the LL increases in each iteration, we select the sign of the damping factor in (H + λ ⋅ diag(H)) so that the matrix is negative definite. We have compared the trust region algorithms with the EM algorithm for the mixed model analysis. We found two limitations of the EM algorithm: first, it is slow to converge and requires a long computation time compared with the trust region algorithms; second, it is not robust to Gaussian noise in our study (Table 3). An important question is why the two methods produce different results on the same dataset. To understand this, we must first review the mathematical



Fig. 12. Group comparison using the mixed effect model: effect map. A, EM algorithm; B, expectation trust region algorithm.

background of the LM algorithm. The LM algorithm is simply a regularization method for the convex function: if we modify the LL function in Eq. (5) as

\[
\ell(\beta,\Sigma;y) = -\frac{n}{2}\ln(2\pi) - \frac{1}{2}\ln|\Sigma| - \frac{1}{2}(y - X\beta)^{T}\Sigma^{-1}(y - X\beta) - J(\Sigma) \tag{34}
\]

where \(J(\Sigma) = \frac{\lambda}{2}\sum_{i=1}^{n}\Sigma_{ii}^{2}\), the above equation is often referred to as the penalized quasi-likelihood function. This function differs from [5,6], in which \(J(\Sigma) = \log|X^{T}\Sigma X|\) was used. The corresponding score function with respect to Σ is

\[
S_{1} = \frac{1}{2}\left[-\mathrm{tr}\left(\Sigma^{-1}\right) + (y - X\beta)^{T}\Sigma^{-1}\Sigma^{-1}(y - X\beta)\right] - \lambda\sum_{i=1}^{n}\Sigma_{ii} \tag{35}
\]

If (35) is used as the score function, the vector V in Eq. (22) should be V = −λ(H + λI)Σ; calculating the Hessian matrix as \(H_{1} = \partial S_{1}/\partial \Sigma\) leads to Eq. (20) if we consider the proportion of each diagonal element of the Hessian matrix. It may also be asked why we use such a penalty term; the reason is the ill-posed problem of calculating the inverse of the Hessian matrix. In mathematics, this is well known as Tikhonov regularization [35]. We used this technique to prevent small disturbances from causing large estimation errors in the output. Our simulation studies (Table 1) show that the variance estimates obtained from the expectation trust region method are nearer to the true values than those from the EM algorithm,

suggesting that the LM algorithm is better than the EM algorithm, which does not penalize the objective LL function during optimization.

4.1. Difference between trust region and expectation trust region methods

On the real human datasets, the trust region and expectation trust region methods produced very similar results throughout this study, except for large variance inputs when many subjects are averaged (Table 2). In terms of computation, the expectation trust region method is based on the assumption that E(y − Xβ) = 0 for the Hessian matrix calculation (Eq. (15)); it is faster than the trust region method (Table 3) since it does not calculate the off-diagonal elements of the Hessian matrix. In the implementation, we calculate the variance and β separately in each iteration, reducing the calculation time: if we computed all parameters together at once, the Hessian/information matrix would be much larger, increasing the time needed to calculate its inverse. Calculating the variance and predictor separately also makes comparison with the EM algorithm easier, since most EM implementations compute these parameters separately. It can be seen from Eqs. (20)–(22) that the parameter estimation accuracy is controlled by the step size, that is, (H + λ ⋅ diag(H))−1; a large step size leads to quick convergence, while a smaller step size gives higher accuracy for the variance estimation. In our study, we double λ in each



Fig. 13. Group comparison using the mixed effect model: variance map. A, EM algorithm; B, expectation trust region algorithm.

iteration, which may not be optimal; the step size may need to be selected more efficiently. The steepest descent method could be employed to estimate the optimal step size.
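A sketch of this separate update for the simplest case (one random-effect variance, as in group averaging) may make the expectation step concrete. This is an illustrative reconstruction under stated assumptions, not the authors' code: β is refreshed by generalized least squares given the current Σ, and the variance takes one damped scoring step in which, because E(y − Xβ) = 0, the Hessian is replaced by the negative-definite information term (the trace part only).

```python
import numpy as np

def gls_beta(X, Sigma, y):
    """Generalized least squares estimate of beta for a given Sigma."""
    Si = np.linalg.inv(Sigma)
    return np.linalg.solve(X.T @ Si @ X, X.T @ Si @ y)

def expectation_step_var(X, y, sigma2_r, Sigma_fixed, lam):
    """One damped scoring update of a single random-effect variance.
    With dSigma/dsigma2_r = I, the score of the LL is
    -0.5*tr(Si) + 0.5*r' Si Si r; setting E(y - X beta) = 0 replaces the
    Hessian by the information term -0.5*tr(Si Si), which is negative,
    so the damped step ascends the LL."""
    n = len(y)
    Sigma = Sigma_fixed + sigma2_r * np.eye(n)
    Si = np.linalg.inv(Sigma)
    beta = gls_beta(X, Sigma, y)
    r = y - X @ beta
    score = -0.5 * np.trace(Si) + 0.5 * r @ Si @ Si @ r
    info = -0.5 * np.trace(Si @ Si)
    damped = info + lam * info            # lam * diag(H) for a 1x1 "matrix"
    return max(sigma2_r - score / damped, 1e-12), beta
```

Iterating this map on intercept-only toy data drives the random-effect variance toward the ML solution (total residual variance minus the fixed part).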

4.2. Limitations and future directions

Although we have developed the algorithms and tested them in the case of 2-group comparisons, we have not tested them for 3 or more groups. Moreover, we have not investigated applying the trust region algorithms to non-Gaussian (e.g., Laplace distribution) likelihood objective functions. In addition, we used a very simple method to determine λ; an algorithm for the trust region sub-problem should be applied to estimate λ more objectively [18]. One future research objective is to apply the suggested algorithms to longitudinal datasets [36,37] and fMRI effective connectivity studies [38,39], because variances in longitudinal and connectivity studies also need to be estimated for second level data analysis. Another future direction is a hybrid algorithm for the mixed model, in which the advantages of different algorithms (EM and trust region) are combined; this may yield a more sensitive method for the mixed effect model. In addition, parameter expansion EM (PXEM) methods have been developed for statistical analysis [40], e.g., for implementing the Gibbs sampler, and it may be worthwhile to investigate applying this method to second level fMRI data analysis. Other gradient search methods, i.e., robust stochastic approximation, may also be studied for the mixed effect model.

5. Conclusion

In conclusion, we have developed maximum likelihood estimation with expectation trust region algorithms for the mixed model in second level fMRI data analysis. We tested the proposed method using a set of synthetic datasets and three human datasets, and the results show that the suggested trust region algorithms can be used for second level fMRI data analysis. To improve the sensitivity of combining/comparing different runs within the framework of the mixed model, the results suggest applying the expectation trust region algorithm with a large damping factor (> 1000) as the initial value for second/higher level data analysis. The expectation trust region method is insensitive to the initial value of the numerical optimization, robust to Gaussian noise, and fast in computation; this is an advantage for fMRI studies, perhaps reducing the burden on the experimenter and the amount of concentration time required of the experiment participant.

Acknowledgment

This study is supported under the Centre of Excellence in Intelligent Systems (CoEIS) project, funded by InvestNI and the Integrated Development Fund, through ILEX. The data collection in this study was supported by CIHR grants to Robert F. Hess (#MOP53346) and Kathy T. Mullen (#MOP-10819).



Fig. 14. Group comparison using mixed effect model: T map. A, EM algorithm; B, expectation trust region algorithm.

Appendix A. fMRI experiment design and pre-processing

A.1. fMRI datasets 1 and 2

The first and second experiments were conducted within the constraints of the ethical clearance from the Medical Research Ethics Committee of the University of Queensland for MRI experiments on humans at the Centre for Magnetic Resonance (Brisbane, Australia), and conform with the Code of Ethics of the World Medical Association (Declaration of Helsinki), printed in the British Medical Journal (18 July 1964). All magnetic resonance images in the first and second experiments were acquired using a 4 T Bruker MedSpec system. A transverse electromagnetic head coil was used for radiofrequency transmission and reception. Eight healthy observers served as subjects (four female; mean age 41 years, range 31–54 years), five of whom were naive to the purpose of the study. The subjects were instructed to maintain fixation on the provided fixation point and were trained prior to the scanning sessions to familiarize them with the task. All observers had normal or corrected-to-normal visual acuity. No participant had a history of psychiatric or neurological disorder, head trauma, or substance abuse. Informed written consent was obtained from all participants prior to the commencement of the study. For the fMRI studies, 241 T2*-weighted gradient-echo echo-planar images depicting BOLD contrast were acquired in each of 36 planes with TE 30 ms, TR 3000 ms, in-plane resolution 3.6 mm, and slice thickness 3 mm (0.6 mm gap). The slices were taken parallel to the calcarine sulcus and covered the entire occipital and parietal lobes and large dorsal-

posterior parts of the temporal and frontal lobes. Two to three fMRI scans were performed in each session. Head movement was limited by foam padding within the head coil. In the same session, a high-resolution 3D T1 image was acquired using an MP-RAGE sequence with TI 1500 ms, TR 2500 ms, TE 3.83 ms, and a resolution of 0.9 mm³. In the fMRI data pre-processing, the first two image volumes were discarded because of magnetic instability; therefore, we used only 239 image volumes in each run. For details of the two fMRI experiments, stimuli, data collection, and data pre-processing see reference [22].

A.2. fMRI dataset 3

Eleven normal subjects (mean age 30 years) and 14 amblyopic subjects were used in this study (for more details regarding the protocol and subjects, see [26,41]). Studies were performed with the informed consent of the subjects, were approved by the Montreal Neurological Institute Research Ethics Committee, and followed the tenets of the Declaration of Helsinki, printed in the British Medical Journal (18 July 1964). Briefly, a Siemens 1.5 T Magnetom scanner was used to collect both anatomical and functional images. Anatomical images were acquired using a rectangular (14.5″ × 6.5″) head coil (circularly polarized transmit and receive) and a T1 weighted sequence (TR = 22 ms; TE = 10 ms; flip angle = 30°) giving 176 sagittal slices of 256 × 256 mm² image voxels. Functional scans for each subject were collected using a surface coil (circularly polarized, receive only) positioned beneath the subject's occiput. Each functional imaging



session was preceded by a surface coil anatomical scan (identical to the head coil anatomical sequence, except that 80 × 256 × 256 sagittal images of slice thickness 2 mm were acquired) in order to later co-register the data with the more homogeneous head-coil image. Functional scans were multislice T2 weighted, gradient-echo, planar images (GE-EPI, TR = 3.0 s, TE = 51 ms, flip angle = 90°). The image volume consisted of 30 slices orthogonal to the calcarine sulcus. The field of view was 256 × 256 mm, and the matrix size was 64 × 64 with a slice thickness of 4 mm, yielding voxel sizes of 4 × 4 × 4 mm³. Visual retinotopic mapping stimuli in a phase-encoded design were used in the study. The phase-encoded design [42,43] overcomes a shortcoming of the blocked paradigm, namely that the stimulus is inherently 'discrete'. To enable the stimuli to cause a continuous variation across the surface of the cortex, the experiment is designed to ensure that the mapping on the cortex varies monotonically over a relatively long period. This experimental design has been successfully used to reveal complex representations of retinotopy in the visual cortex and tonotopy in the auditory cortex. Each visual retinotopic experiment (phase-encoded design, travelling square wave) consisted of four acquisition runs for each eye (two eccentricity runs, two polar angle runs, two clockwise order runs, and two counter-clockwise runs), with each of the 128 image volumes acquired at three-second intervals for the left and right eyes of normal subjects or the fixing and amblyopic eyes of patients. Runs were alternated between the eyes in each case while the subject performed a task to keep awake in the scanner. The eye that was not subjected to stimulation was occluded with a black patch that excluded all light from the eye. Subjects monocularly viewed a stimulus back-projected into the bore of the scanner through an angled mirror.
Dynamic motion correction was applied to the functional image time series of each run, and the different runs were realigned at the same time, using the fmr_preprocess function (provided in the MINC software package: http://noodles.bic.mni.mcgill.ca/ServicesSoftware/HomePage) with default parameters of three-dimensional Gaussian low-pass filtering. The first eight scans of each functional run were discarded due to start-up magnetization transients in the data, so only 120 image volumes were used for each run.
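For intuition about how such phase-encoded runs are analyzed, a common approach (not necessarily the exact pipeline used here) reads the response amplitude and phase off the Fourier coefficient at the stimulus frequency; the phase then encodes the preferred eccentricity or polar angle. A sketch, assuming 8 stimulus cycles over the 120 retained volumes:

```python
import numpy as np

def phase_encoded_response(ts, n_cycles=8):
    """Amplitude and phase of a voxel time series at the frequency of a
    phase-encoded stimulus (n_cycles per run). The phase maps the voxel's
    preferred position in the stimulus cycle onto the cortex."""
    ts = np.asarray(ts, dtype=float)
    ts = ts - ts.mean()
    c = np.fft.rfft(ts)[n_cycles]      # Fourier coefficient at the task frequency
    amplitude = 2.0 * np.abs(c) / len(ts)
    phase = np.angle(c)                # radians; negative phase = delayed response
    return amplitude, phase

# Toy voxel: 120 volumes, 8 cycles, response delayed by 0.8 rad.
t = np.arange(120)
voxel = 1.7 * np.cos(2 * np.pi * 8 * t / 120 - 0.8)
amp, ph = phase_encoded_response(voxel)
```

For a pure sinusoid at the task frequency, this recovers the amplitude and delay exactly; on real data the amplitude at neighbouring frequencies gives a noise estimate.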

Appendix B. Hessian and information matrices for different group comparison

If we want to compare different groups, using the variance matrix as shown in Eq. (3) and setting θ = σ₁², A = I in Eq. (3), we get [6]:

\[
\Sigma_\theta = \frac{\partial \Sigma}{\partial \theta} = \frac{\partial \Sigma}{\partial \sigma_1^2}
= \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \otimes I
= \begin{pmatrix} I_{n_1} & 0 \\ 0 & 0 \end{pmatrix} = \Sigma_1 \tag{A1}
\]

where \(I_{n_1}\) is the \(n_1 \times n_1\) identity matrix and \(n_1\) is the number of control subjects. From Eq. (9), the score function of \(\sigma_1^2\) is:

\[
S_1 = \frac{\partial \ell}{\partial \sigma_1^2}
= -\frac{1}{2}\mathrm{tr}\left(\Sigma^{-1}\Sigma_1\right)
+ \frac{1}{2}(y - X\beta)^T \Sigma^{-1} \Sigma_1 \Sigma^{-1} (y - X\beta) \tag{A2}
\]

Similarly, we obtain the score function \(S_2\) using

\[
\Sigma_\theta = \frac{\partial \Sigma}{\partial \theta} = \frac{\partial \Sigma}{\partial \sigma_{12}}
= \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \otimes I
= \begin{pmatrix} 0 & I_{n_1 \times n_2} \\ I_{n_2 \times n_1} & 0 \end{pmatrix} = \Sigma_2 \tag{A3}
\]

where \(I_{n_1 \times n_2}\) and \(I_{n_2 \times n_1}\) are the \(n_1 \times n_2\) and \(n_2 \times n_1\) identity matrices, respectively. From Eq. (9) again, we get

\[
S_2 = \frac{\partial \ell}{\partial \sigma_{12}}
= -\frac{1}{2}\mathrm{tr}\left(\Sigma^{-1}\Sigma_2\right)
+ \frac{1}{2}(y - X\beta)^T \Sigma^{-1} \Sigma_2 \Sigma^{-1} (y - X\beta) \tag{A4}
\]

Using

\[
\Sigma_\theta = \frac{\partial \Sigma}{\partial \theta} = \frac{\partial \Sigma}{\partial \sigma_2^2}
= \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} \otimes I
= \begin{pmatrix} 0 & 0 \\ 0 & I_{n_2} \end{pmatrix} = \Sigma_3 \tag{A5}
\]

and taking into account Eq. (9), we obtain the score function of \(\sigma_2^2\):

\[
S_3 = \frac{\partial \ell}{\partial \sigma_2^2}
= -\frac{1}{2}\mathrm{tr}\left(\Sigma^{-1}\Sigma_3\right)
+ \frac{1}{2}(y - X\beta)^T \Sigma^{-1} \Sigma_3 \Sigma^{-1} (y - X\beta) \tag{A6}
\]

where \(I_{n_2}\) is the \(n_2 \times n_2\) identity matrix and \(n_2\) is the number of patient subjects. The corresponding Hessian matrix is:

\[
H = \begin{pmatrix} H_{11} & H_{12} & H_{13} \\ H_{12} & H_{22} & H_{23} \\ H_{13} & H_{23} & H_{33} \end{pmatrix}
= \begin{pmatrix}
\dfrac{\partial S_1}{\partial \sigma_1^2} & \dfrac{\partial S_1}{\partial \sigma_{12}} & \dfrac{\partial S_1}{\partial \sigma_2^2} \\
\dfrac{\partial S_1}{\partial \sigma_{12}} & \dfrac{\partial S_2}{\partial \sigma_{12}} & \dfrac{\partial S_2}{\partial \sigma_2^2} \\
\dfrac{\partial S_1}{\partial \sigma_2^2} & \dfrac{\partial S_2}{\partial \sigma_2^2} & \dfrac{\partial S_3}{\partial \sigma_2^2}
\end{pmatrix} \tag{A7}
\]

where \(H_{11}, H_{12}, \ldots,\) and \(H_{44}\) are calculated as follows. From Eqs. (9), (A1), and (A2), we get

\[
H_{11} = \frac{\partial S_1}{\partial \sigma_1^2}
= \frac{1}{2}\left[\mathrm{tr}\left(\Sigma^{-1}\Sigma_1\Sigma^{-1}\Sigma_1\right)
- 2(y - X\beta)^T \Sigma^{-1}\Sigma_1\Sigma^{-1}\Sigma_1\Sigma^{-1}(y - X\beta)\right] \tag{A8}
\]

Similarly, from Eqs. (A2) and (A3), we have

\[
H_{12} = \frac{\partial S_1}{\partial \sigma_{12}}
= \frac{1}{2}\left[\mathrm{tr}\left(\Sigma^{-1}\Sigma_2\Sigma^{-1}\Sigma_1\right)
- (y - X\beta)^T \Sigma^{-1}\Sigma_2\Sigma^{-1}\Sigma_1\Sigma^{-1}(y - X\beta)
- (y - X\beta)^T \Sigma^{-1}\Sigma_1\Sigma^{-1}\Sigma_2\Sigma^{-1}(y - X\beta)\right] \tag{A9}
\]

From Eqs. (A2) and (A5), we obtain

\[
H_{13} = \frac{\partial S_1}{\partial \sigma_2^2}
= \frac{1}{2}\left[\mathrm{tr}\left(\Sigma^{-1}\Sigma_3\Sigma^{-1}\Sigma_1\right)
- (y - X\beta)^T \Sigma^{-1}\Sigma_3\Sigma^{-1}\Sigma_1\Sigma^{-1}(y - X\beta)
- (y - X\beta)^T \Sigma^{-1}\Sigma_1\Sigma^{-1}\Sigma_3\Sigma^{-1}(y - X\beta)\right] \tag{A10}
\]

In the same way, from Eqs. (A4) and (A3), we calculate

\[
H_{22} = \frac{\partial S_2}{\partial \sigma_{12}}
= \frac{1}{2}\left[\mathrm{tr}\left(\Sigma^{-1}\Sigma_2\Sigma^{-1}\Sigma_2\right)
- 2(y - X\beta)^T \Sigma^{-1}\Sigma_2\Sigma^{-1}\Sigma_2\Sigma^{-1}(y - X\beta)\right] \tag{A11}
\]

From Eqs. (A4) and (A5), we get

\[
H_{23} = \frac{\partial S_2}{\partial \sigma_2^2}
= \frac{1}{2}\left[\mathrm{tr}\left(\Sigma^{-1}\Sigma_3\Sigma^{-1}\Sigma_2\right)
- (y - X\beta)^T \Sigma^{-1}\Sigma_3\Sigma^{-1}\Sigma_2\Sigma^{-1}(y - X\beta)
- (y - X\beta)^T \Sigma^{-1}\Sigma_2\Sigma^{-1}\Sigma_3\Sigma^{-1}(y - X\beta)\right] \tag{A12}
\]

Similarly, from Eqs. (A6) and (A5), we get

\[
H_{33} = \frac{\partial S_3}{\partial \sigma_2^2}
= \frac{1}{2}\left[\mathrm{tr}\left(\Sigma^{-1}\Sigma_3\Sigma^{-1}\Sigma_3\right)
- 2(y - X\beta)^T \Sigma^{-1}\Sigma_3\Sigma^{-1}\Sigma_3\Sigma^{-1}(y - X\beta)\right] \tag{A13}
\]

Finally, from Eqs. (A1) and (8), we have

\[
H_{44} = \frac{\partial S_4}{\partial \beta} = -X^T \Sigma^{-1} X \tag{A14}
\]

The variance can be estimated in the same way as for the group average in Eq. (20) as

\[
\mathrm{Var}\left(\hat{\beta}\right)
= -E\left(\frac{\partial^2 \ell}{\partial \beta^2}\right)^{-1}
= -(H_{44})^{-1}
= \left(X^T \Sigma^{-1} X\right)^{-1} \tag{A15}
\]

for T statistics calculation. If we employ the expectation trust region algorithm, we need to obtain the information matrix. Substituting E(y − Xβ) = 0 into Eqs. (A8)–(A13), we obtain the information matrix as:

\[
F = -E(H) = -\frac{1}{2}\begin{pmatrix}
\mathrm{tr}\left(\Sigma^{-1}\Sigma_1\Sigma^{-1}\Sigma_1\right) & \mathrm{tr}\left(\Sigma^{-1}\Sigma_2\Sigma^{-1}\Sigma_1\right) & \mathrm{tr}\left(\Sigma^{-1}\Sigma_3\Sigma^{-1}\Sigma_1\right) \\
\mathrm{tr}\left(\Sigma^{-1}\Sigma_2\Sigma^{-1}\Sigma_1\right) & \mathrm{tr}\left(\Sigma^{-1}\Sigma_2\Sigma^{-1}\Sigma_2\right) & \mathrm{tr}\left(\Sigma^{-1}\Sigma_3\Sigma^{-1}\Sigma_2\right) \\
\mathrm{tr}\left(\Sigma^{-1}\Sigma_3\Sigma^{-1}\Sigma_1\right) & \mathrm{tr}\left(\Sigma^{-1}\Sigma_3\Sigma^{-1}\Sigma_2\right) & \mathrm{tr}\left(\Sigma^{-1}\Sigma_3\Sigma^{-1}\Sigma_3\right)
\end{pmatrix} \tag{A16}
\]

Using the above Hessian matrix (A7) or information matrix (A16), the LM algorithm in Eq. (20) becomes

\[
\begin{pmatrix} \sigma_1^2 \\ \sigma_{12} \\ \sigma_2^2 \end{pmatrix}^{(k+1)}
= \begin{pmatrix} \sigma_1^2 \\ \sigma_{12} \\ \sigma_2^2 \end{pmatrix}^{(k)}
- \left(H + \lambda \cdot \mathrm{diag}(H)\right)^{-1(k)}
\begin{pmatrix} S_1(\theta) \\ S_2(\theta) \\ S_3(\theta) \end{pmatrix}^{(k)} \tag{A17}
\]
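A numerical sketch of the information matrix of Eq. (A16) and the damped update of Eq. (A17) follows. This is illustrative only; in particular, the off-diagonal block of Σ₂ is built on a min(n1, n2) identity block, an assumption for unequal group sizes.

```python
import numpy as np

def group_information_matrix(Sigma, n1, n2):
    """F = -E(H) of Eq. (A16) for the two-group variance parameters
    (sigma1^2, sigma12, sigma2^2), using E(y - X beta) = 0 so only the
    trace terms of the Hessian blocks (A8)-(A13) survive."""
    n = n1 + n2
    S1 = np.zeros((n, n)); S1[:n1, :n1] = np.eye(n1)   # dSigma/dsigma1^2
    S3 = np.zeros((n, n)); S3[n1:, n1:] = np.eye(n2)   # dSigma/dsigma2^2
    S2 = np.zeros((n, n))                               # dSigma/dsigma12
    m = min(n1, n2)
    S2[:m, n1:n1 + m] = np.eye(m)
    S2[n1:n1 + m, :m] = np.eye(m)
    Si = np.linalg.inv(Sigma)
    parts = [S1, S2, S3]
    F = np.empty((3, 3))
    for i in range(3):
        for j in range(3):
            F[i, j] = -0.5 * np.trace(Si @ parts[i] @ Si @ parts[j])
    return F, parts

def damped_variance_update(theta, scores, F, lam):
    """One step of Eq. (A17): theta - (F + lam*diag(F))^{-1} S(theta)."""
    damped = F + lam * np.diag(np.diag(F))
    return theta - np.linalg.solve(damped, scores)
```

Because F is negative definite, the damped step moves the variance parameters in the direction of the score, i.e., uphill on the LL.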
The initial value of the variance/covariance can be calculated using the MATLAB function cov. If the two groups are not of equal size, the covariance between the two groups is estimated using the same number of subjects/runs.

Appendix C. Supplementary data

Supplementary data to this article can be found online at http://dx.doi.org/10.1016/j.mri.2013.10.007.

References

[1] Henderson CR. Applications of linear models in animal breeding. Canadian Cataloguing in Publication Data; 1984. [2] Worsley K, Liao CH, Aston J, Petre V, Duncan GH, Morales F, et al. A general statistical analysis for fMRI data. Neuroimage 2002;15:1–15. [3] Friston KJ, Stephan KE, Lund TE, Morcom A, Kiebel S. Mixed-effects and fMRI studies. Neuroimage 2005;24:244–52. [4] Roche A, Meriaux S, Keller M, Thirion B. Mixed-effect statistics for group analysis in fMRI: a nonparametric maximum likelihood approach. Neuroimage 2007;38:501–10. [5] Breslow NE, Clayton DG. Approximate inference in generalized linear mixed models. J Am Stat Assoc 1993;88(421):9–25. [6] Searle S, Casella G, McCulloch C. Variance components. New York (NY): Wiley; 1992. [7] Beckmann CF, Jenkinson M, Smith SM. General multilevel linear modeling for group analysis in FMRI. Neuroimage 2003;20:1052–63.


[8] Woolrich MW, Behrens TEJ, Beckmann CF, Jenkinson M, Smith SM. Multilevel linear modelling for FMRI group analysis using Bayesian inference. Neuroimage 2004;21:1732–47.
[9] Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B 1977;39:1–38.
[10] Harville DA. Maximum likelihood approaches to variance component estimation and to related problems. J Am Stat Assoc 1977;72(358):320–40.
[11] Thirion B, Pinel P, Meriaux S, Roche A, Dehaene S, Poline JB. Analysis of a large fMRI cohort: statistical and methodological issues for group analyses. Neuroimage 2007;35(1):105–20.
[12] Woolrich M. Robust group analysis using outlier inference. Neuroimage 2008;41(2):286–301.
[13] Patterson HD, Thompson R. Recovery of inter-block information when block sizes are unequal. Biometrika 1971;58(3):545–54.
[14] Hartley HO, Rao JNK. Maximum-likelihood estimation for the mixed analysis of variance model. Biometrika 1967;54(1–2):93–108.
[15] Lindstrom MJ, Bates DM. Newton–Raphson and EM algorithms for linear mixed-effects models for repeated-measures data. J Am Stat Assoc 1988;83(404):1014–22.
[16] Marquardt DW. An algorithm for least-squares estimation of nonlinear parameters. SIAM J Appl Math 1963;11(2):431–41.
[17] Kelley CT. Iterative methods for optimization. Philadelphia: SIAM; 1999.
[18] Nocedal J, Wright S. Numerical optimization (Springer Series in Operations Research and Financial Engineering). New York: Springer; 2006.
[19] Press WH, Teukolsky SA, Vetterling WT, Flannery BP. Numerical recipes in C: the art of scientific computing. 2nd ed. Cambridge, UK: Cambridge University Press; 1992.
[20] Kaltenbacher B. Some Newton-type methods for the regularization of nonlinear ill-posed problems. Inverse Probl 1997;13(3):729.
[21] Hanke M. The regularizing Levenberg–Marquardt scheme is of optimal order. J Integr Equ Appl 2010;22(2):260–83.
[22] Mullen KT, Dumoulin SO, Hess RF. Color responses of the human lateral geniculate nucleus: selective amplification of S-cone signals between the lateral geniculate nucleus and primary visual cortex measured with high-field fMRI. Eur J Neurosci 2008;28:1911–23.
[23] Hess RF, Thompson B, Gole GA, Mullen KT. The amblyopic deficit and its relationship to geniculo-cortical processing streams. J Neurophysiol 2010;104(1):475–83.
[24] Mullen KT, Thompson B, Hess RF. Responses of the human visual cortex and LGN to achromatic and chromatic temporal modulations: an fMRI study. J Vis 2010;10(13):1–19.
[25] Glover GH. Deconvolution of impulse response in event-related BOLD fMRI. Neuroimage 1999;9(4):416–26.
[26] Li X, Dumoulin SO, Mansouri B, Hess RF. Cortical deficits in human amblyopia: their regional distribution and their relationship to the contrast detection deficit. Invest Ophthalmol Vis Sci 2007;48:1575–91.
[27] Li X, Coyle D, Maguire L, McGinnity TM, Watson DR, Benali H. A least angle regression method for fMRI activation detection for phase-encoded experimental designs. Neuroimage 2010;52(2):1390–400.
[28] Anderson TW. An introduction to multivariate statistical analysis. Wiley Series in Probability and Mathematical Statistics. 2nd ed. John Wiley & Sons; 1984.
[29] Lynch M, Walsh B. Genetics and analysis of quantitative traits. Sunderland (MA): Sinauer Associates; 1998.
[30] Smyth GK, Huele AF, Verbyla AP. Exact and approximate REML for heteroscedastic regression. Stat Model 2001;1:161–75.
[31] Smyth GK. An efficient algorithm for REML in heteroscedastic regression. J Comput Graph Stat 2002;11:836–47.
[32] Laird N, Lange N, Stram D. Maximum likelihood computations with repeated measures: application of the EM algorithm. J Am Stat Assoc 1987;82(397):97–105.
[33] Jamshidian M, Jennrich RI. Conjugate gradient acceleration of the EM algorithm. J Am Stat Assoc 1993;88(421):221–8.
[34] Lange K. A gradient algorithm locally equivalent to the EM algorithm. J R Stat Soc Ser B 1995;57(2).
[35] Demidenko E. Mixed models: theory and applications (Wiley Series in Probability and Statistics). New York: Wiley-Interscience; 2004.
[36] Diggle PJ, Heagerty P, Liang KY, Zeger S. Analysis of longitudinal data. Oxford Statistical Science Series. 2nd ed. Oxford: Oxford University Press; 2003.
[37] Li X, Coyle D, Maguire L, Watson DR, McGinnity TM. Grey matter concentration and effective connectivity changes in Alzheimer's disease: a longitudinal structural MRI study. Neuroradiology 2010;53(10):733–48.
[38] Li X, Coyle D, Maguire L, McGinnity TM, Benali H. A model selection method for nonlinear system identification based fMRI effective connectivity analysis. IEEE Trans Med Imaging 2011;30(7):1365–80.
[39] Li X, Marrelec G, Hess RF, Benali H. A nonlinear identification method to study effective connectivity in functional MRI. Med Image Anal 2010;14(1):30–8.
[40] Liu C, Rubin DB, Wu YN. Parameter expansion to accelerate EM: the PX-EM algorithm. Biometrika 1998;85(4):755–70.
[41] Li X, Dumoulin SO, Mansouri B, Hess RF. The fidelity of the cortical retinotopic map in human amblyopia. Eur J Neurosci 2007;25(5):1265–77.
[42] Sereno MI, Dale AM, Reppas JB, Kwong KK, Belliveau JW, Brady TJ, et al. Borders of multiple visual areas in humans revealed by functional magnetic resonance imaging. Science 1995;268:889–93.
[43] Hess RF, Li X, Mansouri B, Thompson B, Hansen BC. Selectivity as well as sensitivity loss characterizes the cortical spatial frequency deficit in amblyopia. Hum Brain Mapp 2009;30(12):4054–69.
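The appendix note above on initializing the variance/covariance can be illustrated with a minimal sketch. This is not the authors' implementation: it uses Python's numpy.cov as an analogue of MATLAB's cov, the group sizes and random data are hypothetical, and truncating the larger group to the smaller group's size is one way to apply the equal-subject-count rule stated in the appendix.

```python
import numpy as np

# Hypothetical second-level effect estimates at one voxel
# (rows: subjects/runs; column: estimated effect size).
rng = np.random.default_rng(0)
group1 = rng.normal(size=(12, 1))   # 12 subjects/runs
group2 = rng.normal(size=(10, 1))   # 10 subjects/runs (unequal groups)

# Use the same number of subjects/runs from each group so the
# between-group covariance is defined on paired observations.
n = min(group1.shape[0], group2.shape[0])
paired = np.hstack([group1[:n], group2[:n]])   # shape (n, 2)

# 2x2 sample covariance matrix used as the initial value:
# diagonal entries are within-group variances, off-diagonal the
# between-group covariance.
init_cov = np.cov(paired, rowvar=False)
```

The diagonal of init_cov then seeds the variance components and the off-diagonal entry seeds the covariance term for the iterative estimation.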
