Magnetic Resonance Imaging 32 (2014) 91–101


Robust GRAPPA reconstruction using sparse multi-kernel learning with least squares support vector regression

Lin Xu a, Yanqiu Feng b,⁎, Xiaoyun Liu a, Lili Kang b, Wufan Chen a,b,⁎

a School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu, China
b School of Biomedical Engineering, Southern Medical University, Guangzhou, China

Article history: Received 26 January 2013; Revised 27 August 2013; Accepted 3 October 2013

Keywords: Parallel imaging; GRAPPA; Multi-kernel learning; Support vector regression (SVR); Structural risk minimization

Abstract

The accuracy with which interpolation coefficients are fitted to the auto-calibrating signal (ACS) data is crucial for k-space-based parallel reconstruction. Both conventional generalized autocalibrating partially parallel acquisitions (GRAPPA) reconstruction, which uses a linear interpolation function, and nonlinear GRAPPA (NLGRAPPA) reconstruction, which uses a polynomial kernel function, are sensitive to the interpolation window and often cannot consistently produce good results across acceleration factors. In this study, sparse multi-kernel learning is conducted within the framework of least squares support vector regression to fit the interpolation coefficients and to reconstruct images robustly under different subsampling patterns and coil datasets. The kernel combination weights and interpolation coefficients are adaptively determined by efficient semi-infinite linear programming techniques. Experimental results on phantom and in vivo data indicate that the proposed method automatically achieves an optimized compromise between noise suppression and residual artifacts for various sampling schemes. Compared with NLGRAPPA, our method is significantly less sensitive to the interpolation window and kernel parameters.

⁎ Corresponding authors: School of Biomedical Engineering, Southern Medical University, Guangzhou, China. Tel.: +86 20 6164 8294; fax: +86 20 6164 8274. E-mail addresses: [email protected] (Y. Feng), chenwf@fimmu.com (W. Chen).

1. Introduction

Parallel magnetic resonance imaging (MRI) using a multichannel radio frequency (RF) coil array [1–6] is widely used in clinical applications to reduce scan time while maintaining field of view (FOV) and spatial resolution. Several post-processing techniques have been developed to reconstruct unaliased images from undersampled datasets, among which sensitivity encoding (SENSE) [2] and generalized autocalibrating partially parallel acquisitions (GRAPPA) [6] are the most representative image-domain-based and k-space-based methods, respectively. GRAPPA, which reconstructs the missing k-space data by linear interpolation of neighboring acquired data over all coils, has drawn growing attention in recent years because it avoids the explicit estimation of coil sensitivities required by image-domain-based approaches such as SENSE. Nevertheless, GRAPPA reconstruction accuracy depends on the selection of the interpolation window (hereafter "interpolation window", to avoid confusion with the "kernel function" of support vector machines). Several improved methods have been proposed to choose the optimal interpolation window [7–10]. Although superior images are acquired at low accelerations, these conventional GRAPPA methods, which rely on a linear model, suffer from serious noise amplification and aliasing artifacts at high accelerations, where a linear interpolation function cannot adequately capture the complexity of the spatial information across coils.

Recently, Chang et al. proposed a nonlinear GRAPPA (NLGRAPPA) reconstruction [11] that uses a truncated polynomial kernel to map the k-space data into a higher-dimensional feature space. Although it fits the spatial correlation among coils more accurately than linear GRAPPA, the nonlinear transformation makes NLGRAPPA more sensitive to the interpolation window. Furthermore, determining the optimal nonlinear interpolation function for NLGRAPPA, as in [7–10], may be computationally expensive because the optimal interpolation function also depends on the number of second-order terms in addition to the interpolation window size.

To reduce the sensitivity of kernel-based reconstruction to the interpolation window and kernel parameters, this study formulates the selection of the optimal interpolation function as a multi-kernel learning problem [12–14], on the assumption that the interpolation function can be represented by a group of linear and nonlinear kernel functions. The multi-kernel model is trained in the scheme of least squares support vector regression (LS-SVR), which is derived from the structural risk minimization principle [15] and balances fitting error against model complexity [16] over the measured auto-calibrating signal (ACS) data.


Both phantom and in vivo experiments were conducted to evaluate the performance of the proposed method and to compare it with the GRAPPA and NLGRAPPA methods.

2. Theory

2.1. Coefficient calibration in GRAPPA and NLGRAPPA

GRAPPA interpolates the missing data points by linearly combining the nearby acquired data across all coils [6,17], where the interpolation coefficients are estimated from the fully sampled ACS data near the k-space center. Let b denote the interpolation target consisting of the data at each ACS location, and let S denote the interpolation source consisting of adjacent data with the same undersampling pattern as the outer acceleration region. The interpolation coefficients w are then obtained by fitting the equation

$$ S w = b + \varepsilon \qquad (1) $$

where ε represents the random noise encountered in practice. NLGRAPPA employs a nonlinear kernel mapping that transforms S into a higher-dimensional feature space Φ(S), in which the interpolation coefficients are solved [11]. The mapping in NLGRAPPA adopts the second-order polynomial kernel, defined as

$$ k(s_i, s_j) = \left( s_i^{H} s_j + 1 \right)^{2} \qquad (2) $$

where H denotes the Hermitian transpose. In NLGRAPPA, only part of the second-order terms of Φ(S) are used, to attain a tradeoff between fitting error and model complexity, and the number of retained second-order terms was suggested to be three times the number of first-order terms [11]. Note that the nonlinear formulation degrades into the linear model of conventional GRAPPA when all second-order terms are discarded. Both conventional GRAPPA and NLGRAPPA fit the interpolation function by least squares, which considers only the fitting error and not the model complexity. As a result, the choices of interpolation window and of the number of second-order terms dictate the tradeoff between artifacts and noise in the reconstruction.
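To make the calibration step concrete, the following Python sketch illustrates the least-squares fit of Eq. (1); it is an illustrative reimplementation under an assumed data layout, not the authors' code, and the function names are hypothetical.

```python
import numpy as np

def calibrate_grappa_weights(S, b):
    """Fit the GRAPPA interpolation coefficients w of Eq. (1) by least squares.

    S : (N, P) complex array; row i stacks the acquired neighbors
        (across all coils) of the i-th ACS target point.
    b : (N,) complex array; the ACS target values.
    """
    w, *_ = np.linalg.lstsq(S, b, rcond=None)
    return w

def apply_grappa_weights(S_missing, w):
    """Estimate missing k-space points from their acquired neighborhoods."""
    return S_missing @ w
```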

2.2. GRAPPA reconstruction with sparse multi-kernel learning (MKGRAPPA)

A single kernel produces only a specific space that captures part of the coil correlation information, which is why neither linear GRAPPA nor NLGRAPPA is effective across all accelerations. In this study, multi-kernel learning is introduced to provide a stable and reliable reconstruction for different reconstruction tasks. The multi-kernel learning is implemented within the framework of weighted LS-SVR [16] to avoid the potential overfitting arising from the powerful representation of nonlinear kernel mappings [11]. To regress the interpolation function without bias, we solve the following optimization problem:

$$ J = \min_{w} \; \frac{1}{2} w^{H} w + \lambda \sum_{i=1}^{N} v_i \xi_i^{2}, \qquad v_i > 0, \; i = 1, \ldots, N \qquad (3a) $$

subject to

$$ b_i - \sum_{m=1}^{M} \langle w_m, \Phi_m(s_i) \rangle = \xi_i \qquad (3b) $$

$$ w_m = \theta_m w \qquad (3c) $$

$$ \sum_{m=1}^{M} \theta_m = 1, \qquad \theta_m \ge 0, \; m = 1, \ldots, M \qquad (3d) $$

Here, the vector w represents the coefficients of the interpolation function to be solved; ξ_i is an error variable; N is the size of the training set; v_i denotes the fuzzy weight, determined from the "similarity" of a training point to all other points in the training samples (detailed in Appendix A); and λ controls the tradeoff between training accuracy and model complexity. By assigning small fuzzy weights to training samples that are seriously corrupted by noise, robust estimates of the coil coefficients can be expected [18]. In Eq. (3b), b_i denotes the interpolation target of the i-th training sample, and ⟨·,·⟩ denotes the inner product of two vectors a = [a_1, a_2, …, a_n] and b = [b_1, b_2, …, b_n], defined as

$$ \langle a, b \rangle = \sum_{i=1}^{n} a_i^{H} b_i = a_1^{H} b_1 + a_2^{H} b_2 + \cdots + a_n^{H} b_n . $$

Each source data vector s_i ∈ S is mapped into M different feature spaces (Φ_1(s_i), …, Φ_M(s_i)) by M kinds of kernel functions, whose selection is detailed in a later subsection. In Eqs. (3c) and (3d), θ_m represents the combination weight of the m-th kernel function. Note that the ℓ1-norm constraint on θ promotes a sparse kernel combination [19], meaning that only a small number of basic kernels carry significant weights.

According to duality theory and the Karush–Kuhn–Tucker optimality conditions, the multi-kernel problem in Eqs. (3a–d) can be transformed into the following min–max problem:

$$ \min_{\theta} \; \max_{\alpha} \; -\alpha^{H} \left( \sum_{m=1}^{M} \theta_m K_m + V_{\lambda} \right) \alpha - \alpha^{H} b \qquad (4) $$

$$ \text{subject to} \quad K_m \succeq 0, \; m = 1, \ldots, M; \qquad \sum_{i=1}^{N} \alpha_i = 0; \qquad \sum_{m=1}^{M} \theta_m = 1, \; \theta_m > 0 $$

where K_m is the kernel matrix with K_m(i,j) = k_m(s_i, s_j), V_λ is the diagonal matrix V_λ = diag(1/(λv_1), …, 1/(λv_N)), and α represents the Lagrange multipliers. In general, the optimal λ would have to be chosen empirically by cross-validation. To simplify the calculation, λ is separated from V_λ, and the diagonal matrix V = diag(1/v_1, …, 1/v_N) of fuzzy weights is integrated with the other basic kernels, giving the quadratic term α^H(Σ_{m=1}^{M} θ_m K_m + θ_{M+1} V)α with θ_{M+1} = 1/λ. Thus, λ is determined adaptively within the optimization of the kernel combination coefficients. Problem (4) can be solved with an efficient wrapper algorithm [13] to acquire the optimal α and θ for each offset and coil channel, as detailed in Appendix B. As in all auto-calibrated reconstruction algorithms, the training step of MKGRAPPA operates on the ACS region.
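Since Appendix B is not reproduced in this excerpt, the sketch below substitutes a simplified alternating (wrapper-style) solver rather than the exact semi-infinite linear programming procedure: for fixed θ, the zero-bias LS-SVR dual reduces to a linear system in α, and θ is then re-weighted from each kernel's share of the quadratic form, an update common in ℓ1-norm multi-kernel learning. All names and the update rule are assumptions for illustration; one such fit would be run per target offset and coil channel.

```python
import numpy as np

def mkl_lssvr_fit(kernels, V, b, n_iter=20, tol=5e-6):
    """Simplified alternating solver for the multi-kernel LS-SVR problem.

    kernels : list of M (N, N) PSD kernel matrices, normalized to unit trace.
    V       : (N, N) diagonal matrix diag(1/v_1, ..., 1/v_N); folding it in as
              an extra "kernel" lets theta_{M+1} play the role of 1/lambda.
    b       : (N,) complex interpolation targets from the ACS data.
    Returns the dual variables alpha and the kernel weights theta.
    """
    Ks = list(kernels) + [V]                  # regularization treated as a kernel
    theta = np.full(len(Ks), 1.0 / len(Ks))   # uniform initial combination weights
    alpha = np.zeros(len(b), dtype=complex)
    for _ in range(n_iter):
        K = sum(t * Km for t, Km in zip(theta, Ks))
        alpha_new = np.linalg.solve(K, b)     # zero-bias LS-SVR dual for fixed theta
        # Re-weight each kernel by its share of the quadratic form (l1-MKL style),
        # then renormalize so the weights stay on the simplex.
        q = np.array([max(np.real(alpha_new.conj() @ Km @ alpha_new), 0.0)
                      for Km in Ks])
        theta = theta * np.sqrt(q)
        theta = theta / max(theta.sum(), 1e-15)
        if np.linalg.norm(alpha_new - alpha) < tol:
            alpha = alpha_new
            break
        alpha = alpha_new
    return alpha, theta
```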

With the α and θ obtained in the training step, the missing k-space datum s(x) at location x can then be estimated as

$$ s(x) = \sum_{i=1}^{N} \sum_{m=1}^{M} \alpha_i \, \theta_m \, k_m(s_i, s_{\tilde{x}}) \qquad (5) $$

where s_i ∈ S and s_{x̃} is the vector of acquired data neighboring location x in k-space.
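A corresponding sketch of the synthesis step of Eq. (5) follows; note that only the M data kernels contribute to prediction (the fuzzy-weight matrix V acts like a ridge term and has no out-of-sample kernel function). Names are again illustrative assumptions.

```python
import numpy as np

def mkl_predict(alpha, theta, kernel_fns, S_train, s_query):
    """Estimate a missing k-space point via Eq. (5).

    alpha      : (N,) dual variables from calibration.
    theta      : (M,) combination weights of the M data kernels.
    kernel_fns : list of M callables k_m(x, y) used in training.
    S_train    : (N, P) calibration source vectors s_i.
    s_query    : (P,) neighborhood vector of the missing location x.
    """
    return sum(theta[m] * alpha[i] * k_m(S_train[i], s_query)
               for m, k_m in enumerate(kernel_fns)
               for i in range(len(alpha)))
```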

2.3. Selecting basic kernels

The basic kernels can be of different types, or of the same type with different scale parameters. Apart from the polynomial and linear kernels, the Gaussian kernel, defined as k(s_i, s_j) = exp(−‖s_i − s_j‖² / γ²), where γ is a scale parameter that tunes the kernel width, is widely used in practice because of its powerful representation ability.
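For reference, the three basic kernels used in this study could be written as follows (a sketch assuming complex-valued input vectors; γ = 100 matches the empirical setting reported in Section 3.2).

```python
import numpy as np

def linear_kernel(x, y):
    """k(x, y) = x^H y."""
    return np.vdot(x, y)

def polynomial_kernel(x, y, degree=2):
    """k(x, y) = (x^H y + 1)^degree; Eq. (2) corresponds to degree 2."""
    return (np.vdot(x, y) + 1) ** degree

def gaussian_kernel(x, y, gamma=100.0):
    """k(x, y) = exp(-||x - y||^2 / gamma^2)."""
    return np.exp(-np.linalg.norm(x - y) ** 2 / gamma ** 2)
```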


The polynomial kernel shows better extrapolation ability at lower orders but requires higher orders for good interpolation [20]. By contrast, the Gaussian kernel has good interpolation ability but fails to provide longer-range extrapolation [20]. In k-space-based parallel reconstruction, the reconstruction of missing data points is an interpolation, although the whole reconstruction process can also be regarded as an extrapolation because the interpolation coefficients are calculated from the central ACS data [21]. Consequently, low-order polynomial and Gaussian kernels are combined to meet the interpolation and extrapolation requirements of reconstruction simultaneously. A linear kernel is also selected because of its robustness at low accelerations.

3. Materials and methods

3.1. Data acquisition

Four scanned datasets were used to validate the performance of the proposed method. American College of Radiology MRI phantom data were acquired on a 3-T Tim Trio whole-body scanner (Siemens Healthcare, Erlangen, Germany) with 16-channel coils using a spin echo sequence (TE/TR = 15/500 ms, matrix size = 256 × 256, FOV = 230 × 230 mm², number of slices = 3, slice thickness = 5 mm). Two axial brain experiments were performed on a 1.5-T Siemens Avanto (Siemens Healthcare, Erlangen, Germany) with an 8-channel head coil and on a 3-T Tim Trio with a 32-channel head coil, respectively, to examine the generality of the proposed method. The first axial brain dataset was acquired with a spin echo pulse sequence (TE/TR = 14/400 ms, 33.3 kHz bandwidth, matrix size = 256 × 256, FOV = 240 × 240 mm², number of slices = 18, slice thickness = 6 mm). The second axial brain dataset was acquired with the same imaging parameters as the phantom dataset. The cardiac images were acquired on a 3-T Tim Trio MRI scanner with a 32-channel coil using an ultrafast gradient echo (TurboFLASH) sequence (phase encodes per segment = 12, number of phases = 30, TE/TR = 1.2/2.4 ms, flip angle = 10°, slice thickness = 6 mm, FOV = 230 × 320 mm², matrix size = 144 × 202). All in vivo datasets were collected from healthy adult volunteers who gave written informed consent in accordance with Institutional Review Board policies.

3.2. Image reconstruction

As observed in Ref. [20], using higher polynomial orders or larger Gaussian kernel widths did not produce better results. Therefore, the order of the polynomial kernel was set to 2, and the Gaussian kernel parameter γ was empirically set to 100 throughout this study. Kernel mapping is the most time-consuming step of the proposed method; fortunately, it needs to be computed only once per reconstruction because the kernel matrices are identical for all coils. Note that all kernel matrices must be normalized to unit trace. Because the wrapper algorithm has been proved convergent [22], the initial value α(0) was simply set to an all-zeros vector, and iteration stopped when either the tolerance ε < 5 × 10⁻⁶ or the maximal number of iterations (20) was reached. Image reconstruction was performed in Matlab (MathWorks, Natick, MA) on a computer with a 2.70-GHz Pentium 4 processor and 4 GB RAM.
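A sketch of the one-time kernel-matrix precomputation and the unit-trace normalization mentioned above is given below; `kernel_fns` would be the three functions sketched in Section 2.3, and the looped construction is written for clarity rather than speed.

```python
import numpy as np

def normalize_unit_trace(K):
    """Scale a kernel matrix to unit trace, as required before combination."""
    return K / np.trace(K).real

def build_kernel_matrices(S, kernel_fns):
    """Precompute each (N, N) kernel matrix once; the same matrices are
    reused for every coil and offset, so this cost is paid a single time."""
    N = S.shape[0]
    return [normalize_unit_trace(
                np.array([[k(S[i], S[j]) for j in range(N)] for i in range(N)]))
            for k in kernel_fns]
```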
The normalized root mean square error (NRMSE) with respect to the reference image reconstructed from the full k-space data was calculated to quantitatively evaluate the accuracy of the reconstruction results:

$$ \mathrm{NRMSE} = \sqrt{ \sum_j \left| I_j^{\mathrm{recon}} - I_j^{\mathrm{ref}} \right|^{2} \Big/ \sum_j \left| I_j^{\mathrm{ref}} \right|^{2} } $$

where I_j^ref is the non-accelerated image, I_j^recon the reconstructed image, and j the pixel index of the image.
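The NRMSE definition translates directly into code; a minimal sketch:

```python
import numpy as np

def nrmse(recon, ref):
    """Normalized root mean square error against the fully sampled reference."""
    return np.sqrt(np.sum(np.abs(recon - ref) ** 2) / np.sum(np.abs(ref) ** 2))
```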


The effects of the interpolation window size and the number of ACS lines (NSL) on reconstruction performance were investigated for GRAPPA [6], NLGRAPPA and MKGRAPPA with the phantom dataset. The NLGRAPPA implementation used for this article was developed in Ref. [11] and is available online at http://www.acsu.buffalo.edu/~jlv27/index_files/software.htm. We denote by T the ratio of the number of second-order terms to the number of first-order terms used in NLGRAPPA. NLGRAPPA reconstructions with T = 1, 2 and 3 were performed for the phantom study. NRMSE was calculated in the following settings to quantitatively compare the three methods:

1) outer reduction factor (ORF) of 4 with NSL of 24, the interpolation window size along the frequency encoding (FE) direction varying from 3 to 15 and the size along the phase encoding (PE) direction fixed at 2;
2) ORF of 4 with NSL of 32, the interpolation window size along FE varying from 3 to 15 and the size along PE fixed at 2;
3) ORF of 4 with a 2 × 5 interpolation window and NSL increasing from 16 to 40; and
4) ORF of 4 with a 2 × 9 interpolation window and NSL increasing from 16 to 40.

The reconstruction quality of MKGRAPPA was compared with that of GRAPPA and NLGRAPPA. The optimal interpolation window for GRAPPA was selected following [9], which calculates the g-factors for all interpolation windows with sizes from 2 × 3 to 10 × 11 at a search step of 2 in each encoding direction. In the absence of a noise prescan, the noise covariance matrix was replaced by the identity matrix. For each acceleration, NLGRAPPA was run with T from 1 to 10 to determine the optimal T value (NLGRAPPA with T = 0 is simply the GRAPPA reconstruction). The interpolation window sizes in NLGRAPPA and MKGRAPPA were held at 2 and 9 in the PE and FE directions, respectively, the configuration recommended in Ref. [23]. All in vivo data were reconstructed with two sampling patterns: 1) ORF of 3 and NSL of 12; and 2) ORF of 6 and NSL of 32.
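For clarity, here is a hypothetical sketch of how such a phase-encoding sampling pattern (uniform outer subsampling at a given ORF plus a centered block of NSL ACS lines) might be generated; the function and its layout are illustrative assumptions, not the authors' acquisition code.

```python
import numpy as np

def sampling_mask(n_pe, orf, nsl):
    """1D phase-encoding mask: every orf-th line acquired, plus nsl
    contiguous ACS lines at the center of k-space."""
    mask = np.zeros(n_pe, dtype=bool)
    mask[::orf] = True                                        # outer reduction
    center = n_pe // 2
    mask[center - nsl // 2: center + (nsl + 1) // 2] = True   # ACS block
    return mask

# e.g. the two in vivo patterns used in this study:
mask_a = sampling_mask(256, orf=3, nsl=12)
mask_b = sampling_mask(256, orf=6, nsl=32)
```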
4. Results

4.1. Phantom study

Fig. 1 shows the reconstruction results of the 16-channel phantom data using conventional GRAPPA, NLGRAPPA with different parameters (T = 1, 2, and 3), and the proposed MKGRAPPA at ORF of 4 with 24 and 32 ACS lines. When 24 ACS lines are used for reconstruction, the GRAPPA images suffer from aliasing artifacts and amplified noise. The noise is significantly reduced in the NLGRAPPA images with T = 1. Aliasing artifacts become more severe as T increases from 1 to 3, and this behavior is clearer with a 2 × 9 interpolation window than with a 2 × 5 window. Thus, NLGRAPPA with T = 1 produces the images with the best visual quality among the NLGRAPPA reconstructions. MKGRAPPA produces images with visual quality comparable to those reconstructed by NLGRAPPA with the optimal T of 1. When NSL increases to 32, the images reconstructed by all methods contain less noise, fewer artifacts, and clearer edges than those reconstructed with NSL of 24. For NLGRAPPA, the images reconstructed with the 2 × 9 interpolation window are again deteriorated by more pronounced artifacts than those using the 2 × 5 window. Upon visual inspection, NLGRAPPA with T = 2 and T = 1 gives the preferred results for the 2 × 5 and 2 × 9 interpolation windows, respectively. The MKGRAPPA-reconstructed images are slightly noisier but contain fewer artifacts than the NLGRAPPA-reconstructed images.

Fig. 2 plots NRMSE against the interpolation window size along FE and against NSL for conventional GRAPPA, NLGRAPPA and MKGRAPPA at ORF of 4.


Fig. 1. Phantom images reconstructed at ORF of 4 with NSL of 24 (left) and 32 (right). The proposed MKGRAPPA method is compared with the conventional GRAPPA and NLGRAPPA of different parameters (T = 1, 2, and 3). The methods were implemented with an interpolation window of 2 in the PE direction and 5 in the FE direction (denoted as 2 × 5) and 2 in the PE direction and 9 in the FE direction (denoted as 2 × 9).

Given NSL of 24 (Fig. 2A), all NRMSE curves first decrease with interpolation window size and subsequently rise. This "U"-shaped behavior is pronounced for NLGRAPPA with T = 2 and 3. As NSL increases to 32 (Fig. 2B), all methods except NLGRAPPA with T = 3 produce monotonically decreasing NRMSE as the interpolation window varies from 3 to 15, in contrast with the behavior at NSL of 24. The optimal NLGRAPPA has a slightly smaller NRMSE than MKGRAPPA, consistent with the visual inspection in Fig. 1. Regarding the impact of NSL on the reconstruction algorithms, Figs. 2C and 2D plot NRMSE against NSL for the 2 × 5 and 2 × 9 interpolation windows, respectively. NLGRAPPA yields larger NRMSE than MKGRAPPA when limited ACS lines are available (fewer than approximately 20), which is clearer when a large interpolation window is adopted. Both GRAPPA and MKGRAPPA give flat NRMSE curves; however, the NRMSEs of MKGRAPPA are consistently lower than those of GRAPPA.

4.2. In vivo brain imaging

Figs. 3 and 4 compare brain images reconstructed using the full reference, GRAPPA, NLGRAPPA and MKGRAPPA. Table 1 lists the optimal parameter T for the NLGRAPPA reconstructions, as NLGRAPPA performance depends strongly on the number of second-order terms (parameter T). For ORF of 3 and NSL of 12 (Fig. 3), NLGRAPPA produces an image with serious aliasing artifacts even though a small number of second-order terms (T = 1) is used. Conversely, both GRAPPA and MKGRAPPA produce images without obvious aliasing artifacts. As ORF increases to 6 (Fig. 4), the image reconstructed by GRAPPA is seriously degraded by amplified noise, which is significantly suppressed in the images generated by NLGRAPPA and MKGRAPPA. Although no obvious difference exists between the NLGRAPPA and MKGRAPPA images, the zoomed-in patches show that NLGRAPPA produces more oscillation artifacts than MKGRAPPA.


Fig. 2. Quantitative comparison of conventional GRAPPA, NLGRAPPA and MKGRAPPA algorithms in the phantom study. NRMSE plots against the size of interpolation window along FE for ORF = 4 with NSL of 24 (A) and 32 (B), and NRMSE plots against NSL when interpolation windows of 2 × 5 (C) and 2 × 9 (D) are adopted.

Fig. 5 shows a representative region in the reconstruction of the 32-channel axial brain data. The images were reconstructed with ORFs of 3 (left columns) and 6 (right columns). NLGRAPPA produces images with serious oscillation artifacts at ORF of 3 even with the optimal parameters. GRAPPA produces images with pronounced artifacts and noise near the central region of the reconstructed image at ORF of 6. NLGRAPPA removes more noise than MKGRAPPA but introduces artifacts and mild blurring.

Fig. 6 plots profile lines through the resulting images of the brain datasets at ORF of 6. The profile line positions are highlighted by white lines, and the zoomed-in regions are marked with black circles, in the reference images (Figs. 4 and 5). In the zoomed-in regions, the GRAPPA reconstruction profile shows the largest fitting error due to noise amplification, whereas the MKGRAPPA reconstruction profiles are closer to the reference profile.

Table 1 shows the optimal parameter T for NLGRAPPA and the optimal interpolation window sizes for GRAPPA in the phantom and in vivo studies. At ORF of 3, the optimal T is the same for all reconstruction cases. As ORF increases to 6, the optimal T for NLGRAPPA varies greatly with image content and array coil. Note that the optimal T values exhibit entirely different trends as ORF and NSL increase: T gradually increases for the 16-channel phantom data and the 8-channel brain data, remains 1 for the 32-channel brain data, and rises rapidly with ORF for the cardiac data, reaching 9 at ORF of 6. For linear GRAPPA, the optimal interpolation window size increases with ORF.

Table 2 lists the combination coefficients of the basic kernels automatically estimated by the proposed method for the phantom and in vivo datasets. Because the kernel combination coefficients are calculated for each offset and coil, only the coefficients of the first offset of a fixed coil are presented for simplicity. At low ORF, the linear kernel is assigned the largest coefficient and the polynomial kernel only a small weight. As ORF increases, the polynomial kernel coefficients increase and those of the linear kernel decrease.

5. Discussion

This study introduces a novel adaptive nonlinear GRAPPA reconstruction method in which sparse multi-kernel learning and SVR are combined to calibrate the interpolation coefficients. The combination coefficients of multiple kernels are optimally determined for each offset of each individual coil, which accounts for the variation of noise level across coils and offsets. The proposed MKGRAPPA method can adaptively balance noise suppression against additional artifacts when


Fig. 3. Images reconstructed from an 8-channel head dataset with ORF 3 and NSL 12. The results of a nonaccelerated image used as reference (A), GRAPPA (B), NLGRAPPA with optimal parameter T (C), and MKGRAPPA (D) are shown from top to bottom. The local and difference images are shown to the right of each corresponding image.

applied to different undersampling patterns (ORF and NSL), to data acquired with different array coils, and with different interpolation windows.

The NRMSE plots versus interpolation window size in Figs. 2A and 2B are "U"-shaped. This consistent phenomenon has been observed and analyzed previously [8,10,23,24], and is interpreted here from the perspective of statistical learning theory [15]. The reconstruction error involves the fitting error, or empirical risk [15], and a confidence error F(h/N), where h is the Vapnik–Chervonenkis (VC) dimension and N is the number of training samples. The VC dimension represents the degree of complexity of the interpolation function. The fitting error decreases with h, whereas the confidence error increases with h, which creates an inherent conflict between fitting error and confidence error in reconstruction. This conflict is what the "U"-shaped NRMSE plots indicate. Both the fitting and confidence errors decrease with N; thus the NRMSE curves in Figs. 2C and 2D share the same decreasing trend. For the same interpolation window, the h of a nonlinear interpolation function is higher than that of a linear one, which causes the dramatic rises (Figs. 2A and 2B) or drops (Figs. 2C and 2D) of the NLGRAPPA NRMSE plots.

GRAPPA and NLGRAPPA performances depend heavily on the choice of interpolation window or parameter T because previous studies relied on the empirical risk minimization principle. These parameters should therefore be selected carefully to prevent underfitting or overfitting of the interpolation function. The optimal interpolation window for GRAPPA can be selected at low accelerations by advanced methods; however, Table 1 and the resulting images show that artifacts and noise amplification remain serious at high accelerations despite the adoption of a large interpolation window.


Fig. 4. Images reconstructed from an 8-channel head dataset with ORF 6 and NSL 32. The results of a nonaccelerated image used as reference (A), GRAPPA (B), NLGRAPPA with optimal parameter T (C), and MKGRAPPA (D) are shown from top to bottom. The local and difference images are shown to the right of each corresponding image.

NLGRAPPA can drastically reduce noise through its nonlinear transformation. However, choosing optimal reconstruction parameters for NLGRAPPA is problematic because the optimal parameter T varies with ORF, interpolation window, NSL, and coil configuration, as shown in Fig. 2 and Table 1. The exaggerated difference between NLGRAPPA results with slightly different T limits finer parameter adjustment for better performance. The nonlinear kernels in MKGRAPPA are used to reduce the fitting error, and the training is based on the structural risk minimization principle [15], which controls the confidence error. The combination of LS-SVR and multi-kernel learning reduces fitting noise at high accelerations and avoids artifacts caused by overfitting.

Table 1
Optimal interpolation window sizes in GRAPPA and optimal T in NLGRAPPA for the phantom and in vivo studies.

              16-ch phantom       8-ch brain          32-ch brain         32-ch cardiac
              ORF = 3   ORF = 6   ORF = 3   ORF = 6   ORF = 3   ORF = 6   ORF = 3   ORF = 6
GRAPPA        2 × 5     2 × 9     2 × 5     4 × 9     2 × 5     4 × 7     2 × 9     4 × 11
NLGRAPPA T    1         1         1         4         1         2         1         9


Fig. 5. Performance comparisons for the 32-channel brain data at ORF = 3 with NSL = 12 and ORF = 6 with NSL = 32, respectively. The magnitude and difference images of a representative region are shown for (A) the nonaccelerated reference image, (B) GRAPPA, (C) NLGRAPPA with optimal parameter T, and (D) MKGRAPPA.

Fig. 6. Image profile lines running through a tube with different contrast behaviors at ORF of 6. The profile line positions are highlighted by white lines, and the zoomed-in regions are marked with black circles, in the reference images (Figs. 4 and 5).


Fig. 7. Performance comparisons for the cardiac dataset at ORF = 3 with NSL = 12 (left) and ORF = 6 with NSL = 32 (right), respectively. The nonaccelerated reference image (A) and the results of GRAPPA (B), NLGRAPPA with optimal parameter T (C), and MKGRAPPA (D) are shown from top to bottom.

Table 2
Kernel combination weights in MKGRAPPA for the phantom and in vivo studies.

                    16-ch phantom       8-ch brain          32-ch brain         32-ch cardiac
                    ORF = 3   ORF = 6   ORF = 3   ORF = 6   ORF = 3   ORF = 6   ORF = 3   ORF = 6
Linear kernel       0.6942    0.2223    0.5537    0.1088    0.6524    0.2053    0.4153    0.1635
Polynomial kernel   0.1263    0.5490    0.1574    0.6953    0.1728    0.5928    0.3516    0.7219
Gaussian kernel     0.2056    0.2287    0.3091    0.1960    0.2569    0.2446    0.2385    0.2065
λ⁻¹                 0.0013    0.0035    0.0018    0.0069    0.0015    0.0042    0.0027    0.0084


Table 3
Comparison of time consumption (in seconds) among GRAPPA, NLGRAPPA and MKGRAPPA.

            16-ch phantom       8-ch brain          32-ch brain         32-ch cardiac
            ORF = 3   ORF = 6   ORF = 3   ORF = 6   ORF = 3   ORF = 6   ORF = 3   ORF = 6
GRAPPA      44        760       11        349       84        1612      36        591
NLGRAPPA    92        1068      14        367       324       4329      127       3948
MKGRAPPA    106       1590      17        385       183       3175      83        1044

Ref. [10] indicates that a desirable interpolation window minimizes the reconstruction error, which yields an optimal tradeoff between signal-to-noise ratio (SNR) and artifacts. Although the NLGRAPPA results have the lowest NRMSE values among all methods in some cases (Figs. 2 and 4E), the subjective quality of the reconstructed image is often not faithfully reflected by the NRMSE value. The visual (Figs. 1 and 4D) and profile comparisons (Fig. 6) indicate that the blind removal of noise in NLGRAPPA increases aliasing artifacts and blurring. By contrast, the experiments demonstrate that the proposed MKGRAPPA method reaches a suitable compromise between noise level and blurring artifacts.

LS-SVR can be cast as a special case of regularization methods, but it additionally emphasizes and exploits primal–dual interpretations [25]. Choosing the optimal regularization parameter is nontrivial: the discrepancy principle [26] tends to cause over-smoothing, and the L-curve method and cross-validation [27] can be computationally expensive. In MKGRAPPA, the regularization parameter is selected within the multi-kernel learning problem by treating the regularization term as a special kernel. This approach may significantly reduce the computational cost, which matters for real-time requirements.

The experimental results show that neither the linear kernel nor the polynomial kernel performs consistently well over all sampling scenarios. The linear kernel is the ideal choice at low ORF, whereas the polynomial kernel is more capable of removing residual artifacts and suppressing amplified noise at high accelerations. Table 2 shows that MKGRAPPA utilizes each kernel to provide complementary information by assigning different combination weights, consistent with the prior knowledge about the different kernels. Better MKGRAPPA performance could be expected when more kernels are combined; however, this would increase the memory and computational burden when the numbers of ACS lines and coils are large. Advanced kernels, such as Fourier and wavelet kernels, could be investigated in future work.

The proposed method uses more kernels to map the input data, yet the time consumption does not increase significantly compared with existing methods. Table 3 lists the time consumption of the different methods in the phantom and in vivo studies. The listed times for GRAPPA and NLGRAPPA are merely the reconstruction times under the optimal interpolation window and T; the search time is not included. NLGRAPPA and MKGRAPPA are slightly slower than GRAPPA when the coil number is eight. Nonlinear mapping requires more time, and MKGRAPPA is 2–3 times slower than GRAPPA when the coil number is larger than eight. NLGRAPPA has the largest time consumption when the coil number is 32 because each second-order term is constructed explicitly and the number of second-order terms increases significantly with coil number. All methods require considerable reconstruction time when the coil number is large; fortunately, datasets with many coils can be preprocessed by coil compression techniques [28,29]. In addition, the symmetry of the kernel matrix and parallel computation could be used to accelerate MKGRAPPA reconstruction in future studies.

6. Conclusion

This study demonstrated a novel multiple-kernel-learning-based algorithm for GRAPPA reconstruction. The proposed MKGRAPPA method combines multi-kernel learning and LS-SVR to train the interpolation function, providing an optimal tradeoff between noise and artifacts in the reconstructed images. The experimental results indicate that MKGRAPPA can reduce noise and artifacts in GRAPPA reconstruction. Although MKGRAPPA offers only a modest improvement in image quality over NLGRAPPA at high accelerations, it is far less sensitive to the interpolation window and kernel parameters.

Acknowledgments

This research work was supported by the National Key Basic Research Program Project of China (No. 2010CB732501 and No. 2010CB732502) and the National Natural Science Foundation of China (No. 81371539).

Appendix A. Estimation of fuzzy weights

The initial fuzzy weights v⁰ are designed directly from the ACS data. Assuming that the interpolation function passes through the origin of coordinates and that the mean of all samples, s_mean, lies on the fitted interpolation function, an outlier can be detected from the angle between a sample and s_mean: a large angle indicates that the sample is far from the fitted function, so a small fuzzy weight should be assigned. The fuzzy weight is therefore based on the cosine function, which is symmetric and decreases monotonically on [0, π/2]. The cosine between each sample and s_mean is determined from their inner product:

$$ v_i^{0} = \frac{ \langle s_i', \, s_{\mathrm{mean}}' \rangle }{ \sqrt{ \langle s_i', \, s_i' \rangle } \, \sqrt{ \langle s_{\mathrm{mean}}', \, s_{\mathrm{mean}}' \rangle } } \qquad (A1) $$

where s_i' = [s_i, b_i]. The fitting residual after each iteration is

$$ e^{(\tau)} = \hat{b}^{(\tau)} - b = \Omega^{(\tau)} \alpha^{(\tau)} - b \qquad (A2) $$

The fuzzy weight for the next iteration, v_i^{(τ+1)}, is then determined by

$$ v_i^{(\tau+1)} = \begin{cases} 1 & \text{if } \left| e_i^{(\tau)} \right| / \hat{\sigma} \le q \\ 1 / \left| e_i^{(\tau)} \right| & \text{otherwise} \end{cases} \qquad (A3) $$

where σ̂ is a robust estimate of the residual scale and q is a threshold.
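A sketch of Eqs. (A1)–(A3) in code follows. The robust scale estimate σ̂ (here the median absolute deviation) and the threshold value q are assumptions, since the excerpt does not specify them, and the complex-valued cosine is taken as the magnitude of the normalized inner product.

```python
import numpy as np

def initial_fuzzy_weights(S, b):
    """Initial weights v_i^0 from Eq. (A1): cosine similarity between each
    augmented sample s_i' = [s_i, b_i] and the mean sample s_mean'."""
    Sp = np.hstack([S, b[:, None]])          # augmented samples [s_i, b_i]
    s_mean = Sp.mean(axis=0)
    num = np.abs(Sp @ s_mean.conj())         # |<s_i', s_mean'>|
    den = np.linalg.norm(Sp, axis=1) * np.linalg.norm(s_mean)
    return num / den

def update_fuzzy_weights(residual, q=2.5):
    """Re-weighting of Eq. (A3): keep weight 1 for small residuals and damp
    large ones by 1/|e_i|. sigma is assumed to be a MAD scale estimate and
    q a tuning constant; neither is specified in the excerpt."""
    e = np.abs(residual)
    sigma = 1.4826 * np.median(np.abs(e - np.median(e))) + 1e-12
    return np.where(e / sigma <= q, 1.0, 1.0 / e)
```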
