IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 26, NO. 5, MAY 2015


Brief Papers

Variational Inference With ARD Prior for NIRS Diffuse Optical Tomography

Atsushi Miyamoto, Kazuho Watanabe, Kazushi Ikeda, and Masa-Aki Sato

Abstract—Diffuse optical tomography (DOT) reconstructs 3-D tomographic images of brain activity from near-infrared spectroscopy (NIRS) observations, which is formulated as an ill-posed inverse problem. This brief presents a method for NIRS-DOT based on a hierarchical Bayesian approach that introduces the automatic relevance determination (ARD) prior and the variational Bayes (VB) technique. Although the sparseness of such estimates generally depends strongly on the hyperparameters, our method has little dependency on them. We confirm through numerical experiments that a schematic phase diagram of sparseness with respect to the hyperparameters has two regions: in one region the hyperparameters give sparse solutions, and in the other they give dense ones. The experimental results are supported by our theoretical analyses in simple cases.

Index Terms—Automatic relevance determination (ARD) prior, diffuse optical tomography (DOT), near-infrared spectroscopy (NIRS), phase diagram, variational Bayes (VB) method.

I. INTRODUCTION

Near-infrared spectroscopy (NIRS) measures the oxygenated and deoxygenated hemoglobin concentration changes in the brain between light sources and detectors, and noninvasively produces a 2-D map of human brain activity [2]. With more sources and detectors, a 3-D tomographic image can be reconstructed from the NIRS observations; this is called diffuse optical tomography (DOT) [3]–[5]. The Rytov formulation [6]–[8] casts the reconstruction as an ill-posed linear inverse problem [9], [10]. A conventional method for solving this kind of problem is the minimum norm (MN) estimation. From the Bayesian estimation viewpoint, the MN estimate can be regarded as the maximum a posteriori (MAP) solution with a Gaussian prior distribution, where prior knowledge appears in the prior distribution of the parameters [11]. However, this does not take the localization of brain activities into account [12], [13].

In this brief, we present a hierarchical Bayesian method for NIRS-DOT that includes the MN estimation as a special case. Our method introduces the automatic relevance determination (ARD) prior to the DOT inverse problem to achieve sparse estimation of localized brain activity [14]–[16]. The ARD prior is a Gaussian distribution with mean zero whose inverse variance obeys a conjugate prior distribution, namely a Gamma distribution with hyperparameters [17]. Although the prior is conjugate, the exact posterior distribution is intractable to compute because of the hierarchy. Our method therefore employs the variational Bayes (VB) method for efficient computation [18], which approximates the posterior by a distribution in which the random variables and the parameters are independent of each other. Another device of the VB method is to maximize a lower bound of the marginal likelihood instead of the exact marginal likelihood [18].

The hierarchical Bayesian approach has previously been applied to magnetoencephalography (MEG) source reconstruction [19] and has shown good results compared with Markov-chain Monte Carlo experiments [20], [21]. The main difference between the MEG problem and ours lies in the properties of magnetic permeability and optical transparency: the latter varies with position while the former is almost constant everywhere, which makes our problem more difficult. The aforementioned studies showed that their results were robust against the variation of the hyperparameters of the model. Such robustness is important in our case, since it is difficult to estimate the optical properties of the brain precisely.

We carried out numerical experiments to examine the dependency on the hyperparameters and found a phase transition phenomenon in the results [1]. Hyperparameters in one region of the phase diagram generate a sparse and accurate solution, while those in the other region generate a dense and inaccurate one; within each region, the estimation is robust against prior variations. We also give a theoretical analysis that explains the phase transition phenomenon. Although the analysis applies only to a simple case, the result provides insight into how the phase transition takes place.

Manuscript received June 6, 2013; revised February 4, 2014 and May 22, 2014; accepted May 23, 2014. Date of publication June 18, 2014; date of current version April 15, 2015. This work was supported in part by the National Institute of Information and Communications Technology and in part by the Japan Society for the Promotion of Science under Grant 23700175 and Grant 25120014. A. Miyamoto is with Nikon Corporation, Kanagawa 244-8533, Japan (e-mail: [email protected]). K. Watanabe is with the Department of Computer Science and Engineering, Toyohashi University of Technology, Aichi 441-8580, Japan (e-mail: [email protected]). K. Ikeda is with the Graduate School of Information Science, Nara Institute of Science and Technology, Ikoma 630-0192, Japan, and also with ATR Neural Information Analysis Laboratories, Kyoto 631-0804, Japan (e-mail: [email protected]). M.-A. Sato is with ATR Neural Information Analysis Laboratories, Kyoto 631-0804, Japan (e-mail: [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TNNLS.2014.2328576
This phase transition can potentially occur whenever the VB method is applied to sparse linear inverse problems.

The rest of this brief is organized as follows. Section II describes the Bayesian formulation of the DOT inverse problem, the hierarchical prior of the model, and the VB method. Section III presents numerical experiments that examine the properties of the method. Section IV gives the theoretical analysis of the model in a simple case. Section V discusses our results, and Section VI presents our conclusion.

2162-237X © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

II. METHODS

A. Conventional Model

The distributions of oxygenated and deoxygenated hemoglobin concentrations in the brain change according to how activated each area of the brain is, and NIRS measures this change using the difference between their absorption coefficients [22]. O'Leary [6] derived a linear equation

    Y = A X    (1)

that expresses the relationship between the changes in the absorption coefficients, X ∈ R^N, and the changes of the photon densities observed by NIRS, Y ∈ R^M, using the Rytov formulation. Here, the (m, n)th element A_{m,n} of the sensitivity matrix A ∈ R^{M×N} describes the efficiency with which the absorption coefficient change x_n in the nth voxel contributes to the density change y_m of the mth probe pair. DOT estimates the absorption coefficient changes X from the observed photon density changes Y and the sensitivity matrix A, and visualizes them as tomographic images that show the pattern of the brain activity. Hereafter, we refer to X and Y as the tomographic images and the observed data, respectively.

A conventional method to solve (1) is the MN estimation, which minimizes the cost function

    E(X) = ||Y - A X||_2^2 + α_MN ||X||_2^2    (2)

where ||·||_2 is the l2 norm. The first term in the cost function (2) represents the reconstruction error and the second term the regularization that resolves the ill-posed nature of the inverse problem; the coefficient α_MN controls their relative weights. From the Bayesian point of view, the cost function (2) is regarded as the negative log-posterior of X when the prior distribution and the likelihood are given by

    P(X | α_MN) = N(X | 0, α_MN^{-1} I)    (3)
    P(Y | X) = N(Y | A X, I)    (4)

respectively, where I is the identity matrix. Hence, the MN method is equivalent to the MAP estimation under the above condition, with α_MN playing the role of a precision parameter. Note that the solution of the MN estimation is explicitly given by

    X̄ = (A^T A + α_MN I)^{-1} A^T Y.    (5)

B. Hierarchical Model

A hierarchical model introduces a prior distribution on the parameters (Fig. 1). The ARD prior extends the variance of the prior distribution (3) to a diagonal matrix, α^{-1} = diag(α_n^{-1}), where the precision vector α = {α_n | n = 1, ..., N} ∈ R^N has the prior distribution

    P(α) = Π_{n=1}^{N} Γ(α_n | ᾱ_0, γ_0)    (6)

    Γ(α_n | ᾱ_0, γ_0) = Γ(γ_0)^{-1} α_n^{-1} (α_n γ_0 / ᾱ_0)^{γ_0} exp(-α_n γ_0 / ᾱ_0)    (7)

where Γ(α_n | ᾱ_0, γ_0) represents the Gamma distribution with mean parameter ᾱ_0 and shape parameter γ_0; the mean E[α_n] is ᾱ_0 and the variance V[α_n] is ᾱ_0^2 / γ_0. The hyperparameters ᾱ_0 and γ_0 can be selected arbitrarily, but the most popular choice is ᾱ_0 = 1 and γ_0 → 0, which is called a noninformative prior and has the shift-invariance property.

Furthermore, we introduce the noise variance σ^2 and replace the likelihood (4) with

    P(Y | X) = N(Y | A X, σ^2 I).    (8)

Fig. 1. Graphical models of the conventional and proposed methods. Double-circled nodes and circled nodes are observation variables and random variables, respectively; α_MN, ᾱ_0, γ_0, and σ^2 are hyperparameters.

Fig. 2. Medium for synthetic data. Green and brown circles represent sources and detectors, respectively.

C. VB Method

Bayesian inference estimates the tomographic images X by calculating their expectation with respect to the posterior distribution P(X | Y) for the observed data Y. However, the posterior distribution

    P(X, α | Y) = P(X, α, Y) / P(Y)    (9)

is intractable to compute, since its denominator includes the integral

    P(Y) = ∫ P(X, α, Y) dX dα.    (10)

The VB method [18] approximates the posterior distribution by maximizing a lower bound of log P(Y)

    F(Q) = ∫ Q(X, α) log [P(X, α, Y) / Q(X, α)] dX dα    (11)
         ≤ log ∫ Q(X, α) [P(X, α, Y) / Q(X, α)] dX dα    (12)
         = log P(Y)    (13)

where Q(X, α) and F(Q) are called the variational distribution and the variational free energy, respectively. Since the variational free energy can be written as

    F(Q) = log P(Y) - KL[Q(X, α) || P(X, α | Y)]    (14)

the problem results in minimizing the Kullback–Leibler (KL) divergence from Q(X, α) to P(X, α | Y). Obviously, the variational free energy F(Q) takes its maximum value, log P(Y), if and only if Q(X, α) equals P(X, α | Y), that is, if and only if the KL divergence is zero.

For a tractable approximation of the posterior distribution, the variational distribution usually takes the factorized form

    Q(X, α) = Q_X(X) Q_α(α)    (15)

that is, Q_X(X) and Q_α(α) are assumed to be independent. Then, Q_X(X) and Q_α(α) are updated alternately: the parameters of Q_X(X) are updated so that the KL divergence is minimized for a fixed Q_α(α), and vice versa. These updates are called the X and α steps.

For the ARD prior, Q_X(X) is a Gaussian distribution and Q_α(α) is a Gamma distribution. The X-step gives the optimal variational distribution as

    Q_X(X) = N(X | X̄, Λ^{-1})    (16)

with

    X̄ = σ^{-2} Λ^{-1} A^T Y    (17)
    Λ = σ^{-2} A^T A + ᾱ    (18)

and ᾱ = diag(ᾱ_n). In a similar way, the α-step is written as

    Q_α(α) = Π_{n=1}^{N} Γ(α_n | ᾱ_n, γ)    (19)

    ᾱ_n^{-1} = γ^{-1} [ (1/2)(X̄)_n^2 + (1/2)(Λ^{-1})_{n,n} + γ_0 / ᾱ_0 ]    (20)
    γ = γ_0 + 1/2    (21)

where (·)_n and (·)_{n,n} represent the nth element of a vector and the (n, n)th element of a matrix, respectively. These X and α steps are repeated until the variational free energy converges. When γ_0 → ∞ while ᾱ_0 is kept at a constant value, our method reduces to the MN estimation.
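As a concrete illustration, the alternating X and α steps can be sketched in NumPy. This is not the authors' code: the function name `vb_ard`, the toy problem size, the seed, the fixed iteration count, the initialization ᾱ_n = 1, and the MN regularizer α_MN = 1 are all our own illustrative choices, and the noninformative limit γ_0/ᾱ_0 → 0 is used in the prior term of (20).

```python
import numpy as np

def vb_ard(A, Y, sigma2, gamma0=0.01, alpha0=np.inf, n_iter=100):
    """Alternate the X-step (17)-(18) and the alpha-step (20)-(21)."""
    M, N = A.shape
    gamma = gamma0 + 0.5                          # (21)
    alpha = np.ones(N)                            # initial ARD precisions (our choice)
    AtA, AtY = A.T @ A, A.T @ Y
    for _ in range(n_iter):
        # X-step: posterior precision and mean of X
        Lam = AtA / sigma2 + np.diag(alpha)       # (18)
        Lam_inv = np.linalg.inv(Lam)
        Xbar = Lam_inv @ AtY / sigma2             # (17)
        # alpha-step: update the ARD precisions
        alpha = gamma / (0.5 * Xbar**2 + 0.5 * np.diag(Lam_inv)
                         + gamma0 / alpha0)       # (20)
    return Xbar, alpha

# toy problem: sparse truth, overdetermined A, small sensor noise
rng = np.random.default_rng(0)
M, N, sigma2 = 20, 10, 1e-4
A = rng.standard_normal((M, N))
X_true = np.zeros(N)
X_true[2], X_true[7] = 3.0, -2.0
Y = A @ X_true + np.sqrt(sigma2) * rng.standard_normal(M)

X_vb, alpha = vb_ard(A, Y, sigma2)
X_mn = np.linalg.solve(A.T @ A + 1.0 * np.eye(N), A.T @ Y)  # MN estimate, (5)
```

On this toy problem, the updates should drive ᾱ_n of the inactive voxels up by orders of magnitude, effectively pruning them; this is the mechanism behind the sparse solutions examined in Section III.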

Fig. 3. Reconstructed tomographic images in the sparse case.

Fig. 4. Reconstructed tomographic images in the dense case.

III. NUMERICAL EXPERIMENTS

A. Settings

We examined our method with synthetic data as follows. The sensitivity matrix A was given realistically by the analytical solution of the diffusion equation with the Rytov formulation [6]–[8], where photon migration was modeled in highly scattering media with infinite boundary conditions; the optical properties for tissue (i.e., gray matter) were derived from [10]. The medium of 40 × 40 × 20 (mm) was divided into 500 voxels of 4 × 4 × 4 (mm), and 13 sources and 12 detectors were placed on it as shown in Fig. 2, yielding 156 (= 13 × 12) measurements.

Under this condition, we considered the following two cases for the true tomographic image X_true. One is the sparse case, where only a portion of the voxels in the first layer are activated, reflecting localized brain activity. The other is the dense case, where all the voxels in the first layer are activated, although this is not realistic. The observed data Y were made by adding white Gaussian sensor noise with signal-to-noise ratio (SNR) 10^2 to A X_true so that the conditional distribution (8) holds, where the SNR is defined as the ratio of the variance of the measurement signal Y to that of the true signal A X_true.

The quality and localization of a reconstructed tomographic image X̄ are measured by the normalized error (NE)

    NE(X̄) = ||X_true - X̄||_2^2 / ||X_true||_2^2    (22)

and the sparseness function (SF)¹

    SF(X̄) = (1 / (√N - 1)) (√N - ||X̄||_1 / ||X̄||_2) ∈ [0, 1]    (23)

where N is the dimension of the vector X̄ [23] and ||·||_1 is the l1 norm. Note that the sparseness of the true image is 0.95 in the sparse case and 0.58 in the dense case.

¹The 0-norm is another criterion of sparseness, but it strongly depends on the threshold for deciding whether an element is zero.

B. Results

Reconstructed tomographic images for various values of the hyperparameters ᾱ_0 and γ_0 are shown in Figs. 3 and 4 for the sparse and dense cases, respectively, where each figure shows the first layer of the true image (top left) and the images reconstructed by our method (others). Our method gives jagged images or blurred images depending strongly on whether γ_0 is high or low; the images fall into two kinds, sparse ones and blurred ones. This is shown quantitatively in Fig. 5, where the NE, the SF, and the variational free energy (F) are plotted for various values of the hyperparameters as well as for several sparsenesses of the true images. From the top two rows of Fig. 5, we can draw the schematic phase diagram shown in Fig. 6. It shows the existence of a phase transition: hyperparameters in region A give sparse solutions, while those in region B give dense, blurred images. Within each region, the values change only slightly.

IV. THEORETICAL ANALYSIS FOR PHASE TRANSITION
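For reference, the two measures (22) and (23) can be written directly in NumPy; the following is our own sketch (function names are ours). SF equals 1 for a 1-sparse vector and 0 for a vector of all-equal magnitudes.

```python
import numpy as np

def normalized_error(x_true, x_est):
    """NE (22): squared error normalized by the true signal power."""
    return np.sum((x_true - x_est) ** 2) / np.sum(x_true ** 2)

def sparseness(x):
    """Hoyer's sparseness function SF (23), in [0, 1]."""
    n = x.size
    ratio = np.sum(np.abs(x)) / np.sqrt(np.sum(x ** 2))  # ||x||_1 / ||x||_2
    return (np.sqrt(n) - ratio) / (np.sqrt(n) - 1.0)

x_spike = np.zeros(500)
x_spike[0] = 1.0            # maximally sparse: SF = 1
x_flat = np.ones(500)       # maximally dense: SF = 0
```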

Fig. 5. NE (top row), SF (middle row), and variational free energy (bottom row). The hyperparameters ᾱ_0 and γ_0 run from 10^-5 to 10^5 in log scale. The sparsenesses of the true images in the four columns are 0.95, 0.76, 0.67, and 0.58 from left to right.

Fig. 6. Schematic phase diagram.

Why does such a phase transition appear? In the case that the model is 1-D and ᾱ_0 is infinite, we can derive the local optimal solution analytically and show how the transition occurs. This extreme assumption corresponds to the parameter setting at the upper edge of Fig. 6, where the phase transition occurs. For clarity, we rewrite the 1-D parameters as scalars

    Y = y,  X̄ = x,  A = a,  Λ = λ,  ᾱ = ᾱ.    (24)

Then, the variational free energy is

    F(ᾱ, x, λ, γ) = -(1/2)[ σ^{-2}(y - a x)^2 + σ^{-2} a^2 / λ + log λ + ᾱ(x^2 + λ^{-1}) ] + γ log(ᾱ/γ) + log Γ(γ) + γ    (25)

and the update rules are

    x = σ^{-2} a y / (σ^{-2} a^2 + ᾱ)    (26)
    λ = σ^{-2} a^2 + ᾱ    (27)
    ᾱ = 2γ (x^2 + λ^{-1})^{-1}    (28)
    γ = γ_0 + 1/2.    (29)

We substitute (26), (27), and (29) into the variational free energy (25). Then, the variational free energy becomes

    F(ᾱ) = - ᾱ σ^{-2} y^2 / (2(σ^{-2} a^2 + ᾱ)) + (1/2) log [ᾱ / (σ^{-2} a^2 + ᾱ)] + γ_0 log ᾱ.    (30)

This function is not yet optimized with respect to ᾱ. Obviously, the variational free energy is continuous and differentiable in the domain, and on the domain boundary

    lim_{ᾱ → 0} F(ᾱ) = -∞,  lim_{ᾱ → ∞} F(ᾱ) = ∞.    (31)

This means that the global optimal solution of ᾱ is infinity when the other parameters satisfy the stationary conditions. We therefore focus on the local optimal solution given by the iterative update rules, because the VB method yields a local optimum. The derivative of the variational free energy is

    dF(ᾱ)/dᾱ = f(ᾱ) / (2ᾱ(ᾱ + σ^{-2} a^2)^2)    (32)

where

    f(ᾱ) = ξ_2 ᾱ^2 + ξ_1 ᾱ + ξ_0    (33)
    ξ_2 = 2γ_0 > 0    (34)
    ξ_1 = σ^{-2} a^2 (4γ_0 - σ^{-2} y^2 + 1)    (35)
    ξ_0 = σ^{-4} a^4 (2γ_0 + 1) > 0.    (36)

We study the zero points of (33) instead of (32) because the denominator of the right-hand side of (32) is positive in the domain. Since f(ᾱ) in (33) is a quadratic function, we can solve f(ᾱ) = 0 analytically. The discriminant of the quadratic equation is

    D_2 = 8 σ^{-6} a^4 y^2 (γ̂_0 - γ_0)    (37)

where

    γ̂_0 = (y^2 - σ^2)^2 / (8 σ^2 y^2).    (38)

The following discussion shows that γ̂_0 plays the role of the transition point. It is implied from (38) that the phase transition occurs for any noise variance σ^2, although the transition point is affected by it.

If the discriminant is positive, i.e., 0 < γ_0 < γ̂_0, there are two real-valued solutions. Since ξ_2 > 0 and ξ_0 > 0, the axis of f(ᾱ) determines the signs of the solutions. The axis of f(ᾱ) is

    -ξ_1 / (2ξ_2) = σ^{-2} a^2 ( -1 + (σ^{-2} y^2 - 1) / (4γ_0) ).    (39)

If σ^{-2} y^2 is less than one, the axis and the two solutions are in the negative domain. Then (33) and (32) are positive, which means that the variational free energy increases monotonically and unboundedly as ᾱ → ∞. If σ^{-2} y^2 is more than one, the axis and the two solutions are in the positive domain because

    -ξ_1 / (2ξ_2) > σ^{-2} a^2 (σ^{-2} y^2 + 1) / (σ^{-2} y^2 - 1) > 0    (40)

follows from γ_0 < γ̂_0. The larger solution is a local minimizer of the variational free energy; the smaller one is a local maximizer and a stationary point. The local optimal solution of the variational free energy is therefore

    ᾱ = (-ξ_1 - √D_2) / (2ξ_2).    (41)
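The 1-D analysis above is easy to check numerically. The sketch below is our own illustration, not the authors' code: the helper name `vb_1d` and the values σ^2 = a = 1, y = 2 (so that σ^{-2} y^2 > 1) are arbitrary choices. It iterates the update rules (26)–(29) and compares the fixed point with the closed-form root (41).

```python
import numpy as np

def vb_1d(y, a=1.0, sigma2=1.0, gamma0=0.1, n_iter=200, alpha=1.0):
    """Iterate the 1-D update rules (26)-(29) with alpha_bar_0 -> infinity."""
    gamma = gamma0 + 0.5                           # (29)
    for _ in range(n_iter):
        lam = a * a / sigma2 + alpha               # (27)
        x = (a * y / sigma2) / lam                 # (26)
        alpha = 2.0 * gamma / (x * x + 1.0 / lam)  # (28)
    return alpha, x

y, a, sigma2 = 2.0, 1.0, 1.0                       # sigma^{-2} y^2 = 4 > 1
gamma0_hat = (y**2 - sigma2)**2 / (8.0 * sigma2 * y**2)   # critical point (38)

# below the critical point, the iterates converge to the smaller root (41)
gamma0 = 0.1
xi2 = 2.0 * gamma0                                            # (34)
xi1 = (a**2 / sigma2) * (4.0 * gamma0 - y**2 / sigma2 + 1.0)  # (35)
xi0 = (a**4 / sigma2**2) * (2.0 * gamma0 + 1.0)               # (36)
D2 = xi1**2 - 4.0 * xi2 * xi0                  # discriminant, equals (37)
alpha_star = (-xi1 - np.sqrt(D2)) / (2.0 * xi2)   # analytic local optimum (41)
alpha_vb, x_vb = vb_1d(y, gamma0=gamma0)

# above the critical point, alpha_bar diverges and x collapses to zero
alpha_div, x_div = vb_1d(y, gamma0=1.0)
```

Beyond γ̂_0 the iterates send ᾱ toward infinity and the estimate x to zero, matching the degenerate solution described in the summary below.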

If the discriminant is nonpositive, i.e., γ_0 ≥ γ̂_0, there is at most one real solution in the positive domain. Then (33) and (32) are nonnegative, which means that the variational free energy increases monotonically and unboundedly as ᾱ → ∞.

The theoretical results are summarized as follows. If σ^{-2} y^2 > 1 and 0 < γ_0 < γ̂_0

    ᾱ = [σ^{-2} a^2 (-4γ_0 + σ^{-2} y^2 - 1) - √D_2] / (4γ_0)    (42)
    x = 4γ_0 σ^{-2} a y / [σ^{-2} a^2 (σ^{-2} y^2 - 1) - √D_2]    (43)
    λ = [σ^{-2} a^2 (σ^{-2} y^2 - 1) - √D_2] / (4γ_0)    (44)
    γ = γ_0 + 1/2.    (45)

Otherwise

    ᾱ = ∞,  x = 0,  λ = ∞,  γ = γ_0 + 1/2.    (46)

Thus, γ̂_0 is the critical point of the phase transition. If, instead of the 1-D assumption, we assume that A is an orthogonal matrix, similar results can be obtained.

We confirmed the result by numerical experiments. Fig. 7 shows that the numerical and analytical solutions match well and that a phase transition occurs at the critical point, where the numerical solution was obtained by the iterative update rules (26)–(29). This means that γ_0 dramatically changes the solution given by the VB method.

Fig. 7. NE for various γ_0 in the 1-D case. The circles denote the numerical solutions and the line denotes the analytical solution; σ^2 = 1 and y^2 > 1.

V. DISCUSSION

The prior distribution is of central importance in Bayesian inference. We introduced the ARD prior for NIRS-DOT, that is, a hierarchical Bayesian model with a Gamma hyperprior distribution. This model is general enough to include the conventional methods. In the limit of γ_0 → ∞, for example, Γ(α_n | ᾱ_0, γ_0) approaches a delta distribution and our method corresponds to the MN method; note that the hyperparameters belong to region B of Fig. 6 in this case. Another example is the case of γ_0 → 0 and ᾱ_0 = 1, where our method employs the noninformative prior distribution, which belongs to region A of Fig. 6 [24]. In the case of γ_0 = 1, the prior is an exponential distribution, Γ(α_n | ᾱ_0, γ_0) ∝ exp(-α_n / ᾱ_0), which is related to a relaxation of l1 regularization [25], [26]. In the limit of ᾱ_0 → ∞ with γ_0 = 1, the prior becomes a uniform distribution and belongs to region A.

The performance of hierarchical Bayesian methods generally depends strongly on their hyperparameters. In the case of current-source estimation from MEG data [19], the authors utilized fMRI information for selecting the hyperparameters, but showed that the results were not sensitive to this choice. Our experimental results agree with that observation, and our theoretical analysis partially explains the robustness against the variation of the hyperparameters.

Another issue is the convergence property of the VB method [27]. Nakajima and Watanabe [28], [29] and Nakajima and Sugiyama [30] proved that the VB method has only one local optimum in simple cases, and the same group found that a phase transition of VB learning occurs in Bernoulli mixture models [31]. More rigorous analyses of the phase transition are left for future work.

VI. CONCLUSION

We presented a hierarchical Bayesian method for NIRS-DOT that introduces the ARD prior distribution and calculates the posterior distribution using the VB method. Our method reconstructed better tomographic images than the conventional MN method when the true images are localized. Exhaustive experiments with various values of the hyperparameters indicated that the solutions are rather robust against the variation of the hyperparameters and exhibit a phase transition phenomenon. We analyzed the property of the solutions in a simple case and showed the existence of a phase transition. Note that our method was applied only to synthetic data in this brief since, to the best of our knowledge, no real NIRS-DOT data are freely available; validation on real data is left for future work.

ACKNOWLEDGMENT

The authors would like to thank Dr. S. Nakajima for helpful discussions.

REFERENCES

[1] A. Miyamoto, K. Watanabe, K. Ikeda, and M. Sato, "Phase diagrams of a variational Bayesian approach with ARD prior in NIRS-DOT," in Proc. Int. Joint Conf. Neural Netw., San Jose, CA, USA, Jul./Aug. 2011, pp. 1230–1236.
[2] G. Strangman, D. A. Boas, and J. P. Sutton, "Non-invasive neuroimaging using near-infrared light," Biol. Psychiatry, vol. 52, no. 7, pp. 679–693, Oct. 2002.
[3] D. A. Boas, A. M. Dale, and M. A. Franceschini, "Diffuse optical imaging of brain activation: Approaches to optimizing image sensitivity, resolution, and accuracy," NeuroImage, vol. 23, pp. 275–288, 2004.
[4] E. M. C. Hillman, "Experimental and theoretical investigations of near infrared tomographic imaging methods and clinical applications," Ph.D. dissertation, Dept. Med. Phys. Bioeng., Univ. College London, London, U.K., 2002.
[5] B. W. Zeff, B. R. White, H. Dehghani, B. L. Schlaggar, and J. P. Culver, "Retinotopic mapping of adult human visual cortex with high-density diffuse optical tomography," Proc. Nat. Acad. Sci. USA, vol. 104, no. 29, pp. 12169–12174, Jul. 2007.
[6] M. A. O'Leary, "Imaging with diffuse photon density waves," Ph.D. dissertation, Dept. Phys., Univ. Pennsylvania, Philadelphia, PA, USA, 1996.
[7] J. P. Culver, A. M. Siegel, J. J. Stott, and D. A. Boas, "Volumetric diffuse optical tomography of brain activity," Opt. Lett., vol. 28, no. 21, pp. 2061–2063, Nov. 2003.
[8] J. P. Culver, T. Durduran, D. Furuya, C. Cheung, J. H. Greenberg, and A. G. Yodh, "Diffuse optical tomography of cerebral blood flow, oxygenation, and metabolism in rat during focal ischemia," J. Cerebral Blood Flow Metabolism, vol. 23, no. 8, pp. 911–924, 2003.
[9] D. A. Boas, T. Gaudette, and S. R. Arridge, "Simultaneous imaging and optode calibration with diffuse optical tomography," Opt. Exp., vol. 8, no. 5, pp. 263–270, Feb. 2001.
[10] D. A. Boas and A. M. Dale, "Simulation study of magnetic resonance imaging-guided cortically constrained diffuse optical tomography of human brain function," Appl. Opt., vol. 44, no. 10, pp. 1957–1968, Apr. 2005.
[11] H. H. Thodberg, "A review of Bayesian neural networks with an application to near infrared spectroscopy," IEEE Trans. Neural Netw., vol. 7, no. 1, pp. 56–72, Jan. 1996.
[12] R. Sitaram et al., "Temporal classification of multichannel near-infrared spectroscopy signals of motor imagery for developing a brain-computer interface," NeuroImage, vol. 34, no. 4, pp. 1416–1427, Feb. 2007.


[13] A. Li et al., "Optimal linear inverse solution with multiple priors in diffuse optical tomography," Appl. Opt., vol. 44, no. 10, pp. 1948–1956, Apr. 2005.
[14] M. Guven, B. Yazici, X. Intes, and B. Chance, "Hierarchical Bayesian algorithm for diffuse optical tomography," in Proc. 34th Appl. Image Pattern Recognit. Workshop, Los Alamitos, CA, USA, Dec. 2005, pp. 140–145.
[15] P. J. G. Lisboa et al., "Partial logistic artificial neural network for competing risks regularized with automatic relevance determination," IEEE Trans. Neural Netw., vol. 20, no. 9, pp. 1403–1416, Sep. 2009.
[16] A. Browne, A. Jakary, S. Vinogradov, Y. Fu, and R. Deicken, "Automatic relevance determination for identifying thalamic regions implicated in schizophrenia," IEEE Trans. Neural Netw., vol. 19, no. 9, pp. 1101–1107, Jun. 2008.
[17] M. A. Chappell, A. R. Groves, B. Whitcher, and M. W. Woolrich, "Variational Bayesian inference for a nonlinear forward model," IEEE Trans. Signal Process., vol. 57, no. 1, pp. 223–236, Jan. 2009.
[18] H. Attias, "Inferring parameters and structure of latent variable models by variational Bayes," in Proc. 15th Conf. Uncertainty Artif. Intell., San Francisco, CA, USA, 1999, pp. 21–30.
[19] M.-A. Sato et al., "Hierarchical Bayesian estimation for MEG inverse problem," NeuroImage, vol. 23, no. 3, pp. 806–826, Nov. 2004.
[20] A. Nummenmaa et al., "Automatic relevance determination based hierarchical Bayesian MEG inversion in practice," NeuroImage, vol. 37, no. 3, pp. 876–889, Sep. 2007.
[21] A. Nummenmaa et al., "Hierarchical Bayesian estimates of distributed MEG sources: Theoretical aspects and comparison of variational and MCMC methods," NeuroImage, vol. 35, no. 2, pp. 669–685, Apr. 2007.
[22] A. Li, Q. Zhang, J. P. Culver, E. L. Miller, and D. A. Boas, "Reconstructing chromophore concentration images directly by continuous-wave diffuse optical tomography," Opt. Lett., vol. 29, no. 3, pp. 256–258, Feb. 2004.
[23] P. O. Hoyer, "Non-negative matrix factorization with sparseness constraints," J. Mach. Learn. Res., vol. 5, pp. 1457–1469, Jan. 2004.
[24] D. J. C. MacKay, Information Theory, Inference and Learning Algorithms. Cambridge, U.K.: Cambridge Univ. Press, 2003.
[25] D. P. Wipf and S. Nagarajan, "A new view of automatic relevance determination," in Advances in Neural Information Processing Systems, vol. 20. Cambridge, MA, USA: MIT Press, 2008.
[26] M. West, "On scale mixtures of normal distributions," Biometrika, vol. 74, no. 3, pp. 446–448, 1987.
[27] M. Sato, "Online model selection based on the variational Bayes," Neural Comput., vol. 13, no. 7, pp. 1649–1681, Jul. 2001.
[28] S. Nakajima and S. Watanabe, "Generalization error of automatic relevance determination," in Proc. 17th Int. Conf. Artif. Neural Netw., Porto, Portugal, Sep. 2007, pp. 1–10.
[29] S. Nakajima and S. Watanabe, "Analytic solution of hierarchical variational Bayes in linear inverse problem," in Proc. 16th Int. Conf. Artif. Neural Netw., Athens, Greece, Sep. 2006, pp. 240–249.
[30] S. Nakajima and M. Sugiyama, "Implicit regularization in variational Bayesian matrix factorization," in Proc. 27th Int. Conf. Mach. Learn., Haifa, Israel, Jun. 2010.
[31] D. Kaji and S. Watanabe, "Two design methods of hyperparameters in variational Bayes learning for Bernoulli mixtures," Neurocomputing, vol. 74, no. 11, pp. 2002–2007, May 2011.
