Biometrics 70, 121–131 March 2014

DOI: 10.1111/biom.12123

Source Localization with MEG Data: A Beamforming Approach Based on Covariance Thresholding

Jian Zhang,1,* Chao Liu,1 and Gary Green2

1 School of Mathematics, Statistics and Actuarial Science, University of Kent, Canterbury, Kent CT2 7NF, U.K.
2 York Neuroimaging Centre, University of York, York YO10 5NY, U.K.
* email: [email protected]

Summary. Reconstructing neural activities using non-invasive sensor arrays outside the brain is an ill-posed inverse problem since the observed sensor measurements could result from an infinite number of possible neuronal sources. The sensor covariance-based beamformer mapping represents a popular and simple solution to the above problem. In this article, we propose a family of beamformers by using covariance thresholding. A general theory is developed on how their spatial and temporal dimensions determine their performance. Conditions are provided for the convergence rate of the associated beamformer estimation. The implications of the theory are illustrated by simulations and a real data analysis.

Key words: Beamforming; covariance thresholding; MEG neuroimaging; source localization and reconstruction; varying coefficient models.

1. Introduction
Magnetoencephalography (MEG) is a technique for mapping brain activity by measuring the magnetic fields produced by electrical currents occurring in the brain, using arrays of superconducting quantum interference sensors (Hamalainen et al., 1993). Applications of MEG include basic research into perceptual and cognitive brain processes, localizing regions affected by pathology, and determining the biological functions of various parts of the brain. In this article, we propose a novel method for analyzing MEG data and apply it to identify face-perception regions in a human brain.

While MEG offers a direct measurement of neural activity with very high temporal resolution, its spatial resolution is relatively low. In fact, improving its resolution by means of source reconstruction lies at the heart of the entire MEG-based brain mapping enterprise. Reconstructing neural activities based on measurements outside the brain is an ill-posed inverse problem, since the observed magnetic field could result from an infinite number of possible neuronal sources.

To be concrete, let Yi(tj) be the measurement recorded by MEG sensor i at time tj, and Y(tj) = (Y1(tj), . . . , Yn(tj))ᵀ be the measurements from all n sensors at time tj, where the time points are tj = j/ν, 1 ≤ j ≤ J, the number of time instants J = bν is determined by the time window b and the sampling rate ν per second, and the number of sensors n is of the order of hundreds. Sarvas (1987) showed that the contribution of an individual source to Y(tj) can be numerically calculated by the use of a forward model based on Maxwell's equations, and that the contributions of multiple sources sum linearly. Accordingly, Y(tj) can be written as

Y(tj) = ∫_Ω x(r) β(r, tj) dr + ε(tj),   (1)

where Ω is the source space (i.e., the space inside the brain), β(r, tj) is the source magnitude at location r with unknown orientation η(r), and x(r) = l(r)η(r) is a linear function of the orientation η(r) ∈ R³, with l(r) being an n × 3 matrix (called the lead field matrix) at location r. The columns lx(r), ly(r), and lz(r) of l(r) are the noiseless outputs of the n sensors when a unit-magnitude source at location r is directed along the x, y, and z axes, respectively. The lead field matrix is known in the sense that it can be calculated by solving a set of Maxwell's equations (Sarvas, 1987).

If we know the source locations and orientations, then x(r) is known, and model (1) reduces to a functional regression coefficient model which has been extensively studied in the literature (e.g., Chapter 16, Ramsay and Silverman, 2005). Unfortunately, as both the locations and the orientations are unknown, it cannot be treated directly as a standard functional regression coefficient model. To meet the challenge, we discretize the continuous source space by a sieve {r1, . . . , rp} distributed throughout the brain. We can assume that the true sources are approximately located on the sieve if the sieve is sufficiently dense (i.e., p is sufficiently large). Let β(tj) = (β1(tj), . . . , βp(tj))ᵀ = (β(r1, tj), . . . , β(rp, tj))ᵀ be the magnitude vector of the candidate sources at {r1, . . . , rp} and {β(rk, tj) : 1 ≤ j ≤ J} the source time series at rk, where the superscript ᵀ indicates the matrix transpose. Let xk = l(rk)η(rk) and X = (x1, . . . , xp). Then model (1) can be discretized as follows:

Y(tj) = Σ_{k=1}^{p} xk β(rk, tj) + ε(tj) = Xβ(tj) + ε(tj),   1 ≤ j ≤ J,   (2)

where 1 ≤ p < ∞ and ε(tj) is the noise vector of the n sensors at time tj. Letting β̄(rk, ·) = Σ_{j=1}^{J} β(rk, tj)/J, we define the theoretical and empirical powers of β(rk, ·) at rk respectively


as

γk = var(β(rk, ·)),   γ̂k = Σ_{j=1}^{J} (β(rk, tj) − β̄(rk, ·))² / J.   (3)
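As a quick illustration of (3), the following sketch computes the empirical power of a single candidate source time series. The function name and the toy signal are ours, not part of the paper.

```python
import numpy as np

def empirical_power(beta_k):
    """Empirical power (3): average squared deviation of the source time series from its time mean."""
    beta_k = np.asarray(beta_k, dtype=float)
    return np.mean((beta_k - beta_k.mean()) ** 2)

# toy check: a sinusoid of amplitude a has theoretical power a^2 / 2
t = np.arange(1000) / 1000.0
print(empirical_power(3.0 * np.cos(2 * np.pi * 10 * t)))  # approximately 4.5
```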

We expect that most sources are null in the sense that their theoretical powers are close to zero. Given the sensor data Y(tj), 1 ≤ j ≤ J, we want to estimate {ηk, βk(·)}1≤k≤p, and to further infer the locations of the latent non-null sources and the corresponding orientations and source time-courses. To obtain a good approximation to model (1), the size of the sieve p should be substantially larger than the number of sensors n. However, when p is considerably larger than n, the estimation problem becomes highly ill-defined, as there are a diverging number of candidate models in the MEG model space. This has led to a surge of interest in developing new methods and theories to cope with this situation (e.g., Friston et al., 2008; Henson et al., 2010; Sekihara and Nagarajan, 2010). It is necessary to impose certain regularizations on the above model to tackle the adverse effects mentioned above. One frequently used regularization is the so-called sparsity condition, under which the observed magnetic field is assumed to depend on a much smaller number of latent sources than the number of available sensors n. Under this condition, the problem reduces to that of finding the true sources from a very large number of candidates. The central issue that challenges modern statistics is how to overcome the effects of diverging spectra of sources as well as noise accumulation in an infinite-dimensional source space.

The existing methods in the literature roughly fall into two categories: the global approach and the local approach. Bayesian and beamforming-based methods are special cases of these two approaches, respectively (e.g., Friston et al., 2008; Sekihara and Nagarajan, 2010). In the global approach, we directly fit the candidate model to the data, where the sieve size is required to be known in advance. In contrast, the local approach involves a list of local models, each tailored to a particular candidate region, and therefore the sieve size can be arbitrary. The global approach often needs to specify parametric models for the sources and the noise process, while the local approaches are model-free. When the sieve size is small or moderate compared to the number of available sensors n, we may use a Bayesian method to infer latent sources, with the help of computationally intensive algorithms (e.g., Friston et al., 2008). However, when the sieve size is large, these global methods may be ineffective or computationally intractable, and local approaches are more attractive.

The sensor covariance-based beamforming represents a popular and simple solution to the above large-p-small-n problem. The basic premise behind beamforming is to scan through a source space with a series of filters, each tailored to a particular area in the source space (called the pass-band) and resistant to confounding effects originating from other areas (called the stop-band) (Robinson and Vrba, 1998). The scalar minimum variance beamformer aims to estimate the theoretical power γk at location rk by minimizing the sample variance of the projected data wᵀY(tj), 1 ≤ j ≤ J, with respect to the weighting vector w, subject to the constraint wᵀxk = 1. In a scalar minimum variance beamformer, the pass-band is defined by linearly weighting the sensor arrays with the constraint wᵀxk = 1,

while the stop-band is realized by minimizing the variance of the projected data. The estimated power can be used to produce a signal-to-noise ratio (SNR) map over a given temporal window, while the projected data provide time-course information at each location. We rank these candidate sources by their SNRs and filter out noisy ones by thresholding. There are other beamforming methods, such as vector minimum variance beamformers and minimum-norm types of beamformers (Sekihara and Nagarajan, 2010). Like the scalar version, the weighting vectors in the former are adaptive to the sensor observations, while those in the latter are not.

Although significant progress has been made in assessing the performance of beamforming based on simulations (e.g., Brookes et al., 2008), there is no rigorous statistical theory available that allows one to examine the scope of a beamformer, that is, what can and cannot be inferred about neuronal activity from beamforming. For example, although the sensor measurements are known to be linearly linked to the underlying neuronal activity via the high-dimensional lead field matrix, subject to some random fluctuations, there is no general and mathematically sound framework available that allows users to examine the extent to which the spatial dimension (i.e., the lead field matrix) and the temporal dimension (i.e., the temporal correlations of the sensor measurements) of a beamformer affect its accuracy in source localization and estimation. In particular, when there are multiple sources, the accuracy of beamforming is compromised by the confounding effects of these sources. The closer the sources are, the harder it is for the beamformer to localize and estimate them. It is natural to ask when a beamformer will break down in the presence of multiple, spatially close sources, and how this effect is determined by the spatial and temporal dimensions of a beamformer.

Here, to address these issues, a more flexible beamformer is proposed based on thresholding the sensor covariance estimator. The proposed procedure reduces to the so-called unit-noise-gain minimum-variance beamformer in Sekihara and Nagarajan (2010) when the thresholding level is set to zero. A general framework is provided for the theoretical analysis of the proposed beamformers. An asymptotic theory is developed on how the performance of the beamformer mapping is affected by its spatial and temporal dimensions. Simulation studies and an MEG data analysis are conducted to assess the sensitivity of the proposed procedure to the thresholding level. In particular, the proposed method is applied to a human MEG data set derived from a face-perception experiment. Two clusters of latent sources are predicted, which reacted differently to face and scrambled-face stimuli, as shown in Figure 1.

The rest of the article is organized as follows. The details of the proposed procedure are given in Section 2. The asymptotic properties of the proposed procedure are investigated in Section 3 and in the On-line Supplementary Material. The simulation studies and the real data analysis are presented in Section 4. Conclusions are drawn in Section 5. The proofs of the theorems and lemmas are deferred to Web Appendix A in the On-line Supplementary Material.

Figure 1. The time series plots for the locations of the five highest log-contrast values when the thresholding constant c0 was chosen by maximizing the SAM index: Savitzky–Golay smoothed plots of the projected sensor measurements under the two stimuli (faces and scrambled faces), along the estimated signal orientations, for the locations with CTF coordinates (−5, 5, 5), (−5, 5, 6), (−6, 4, 7), (−4, −4, 8), and (−5, 5, 4) cm, respectively. In each plot, the solid line and the dashed line stand for the time-courses under the face stimulus and the scrambled face stimulus, respectively.

2. Methodology
Suppose that the data (Y(tj) : 1 ≤ j ≤ J)n×J and X = (x1, . . . , xp)n×p, which are obtained from a single-stimulus MEG experiment, follow model (2). Suppose that the latent sources are indexed by an unknown subset I of {1, . . . , p} such that, for a positive constant γ0 and every k ∈ I, var(βk(t)) > γ0. We want to estimate I, the source orientations {ηk : k ∈ I}, and the source time-series {βk(tj), 1 ≤ j ≤ J, k ∈ I}.

We propose a family of beamformers based on sensor covariance thresholding. These beamformers can be implemented in three steps. In Step 1, a thresholded sample sensor covariance matrix is calculated. In Step 2, a brain SNR map is produced by partitioning the brain into a regular three-dimensional grid called a sieve and by calculating the source SNR at each grid point. These SNRs generate a source distribution overlaid on a structural image of the subject's brain. In Step 3, latent sources are inferred by screening the map. The beamforming method investigated here is termed synthetic aperture magnetometry (SAM) (Robinson and Vrba, 1998).

The above procedure can be extended to the setting with two stimuli, where we first apply the procedure to the MEG data for each stimulus and then calculate the log-contrast (the logarithm of the ratio of the SAM indices under the two stimuli) at each grid point. This creates a log-contrast map. The global peak on the map indicates where the maximum increase in SNR for one stimulus relative to the other is attained. The details of the proposed procedure are spelled out below.

2.1. Thresholding the Sensor Covariance Matrix
The n × n sensor covariance C is traditionally estimated by the sample covariance matrix Ĉ of {Y(tj)}. Bickel and Levina (2008) showed that the sample covariance is not a good estimator of the population covariance if its dimension n is large or if the sample covariance is degenerate.

In MEG neuroimaging, the sensor sample covariance matrix can be nearly singular due to collinearity between nearby sensors, which can have serious effects on estimating the precision matrix used in the source reconstruction. Although regularizing the sensor sample covariance by shrinkage has already been used in MEG neuroimaging (e.g., Brookes et al., 2008), its spatial block structure has not been explored. Here, applying the idea of thresholding (Bickel and Levina, 2008), we estimate the sensor covariance by Ĉ(τnJ) = (ĉij(τnJ))n×n, where ĉij(τnJ) = ĉij I(|ĉij| ≥ τnJ), I(·) is an indicator function, and τnJ is a constant changing with n and J.

2.2. Reconstruction of Latent Sources
We localize the sources which underpin the observed magnetic field as follows. For any location r and orientation η, let the n-dimensional vector x = x(r, η) = l(r)η. Given η, we calculate the n-dimensional weighting vector w(r, η) by minimizing the sample variance wᵀĈ(τnJ)w with respect to w ∈ Rⁿ, subject to wᵀx = 1. Note that if we take the data projection wᵀY(tj) as a filter, then wᵀx represents the filter output from a unit-magnitude source. Therefore, setting wᵀx = 1 guarantees that the signal from the source at r can fully pass through the filter. In addition to the power at r, the sample variance wᵀĈ(τnJ)w may contain noise and other unwanted contributions, such as interference from sources at locations other than r. Accordingly, by minimizing the above variance, we obtain the weight w(r, η) = Ĉ(τnJ)⁻¹x/(xᵀĈ(τnJ)⁻¹x), which minimizes such unwanted interference without affecting the passage of the signal coming from the source at r (e.g., Chapter 4, Sekihara and Nagarajan, 2010). Substituting w(r, η) into the above sample variance, we obtain the empirical power γ̂ = 1/(xᵀĈ(τnJ)⁻¹x) of the estimated source time-series {w(r, η)ᵀY(tj) : 1 ≤ j ≤ J} defined in (3), which is used to estimate the theoretical power at r. Similarly, we obtain the estimated power γ̂ = 1/(xᵀC⁻¹x) at r if C is known. As the empirical power of the projected noise along w(r, η) is σ0²w(r, η)ᵀw(r, η), the signal-to-noise ratio (SNR) at r is 1/(σ0² w(r, η)ᵀw(r, η) xᵀĈ(τnJ)⁻¹x), which is proportional to the normalized power

w(r, η)ᵀĈ(τnJ)w(r, η) / (w(r, η)ᵀw(r, η)) = (l(r)η)ᵀĈ(τnJ)⁻¹l(r)η / ((l(r)η)ᵀĈ(τnJ)⁻²l(r)η),

where, by convention, 0/0 is set to 0 (Sekihara and Nagarajan, 2010). The orientation is then estimated by maximizing the SNR or, equivalently, by maximizing the normalized power. The above optimization can be done by solving a generalized eigenvalue problem: the optimal orientation η(r) is the eigenvector associated with the minimum non-zero eigenvalue of the 3 × 3 matrix l(r)ᵀĈ(τnJ)⁻²l(r) relative to the 3 × 3 matrix l(r)ᵀĈ(τnJ)⁻¹l(r). We denote by SAM(r) the reciprocal of the above minimum non-zero eigenvalue and call it the SAM index of neural activity at r. As r runs over the sieve, SAM(r) creates a neuronal SNR map that underlies the measured magnetic fields. The maximum peak of the map gives a location estimator of one of the underlying sources, and the latent time-course at the maximum peak can be estimated by projecting the data along the corresponding optimal weighting vector. We also calculate the local peaks on the transverse slices of the brain, identifying multiple sources by grouping these local peaks.
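The two steps above (covariance thresholding and the SAM index at a grid point) can be sketched in NumPy/SciPy as follows. This is only a schematic reading of Sections 2.1–2.2, not the authors' implementation: it assumes an n × J data matrix Y and an n × 3 lead field matrix l(r) are available, it assumes l(r)ᵀĈ⁻¹l(r) is positive definite, and it uses a pseudo-inverse as a simple guard against a near-singular thresholded covariance.

```python
import numpy as np
from scipy.linalg import eigh

def threshold_cov(Y, tau):
    """Hard-threshold the sample sensor covariance of an n x J data matrix Y (Section 2.1)."""
    C_hat = np.cov(Y)                                   # n x n sample covariance
    return np.where(np.abs(C_hat) >= tau, C_hat, 0.0)   # keep entries with |c_ij| >= tau

def sam_at_location(C_thr, L_r):
    """SAM index, orientation, and weights at one grid point (Section 2.2).
    L_r is the n x 3 lead field matrix l(r); C_thr is the thresholded covariance."""
    Cinv = np.linalg.pinv(C_thr)          # pseudo-inverse guards against near-singularity
    A = L_r.T @ Cinv @ Cinv @ L_r         # 3 x 3 matrix l(r)' C^-2 l(r)
    B = L_r.T @ Cinv @ L_r                # 3 x 3 matrix l(r)' C^-1 l(r), assumed positive definite
    evals, evecs = eigh(A, B)             # generalized eigenvalues of A relative to B, ascending
    nz = evals > 1e-12 * evals.max()      # discard numerically zero eigenvalues
    lam_min = evals[nz][0]                # minimum non-zero eigenvalue
    eta = evecs[:, np.argmax(nz)]         # orientation estimate (defined up to sign)
    eta = eta / np.linalg.norm(eta)
    x = L_r @ eta                         # composite lead field at the optimal orientation
    w = (Cinv @ x) / (x @ Cinv @ x)       # unit-gain minimum-variance weights
    return 1.0 / lam_min, eta, w          # SAM index, orientation, weighting vector
```

Scanning sam_at_location over all grid points of the sieve produces the SNR map described above.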


2.3. Choosing the Thresholding Level
In practice, MEG imaging is often run on a subject first without a stimulus and then with a stimulus. This allows us to calculate the sample covariance Ĉ for the MEG data with the stimulus as well as the sample covariance Ĉ0 for the background noise. The latter provides an estimator of the background noise level. To make the thresholded sample covariance estimator convergent, we set τnJ = c0 σ̂0² √(log(n)/J) with a tuning constant c0 and threshold Ĉ by τnJ, where σ̂0² is the minimum diagonal element in Ĉ0. Note that, when c0 = 0, the proposed SAM procedure reduces to the standard SAM implemented in the software FieldTrip. For each value of c0, we can apply the proposed SAM procedure to the data and obtain the maximum SAM index

SAMc0 = max{SAM(r) : r in the sieve}.   (4)
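A sketch of the selection rule in (4) and of the MA/MI choices discussed next is given below. It reuses threshold_cov and sam_at_location from the previous sketch; the argument names (Y for the with-stimulus data, Y0 for the background recording, leadfields for the list of n × 3 lead field matrices over the sieve) are ours and purely illustrative.

```python
import numpy as np

def sam_map(C_thr, leadfields):
    """SAM index over the whole sieve for one thresholded covariance."""
    return np.array([sam_at_location(C_thr, L_r)[0] for L_r in leadfields])

def choose_c0(Y, Y0, leadfields, grid=(0.0, 0.5, 1.0, 1.5, 2.0)):
    """Scan the tuning constant c0 with tau_nJ = c0 * sigma0^2 * sqrt(log(n)/J)."""
    n, J = Y.shape
    sigma0_sq = np.diag(np.cov(Y0)).min()            # noise level from the background data
    peak = {}
    for c0 in grid:
        tau = c0 * sigma0_sq * np.sqrt(np.log(n) / J)
        C_thr = threshold_cov(Y, tau)
        peak[c0] = sam_map(C_thr, leadfields).max()  # SAM_{c0} as in (4)
    ma = max(peak, key=peak.get)                     # MA: c0 maximizing the peak SAM index
    mi = min(peak, key=peak.get)                     # MI: c0 minimizing it
    return ma, mi, peak
```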

In both simulations and a real data analysis, we will show that c0 ∈ D0 = {0, 0.5, 1, 1.5, 2} covers its useful range. We choose the c0 at which the SAM index attains its maximum or its minimum; these two rules are called MA and MI, respectively. By choosing c0, MA aims to increase the maximum SNR value, while MI tries to reduce source interference. Note that, via τnJ, c0 affects the estimation of C and therefore affects the Ĉ-based source interference reduction used in calculating the SAM index, as well as the peak value of the SAM index.

3. Theory
To make model (2) identifiable, we assume the following condition.

(A1): The source processes {β(t)} and the noise process {ε(t)} are stationary with E[β(t)] = E[ε(t)] = 0. These two processes are uncorrelated with each other. The sources {βk(t)}, 1 ≤ k ≤ p, are also uncorrelated with each other.

The assumption E[β(t)] = 0, which holds approximately in many applications (Sekihara and Nagarajan, 2010), is made for simplicity. Under model (2) and condition (A1), if the noises are uncorrelated across the sensors and white, then the sensor covariance matrix can be expressed as

C = X cov(β(tj)) Xᵀ + cov(ε(tj)) = Σ_{k=1}^{p} γk xk xkᵀ + σ0² In,   (5)

where γk denotes the theoretical power at location rk, σ0² is the background noise level, and In is an n × n identity matrix. To simplify the theory, assuming that l(rk)ηk ≠ 0, 1 ≤ k ≤ p, we reparametrize model (2) as follows:

Y(tj) = Σ_{k=1}^{p} x̃k β̃k(tj) + ε(tj),   (6)

where x̃k = √n l(rk)ηk/||l(rk)ηk|| and β̃k(tj) = ||l(rk)ηk|| βk(tj)/√n. Here, || · || stands for the Euclidean norm of a vector. For notational simplicity, we let xk and βk(tj) stand for x̃k and β̃k(tj), respectively. Note that the SAM index SAM(r)

is invariant under the above reparametrization. Although the original time-courses and powers are not invariant under the reparametrization, they can be recovered by multiplying β(rk, tj) by the scaling factor √n/||l(rk)ηk||. We often see that ||l(rk)ηk||²/n = Σ_{i=1}^{n} (li(rk)ηk)²/n tends to a limit as n grows large.

In this section, we present an asymptotic analysis of the proposed SAM index when both n and J are sufficiently large. In practice, the number of sensors is fixed at a few hundred. Allowing n to vary is an analytic device for finding the spatial factors that affect the performance of a beamformer. We will show that the values of the proposed SAM index are much higher at source locations than at non-source locations. This implies that the screening based on the proposed beamformers can eventually identify latent sources if n and J are sufficiently large. We proceed with the analysis in two steps. In the first step, we focus on the ideal situation where the sensor covariance matrix is known. In the second step, we investigate the asymptotic behavior of the proposed SAM index when the sensor covariance matrix is unknown and is estimated using the sensor measurements at a finite number of time instants.

3.1. Beamforming with Known Sensor Covariance
We assume that there are q (≤ p) non-null source locations, say r1, . . . , rq, in the model and denote the sensor covariance C by Cq to reflect this assumption. If the sensor processes are ergodic and can be observed over an infinite number of time instants, then the sensor covariance matrix Cq can be fully recovered. Under this ideal situation, we can perform beamforming directly on Cq and reconstruct the unknown sources based on the equation

Cq = Σ_{k=1}^{q} γk xk xkᵀ + σ0² In.
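To make the ideal-covariance setting concrete, the toy check below builds Cq for q = 2 synthetic (scaled) lead field vectors with coherence ρ12 and evaluates the ideal power estimator 1/(xᵀCq⁻¹x) at the first source and at a location orthogonal to both sources. The vectors are artificial stand-ins, not real lead fields, and the closed-form values quoted in the comments come from the q = 2 special case discussed later in Theorem 1.

```python
import numpy as np

n, sigma0_sq = 200, 1.0
gamma1, gamma2, rho = 2.0, 1.5, 0.3

# synthetic scaled lead field vectors with ||x_k||^2 = n and coherence x1'x2/n = rho
x1 = np.zeros(n); x1[0] = np.sqrt(n)
x2 = np.zeros(n); x2[0] = rho * np.sqrt(n); x2[1] = np.sqrt(n * (1 - rho**2))
x_null = np.zeros(n); x_null[2] = np.sqrt(n)          # orthogonal to both sources

Cq = gamma1 * np.outer(x1, x1) + gamma2 * np.outer(x2, x2) + sigma0_sq * np.eye(n)

power_src = 1.0 / (x1 @ np.linalg.solve(Cq, x1))        # ideal power estimator at r1
power_null = 1.0 / (x_null @ np.linalg.solve(Cq, x_null))

print(power_src, gamma1 + sigma0_sq / (n * (1 - rho**2)))  # close for large n (Theorem 1, q = 2)
print(power_null, sigma0_sq / n)                           # exactly sigma0^2/n for an orthogonal x
```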

To build a brain map, we consider an arbitrary location r in the brain and an orientation η with l(r)η ≠ 0. Let x = x(r, η) denote √n l(r)η/||l(r)η||, the (scaled) composite lead field vector at r with orientation η. For any two locations r and ry, their lead field spatial coherence is defined by ρ(x(r, η), y) = x(r, η)ᵀy/n = 1 − ||x(r, η) − y||²/(2n). Note that 1 − ρ(x(r, η), y) shows how close (r, η) is to (ry, ηy) in terms of the lead field distance ||x(r, η) − y||. In general, the matrix l(r) can be written as l1(r)A(r), where l1(r) consists of the first d ≤ 3 columns of l(r) and A(r) is a d × 3 matrix of rank d. For example, d = 2 in the single spherical head model (Sarvas, 1987). A necessary requirement for the true source r1 to be identifiable is that the lead field vectors differ from each other over different locations near r1. This requirement can be satisfied by assuming the following condition.

(A2): For any r ≠ r1, l(r) and l(r1) are linearly independent in the sense that for any orientations η and η1 and non-zero constant c1 with max{||l(r)η||, ||l(r1)η1||} ≠ 0, we have l(r)η ≠ c1 l(r1)η1.

Proposition 1. Under condition (A2), the true source location is identifiable, and the orientation is identifiable if and

only if the columns of the lead field matrix at the true source location are linearly independent.

3.1.1. Single-source case. To give an insight into the behavior of beamforming, we start with the single-source case where q = 1. We assume that there exists a single unknown source located at r1 with orientation η1 and that other sources are weak enough to be ignored. In this case, the estimated power γ̂1 at r1 is equal to 1/(x1ᵀC1⁻¹x1), with bias 1/(x1ᵀC1⁻¹x1) − γ1. We show below that if the sensor covariance is known, then the SAM beamformer map can accurately recover the true source location and orientation. When n is large, we can further demonstrate how sharp the peak is. See A.1 of Web Appendix A in the On-line Supplementary Material.

Proposition 2. Under conditions (A1) and (A2), SAM(r) has a single mode and attains its maximum at r1. The bias of the source power estimator at r1 is equal to σ0²/n.

3.1.2. Multiple sources. We now turn to multiple sources, where there exist q unknown sources located at rk, 1 ≤ k ≤ q, with orientations ηk, 1 ≤ k ≤ q, respectively. To show the consistency of the beamforming estimation, more notation and regularity conditions on the composite lead field vectors are introduced as follows.

First, we introduce a notion of source separateness to describe the estimability of multiple sources. Let x and y denote the (scaled) composite lead field vectors at locations rx and ry with orientations ηx and ηy, respectively. For simplicity of notation, let ρj1j2 = ρ(xj1, xj2). To describe the spatial relationships among the composite lead field vectors, for 1 ≤ k ≤ q, we define the partial coherence factor ayx|k between ry and rx given {rj, 1 ≤ j ≤ k} by iteratively performing the so-called sweep operation on the matrix (ρj1j2)/σ0² as follows:

ayx|1 = ρ(y, x1, x)/σ0² = (ρ(y, x) − ρ(y, x1)ρ(x1, x))/σ0²,
aj1j2|1 = axj1xj2|1,   aj1x|1 = axj1x|1,
ayx|(k+1) = ayx|k − ay(k+1)|k a(k+1)x|k / a(k+1)(k+1)|k,   1 ≤ k ≤ q − 1,
aj1j2|(k+1) = axj1xj2|(k+1),   aj1x|(k+1) = axj1x|(k+1),

where the dependence of the above notation on n has been suppressed. Note that 1 − σ0² a(k+1)(k+1)|k shows the partial self-coherence of xk+1 given the preceding xj's (Goodnight, 1979). See A.2 of Web Appendix A in the On-line Supplementary Material. Let anq = n min_{1≤k≤q−1} a(k+1)(k+1)|k, and

km = 0 if anq → ∞;   km = min{1 ≤ k ≤ q − 1 : na(k+1)(k+1)|k = O(1)} if anq = O(1).   (7)

We impose the condition that anq → ∞ as n → ∞, which is equivalent to requiring that, for any λ > 0, the maximum partial coherence of xk, 1 ≤ k ≤ p, is bounded above by 1 − λ/n for large n. If anq → ∞, then a(k+1)(k+1)|k, 1 ≤ k ≤ q, are all positive when n is large. Therefore, xk, 1 ≤ k ≤ p, are linearly independent for large n, because the inverse of the matrix (ρj1j2) can be obtained by iteratively performing sweep operations (q − 1) times on this matrix. We say that the source locations are asymptotically separable if anq → ∞ as n → ∞, and that a non-source location rx with orientation ηx is asymptotically separable from the sources if naxx|q → ∞.

Secondly, to regularize the lead field matrix, we define the notation byx|k, 1 ≤ k ≤ q, by letting byx|1 = ρ(y, x1)ρ(x1, x)/γ1 and

byx|k = byx|(k−1) − akx|(k−1) byk|(k−1)/akk|(k−1) − ayk|(k−1) bkx|(k−1)/akk|(k−1)
        + akx|(k−1) ayk|(k−1) (1 + γk bkk|(k−1)) / (γk akk|(k−1)²).

It follows from the definition that byx|q = O(1) if max_{2≤k≤q} akx|(k−1)/akk|(k−1) = O(1) and max_{2≤k≤q} ayk|(k−1)/akk|(k−1) = O(1) (i.e., given {x1, . . . , xk−1}, the partial regression coefficients of x and y with respect to xk are bounded). In particular, if max_{2≤k≤q} akk|(k−1) is bounded below from zero as n tends to infinity, then the above partial regression coefficients are bounded and thus byx|q = O(1). The following theorem shows that if max_{1≤j≤q} max_{2≤k≤q} akj|(k−1)/akk|(k−1) = O(1) (i.e., the partial regression coefficients of {xj} are bounded), then anq → ∞ is a necessary condition for the source powers to be consistently estimated. We state our general mapping theorem as follows.

Theorem 1. Suppose that max_{1≤j≤q} max_{2≤k≤q} akj|(k−1)/akk|(k−1) = O(1) and that conditions (A1) and (A2) hold. Then, as anq = O(1), the power estimator 1/(xkm+1ᵀ Cq⁻¹ xkm+1) at rkm+1 is asymptotically larger than γkm+1, that is, the power estimator is not consistent, where km is defined in equation (7). As anq → ∞, we have:

(i) For any non-null source location rj, 1 ≤ j ≤ q, the power estimator and the SAM index at location rj with orientation ηj admit the expressions

1/(xjᵀ Cq⁻¹ xj) = γj + 1/(n ajj|[−j]) + O(anq⁻²),
(xjᵀ Cq⁻¹ xj)/(xjᵀ Cq⁻² xj) = σ0² (nγj ajj|[−j] + 1 + 2γj bjj|[−j]) + O(anq⁻¹),

where ajj|[−j] and bjj|[−j] are constants defined in the On-line Supplementary Material satisfying najj|[−j] → ∞ and bjj|[−j] = O(1) as n is sufficiently large. In particular, for q = 2, we have a11|[−1] = a22|[−2] = a22|1 = (1 − ρ12²)/σ0², b11|[−1] = ρ12²/γ2, and b22|[−2] = ρ12²/γ1.

(ii) For any null-source location rx which is asymptotically separable from the sources and satisfies max_{2≤k≤q} akx|(k−1)/akk|(k−1) = O(1), the power estimator and the SAM index at location rx with orientation ηx

can be respectively expressed as

1/(xᵀ Cq⁻¹ x) = 1/(n axx|q) − bxx|q/(n² axx|q²) + O(anq⁻³),
(xᵀ Cq⁻¹ x)/(xᵀ Cq⁻² x) = σ0² (1 + bxx|q/n) + O(anq⁻²).

Note that Theorem 1 holds under the reparametrized model (6) and continues to hold under the original model (2) after adjusting the power estimators by the corresponding scaling factors. The above theorem can also be extended to the setting with two stimuli. It can also be seen from Theorem 1 that, at each separable non-null source location, the SAM index is asymptotically equal to the product of the number of sensors n, the source power, and the source coherence factor. In contrast, at a null-source location rx which is asymptotically separated from the non-null sources, the SAM power index is equal to the lower bound σ0² up to the first asymptotic order. These facts suggest that the greater the number of sensors employed, the larger the contrast between non-null sources and null sources will be, and therefore the easier it is for the beamformer to localize them. In particular, when n is large, the localization bias can be of order less than anq⁻¹ in terms of the lead field distance.

3.2. Beamforming with Estimated Sensor Covariance
In addition to conditions (A1) and (A2), we need the following two conditions for extending the asymptotic analysis to the case of unknown sensor covariance. The first one is imposed to regularize the tail behavior of the sensor processes. The Gaussian iid processes considered in Bickel and Levina (2008) satisfy the following condition.

(A3): There exist positive constants κ1 and τ1 such that, for any u > 0, max_{1≤i≤n} P(||yi(t)|| > u) ≤ exp(1 − τ1 u^κ1) and max_{1≤i≤n} E||yi(t)||² < +∞.

In the second additional condition, we assume that the sensor processes are strong mixing. Let F⁰₋∞ and F^∞_k denote the σ-algebras generated by {y(t) : −∞ ≤ t ≤ 0} and {y(t) : t ≥ k}, respectively. Define the mixing coefficient

α(k) = sup_{A ∈ F⁰₋∞, B ∈ F^∞_k} |P(A)P(B) − P(AB)|.

The mixing coefficient α(k) quantifies the degree of temporal dependence of the process {y(t)} at lag k. We assume that α(k) decreases exponentially fast as the lag k increases, which the Gaussian iid processes considered in Bickel and Levina (2008) satisfy.

(A4): There exist positive constants κ2 and τ2 such that α(k) ≤ exp(−τ2 k^κ2).

For a constant A, let τnJ = A √(log(n)/J). Let ȳi be the sample mean of the ith sensor as before and

C = Cp = (cij),   mn = max_{1≤i≤n} Σ_{j=1}^{n} I(cij ≠ 0) ≤ n,
ĉij = (1/J) Σ_{k=1}^{J} (yi(tk) − ȳi)(yj(tk) − ȳj),   Ĉ(τnJ) = Ĉp(τnJ) = (ĉij I(|ĉij| ≥ τnJ)),

where I(·) is the indicator. Let κ3 = max{2(2/κ1 + 1/κ2) − 1, (4/3)(1/κ1 + 1/κ2) − 1/3, 1}. We are now in a position to generalize Proposition 2 and Theorem 1 to the case where the sensor covariance is estimated by the thresholded covariance estimator. The former is straightforward and thus omitted. In the following theorem, we show that estimating the SAM index is harder than estimating the power index, due to the hardness of estimating the small value of xjᵀC⁻²xj.

Theorem 2. Suppose that conditions (A1)–(A4) hold and that n²τnJ = o(1), anq n²τnJ² = o(1), and max_{1≤j≤q} max_{2≤k≤q} akj|(k−1)/akk|(k−1) = O(1) as both n and J tend to infinity. Then, we have:

(i) If anq = O(1) and n²τnJ = o(1), the power estimator at rkm+1 is asymptotically larger than γkm+1 in probability, that is, the power estimator is not consistent. If anq → ∞, for 1 ≤ j ≤ q, the power estimator and the SAM index at the source location rj with orientation ηj admit the expressions

1/(xjᵀ Ĉ(τnJ)⁻¹ xj) = γj + 1/(n ajj|[−j]) + Op(anq⁻² + nτnJ mn),
(xjᵀ Ĉ(τnJ)⁻¹ xj)/(xjᵀ Ĉ(τnJ)⁻² xj) = σ0² (nγj ajj|[−j] + 1 + 2γj bjj|[−j]) + Op(anq⁻¹ + anq n mn τnJ²),
1/(xjᵀ Ĉ(0)⁻¹ xj) = γj + 1/(n ajj|[−j]) + Op(anq⁻² + n²τnJ),
(xjᵀ Ĉ(0)⁻¹ xj)/(xjᵀ Ĉ(0)⁻² xj) = σ0² (nγj ajj|[−j] + 1 + 2γj bjj|[−j]) + Op(anq⁻¹ + anq n²τnJ²),

where najj|[−j] → ∞ and bjj|[−j] = O(1) as n is sufficiently large.

(ii) For a null-source location rx, as naxx|q → ∞ and max_{2≤k≤q} akx|(k−1)/akk|(k−1) = O(1), the power estimator and the SAM index at location rx with orientation ηx can be expressed as

1/(xᵀ Ĉ(τnJ)⁻¹ x) = 1/(n axx|q) − bxx|q/(n² axx|q²) + Op(anq⁻³ + nτnJ mn),
(xᵀ Ĉ(τnJ)⁻¹ x)/(xᵀ Ĉ(τnJ)⁻² x) = σ0² (1 + bxx|q/n) + Op(anq⁻² + anq nτnJ² mn),
1/(xᵀ Ĉ(0)⁻¹ x) = 1/(n axx|q) − bxx|q/(n² axx|q²) + Op(anq⁻³ + n²τnJ),
(xᵀ Ĉ(0)⁻¹ x)/(xᵀ Ĉ(0)⁻² x) = σ0² (1 + bxx|q/n) + Op(anq⁻² + anq n²τnJ²),

as n is sufficiently large.

4. Numerical Results
In this section, we assess the proposed procedure by simulations and a real data analysis.

4.1. Simulation Studies
Using simulations, we examined the performance of the proposed beamformer procedure in terms of the so-called localization bias. For any estimator r̂1 of a source location r1, the localization bias is defined as E|r̂1 − r1|, where |r̂1 − r1| is the L1 distance between r̂1 and r1. We attempt to answer the following questions: Has the SAM procedure been improved by using the thresholded covariance estimator? To what extent will the performance of the proposed beamformer procedure deteriorate under source interference? How does the choice of the number of sensors and the sampling frequency affect the accuracy of source localization?

For simplicity, we focused on a single-shell forward head model, where a spherical volume conductor with a 10 cm radius from the origin was created (Oostenveld et al., 2011). We constructed 2222 regular 3D grid points of resolution 1 cm within the head. These candidate source positions were aligned with the axes of the head coordinate system. The gradiometer array with n sensors at 12 cm distance from the origin and the corresponding (n × 6666) lead field matrix L between the n sensors and the 2222 grid points were created by using the open source software FieldTrip.

We first studied the single-source case, where the true source θ1(t) was located at r1 = (3, −1, 4)ᵀ and of the form θ1(t) = η1β1(t) with

η1 = (1/√102, 1/√102, 10/√102)ᵀ,   β1(t) = 102 cos(20tπ),   0 ≤ t ≤ 1.

By using θ1(t), we simulated the neural oscillation patterns in the brain motor system. The sensor measurements at time t follow the model

Y(t) = l(r1)η1β1(t) + ε(t),   0 ≤ t ≤ 1,   (8)

where ε(t) is the sensor noise vector. Set the time window width w = 1 and the number of time instants J equal to the sampling rate. That is, the sensors are measured at the time instants tk = k/J, k = 0, 1, 2, . . . , J − 1. The signal strength (SS) in the sensor space was defined by

SS = Σ_{k=0}^{J−1} ||l(r1)η1β1(tk)||² / J.

For each k, we sampled Nnk from an n-dimensional standard normal distribution and set ε(tk) = α × SS × Nnk, where α is a coefficient of the noise level. Similarly, we can define the noise strength (NS). We consider two representative noise levels (i.e., low and high), α = 0.8 and 5, with the SNRs (i.e., SS/NS ≈ 1/α²) equal to 0.8⁻² = 1.56 and 5⁻² = 0.04, respectively. For each combination of (n, J), where n = 26, 91 and J = 500, 1000, 2000, 3000, 4000, we generated 30 datasets of {Y(tk), 0 ≤ k ≤ J − 1} paired with {ε(tk), 0 ≤ k ≤ J − 1} by (8) for each noise level. Here, we imitated the practical situation, where the MEG imaging is run on a subject first without a stimulus and then with a stimulus. The former provides an estimator of the background noise level. For each dataset, we calculated the sample covariance Ĉ of {Y(tk), 0 ≤ k ≤ J − 1} and the corresponding sample covariance Ĉ0 of the background noise. We calculated the thresholded sensor covariance estimate Ĉ(τnJ), where τnJ = c0 σ̂0² √(log(n)/J) with the tuning constant c0, and σ̂0² is the minimum diagonal element in Ĉ0. We considered five values for c0, namely 0, 0.5, 1, 1.5, 2, together with the values of c0 (say ma and mi) selected by MA and MI. Recall that, when c0 = 0, the proposed SAM procedure reduces to the standard SAM implemented in FieldTrip. For each value of c0, we applied the proposed SAM procedure to the 30 datasets, obtaining 30 SAM-based maximum location estimates of r1. We then summarized the 30 SAMc0 values defined in (4) and the localization biases by box-and-whisker plots. The results, summarized in Figure 2 and Web Figure B1 of Web Appendix B in the On-line Supplementary Material, demonstrate that, in the single-source case, for each combination of (n, J) and c0 = ma, the proposed procedure can localize the true source with very small bias, and that the performance deteriorates when the background noise level increases.

We now turn to the two-source case. Let the second source be θ2(t) = η2β2(t), with

η2 = (1, 0, 0)ᵀ,   β2(t) = 8 cos(20tπ),   0 ≤ t ≤ 1,

located at r2 = (−5, 2, 6). Adding θ2(t) to the model in (8), we have

Y(t) = l(r1)η1β1(t) + l(r2)η2β2(t) + ε(t),   0 ≤ t ≤ 1.   (9)

The L1 distance between the two sources is 13. Similar to the single-source case, for each of the 10 combinations of (n, J), we generated 30 paired datasets of {Y(tk), 0 ≤ k ≤ J − 1} and {ε(tk), 0 ≤ k ≤ J − 1}. For each dataset, we thresholded the sample covariance Ĉ by τnJ = c0 σ̂0² √(log(n)/J), on which we applied the SAM procedure. We obtained 30 SAMc0 values and the SAM-based maximum estimates of the source location, and calculated their minimum L1 distances to r1 and r2.
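The single-source design in (8) can be mimicked with the short sketch below. The lead field here is a random stand-in (the paper uses a FieldTrip-generated single-shell model, which we do not reproduce), the source amplitude is left as a parameter, and the noise is scaled by α times the signal strength as described above; the function and variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_single_source(L_r1, eta1, J=1000, alpha=0.8, amp=1.0):
    """Simulate model (8): Y(t_k) = l(r1) eta1 beta1(t_k) + eps(t_k), t_k = k/J, 0 <= k < J.
    L_r1 is the n x 3 lead field at the true source; amp is an illustrative amplitude for beta1."""
    n = L_r1.shape[0]
    t = np.arange(J) / J                               # time window width w = 1
    beta1 = amp * np.cos(20 * np.pi * t)               # oscillatory source time course
    signal = np.outer(L_r1 @ eta1, beta1)              # n x J noiseless sensor measurements
    SS = np.mean(np.sum(signal ** 2, axis=0))          # signal strength, as defined above
    noise = alpha * SS * rng.standard_normal((n, J))   # eps(t_k) = alpha * SS * N_nk
    return signal + noise, noise                       # paired with-stimulus and noise-only data

# illustrative run with a random stand-in for the n = 91 sensor lead field at r1
L_r1 = rng.standard_normal((91, 3))
eta1 = np.array([1.0, 1.0, 10.0]) / np.sqrt(102.0)
Y, E = simulate_single_source(L_r1, eta1, J=1000, alpha=0.8)
```

A two-source dataset as in (9) can be obtained by adding a second term l(r2)η2β2(t) constructed in the same way.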


Figure 2. Single source: The first two columns of plots are the box-and-whisker plots of the SAM values and the localization biases for n = 91 sensors and the sample rate J = 500. Among them, the upper two respectively show the SAM values and the localization biases against the tuning constant c0 = 0, 0.5, 1, 1.5, 2, ma, and mi for the noise level α = 5, and the bottom two are for the noise level α = 0.8. The remaining two columns of plots are for n = 91 sensors and the sample rate J = 1000. From the first and third columns of the plots, we see that most of the time MA selected 0, as SAMc0 attained the maximum at c0 = 0. This figure appears in color in the electronic version of this article.

The results, displayed as box-and-whisker plots in Figure 3 and Web Figures B2 and B3 of Web Appendix B in the On-line Supplementary Material, suggest the following: (1) The SAM-based maximum estimates are much closer to r2


than to r1 . Therefore, the source interference has masked the impact of detecting the first source. In contrast, without the interference from r2 , as demonstrated in the single-source

case, we can localize the source r1 very accurately by using the SAM-based maximum estimation.

Figure 3. Two sources: The first two rows display the box-and-whisker plots of the SAM values and the localization biases against the tuning constant c0 = 0, 0.5, 1, 1.5, 2, ma, and mi for the combinations of n = 91 sensors and the sample rates J = 500 and 1000, respectively. The third and fourth rows present the box-and-whisker plots of the local localization biases to the sources r1 and r2 against the transverse slice indices from 0 to 10 when c0 was selected by MA. The first and third rows are for the noise level α = 5, while the second and fourth rows are for the noise level α = 0.8. From the first row of the plots, we see that most of the time MA selected 1, as SAMc0 attained the maximum at c0 = 1 when α = 5, and that most of the time MA selected 0.5, as SAMc0 attained the maximum at c0 = 0.5 when α = 0.8. From the third and fourth rows of the plots, we see that none of the local peaks on the transverse slices are close to the source location r1, which implies that source 1 has been masked on the SAM-based power map. This figure appears in color in the electronic version of this article.

(2) On average, when the noise level in the data is high, the localization bias was high at either a low or a high thresholding level, because the sensor covariance was poorly estimated. Also, in general the SAM value was inversely proportional to the localization bias, suggesting that we should choose the thresholding level by maximizing the SAM index over c0. These facts explain why the proposed MA thresholding works in reducing the source localization bias. However, MA does not seem optimal, and its optimal value does depend on n and J. (3) The number of sensors n has an effect on the choice of the sampling rate and of c0. When n is small, say 26, source interference may have a big effect on source localization. In this situation, MI can have a better performance than MA. On average, the localization bias is higher when J = 500, 1000 than when J = 3000, 4000. (4) In the two-source case, the thresholding is useful when the noise level in the data is high. The performance of the proposed procedure is not necessarily decreasing in the noise level.

In the field of sensor array processing, people often adopt a shrinkage approach, that is, artificially adding noise to the data in order to improve the SNR mapping (Sekihara and Nagarajan, 2010). In light of this, as in the software FieldTrip, in our implementation, if the smallest eigenvalue of the sample sensor covariance is too small, then we add a small amount of noise (determined by the smallest eigenvalue of the noise covariance matrix) to the data (i.e., adding a small multiple of Ip to the sample sensor covariance) after thresholding.

To see whether the local peaks on the beamforming map can provide certain traces of the source r1, the 2222 grid points were further divided into 10 transverse slices along the z-axis, where the kth transverse slice contains the grid points of the form (rx, ry, k)ᵀ, 0 ≤ k ≤ 10. For each combination of (n, J) and the associated c0 selected by MA, we calculated the local peak of the SAM index and its localization biases to r1 and r2 for each of the ten slices. The results, plotted in Figure 3 and Web Figures B2 and B3 of Web Appendix B in the On-line Supplementary Material, indicate that the average L1 distances from the locations of these local peaks to r1 are larger than 4 for any combination of (n, J). Therefore, the trace of source r1 has been masked on the SAM-based SNR map. We also used the Bayesian principal component decomposition (PCD) of the estimated sensor covariance matrix (Minka, 2000) to determine the effective number of sources. In the above simulation setting, the Bayesian PCD predicted that there was only a single effective source in all 30 datasets.

4.2. Real MEG Data Analysis
We illustrate the proposed method using a human MEG data set provided by Professor Richard Henson from the MRC Cognition and Brain Sciences Unit Volunteer Panel (Henson et al., 2010). The study subject, a healthy young adult, underwent the following face-perception test, which includes two different stimuli (faces and scrambled faces). A central fixation cross (presented for a random duration of 400–600 ms) was followed by a face or a scrambled face (presented for a random duration of 800–1000 ms), followed by a central circle for 1700 ms. As soon as he saw the face or the scrambled face, the subject used either the left or right index finger to report whether he thought it was symmetrical or asymmetrical about a vertical axis through its center.

The MEG data were collected with 102 magnetometers and sampled at a rate of 1100 Hz. The experiment includes six sessions. Here, we analyzed the data set generated in the first session, which includes 96 trials labeled as Face and 50 labeled as Scrambled Face. We first created a grid system of 1 cm resolution using the subject's anatomical magnetic resonance imaging (MRI) data. Then, we applied the neuroimaging software SPM8 to read, preprocess, and epoch the recorded data for the face stimulus and the scrambled face stimulus, respectively. This gives rise to 146 epochs (i.e., trials) of 700 ms (770 time instants), with 200 ms pre-stimulus and 500 ms post-stimulus, where 96 trials were labeled as Face and the remaining 50 as Scrambled Face. To reduce the noise level in the data, we averaged over the Face trials and the Scrambled Face trials, respectively. For each of the two stimuli, we calculated the sample covariance Ĉ from the averaged post-stimulus data and the noise covariance Ĉ0 from the averaged pre-stimulus data. We thresholded Ĉ by c0 σ̂0² √(log(n)/J) with different values of c0 = 0, 0.5, 1, 1.5, 2, ma, and mi, where n = 102, J = 550, and σ̂0² is the minimum diagonal element in Ĉ0. For each value of c0, we then applied the proposed SAM procedure to the face and the scrambled face data, respectively, and calculated the log-contrast between the face stimulus and the scrambled face stimulus, creating a log-contrast map. We sliced the map into 13 transverse layers, in which we calculated the local maximum peaks. The locations of these peaks are described by CTF coordinates; the definition of the brain CTF coordinate system can be found at http://neuroimage.usc.edu/brainstorm/CoordinateSystems. We projected the MEG data along the directions of the optimal weighting vectors at each of the above 13 local peak locations, followed by Savitzky–Golay smoothing (Orfanidis, 1996). Finally, we spatially smoothed the log-contrast values by using the interpolation command in FieldTrip and overlaid the interpolated log-contrast values on the MRI scan of the brain.

For c0 = 0, 0.5, 1, 1.5, 2, ma, and mi, the SAM procedure resulted in the same set of transverse local peaks and the same global peak, with CTF coordinates (−5, 5, 5) cm. Note that ma = 1.5 and mi = 2. The transverse slices of the log-contrast maps along the z-axis are shown in Web Figure C1 of Web Appendix C in the On-line Supplementary Material for c0 = 0.5, 1, 1.5, and 2. This is in agreement with the observation made in the previous simulation studies that the SAM procedure is not sensitive to c0 when there is a single strong source and the number of sensors is 91. In light of the above fact, we focused on c0 = 0 in the remaining analysis. We calculated the log-contrast values at the above 13 local peaks before and after the spatial interpolation, respectively, as shown in Table 1.
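A compact sketch of the two-stimulus log-contrast computation described above is given below, reusing threshold_cov and sam_map from the earlier sketches. It assumes trial-averaged pre- and post-stimulus data matrices for each stimulus have already been produced (the SPM8/FieldTrip preprocessing is not reproduced here), and for brevity it estimates the noise level from a single pre-stimulus average; the argument names are ours.

```python
import numpy as np

def log_contrast_map(face_post, scram_post, pre_stim, leadfields, c0=0.0):
    """Log-contrast of SAM indices (face vs. scrambled face) over the sieve.
    face_post, scram_post: n x J trial-averaged post-stimulus data for the two stimuli;
    pre_stim: n x J0 trial-averaged pre-stimulus data used for the noise level."""
    n, J = face_post.shape
    sigma0_sq = np.diag(np.cov(pre_stim)).min()         # background noise level
    tau = c0 * sigma0_sq * np.sqrt(np.log(n) / J)       # thresholding level
    sam_face = sam_map(threshold_cov(face_post, tau), leadfields)
    sam_scram = sam_map(threshold_cov(scram_post, tau), leadfields)
    return np.log(sam_face / sam_scram)                 # peaks suggest face-selective locations
```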

Table 1
The peak locations and log-contrast values for the 13 local peaks

Peak location (cm)    log-contrast    Interpolated log-contrast    Face-selective region
(−5, 5, 5)            0.6218          0.6151                       Close to STS
(−5, 5, 6)            0.5263          0.5084                       Close to STS
(−6, 4, 7)            0.5201          0.4624                       Close to STS
(−4, −4, 8)           0.3293          0.3078                       N/A
(−5, 5, 4)            0.1243          0.1178                       N/A
(−4, −4, 0)           0.1144          0.0640                       N/A
(−5, −3, 9)           0.0877          0.0928                       N/A
(−4, −2, −1)          0.0797          0.0551                       N/A
(0, −1, 3)            0.0338          −0.0136                      N/A
(−6, −1, 1)           0.0036          −0.0243                      N/A
(−5, 4, 2)            −0.0069         −0.0222                      N/A
(2, 3, 10)            −0.0562         −0.3745                      N/A
(0, −1, 11)           −0.5374         −0.5782                      N/A

By N/A, we mean that the location is not close to any face-selective region.

To group the above log-contrast values, we placed the 13 values in decreasing order. Given 2 ≤ i0 ≤ 11, we put the first i0 ordered values in Group 1 (the source group) and the remaining values in Group 2 (the non-source group). By using the 'elbow' rule in cluster analysis, we chose i0 = 5, since the variance percentage var(Group 1)/(var(Group 1) + var(Group 2)) attained its maximum at i0 = 5. The resulting Group 1 includes the log-contrast values at the locations (−5, 5, 5), (−5, 5, 6), (−6, 4, 7), (−4, −4, 8), and

(−5, 5, 4) cm, while Group 2 includes the values at the locations (−4, −4, 0), (−5, −3, 9), (−4, −2, −1), (0, −1, 3), (−6, −1, 1), (−5, 4, 2), (2, 3, 10), and (0, −1, 11) cm. Unlike the latter 8 local peaks, the former 5 local peaks had relatively larger power in the responses to the face stimulus than to the scrambled face stimulus. We plotted the orthogonal slices of the log-contrast maps at the global peak location (−5, 5, 5) cm and at the local peak location (−4, −4, 8) cm, as shown in Figure 4. The estimated time-courses at the peaks in Group 1 are displayed in Figure 1. The orthogonal slice plots and the estimated time-courses at the other local peaks can be found in Web Figure C2 of Web Appendix C in the On-line Supplementary Material. The Bayesian PCD predicted that there is a single effective source.

Interestingly, Figure 1 reveals that, during the time period 120–300 ms, there were larger evoked responses (i.e., larger M/N170 amplitudes) to the face stimulus than to

the scrambled face stimulus at the locations (−5, 5, 5), (−5, 5, 6), (−6, 4, 7), and (−4, −4, 8) cm, respectively. Note that the M/N170 is a typical pattern of face perception (see Henson et al., 2010). Furthermore, Web Figure C3 of Web Appendix C in the On-line Supplementary Material highlights two putative clusters around (−5, 5, 5) and (−4, −4, 8) cm, where the brain gave larger power responses to the face stimulus than to the scrambled face stimulus. Figure 4 indicates that the former is stronger than the latter. In fact, Table 1 shows that the log-contrast at location (−5, 5, 5) cm was twice that at location (−4, −4, 8) cm. Figure 4 also shows that the location (−5, 5, 5) cm is close to the right superior temporal sulcus (STS) region, which is known to be associated with face perception (Davies-Thompson and Andrews, 2011).

5. Conclusion
In the past decade, the performance of beamformers has been examined by extensive simulations (e.g., Van Veen et al., 1997; Brookes et al., 2008; Quraan et al., 2011). In particular, Quraan et al. (2011) demonstrated that beamformer methods can fail to effectively attenuate the interference of strong sources from outside the region of interest, resulting in leakage. Despite these developments, there is a lack of a general and statistically sound theory which allows one to assess: (1) when multiple sources are detectable, (2) how the spatial and temporal dimensions determine the beamforming accuracy in source localization and estimation, (3) how the sampling rate is related to the number of sensors, and (4) whether using the thresholded sensor covariance can improve the performance of a beamformer. Here, we have developed a general asymptotic theory for addressing the above questions and have illustrated it by simulations. We have proposed a family of beamformers based on covariance thresholding and have shown how to select the thresholding level. The simulations suggest that the proposed thresholding does improve the performance of the standard SAM procedure. The new methodology is further assessed by a real MEG data analysis. The proposed methodology is general and can be used to screen features in a general varying coefficient model.

Figure 4. The interpolated log-contrast map when c0 = 0. The first two columns of plots show the orthogonal slices (through the maximum peak (−5, 5, 5) cm and along the directions of the three axes, respectively) of the log-contrast map. The remaining two columns of plots show the orthogonal slices (through the local peak (−4, −4, 8) cm and along the directions of the three axes, respectively) of the log-contrast map. In each slice, the values of the log-contrast have been thresholded by half of the maximum. This figure appears in color in the electronic version of this article.

However, the details of these studies are beyond the scope of this article and will be presented elsewhere.

6. Supplementary Materials
Web Supplementary Materials, referenced in Sections 2–4, and the Matlab code are available with this paper at the Biometrics website on Wiley Online Library.

Acknowledgements
We thank Professor Richard Henson from MRC, Cambridge, for sharing his MEG data with us, and an Associate Editor and two anonymous reviewers for their constructive comments.

References

Bickel, P. and Levina, E. (2008). Covariance regularization by thresholding. The Annals of Statistics 36, 2577–2604.

Brookes, M. J., Vrba, J., Robinson, S. E., Stevenson, C. M., Peters, A. M., Barnes, G. R., Hillebrand, A., and Morris, P. G. (2008). Optimizing experimental design for MEG beamformer imaging. NeuroImage 39, 1788–1802.

Davies-Thompson, J. and Andrews, T. J. (2011). The localization and functional connectivity of face-selective regions in the human brain. Journal of Vision 11, 647.

Friston, K., Daunizeau, J., Kiebel, S., Phillips, C., Trujillo-Barreto, N., Henson, R., Flandin, G., and Mattout, J. (2008). Multiple sparse priors for the M/EEG inverse problem. NeuroImage 39, 1104–1120.

Goodnight, J. H. (1979). A tutorial on the SWEEP operator. The American Statistician 33, 149–158.

Hamalainen, M., Hari, R., Ilmoniemi, R. J., Knuutila, J., and Lounasmaa, O. V. (1993). Magnetoencephalography: theory, instrumentation, and applications to noninvasive studies of the working human brain. Reviews of Modern Physics 21, 413–460.


Henson, R. N., Flandin, G., Friston, K. J., and Mattout, J. (2010). A parametric empirical Bayesian framework for fMRI-constrained MEG/EEG source reconstruction. Human Brain Mapping 31, 1512–1531.

Minka, T. (2000). Automatic choice of dimensionality for PCA. Technical Report 514, MIT Media Lab, Perceptual Computing Section.

Oostenveld, R., Fries, P., Maris, E., and Schoffelen, J. M. (2011). FieldTrip: Open source software for advanced analysis of MEG, EEG, and invasive electrophysiological data. Computational Intelligence and Neuroscience 2011, 156869.

Orfanidis, S. J. (1996). Introduction to Signal Processing. New York: Prentice-Hall.

Quraan, M. A., Moses, S. N., Hung, Y., Mills, T., and Taylor, M. J. (2011). Detection and localization of hippocampal activity using beamformers with MEG: A detailed investigation using simulations and empirical data. Human Brain Mapping 32, 812–827.

Ramsay, J. O. and Silverman, B. W. (2005). Functional Data Analysis, 2nd edition. New York: Springer.

Robinson, S. and Vrba, J. (1998). Functional neuroimaging by synthetic aperture magnetometry. In: Recent Advances in Biomagnetism, T. Yoshimoto, M. Kotani, S. Kuriki, H. Karibe, and N. Nakasato (eds), 302–305. Sendai, Japan: Tohoku University Press.

Sarvas, J. (1987). Basic mathematical and electromagnetic concepts of the biomagnetic inverse problem. Physics in Medicine and Biology 32, 11–22.

Sekihara, K. and Nagarajan, S. S. (2010). Adaptive Spatial Filters for Electromagnetic Brain Imaging. Berlin: Springer-Verlag.

Van Veen, B. D., Van Drongelen, W., Yuchtman, M., and Suzuki, A. (1997). Localization of brain electrical activity via linearly constrained minimum variance spatial filtering. IEEE Transactions on Biomedical Engineering 44, 867–880.

Received January 2013. Revised September 2013. Accepted October 2013.
