Toward the detection of gravitational waves under non-Gaussian noises II. Independent component analysis.

336

Proc. Jpn. Acad., Ser. B 92 (2016)

[Vol. 92,

Toward the detection of gravitational waves under non-Gaussian noises II. Independent component analysis By Soichiro MORISAKI,*1,*2 Jun’ichi YOKOYAMA,*1,*2,*3,† Kazunari EDA*1,*2 and Yousuke ITOH*1 (Communicated by Yoshihide KOZAI,

M.J.A.)

Abstract: We introduce a new analysis method to deal with stationary non-Gaussian noises in gravitational wave detectors in terms of the independent component analysis. First, we consider the simplest case where the detector outputs are linear combinations of the inputs, consisting of signals and various noises, and show that this method may be helpful to increase the signal-to-noise ratio. Next, we take into account the time delay between the inputs and the outputs. Finally, we extend our method to nonlinearly correlated noises and show that our method can identify the coupling coefficients and remove non-Gaussian noises. Although we focus on gravitational wave data analysis, our methods are applicable to the detection of any signals under non-Gaussian noises. Keywords: gravitational component analysis

waves,

data

1. Introduction

Direct detection of gravitational waves (GWs) was achieved for the first time in 20151) by the advanced Laser Interferometer Gravitational wave Observatory (aLIGO).2) This discovery brought a great impact on science, announcing the onset of gravitational wave astronomy. Following aLIGO, the large-scale cryogenic gravitational wave telescope (LCGT) now known as KAGRA, is being constructed in Kamioka, Japan.3) It is the first detector in deep underground and expected to provide useful knowledges for future detectors. In addition, it will improve determination accuracy of a gravitational wave source direction on the sky together with aLIGO and Virgo, and help to obtain more information from GWs. *1 Research Center for the Early Universe (RESCEU), Graduate School of Science, The University of Tokyo, Tokyo, Japan. *2 Department of Physics, Graduate School of Science, The University of Tokyo, Tokyo, Japan. *3 Kavli Institute for the Physics and Mathematics of the Universe (Kavli IPMU), UTIAS, WPI, The University of Tokyo, Kashiwa, Chiba, Japan. † Correspondence should be addressed: J. Yokoyama, Research Center for the Early Universe (RESCEU), Graduate School of Science, The University of Tokyo, 7-3-1 Hongo, Bunkyoku, Tokyo 113-0033, Japan (e-mail: [email protected]. ac.jp).

doi: 10.2183/pjab.92.336 ©2016 The Japan Academy

analysis,

non-Gaussian

noises,

independent

Because gravity is the weakest force among the four elementary interactions known to the present, GWs have a high penetrating power. Thanks to the smallness of their cross section, GWs are hard to be influenced by interstellar medium during the propagation unlike electromagnetic waves. GWs bring information on deep inside of compact stars such as neutron stars directly to us. By the same token, this property makes its detection very difficult. To extract weak GW signals buried in detector noises, a commonly used method is the matched filtering technique which is based on the maximum likelihood method assuming stationary, Gaussian noises.4) However, non-Gaussian noises frequently appear in actually measured output data. They would hinder the sensitivity of the matched filter, which is known to be susceptible to non-Gaussian noises stemming from instrumental and environmental artifacts, and results in false alarms. One approach to deal with non-Gaussian noises is to appropriately modify the functional form of the likelihood.5),6) In the previous paper, one of us introduced the likelihood function based on the Edgeworth expansion for weak nonGaussian noise and the Gaussian mapping method for strong non-Gaussian noise.7) In this paper, we propose another method to mitigate the effect of non-Gaussian noises using the independent component analysis (ICA)8)–10) (see

No. 8]

Gravitational wave detection under non-Gaussian noises II.

Refs. 11, 12 for textbooks). The ICA has been developed in the context of blind source separation among which the cocktail party problem is wellknown as a representative example. The ICA enables us to decompose output data into statistically independent components on the assumption that there is at most one Gaussian component in the data. Here, we apply the ICA method to data analysis for burstlike GW signals and investigate how well the ICA works to separate stationary non-Gaussian noise from output data. The rest of the paper is organized as follows. In §2.1, we consider the simplest case where the detector outputs are linear combinations of the inputs. Next, the time delay between the inputs and the outputs, which exists in real experiments, is taken into account in §2.2. Finally, we study the applicability of our method to nonlinearly correlated noises in §2.3. The last section §3 is devoted to our conclusion. 2. Independent component analysis (ICA)

As is seen in the previous paper,7) signal detection under non-Gaussian noises is much involved than the case with Gaussian counter parts since the optimal statistic has a more complicated form. Interestingly, however, there have been some proposals to make use of non-Gaussian natures of both signals and noises to separate signals from nonGaussian noises known as the independent component analysis (ICA). Here we consider applicability of this new approach for the detection of GWs in a simple model. Suppose that there exist N statistically independent sources of signals and N outputs and that there is at most one source which follows the Gaussian distribution and all the other sources obey non-Gaussian distributions. The ICA is a method to identify each independent source of signals making use of the non-Gaussianity and statistical independence of sources. So here non-Gaussianity is not an obstacle of the analysis but rather a necessary ingredient. Indeed, if multiple sources obey Gaussian distributions, we cannot distinguish them using ICA even if they are statistically independent. 2.1. The simplest model. Here let us consider a simple problem as a first step of realistic application of ICA to the detection of GWs. To be specific, let us identify two sources of signals s1(t), and s2(t) as a burst-like gravitational wave signal h(t) and non-Gaussian noise k(t) such as seismic noises. That is,

sðtÞ ¼

s1 ðtÞ s2 ðtÞ

337

¼

hðtÞ

kðtÞ

:

½1

In addition to the output from the laser interferometer x1(t), we make use of the output from an environmental monitor such as a seismograph x2(t), and assume that they are linear functions of the signal s(t) as x1 ðtÞ xðtÞ ¼ ¼ AsðtÞ ½2 x2 ðtÞ where A is assumed to be a time independent matrix. By definition, the gravitational wave signal obeys a probability distribution function (PDF) r1 ðh; tÞ ¼ ðh hðt; ÞÞ

½3

where h(t, 3) is the actual waveform of gravitational radiation emitted from some source, say a binary neutron star coalescence, to be observed at the position of a laser interferometer, where 3 collectively denotes parameters of the source. On the other hand, we do not specify the PDF of k(t), r2(s2), except that it is a super-Gaussian distribution such as a Student t-distribution with a larger tail than Gaussian.13) The detector output of a laser interferometer, of course, suffers from Gaussian noises n(t) besides non-Gaussian noise k(t). Hence [2] should actually read nðtÞ xðtÞ ¼ AsðtÞ þ nðtÞ; nðtÞ ¼ : ½4 0 Here we have not incorporated any Gaussian noise to the second line where the signal k(t) itself consists of (non-Gaussian) noises and any Gaussian noise can be incorporated to a part of it. We now introduce a trick to replace the original source s1(t) F h(t) by s1(t) F h(t) D n(t), that is, we regard the Gaussian noise as a part of the original signal. Since n(t) is a Gaussian noise with vanishing mean, its statistical property is entirely characterized by the two-point correlation function K(t ! tB) F E[n(t)n(tB)]. Then the marginal distribution function of s1(t) is given by 1 1 2 r1 ½s1 ðtÞ ¼ pffiffiffiffiffiffi exp 2 ðs1 ðtÞ hðt; ÞÞ ; 2 2 2 ¼ Kð0Þ: ½5 Thus s1(t) now satisfies a simple Gaussian distribution which is much easier to handle with than the delta-function distribution [2], and s(t) and x(t) are related by a simple formula x(t) F As(t). Now our

S. MORISAKI et al.

338

Z

tentative goal is to find the inverse matrix of A whose components are not known precisely. One may set it as a11 a12 A¼ ; ½6 0 a22

¼

½7

such that each component of the transformed variables y is mutually statistically independent. Thanks to the assumption [6], the matrix W also takes a form w11 w12 W¼ : ½8 0 w22 If we knew all the components of A, the matrix W could simply be given by the inverse matrix W F A!1, in which case we would find y F s. However, since we do not know them we attempt to determine W in such a way that the components of y, y1(t) and y2(t) to be statistically independent with each other as much as possible. The mutual independence of statistical variables may be judged by introducing a cost function L(W) which represents a “distance” in the space of statistical distribution functionals. As an example, we adopt the Kullback-Leibler divergence14) defined between two arbitrary PDFs p(y) and q(y) as Z pðyÞ dy D½pðyÞ; qðyÞ ¼ pðyÞ ln qðyÞ pðyÞ ¼ Ep ln ; ½9 qðyÞ where Ep[·] denotes an expectation value with respect to a PDF p. We examine the distance between the real distribution function of statistically independent variables s, r(s) F r1[s1(t)]r2[s2(t)], and a distribution of y, py, constructed from the observed distribution function of x through the linear transformation y F Wx as py ðyÞ kW 1 kpx ðxÞ;

½10

where ||W !1|| denotes the determinant of W !1. The cost function of py(y) from r(s) is given by Lr ðW Þ ¼ D½py ðyÞ; rðyÞ ¼ Epy ½ln py ðyÞ Epy ½ln rðyÞ

kW 1 kpx ðxÞ ln½kW 1 kpx ðxÞdy

Epy ½ln rðyÞ Z ¼ lnkW k þ px ðxÞ ln½px ðxÞdx Epy ½ln rðyÞ

since the gravitational wave is so weak that it will not affect any seismograph. The aim of ICA is to find a linear transformation y ¼ W x;

[Vol. 92,

¼ H½x Epy ½ln kW krðyÞ ¼ H½x Epx ½ln pðx; W Þ;

½11

pðx; W Þ kW krðyÞ;

½12

with

and

Z H½x

px ðxÞ ln½px ðxÞdx:

½13

The PDF of x in the last expression of [11] has W dependence because p(x, W ) is a PDF of x which is made out of the PDF of y (F s in this particular case) through the relation y F Wx. The above formula shows that the matrix W which minimizes the cost function Lr(W ) also maximizes the log-likelihood ratio of x. Since we do not know r(y) a priori, we instead adopt an arbitrary mutually independent distribution q(y) F q1(y1)q2(y2) in the cost function. Defining a PDF consisting of marginal distribution functions Z Z p~ðyÞ py ðy1 ; y2 Þdy2 py ðy1 ; y2 Þdy1 ¼ p~1 ðy1 Þ~ p2 ðy2 Þ;

½14

we find the following relation Lq ðW Þ ¼ D½py ðyÞ; qðyÞ ¼ D½py ðyÞ; p~ðyÞ þ D½~ pðyÞ; qðyÞ

½15

holds. Since the Kullback-Leibler divergence is known to be positive semi-definite, a distribution that minimizes the first term in the right-hand-side yields the desired linear transformation y F Wx for which this term vanishes. In this case the second term gives a discrepancy due to the possible incorrect choice of q(y). In this sense it would be better to choose a realistic trial function q(y) as much as possible. It is known in fact that even for an arbitrary choice of q(y), the correct W gives an extremum of Lq(W ). Hence we solve c@Lq ðW Þ ¼ 0: @wij From

½16

No. 8]


Each component of [22] reads as follows.

Lq ðW Þ ¼ H½x ln kW k Epy ½ln qðyÞ H½x ln kW k Epy ½fðyÞ; fðyÞ ln qðyÞ;

½17 ½18

dW ln kW k ln kW þ dW k ln kW k

1 ¼ 2 E½ðw11 x1 þ w12 x2 hÞðw11 x1 þ w12 x2 Þ ¼ 2 ½w211 hx21 i þ 2w11 w12 hx1 x2 i þ w212 hx22 i ðw11 hhx1 i þ w12 hhx2 iÞ; 0 ¼ E½ðw11 x1 þ w12 x2 hÞw22 x2 ¼ 2 ½w11 w22 hx1 x2 i þ w12 w22 hx22 i w22 hhx2 i; ½26

¼ TrðdW W 1 Þ ¼ ðW 1 Þji dwij ; and @f dwij xj ; @yi

we find

@f dW Lq ðW Þ ¼ Epy ðW 1 Þji xj dwij : ½19 @yi

In order to satisfy [16] the above expectation value should vanish for each index. Multiplying wkj to the argument of the expectation value, we find it equivalent to @f Epy yk ½20 þ ki ¼ 0: @yi

0 ¼ c2 E½y1 tanh y2 ¼ c2 E½ðw11 x1 þ w12 x2 Þ tanh w22 x2 ¼ c2 w11 hx1 tanhðw22 x2 Þi þ c2 w12 hx2 tanhðw22 x2 Þi;

½27

1 ¼ c2 E½y2 tanh y2 ¼ c2 w22 hx2 tanhðw22 x2 Þi:

½28

Because a gravitational wave with a detectable amplitude is a rare event, long-time averages of hx1 and hx2 should vanish. Then from [26] we obtain w12 ¼

so that w11 ¼

½21

hx22 i hx21 ihx22 i hx1 x2 i2

w12 ¼

and ½22

½30

! 12

hx1 x2 i ðhx21 ihx22 i2

1

hx1 x2 ihx22 iÞ 2

:

½31

On the other hand, [27] yields a relation

We determine W so that [22] is satisfied for each component choosing plausible forms of qi(yi). For q1(y1) we take " # 1 ðy1 hðt; ÞÞ2 ; ½23 q1 ðy1 Þ ¼ pffiffiffiffiffiffi exp 22 2 based on [5], so that H1(y1) F (y1 ! h(t, 3))/ #, the sum over t can be approximated by

and hx1 tanhðw22 x2 Þi w11 hx2 tanhðw22 x2 Þi a12 hk tanhðw22 a22 kÞi w11 a22 hk tanhðw22 a22 kÞi a12 ¼ w11 ; ½39 a22 which means these two equations are consistent with each other. We also find that, even from the observational data, we can deduce w11a12 D w12a22 F 0 and y1 variable is indeed free from non-Gaussian noise. The resultant S/N of y1 is simply given by w12 ¼

S h pffiffiffiffiffiffiffiffiffiffiffiffi : N E½n2

½40

Thus, the non-Gaussian noise is effectively removed here.

ts X þT 1

sðt Þei!N ðtÞ t

t¼ts

ts X þT 1

sðtÞei!N t

t¼ts

¼ ~sð!N ; ts Þ:

½44

Thus, when we take a long time-series compared to # for the Fourier expansion, the following equality holds for each Fourier mode. ~ N Þ~sð!N ; ts Þ; ~ ð!N ; ts Þ ¼ Að! x ~ NÞ Að!

1 X

AðÞei!N :

½45

¼0

To derive this relation, we have assumed that A is independent of ts. Then for each Fourier mode, A and W take the following form

No. 8]


a~12 ð!N Þ ; a~22 ð!N Þ ~ 12 ð!N Þ w ~ ð!N Þ ¼ : ½46 W ~ 22 ð!N Þ 0 w Since the normalization of yi is arbitrary, one can ~ 11 ð!N Þ ¼ w ~ 22 ð!N Þ ¼ 1. Calculating the Fourier put w components for various values of ts, for example ts F 0, T, 2T, +, MT (M is an integer.), and following the same argument as in the case of the simplest model, we find ~ NÞ ¼ Að!

a~11 ð!N Þ 0 ~ 11 ð!N Þ w

~ 12 ð!N Þ ¼ w

hx1 ð!N ; ts Þx2 ð!N ; ts Þits ; hx22 ð!N ; ts Þits

~ 11 ð!N Þ ¼ w

hx1 ð!N ; ts Þx2 ð!N ; ts Þits hx22 ð!N ; ts Þits

½47

½48

follows a Gaussian distribution, and we can find the increased S/N for each frequency f F B/2: as in the previous subsection. 2.3. Nonlinear coupling. So far we have considered the linear models [4] and [41]. However, in real gravitational wave experiments, there exist nonlinearly correlated noises.16) In order to investigate the applicability of ICA to nonlinear cases, we consider the following simple nonlinear model as a first step: x1 ðtÞ a b hðtÞ þ nðtÞ ¼ x2 ðtÞ 0 1 kðtÞ c½hðtÞ þ nðtÞkðtÞ þ : ½49 0 Without loss of generality, we can set the covariances of n(t) and k(t) to be unity by redefining a, h(t), b, and x2(t): E½n2 ðtÞ ¼ E½k2 ðtÞ ¼ 1;

½50

and we define n(t) so that a 0:

½51

In addition, we consider the case where |b| is not much larger than a and |c| (see the argument above [67] and [68]). By assumption, the marginal PDF of n(t) is the normal distribution, 1 n2 q1 ðnÞ ¼ pffiffiffiffiffiffi exp ; ½52 2 2 whereas the PDF of k, q2(k), is not known.

Our goal is to remove the non-Gaussian noise k(t). If we know the coefficients, a, b, and c, we can obtain the time series without non-Gaussian noise, h(t) D n(t) by using the transformation hðtÞ þ nðtÞ ¼

x1 ðtÞ bx2 ðtÞ : a þ cx2 ðtÞ

½53

Therefore we estimate the values of a, b and c. When we estimate them, we consider a long-time average of the data. Assuming that the detectable burst gravitational waves rarely come to our detectors, we can neglect them. Therefore we may consider a simplified model x1 ðtÞ a b nðtÞ cnðtÞkðtÞ ¼ þ ½54 x2 ðtÞ 0 1 kðtÞ 0 to estimate the coefficients. To begin with, the coefficient b can be estimated easily by

where hits denotes an average with respect to ts. So the variable ~ 12 ð!N Þ~ y~1 ð!N ; ts Þ ¼ x~1 ð!N ; ts Þ þ w x2 ð!N ; ts Þ

341

best ¼ hx1 ðtÞx2 ðtÞi;

½55

because E½nðtÞkðtÞ ¼ E½nðtÞE½kðtÞ ¼ 0;

½56

E½n ðtÞkðtÞ ¼ E½n ðtÞE½kðtÞ ¼ 0:

½57

2

2

The next targets are a and c. To find their values, we consider a transformation similar to [53], y1 ðtÞ ¼

x1 ðtÞ best x2 ðtÞ ; þ x2 ðtÞ

y2 ðtÞ ¼ x2 ðtÞ

½58

and regard y1(t) as the reconstructed h(t) D n(t). They are very close to each other when , F a and . F c. As the Jacobian of the transformation [58] is given by @ðy1 ; y2 Þ 1 ; J ¼ ¼ ½59 @ðx ; x Þ þ x 1

2

2

the cost function can be expressed as Lð; Þ ¼ D½py ðyÞ; q1 ðy1 Þq2 ðy2 Þ py ðyÞ ¼ Epy ln q1 ðy1 Þq2 ðy2 Þ 2 0

13

6 B C7 J 1 px ðxÞ 6 B C7 ¼ Epx 6lnB C7 4 @ A5 x1 best x2 q1 q2 ðx2 Þ þ x2 " # 1 x1 best x2 2 ¼ Epx 2 þ x2 þ Epx ½ln j þ x2 j þ const:

½60

We minimize it with respect to , and .. The derivatives of the cost function read

S. MORISAKI et al.

342

" # @L ð þ x2 Þ2 ðx1 best x2 Þ2 ¼ Epx ; @ ð þ x2 Þ3 " # @L x2 ð þ x2 Þ2 x2 ðx1 best x2 Þ2 ¼ Epx ; @ ð þ x2 Þ3

[Vol. 92,

½61 ½62

and their learning rules are / @L and @ / @L . However, the cost function is mathemati@ cally ill-defined because of the singularity at , D .x2 F 0. Therefore we must improve these rules. We consider the following modified learning rules, / Epx ½ð þ x2 Þ2 ðx1 best x2 Þ2 ;

½63

2

2

/ Epx ½x2 ð þ x2 Þ x2 ðx1 best x2 Þ ;

½64

which are obtained by removing the denominators of [61] and [62]. From [49] and [50], the rules can be written as / ½2 a2 ðb best Þ2 þ 2 c2 ; 2

½65

/ 2ð acÞ þ ½ðb best Þ þ c ; 2

2

½66

where we have defined 0 by ¼ Epx ½k ðtÞ. Now we assume |b ! best| = a, |c|, which means that we do not consider the case where |b| is too large and the number of the samples is too small for us to extract the information about a and c. In this case, [65] and [66] can be written as 3

/ ½2 þ 2 a2 c2 ;

½67

/ ½2ð acÞ þ ð c Þ; 2

2

½68

respectively. We study these learning rules and show that they tell us where the point (a, c) is. First, we locate their stationary points, which satisfy ", F 0 and ". F 0. The curves defined by ", F 0 and ". F 0 are depicted in Fig. 1. They are symmetrical with respect to the lines ‘’ defined by the following equations, ! 12 1 ð 2 þ 4Þ 2 þ þ ; ½69 ‘ :¼ 1 ð 2 þ 4Þ 2 ! 12 1 2 2 ð þ 4Þ ‘ : ¼ : ½70 1 ð 2 þ 4Þ 2 þ We can easily check that (,, .) F (a, c) satisfies ", F ". F 0. Therefore, the points satisfying ", F ". F 0 are (a, c) and those which are symmetrical to (a, c) with respect to ‘D, ‘! and the origin. However, all these points are not necessarily stable points. In order to locate the convergence point of [67] and [68], we must also study the evolution of (,, .). Focusing on the sign of ", and

Fig. 1. The curves defined by ", F 0 and ". F 0 with a F 1, c F 0.5, 0 F 1. They are symmetrical with respect to ‘D and ‘!. The intersections are the stationary points of [67] and [68]. The convergence point is one of them.

"., we depict the flow schematically in Fig. 2. We refer to the stationary point on the region defined by ! 12 ! 12 1 1 ð 2 þ 4Þ 2 ð 2 þ 4Þ 2 þ

Toward the detection of gravitational waves under non-Gaussian noises I. Locally optimal statistic.

Gravitational Waves from Gravitational Collapse.

Quantum Emulation of Gravitational Waves.

The evaluation of phasemeter prototype performance for the space gravitational waves detection.

Extraction of gravitational waves in numerical relativity.

Telescope captures view of gravitational waves.

Thin Film Helps Detect Gravitational Waves.

Gravitational waves and electrodynamics: new perspectives.

Gravitational waves from periodic three-body systems.

Detection of explosives on the surface of banknotes by Raman hyperspectral imaging and independent component analysis.

Formation of black hole and emission of gravitational waves.

Gravitational effective action at second order in curvature and gravitational waves.

Reflective coating optimization for interferometric detectors of gravitational waves.

Observation of Gravitational Waves from a Binary Black Hole Merger.

Detecting high-frequency gravitational waves with optically levitated sensors.

Astrophysics. India poised to join hunt for gravitational waves.

Quantitative interpretation of mineral hyperspectral images based on principal component analysis and independent component analysis methods.

Independent component analysis of instantaneous power-based fMRI.

Gravitational waves from the sound of a first order phase transition.

Measuring the spin of black holes in binary systems using gravitational waves.

Detection of the gravitational lens magnifying a type Ia supernova.

Parallel group independent component analysis for massive fMRI data sets.

A Bayesian inverse solution using independent component analysis.

A Local Learning Rule for Independent Component Analysis.