S0893-6080(14)00179-8 http://dx.doi.org/10.1016/j.neunet.2014.07.013 NN 3374

To appear in:

Neural Networks

Please cite this article as: Yamaguti, Y., & Tsuda, I. Mathematical modeling for evolution of heterogeneous modules in the brain. Neural Networks (2014), http://dx.doi.org/10.1016/j.neunet.2014.07.013


Mathematical Modeling for Evolution of Heterogeneous Modules in the Brain

Yutaka Yamaguti^{a,b,*}, Ichiro Tsuda^{a,b}

^a Research Institute for Electronic Science, Hokkaido University, Kita-12, Nishi-7, Kita-ku, Sapporo, Hokkaido 060-0812, Japan
^b Research Center for Integrative Mathematics, Hokkaido University, Kita-12, Nishi-7, Kita-ku, Sapporo, Hokkaido 060-0812, Japan

Abstract

Modular architecture has been found in most cortical areas of mammalian brains, but little is known about its evolutionary origin. Several researchers have proposed that maximizing information transmission among subsystems can be used as a principle for understanding the development of complex brain networks. In this paper, we study how heterogeneous modules develop in coupled-map networks via a genetic algorithm, where selection is based on maximizing bidirectional information transmission. Two functionally differentiated modules evolved from two homogeneous systems with random couplings, a process associated with symmetry breaking of intrasystem and intersystem couplings. By exploring the parameter space of the network around the optimal parameter values, it was found that the optimum network exists near transition points, at which the incoherent state loses its stability and an extremely slow oscillatory motion emerges.

Keywords: modularity, coupled oscillator, information flow, transfer entropy, evolution, symmetry breaking

* Corresponding author. Tel.: +81 11 706 2593; Email address: [email protected] (Yutaka Yamaguti)

Preprint submitted to Neural Networks, June 16, 2014

1. Introduction

Modular architecture is one of the most frequently referenced concepts of neural organization (Szentágothai, 1983; Felleman & Van Essen, 1991; Mountcastle, 1997). In biological systems, such modular architecture is self-organized in ontogeny, influenced by phylogeny (Wagner et al., 2007). In contrast, the modules of artificial machines, and their interactions, are designed by humans. Understanding the mechanism that gives rise to functionally differentiated modules in biological systems, such as cortical module architectures, is therefore of great interest. In theoretical studies, it has been proposed that maximizing information transmission between subsystems can be used as a guiding principle for understanding the development and evolution of complex brain networks. Linsker (1988, 1989, 1997) showed that information transmission between successive layers of feed-forward networks is a viable principle for designing functional neural networks. Recently, Tanaka et al. (2009) showed that a learning algorithm that maximizes information retention in recurrent networks also gives rise to biological structures, such as cell assemblies, and even to dynamics, such as the spontaneous activity of synfire chains and critical neuronal avalanches. One of the present authors has proposed a new self-organization principle based on a variational principle (Tsuda, 1984, 2001; Kaneko & Tsuda, 2001): components (or elements) are self-organized according to constraints that act on the whole system, whereas under the usual self-organization principles (e.g., Nicolis & Prigogine, 1977; Haken, 1980), macroscopic order is self-organized via interactions among microscopic elements. Indeed, neuron-like elements have been obtained as components of a system under the condition of maximum transmission of information across a whole system (Ito & Tsuda, 2007; Watanabe et al., 2014).

In the brain, bidirectional information transmission among modules is crucial for information processing. In fact, most connections between cortical modules are known to be bidirectional (Felleman & Van Essen, 1991). Yamaguti et al. (2014) found intermittent switching of the direction of information flow in two heterogeneously coupled chaotic systems. Such a mechanism may be applicable to bidirectional information transmission in neural systems. Motivated by these studies, here we propose a mathematical model for functional differentiation induced by selection based on maximizing bidirectional information transmission. We try to extract the essence of evolutionary dynamics by investigating a coupled-oscillator network model. We construct randomly coupled oscillator networks, each consisting of two sub-networks, by using a discretized version of the Kuramoto phase oscillator model (Kuramoto, 1984). Each sub-network consists of N oscillators, whose states are represented by phase variables. A genetic algorithm is used to modify the parameters in the direction of better fitness. Transfer entropy (Schreiber, 2000), which measures directed information transfer, is estimated to quantify information flow between the two sub-networks in both directions, and the product of the two values is regarded as fitness in the evolutionary process.

In Section 2, the proposed network model, the analytic method, and the procedure for evolution by a genetic algorithm are described. In Section 3, we


give numerical results of this evolutionary process. In Section 4, we study the fitness landscape around the optimum network, exploring the parameter space as a means of characterizing the evolutionary process from the viewpoint of dynamical systems. In Section 5, the dependence of heterogeneity on the coupling density is investigated. Section 6 is devoted to a summary and discussion.

2. Models and methods

2.1. Network model

We consider a network of oscillators consisting of two sub-networks. Each sub-network consists of N = 200 phase oscillators, whose dynamics are described by a discrete-time version of the Kuramoto model (Kuramoto, 1984; Daido, 1986; Barlev et al., 2010). It is known that the dynamics of weakly coupled oscillators possessing stable limit cycles can be reduced to coupled equations of phase oscillators by a reduction technique (Kuramoto, 1984). The Kuramoto model corresponds to the simplest case, in which the coupling term is represented by the lowest-order term of its Fourier series. Therefore, it is reasonable to expect that the results presented here will be robust, even for other coupled-oscillator systems.

The numbers of couplings between oscillators in the networks are determined by three parameters, p, q, and r. Roughly, p is the mean probability of a coupling between two oscillators in the whole network, q is the proportion of internetwork couplings among all couplings, and r is the proportion of internetwork couplings from system 1 to 2 among all internetwork couplings (see Fig. 1). The coupling probability from an oscillator to another oscillator

in the same sub-network (intranetwork coupling) is given by 2p(1 − q), and the coupling probability from an oscillator in sub-network 1 (resp. 2) to an oscillator in sub-network 2 (resp. 1) (internetwork coupling) is given by 4pqr (resp. 4pq(1 − r)). The existence of a coupling between any two oscillators is independent of the other couplings.

[Fig. 1 about here.]

The dynamics of the k-th oscillator in sub-network i (i = 1, 2; k = 1, ..., N) is described by

    \theta_{t+1}^{(i,k)} = \theta_t^{(i,k)} + \omega + \frac{\alpha}{2Np} \sum_{(j,l) \in G^{(i,k)}} \sin\!\left( \theta_t^{(j,l)} - \theta_t^{(i,k)} - \psi_{kl}^{ji} \right) + \beta_t^{(i,k)},    (1)

where ω is a natural frequency, α is a coupling strength, \beta_t^{(i,k)} denotes Gaussian noise imposed independently on each oscillator, and G^{(i,k)} denotes the set of labels (j, l) of the oscillators connecting to oscillator (i, k). In numerical simulations, ω and α are fixed to 1 and 0.1, respectively, and the standard deviation of \beta_t^{(i,k)} is set to 0.05.
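As a concrete illustration of the construction just described, the sketch below (our own minimal Python, not the authors' code; all names are illustrative) draws a random coupling structure with the block-wise probabilities above and performs one update of Eq. (1); the complex order parameter defined later in this section is included for convenience.

```python
import numpy as np

def make_network(N=200, p=0.05, q=0.5, r=0.5, rng=None):
    """Random directed couplings for two sub-networks of N oscillators each.

    A[k, l] = 1 if oscillator l couples into oscillator k.  Block-wise
    coupling probabilities follow Section 2.1: 2p(1-q) within a sub-network,
    4pqr from sub-network 1 into 2, and 4pq(1-r) from 2 into 1.
    """
    rng = rng or np.random.default_rng()
    prob = {(0, 0): 2*p*(1-q), (1, 1): 2*p*(1-q),   # (source, target) blocks
            (0, 1): 4*p*q*r,   (1, 0): 4*p*q*(1-r)}
    A = np.zeros((2*N, 2*N))
    for (src, tgt), pr in prob.items():
        A[tgt*N:(tgt+1)*N, src*N:(src+1)*N] = rng.random((N, N)) < pr
    np.fill_diagonal(A, 0)                           # no self-coupling
    return A

def step(theta, A, psi, p, omega=1.0, alpha=0.1, sigma=0.05, rng=None):
    """One update of Eq. (1) for all 2N oscillators at once."""
    rng = rng or np.random.default_rng()
    N = theta.size // 2
    diff = theta[None, :] - theta[:, None] - psi     # theta_l - theta_k - psi_kl
    coupling = alpha / (2*N*p) * (A * np.sin(diff)).sum(axis=1)
    return theta + omega + coupling + rng.normal(0.0, sigma, theta.size)

def order_parameter(theta):
    """Complex order parameter R = (1/N) sum_k exp(sqrt(-1) theta_k)."""
    return np.exp(1j * theta).mean()
```

Here `psi` is a (2N, 2N) array of phase offsets, each entry 0 or π; drawing its entries block-wise with the probabilities p_m^{(ji)} introduced next reproduces the full model.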

For each \psi_{kl}^{ji}, one of two possible values, (m − 1)π with m = 1 or 2, is randomly assigned according to given probabilities p_m^{(ji)}, which satisfy the probability conditions p_m^{(ji)} ≥ 0 and p_1^{(ji)} + p_2^{(ji)} = 1. In the case of a two-oscillator system, these two values correspond to phase locking, with 0 as "in-phase" and π as "anti-phase."

To characterize the macroscopic states of the sub-networks, we define a complex order parameter (Landau & Ginzburg, 1950; Kuramoto, 1984) R_t^{(i)} for each sub-network as follows:

    R_t^{(i)} = \frac{1}{N} \sum_{k=1}^{N} \exp\!\left( \sqrt{-1}\, \theta_t^{(i,k)} \right).

Its magnitude |R_t^{(i)}| can be used as a measure of phase coherence within the sub-network, and its angle \Theta_t^{(i)} = \arg R_t^{(i)} represents a mean phase of each sub-network. The phase difference \Phi_t = \Theta_t^{(2)} - \Theta_t^{(1)} may be considered another order parameter, which conveys information on phase modulations.

2.2. Information flow as an index of optimization

We used transfer entropy (TE), an information-theoretic measure, to quantify directed information transfer from one sub-network to another (Schreiber, 2000; Kaiser & Schreiber, 2002; Paluš & Vejmelka, 2007; Vicente et al., 2011; Yamaguti et al., 2014). Let X = {X_t} and Y = {Y_t} (t = 0, 1, ...) be two discrete random processes with states x_t and y_t selected from a countable alphabet A. Let x_t^{(m)} = (x_t, x_{t-\tau}, ..., x_{t-(m-1)\tau}) and y_t^{(l)} = (y_t, y_{t-\tau}, ..., y_{t-(l-1)\tau}) be the delay-coordinate vectors with time step τ. We write probabilities and conditional probabilities as p(·) and p(·|·), respectively. If the future state of Y, denoted by y_{t+\tau}, depends on y_t^{(l)} but not on x_t^{(m)}, then the generalized Markov property

    p\!\left( y_{t+\tau} \mid y_t^{(l)}, x_t^{(m)} \right) = p\!\left( y_{t+\tau} \mid y_t^{(l)} \right)    (2)

holds. If there is any dependence on x_t^{(m)}, it can be quantified by the transfer entropy (TE), which is defined as the Kullback–Leibler divergence between the two probability distributions on each side of Eq. (2):

    T_{X \to Y}(\tau) \equiv \sum_{y_{t+\tau},\, y_t^{(l)},\, x_t^{(m)}} p\!\left( y_{t+\tau}, y_t^{(l)}, x_t^{(m)} \right) \log \frac{p\!\left( y_{t+\tau} \mid y_t^{(l)}, x_t^{(m)} \right)}{p\!\left( y_{t+\tau} \mid y_t^{(l)} \right)},    (3)

where the sum runs over all possible combinations of symbols. In this way, TE quantifies a directional dependence between two time series. TE can also be represented by using conditional mutual information (Matsumoto & Tsuda, 1988; Paluš & Vejmelka, 2007):

    T_{X \to Y}(\tau) = I\!\left( y_{t+\tau};\, x_t^{(m)} \mid y_t^{(l)} \right) \equiv \sum_{y_t^{(l)}} p\!\left( y_t^{(l)} \right) \sum_{y_{t+\tau},\, x_t^{(m)}} p\!\left( y_{t+\tau}, x_t^{(m)} \mid y_t^{(l)} \right) \log \frac{p\!\left( y_{t+\tau}, x_t^{(m)} \mid y_t^{(l)} \right)}{p\!\left( y_{t+\tau} \mid y_t^{(l)} \right) p\!\left( x_t^{(m)} \mid y_t^{(l)} \right)}.

When the two processes cannot be assumed to be Markov processes, TE is regarded as only an approximate measure of information transfer. Hereafter we treat the case l = m = 1. TE can be regarded as a nonlinear extension of Granger causality (Granger, 1969), which is often used in neuroscience (for example, Brovelli et al., 2004; Friston et al., 2013). Indeed, these two measures have been proved essentially equivalent in the case of Gaussian processes (Barnett, 2009). An advantage of TE is that it is a model-free and inherently nonlinear method, since its formulation is based on information theory. A comparison between TE and Granger causality, with applications to neuroscience data, has been given by Vicente et al. (2011).

After each numerical simulation of the dynamics given by Eq. (1), the TEs between the mean phases of the sub-networks, T_{\Theta^{(1)} \to \Theta^{(2)}}(\tau) and T_{\Theta^{(2)} \to \Theta^{(1)}}(\tau), were calculated in each direction, and their maxima \hat{T}_{1 \to 2} = \max_\tau T_{\Theta^{(1)} \to \Theta^{(2)}}(\tau) and \hat{T}_{2 \to 1} = \max_\tau T_{\Theta^{(2)} \to \Theta^{(1)}}(\tau) were evaluated. Finally, the product of these values,

    \hat{T}_{1 \to 2}\, \hat{T}_{2 \to 1},    (4)

was used as the fitness function of the network.

Here, we describe the procedure for estimating TE from the time series produced by the present numerical simulations. First, the phase variables are discretized into N_b = 32 bins of uniform size. We let

    \mu_{ijk}^{(1,2)} \equiv \Pr\!\left\{ \Theta_{t+\tau}^{(2)} \in \Delta_i,\; \Theta_t^{(1)} \in \Delta_j,\; \Theta_t^{(2)} \in \Delta_k \right\}

be the joint probability that \Theta_{t+\tau}^{(2)} is in the i-th bin, \Theta_t^{(1)} is in the j-th bin, and \Theta_t^{(2)} is in the k-th bin, where \Delta_i denotes the range of the i-th bin (i, j, k = 0, ..., N_b − 1). Additionally, we use the notation \mu_{jk}^{(1,2)} \equiv \Pr\{\Theta_t^{(1)} \in \Delta_j, \Theta_t^{(2)} \in \Delta_k\} for the joint probability of \Theta_t^{(1)} and \Theta_t^{(2)}, \mu_{i|jk}^{(1,2)} \equiv \mu_{ijk}^{(1,2)} / \mu_{jk}^{(1,2)} for the conditional probability, and \mu_{i|k}^{(2)} \equiv \mu_{ik}^{(2)} / \mu_k^{(2)} \equiv \Pr\{\Theta_{t+\tau}^{(2)} \in \Delta_i, \Theta_t^{(2)} \in \Delta_k\} / \Pr\{\Theta_t^{(2)} \in \Delta_k\} for the analogously defined conditional probability. Then, TE from \Theta^{(1)} to \Theta^{(2)} with time step τ is defined as a form of Kullback–Leibler divergence:

    T_{\Theta^{(1)} \to \Theta^{(2)}}(\tau) \equiv \sum_{i,j,k} \mu_{ijk}^{(1,2)} \log_2 \frac{\mu_{i|jk}^{(1,2)}}{\mu_{i|k}^{(2)}}.    (5)
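For illustration, Eq. (5) can be estimated directly from binned phase series with a plug-in histogram estimator; the sketch below (our own code, with illustrative names, not the authors' implementation) also evaluates the fitness of Eq. (4) as the product of the two directional maxima over τ. The optional threshold `m_min` anticipates the Kaiser & Schreiber (2002) bias correction discussed below.

```python
import numpy as np

def te_phases(theta_src, theta_tgt, tau, n_bins=32, m_min=0):
    """Plug-in estimate of Eq. (5): TE from theta_src to theta_tgt at lag tau.

    Phases are discretized into n_bins uniform bins; cells with fewer than
    m_min counts are dropped (Kaiser & Schreiber, 2002), biasing toward 0.
    """
    to_bin = lambda x: ((np.asarray(x) % (2*np.pi)) / (2*np.pi)
                        * n_bins).astype(int) % n_bins
    i = to_bin(theta_tgt[tau:])       # future of the target phase
    j = to_bin(theta_src[:-tau])      # present of the source phase
    k = to_bin(theta_tgt[:-tau])      # present of the target phase
    c = np.zeros((n_bins,)*3)
    np.add.at(c, (i, j, k), 1.0)      # raw counts c_ijk
    mu = c / c.sum()                  # mu_ijk
    mu_jk = mu.sum(axis=0)            # marginal over the future
    mu_ik = mu.sum(axis=1)            # marginal over the source
    mu_k = mu_jk.sum(axis=0)          # marginal of the target's present
    te = 0.0
    for a, b, d in zip(*np.nonzero(c >= max(m_min, 1))):
        # mu_ijk * log2( mu_{i|jk} / mu_{i|k} )
        te += mu[a, b, d] * np.log2(mu[a, b, d] * mu_k[d]
                                    / (mu_jk[b, d] * mu_ik[a, d]))
    return te

def fitness(theta1, theta2, taus=range(1, 21), **kw):
    """Product of the directional TE maxima, as in Eq. (4)."""
    t12 = max(te_phases(theta1, theta2, t, **kw) for t in taus)
    t21 = max(te_phases(theta2, theta1, t, **kw) for t in taus)
    return t12 * t21
```

With a driven pair (the target copying the source with one-step delay), the estimate approaches log2 of the bin count in the driving direction and stays near 0 in the reverse direction.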

The joint probabilities were estimated from the empirical distributions obtained from the numerical simulations. By the rotational symmetry of the present model, we have \mu_{i|jk}^{(1,2)} = \mu_{[i+l]|[j+l][k+l]}^{(1,2)} for all (i, j, k) and l (= 1, ..., N_b − 1), where [·] denotes the modulo-N_b operation. Furthermore, we can assume that the stochastic process defined by the state transitions between the discretized states of (\Theta_t^{(1)}, \Theta_t^{(2)}) has at least one ergodic distribution. Under these assumptions, by a straightforward argument, we can write \mu_{jk}^{(1,2)} = \mu_{[j+l][k+l]}^{(1,2)} and, therefore, \mu_{ijk}^{(1,2)} = \mu_{[i+l][j+l][k+l]}^{(1,2)} = \mu_{i'j'0}^{(1,2)} for all (i, j, k) and l, where i' ≡ i − k (mod N_b) and j' ≡ j − k (mod N_b), and \mu_k^{(2)} = \sum_j \mu_{jk}^{(1,2)} = 1/N_b, which we have confirmed numerically in the simulations. By using these equations, we can rewrite Eq. (5) as

    T_{\Theta^{(1)} \to \Theta^{(2)}}(\tau) = \sum_{i,j} \mu_{ij|0}^{(1,2)} \log_2 \frac{\mu_{i|j0}^{(1,2)}}{\mu_{i|0}^{(2)}}.    (6)

The probabilities were estimated by the empirical frequencies, \mu_{ij|0}^{(1,2)} \approx c_{ij|0}/L = \sum_{k=0}^{N_b-1} c_{i''j''k}/L, where i'' ≡ i + k (mod N_b), j'' ≡ j + k (mod N_b), c_{i''j''k} is the count of events such that \Theta_{t+\tau}^{(2)} \in \Delta_{i''}, \Theta_t^{(1)} \in \Delta_{j''}, \Theta_t^{(2)} \in \Delta_k, and L is the sample size. By using Eq. (6) instead of Eq. (5), the number of bins to be counted is reduced from N_b^3 to N_b^2, so that a robust estimation of the TE can be made. To further reduce the bias in the estimation of TE, we adopted a method proposed by Kaiser & Schreiber (2002), in which all terms in the sum in Eq. (6) whose corresponding c_{ij|0} values are less than a given threshold m_min are set to 0. This means that for bins having fewer than m_min points, we assume that the probabilities factorize; this results in a bias toward 0 for small samples. In the present study, we set m_min = 10, for which we obtained stable results. In the numerical simulations, after discarding transient trajectories (10^4 steps), a time series of length L = 5.0 × 10^4 steps was used for calculating TE. This sample size is sufficiently large compared with the time scale of the internal dynamics (10^2 ∼ 10^3 steps; see Fig. 4A), so that a whole measure on the relevant set can be estimated.

2.3. Evolutionary algorithms

The evolutionary algorithm adopted here is essentially a standard genetic algorithm (Holland, 1975), used to develop the networks by gradually improving the set of parameter values in order to find the network that

maximizes fitness, here given by Eq. (4). During the evolutionary process, the value of p was fixed, while the set of parameters q, r, and p_m^{(ij)} (m = 1, 2; i, j = 1, 2) was considered as a genome to be evolved. We call each parameter a gene. The initial population was set at 48 genomes. The parameter values in all initial genomes were given as follows: q = r = 0.5 and p_m^{(ij)} = 0.5 for all (ij) and m, so that the probability of a coupling between any two oscillators does not depend on their sub-networks; thus, the evolution starts from homogeneous networks.

For each generation of evolution, networks were generated from the genomes, and numerical simulations of the dynamics were performed. Then, the eight genomes with the highest fitness were selected as parents (the elite strategy), and their exact copies were carried over to the next generation. It should be noted that the parameter values of q, r, and p_m^{(ij)} in the selected networks were passed to the next generation, while the connection patterns of each network were not preserved: that is, connection patterns were randomly regenerated in each generation. Furthermore, 32 and 8 children were made by mutation and crossover, respectively, according to the following procedures. Four mutants were made from each selected parent by adding to the parameter values small random numbers drawn from independent Gaussian distributions N(0, 0.02). Eight child genomes were made by crossover, such that, for each child genome, 1) two parent genomes were randomly chosen from the 8 selected parents with equal probability, and 2) each gene of the child genome was randomly selected from the two corresponding genes of the parents. If the value of q or r in a child genome became negative, it was fixed at 0; if it became larger than 1, it was fixed at 1. The values of p_m^{(ij)} were also cut off and normalized to maintain the probability conditions p_m^{(ij)} ≥ 0 and p_1^{(ij)} + p_2^{(ij)} = 1. These procedures were repeated for up to 1000 generations. The results of the evolution were not sensitive to the population size or to the number of genomes surviving each generation.

The model network has several symmetries. For example, the network is symmetric with respect to the exchange of the sub-network indices. To reduce the symmetry-induced multiplicity of goals, we adopted two conditions during evolution: i) r ≥ 0.5 and ii) p_1^{(12)} > p_2^{(12)}. Whenever a mutation violated condition i), the indices of the networks were exchanged by a structure-preserving change of parameters. Whenever a mutation violated condition ii), the parameters p_1^{(12)} ↔ p_2^{(12)} and p_1^{(21)} ↔ p_2^{(21)} were swapped to correct the violation.

3. Numerical results of the evolutionary process

3.1. Evolution of the network

We numerically simulated network evolution. In this section, the results for p = 0.05 are shown. By the end of the simulation (1000 generations), the mean fitness of the selected networks was almost saturated (Fig. 2A). In all simulations, the genome populations were distributed around their mean values in a single cluster; split populations were not observed at the end of the evolution. The parameter values of the network at the end of the evolution and its schematic diagram are summarized in Fig. 2. The evolutionary paths of the parameters mentioned above are shown in Fig. 2B.

In sub-network 1, in-phase couplings were dominant among the intranetwork couplings, but anti-phase couplings remained within the sub-network. In sub-network 2, only in-phase couplings survived. For the internetwork couplings, almost all couplings from sub-network 1 to sub-network 2 stabilized as in-phase couplings. In contrast, most of those in the opposite direction became anti-phase couplings. Furthermore, the number of couplings from sub-network 1 to sub-network 2 was greater than that in the opposite direction. In summary, the evolutionary process treated here enhanced the differentiation of i) intranetwork and internetwork couplings, ii) the intranetwork couplings of the two sub-networks, iii) coupling types, and iv) the numbers of internetwork couplings in opposite directions (Fig. 3). The emergence of heterogeneity indicates that the symmetry of the couplings between the two sub-networks was broken and structural differentiation developed via the evolutionary process. Because neither the external inputs nor the fitness function is biased so as to affect the two sub-networks differently, this differentiation occurred spontaneously.

[Fig. 2 about here.]

[Fig. 3 about here.]

3.2. Dynamics of evolved networks

The dynamics of the phase coherences |R_t^{(i)}| and the internetwork phase difference \Phi_t of the evolved network are shown in Fig. 4. Both the phase difference and the intranetwork coherences of the two sub-networks exhibited chaotic oscillations. The slow oscillations of the phase coherences of the two sub-networks showed a mutually anti-phase relation. Phase-locked to these, the phase difference oscillates slowly between the in-phase and anti-phase states. Reflecting the differentiation between the intranetwork couplings, the mean magnitudes of the phase coherences were also differentiated.
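Looking back at the procedure of Section 2.3, one generation of the genetic algorithm can be sketched as follows (our own minimal illustration: the genome layout and names are ours, fitness evaluation is external, and the symmetry conditions i) and ii) are omitted):

```python
import numpy as np

N_PARENTS, N_MUTANTS, N_CROSS = 8, 32, 8        # 48 genomes per generation

def clip_genome(g):
    """Keep every gene inside [0, 1] (q, r, and the in-phase proportions)."""
    return {k: np.clip(v, 0.0, 1.0) for k, v in g.items()}

def next_generation(population, fitness_values, rng):
    """One generation of Section 2.3: elitism, Gaussian mutation, crossover.

    A genome is a dict of genes; here 'q' and 'r' are scalars and 'p1' is a
    2x2 array of in-phase proportions p_1^(ji) (p_2 = 1 - p_1 implicitly).
    """
    order = np.argsort(fitness_values)[::-1]
    parents = [population[i] for i in order[:N_PARENTS]]
    children = [dict(g) for g in parents]                    # elite copies
    for g in parents:                                        # 4 mutants each
        for _ in range(N_MUTANTS // N_PARENTS):
            children.append(clip_genome(
                {k: v + rng.normal(0.0, 0.02, np.shape(v))
                 for k, v in g.items()}))
    for _ in range(N_CROSS):                                 # uniform crossover
        a, b = rng.choice(N_PARENTS, size=2, replace=False)
        children.append(clip_genome(
            {k: (parents[a][k] if rng.random() < 0.5 else parents[b][k])
             for k in parents[0]}))
    return children
```

In the paper's setting, the connection pattern itself is redrawn from the genome's probabilities every generation, so only the parameters above are inherited.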

[Fig. 4 about here.]

4. Fitness landscape around the optimized network

In this section, we explore the fitness landscape around the network with the optimal parameters found in the previous section. We investigate the landscapes of two-dimensional spaces spanned by two control parameters, fixing the other parameters in the vicinity of their optimal values.

Figure 5A shows the fitness landscape of the w^{(11)}-w^{(22)} parameter space, where w^{(ij)} = p_1^{(ij)} − p_2^{(ij)}. The other parameter values are fixed as follows: q = 0.6, r = 0.7, w^{(12)} = 1.0, and w^{(21)} = −1.0. Figure 5B shows the means and standard deviations of the phase coherences over time, which indicate that a transition of the dynamics occurs at the boundary given by w^{(11)} + w^{(22)} ≈ 1.4. In the upper-right region of this border, chaotic oscillations of the order parameters were observed, while no coherent motion was observed in the lower-left region. As indicated in Fig. 5 by an asterisk, the optimum network exists near the transition border, where the incoherent state loses its stability and oscillatory behavior emerges.

Figure 6 shows the landscapes of different cutting sections, (A) the w^{(11)}-r space and (B) the w_asym-r space, where w_asym = (w^{(11)} − w^{(22)})/2. In Fig. 6A, the other parameters were fixed as q = 0.6, w^{(12)} = 1.0, w^{(21)} = −1.0, and w^{(22)} = −1.0. In Fig. 6B, q = 0.6, w^{(11)} = 0.7 + w_asym, w^{(12)} = 1.0, w^{(21)} = −1.0, and w^{(22)} = 0.7 − w_asym. The center of Fig. 6B, where r = 0.5 and w_asym = 0.0, corresponds to a symmetric network in which sub-network 1 and sub-network 2 are essentially identical. These landscapes clearly indicate that a symmetric network is not optimal: the optimal networks

are heterogeneous.

[Fig. 5 about here.]

[Fig. 6 about here.]

5. Dependence on structural inhomogeneity

How does the emergence of heterogeneous modules depend on the coupling density? In this section, we investigate the dependence of the emergence of heterogeneous modules on the sparseness of the couplings.

5.1. Mean field approximation

First, we perform a mean field approximation for the model. In the limit N → ∞, we can replace the coupling term in Eq. (1) with a mean field term:

    \theta_{t+1}^{(i,k)} = \theta_t^{(i,k)} + \omega + \frac{\alpha}{2} \mathrm{Im}\!\left[ e^{-\sqrt{-1}\,\theta_t^{(i,k)}} Z_t^{(i)} \right] + \beta_t^{(i,k)},    (7)

where

    Z_t^{(i)} = \sum_{j=1}^{2} s^{(ji)} w^{(ji)} R_t^{(j)},

with s^{(11)} = s^{(22)} = 2(1 − q), s^{(21)} = 4q(1 − r), and s^{(12)} = 4qr. The derivation of this equation is described in Appendix A.

5.2. Heterogeneity and sparseness of the network

In Section 3, four kinds of differentiation were observed in the evolved networks, three of which (ii), iii), and iv)) were related to differentiation


between the two sub-networks. To quantify these types of heterogeneity, we define three heterogeneity indices:

    h_{intra} = s^{(22)} w^{(22)} - s^{(11)} w^{(11)},
    h_{inter1} = s^{(12)} w^{(12)} - s^{(21)} w^{(21)},
    h_{inter2} = \left| s^{(12)} w^{(12)} \right| - \left| s^{(21)} w^{(21)} \right|,

where h_{intra} measures the degree of differentiation of the intranetwork couplings, and h_{inter1} and h_{inter2} measure the differentiation of the internetwork couplings. In particular, h_{inter2} quantifies only the "effective" density of the internetwork couplings. Here, we call |s^{(12)} w^{(12)}| an "effective" density because in-phase and anti-phase couplings act oppositely in the macroscopic dynamics, and hence their difference w^{(ij)} is what matters.

We changed the density p of the whole network and repeated the numerical experiments of evolution for each value of p. In addition, we also performed evolution using the mean field model, Eq. (7). Networks and dynamics qualitatively similar to those shown in Figs. 3 and 4 were obtained for all p values, and also for the mean field model. However, the heterogeneity of the evolved networks depended on the sparseness parameter p (Fig. 7). Although h_{inter1} remained high in all cases, h_{intra} and h_{inter2} decreased as the coupling density was increased. The parameter values and the values of these indices in the mean field model were similar to those obtained for networks with dense (p ≥ 0.5) couplings. This suggests that the structural inhomogeneity owing to sparse coupling in finite-size networks is crucial for the development of differentiated modules. A theoretical analysis of the finite-size effect (Pikovsky & Ruffo, 1999) remains a subject for future study.
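The mean-field update of Eq. (7) and the heterogeneity indices defined above can be sketched as follows (our own illustrative code; storing s^{(ji)} and w^{(ji)} as 2×2 arrays indexed (source, target) is our convention, not the paper's notation):

```python
import numpy as np

def s_matrix(q, r):
    """Relative coupling densities s^(ji), indexed (source j, target i)."""
    return np.array([[2*(1-q), 4*q*r],
                     [4*q*(1-r), 2*(1-q)]])

def mean_field_step(thetas, w, q, r, omega=1.0, alpha=0.1, sigma=0.05,
                    rng=None):
    """One step of Eq. (7) for the pair of sub-networks.

    thetas = (theta1, theta2); w is a 2x2 array of w^(ji) = p_1^(ji) - p_2^(ji).
    """
    rng = rng or np.random.default_rng()
    s = s_matrix(q, r)
    R = [np.exp(1j*th).mean() for th in thetas]
    out = []
    for i, th in enumerate(thetas):
        Z = sum(s[j, i] * w[j, i] * R[j] for j in (0, 1))   # mean field Z^(i)
        drive = (alpha/2) * np.imag(np.exp(-1j*th) * Z)
        out.append(th + omega + drive + rng.normal(0.0, sigma, th.size))
    return out

def heterogeneity_indices(w, q, r):
    """h_intra, h_inter1, h_inter2 of Section 5.2."""
    s = s_matrix(q, r)
    h_intra = s[1, 1]*w[1, 1] - s[0, 0]*w[0, 0]
    h_inter1 = s[0, 1]*w[0, 1] - s[1, 0]*w[1, 0]
    h_inter2 = abs(s[0, 1]*w[0, 1]) - abs(s[1, 0]*w[1, 0])
    return h_intra, h_inter1, h_inter2
```

With all couplings in-phase (w ≡ 1) and no noise, the mean-field map synchronizes both sub-networks, while a symmetric parameter set makes all three indices vanish.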

[Fig. 7 about here.]

6. Discussion

The development of two interacting sub-networks by an evolutionary algorithm, under the constraint of maximizing bidirectional information transmission, was studied, and the emergence of differentiated modules was demonstrated. By exploring the parameter space of the network around the optimum set of parameter values, it was found that the optimal networks exist near transition points, at which the incoherent state loses its stability and an extremely slow oscillatory behavior emerges. Our results support the hypothesis that maximization of bidirectional information transmission can serve as a principle for the development of heterogeneous structures in complex systems, such as the neocortex of mammalian brains.

Information-theoretic approaches to the development of neural networks have been studied by many authors (e.g., Linsker, 1988, 1989, 1997; Bell & Sejnowski, 1995; Tanaka et al., 2009), although the differentiation of functional modules has rarely been studied. In comparison with these previous studies, our present approach focuses on the dynamics of the systems, adopting transfer entropy to capture the dependence of the dynamics of one system on the state of another.

Several hypotheses have been proposed for the development of modularity in biological organisms. These have been tested by computational experiments on evolutionary mechanisms, such as specialization (Espinosa-Soto & Wagner, 2010), network wiring cost reduction (Clune et al., 2013), and modularly varying goals (Kashtan & Alon, 2005). Although our results and those of

previous studies are not mutually exclusive, our model differs from the previous ones by introducing an information-theoretic measure as the principle of selection. Furthermore, differentiation in our model is not externally driven, as it is for a modularly varying environment (Kashtan & Alon, 2005). Instead, differentiation is internally driven by the mutual dependence of the subsystem dynamics.

We found slow oscillatory dynamics as a result of an evolutionary process that maximized bidirectional information transmission. Recently, slow oscillatory modulations of neural activities have been found in various studies on different spatial scales (Fox et al., 2005; Lakatos et al., 2005, 2008; Nir et al., 2008; Molter et al., 2012). Such slow modulations may be useful for bidirectional information transmission, allowing the direction of information flow to switch. It would be interesting to investigate whether there is a common mechanism underlying the present computational phenomena and these findings. It would also be interesting to apply the present approach to the evolution of artificial machines, such as robots, by providing an underlying mechanism for the spontaneous differentiation of functional modules.

Acknowledgments

This work was partially supported by a Grant-in-Aid for Scientific Research on Innovative Areas "The study on the neural dynamics for understanding communication in terms of complex hetero systems (No. 4103)" (21120002) of the Ministry of Education, Culture, Sports, Science and Technology, Japan, by the Human Frontier Science Program (RGP0039/2010),

and by JSPS KAKENHI Grant (No. 26540123). The authors would like to thank the anonymous reviewers for valuable comments that improved this paper.

Appendix A. Mean field approximation of the dynamics of the networks

To understand the evolutionary dynamics of the network in the limit N → ∞, we consider a mean field equation. The coupling term for oscillator (i, k) in Eq. (1) can be written as

    \frac{\alpha}{2Np} \mathrm{Im}\!\left[ e^{-\sqrt{-1}\,\theta_t^{(i,k)}} \sum_{(j,l) \in G^{(i,k)}} e^{\sqrt{-1}\left( \theta_t^{(j,l)} - \psi_{kl}^{ji} \right)} \right].

The set G^{(i,k)} can be partitioned into four subsets according to the source sub-network and the coupling type: G_{jm}^{(i,k)} = \{ (j, l) \mid (j, l) \in G^{(i,k)},\ \psi_{kl}^{ji} = (m - 1)\pi \}, where j = 1 or 2, and m = 1 or 2. Using this notation, the coupling term can be written as

    \frac{\alpha}{2Np} \mathrm{Im}\!\left[ \bar{z}_t^{(i,k)} \sum_{j=1}^{2} \sum_{m=1}^{2} (-1)^{m-1} \sum_{(j,l) \in G_{jm}^{(i,k)}} z_t^{(j,l)} \right],

where z_t^{(j,l)} = e^{\sqrt{-1}\,\theta_t^{(j,l)}} and \bar{\cdot} denotes the complex conjugate.

Now, we consider the N → ∞ limit. From the law of large numbers, the relative sizes of the subsets converge to

    \left| G_{jm}^{(i,k)} \right| / N \to p\, s^{(ji)} p_m^{(ji)},

where s^{(11)} = s^{(22)} = 2(1 − q), s^{(21)} = 4q(1 − r), and s^{(12)} = 4qr. Here, we make an approximation by assuming that the probability distribution of the phases of the oscillators in the set G_{jm}^{(i,k)} is the same as the distribution over all oscillators in sub-network j. Thus, we replace \sum_{(j,l) \in G_{jm}^{(i,k)}} z_t^{(j,l)} by |G_{jm}^{(i,k)}| R_t^{(j)}. This approximation is justified by the random and independent nature of the couplings. The coupling term can then be written as

    \frac{\alpha}{2} \mathrm{Im}\!\left[ \bar{z}_t^{(i,k)} \sum_{j=1}^{2} \sum_{m=1}^{2} (-1)^{m-1} s^{(ji)} p_m^{(ji)} R_t^{(j)} \right]
    = \frac{\alpha}{2} \mathrm{Im}\!\left[ \bar{z}_t^{(i,k)} \sum_{j=1}^{2} s^{(ji)} w^{(ji)} R_t^{(j)} \right]
    = \frac{\alpha}{2} \mathrm{Im}\!\left[ \bar{z}_t^{(i,k)} Z_t^{(i)} \right],

where w^{(ji)} = p_1^{(ji)} - p_2^{(ji)} for j, i = 1 or 2, and

    Z_t^{(i)} = \sum_{j=1}^{2} s^{(ji)} w^{(ji)} R_t^{(j)}.
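This approximation can be checked numerically: for a single tagged oscillator in a large, partially coherent sub-network, the exact coupling sum of Eq. (1) should approach (α/2) Im[z̄ s^{(ii)} w^{(ii)} R] up to law-of-large-numbers fluctuations. The sketch below (our own code; the parameter values and the restriction to the intra-network contribution are for illustration only) compares the two quantities.

```python
import numpy as np

def coupling_terms(N=5000, p=0.1, q=0.5, w=0.6, alpha=0.1, seed=0):
    """Exact vs. mean-field intra-network coupling term for oscillator 0.

    Returns (exact, mean_field); the two should agree up to O(1/sqrt(Np))
    fluctuations, per the argument of Appendix A.
    """
    rng = np.random.default_rng(seed)
    theta = rng.normal(0.0, 0.5, N)        # partially coherent sub-network
    th0 = theta[0]
    # Random intra-network couplings into oscillator 0, prob. 2p(1-q);
    # psi = 0 with prob. p_1 and pi with prob. p_2, where w = p_1 - p_2.
    mask = rng.random(N) < 2*p*(1-q)
    mask[0] = False
    psi = np.where(rng.random(N) < (1.0 + w)/2, 0.0, np.pi)
    exact = alpha/(2*N*p) * np.sum(mask * np.sin(theta - th0 - psi))
    # Mean-field prediction: (alpha/2) Im[ conj(z) * s^(11) w^(11) R ].
    R = np.exp(1j*theta).mean()
    mean_field = (alpha/2) * np.imag(np.exp(-1j*th0) * 2*(1-q)*w*R)
    return exact, mean_field
```

With N p of order a few hundred, the discrepancy is typically an order of magnitude smaller than the term itself, consistent with the finite-size effects discussed in Section 5.2.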

References

Barlev, G., Girvan, M., & Ott, E. (2010). Map model for synchronization of systems of many coupled oscillators. Chaos, 20, 023109.

Barnett, L. (2009). Granger causality and transfer entropy are equivalent for Gaussian variables. Physical Review Letters, 103, 238701.

Bell, A. J., & Sejnowski, T. J. (1995). An information-maximization approach to blind separation and blind deconvolution. Neural Computation, 7, 1129–1159.

Brovelli, A., Ding, M., Ledberg, A., Chen, Y., Nakamura, R., & Bressler, S. L. (2004). Beta oscillations in a large-scale sensorimotor cortical network: directional influences revealed by Granger causality. Proceedings of the National Academy of Sciences of the United States of America, 101, 9849–9854.

Clune, J., Mouret, J.-B., & Lipson, H. (2013). The evolutionary origins of modularity. Proceedings of the Royal Society B: Biological Sciences, 280(1755), 20122863.

Daido, H. (1986). Discrete-time population dynamics of interacting self-oscillators. Progress of Theoretical Physics, 75, 1460–1463.

Espinosa-Soto, C., & Wagner, A. (2010). Specialization can drive the evolution of modularity. PLoS Computational Biology, 6, e1000719.

Felleman, D., & Van Essen, D. (1991). Distributed hierarchical processing in the primate cerebral cortex. Cerebral Cortex, 1, 1–47.

Fox, M. D., Snyder, A. Z., Vincent, J. L., Corbetta, M., Van Essen, D. C., & Raichle, M. E. (2005). The human brain is intrinsically organized into dynamic, anticorrelated functional networks. Proceedings of the National Academy of Sciences of the United States of America, 102, 9673–9678.

Friston, K., Moran, R., & Seth, A. K. (2013). Analysing connectivity with Granger causality and dynamic causal modelling. Current Opinion in Neurobiology, 23, 172–178.

Granger, C. (1969). Investigating causal relations by econometric models and cross-spectral methods. Econometrica: Journal of the Econometric Society, 37, 424–438.

Haken, H. (1980). Synergetics. Naturwissenschaften, 67, 121–128.

Holland, J. H. (1975). Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence. Ann Arbor: The University of Michigan Press.

Ito, T., & Tsuda, I. (2007). Propagation and storage of input information in the network with feedback (in Japanese). Bussei Kenkyu, 87, 588–589.

Kaiser, A., & Schreiber, T. (2002). Information transfer in continuous processes. Physica D, 166, 43–62.

Kaneko, K., & Tsuda, I. (2001). Complex Systems: Chaos and Beyond: A Constructive Approach with Applications in Life Sciences. New York: Springer-Verlag.

Kashtan, N., & Alon, U. (2005). Spontaneous evolution of modularity and network motifs. Proceedings of the National Academy of Sciences of the United States of America, 102, 13773–13778.

Kuramoto, Y. (1984). Chemical Oscillations, Waves, and Turbulence. Berlin and New York: Springer-Verlag.

Lakatos, P., Karmos, G., Mehta, A. D., Ulbert, I., & Schroeder, C. E. (2008). Entrainment of neuronal oscillations as a mechanism of attentional selection. Science, 320, 110–113.

Lakatos, P., Shah, A. S., Knuth, K. H., Ulbert, I., Karmos, G., & Schroeder, C. E. (2005). An oscillatory hierarchy controlling neuronal excitability and stimulus processing in the auditory cortex. Journal of Neurophysiology, 94, 1904–1911.

Landau, L., & Ginzburg, V. (1950). On the theory of superconductivity. Zhurnal Eksperimental'noi i Teoreticheskoi Fiziki, 20, 546–568.

Linsker, R. (1988). Self-organization in a perceptual network. Computer, 21, 105–117.

Linsker, R. (1989). How to generate ordered maps by maximizing the mutual information between input and output signals. Neural Computation, 1, 402–411.

Linsker, R. (1997). A local learning rule that enables information maximization for arbitrary input distributions. Neural Computation, 9, 1661–1665.

Matsumoto, K., & Tsuda, I. (1988). Calculation of information flow rate from mutual information. Journal of Physics A: Mathematical and General, 21, 1405–1414.

Molter, C., O'Neill, J., Yamaguchi, Y., Hirase, H., & Leinekugel, X. (2012). Rhythmic modulation of theta oscillations supports encoding of spatial and behavioral information in the rat hippocampus. Neuron, 75, 889–903.

Mountcastle, V. (1997). The columnar organization of the neocortex. Brain, 120, 701–722.

Nicolis, G., & Prigogine, I. (1977). Self-Organization in Nonequilibrium Systems. New York: Wiley.

Nir, Y., Mukamel, R., Dinstein, I., Privman, E., Harel, M., Fisch, L., Gelbard-Sagiv, H., Kipervasser, S., Andelman, F., Neufeld, M. Y., Kramer, U., Arieli, A., Fried, I., & Malach, R. (2008). Interhemispheric correlations of slow spontaneous neuronal fluctuations revealed in human sensory cortex. Nature Neuroscience, 11, 1100–1108.

Paluš, M., & Vejmelka, M. (2007). Directionality of coupling from bivariate time series: How to avoid false causalities and missed connections. Physical Review E, 75, 1–14.

Pikovsky, A., & Ruffo, S. (1999). Finite-size effects in a population of interacting oscillators. Physical Review E, 59, 1633–1636.

Schreiber, T. (2000). Measuring information transfer. Physical Review Letters, 85, 461–464.

Szentágothai, J. (1983). The modular architectonic principle of neural centers. Reviews of Physiology, Biochemistry and Pharmacology, 98, 11–61.

Tanaka, T., Kaneko, T., & Aoyagi, T. (2009). Recurrent infomax generates cell assemblies, neuronal avalanches, and simple cell-like selectivity. Neural Computation, 21, 1038–1067.

Tsuda, I. (1984). A hermeneutic process of the brain. Progress of Theoretical Physics Supplement, 79, 241–259.

Tsuda, I. (2001). Toward an interpretation of dynamic neural activity in terms of chaotic dynamical systems. Behavioral and Brain Sciences, 24, 793–810.

Vicente, R., Wibral, M., Lindner, M., & Pipa, G. (2011). Transfer entropy–a model-free measure of eﬀective connectivity for the neurosciences. Journal of computational neuroscience, 30, 45–67. Wagner, G. P., Pavlicev, M., & Cheverud, J. M. (2007). The road to modularity. Nature Reviews Genetics, 8, 921–31. Watanabe, H., Ito, T., & Tsuda, I. (2014). in preparation. Yamaguti, Y., Tsuda, I., & Takahashi, Y. (2014). Information flow in heterogeneously interacting systems. Cognitive Neurodynamics, 8, 17–26.


List of Figures

1. Schematic representation of the model network and the parameters that regulate coupling probabilities.
2. A typical evolution of fitness (A) and parameters (B).
3. Summary of network structure after the evolution. (A) Converged values of parameters of evolved networks. Means and standard deviations of parameters over 8 different trials are plotted. (B) Schematic representation of an evolved network.
4. Dynamics of the order parameters in an evolved network shows chaotic oscillation with slow modulations. (A) Trajectories of the phase difference Φ (upper) and intra-network coherence |R(j)| (lower). (B) Trajectories in macroscopic phase space spanned by three order parameters.
5. Fitness landscape on a two-dimensional parameter space. (A) Fitness as a function of two parameters, w(11) and w(22). (B) Means (left) and standard deviations (right) of the phase coherences |R(j)|. For each point in parameter space, the mean over 12 sample networks with different initial conditions is calculated. Asterisks indicate the parameter values of the optimal network obtained by the evolutionary process.
6. Fitness landscape on the (A) w(11)-r and (B) wasym-r parameter spaces. Other parameter values are described in the text. The asterisks indicate the parameter values of the optimal network.
7. Dependencies of (A) heterogeneity indices and (B) coupling parameters on the coupling density p. M.f. indicates the mean-field model.

Fig. 1: Schematic representation of the model network and the parameters that regulate coupling probabilities. [Diagram labels: sub-network 1; sub-network 2; intra-network couplings: 2p(1-q); inter-network couplings from 1 to 2: 4pqr; inter-network couplings from 2 to 1: 4pq(1-r).]
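The coupling probabilities labeled in Fig. 1 can be illustrated with a short sketch. This is one plausible reading of the diagram, not necessarily the authors' exact construction: the parameter names p, q, and r follow the figure, while the sub-network size n, the independent Bernoulli sampling, and the exclusion of self-couplings are assumptions for illustration.

```python
import numpy as np

def sample_coupling_mask(n, p, q, r, seed=None):
    """Sample a directed coupling mask for two sub-networks of n units each.

    Following the labels in Fig. 1 (an illustrative reading):
      - intra-network couplings occur with probability 2*p*(1-q),
      - couplings from sub-network 1 to 2 with probability 4*p*q*r,
      - couplings from sub-network 2 to 1 with probability 4*p*q*(1-r).
    Rows index target units, columns index source units.
    """
    rng = np.random.default_rng(seed)
    p_intra = 2 * p * (1 - q)
    p_12 = 4 * p * q * r          # couplings 1 -> 2
    p_21 = 4 * p * q * (1 - r)    # couplings 2 -> 1
    mask = np.zeros((2 * n, 2 * n), dtype=bool)
    s1, s2 = slice(0, n), slice(n, 2 * n)
    mask[s1, s1] = rng.random((n, n)) < p_intra  # within sub-network 1
    mask[s2, s2] = rng.random((n, n)) < p_intra  # within sub-network 2
    mask[s2, s1] = rng.random((n, n)) < p_12     # targets in 2, sources in 1
    mask[s1, s2] = rng.random((n, n)) < p_21     # targets in 1, sources in 2
    np.fill_diagonal(mask, False)                # no self-coupling
    return mask

mask = sample_coupling_mask(50, p=0.2, q=0.5, r=0.7, seed=0)
```

With r > 0.5, couplings from sub-network 1 to 2 are denser than the reverse direction, which is how a single parameter r controls the asymmetry of intersystem connectivity.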

Fig. 2: A typical evolution of fitness (A) and parameters (B). [Panel A: fitness versus generation (0–1000), for means over copies of selected parents and means over all networks. Panel B: the parameters q, r, and p(jk)1, p(jk)2 for (jk) = (11), (22), (12), (21), versus generation.]

Fig. 3: Summary of network structure after the evolution. (A) Converged values of parameters of evolved networks. Means and standard deviations of parameters over 8 different trials are plotted. (B) Schematic representation of an evolved network.

Fig. 4: Dynamics of the order parameters in an evolved network shows chaotic oscillation with slow modulations. (A) Trajectories of the phase difference Φ (upper) and intra-network coherences |R(j)| (lower). (B) Trajectories in macroscopic phase space spanned by three order parameters.
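The intra-network coherence |R(j)| in Fig. 4 is naturally read as the modulus of a Kuramoto-type complex order parameter, and Φ as the difference between the two mean-field phases. The sketch below shows this computation under that assumed reading; the phase samples are synthetic and only illustrate the two regimes (coherent vs. incoherent).

```python
import numpy as np

def order_parameter(theta):
    """Complex order parameter R = (1/N) * sum_k exp(i * theta_k).

    |R| measures phase coherence within a sub-network (|R| = 1 for full
    synchrony, |R| near 0 for incoherence); arg(R) is the mean-field phase.
    """
    return np.exp(1j * np.asarray(theta)).mean()

# Synthetic example: one coherent and one incoherent sub-network.
rng = np.random.default_rng(1)
theta1 = rng.normal(0.0, 0.3, size=100)        # tightly clustered phases
theta2 = rng.uniform(-np.pi, np.pi, size=100)  # scattered phases
R1, R2 = order_parameter(theta1), order_parameter(theta2)
Phi = np.angle(R1 / R2)  # phase difference between the two mean fields
print(abs(R1), abs(R2), Phi)
```

Tracking |R(1)|, |R(2)|, and Φ over iterations of the coupled-map dynamics yields trajectories of the kind plotted in panels (A) and (B).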

Fig. 5: Fitness landscape on a two-dimensional parameter space. (A) Fitness as a function of two parameters, w(11) and w(22). (B) Means (left) and standard deviations (right) of the phase coherences |R(j)|. For each point in parameter space, the mean over 12 sample networks with different initial conditions is calculated. Asterisks indicate the parameter values of the optimal network obtained by the evolutionary process.
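The fitness underlying this landscape is based on bidirectional information transmission between the sub-networks. As a concrete point of reference, the sketch below is a minimal binned (plug-in) estimator of transfer entropy with history length 1, following Schreiber (2000); the authors' actual fitness computation may differ in estimator, binning, and history length, and the driven time series here is synthetic.

```python
import numpy as np

def transfer_entropy(x, y, bins=8):
    """Binned estimate of transfer entropy T(X -> Y) in bits, history length 1:
    T = I(y_{t+1}; x_t | y_t), estimated from joint histogram counts.
    A minimal illustrative estimator, not the authors' exact procedure.
    """
    # Discretize each series into `bins` symbols.
    x = np.digitize(x, np.histogram_bin_edges(x, bins)[1:-1])
    y = np.digitize(y, np.histogram_bin_edges(y, bins)[1:-1])
    yn, yp, xp = y[1:], y[:-1], x[:-1]  # y_next, y_past, x_past

    def H(*arrs):
        # Joint Shannon entropy (bits) from empirical symbol frequencies.
        _, counts = np.unique(np.stack(arrs, axis=1), axis=0, return_counts=True)
        p = counts / counts.sum()
        return -(p * np.log2(p)).sum()

    # I(yn; xp | yp) = H(yn, yp) - H(yp) - H(yn, yp, xp) + H(yp, xp)
    return H(yn, yp) - H(yp) - H(yn, yp, xp) + H(yp, xp)

# Synthetic check: x drives y with a one-step delay,
# so T(x -> y) should clearly exceed T(y -> x).
rng = np.random.default_rng(2)
x = rng.random(20000)
y = np.empty_like(x)
y[0] = rng.random()
y[1:] = 0.8 * x[:-1] + 0.2 * rng.random(x.size - 1)
```

A bidirectional fitness of the kind used for selection would then combine T(X -> Y) and T(Y -> X), rewarding networks that transmit information in both directions.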

Fig. 6: Fitness landscape on the (A) w(11)-r and (B) wasym-r parameter spaces. Other parameter values are described in the text. The asterisks indicate the parameter values of the optimal network.

Fig. 7: Dependencies of (A) heterogeneity indices and (B) coupling parameters on the coupling density p. M.f. indicates the mean-field model. [Panel A plots h.i.intra, h.i.inter1, and h.i.inter2; panel B plots w(11), w(22), w(12), and w(21); both as functions of p, including the mean-field case.]