Neural Networks 95 (2017) 29–43


A multivariate extension of mutual information for growing neural networks✩

Kenneth R. Ball a,b,*, Christopher Grant a,b, William R. Mundy a, Timothy J. Shafer a

a Integrated Systems Toxicology Division, National Health and Environmental Effects Research Laboratory, Office of Research and Development, US Environmental Protection Agency, Research Triangle Park, NC, United States
b Oak Ridge Institute for Science Education, Oak Ridge Associated Universities, Oak Ridge, TN, United States

Article info

Article history: Received 28 December 2016; Received in revised form 26 May 2017; Accepted 7 July 2017; Available online 24 July 2017

Keywords: Mutual information; Shannon information theory; Multivariate mutual information; Multiinformation; Total correlation; Neural network development; MEA

✩ Preparation of this document has been funded in part by the U.S. Environmental Protection Agency. This document has been reviewed by the National Health and Environmental Effects Research Laboratory and approved for publication. Approval does not signify that the contents reflect the views of the Agency, nor does mention of trade names or commercial products constitute endorsement or recommendation for use.
* Correspondence to: Integrated Systems Toxicology Division, MD-B105-03, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC 27711. E-mail addresses: [email protected], [email protected] (K.R. Ball).
Abstract

Recordings of neural network activity in vitro are increasingly being used to assess the development of neural network activity and the effects of drugs, chemicals and disease states on neural network function. The high-content nature of the data derived from such recordings can be used to infer effects of compounds or disease states on a variety of important neural functions, including network synchrony. Historically, synchrony of networks in vitro has been assessed by determination of correlation coefficients (e.g. Pearson's correlation), by statistics estimated from cross-correlation histograms between pairs of active electrodes, and/or by pairwise mutual information and related measures. The present study examines the application of Normalized Multiinformation (NMI) as a scalar measure of shared information content in a multivariate network that is robust with respect to changes in network size. Theoretical simulations are designed to investigate NMI as a measure of complexity and synchrony in a developing network relative to several alternative approaches. The NMI approach is applied to these simulations and also to data collected during exposure of in vitro neural networks to neuroactive compounds during the first 12 days in vitro, and compared to other common measures, including correlation coefficients and mean firing rates of neurons. NMI is shown to be more sensitive to developmental effects than first order synchronous and nonsynchronous measures of network complexity. Finally, NMI is a scalar measure of global (rather than pairwise) mutual information in a multivariate network, and hence relies on fewer assumptions for cross-network comparisons than historical approaches.

© 2017 Elsevier Ltd. All rights reserved.

1. Introduction

Recordings of electric potential over time from electrodes embedded in neural tissues enable the observation of neural firing activity from neurons within a given proximity of each electrode, depending on the manufacture (Obien, Deligkaris, Bullmann, Bakkum, & Frey, 2015). While neural spiking is the underlying mechanism that gives rise to broader functional activity like cortical rhythms (Fries, Nikolić, & Singer, 2007; Wang, 2010), there is also interest in the underlying information content of spiking patterns via a variety of hypotheses of neural encoding (Kumar, Rotter, & Aertsen, 2010; Reyes, 2003; Rullen & Thorpe, 2001). Microelectrode arrays (MEAs) allow examination of neural spiking, bursting and coordinated neural activity in cultures of neural tissues in vitro (Nam & Wheeler, 2011; Pine, 2006). Examination of neural network function using MEAs has been proposed as an approach to evaluate the impacts of drugs, chemicals, and disease states (Johnstone et al., 2010) on network development and function, and the recent availability of multi-well MEA (mwMEA) formats has facilitated such studies (Brown et al., 2016; Crossley et al., 2014; Valdivia et al., 2014; Wainger et al., 2014; Woodard et al., 2014).

The focus of this work involves the inference of network development effects resulting from chemical exposures, although the methods for measuring network properties described here may have broader applications. In particular, a new normalization of Shannon mutual information (extended to multivariate networks) is used to measure effects in growing neural networks.

In the MEA experimental paradigm, electric potentials are recorded over time using extracellular electrodes positioned in brain structures (in vivo) or embedded into a surface upon which neural cells are cultured (in vitro). While the amplitude and shape of individual action potentials can be assessed, most often the
action potential spikes are reduced to time-stamps and analyzed as spike trains on each individual electrode (or on "units", if multiple signals are observed on a single electrode). These spike trains can then be compared across time and space (different electrodes) to examine network characteristics. A variety of measures of network information may be extracted from spike train data derived from MEA recordings that are indicative of network development and/or function (Bove, Genta, Verreschi, & Grattarola, 1996; Brown et al., 2016; Chiappalone, Bove, Vato, Tedesco, & Martinoia, 2006; Cotterill et al., 2016). This includes measures that assess the synchrony of activity in the network.

Synchrony is a useful concept that is associated with increasing network complexity; however, measures and definitions of synchrony differ widely in the literature. From the tissue-level viewpoint of a large neural network, it has been suggested that synchronous neural firing and associated potential oscillation is relevant to sensory awareness (Engel & Singer, 2001), attention (Gregoriou, Gotts, Zhou, & Desimone, 2009; Gross et al., 2004; Ward, Doesburg, Kitajo, MacLean, & Roggeveen, 2006), memory (Jensen, Kaiser, & Lachaux, 2007; Palva, Monto, Kulashekhar, & Palva, 2010; Tallon-Baudry, Bertrand, & Fischer, 2001) and other high level cognitive processes (Buzsáki & Draguhn, 2004; Eckhorn, Reitboeck, Arndt, & Dicke, 1990; Engel, Fries, & Singer, 2001; Uhlhaas et al., 2009; Uhlhaas & Singer, 2006). Synchronous activity is observed in both in vivo and in vitro neural network recordings, and arises spontaneously in networks grown in vitro, indicating that it is an intrinsic property of neural networks and, as noted above, is important to neural function. Thus, assessment of effects of neuroactive/neurotoxic compounds or disease states on the development or maintenance of synchrony is important.

Synchrony of signals at the tissue-level follows from the temporal coincidences in aggregated firing of many neurons, and is measured in terms of rate and phase by binning signals into temporal and/or spatial windows. Temporal synchrony at the neural spiking level is a lower level phenomenon, and while the classical view of neural coding emphasizes rate synchrony, temporal synchrony of the firing of individual neurons at a millisecond-level resolution may drive both higher level synchronous behavior and associated cognitive processes (Diesmann, Gewaltig, & Aertsen, 1999; Riehle, Grün, Diesmann, & Aertsen, 1997). Some form of temporal synchrony will arise as a result of temporally causal relationships between the firing patterns of pairs of neurons, relationships which are integral to the study of synaptic interactions in the nervous system (Bologna et al., 2010; Salinas & Sejnowski, 2001). Inferring the developmental state of synaptic relationships between neurons through temporal synchrony of their observed firing patterns – and quantifying this synchrony – is one avenue for investigating developmental neurotoxicity by making comparisons between populations of cultures under different experimental conditions.

A simple measure of synchrony between the binary spike trains of two firing neurons is Pearson's correlation coefficient, which returns the cross-correlation between spike trains derived from two neurons.
The signal from a given neuron/electrode may be time lagged relative to another at various time lag intervals to return more general multi-dimensional cross-correlograms, and auto correlations when a single spike train is compared relative to a time lagged version of itself (Knox, 1981; Rieke, Warland, van Steveninck, & Bialek, 1996). There are theoretical deficiencies to using linear correlation as a measure of temporal neural synchrony in pairwise interactions. First and foremost, linear correlation conveys little about the information content of a coincident binary signal: two perfectly correlated neurons may fire once or one hundred times in a given time interval and the correlation coefficient will be the same. Second, correlation is not well defined for zero-valued signals, i.e. when no firing is observed by an electrode embedded in a neural culture.
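As an illustration of the first point, the following sketch (not part of the original study; the per-bin firing rates and the number of bins are arbitrary, illustrative choices) compares two perfectly synchronized binary spike trains at a low and a high firing rate. The Pearson correlation is 1 in both cases, while the plug-in mutual information, which reduces to the entropy of the common train when one train is a copy of the other, changes with the firing rate.

import numpy as np

def binary_entropy(p):
    """Entropy (bits) of a Bernoulli(p) variable."""
    return 0.0 if p in (0.0, 1.0) else float(-(p * np.log2(p) + (1 - p) * np.log2(1 - p)))

rng = np.random.default_rng(0)
for rate in (0.005, 0.3):                 # sparse vs. busy firing per time bin
    x = (rng.random(200_000) < rate).astype(float)
    y = x.copy()                          # a perfectly synchronized partner train
    r = np.corrcoef(x, y)[0, 1]           # Pearson correlation is 1.0 in both cases
    mi = binary_entropy(x.mean())         # MI(X; Y) = h(X) when Y is a copy of X
    print(f"per-bin rate {rate}: r = {r:.2f}, MI = {mi:.3f} bits")

The sparse train carries roughly 0.05 bits per bin and the busy train close to 0.9 bits per bin, a distinction the correlation coefficient cannot register.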

Shannon's mutual information (MI) is a higher order information theoretic measure of temporal synchrony that has been used as a measure of information encoding in stimulus response experiments (Bologna et al., 2010; Borst & Theunissen, 1999), in the sense that MI depends on higher orders of the joint probability distribution than linear correlation (Li, 1990). Pairwise MI accounts for both linear correlation of two firing neurons and the shared information content of the two spike trains, and hence represents a more robust measure of temporal synchrony as a mechanism of neural information coding. Other pairwise measures of temporal synchrony include transfer entropy (Gourévitch & Eggermont, 2007; Schreiber, 2000) and joint entropy (Garofalo, Nieus, Massobrio, & Martinoia, 2009), which are nonsymmetric measures that allow for inference of directional causality in networks. Cutts and Eglen (2014) review and benchmark a large number of pairwise correlation measures. Measures of pairwise mutual information are considered in their benchmarking but rejected as inconsistent measures of linear correlation because of non-invariance with respect to spiking rates (which is an attractive property of such measures in the present case).

All of the described synchrony measures may be useful for investigating the pairwise connections of a multi-node neural network, from which connectivity properties between neurons may be inferred. However, further assumptions and processing on connectivity maps or the multivariate comparison structures (cross-correlation and pairwise MI matrices, etc.) are required if an experimenter desires a scalar measure of overall network complexity in a network of neurons/electrodes with more than two nodes. A scalar measure of neural network complexity that does not rely on aggregating many pairwise temporal synchrony measures would provide a more natural framework to make population comparisons between observed network activity in the form of multi-dimensional (e.g. recorded from multiple electrodes) spike trains recorded in neural cultures.

MI has previously been extended to the multivariate case of a multi-node network in at least two different forms, which are described in detail below. The hypothesis tested in this paper is that one of these multivariate extensions to MI – when coupled with a novel normalization term that is motivated by a self-consistency property of the function – is useful as a scalar measure of the information content of a developing neural network via simulations of binary spike train recordings. The proposed normalized mutual information measure is demonstrated (by examining concentration–responses in real experiments) to be more sensitive than aggregated linear correlation and other neural spiking parameters as a discriminant feature for the effects of compounds on network function tested in vitro in Sections 3 and 4. In Section 2, the proposed function is shown to be justified in the context of Shannon information theory as an asymptotically consistent measure of shared information as connection strength increases. Also in Section 2, bounding properties of the function are stated and proved in the case of a network that is growing in size.

For the purposes of this study, mutual information measures are applied to multichannel spike recordings without prior knowledge of the generative neural network topology underlying those recordings.
The network is assumed to be (possibly) fully connected (without self-connections), and all possible pairwise and higher order interactions are considered. In practice there is some topology to biological neural networks; in order to generate simulations in Section 3, a series of feed-forward, partially connected neural networks with directed connections is generated. However, the analyses on spike train recordings make no specific assumptions about network topology.


2. Normalized multiinformation/total correlation

2.1. Mutual information and multivariate extensions

Mutual information (MI) is the shared (Shannon) information content between two random variables X1 and X2, and is defined in terms of the entropy h of the random variables (Cover & Thomas, 2012) as:
\[
I(X_1; X_2) = h(X_1) + h(X_2) - h(X_1, X_2). \tag{1}
\]
Extensions of pairwise MI to a scalar measure of shared information content in multivariate networks of more than two random variables exist in the literature, and two approaches are reviewed below. Each multivariate extension has multiple names depending on the author and/or field in which it is applied; to avoid confusion an attempt is made in this review to trace the use of each name and definition back to its source. The first extension has been described by Timme, Alford, Flecker, and Beggs (2014) and Watanabe (1960) as total correlation (TC). Given n random variables X1, X2, ..., Xn the function TC is defined in terms of the entropies of the n random variables as
\[
\mathrm{TC}(X_1; X_2; \ldots; X_n) = h(X_1) + h(X_2) + \cdots + h(X_n) - h(X_1, X_2, \ldots, X_n). \tag{2}
\]

The functional value in Eq. (2) denoted here by TC is alternatively referred to as multiinformation (Studený & Vejnarová, 1998), stochastic interaction (Wennekers & Ay, 2003), and integration (Sporns, Tononi, & Edelman, 2000; Tononi, Sporns, & Edelman, 1994). See Jakulin and Bratko (2003) for further discussion. TC is the Kullback–Leibler divergence from the joint distribution of a set of random variables to the maximum entropy product of the variables. In the present study, it will be used as the base for a proposed normalized multivariate measure.

An alternative multivariate extension of mutual information dates back at least to McGill (1954), where the author investigated n-dimensional transmitted information, and Fano (1961). Han (1980) calls this quantity multiple mutual information (MMI). The quantity may be expressed as the alternating sum/difference of permutations of joint entropies of the random variables so that:
\[
\mathrm{MMI}(X_1; X_2; \ldots; X_n) = \sum_{k=1}^{n} (-1)^{k+1} \sum_{\substack{M \subseteq \{1,\ldots,n\} \\ |M| = k}} h(X_{M_1}, X_{M_2}, \ldots, X_{M_k}), \tag{3}
\]

where the interior sum is taken over all possible indices formed from subsets of the integers {1, ..., n} of cardinality k. The MMI (3) has been more recently called multivariate mutual information (Pham, Ho, Nguyen, Tran, & Nguyen, 2012; Srinivasa, 2005; Timme et al., 2014). To confuse matters even more, the quantity referred to here as TC (2) has itself been called multivariate mutual information (Boes & Meyer, 1999) and simply mutual information (Delorme, Palmer, Onton, Oostenveld, & Makeig, 2012).

Despite their close relationship, the functions MMI and TC behave very differently. Both functions reduce to the usual pairwise MI (1) in the case of two random variables, which may be demonstrated by evaluating the functions in terms of entropies. However, in the case of three or more variables, multiple or multivariate mutual information (MMI) may be negative (Matsuda, 2000), whereas total correlation or multiinformation (TC) is guaranteed to be nonnegative (since the joint entropy of multiple variables must be less than or equal to the sum of the entropies of the individual variables). The guaranteed positivity of TC (with the zero value only in the case of perfect independence amongst variables) makes the function an attractive quantity for comparisons between multivariable networks, as it is difficult to justify negative information in a neural network from a biological perspective.

An alternative to TC called dual total correlation (DTC) (Han, 1978; Timme et al., 2014) or excess entropy (Abdallah & Plumbley, 2012; Ay, Olbrich, Bertschinger, & Jost, 2006; Olbrich, Bertschinger, Ay, & Jost, 2008) is a non-negative multivariate extension of MI that is closely related to TC. In fact, DTC and TC are opposite ends of a continuum of complexity measures comprising Tononi, Sporns, and Edelman (TSE) complexity (Ay et al., 2006; Tononi et al., 1994). The rationale for limiting the analysis in this paper to TC is discussed below in Section 2.4. In this paper a normalization of TC is considered that depends upon network dimensionality and allows for comparisons of complexity between networks of different sizes.

2.2. Information theoretic motivation for normalization

For neural networks, it is well established that as the number of days in vitro (DIV) increases, neural cells grow and form synaptic connections, and firing rates and temporal synchronization are observed that correspond to an increasingly functional and mature neural network (Biffi, Regalia, Menegon, Ferrigno, & Pedrocchi, 2013; Charlesworth, Cotterill, Morton, Grant, & Eglen, 2015; Chiappalone et al., 2006; Cotterill et al., 2016; Illes, Fleischer, Siebler, Hartung, & Dihné, 2007; van Pelt, Vajda, Wolters, Corner, & Ramakers, 2005; Wagenaar, Pine, & Potter, 2006). In the present paper it is postulated that multiinformation, as a measure of ambient shared information content of an in vitro neural network, may be a useful indicator of developmental effects between control and compound treated cultures. However, especially in early stages of development, overall activity level is low and there is variability in the number of electrodes that observe neural spiking and hence varying dimensionality in network signal observations. In order to compare mutual information in networks of different sizes during development, following treatment with different drugs/chemicals, or in different disease states, a measure of shared information content that is robust with respect to changing network size is needed.

Rothstein (1952), prefiguring TC, formalizes the idea of organization in a networked system with an "essential idea" that in an organized system "a number of elements are coupled somehow so that they are not completely independent of one another, as opposed to the unorganized state in which they are mutually independent". Rothstein identifies the minimal organized state as that in which networked nodes are entirely independent, and the maximal organized state as that in which interactions are so strong that only one node effectively drives the system. In the following we motivate a normalization of TC by examining the TC as the number of nodes in a networked system grows under the conditions of the maximal organized state.

For n random variables, the normalized multiinformation (denoted NMI; also, the normalized TC) is defined as:
\[
\mathrm{NMI}(X_1; X_2; \ldots; X_n) = \frac{1}{n-1}\,\mathrm{TC}(X_1; X_2; \ldots; X_n) = \frac{1}{n-1}\left[\sum_{i=1}^{n} h(X_i) - h(X_1, X_2, \ldots, X_n)\right]. \tag{4}
\]

This simple normalization (dividing the multiinformation, TC, by n − 1) addresses a divergence in the multiinformation measured by TC from the multivariate mutual information returned by MMI when both are evaluated over multiple copies of the same variable. The ‘‘multiinformation’’ terminology is selected to refer to NMI in this paper to emphasize the measure’s relationship with mutual information (MI).
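A minimal sketch of Eqs. (2) and (4) follows, written in Python purely for illustration (the XOR example and the function names are this review's illustrative choices, not part of the original study). TC and NMI are computed directly from a joint probability mass function stored as an n-dimensional array with one axis per variable.

import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability array (zero entries are ignored)."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def tc_and_nmi(joint):
    """Total correlation (Eq. (2)) and normalized multiinformation (Eq. (4))
    for a joint pmf given as an n-dimensional array, one axis per variable."""
    n = joint.ndim
    marginals = [joint.sum(axis=tuple(j for j in range(n) if j != i)) for i in range(n)]
    tc = sum(entropy(m) for m in marginals) - entropy(joint)
    return tc, tc / (n - 1)

# Toy joint distribution: X1, X2 fair and independent, X3 = X1 XOR X2.
joint = np.zeros((2, 2, 2))
for x1 in (0, 1):
    for x2 in (0, 1):
        joint[x1, x2, x1 ^ x2] = 0.25

print(tc_and_nmi(joint))   # TC = 1.0 bit, NMI = 0.5 bit

For this toy distribution the alternating-sum MMI of Eq. (3) evaluates to -1 bit, a small concrete instance of the sign problem noted above, while TC and NMI remain nonnegative.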


Property 2.1. The MMI of a network of an arbitrary finite number of copies of the same random variable X returns the self information h(X) of that random variable (just as pairwise mutual information of two copies of X does). However, the multiinformation (TC) returns (n − 1)h(X) when evaluated on a network consisting of n copies of X:
\[
\mathrm{MMI}(\underbrace{X; X; \ldots; X}_{n}) = h(X)
\]
\[
\mathrm{TC}(\underbrace{X; X; \ldots; X}_{n}) = (n-1)\,h(X).
\]

Proof. Observe that the mutual information of a random variable X with itself is simply the self information or entropy of the variable X:
\[
I(X; X) = h(X) + h(X) - h(X, X) = h(X),
\]
since the joint entropy of X with itself is just the entropy of X. In fact, the joint entropy of any finite number of copies of X is simply the entropy of X alone: h(X, X, ..., X) = h(X). It can be shown that the multivariate definition of mutual information denoted by MMI shares this internal consistency, so that MMI amongst multiple copies of X likewise contains the same quantity of information: the self information of X. Consider the 3-dimensional case:
\[
\mathrm{MMI}(X; X; X) = 3h(X) - 3h(X, X) + h(X, X, X) = h(X)
\]
and the 4-dimensional case:
\[
\mathrm{MMI}(X; X; X; X) = 4h(X) - 6h(X, X) + 4h(X, X, X) - h(X, X, X, X) = h(X).
\]
This property may be confirmed by observing that the alternating sum/difference of the 1st through nth values in the nth row (i.e., ignoring the 0th row and the 0th value of each row) of Pascal's triangle is always equal to one:
\[
\sum_{k=1}^{n} (-1)^{k+1} \binom{n}{k} = 1.
\]
Therefore the alternating signs of the coefficients of the entropy series in the definition of MMI for n copies of X sum to one:
\[
\mathrm{MMI}(\underbrace{X; X; \ldots; X}_{n}) = \sum_{k=1}^{n} (-1)^{k+1} \binom{n}{k}\, h(\underbrace{X, X, \ldots, X}_{k}) = h(X).
\]
The property is verifiable by induction from Pascal's triangle. On the other hand, the total correlation or multiinformation (TC), as it has been formulated in the citations, does not return the self-information of X in the comparison of multiple copies of X. Instead, observe that for n copies of X:
\[
\mathrm{TC}(\underbrace{X; X; \ldots; X}_{n}) = \sum_{i=1}^{n} h(X) - h(\underbrace{X, X, \ldots, X}_{n}) = nh(X) - h(X) = (n-1)\,h(X). \;\square
\]

Since adding a variable in the trivial case increases the apparent shared information as measured by TC even when it is known that no increase in actual shared information has occurred, it is expected that comparisons of TC between networks of different dimensionality will be subject to artificial discrepancies. Therefore, the normalization constant 1/(n − 1) is introduced in the definition of NMI (4) to induce consistency in the trivial case where multiinformation is measured across multiple copies of the same variable. Consequently, like MMI, the NMI acting on n copies of X is simply the self information or entropy of X:
\[
\mathrm{NMI}(\underbrace{X; X; \ldots; X}_{n}) = \frac{1}{n-1}\,\mathrm{TC}(\underbrace{X; X; \ldots; X}_{n}) = h(X).
\]

A network comprised of exact copies of one random variable is not a hypothetical absurdity; it has a neural network interpretation in an asymptotic sense. In a neural network, as synapse strength between two neurons increases the probability of a spike in one neuron triggering a spike in the connected neuron also increases. As this probability approaches one, or as the strength of all connections in a network of neurons increases arbitrarily, an incoming spike triggers a spike in all connected neurons with probability approaching one. Ignoring any time lag between local spike cascades, the network asymptotically approaches the hypothetical case of a set of exact copies of one random variable, and it should then be expected that a consistent multivariate measure of MI should asymptotically approach the self information of the one generative random variable.

NMI as a measure of a property of multivariate networks is also distinct from normalized pairwise mutual information measures (Lange & Grubmüller, 2006; Strehl, Ghosh, & Mooney, 2000). The normalization in these referenced measures is meant to extend the interpretation of Pearson's linear correlation as a coefficient of determination to the higher order setting of mutual information. On the other hand, "normalization" in NMI refers to rescaling to account for inflation that would otherwise occur with increasing network size (dimension).

Remark 2.1 (Copies of Random Variables). Here, copies of a network X refer to entirely deterministic copies of the variable's values rather than just a copy of the topology and structure of X. In the former (that which is considered here) a recording of a network X and its copy results in exactly the same observed values, so there is complete dependency between the two networks. In the latter case the two networks might be run stochastically and independently. The former case is useful as an illustrative example of a situation where the network appears to grow in size but no new information is present. Suppose two identical MEAs are embedded in a neural culture and aligned closely so that pairs of corresponding electrodes in each array detect approximately the same neural spiking signals. Does the new combined signal actually represent new shared information in the underlying network being represented? NMI is designed to answer "no", whereas TC simply grows with the apparent network size. Neither answer is wrong; the matter depends on the perspective of the experimenter. NMI is thought to be useful in the specific application presented here, where recordings of spiking neurons are used by proxy to measure developmental status of a neural culture over multiple days.

Remark 2.2 (NMI in a Growing Network). Property 2.1 considers exact copies of a recording of spikes from network X; essentially recordings imagined to come from separate networks which are identical and completely dependent upon each other. In this way, the network grows in the absence of additional information content (the joint entropy between all variables remains constant). What happens to TC and NMI as the number of copies m of X grows indefinitely? Suppose the network size grows by just taking m copies (completely deterministic on the original network X) of the size n network, and m → ∞. Then the TC grows with m as:
\[
\mathrm{TC}(\underbrace{X; X; \ldots; X}_{m}) = m\sum_{i=1}^{n} h(X_i) - h(\underbrace{X, X, \ldots, X}_{m}) = m\sum_{i=1}^{n} h(X_i) - h(X),
\]
and therefore TC approaches infinity as m → ∞. On the other hand, NMI is:
\[
\mathrm{NMI}(\underbrace{X; X; \ldots; X}_{m}) = \frac{1}{mn-1}\left(m\sum_{i=1}^{n} h(X_i) - h(X)\right) = \frac{1}{mn-1}\bigl(m(n-1)\,\mathrm{NMI}(X) + (m-1)\,h(X)\bigr) = \frac{m(n-1)}{mn-1}\,\mathrm{NMI}(X) + \frac{m-1}{mn-1}\,h(X).
\]
Therefore, as m → ∞, the NMI of the repeated network approaches \(\frac{n-1}{n}\mathrm{NMI}(X) + \frac{1}{n}h(X)\) (which also equals the average of the entropies of individual variables).

Now consider a more general case, where the size of the network grows but the underlying information being represented remains constant. For instance, imagine the number of electrodes in an MEA is increased without underlying changes to a culture of neurons. The spike recordings are representations of the underlying biological neural network, and increasing the number of (uniformly distributed) electrodes essentially increases the resolution of the representation. If there are a finite number of observable states in the biological network, eventually the joint entropy of all MEA channels (network nodes) will match the true entropy of the neurons, whereas the indefinitely increasing number of electrodes will cause the \(\sum_i h(X_i)\) term in TC and NMI to continue to increase, washing out the joint entropy term. As in the last case, TC goes to infinity with indefinitely increasing resolution, whereas NMI will stay bounded and approach the mean of the entropies (self-information) of individual variables.

2.3. Bounding properties of NMI on growing networks

Consider the case of a "growing" network where two separate multivariate networks with known individual NMI values are suddenly considered in tandem as a single network. The NMI of two different multivariate networks X and Y that are evaluated together (IXY) is bounded between a weighted sum of the NMIs of the individual networks (IX and IY) and the same sum plus the self information of the network containing less information scaled by the total dimensionality. The lower bound of the combined NMI for networks X and Y of dimension m and n respectively is denoted by LB:
\[
\mathrm{LB} = \frac{m-1}{m+n-1}\,I_X + \frac{n-1}{m+n-1}\,I_Y.
\]
Then the full constraint on the combined NMI \(I_{XY}\) may be written as
\[
\mathrm{LB} \le I_{XY} \le \mathrm{LB} + \frac{1}{m+n-1}\min\{h(X), h(Y)\} \tag{5}
\]
where h(X) and h(Y) are here considered to be the entropy (self information) of vector valued random variables, equivalent to the joint entropies of X and Y's components. Bounds on the NMI of combined multivariate networks are evidence that the NMI may be thought of as a "dimensionless" quantity associated with shared information content. Consider an in vitro MEA experiment, where on a given day 10 electrodes record neural spiking and on the next day an 11th electrode becomes active, and it is assumed the recordings are long enough so that estimators of the entropies are consistent. If the internal structure of the 10 electrode network is unchanged, then the NMI of the latter recording will be no more than nine-tenths of the NMI measured on the earlier day plus one-tenth of the lesser self-information of either the original network or the new electrode's recording. Likewise, the NMI of the latter recording will be no less than nine-tenths of the previous day's NMI (in the case of independence of the new electrode). The bounding properties are formally stated and proved below.


Property 2.2 (Lower Bound). Given two (non-empty) sets of random variables X1, ..., Xm and Y1, ..., Yn, with NMI values:
\[
\mathrm{NMI}(X_1; \ldots; X_m) = I_X, \qquad \mathrm{NMI}(Y_1; \ldots; Y_n) = I_Y,
\]
then the NMI between all the random variables satisfies the relation:
\[
\mathrm{NMI}(X_1; \ldots; X_m; Y_1; \ldots; Y_n) \ge \frac{m-1}{m+n-1}\,I_X + \frac{n-1}{m+n-1}\,I_Y.
\]

Proof. The NMI is expressed in terms of entropies as:
\[
\mathrm{NMI}(X_1; \ldots; X_m; Y_1; \ldots; Y_n) = \frac{1}{m+n-1}\left(\sum_{i=1}^{m} h(X_i) + \sum_{i=1}^{n} h(Y_i) - h(X_1, \ldots, X_m, Y_1, \ldots, Y_n)\right).
\]
Because the joint entropy of two random variables is always less than or equal to the sum of the self entropies of the same variables, we may express the NMI in terms of the inequality:
\[
\mathrm{NMI}(X_1; \ldots; X_m; Y_1; \ldots; Y_n) \ge \frac{1}{m+n-1}\left(\sum_{i=1}^{m} h(X_i) + \sum_{i=1}^{n} h(Y_i) - h(X_1, \ldots, X_m) - h(Y_1, \ldots, Y_n)\right) = \frac{1}{m+n-1}\bigl((m-1)I_X + (n-1)I_Y\bigr). \;\square
\]

Property 2.3 (Upper Bound). Again, given two (non-empty) sets of random variables X1, ..., Xm and Y1, ..., Yn, with NMI values:
\[
\mathrm{NMI}(X_1; \ldots; X_m) = I_X, \qquad \mathrm{NMI}(Y_1; \ldots; Y_n) = I_Y,
\]
then the NMI between all the random variables satisfies the relation:
\[
\mathrm{NMI}(X_1; \ldots; X_m; Y_1; \ldots; Y_n) \le
\begin{cases}
\dfrac{1}{m+n-1}\bigl((m-1)I_X + (n-1)I_Y + h(X_1, \ldots, X_m)\bigr), & h(Y_1, \ldots, Y_n) \ge h(X_1, \ldots, X_m) \\[2ex]
\dfrac{1}{m+n-1}\bigl((m-1)I_X + (n-1)I_Y + h(Y_1, \ldots, Y_n)\bigr), & h(X_1, \ldots, X_m) \ge h(Y_1, \ldots, Y_n).
\end{cases}
\]

Proof. As before, the NMI is expressed in terms of entropies as:
\[
\mathrm{NMI}(X_1; \ldots; X_m; Y_1; \ldots; Y_n) = \frac{1}{m+n-1}\left(\sum_{i=1}^{m} h(X_i) + \sum_{i=1}^{n} h(Y_i) - h(X_1, \ldots, X_m, Y_1, \ldots, Y_n)\right).
\]
Joint entropy of a set of random variables is always greater than or equal to all of the self entropies of the random variables comprising the set. Thus, in the case where the entropy of the Y network (viewed as a single n-dimensional random variable) is greater than the entropy of the X network so that h(Y1, ..., Yn) ≥ h(X1, ..., Xm), we may write h(X1, ..., Xm, Y1, ..., Yn) ≥ h(Y1, ..., Yn) and express NMI in terms of the inequality
\[
\mathrm{NMI}(X_1; \ldots; X_m; Y_1; \ldots; Y_n) \le \frac{1}{m+n-1}\left(\sum_{i=1}^{m} h(X_i) + \sum_{i=1}^{n} h(Y_i) - h(Y_1, \ldots, Y_n)\right).
\]


Then, observing that the bracketed term on the right hand side of the above inequality may be rewritten as
\[
\sum_{i=1}^{m} h(X_i) + \sum_{i=1}^{n} h(Y_i) - h(Y_1, \ldots, Y_n) = \left(\sum_{i=1}^{m} h(X_i) - h(X_1, \ldots, X_m)\right) + \left(\sum_{i=1}^{n} h(Y_i) - h(Y_1, \ldots, Y_n)\right) + h(X_1, \ldots, X_m),
\]
we see that the inequality holds:
\[
\mathrm{NMI}(X_1; \ldots; X_m; Y_1; \ldots; Y_n) \le \frac{1}{m+n-1}\bigl((m-1)I_X + (n-1)I_Y + h(X_1, \ldots, X_m)\bigr).
\]
The same procedure is followed to prove the second part of the property for the case where h(X1, ..., Xm) ≥ h(Y1, ..., Yn). □
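The bounds of Properties 2.2 and 2.3 (Eq. (5)) can be spot-checked numerically on arbitrary discrete distributions. The sketch below is not from the original study; the group sizes m = 3, n = 2 and the random joint pmf are illustrative assumptions, and the computation follows the entropy expressions above directly.

import numpy as np

rng = np.random.default_rng(1)

def entropy(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def marginal(joint, keep):
    """Marginal pmf over the axes listed in `keep`."""
    drop = tuple(i for i in range(joint.ndim) if i not in keep)
    return joint.sum(axis=drop)

m, n = 3, 2                                    # sizes of the X and Y groups
joint = rng.random((2,) * (m + n))             # random joint pmf over 5 binary variables
joint /= joint.sum()

h_single = [entropy(marginal(joint, {i})) for i in range(m + n)]
h_X = entropy(marginal(joint, set(range(m))))
h_Y = entropy(marginal(joint, set(range(m, m + n))))
h_XY = entropy(joint)

I_X = (sum(h_single[:m]) - h_X) / (m - 1)
I_Y = (sum(h_single[m:]) - h_Y) / (n - 1)
I_XY = (sum(h_single) - h_XY) / (m + n - 1)

LB = ((m - 1) * I_X + (n - 1) * I_Y) / (m + n - 1)
UB = LB + min(h_X, h_Y) / (m + n - 1)
assert LB - 1e-12 <= I_XY <= UB + 1e-12        # Eq. (5) holds for any joint pmf
print(LB, I_XY, UB)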

The lower bounding property states that the combined NMI of two multi-node networks of size m and n is always greater than or equal to a weighted sum of the networks' individual NMIs. Equality holds in the case where the two networks are perfectly independent. Informally, this property guarantees that for two sufficiently large networks the NMI of the combined network is at least a little less than the weighted (according to network dimensionality) average of the NMIs of the individual networks. The upper bounding property actually contains the lower bound in each case, and may be interpreted as saying that the NMI of two networks is always less than or equal to the lower bound of NMI plus the self information of the network with less information divided by one less than the dimensionality of the total network. Together, the bounding properties state that the NMI of the combination of two networks is always between a weighted sum of the NMIs of the individual networks and that sum plus the lesser self information of the two networks divided by one less than the total dimensionality, as illustrated in Eq. (5).

In the context of MEA recordings of spikes in a neural network, an experimenter may wish to estimate the growth of the ambient shared network information as the biological neurons develop. At the beginning of an experiment, perhaps at 2–5 days in vitro (DIV), only a few of the available electrodes will detect significant neural spiking. However, as the neural network matures with DIV, more electrodes will detect significant spiking. Especially if one is estimating a developmental curve over time or making comparisons between wells with different numbers of actively spiking electrodes, robustness of NMI with respect to varying network size, as evidenced by the above properties, is essential.

Finally, it is observed that the addition of a trivially zero or constant variable to a network slightly reduces the overall NMI measured in the total network according to the network size. The property is formally stated and proved below.

Property 2.4. Given a nonempty set of random variables X1, ..., Xn and a trivial variable Y with zero entropy h(Y) = 0, the normalized mutual information of a network formed from the given set with the trivial variable is
\[
\mathrm{NMI}(X_1; \ldots; X_n; Y) = \left(\frac{n-1}{n}\right)\mathrm{NMI}(X_1; \ldots; X_n).
\]

Proof. Recall that the self information or entropy of a zero (constant) variable is zero, and the joint entropy of any number of random variables with a constant is just the joint entropy of the non-constant random variables, because the constant variable does not add to the number of observed states. Then, the NMI above may be written in terms of entropies as
\[
\mathrm{NMI}(X_1; \ldots; X_n; Y) = \frac{1}{n}\left(\sum_{i=1}^{n} h(X_i) + h(Y) - h(X_1, \ldots, X_n, Y)\right) = \frac{1}{n}\left(\sum_{i=1}^{n} h(X_i) + 0 - h(X_1, \ldots, X_n)\right) = \frac{1}{n}\bigl((n-1)\,\mathrm{NMI}(X_1; \ldots; X_n)\bigr)
\]

which proves the property. □

The property follows directly from the expression of the NMI in terms of the component and joint entropies. More importantly, it demonstrates that an all-zero node added to the network decreases the evaluated NMI, which is especially of concern for smaller networks. Thus, it may be desirable to detect which channels in an MEA experiment have trivially small or zero self information so that they may be excluded from the analysis. In our implementation of NMI we removed channels with component entropies (self information) below a user specified threshold.

Interestingly, TC remains constant under the scenario tested in Property 2.4, leading to a question of whether Property 2.4 for NMI is a feature or a bug. While the self information content of a trivially constant variable is zero, if this variable is considered as an actual node in a connected network then it is certainly also the case that the (constant) node's information content is different (albeit trivially) from the rest of the network. Thus, it may not be unreasonable to expect a small decrease in NMI in the circumstance where we insist that a trivially constant node is really connected to the whole network being considered. Analogously, if the size of a cup is increased but the volume of liquid in the cup is fixed, then the cup will look less full. NMI, as a function normalized to the dimension of the network, behaves similarly.

2.4. Connections to TSE complexity

Dual total correlation/excess entropy, another positive measure of complexity, and TC are opposite ends of a continuum of complexity measures comprising TSE complexity (Ay et al., 2006; Tononi et al., 1994). However, Olbrich et al. (2008) demonstrated that the dual total correlation of the union of two copies of a single multivariable network ensemble is the self-information of the single network, calling into question the usefulness of dual total correlation/excess entropy as a measure of system complexity. Consequently, dual total correlation was not included in the analyses presented in this paper. The property demonstrated by Olbrich et al. (2008) is distinct from the "multiple copies of one variable" scenario used to motivate NMI as a normalization of TC. In fact, dual total correlation and TSE complexity, like TC, return the self-information of the single variable multiplied by a scalar depending on the network size in the multiple copy case. Dual total correlation decreases as network size increases in the repeated variable scenario, and TSE complexity increases as the network size increases (demonstrated below). The renormalization of TSE complexity suggested by Olbrich et al. (2008), in order to induce a linearity property for TSE complexity of unions of network ensembles observed in dual total correlation, has the effect of decreasing the complexity measure with increasing network size in the multiple copies of one variable scenario.

TSE complexity (C_TSE) of a network of n variables is defined as a linear combination of n − 1 terms:
\[
C_{\mathrm{TSE}}(X) = \sum_{k=1}^{n-1} \frac{k}{n}\, C^{(k)}(X)
\]

where the terms C^(k)(X) are defined in terms of TC as
\[
C^{(k)}(X) = \mathrm{TC}(X) - \frac{n}{\binom{n}{k}\,k} \sum_{\substack{M \subseteq \{1,\ldots,n\} \\ |M| = k}} \mathrm{TC}(X_{M_1}, X_{M_2}, \ldots, X_{M_k}).
\]
Multiinformation (TC) is equivalent to C^(1) and dual total correlation/excess entropy is C^(n−1). In the case described above, where a network of n copies of the same variable X is analyzed, since TC returns (n − 1)h(X) we see that dual total correlation is
\[
C^{(n-1)}(\underbrace{X; X; \ldots; X}_{n}) = (n-1)h(X) - \frac{n}{n-1}(n-2)h(X) = \left(\frac{1}{n-1}\right)h(X)
\]

which decreases as the network size (number of copies of X) grows, approaching zero. The general case for C^(k) is
\[
C^{(k)}(\underbrace{X; X; \ldots; X}_{n}) = (n-1)h(X) - \frac{n}{\binom{n}{k}\,k}\binom{n}{k}(k-1)h(X) = \left((n-1) - \frac{n}{k}(k-1)\right)h(X)
\]
and so the general TSE complexity in this case may be written as
\[
C_{\mathrm{TSE}}(\underbrace{X; X; \ldots; X}_{n}) = \sum_{k=1}^{n-1}\frac{k}{n}\left((n-1) - \frac{n}{k}(k-1)\right)h(X) = \sum_{k=1}^{n-1}\left(\frac{k(n-1)}{n} - (k-1)\right)h(X) = \sum_{k=1}^{n-1}\frac{n-k}{n}\,h(X) = \left(\frac{n-1}{n} + \frac{n-2}{n} + \cdots + \frac{2}{n} + \frac{1}{n}\right)h(X) = \frac{n-1}{2}\,h(X)
\]

which grows as the number (n) of copies of X increases. The normalization of TC defined above as NMI suggests a similar normalization scheme of the continuum of TSE complexity measures could be applied to induce a similar self-consistency property in the broader TSE complexity space. Rather than a direct normalization that could obviously be applied to the whole TSE complexity measure, a redefinition of the TSE complexity in terms of the linear combination of its N − 1 components (for a network of size N) may induce the property in question while also returning NMI instead of TC at one end of the TSE spectrum. However, such a redefinition of TSE complexity is not immediately obvious, and a formal information geometric analysis of a new class of complexity measures is outside the scope of the present paper.

3. Experimental methodology

3.1. Numerical experiments

Two sets of numerical experiments were designed to investigate how the proposed NMI behaves relative to averaged pairwise mutual information (PMI) interactions. In the first set of experiments, the mutual information measures were applied to
a simple feed-forward neural network driven by Poisson arrival processes with a full complement of randomized connectivities, and different connection strengths were tested. In the second experiment a varying percentage of connections were set to be zero in the same feed-forward network, with the non-zero randomized connectivities drawn from a static distribution. The first simulation is analogous to increasing synaptic strength (i.e. increased connectivity between existing connections) in a neural network, and the second simulation is analogous to increasing synaptogenesis (i.e. increased numbers of connections) in a network.

The network simulation is modeled by a 3 layer feed-forward neural network, with each layer consisting of 6 nodes (analogous to electrodes in the MEA paradigm). The first layer is driven by 6 Poisson arrival processes, denoted by x0, with binary spikes arriving at a rate of 2 per second at a sampling rate of 1000/3 Hz (so approximately 0.6% of each driving process time interval corresponds to a spike). The (i, j)th element of a 6 × 6 connectivity matrix A01 is the probability that the ith incoming spike triggers the jth node during a given time bin. If a node is triggered, a spike from the node propagates to the second layer with probabilities similarly defined by a connectivity matrix A12, and from the second to the third layer by A23. A node triggered by multiple incoming spikes is considered to be triggered only once during a given time bin. Elements of the connectivity matrices are drawn from a beta distribution B(α, β) with mean α/(α + β). In the first set of simulations, values of α are tested ranging from 0.05 to 0.55 in increments of 0.05, and β is given in terms of α as β = 1 − α so that the mean of the beta distribution is α. As α increases the average of the connectivity strengths increases, increasing the probability that a given spike is propagated through the network. In this simulation all neurons in adjacent layers are theoretically connected (i.e. have a nonzero chance of triggering a spike in a subsequent layer) but the average synaptic strength is determined by the model parameter α; at early stages (low α) the beta distribution is heavily skewed towards zero so most connectivity strengths are effectively zero.

In the second set of simulations, the connectivity strength beta distribution is fixed at α = 0.5 and β = 1, and fractions ranging from 0 to 0.9 of the connectivity strengths are randomly set to zero. In this case, the synaptic strength model distribution is left skewed with a mean of 1/3. When more of the connectivities are set to zero, the simulation is meant to concur with early stages of neuronal development where few connections (and hence spiking patterns) are observed. Vice versa, more connectivities retained corresponds to a later stage of development.

Each simulation was repeated 100 times for each parameter condition in each simulation experiment, which proved to be a sufficient number of repetitions to infer average parameter effects on the networks given the observed variability in the computed functional quantities. All computations were performed in Octave 4.0.0 running in 64-bit Windows 7 on an Intel(R) Core(TM) i5-4590 CPU 3.3 GHz processor.
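A sketch of this feed-forward simulation follows, written in Python rather than the Octave used in the study. The Bernoulli-per-bin approximation of the Poisson drive, the random seed, the function names, and the decision to treat the 18 layer nodes (excluding the driving processes) as the observed "electrodes" are assumptions of this sketch, not details taken from the original code.

import numpy as np

def propagate(spikes_in, A, rng):
    """One feed-forward layer: node j fires in a bin if at least one incoming
    spike from node i triggers it, each independently with probability A[i, j];
    multiple triggers still count as a single spike in that bin."""
    T, _ = spikes_in.shape
    out = np.zeros((T, A.shape[1]), dtype=np.int8)
    for t in range(T):
        active = spikes_in[t].astype(bool)
        if active.any():
            p_silent = np.prod(1.0 - A[active], axis=0)   # P(no trigger at node j)
            out[t] = rng.random(A.shape[1]) < (1.0 - p_silent)
    return out

def simulate(alpha, frac_zero=0.0, T=20_000, n=6, p_drive=0.006, seed=0):
    """Three layers of n nodes driven by n Poisson-like input trains (x0).
    Connection strengths are Beta(alpha, 1 - alpha) for the first experiment;
    frac_zero > 0 with a fixed Beta(0.5, 1) mimics the second experiment."""
    rng = np.random.default_rng(seed)
    x0 = (rng.random((T, n)) < p_drive).astype(np.int8)   # ~2 spikes/s at 1000/3 Hz
    layers, spikes = [], x0
    for _ in range(3):
        if frac_zero == 0.0:
            A = rng.beta(alpha, 1.0 - alpha, size=(n, n))
        else:
            A = rng.beta(0.5, 1.0, size=(n, n)) * (rng.random((n, n)) >= frac_zero)
        spikes = propagate(spikes, A, rng)
        layers.append(spikes)
    return np.hstack(layers)   # (T x 18) binary matrix of simulated "electrodes"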
3.2. Evaluation of NMI in recordings from neural networks in vitro

All animal procedures were approved by the National Health and Environmental Research Laboratory Institutional Animal Care and Use Committee. The data for these experiments were derived from recently published studies (Brown et al., 2016). Briefly, primary cultures of cortical neurons were prepared from 0–24 h old Long-Evans rats. Cells were plated into 48 well MEA plates containing 16 microelectrodes per well (Axion Biosystems, Atlanta, GA) and 15 min of activity were recorded from in vitro neural cell cultures at 2, 5, 7, 9, and 12 days in vitro (DIV) using an Axion Biosystems Maestro amplifier, Middleman and AxIS software (version 1.9 or later). Activity was recorded using a gain of 1200x and a sampling rate of 12.5 kHz/channel. After passing the signal
through a Butterworth band-pass filter (300–5000 Hz), online spike detection (threshold = 8x root mean square (rms) noise on each channel) was done with the AxIS adaptive spike detector. Each plate contained control wells and wells that were treated with different concentrations of neuroactive/neurotoxic compounds.

For NMI, spikes were binned into user defined time intervals of ∆t seconds at each channel; if one or more spikes was detected in a given interval within a single channel, the observation value associated with that time interval and channel was set to 1. Otherwise, the value was set to zero. This procedure results in binary signals sampled at 1/∆t Hz. NMI returns a scalar, positive value for each 16-channel recording of a particular culture on a particular day. As the DIV of recordings increased, NMI tended to grow monotonically with the culture in the plate control groups. Electrode channels in a recording where less than 0.01% of binning windows contained spikes were removed from analysis, corresponding to self entropy rates below 0.491 bits per second and less than 2 spikes per minute for the chosen binning interval of 3 ms. The (square) root of the area under the curve (RAUC) over the 5 days of recordings (estimated by the trapezoidal rule; Yeh & Kwan, 1978) was used as a comparative measure for developmental effects in a concentration–response experiment.

The concentration–response experimental data was derived from 4 48-well plates with embedded MEAs. Each plate had 12 neutral control wells and the remaining 36 wells were treated with 6 chemicals at 6 increasing concentration levels. The experimental rationale for the selected concentrations and test chemicals is described in Brown et al. (2016); in this manuscript we report results on 4 of those chemicals: acetaminophen, bisindolymaleimide 1, mevastatin, and loperamide.

In addition to the proposed NMI approach, additional quantitative features of binary spike signals currently used in MEA analysis were computed; this included the average Pearson's correlation as the mean of the cross-correlations between active electrodes (where at least 5 spikes per minute are observed), and the mean firing rate (MFR) amongst active electrodes as a non-synchrony related measure of network activity. The active electrode cutoff used for correlation is not directly translatable to the NMI entropy based cutoff because firing rates used for the entropy based cutoff are derived from a windowed spike list, and may actually reduce the effective firing rate. Complete details regarding these measures can be found in Brown et al. (2016).
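A brief sketch of the binning and RAUC computations just described, for illustration only (the function names, the example DIV vector, and the choice of NumPy are assumptions of this sketch; the study's own analysis pipeline was written in Octave):

import numpy as np

def bin_spikes(spike_times_by_channel, duration_s, dt=0.003):
    """Binary (n_bins x n_channels) matrix: entry is 1 if the channel had at
    least one spike time stamp (in seconds) inside the corresponding dt bin."""
    n_bins = int(np.ceil(duration_s / dt))
    binary = np.zeros((n_bins, len(spike_times_by_channel)), dtype=np.uint8)
    for ch, times in enumerate(spike_times_by_channel):
        idx = np.minimum((np.asarray(times) / dt).astype(int), n_bins - 1)
        binary[idx, ch] = 1
    return binary

def rauc(div_days, values):
    """Square root of the trapezoidal area under a developmental curve,
    e.g. rauc([2, 5, 7, 9, 12], nmi_per_div) for one hypothetical well."""
    return float(np.sqrt(np.trapz(values, div_days)))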
3.2.1. Determining an information-optimal sampling rate for MEAs

The implementation of normalized mutual information described above relies on binning spikes according to a predetermined time interval ∆t. The sampling rate of the raw electrode recordings was 12.5 kHz, resulting in around 10 µs precision in the time stamps associated with each spike. Since overlap for the present algorithm is determined by coincidence within a given interval, bins that are too small will result in almost no meaningful overlap: there may be a network connection underlying successive spikes on different electrodes, but if the time bins are smaller than the biological latency of induced neural firing then the NMI algorithm will not capture the shared information. On the other hand, bins that are too wide (relative to the raw observed firing rate) will tend to incorporate distinct spikes into single binning intervals. In this low resolution case (large bins) meaningful information will be lost and the NMI values returned will be artificially deflated.

Using recordings from networks on DIV 12, the NMI values for binning intervals ∆t ranging from a tenth of a second to a millisecond were computed and multiplied by the given binning sampling rate 1/∆t to obtain the rate of NMI for each recording in each tested well. In Fig. 1 the mean and standard deviations of the NMI values are plotted against the bin size, and indicate that a bin size of ∆t ≈ 3 ms resulted in an approximately maximum average NMI rate (although the NMI rates from binning intervals between 2 and 5 ms would likely return similar results).

Fig. 1. The rate of NMI at DIV 12 recordings for the control wells of one experimental MEA plate by the log of the binning size ∆t. The vertical dashed line indicates ∆t = 3 ms, the binning interval selected for the concentration response experiments reported here.
The functional relationship observed in the control recordings between NMI rate and the binning interval ∆t was generally duplicated in test chemical recordings with non-trivial spiking. In Fig. 4, significant spiking is recorded at DIV 5 but NMI detects little synchrony in the recordings. The experiment described above was repeated for control well recordings at DIV 5, and no locally optimal binning interval was found in the interval of tested lengths. Further, computed NMI rates were between 2 and 3 orders of magnitude lower than those at DIV 12 for all tested binning intervals, indicating that low NMI values (and other measures of synchrony) at early developmental stages are reflective of actual disconnectivity in the neural network rather than being artifacts of the chosen binning interval caused by differences in spiking lag at varying developmental states. The NMI rate derived from binary spiking signal data sampled according to the bin size of ∆t = 3 ms was used for all recordings evaluated in the present manuscript. It is worth noting that the optimal NMI binning interval found here in ambient recordings is consistent with scales of neural synchrony examined or derived in other experiments (Diesmann et al., 1999; Riehle et al., 1997).

3.3. Estimating component and joint entropies of binary network signals

One consideration that must be taken into account when applying NMI as a measure of shared information is the explosion of the state space of possible firing patterns as network dimensionality grows. Computation of NMI requires an estimate of the joint entropy of the multiple variables in the network. Estimates of signal entropy depend on estimating a probability distribution of the random generating variable from a finite set of observations. The probability distribution is estimated by binning, which becomes more difficult for multi-dimensional signals as dimensionality increases. However, for binary signals binning may result in reasonable estimates of entropy via a tractable computational approach that is described below.

Entropy of a discrete random variable X that takes M possible values depends on the probability mass function p(X) as:
\[
h(X) = \sum_{i=1}^{M} -p(x_i)\log_2 p(x_i), \tag{6}
\]
where the log base 2 is taken to return information theoretic units of entropy as bits (Paninski, 2003; Rényi et al., 1961; Shannon, 1949). In the single-dimensional binary case the number of possible states of the system is 2, and the number of possible states M grows with the dimensionality n as M = 2^n. Thus for a 16 electrode MEA array, we estimate a probability distribution over M = 2^16 possible states. This estimate is quite tractable for a typical MEA neural spike recording; in this manuscript recordings were at least 10 min long and spikes were binned into windows of ∆t = 3 ms, resulting in at least 200,000 temporal observation frames per experiment. Under the uniform distribution (a "lower-bound" for coincidence testing), the rate of coincidence is 1/2^16 and the expected number of state coincidences would be given as:
\[
\frac{200{,}000\,(200{,}000 - 1)}{2 \cdot 2^{16}} = 305{,}174.
\]

Coincidence is defined here as observing the same firing pattern in two observations rather than simply observing more than one spike in a single observation, hence the number of coincidences can actually exceed the number of observations. Simply by coincidence counting a reasonable estimate of the joint entropy is attained according to the criteria laid out by Ma (1981).

It is important to note that entropy estimation via coincidence counting/binning, and hence measures of mutual information, may be unreliable for higher dimensional systems. For example, a 64 electrode binary spiking MEA allows for 2^64 possible states, and the likelihood of coincidences under the uniform distribution becomes so low that one would need close to a 211 day recording binned by ∆t = 3 ms in order to expect one coincidence as described above. In this case multiple coincidences become strong indicators of an organized network with shared information. The important point is that mutual information measures may not be as sensitive in higher dimensional networks, especially in ambient recordings of neural spiking with infrequent spiking activity. Thus, caution should be exercised in using NMI in recordings containing more than 16 electrodes, as the evidentiary value of state coincidences may not increase in proportion to the decrease in likelihood of coincidences for very large networks if the recording length is fixed.

The computational feasibility of joint entropy estimation for MEA derived signals is dependent on the transformation of electrode signals to binary spikes. Any given state of the 16-channel system may be interpreted as a 16-bit number, allowing for easy manipulation using an accumulative array function to count coincident states. The frequency of each number (state) is tallied over the entire length of the recording, and the probability of a given state is estimated by dividing the number of appearances of that state by the total number of observations (time bins in the recording). If a particular pattern is not observed, its estimated probability is zero and hence it need not be included in the summation in Eq. (6). In theory the above process should be immediately extensible to 64 channel MEA recordings in 64-bit computing environments, however one must also consider the sensitivity of the entropy estimates under these circumstances, as described above.

Finally, for more general or continuous signals (i.e. network nodes are no longer binary variables), joint entropy estimates of many variables quickly become a prohibitively expensive computational problem. NMI is only considered here in the context of binary networks, as seen in models of neural spiking. However, binless estimation techniques (Victor, 2002) could enable NMI estimates for continuous valued multivariate networks or time-bin agnostic NMI estimates for binary spike trains.
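The state-counting estimator described in this section can be sketched as follows (in Python rather than the Octave implementation used in the study; the 0.491 bits/s exclusion threshold is the value quoted in Section 3.2, and the function names are illustrative):

import numpy as np

def binary_entropy(p):
    """Entropy (bits) of a Bernoulli(p) variable."""
    return 0.0 if p in (0.0, 1.0) else float(-(p * np.log2(p) + (1 - p) * np.log2(1 - p)))

def nmi_state_counting(binary, dt=0.003, min_entropy_rate=0.491):
    """Plug-in NMI of a (n_bins x n_channels) binary spike matrix (n_channels <= 62).
    Each time bin is packed into a single integer state; joint entropy is then
    estimated from the tally of observed states (unobserved states contribute
    nothing to the sum in Eq. (6)). Channels whose self-entropy rate (bits/s)
    falls below min_entropy_rate are excluded, as described in Section 3.2."""
    h_chan = np.array([binary_entropy(binary[:, j].mean()) for j in range(binary.shape[1])])
    keep = (h_chan / dt) >= min_entropy_rate
    b = binary[:, keep].astype(np.int64)
    n_bins, n_ch = b.shape
    if n_ch < 2:
        return 0.0
    states = b @ (1 << np.arange(n_ch, dtype=np.int64))   # 16-bit-style state code per bin
    counts = np.unique(states, return_counts=True)[1]     # tally only states actually seen
    p = counts / n_bins
    h_joint = float(-(p * np.log2(p)).sum())
    return (h_chan[keep].sum() - h_joint) / (n_ch - 1)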


4. Results

4.1. Numerical simulations indicate mutual information is sensitive to early stage network development

In Fig. 2 the results of the first simulation experiment (meant to model strengthening synapses in a developing network via a feed-forward neural network) are plotted. The NMI is compared to the average of non-zero PMI off-diagonal matrix elements (Fig. 2(a)), the mean of absolute values of non-zero off diagonal elements of the cross-correlation matrix (R) (Fig. 2(b)), and the mean firing rate (MFR) amongst all active nodes (Fig. 2(c)), which model electrodes. Also in Fig. 2 the results of the second simulation experiment are similarly plotted; the average strength of active connections is fixed and drawn from a beta distribution B(0.5, 1) but the number of active connections at each layer ranges from 10% to 100%.

As static measures of network shared information content, NMI is on average greater than mean PMI in both simulations. A similar comparison may not be made between NMI and R and MFR respectively because the quantities are measured in different units; however, NMI tracks closely with R and MFR growth in the first simulation. Instead a dimensionless comparison may be made between all four measures by comparing the relative growth rates at each pair of stages in the simulations. Fig. 3 depicts the average relative growth of each measure between developmental stages by plotting the natural log differences (that is, the logarithm of the ratios) between measures at a given α value and the same measures at the previous α value. For example, at the transition between α = 0.05 and α = 0.10 in Fig. 3(a), indicated by value 1 on the horizontal axis, the NMI and mean PMI logarithmic growth ratios are centered around 1.5. This means that the ratio between NMI (and mean PMI) of simulated recordings from networks specified by α = 0.10 and α = 0.05 is on average approximately e^1.5 ≈ 4.5, whereas the same ratio for average correlation and MFR, with logarithmic growth ratios centered around approximately 0.75, is e^0.75 ≈ 2.1.

In the first simulation (Fig. 3(a)) the tested logarithmic growth ratios for NMI are significantly different from those for R in the first 5 transitions and from MFR in the first 2 transitions (p-values < 0.05 according to a paired Welch's t-test). By the same standard of significance, the logarithmic growth ratios for mean PMI are significantly different from R in all tested transitions and from MFR in transitions 1, 2, 4, 7, and 9 through the last tested ratio (because of very small variation, the growth ratios of mean PMI were significantly different from other measures at later developmental stages, although this is not visible in Fig. 3(a)). In the second simulation (Fig. 3(b)) the tested logarithmic growth ratios for NMI and mean PMI are significantly different from the tested logarithmic growth ratios for MFR and R (p-values < 0.05 according to a paired Welch's t-test) at the first transition.

These results suggest that the mutual information measures are more sensitive to network growth at early developmental stages than MFR and linear correlation, in the sense that the same increases in network complexity – especially in terms of the likelihood of a spike propagating through the network, interpreted as synaptic strength – result in a larger (and thus more likely to be detected) relative increase in the mutual information measures.
4.2. Measures of developing networks in vitro

Section 3.2 described an in vitro concentration response experiment for neurodevelopmental effects of chemicals on network function. Fig. 4 depicts the average development of all four measures amongst the controls within each plate. There is variability in the developmental curves across plates, likely due to variations in experimental conditions, as plates were prepared and analyzed at different times over the course of several months.


[Fig. 2 panels: (a) NMI vs. Mean PMI, (b) NMI vs. Mean R, (c) NMI vs. MFR for the first simulation; (d) NMI vs. Mean PMI, (e) NMI vs. Mean R, (f) NMI vs. MFR for the second simulation.]

Fig. 2. Plots of NMI vs. the mean of non-zero, off-diagonal elements of the PMI (2(a), 2(d)), the mean (absolute) cross correlation matrices (2(b), 2(e)), and vs. the mean firing rate over all active nodes (2(c), 2(f)) for both simulation experiments. In the first simulation (2(a), 2(b), 2(c)), nodes are meant to simulate electrodes, and as α grows the strength of network connections interpreted as probabilities of spike propagation (drawn from B(α, 1 − α )) in the simulated feed-forward network grows. In the second simulation (2(d), 2(e), 2(f)), as the percentage of active nodes grows the number of non-zero connections in the simulated feed-forward network grows proportionally. Here the beta distribution that connectivities are randomly drawn from is fixed at B(0.5, 1), so the mean probability of an incident spike triggering a spike in a connected node via an active connection is 0.5/(0.5 + 1) = 1/3. Error bars indicate standard deviations of each result across 100 randomized replicates.
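For readers who want a concrete picture of the simulated networks in Fig. 2, the following Python sketch reproduces the general mechanism described in the caption: connection strengths are treated as probabilities of spike propagation and drawn from a beta distribution B(α, 1 − α). The layer sizes, external input rate, and recording length used here are illustrative assumptions, not the values used in the paper.

import numpy as np

rng = np.random.default_rng(0)

def simulate_feedforward(alpha, n_layers=4, nodes_per_layer=4,
                         n_bins=20_000, input_rate=0.05):
    # Binary feed-forward network in the spirit of simulation 1: connection strengths
    # are spike-propagation probabilities drawn from Beta(alpha, 1 - alpha), 0 < alpha < 1.
    weights = [rng.beta(alpha, 1.0 - alpha, size=(nodes_per_layer, nodes_per_layer))
               for _ in range(n_layers - 1)]
    spikes = np.zeros((n_bins, n_layers * nodes_per_layer), dtype=bool)
    for t in range(n_bins):
        # External drive (an assumption): first-layer nodes spike independently at input_rate.
        layer = rng.random(nodes_per_layer) < input_rate
        spikes[t, :nodes_per_layer] = layer
        for k, w in enumerate(weights):
            # A downstream node spikes if at least one incident spike propagates to it.
            propagated = (rng.random(w.shape) < w) & layer[:, None]
            layer = propagated.any(axis=0)
            start = (k + 1) * nodes_per_layer
            spikes[t, start:start + nodes_per_layer] = layer
    return spikes  # rows: time bins; columns: nodes modeling electrodes

The resulting spikes array can be passed to a state-counting estimator such as the one sketched in Section 3 above to estimate joint entropy and total correlation for each simulated recording.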

Note that the control MFR (Fig. 4) measured at DIV 5 is relatively high, whereas the NMI, mean PMI and R values (Figs. 4(a), 4(b) and 4(c)) are either zero or close to zero at DIV 5. In these instances several active electrodes that detect significant neural spiking may contribute to a high firing rate; however, in an immature culture that has not completed synaptogenesis (Harrill, Robinette, & Mundy, 2011) the correlation and shared information (NMI and mean PMI) will be small, as there is insufficient temporal coincidence between the spikes to indicate the presence of a connected network. The synchronous measures detect a delayed developmental curve relative to the non-synchronous MFR measure. This suggests that synaptogenesis, measured by network connectivity either through correlation or shared information, is distinct from active firing by neurons. Further, growth in the MI measures (Figs. 4(a) and 4(b)) is somewhat delayed at DIV 7 and 9 relative to the mean linear correlation (Fig. 4(c)). This follows from the lack of significant increases in MFR on average across DIV 5, 7, and 9: because the MI measures also track with MFR, it is consistent that, even with growing linear correlation, their growth should lag behind the growth of mean R.

Fig. 5 plots the RAUC values of various network measures for a concentration series experiment on four test chemicals. Figs. 5(a) and 5(b) depict concentration responses for acetaminophen and bisindolymaleimide 1, respectively. NMI, mean PMI, MFR, and mean R all returned similar results as measures of network development for these chemicals. Acetaminophen is used as a negative control compound in this and many other in vitro assays (McConnell, McClain, Ross, LeFew, & Shafer, 2012; Valdivia et al., 2014). Bisindolymaleimide 1 is a PKC inhibitor that decreases both neurite outgrowth and neural network activity, so the common results of all measures in this case (and for acetaminophen) were consistent with prior expectations. Mevastatin (depicted in Fig. 5(c)) is a known inhibitor of synaptogenesis in vitro (Harrill et al., 2011); in the concentration series experiment the NMI, mean PMI, and R quantities were sensitive to the effect of mevastatin starting at a dose of about 1 µM, whereas MFR was relatively unchanged at concentrations up to 3 µM. Lower doses of mevastatin may have inhibited synaptogenesis without affecting the average firing rates of neurons, illustrating that the synchronous measures NMI and mean PMI, like R, are sensitive to network connectivity even when significant neural spiking is observed in active electrodes.

The average correlation measure R returned an outlier from a well dosed with 10 µM mevastatin, evident in the activity spike and large error bar at this concentration in Fig. 5(c). A recording taken at DIV 7 in the spurious well detected a small number of spikes localized around one or two time points (over the course of ≈ 10 min) that happened to coincide across all 16 electrodes, resulting in many cross-correlation measurements at or close to 1. Only one electrode detected significant spiking otherwise (334 spikes over the length of the recording) and had very low (< 0.1) correlation with the other, mostly inactive electrodes. These few global spikes may be an external or signal processing artifact, as such connectivity was not observed in other cultures at DIV 7. However, even if it were a real biological effect, the almost zero firing rate observed by most electrodes is a feature to which linear correlation is not sensitive (as long as the few spikes present pushed each channel into the active electrode category). On the other hand, even without the entropy threshold used to discard electrodes, the MI measures will (and do) return zero or close to zero values for these recordings because of the trivially low self-information in most of the electrodes.


[Fig. 3 panels: (a) Relative growth: simulation 1; (b) Relative growth: simulation 2.]

Fig. 3. Plots of the logarithmic difference (log of the ratios) of successive values of NMI, Mean PMI, R, and MFR between successive stages of growth in each simulation experiment. Error bars depict the standard error of the means for each logarithmic growth ratio for 100 samples. In Fig. 3(a) each integer value represents a successive transition between values of α in the first set of simulations, ∆α = 0.05: 1 is the (natural) log of the ratios of measures between α = 0.10 and 0.05, 2 is the same ratio for α = 0.15 and 0.10, etc. Similarly, in Fig. 3(b) integers on the horizontal axis represent successive transitions between percentages of active connections in the second simulation.

The outlier MFR RAUC value indicated by the spike in MFR activity at the same concentration (10 µM) of mevastatin came from a different plate replicate, in which one well had a single electrode with observed firing and no other active electrodes.

In the concentration series for loperamide, depicted in Fig. 5(d), a departure from plate controls in the MFR measure at and above 1 µM was observed at a lower concentration than the corresponding response in the mean R measure. Rather than tracking the increase in R observed at 1 µM, NMI and mean PMI fall between the two measures, which is consistent with mutual information measures being sensitive to both linear correlation and the information content of the spike trains. NMI and mean PMI are not independent of R or of MFR, since all are computed from the same recordings. Thus, while not representing a significant departure from control behavior at this concentration, the lower NMI and mean PMI trending downward with MFR at 1 µM is nonetheless a real consequence of the sensitivity of both measures to the information rate and content of neural spike trains. Correlation, on the other hand, is sensitive not to decreases in network activity but only to network connectivity.

Fig. 4. Average developmental curves and standard deviation of four measures of network information content or complexity across 4 experimental plates. Each plate included 12 control wells; the values plotted here are averaged across plates. Error bars depict the standard deviation of each quantity across 12 × 4 = 48 replicates. The NMI ontogeny is directly compared against mean PMI (4(a)), mean R (4(b)), and MFR (4(c)).

If a network is well connected, even a few observed spikes over a relatively long recording may result in a relatively high average correlation, despite the fact that there may not be much interesting information content relative to a more rapidly firing network with rich patterns.

While the average concentration responses of NMI and mean PMI were similar in all of the reported experiments, more variability was observed in the mean PMI measure. Both the mean PMI of the control well recordings and the mean PMI of test recordings as a percentage of the median of the controls were more variable than their NMI counterparts, resulting in higher variability and some dramatic outliers in the mean PMI measure for the concentration response experiments, observable in Fig. 5. Furthermore, the standard deviation of the mean PMI recorded in control wells, as a fraction of its mean, was on average 127% of that of the NMI at DIV 12 and 120% of that of the NMI as measured by RAUC.
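The RAUC values plotted in Figs. 4 and 5 summarize each well's developmental trajectory as a single number. The paper's exact RAUC definition is given in its methods section and is not reproduced here; the Python sketch below is only a plausible reconstruction, assuming trapezoidal integration of a measure over DIV and normalization to the median of the plate controls.

import numpy as np

def rauc_percent_of_control(div, well_values, control_values):
    # div:            days in vitro at which recordings were made, e.g. [5, 7, 9, 12]
    # well_values:    one treated well's measure (e.g. NMI) at each DIV
    # control_values: (n_control_wells, n_div) array of the same measure for plate controls
    auc_well = np.trapz(well_values, x=div)                  # area under the developmental curve
    auc_controls = np.trapz(control_values, x=div, axis=1)   # one AUC per control well
    return 100.0 * auc_well / np.median(auc_controls)        # percent of plate-control median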


[Fig. 5 panels: (a) acetaminophen, (b) bisindolymaleimide 1, (c) mevastatin, (d) loperamide concentration responses.]

Fig. 5. Concentration series of RAUC values for acetaminophen (5(a)), bisindolymaleimide 1 (5(b)), mevastatin (5(c)), and loperamide (5(d)). Response is plotted as the percentage of the median of control values by plate; error bars depict standard deviations of measurements at each concentration value over multiple replicates. Note that the horizontal positions of the responses are slightly offset from the true concentration value according to the tested measure, so that error bars of different measures can be compared. This offset does not reflect a true difference in tested concentrations; each measure is derived from exactly the same recordings over each MEA.

5. Discussion

5.1. Implications of NMI for future work

The overarching scientific problem that motivated the work presented in this paper is the problem of making comparisons of network complexity between developing and growing neural networks. More specifically, NMI is introduced as a new approach to measuring multivariate mutual information in networks where nodes are characterized by binary spike trains. It is demonstrated from information theoretic principles that NMI can be justified as a dimensional rescaling of multiinformation/total correlation (TC). NMI of a growing network is shown to satisfy certain boundary conditions, serving as evidence that NMI is a multivariate measure of mutual information that is robust with respect to changes in network size. NMI and the approach of aggregating non-zero pairwise mutual information (PMI) values are shown to be more sensitive measures of network development during early stages in simulations than average linear correlation and mean firing rate (see Fig. 3). Further, it is observed in simulations (see Fig. 2(a)) that as the probability of an incident spike propagating through a feed forward network approaches one, NMI and mean PMI asymptotically approach the same value, providing further numerical evidence that NMI is a consistent measure of mutual information in a multivariate setting.

5.1.1. NMI as a discriminative measure of developmental neurotoxicity effects

NMI and mean PMI are shown to be more sensitive to concentration effects than the nonsynchronous MFR measure in the case of mevastatin (Fig. 5(c)), and also to be more sensitive to concentration effects than the synchronous, but linear, mean Pearson's correlation (R) in the case of loperamide (Fig. 5(d)). Further, the exact case described earlier as a motivation for MI methods over linear correlation occurred in the biological experiment in Fig. 5(c): one or two (artifactual) coincident spikes happened to be observed across the network, resulting in artificially high average correlation but almost no meaningful shared information. This suggests that the higher order synchrony measures are more sensitive to various types of developmental effects in neural networks than nonsynchronous and first order synchronous measures, in the sense that they are by definition capable of detecting features of network complexity that may otherwise be passed over by the comparative measures. Furthermore, NMI is seen to be less variable in both the control and test recordings than mean PMI, indicating that NMI may be a preferable higher order measure of network synchrony.


Also, NMI is a single scalar measure: a normalization of TC, which is thought to measure the correlation of a network at all orders of interactions (not just pair-wise interactions) (Cover & Thomas, 2012; Schneidman, Berry, Segev, & Bialek, 2006; Schneidman, Still, Berry, Bialek, et al., 2003). Deriving a value for the central tendency of PMI interactions (i.e. mean PMI) requires an additional assumption by the user regarding the distribution of the interactions: should the mean, median, or even mode be reported? When network size is small enough that the joint entropies can be reliably estimated from recordings, NMI may be preferable because it requires one less assumption on the part of the experimenter.

5.1.2. Implications for biological and artificial neural networks

In order to draw a connection between the physical Ising model and simulations of biological neural networks, Schneidman et al. (2006) state that fixing the typical interaction strength within a fully connected network of neurons and then adding neurons to the network is analogous to lowering the temperature of a physical Ising model of fixed size. They observe that the sum of individual-cell entropies grows in proportion to the number of cells N, whereas the TC of the network grows in proportion to N(N − 1) (by solving "the Ising model in perturbation theory"). Extrapolating from their model of network growth with weak pairwise interactions, NMI would grow in proportion to network size N. Schneidman et al. (2006) state that multiinformation (TC in the present notation, I_N in their notation) grows quadratically with N while the sum of independent entropies (S_1 in their notation) grows linearly with N; but since TC is defined as S_1 less the total entropy, these two scalings cannot both persist indefinitely without the total entropy becoming negative (see the short derivation at the end of this subsection). It may be worth reexamining the "freezing out" effect hypothesized for the total entropy in light of the interpretation of NMI proposed here. On the other hand, NMI and mean PMI approached approximately the same information rate in simulations (Section 4.1) as networks became more fully connected, suggesting that NMI is closer to the average of pair-wise mutual information values in a strongly connected network. In this case, as network size N grows, NMI will ultimately be bounded by a constant multiple of the upper limit of the pair-wise mutual information rate (again, a consequence of fixed time bin sizes). Finally, as illustrated in Remark 2.2, growth of a "network" when it is deterministically representing a scene or system with fixed information content also returns bounded NMI.

The argument used in Property 2.1 to motivate the normalization factor in NMI from TC – identical copies of a network considered together as a single network – is reminiscent of the two-replica overlap parameter used as a quantifier in Ising models of spin glasses and, by extension, in neural network interpretations of Hopfield models (Amit, Gutfreund, & Sompolinsky, 1985; Hopfield, 1982). Here, however, the idea of copying the network is made more deterministic by exactly reproducing the random variables comprising the nodes rather than reproducing weights drawn randomly from a distribution. Neural networks under some type of Hebbian learning rule are computationally versatile (Agliari, Barra, Galluzzi, Guerra, & Moauro, 2012; Agliari et al., 2015; Sollich, Tantari, Annibale, & Barra, 2014), and regions of criticality are shown to arise out of synthetic and empirical models of hierarchical networks (Moretti & Muñoz, 2013).
NMI and its conceptual basis as a shared information quantifier of a network of varying size may be of some interest to the study of such networks and their statistical mechanical analogues which have important implications in machine learning fields (Amit, 1992; Barra, Bernacchia, Santucci, & Contucci, 2012; Coolen, Kühn, & Sollich, 2005; Goodfellow, Bengio, & Courville, 2016). Finally, it may be interesting to extend ideas presented here to non-symmetric measures of network synchrony. Mutual information is a symmetric measure, so NMI is somewhat agnostic to questions of causality and directedness in an underlying network topology.
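To make the scaling argument above explicit, the following short derivation (a sketch in the notation quoted from Schneidman et al. (2006), with a and b standing for unspecified positive constants) shows why linear growth of the independent entropy and quadratic growth of TC cannot both persist as N grows.

\[
TC = S_1 - H, \qquad S_1 = \sum_{i=1}^{N} H(X_i), \qquad H = H(X_1, \ldots, X_N) \geq 0 .
\]
\[
S_1 \approx a N \ \text{and} \ TC \approx b N (N - 1) \;\Longrightarrow\; H \approx a N - b N (N - 1) < 0 \quad \text{for } N > 1 + a/b ,
\]
which contradicts \(H \geq 0\); hence the quadratic regime must eventually saturate, consistent with the hypothesized "freezing out" of the total entropy.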


5.2. Conclusions

NMI is a new approach to quantifying shared information in a multivariate network, constituting a simple normalization of TC (total correlation, also called multi-information) that is motivated by behaviors of the measure under various types of network growth. Because NMI is scaled according to network size, its growth is bounded under several investigated scenarios of network growth where the underlying generative information content that a neural network is representing remains constant. This is in contrast to TC, which grows without bound under those scenarios, and to MMI, which exhibits more complicated behavior and may be positive or negative. NMI is applied as a measure of network development in simulated feed-forward networks and in biological experiments on cultures of neurons. In the latter experiment, NMI was shown (along with mean PMI) to be a discriminative measure of developmental effects caused by exposure to neuroactive compounds.

Acknowledgment

KB and CG were supported by ORISE Fellowships.

References

Abdallah, S. A., & Plumbley, M. D. (2012). A measure of statistical complexity based on predictive information with application to finite spin systems. Physics Letters A, 376(4), 275–281.
Agliari, E., Barra, A., Galluzzi, A., Guerra, F., & Moauro, F. (2012). Multitasking associative networks. Physical Review Letters, 109(26), 268101.
Agliari, E., Barra, A., Galluzzi, A., Guerra, F., Tantari, D., & Tavani, F. (2015). Retrieval capabilities of hierarchical networks: from Dyson to Hopfield. Physical Review Letters, 114(2), 028103.
Amit, D. J. (1992). Modeling brain function: The world of attractor neural networks. Cambridge University Press.
Amit, D. J., Gutfreund, H., & Sompolinsky, H. (1985). Storing infinite numbers of patterns in a spin-glass model of neural networks. Physical Review Letters, 55(14), 1530.
Ay, N., Olbrich, E., Bertschinger, N., & Jost, J. (2006). A unifying framework for complexity measures of finite systems. In Proceedings of ECCS06. Oxford, UK: European Complex Systems Society.
Barra, A., Bernacchia, A., Santucci, E., & Contucci, P. (2012). On the equivalence of Hopfield networks and Boltzmann machines. Neural Networks, 34, 1–9.
Biffi, E., Regalia, G., Menegon, A., Ferrigno, G., & Pedrocchi, A. (2013). The influence of neuronal density and maturation on network activity of hippocampal cell cultures: a methodological study. PLoS One, 8(12), e83899.
Boes, J. L., & Meyer, C. R. (1999). Multi-variate mutual information for registration. In Medical image computing and computer-assisted intervention—MICCAI99 (pp. 606–612). Springer.
Bologna, L. L., Pasquale, V., Garofalo, M., Gandolfo, M., Baljon, P. L., Maccione, A., . . . , & Chiappalone, M. (2010). Investigating neuronal activity by SPYCODE multichannel data analyzer. Neural Networks, 23(6), 685–697.
Borst, A., & Theunissen, F. E. (1999). Information theory and neural coding. Nature Neuroscience, 2(11), 947–957.
Bove, M., Genta, G., Verreschi, G., & Grattarola, M. (1996). Characterization and interpretation of electrical signals from random networks of cultured neurons. Technology and Health Care: Official Journal of the European Society for Engineering and Medicine, 4(1), 77–86.
Brown, J. P., Hall, D., Frank, C., Wallace, K., Mundy, W. R., & Shafer, T. J. (2016). Evaluation of a microelectrode array-based assay for neural network ontogeny using training set chemicals. Toxicological Sciences, 154(1), 126–139.
Buzsáki, G., & Draguhn, A. (2004). Neuronal oscillations in cortical networks. Science, 304(5679), 1926–1929.
Charlesworth, P., Cotterill, E., Morton, A., Grant, S. G., & Eglen, S. J. (2015). Quantitative differences in developmental profiles of spontaneous activity in cortical and hippocampal cultures. Neural Development, 10(1), 1.
Chiappalone, M., Bove, M., Vato, A., Tedesco, M., & Martinoia, S. (2006). Dissociated cortical networks show spontaneously correlated activity patterns during in vitro development. Brain Research, 1093(1), 41–53.
Coolen, A. C., Kühn, R., & Sollich, P. (2005). Theory of neural information processing systems. OUP Oxford.
Cotterill, E., Hall, D., Wallace, K., Mundy, W. R., Eglen, S. J., & Shafer, T. J. (2016). Characterization of early cortical neural network development in multiwell microelectrode array plates. Journal of Biomolecular Screening, 21(5), 510–519.


Cover, T. M., & Thomas, J. A. (2012). Elements of information theory. John Wiley & Sons.
Crossley, N. A., Mechelli, A., Scott, J., Carletti, F., Fox, P. T., McGuire, P., & Bullmore, E. T. (2014). The hubs of the human connectome are generally implicated in the anatomy of brain disorders. Brain, 137(8), 2382–2395.
Cutts, C. S., & Eglen, S. J. (2014). Detecting pairwise correlations in spike trains: an objective comparison of methods and application to the study of retinal waves. The Journal of Neuroscience, 34(43), 14288–14303.
Delorme, A., Palmer, J., Onton, J., Oostenveld, R., & Makeig, S. (2012). Independent EEG sources are dipolar. PloS One, 7(2), e30135.
Diesmann, M., Gewaltig, M.-O., & Aertsen, A. (1999). Stable propagation of synchronous spiking in cortical neural networks. Nature, 402(6761), 529–533.
Eckhorn, R., Reitboeck, H. J., Arndt, M., & Dicke, P. (1990). Feature linking via synchronization among distributed assemblies: simulations of results from cat visual cortex. Neural Computation, 2(3), 293–307.
Engel, A. K., Fries, P., & Singer, W. (2001). Dynamic predictions: oscillations and synchrony in top–down processing. Nature Reviews Neuroscience, 2(10), 704–716.
Engel, A. K., & Singer, W. (2001). Temporal binding and the neural correlates of sensory awareness. Trends in Cognitive Sciences, 5(1), 16–25.
Fano, R. M. (1961). Transmission of information: A statistical theory of communication. MIT Press.
Fries, P., Nikolić, D., & Singer, W. (2007). The gamma cycle. Trends in Neurosciences, 30(7), 309–316.
Garofalo, M., Nieus, T., Massobrio, P., & Martinoia, S. (2009). Evaluation of the performance of information theory-based methods and cross-correlation to estimate the functional connectivity in cortical networks. PloS One, 4(8), e6482.
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT Press.
Gourévitch, B., & Eggermont, J. J. (2007). Evaluating information transfer between auditory cortical neurons. Journal of Neurophysiology, 97(3), 2533–2543.
Gregoriou, G. G., Gotts, S. J., Zhou, H., & Desimone, R. (2009). High-frequency, long-range coupling between prefrontal and visual cortex during attention. Science, 324(5931), 1207–1210.
Gross, J., Schmitz, F., Schnitzler, I., Kessler, K., Shapiro, K., Hommel, B., & Schnitzler, A. (2004). Modulation of long-range neural synchrony reflects temporal limitations of visual attention in humans. Proceedings of the National Academy of Sciences of the United States of America, 101(35), 13050–13055.
Han, T. S. (1978). Nonnegative entropy measures of multivariate symmetric correlations. Information and Control, 36, 133–156.
Han, T. S. (1980). Multiple mutual informations and multiple interactions in frequency data. Information and Control, 46, 26–45.
Harrill, J. A., Robinette, B. L., & Mundy, W. R. (2011). Use of high content image analysis to detect chemical-induced changes in synaptogenesis in vitro. Toxicology in Vitro, 25(1), 368–387.
Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, 79(8), 2554–2558.
Illes, S., Fleischer, W., Siebler, M., Hartung, H.-P., & Dihné, M. (2007). Development and pharmacological modulation of embryonic stem cell-derived neuronal network activity. Experimental Neurology, 207(1), 171–176.
Jakulin, A., & Bratko, I. (2003). Quantifying and visualizing attribute interactions. ArXiv Preprint cs/0308002.
Jensen, O., Kaiser, J., & Lachaux, J.-P. (2007). Human gamma-frequency oscillations associated with attention and memory. Trends in Neurosciences, 30(7), 317–324.
Johnstone, A. F., Gross, G. W., Weiss, D. G., Schroeder, O. H.-U., Gramowski, A., & Shafer, T. J. (2010). Microelectrode arrays: a physiologically based neurotoxicity testing platform for the 21st century. Neurotoxicology, 31(4), 331–350.
Knox, C. K. (1981). Detection of neuronal interactions using correlation analysis. Trends in Neurosciences, 4, 222–225.
Kumar, A., Rotter, S., & Aertsen, A. (2010). Spiking activity propagation in neuronal networks: reconciling different perspectives on neural coding. Nature Reviews Neuroscience, 11(9), 615–627.
Lange, O. F., & Grubmüller, H. (2006). Generalized correlation for biomolecular dynamics. Proteins: Structure, Function, and Bioinformatics, 62(4), 1053–1061.
Li, W. (1990). Mutual information functions versus correlation functions. Journal of Statistical Physics, 60(5–6), 823–837.
Ma, S.-k. (1981). Calculation of entropy from data of motion. Journal of Statistical Physics, 26(2), 221–240.
Matsuda, H. (2000). Physical nature of higher-order mutual information: Intrinsic correlations and frustration. Physical Review E, 62(3), 3096.
McConnell, E. R., McClain, M. A., Ross, J., LeFew, W. R., & Shafer, T. J. (2012). Evaluation of multi-well microelectrode arrays for neurotoxicity screening using a chemical training set. Neurotoxicology, 33(5), 1048–1057.
McGill, W. J. (1954). Multivariate information transmission. Psychometrika, 19(2), 97–116.
Moretti, P., & Muñoz, M. A. (2013). Griffiths phases and the stretching of criticality in brain networks. Nature Communications, 4.
Nam, Y., & Wheeler, B. C. (2011). In vitro microelectrode array technology and neural recordings. Critical Reviews in Biomedical Engineering, 39(1).

Obien, M. E. J., Deligkaris, K., Bullmann, T., Bakkum, D. J., & Frey, U. (2015). Revealing neuronal function through microelectrode array recordings. Frontiers in Neuroscience, 8, 423.
Olbrich, E., Bertschinger, N., Ay, N., & Jost, J. (2008). How should complexity scale with system size? The European Physical Journal B, 63(3), 407–415.
Palva, J. M., Monto, S., Kulashekhar, S., & Palva, S. (2010). Neuronal synchrony reveals working memory networks and predicts individual memory capacity. Proceedings of the National Academy of Sciences, 107(16), 7580–7585.
Paninski, L. (2003). Estimation of entropy and mutual information. Neural Computation, 15(6), 1191–1253.
Pham, T. H., Ho, T. B., Nguyen, Q. D., Tran, D. H., & Nguyen, V. H. (2012). Multivariate mutual information measures for discovering biological networks. In RIVF (pp. 1–6).
Pine, J. (2006). A history of MEA development. In Advances in network electrophysiology (pp. 3–23). Springer.
Rényi, A., et al. (1961). On measures of entropy and information. In Proceedings of the fourth Berkeley symposium on mathematical statistics and probability, Vol. 1 (pp. 547–561).
Reyes, A. D. (2003). Synchrony-dependent propagation of firing rate in iteratively constructed networks in vitro. Nature Neuroscience, 6(6), 593–599.
Riehle, A., Grün, S., Diesmann, M., & Aertsen, A. (1997). Spike synchronization and rate modulation differentially involved in motor cortical function. Science, 278(5345), 1950–1953.
Rieke, F., Warland, D., van Steveninck, R. d. R., & Bialek, W. (1996). Spikes: exploring the neural code. MIT Press.
Rothstein, J. (1952). Organization and entropy. Journal of Applied Physics, 23(11), 1281–1282.
Rullen, R. V., & Thorpe, S. J. (2001). Rate coding versus temporal order coding: what the retinal ganglion cells tell the visual cortex. Neural Computation, 13(6), 1255–1283.
Salinas, E., & Sejnowski, T. J. (2001). Correlated neuronal activity and the flow of neural information. Nature Reviews Neuroscience, 2(8), 539–550.
Schneidman, E., Berry, M. J., Segev, R., & Bialek, W. (2006). Weak pairwise correlations imply strongly correlated network states in a neural population. Nature, 440(7087), 1007–1012.
Schneidman, E., Still, S., Berry, M. J., Bialek, W., et al. (2003). Network information and connected correlations. Physical Review Letters, 91(23), 238701.
Schreiber, T. (2000). Measuring information transfer. Physical Review Letters, 85(2), 461.
Shannon, C. E. (1949). The mathematical theory of communication. University of Illinois Press.
Sollich, P., Tantari, D., Annibale, A., & Barra, A. (2014). Extensive parallel processing on scale-free networks. Physical Review Letters, 113(23), 238106.
Sporns, O., Tononi, G., & Edelman, G. M. (2000). Theoretical neuroanatomy: relating anatomical and functional connectivity in graphs and cortical connection matrices. Cerebral Cortex, 10(2), 127–141.
Srinivasa, S. (2005). A review on multivariate mutual information, vol. 2. (pp. 1–6). Notre Dame, Indiana: Univ. of Notre Dame.
Strehl, A., Ghosh, J., & Mooney, R. (2000). Impact of similarity measures on webpage clustering. In Workshop on artificial intelligence for web search, AAAI 2000 (pp. 58–64).
Studený, M., & Vejnarová, J. (1998). The multiinformation function as a tool for measuring stochastic dependence. In Learning in graphical models (pp. 261–297). Springer.
Tallon-Baudry, C., Bertrand, O., & Fischer, C. (2001). Oscillatory synchrony between human extrastriate areas during visual short-term memory maintenance. Journal of Neuroscience, 21(20), 177.
Timme, N., Alford, W., Flecker, B., & Beggs, J. M. (2014). Synergy, redundancy, and multivariate information measures: an experimentalist's perspective. Journal of Computational Neuroscience, 36(2), 119–140.
Tononi, G., Sporns, O., & Edelman, G. M. (1994). A measure for brain complexity: relating functional segregation and integration in the nervous system. Proceedings of the National Academy of Sciences, 91(11), 5033–5037.
Uhlhaas, P., Pipa, G., Lima, B., Melloni, L., Neuenschwander, S., Nikolić, D., & Singer, W. (2009). Neural synchrony in cortical networks: history, concept and current status. Frontiers in Integrative Neuroscience, 3, 17.
Uhlhaas, P. J., & Singer, W. (2006). Neural synchrony in brain disorders: relevance for cognitive dysfunctions and pathophysiology. Neuron, 52(1), 155–168.
Valdivia, P., Martin, M., LeFew, W. R., Ross, J., Houck, K. A., & Shafer, T. J. (2014). Multi-well microelectrode array recordings detect neuroactivity of ToxCast compounds. Neurotoxicology, 44, 204–217.
van Pelt, J., Vajda, I., Wolters, P. S., Corner, M. A., & Ramakers, G. J. (2005). Dynamics and plasticity in developing neuronal networks in vitro. Progress in Brain Research, 147, 171–188.
Victor, J. D. (2002). Binless strategies for estimation of information from neural data. Physical Review E, 66(5), 051903.

Wagenaar, D. A., Pine, J., & Potter, S. M. (2006). An extremely rich repertoire of bursting patterns during the development of cortical cultures. BMC Neuroscience, 7(1), 1.
Wainger, B. J., Kiskinis, E., Mellin, C., Wiskow, O., Han, S. S., Sandoe, J., . . . , & Boulting, G. (2014). Intrinsic membrane hyperexcitability of amyotrophic lateral sclerosis patient-derived motor neurons. Cell Reports, 7(1), 1–11.
Wang, X.-J. (2010). Neurophysiological and computational principles of cortical rhythms in cognition. Physiological Reviews, 90(3), 1195–1268.
Ward, L. M., Doesburg, S. M., Kitajo, K., MacLean, S. E., & Roggeveen, A. B. (2006). Neural synchrony in stochastic resonance, attention, and consciousness. Canadian Journal of Experimental Psychology/Revue Canadienne De Psychologie Expérimentale, 60(4), 319.


Watanabe, S. (1960). Information theoretical analysis of multivariate correlation. IBM Journal of Research and Development, 4(1), 66–82.
Wennekers, T., & Ay, N. (2003). Spatial and temporal stochastic interaction in neuronal assemblies. Theory in Biosciences, 122(1), 5–18.
Woodard, C. M., Campos, B. A., Kuo, S.-H., Nirenberg, M. J., Nestor, M. W., Zimmer, M., . . . , & Paull, D. (2014). iPSC-derived dopamine neurons reveal differences between monozygotic twins discordant for Parkinson's disease. Cell Reports, 9(4), 1173–1182.
Yeh, K., & Kwan, K. (1978). A comparison of numerical integrating algorithms by trapezoidal, Lagrange, and spline approximation. Journal of Pharmacokinetics and Pharmacodynamics, 6(1), 79–98.
