Lambda and the Edge of Chaos in Recurrent Neural Networks

Jared Seifter,** University of Pennsylvania

James A. Reggia,*,† University of Maryland

* Contact author.
** Computer and Information Science Department, Levine Hall, University of Pennsylvania, 3330 Walnut Street, Philadelphia, PA 19104. E-mail: [email protected]
† Department of Computer Science and UMIACS, A.V. Williams Building, University of Maryland, College Park, MD 20742. E-mail: [email protected]

Abstract The idea that there is an edge of chaos, a region in the space of dynamical systems having special meaning for complex living entities, has a long history in artificial life. The significance of this region was first emphasized in cellular automata models when a single simple measure, λCA, identified it as a transitional region between order and chaos. Here we introduce a parameter λNN that is inspired by λCA but is defined for recurrent neural networks. We show through a series of systematic computational experiments that λNN generally orders the dynamical behaviors of randomly connected/weighted recurrent neural networks in the same way that λCA does for cellular automata. By extending this ordering to larger values of λNN than has typically been done with λCA and cellular automata, we find that a second edge-of-chaos region exists on the opposite side of the chaotic region. These basic results hold under different assumptions about network connectivity, but vary substantially in their details. They show that the basic concept underlying the lambda parameter can usefully be extended to types of complex dynamical systems other than cellular automata.

Keywords Edge of chaos, lambda, emergence, recurrent neural networks, complex systems

1 Introduction

The notion that there is an edge of chaos, a transitional region between order and disorder in the space of dynamical systems that has special meaning for complex living entities, has a long history in the field of artificial life. During the 1980s, extensive computational experiments and theoretical analysis established that the dynamics of cellular automata models fall into four broad classes [23]. These classes can be characterized by the behavior of a cellular space when started from an initial random configuration (state):

I. uniformly quiescent—all cells quickly become quiescent, with all activity in the space dying out;

II. fixed or periodic activity—local activity patterns that either stabilize (fixed-point attractors) or exhibit oscillatory behavior having a relatively short period (limit cycles);

III. chaotic activity—widespread random-appearing activity that persists indefinitely, exhibiting sensitivity to initial conditions and unpredictability; and

© 2015 Massachusetts Institute of Technology

Artificial Life 21: 55–71 (2015) doi:10.1162/ARTL_a_00152



IV. complex activity—persistent complex activity patterns that are often localized and that sometimes propagate across the cellular space.

Class IV systems, having complex behaviors, are very interesting from an artificial life perspective: they are believed to be capable in general of universal computation, and workers in artificial life have conjectured that this is the dynamical regime in which complex lifelike entities could or would emerge [10, 11].

Subsequent work established a single simple measure λCA that correlates with the dynamical behavior and Wolfram class of cellular automata [11].1 The lambda value is defined as the fraction of rules in a cellʼs transition function that result in the cell being assigned a non-quiescent state at the next time step. Thus λCA always lies in the real interval [0.0, 1.0], with 0 indicating that all transition rules lead to the quiescent cell state, and 1 that all rules lead to a non-quiescent state. Of central interest here is that lambda values roughly order the four classes of cellular automata as illustrated in Figure 1, giving a clear structure to the space of cellular automata transition functions. From this perspective, the class IV cellular automata models supporting complex localized activity patterns most reminiscent of living systems can be viewed as a phase transition—the edge of chaos—between simple fixed-point or limit-cycle configurations (order) and widespread chaotic configurations (disorder) [11].

Figure 1. The measure λCA orders the four classes of cellular automata as shown here [11]. Note that class IV cellular automata fall between class II and class III. Here s is the number of states a cell may assume; for example, s = 2 for binary-state cellular automata.

1 Langtonʼs λCA for multi-state cellular automata was preceded by an "internal homogeneity" parameter for binary-state cellular automata [5] and the parameter P for random Boolean networks [9].

More recently, the significance of the edge-of-chaos regime has been further investigated, explicitly or implicitly, in other types of non-cellular-automata computational models. For example, examination of models inspired by ant colonies has revealed a variety of dynamical regimes as an activity gain parameter and an ant density parameter are varied, finding that the values of these parameters defining the border region between ordered and chaotic dynamics indicate where significant computational properties can emerge [22]. Other studies have determined that Boolean networks originally inspired by gene regulatory networks have an edge-of-chaos regime that appears when the average incoming connectivity has a critical value of 2, and have explored its relationship to information propagation and adaptation [6, 9].

Most recently, an increasing amount of interest has focused on investigating the edge of chaos in randomly connected recurrent neural networks. Like cellular automata and random Boolean networks, recurrent neural networks can exhibit fixed-point, limit cycle, and chaotic behaviors. While the occurrence of different dynamical behaviors, and the phase transitions between them as network parameters change, has long been of interest [3, 7, 16, 19], this interest has increased dramatically over the last few years, in large part due to advances in reservoir computing methods for processing time series data [8, 14]. In this paradigm, a randomly connected recurrent neural network—the reservoir—is used as a "hidden layer" that is driven by a temporal sequence of input signals and that in turn drives the networkʼs adaptive outputs. Such networks have been found to excel at learning to process time series data in prediction and classification tasks while requiring learning only on the reservoir-to-output connections.
Several studies have now shown that operating randomly connected or weighted recurrent neural networks in the edge-of-chaos regime typically improves the networkʼs computational properties and ability to process information effectively [1, 2, 12, 15, 21]. While details vary, these results have proven to be true for networks composed of analogue rate-encoding neurons, binary-valued threshold neurons, and spiking neurons. Thus there is now a substantial body of work



showing that the edge of chaos, originally studied in cellular automata and other aspects of artificial life, is an important operating regime for information processing in recurrent neural networks. However, the parameter spaces examined have generally been multidimensional, where the individual dimensions (parameters) used include the number of incoming connections, the variance of weights, the fraction of excitatory connections, the mean node bias values, the spatial scale of local connectivity, and weight magnitude gain factors. None of this past work has examined whether there is a single simple measure for neural networks that is analogous to Langtonʼs λCA for cellular automata.

In this context, here we introduce a parameter λNN that is explicitly inspired by and analogous to λCA but is defined for recurrent neural networks. Our intent in doing this is not just to document the different regions in the space of neural networks that we consider, but also to explore whether a single measure like this can help identify critical regions that will yield maximum computational performance when randomly connected neural networks are used during information-processing tasks, such as with reservoir networks. We show through a series of systematic computational experiments that λNN generally orders the dynamical behaviors of randomly connected/weighted recurrent neural networks composed of linear threshold units in the same way that λCA does for cellular automata (Figure 1). Further, by extending this ordering to larger values of λNN than has typically been done with λCA and cellular automata (Figure 1, horizontal axis), we find that a second edge of chaos exists on the opposite side of the chaotic region, corresponding to larger λNN values. These basic results are found to hold under different assumptions about network connectivity, such as using fully connected versus sparsely connected networks, but vary substantially in their details.

2 Methods

We first describe the neural networks and experimental methods used in this work, and then define the measure λNN used to order the space of neural networks.

2.1 Network Description

We examine computationally the long-term dynamical behavior of a variety of recurrent neural networks having randomly connected/weighted connectivity between N nodes. Four different types of connectivity were used in different computational experiments: full connectivity (every node is connected bidirectionally to every other node), random partial connectivity (every node receives k < N input connections from other randomly chosen nodes), local connectivity (every node receives k < N input connections from its k nearest neighbor nodes), and local-plus-random connectivity (every node receives k < N input connections from other randomly chosen nodes and also from its k < N nearest neighbor nodes). We include the study of networks with random partial connectivity because they can be related to the sparsely connected networks used in some historically significant random neural network investigations [3, 7], and because they are also reminiscent of the sparse connectivity often used in contemporary reservoir computing. We also study networks with local connectivity, both with and without associated random "long range" connections, because this is reminiscent of the connectivity in the cerebral cortex and several past brain-inspired cortical models.
Such local connectivity is also most similar to what occurs in cellular automata, where a cellʼs transition function is based on local neighborhoods [11, 20, 23].

Network weights in general are a mixture of randomly assigned excitatory and inhibitory values, but in any given simulation they are uniform in the sense that all excitatory weights have the same positive value and all inhibitory weights have the same negative value. Two parameters, fp and wp, determine how weights are assigned. Whether each weight is excitatory or inhibitory is randomly and independently determined, so that a connection is excitatory with probability fp and inhibitory with probability 1.0 − fp. Thus fp represents the expected value of the fraction of connections in a network that are positive (excitatory). Each connection randomly selected to be inhibitory has a weight of −1.0, while each excitatory connection in a given simulation has a weight wp, where wp can take on


positive values smaller than, equal to, or larger than 1.0 in different simulations, in order to adjust the relative amounts of excitation and inhibition in any given network.

In the simulations involving networks having local connections, the N neurons in the network are viewed as forming a one-dimensional lattice with periodic boundary conditions, where each neuron has k/2 connections to the closest neighbor nodes on each side. In this situation the excitatory connections, whose number is determined stochastically based on fp, are allotted to the closest neighbors, and the more distal neighbors are allotted the remaining inhibitory connections. This arrangement is inspired by the Mexican hat pattern of lateral connectivity observed in the mammalian cerebral cortex and often used in computational models of the cortex.

Nodes in the networks that we study are all standard linear threshold units. We select this type of neuron activation function because it is one of the most basic, has been widely used, and is easy to relate to cellular automata models having binary-state cells. In our networks, an arbitrary node i has an activity level, designated ai, of either 0 (off) or 1 (on). The input ini to node i at any point in time t is given by

ini = Σj wij aj + bi        (1)

where wij is the weight on the connection from another node j to node i, and bi is a fixed bias value. The activation level of each node i at time step t + 1 is then defined as ai(t + 1) = 0 if ini(t) < 0, and ai(t + 1) = 1 otherwise; that is, a threshold of zero is always used. We use a very small, fixed bias bi = −0.0001 in all simulations so that nodes will be off if they have a zero or negligible input.
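To make this construction concrete, the following sketch (our own illustration, not code from the paper; the function names, the use of NumPy, and details such as how ties in neighbor distance are broken are assumptions) builds a weight matrix as just described and applies the threshold update of Equation 1:

```python
import numpy as np

def build_weights(N=100, fp=0.5, wp=1.0, k=None, local=False, rng=None):
    """Construct an N-by-N weight matrix as described in Section 2.1.

    Each realized connection is excitatory (weight +wp) with probability fp,
    otherwise inhibitory (weight -1.0). If k is None the network is fully
    connected; otherwise each node receives k incoming connections, chosen
    at random (local=False) or from its k nearest ring neighbors with a
    Mexican hat arrangement (local=True).
    """
    rng = np.random.default_rng() if rng is None else rng
    W = np.zeros((N, N))
    for i in range(N):
        if k is None:
            sources = [j for j in range(N) if j != i]
        elif local:
            # k/2 nearest neighbors on each side of node i on a ring lattice
            offsets = [d for r in range(1, k // 2 + 1) for d in (-r, r)]
            sources = [(i + d) % N for d in offsets]
        else:
            sources = rng.choice([j for j in range(N) if j != i],
                                 size=k, replace=False)
        n_exc = rng.binomial(len(sources), fp)  # number of excitatory inputs
        if local:
            # Mexican hat: excitatory weights go to the closest neighbors,
            # inhibitory weights to the more distal ones
            order = sorted(sources, key=lambda j: min((j - i) % N, (i - j) % N))
        else:
            order = list(rng.permutation(sources))
        for rank, j in enumerate(order):
            W[i, j] = wp if rank < n_exc else -1.0
    return W

def step(W, a, bias=-1e-4):
    """One synchronous update of Equation 1 with linear threshold units."""
    net = W @ a + bias             # in_i = sum_j w_ij * a_j + b_i
    return (net >= 0).astype(int)  # a_i = 0 if in_i < 0, else 1
```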

2.2 Experimental Methods

We conducted a systematic series of computational experiments where, in each single experiment, one of the four types of connectivity (full, random partial, local, or local-plus-random) and one set of specific fp and wp parameter values are used. Across different experiments, the parameter fp, which determines the probability that a connection is positive or negative, is systematically varied from 0.0 to 1.0 in steps of 0.025. For each fp value, the ratio of positive to negative weight magnitudes, wp, is also systematically varied from 0.0 to 2.0 in steps of 0.05; the negative weights always equal −1.0, as we are primarily interested in the relative amounts of excitation and inhibition in a network. In any single simulation, each activation value ai is randomly initialized to 1 or 0 with equal probability, so the expected number of on nodes initially is N/2 (using other initial states might sometimes produce different results). Because of this and the random nature of connection and weight assignments during network construction, each computational experiment (i.e., each specific combination of connectivity type, fp value, and wp value) is run 50 times with a different random number stream for each simulation. All simulations are run for a maximum of tmax = 1000 time steps using N = 100 neurons. If a networkʼs activity during a simulation reaches a fixed point or limit cycle, this is automatically recorded and the simulation is terminated early at that point, solely for computational efficiency. When networks have partial or local connectivity, we always use k = 10 connections per node.2

Every time a single neural network is run, generally up through 1000 time steps, its final attractor state is determined. If, by the final time step, all of the neurons are off, the networkʼs activity is classified as extinguished (corresponding to Wolfram class I). If all of the neurons are on, the networkʼs activity is classified as saturated (also a uniform outcome reminiscent of class I behavior). If, during a run, the networkʼs activity state no longer changes from time step to time step, it is considered to have reached a fixed point where its activity has been neither extinguished nor saturated (class II). If the networkʼs activity state repeats (i.e., it is cycling through a sequence of states), it is classified as having reached a limit cycle (class II). If the networkʼs dynamics does not match any of these types of attractors by time step tmax = 1000, it is classified as exhibiting chaotic dynamics (class III).3 A sketch of this run-and-classify procedure is given below.

2 We also ran simulations having k = 50 connections per node, but these did not show qualitatively different results (they were essentially intermediate between those for full connectivity and for k = 10 random partial connectivity), so for brevity we report only the full and k = 10 connectivity cases here.

3 This specific value of tmax, although large, is somewhat arbitrary, as simulations that have not repeated an activity state by that point could still do so later. It was selected as a tradeoff between running simulations for a long time and maintaining computational tractability, given the large number of simulations that were run. As will be seen later (Figures 3–7), the vast majority of simulations either reached a fixed point or limit cycle within 100 time steps or did not reach one at all. Of course, in spite of the enormous size of the state space (2^100 possible activity states), it is finite, and this implies that any simulation must ultimately repeat its states.
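The run-and-classify loop can be sketched as follows (our own minimal implementation, reusing the hypothetical build_weights and step helpers from the sketch in Section 2.1; storing the state history in a dictionary is one straightforward way to detect fixed points and limit cycles for early termination, not necessarily how the authors implemented it):

```python
import numpy as np

def classify_run(W, tmax=1000, rng=None):
    """Run one network for up to tmax steps and classify its dynamics.

    Returns 'extinguished', 'saturated', 'fixed point', 'limit cycle',
    or 'chaotic' (no state repeated by tmax).
    """
    rng = np.random.default_rng() if rng is None else rng
    N = W.shape[0]
    a = (rng.random(N) < 0.5).astype(int)  # roughly N/2 nodes on initially
    seen = {a.tobytes(): 0}                # state history for repeat detection
    for t in range(1, tmax + 1):
        a = step(W, a)
        key = a.tobytes()
        if key in seen:                    # a state repeats: attractor reached
            period = t - seen[key]
            if period == 1 and a.sum() == 0:
                return 'extinguished'      # uniform off (Wolfram class I)
            if period == 1 and a.sum() == N:
                return 'saturated'         # uniform on (class I-like)
            return 'fixed point' if period == 1 else 'limit cycle'  # class II
        seen[key] = t
    return 'chaotic'                       # no repeat found (class III)

# Example: one run with random partial connectivity (k = 10)
W = build_weights(N=100, fp=0.5, wp=1.0, k=10)
print(classify_run(W))
```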


To approximately identify the edge-of-chaos regions (class IV) in our results, we generally designate a particular connectivity-fp-wp combination as being part of an edge-of-chaos region if (i) it lies between regions where activity quickly reaches a fixed point or limit cycle (class II) in all 50 runs and regions where activity never does (class III) during all 50 runs; (ii) it takes a long time for transient complex activity patterns to reach a fixed point or limit cycle (but not longer than tmax); and (iii) among the 50 simulation runs for this specific connectivity-fp-wp combination, some runs reach a fixed point or limit cycle by tmax while some do not and thus are classified as chaotic. As will be seen, these criteria generally lead to well-defined contiguous regions in the space of neural networks that we examine.

2.3 Defining a Lambda Parameter for Recurrent Neural Networks

How can we define a lambda parameter λNN for neural networks that is analogous to λCA for cellular automata? The intent of the original measure λCA was to provide a single parameter that would naturally order the space of cellular automata transition functions (rule sets), separating this space into regions that have similar dynamics, and in particular identifying the conditions under which one might expect complex dynamics to emerge [11]. As noted earlier, defining λCA to be the fraction of rules (transitions) that lead to a non-quiescent cell state was found to be a remarkably simple measure that, while not perfect, largely accomplished this task for 0 ≤ λCA ≤ 1 − 1/s, where s is the number of possible individual cell states.

The task here is to define an analogous simple measure λNN for recurrent neural networks. Doing so is nontrivial because, unlike the individual cells in most cellular automata models, the nodes in randomly connected/weighted neural networks like those we consider each have a different local transition function. In some cases (due to the random connectivity), local neighborhoods do not vary or overlap in a systematic way between adjacent nodes as they do in cellular automata, and even when this is not the case, the fraction of each nodeʼs input connections that are excitatory versus inhibitory can vary due to the random assignment of excitatory versus inhibitory weights, even for fully connected networks where all nodes have the same number of incoming connections from all other nodes. Thus, the neural networks that we study are closer to more general discrete dynamical networks, such as random Boolean networks, than they are to typical cellular automata.

To define λNN, for any given connectivity constraint (full, random partial, etc.), consider a 2D space of neural networks whose dimensions are fp, the probability that a randomly selected connection is excitatory rather than inhibitory, and wp, the weight value assigned to excitatory connections. We hypothesize that a simple measure λNN based on the relative amounts of excitation and inhibition in a neural network can provide an ordering of neural networks in this space analogous to that provided by λCA for cellular automata. Specifically, let

S(fp, wp) = Σ(i=1..N) Σ(j=1..N) wij        (2)

be the sum of all weights, both excitatory and inhibitory, on the connections between nodes in a recurrent network that is based on specific fp and wp values. We assume that wij = 0 whenever there is no connection from node j to node i, as occurs in partially connected networks. The sum S can range from large negative to large positive values, but for finite network size N and finite weights it is bounded by minimum and maximum values that we designate Smin and Smax, respectively. Then for an individual recurrent neural network having specific fp and wp values, we define

λNN = [S(fp, wp) − Smin] / [Smax − Smin]        (3)

as our candidate measure. Like λCA, the value of λNN lies in the interval [0.0, 1.0] and dictates a partial ordering on the space of recurrent neural networks. For the types of neural networks investigated here, which are composed of linear threshold neurons, λNN is reminiscent of λCA in that λNN roughly measures the fraction of local transitions that would lead a node to be non-quiescent (i.e., to have a value ai = 1) rather than quiescent (ai = 0). This is because the linear threshold neurons defined above determine their activation state to be non-quiescent iff their local excitatory input exceeds their local inhibitory input, which in turn is governed on average by the relative amounts of excitatory and inhibitory weights in the network.

Further, we can be more specific about λNN for the types of networks considered here, regardless of which of the four types of connectivity (full, random partial, etc.) is involved, by giving concrete characterizations of Smin and Smax for the range of fp and wp values used. Specifically, S takes on the value Smin when all connections are inhibitory (fp = 0.0) and all excitatory weights are zero (wp = 0.0), so Smin = S(0, 0). In contrast, Smax occurs when all weights are excitatory (fp = 1.0) and these weights all take on their maximum value (wp = 2.0 for the range of weights we consider), so Smax = S(1, 2). Thus, we can rewrite our definition of λNN as

λNN = [S(fp, wp) − S(0, 0)] / [S(1, 2) − S(0, 0)]        (4)

for the four types of network architecture and range of fp and wp values that are examined in the following computational experiments.
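To make the definition concrete, here is a small sketch (again our own, not the authors' code) that evaluates Equation 4 for a given weight matrix. Under the weighting scheme of Section 2.1, S(0, 0) = −C and S(1, 2) = +2C, where C is the total number of connections, so Equation 4 reduces to (S + C)/(3C); taking expectations over the random weight assignment gives E[λNN] = fp(wp + 1)/3, which reproduces the example values quoted in Section 3.1.

```python
import numpy as np

def lambda_nn(W, n_connections=None):
    """Evaluate lambda_NN (Equation 4) for a weight matrix W.

    n_connections is C, the number of connections; by default it is the
    number of nonzero weights (exact unless wp = 0, where excitatory
    weights are 0.0 yet should still count as connections).
    """
    C = np.count_nonzero(W) if n_connections is None else n_connections
    S = W.sum()               # Equation 2: sum of all weights
    # S(0,0) = -C and S(1,2) = +2C, so Equation 4 becomes (S + C) / (3C)
    return (S + C) / (3 * C)

# Expected values: E[S] = C*(fp*wp - (1 - fp)), so E[lambda_NN] = fp*(wp + 1)/3
for fp, wp in [(0.3, 0.3), (0.5, 1.0)]:
    print(fp, wp, round(fp * (wp + 1) / 3, 2))   # prints 0.13 and 0.33
```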

3 Results

Our main results are that variations in λNN values order regions of differing dynamics in the space of neural networks in a fashion analogous to that observed with λCA in cellular automata. Further, edge-of-chaos regions can often be found, although their details can vary substantially depending on the type of connectivity that is present. We also describe the results of examining how a networkʼs Wolfram class correlates with its ability to serve successfully as a reservoir in a very simple time series learning task.

3.1 Lambda Behavior

Figure 2a shows λNN values, multiplied by 100 and rounded for display purposes, for the fp-by-wp space of neural networks described above when random partial connectivity (k = 10) is used. For example, when fp = 0.3 and wp = 0.3, then λNN = 0.13. As can be seen, λNN monotonically increases as one moves roughly diagonally from the upper left corner of this figure, where λNN is 0.0, to the lower right corner, where λNN is 1.0. Figure 2b shows a contour plot (generated using MATLAB) of these same λNN values that demonstrates this more clearly. Any straight or mildly concave or convex line running from the upper left corner to the lower right corner that is roughly perpendicular to each of the contours as it crosses them provides a path of gradually increasing λNN values.


Figure 2. (a) Values of 100 × λNN for selected values of fp and wp. (b) Contour plot derived from (a), showing representative lines of constant λNN. Based on networks with random partial connectivity.

Thus, as it was designed to do, λNN provides a single simple measure that imposes a partial ordering on the neural networks represented by each point in this fp-by-wp space. Neural networks gradually transition from those dominated by inhibitory connections (upper left corner in both parts of Figure 2) to those dominated by excitatory influences (lower right corner), passing through intermediate regions where excitation and inhibition are roughly balanced (e.g., when fp = 0.5 and wp = 1.0, producing λNN = 0.33). While Figure 2 illustrates this for network architectures having random partial connectivity, qualitatively similar results are obtained for the other network architectures (fully connected, locally connected, or locally-plus-randomly connected); they are not shown here for brevity.

3.2 Characterizing Network Dynamics

We systematically characterized the dynamics of randomly connected/weighted neural networks represented by points throughout this fp-wp space for each of the four types of network connectivity (full, random partial, local, or local-plus-random). Given the quantization of the fp and wp scales that we used, and the 50 simulations run with different random number streams for each fp-wp value pair, this represents a total of 336,200 independent network simulations (4 × 41 × 41 × 50).
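The sweep itself might look like the following sketch (assuming the hypothetical build_weights and classify_run helpers from the earlier sketches; the full grid at 50 runs per point is computationally expensive, so in practice one might subsample the grid or reduce the run count when exploring):

```python
import numpy as np
from collections import Counter

fps = np.arange(0.0, 1.0 + 1e-9, 0.025)   # 41 values of fp
wps = np.arange(0.0, 2.0 + 1e-9, 0.05)    # 41 values of wp
results = {}
for fp in fps:
    for wp in wps:
        tally = Counter()
        for run in range(50):              # 50 random networks per grid point
            W = build_weights(N=100, fp=fp, wp=wp, k=10)  # random partial
            tally[classify_run(W)] += 1
        results[(round(fp, 3), round(wp, 2))] = tally
# e.g., results[(0.5, 1.0)] tallies how many of the 50 runs ended in each class
```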


Figure 3. Results from simulations with fully connected networks. (a) Final state of each network that occurred most often in 50 trials, where # = chaos, c = limit cycle, f = fixed point, 1 = saturated, 0 = extinguished. (b) Final state of a network only if it occurred every time in all 50 runs; otherwise, b is used if chaos never occurred (all runs ended in limit cycles or fixed points), and + if chaos ever occurred. (c) Average time for networks to reach their final state (same axes as in (a) and (b)). Black means ≤ 4 time steps, dark gray 5–25, medium gray 25–100, light gray > 100 but below tmax = 1000 time steps (a fixed point or limit cycle was found); white means chaotic dynamics mostly occurred (no fixed point or limit cycle reached prior to tmax). (d) Fraction of final states that occur most often, as a function of λNN (horizontal axis), linearly scaled to fall between zero (black) and one (white).


Figure 3 shows the results obtained when fully connected networks are studied in this fashion. Figure 3a plots the network dynamics obtained as fp and wp are systematically varied, showing the different regions that are observed by labeling each entry with the result seen most frequently during the 50 runs at that point in fp-wp space. As one moves progressively from the upper left to the lower right (i.e., from λNN = 0 to λNN = 1), one sequentially encounters a region where all activity dies out (labeled with 0ʼs), a region where complex/chaotic dynamics dominates (#ʼs), regions where limit cycles (cʼs) and fixed-point attractors (fʼs) predominate, and finally a region where the network is saturated (1ʼs). Comparing this with how λNN varies in Figure 2, it is seen that, just as occurs with λCA and cellular automata, λNN provides a single measure that roughly indicates how regions with differing dynamics occur in these randomly weighted neural networks.

Figure 3b makes the edge-of-chaos regions more evident (labeled with +). In Figure 3a, each label indicates the most frequent dynamics that occurs at its specific fp and wp values out of all 50 runs for that point in fp-wp space. In contrast, in Figure 3b the same symbols 0, #, and 1 label locations where, for all 50 runs, activity is extinguished, chaotic, or saturated, respectively. Sandwiched in between these regions are other curving regions where a mixture of fixed-point and limit cycle attractors is observed (labeled b), or a mixture of these attractors and chaotic behaviors is seen (labeled +), during different runs. Taking the latter regions (+ʼs) to roughly indicate the edge of chaos, we see that the edge of chaos lies on both sides of the region in which complex/chaotic dynamics occurs, that is, at both smaller and larger values of λNN than those occurring with complex/chaotic dynamics. This observation is reinforced by Figure 3c, which indicates via grayscale how long it takes for networks to reach a fixed-point or limit cycle attractor in fp-wp space. Regions corresponding to the edge of chaos (shaded gray) often have networks that, while they ultimately reach fixed-point or limit cycle attractors, take progressively longer on average to reach these simple-dynamics final states as one gets closer to the chaotic region of dynamics. Finally, Figure 3d uses a grayscale to show the fraction of final states that occur most often for different values of λNN.

Figure 4 shows the analogous results for random partially connected networks having k = 10 input connections per node. While the λNN values continue to identify a similar progression of dynamical regimes as one moves from upper left to lower right in Figure 4a and b, there are substantial changes in the locations and sizes of the different regions compared to Figure 3. The regions of complex and chaotic dynamics have broadened and shifted significantly, areas that we view as edge-of-chaos regions (+ʼs) are greatly expanded at the expense of the region where solely chaotic dynamics occurs (Figure 4b), and there is a clearer indication that regions where fixed-point attractors occur and those where limit cycles occur are more disjoint. As Figure 4c and d illustrate, much less of the fp-wp space is dedicated to uniform final activity states (extinguished or saturated regions) and more to broader regions with marginal or complex dynamics.

Figure 5 shows the corresponding results for locally connected networks, again with k = 10.
The values of λNN continue to show a progression of dynamical regions in fp-wp space, but there is now no region in which chaotic dynamics occurs at all, and hence no edge-of-chaos regions are identifiable. As seen in Figure 5a and b, all simulations terminate with either a uniform activity pattern (extinguished or saturated), some other fixed-point attractor state, or a limit cycle prior to reaching tmax = 1000 time steps. Still, the central regions of the space continue to take longer to reach their final simple attractor states (Figure 5c). Limit cycles have become especially common (Figure 5d).

Finally, Figure 6 shows the same set of results for networks composed of both local connections (k = 10) and random partial connectivity (k = 10), for a total of 20 connections per node. These results are largely in between those illustrated in Figures 4 and 5 for random partial and solely local connectivity considered separately. Regions in which complex and chaotic dynamics occur have returned, although chaotic behaviors never occur most frequently for any specific fp and wp values. A region that resembles the edge-of-chaos (class IV) regions in the results of Figures 3 and 4 described above (+ʼs) has greatly expanded (Figure 6b). Compared to when local connectivity is used alone, when random connections are added like this it often takes much longer to reach final fixed-point or limit cycle attractors, consistent with the idea that much of the dynamics is more similar to the edge-of-chaos regime.


Figure 4. Results from the simulations having k = 10 random input connections per node. Same arrangement and notation as in Figure 3.

3.3 Influence of Dynamics on Learning Effectiveness

Much of the resurgence of interest in the dynamics of randomly connected/weighted recurrent neural networks during the last decade has arisen because of the use of such networks as reservoirs in the processing of time series data [8, 14]. The problem in this situation is to determine a priori what properties a reservoir should have in order to function optimally. As a small step toward examining this problem in the context of λNN and edge-of-chaos dynamics, we undertook a separate systematic


set of computational experiments in which each of the four types of network architectures is used, for varying fp and wp values, as a reservoir for a very simple time series learning task. The recurrent network structures used as reservoirs are unchanged from those described above (100 linear threshold neurons, etc.), and the procedures used are largely unchanged, except as follows.

Figure 5. Results from the simulations having 10 localized connections per node. Same arrangement and notation as in Figure 3, except in (c), where scaling is modified so that black means ≤ 4 time steps, dark gray 5–10, medium gray 10–20, light gray 20–30, and white > 30 (the last does not appear in this image).


Figure 6. Results from the simulations having 10 localized connections per node plus 10 more random connections. Same arrangement and notation as in Figure 3.

Now each run has two additional input nodes fully connected to each reservoir node, and three output nodes that receive incoming connections from every reservoir node. The weights between the input nodes and the reservoir are arbitrarily assigned random values from 0.0 to 1.0, while the weights from the reservoir to the output nodes are assigned initial values from −1.0 to 1.0. Further, some preliminary runs indicated that for networks whose activity would normally die out to zero at every node (extinguished) in the absence of external inputs, incoming activity to the reservoir nodes from


the input nodes now prevents this from occurring. For this reason, a stronger fixed −1.0 bias is used at all reservoir nodes in all simulations during learning. Each output node also has a bias value initialized to a random value between 0.0 and 1.0.

The simple, arbitrary task to be learned is that when the first input node is on, the first output node alone should turn on, while if the second input node is on, the second and third output nodes (but never the first output node) should alternately turn on and off at each time step. During learning, one input node is on alone for 16 time steps, then the other is on alone for 16 time steps, and this alternating pattern continues for the entire remainder of the run. All weights in the extended network remain fixed during learning except those from the reservoir to the output nodes; only these latter weights and the output node biases are altered during learning. Network training is done using an iterative least-mean-square (delta rule; perceptron learning rule) training method where the change in weight wij is given by

Δwij = η aj (ti − ai)        (5)

where i indexes an output node, j indexes a reservoir node (or the bias), η is a fixed learning rate (0.1 in our runs), ti is the target (correct output value), and ai (aj) is the corresponding actual value of the output (reservoir) node.
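The following sketch shows one incremental training step of Equation 5 (our own illustration; the paper specifies the task only in prose, so the target encoding, e.g. that the second and third output nodes alternate with each other, and the use of the same threshold activation at the output nodes are assumptions):

```python
import numpy as np

ETA = 0.1  # fixed learning rate from the text

def targets(t, phase):
    """Target output vector for the simple task described above.

    phase 0: first input node on  -> only the first output node on.
    phase 1: second input node on -> second and third outputs alternate
             with each other at each time step (one plausible reading).
    """
    if phase == 0:
        return np.array([1, 0, 0])
    return np.array([0, 1, 0]) if t % 2 == 0 else np.array([0, 0, 1])

def train_step(W_out, b_out, a_res, t_vec, eta=ETA):
    """One incremental delta-rule update of Equation 5.

    W_out: 3-by-N reservoir-to-output weights; b_out: output node biases
    (float arrays); a_res: current reservoir activations; t_vec: targets.
    """
    a_out = ((W_out @ a_res + b_out) >= 0).astype(int)  # threshold outputs
    err = t_vec - a_out                  # (t_i - a_i) for each output node i
    W_out += eta * np.outer(err, a_res)  # delta w_ij = eta * a_j * (t_i - a_i)
    b_out += eta * err                   # bias updated as a fixed a_j = 1 input
    return a_out
```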

Starting at time step t = 1, all runs are allowed to run longer than in the previous experiments, for 6400 time steps per run (at which time the final Wolfram class is determined), to allow extra time for learning. For this reason, and because of the increased computation per run required for weight changes, only one run is done for each pair of fp and wp values (rather than 50) to limit computational costs, resulting in a total of 6,724 additional independent simulation runs (4 × 41 × 41). Weight changes are made incrementally, that is, after each time step of a run.

Figure 7 summarizes the results of training the four types of networks examined in this study. The grayscale here indicates the length of time required to reach an end-state classification of each networkʼs dynamics, determined as in the preceding computational experiments. While this is similar to what was displayed in part (c) of each of the preceding Figures 3–6, the details of the resulting plots differ in each case, because only a single run (rather than an average of 50 runs) is described for each pair of fp and wp values, and because the input node activities influence the reservoirʼs activation states. The stars in these four images indicate those points in fp-wp space for which training produced a neural network that achieved 100% correct output node activation levels by the time training is complete. As can be seen by comparing the locations of these successful training outcomes with the regions that apparently represent edge-of-chaos dynamics here (and in the corresponding parts b and c of the preceding figures), learning generally, but not always, is most effective when a network is in the edge-of-chaos regime.

4 Discussion

There has been long-standing interest, which continues today, in developing measures that can characterize the dynamical behaviors of complex cellular and network systems. The measures that have been studied, and the systems that they have been applied to, are quite diverse. For example, Wuensche has introduced and studied a parameter Z based on the convergence of dynamical flows in cellular automata state space and suggested that it characterizes the mechanism underlying λCA, the latter being an approximation of Z [24, 25]. Measures have also been developed for random Boolean networks [9], including the use of Fisher information [18], to distinguish their ordered, critical, and chaotic regimes. Lyapunov exponents characterizing the expansion rates of perturbations to node activities have been defined for similar purposes [13]. Further examples and discussion can be found in a recent review [4]. Our results in this article add to this growing list of potentially useful measures of complex system dynamics.


Figure 7. Average amount of time for networks to reach their final states, for networks that are (a) fully connected, (b) randomly partially connected, (c) locally connected, and (d) both locally and randomly connected. All four panels are similar to part (c) of the corresponding Figures 3–6, respectively, but now show a single run for each pair of fp and wp values. Stars mark the networks that reached 100% accuracy during learning.

In the work reported here, we have examined the effectiveness of a single simple measure λNN in characterizing the dynamics of recurrent neural networks of linear threshold neurons. As with the original lambda parameter λCA for cellular automata models [11], we found that λNN strongly correlates with the Wolfram classes of dynamics, extended here to apply to recurrent neural networks. For fully connected and random partially connected networks, λNN orders these classes in a similar way. This is not particularly surprising in that λCA measures the fraction of state transitions (rule table entries) producing non-quiescent cell activity, while λNN, in measuring the relative amounts of excitation and inhibition in a network, also effectively determines how many local activity patterns will be mapped to "on" (non-quiescent) neurons. Put otherwise, λNN can be viewed as predicting the approximate fraction of local states that will lead to non-quiescent neurons in a network. Thus λNN as described here expands the range of neighborhood structures to which the


lambda concept applies, from the purely local neighborhoods used in cellular automata to others, such as fully connected or randomly distributed neighborhoods.

While λNN orders the regions of differing dynamics in the space of neural networks in a fashion that one might expect, we found that, like λCA, its absolute numerical value is of limited usefulness in predicting a priori the dynamics of a specific individual neural network. For example, given a single λNN value, two specific neural networks with the same connection architecture but different random excitatory and inhibitory weights and different initial states might end up in different types of attractors over time. If one also changes the connectivity distribution in these two networks, this ambiguity increases substantially: one distribution might fairly quickly result in a periodic attractor, while another might continue to exhibit chaotic activity over a long time span.

When the range of λNN values examined in fully or partially connected networks includes larger values of λNN, one finds that there can be a second edge-of-chaos region and a progression of dynamical regimes for large λNN values that are ordered in a complementary, mirror-image fashion to those seen with smaller λNN values, as illustrated in Figure 8. This can include a second edge-of-chaos region associated with larger λNN values that is sometimes broader and more evident than the one associated with smaller λNN values (see parts b and c of Figures 3 and 4, for example). The significance of the second edge-of-chaos region is unclear at present, but we would expect it to exhibit the same complex behaviors of interest in artificial life, and potentially the same computational universality, as the first. To our knowledge, the upper range of λCA values has not yet been mapped out systematically for cellular automata (e.g., [20]), but a similar second region of complex edge-of-chaos dynamics has been suggested to exist for at least binary-state cellular automata should one do such a mapping [17]. For example, the patterns of widespread non-quiescent cells (suggestive of high λCA values) seen along with the propagation of particles or signals during the performance of computational tasks by some cellular automata transition functions discovered using genetic algorithms [17] suggest to us that examples of this second edge-of-chaos region may already be known. It is not clear at this time whether a second edge-of-chaos region will be found for neural networks or cellular automata having more than two possible neuron or cell states.

Figure 8. CA: ordering of cellular automata dynamical regions using λCA, as previously reported [11]. NN: ordering of neural network dynamical regions using λNN, as found in the current study with fully connected and random partially connected networks. The term "uniform" on the left refers to all activity having been extinguished, in both cellular automata and neural networks, while on the right it refers to saturation (all nodes on) in neural networks. WC = Wolfram class, where class I is taken to also include uniformly saturated networks.

As others have observed with random networks, we found that the number of connections per node significantly affects network dynamics, although we considered this only in a limited way. The effect of the number of connections can be seen by comparing the dynamical regions of fp-wp space for fully connected networks (Figure 3) with those for random partially connected networks having 10 connections per node (Figure 4). Both the locations and the sizes of these dynamical regions differ substantially. However, in both cases λNN still orders the regions in a similar fashion, from uniform through complex/chaotic and back to uniform.

Not only the number of connections per node, but also their distribution, proved to be important in determining network dynamics. This can be seen by comparing the dynamics of networks having 10 connections per node when those connections are random (Figure 4) with the dynamics when they are localized (Figure 5).



As discussed earlier, the latter networks were included in this study because they are reminiscent of local connectivity in the cerebral cortex, where lateral connections are much more likely to occur between physically neighboring cortical elements than between distant ones. Even though the same number of connections was present, when the connections were localized in a Mexican hat configuration (excitatory connections to the closest neighbors, inhibitory to the next closest), neither complex nor chaotic dynamics was ever convincingly observed. Interestingly, this suggests by analogy that the relative lack of usefulness of λCA that has been observed for cellular automata having only two possible states per cell [11] may be due primarily to the locality of cell neighborhoods, rather than to their size or binary cell states, as is sometimes assumed. This prediction could easily be tested experimentally. Further, when localized and random connectivity are combined, effectively producing a small-world network that more closely resembles biological cortical connectivity, complex and chaotic dynamics reappear, although somewhat damped in nature.

Among the four network architectures that we considered, the networks with 10 random connections per node are most closely related to the sparse, randomly connected and weighted networks used as reservoirs in reservoir computing models. Interestingly, when we used this type of network as a reservoir for a simple learning task, almost all successful cases of network training occurred when the reservoir network fell into the edge-of-chaos regions of the fp-wp space (Figure 7). This includes the edge-of-chaos region associated with larger as well as smaller values of λNN. This is consistent with evidence provided by others that networks using other types of node activation functions (than linear threshold neurons) yet still having edge-of-chaos dynamics are most effective as reservoirs during learning [1, 2, 12, 21, 22], although there are dissenting views [26].

We conclude that λNN is a useful measure for qualitatively ordering the space of neural networks. However, its potential effectiveness in practice is limited in that there is no one crisp value of λNN that can be used to predict, for every specific neural network, the precise dynamics that will be observed. All that λNN can do is suggest a reasonable starting range of network parameters to explore in a given specific situation. This, and λNNʼs ability to qualitatively order the space of neural networks based on their dynamics, suggests that further study of λNN and related measures for other types of networks will prove useful. Further, λNN may have broader applicability than envisioned here, in that with some neural network models (e.g., basic Hopfield networks) the desired network behavior is not found within the edge-of-chaos regime. Determining which dynamics is best for a specific situation depends on the relative need for exploration versus exploitation of the activity dynamics space. This is another area where further work may prove useful.

References

1. Bertschinger, N., & Natschläger, T. (2004). Real-time computation at the edge of chaos in recurrent neural networks. Neural Computation, 16, 1413–1436.
2. Büsing, L., Schrauwen, B., & Legenstein, R. (2010). Connectivity, dynamics, and memory in reservoir computing with binary and analog neurons. Neural Computation, 22, 1272–1311.
3. Dammasch, I., & Wagner, G. (1984). On the properties of randomly connected McCulloch-Pitts networks. Cybernetics and Systems, 15, 91–117.
4. Fernandez, N., Maldonado, C., & Gershenson, C. (2014). Information measures of complexity, emergence, self-organization, homeostasis, and autopoiesis. In M. Prokopenko (Ed.), Guided self-organization: Inception (pp. 19–51). New York: Springer.
5. Gelfand, A., & Walker, C. (1984). Ensemble modeling. New York: Marcel Dekker.
6. Goudarzi, A., Teuscher, C., Gulbahce, N., & Rohlf, T. (2012). Emergent criticality through adaptive information processing in Boolean networks. Physical Review Letters, 108, 128702.
7. Griffith, J. (1971). Mathematical neurobiology. New York: Academic Press.
8. Jaeger, H., & Haas, H. (2004). Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication. Science, 304, 78–80.
9. Kauffman, S. (1993). The origins of order. New York: Oxford University Press.


10. Langton, C. (1990). Computation at the edge of chaos: Phase transitions and emergent computation. Physica D, 42, 12–37.
11. Langton, C. (1991). Life at the edge of chaos. In C. Langton et al. (Eds.), Artificial life II (pp. 41–91). Redwood City, CA: Addison-Wesley.
12. Legenstein, R., & Maass, W. (2007). Edge of chaos and prediction of computational performance for neural circuit models. Neural Networks, 20, 323–334.
13. Luque, B., & Solé, R. (2000). Lyapunov exponents in random Boolean networks. Physica A, 284, 33–45.
14. Maass, W., Natschläger, T., & Markram, H. (2002). Real-time computing without stable states. Neural Computation, 14, 2531–2560.
15. Markovic, D., & Gros, C. (2012). Intrinsic adaptation in autonomous recurrent neural networks. Neural Computation, 24, 523–540.
16. McFadden, F., Peng, Y., & Reggia, J. (1993). Local conditions for phase transitions in neural networks with variable connection strengths. Neural Networks, 6, 667–676.
17. Mitchell, M., Hraber, P., & Crutchfield, J. (1993). Revisiting the edge of chaos: Evolving cellular automata to perform computations. Complex Systems, 7, 89–130.
18. Prokopenko, M., Lizier, J., Obst, O., & Wang, X. (2011). Relating Fisher information to order parameters. Physical Review E, 84, 041116.
19. Reggia, J., & Edwards, M. (1990). Phase transitions in connectionist models having rapidly varying connection strengths. Neural Computation, 2, 523–535.
20. Schiff, J. (2008). Cellular automata (p. 78). Hoboken, NJ: Wiley Interscience.
21. Snyder, D., Goudarzi, A., & Teuscher, C. (2012). Finding optimal random Boolean networks for reservoir computing. Artificial Life, 13, 259–266.
22. Solé, R., & Miramontes, O. (1995). Information at the edge of chaos in fluid neural networks. Physica D, 80, 171–180.
23. Wolfram, S. (1984). Cellular automata as models of complexity. Nature, 311, 419–424.
24. Wuensche, A. (1999). Classifying cellular automata automatically. Complexity, 4(3), 47–66.
25. Wuensche, A., & Lesser, M. (1992). The global dynamics of cellular automata. Reading, MA: Addison-Wesley.
26. Yildiz, I., Jaeger, H., & Kiebel, S. (2012). Re-visiting the echo state property. Neural Networks, 35, 1–9.


