Journal of Neuroscience Research 1:287-313 (1975)

ASSOCIATIVE RECALL AND FORMATION OF STABLE MODES OF ACTIVITY IN NEURAL NETWORK MODELS

Holger Wigström

Chalmers Institute of Technology, Division of Information Theory, Göteborg, Sweden

Models of neural networks with recurrent inhibition are studied, as well as one model which also includes recurrent excitation. The models are intended as possible descriptions of the cerebral cortex. Each network model is composed of neuron models called pyramidal cells and stellate cells in accordance with the names of two types of cells in the cortex. Inputs and outputs of the network are connected to the pyramidal cells while feedback is provided by the stellate cells. Connections within the network are random. During a learning phase the pyramidal cell excitatory synapses become facilitated according to a two-conditional facilitation rule. This is the basis of the model's ability for associative learning. The associative retrieval of information can be studied during a subsequent association phase. This has been done by simulation on a digital computer. It was shown that all of the models considered can be designed to perform a so-called decision-making function. This means that if the associating input pattern is similar to several patterns which occurred during learning, the model can decide which similarity is greatest by responding with the appropriate associated pattern. The model also including recurrent excitation differs from the simpler models in that it can become stabilized in so-called stable modes of activity which are self-sustaining and remain even after the input has been turned off. Normally, only one stable mode can be active at a time. However, through careful choice of construction parameters it was possible to obtain a model in which a maximum of two stable modes could be activated independently of each other. Physiological and psychological interpretations are discussed and so are the limitations of the models, which are evident in certain situations.

H. Wigström is now at the University of Göteborg, Department of Physiology, Fack, S-400 33 Göteborg 33, Sweden.

© 1975 Alan R. Liss, Inc., 150 Fifth Avenue, New York, N.Y. 10011

INTRODUCTION

In a previous paper by the author (Wigström, 1973) it was shown how a neuron model based upon two-conditional facilitation of synapses and random connections


could be used in an associative memory. An interesting observation was that the model's inhibitory input pattern is subject to so-called pattern separation. By this is meant that similarity between patterns is decreased during a transformation. This phenomenon had in fact been observed earlier by Rosenblatt (1962), although in a different situation. In a subsequent paper (Wigström, 1974) a similar neuron model was used in a network with recurrent inhibition intended as a model of the cerebral cortex. The fact that patterns could become associated with themselves through a circuit in which pattern separation occurs gave this model an essential property beyond being an associative memory. It was shown that the output obtained through an associative recall is made up of a particular pattern which occurred as an output during a certain learning event. This learning event is characterized by the presence of an input pattern with maximal similarity to the pattern used to evoke the associative recall. A function of this kind will be referred to as a decision-making function because the selection of a maximal similarity condition can be considered as a decision-making process. The mathematical method used to handle the above-mentioned neural network model, and which is also to be used in the present report, implies that transformation of patterns need only be considered in terms of the total activity of patterns and similarity between patterns. This idea was also used by Rosenblatt (1962), while Klemera (1968) and Amari (1972) used only the total activity of patterns as a state variable. Similarity between signal patterns is a useful variable when memory mechanisms are studied since it makes possible a comparison of the present situation with the past.

The present report is a further development of ideas presented in the two papers discussed above (Wigström, 1973, 1974). Three neural network models will be considered, the first being identical with the above-mentioned model of a neural network with recurrent inhibition. The reason for reexamining this model is that the original mathematical treatment (Wigström, 1974) was carried out completely only in a case with simplifying assumptions. A more general approach is made in the present paper. The second model is a slight extension of the first, while the third is a further extension in which recurrent excitation is also included. It will be shown that the third model has some features not present in the first two, for instance, the ability to get into states of self-sustaining activity.

GENERAL REMARKS

We shall study three neural network models, which will be referred to as M-1, M-2, and M-3. All three models can be considered as possible descriptions of the cerebral cortex, though this interpretation is not necessary. This section of the paper deals with some general characteristics of the network models, most of it being a description of the constituent neuron models which are common in type for M-1, M-2, and M-3. The neuron models can be considered as models of pyramidal cells and stellate cells, which are the two main types of neurons in the cerebral cortex (Sholl, 1956). These as well as other neuroanatomical terms will sometimes be directly applied to the models. There should be no difficulty in deciding what is actually meant. In the following description of the neuron models Fig. 1 will sometimes be referred to though this figure is actually an illustration of the M-1 model which will be described later. The pyramidal cell model has been described in a previous paper (Wigström, 1974),


while most of its theoretical basis originates from a still earlier paper (Wigström, 1973). For this reason the following description of the pyramidal cell model will partly be a review and partly a mere citation of previous results. The input consists of a pattern X of excitatory signals, a pattern U of inhibitory signals, and so-called unspecified signals S which may include both excitatory and inhibitory action (see Fig. 1). The output is a scalar quantity denoted by y. Summarized below are the main features of the model: (1) Connections to the synapses are random. (2) The distributions of excitatory as well as inhibitory synapses can be described by Poisson processes. (3) Activity at inhibitory synapses will cut off certain excitatory synapses from signal connection with the cell body. This is the basis for a temporary division of the excitatory synapses into an active and a passive group. (4) At certain learning events during the so-called learning phase the active excitatory synapses can become facilitated as a result of simultaneous activity in the cell body and at the synapse in question (two-conditional facilitation). During the so-called association phase the unspecified signals S are supposed to be zero. The output (y) is generated by the signals arriving at the active excitatory synapses. More precisely, the output is obtained by letting a weighted sum of these signals be formed which is then passed through a low pass filter and acted upon by a threshold function.
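To make the verbal description above concrete, the following is a minimal discrete-time sketch of one pyramidal cell: a facilitation state for the excitatory synapses, two-conditional facilitation at learning events, and an output formed by weighting, low-pass filtering, and thresholding. All names and numerical values are hypothetical and are not taken from the paper.

```python
import numpy as np

class PyramidalCell:
    """Sketch of the pyramidal cell model: excitatory synapses that can be
    facilitated (two-conditional rule) and an output obtained by passing a
    weighted sum through a low-pass filter and a threshold function."""

    def __init__(self, n_inputs, tau=1.0, theta=0.5):
        self.w = np.zeros(n_inputs)   # facilitation state of the excitatory synapses
        self.v = 0.0                  # low-pass filter state
        self.tau = tau                # filter time constant (hypothetical value)
        self.theta = theta            # output threshold (hypothetical value)

    def learn(self, x, active, cell_active):
        # Two-conditional facilitation: a synapse is facilitated only when it is
        # in the active group (not cut off by inhibition), receives a signal,
        # and the cell body is active at the same time.
        if cell_active:
            self.w += active * x

    def step(self, x, active, dt=0.1):
        drive = np.dot(self.w * active, x)           # weighted sum over active synapses
        self.v += dt / self.tau * (drive - self.v)   # low-pass filtering
        return 1.0 if self.v > self.theta else 0.0   # threshold function
```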

Because of the stochastic character of the pyramidal cell model its output y will be a stochastic variable. It was shown (Wigström, 1974) that the expectation of the output during the association phase satisfies the equation

(d/dt + 1) E[y] = C Σ_{k=1}^{K} y^(k) (X^(k) · X) E[A^(k) · A]   (1)

The vector A is called the activity pattern and is the result of a stochastic transformation of the inhibitory pattern U. The superscript k is used to indicate that a signal or pattern occurred during learning event number k. Variables without superscript occur during the association phase at the time t. The total number of learning events is equal to K. C is a proportionality constant. It can be shown that the expectation E[A^(k) · A] is a function of only |U^(k)|², |U|², and U^(k) · U. The relation (2)

has previously been used (Wigström, 1974) and will also be used in the present report. The expression is valid for a cell called type 1 (Wigström, 1973), which is a cell with a certain kind of synaptic distribution. Now let us turn to the model for the stellate cells. Models of both excitatory and inhibitory type will be used. This classification refers to the type of action exerted upon other cells and does not actually affect the character of the stellate cell models. Therefore, in the following we can arbitrarily think of inhibitory cells and use the corresponding


notations. In this case the input is a pattern y = (y₁, y₂, …, y_{N_p}) while the output is a scalar quantity U (Fig. 1). In the previous description of a neural network model (Wigström, 1974) the stellate cell model was not specified although the choice of a linear threshold unit was discussed. Now the model which will be used in the present report is a Poisson-type linear threshold unit, described by the equation

U = 1 if Σ_{i=1}^{N_p} w_i y_i > μ, and U = 0 otherwise   (3)

where the weighting factors w_i are independent Poisson distributed stochastic variables with the common parameter λ and the threshold μ is a nonnegative fixed number. This could also have been derived as a result of the following more elementary assumptions, for example: (1) The connections to the synapses are random. (2) The distribution of synapses can be described by a Poisson process. (3) The output 1 is obtained if the sum (not weighted sum) of the signals arriving at the synapses exceeds a certain threshold value. It should be noted that these assumptions are very similar to the assumptions for the pyramidal cell model. It can be shown that the pattern transformation properties of a collection of N_s Poisson-type linear threshold units is given by

E[U₁ · U₂] = N_s Σ_{p+r>μ, q+r>μ} e^{-λ(|y₁|² + |y₂|² - y₁·y₂)} [λ(|y₁|² - y₁·y₂)]^p [λ(|y₂|² - y₁·y₂)]^q [λ y₁·y₂]^r / (p! q! r!)   (4)

where the subscripts 1 and 2 indicate two different events. The vector notation U indicates the pattern of output signals U, each of which is defined by an equation identical to (3) but with independent sets of weighting factors. A derivation of the formula is given in Appendix 1. An analogous expression (but with different notations) has been used by Rosenblatt (1962). Before going on with the descriptions of the different neural network models let us look at some conventions which will be used in the present report. Let us first discuss the use of vectors as mathematical notations for signal patterns. In fact, vectors were already used in equations (1), (2), and (4). To clarify how this is actually done let us consider a hypothetical pattern which can be described by the numbers v₁, v₂, …, v_n. By v (in boldface) is then meant the vector with component representation (v₁, v₂, …, v_n). When referring to a component in v we can write v_i, which means the component number i, or merely v, which means an arbitrary component of the vector v. The word pattern will sometimes also be used in the meaning of the corresponding vector. All vectors used in this report have components which are either 0 or 1. Because the models are of stochastic character we can often determine variables


only in the form of expectations. The situation is more favorable, however, when the variable is a scalar product of two vectors. In all relevant cases the scalar products are sums of equally distributed independent stochastic variables (with the possible exception of terms which are identically zero). If the number of vector components is large the number of stochastic variables in the sum will be large and the expectation can be used as an approximation to the scalar product itself (the variance is negligible). We will in fact assume that this is always the case since it is the only situation which can be handled by simple mathematical methods. For this reason we shall usually omit to write out the expectation operator E when the variable considered is a scalar product. The equations will still hold with good accuracy. Still another convention concerns the indication of time dependence. Consider for instance equation (1) in which y is acted upon by the differential operator d/dt + 1. To write this and other equations in a simpler form, and to obtain some other advantages as well, the operator

ℒ = (d/dt + 1)⁻¹

will be used. Equation (1) would then become

E[y] = ℒ C Σ_{k=1}^{K} y^(k) (X^(k) · X)(A^(k) · A)   (5)

where the above convention for expectations has also been introduced. It is realized that ℒ is the integral operator which performs low pass filtering with time constant 1. An explicit expression for ℒ can be determined but is not needed.
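The Poisson-type linear threshold unit of equation (3) is easy to state in code. The sketch below builds a small collection of such units with independent Poisson weights and estimates the scalar product U₁ · U₂ by Monte Carlo; the pattern sizes, λ, and μ are illustrative values only, and the estimate can be compared with the expectation given by equation (4).

```python
import numpy as np

rng = np.random.default_rng(1)

def poisson_unit(y, w, mu):
    """Equation (3): output 1 if the Poisson-weighted sum exceeds the threshold mu."""
    return 1 if np.dot(w, y) > mu else 0

# Illustrative parameters (not taken from the paper).
n_p, n_s, lam, mu = 200, 500, 0.05, 2

# Two binary patterns y1, y2 with |y1|^2 = |y2|^2 = 60 and y1.y2 = 30.
y1 = np.zeros(n_p); y1[:60] = 1
y2 = np.zeros(n_p); y2[30:90] = 1

# A collection of n_s units, each with its own independent set of Poisson weights.
U1 = np.empty(n_s)
U2 = np.empty(n_s)
for j in range(n_s):
    w = rng.poisson(lam, n_p)
    U1[j] = poisson_unit(y1, w, mu)
    U2[j] = poisson_unit(y2, w, mu)

print("Monte Carlo estimate of U1.U2:", np.dot(U1, U2))
```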

M-1: A MODEL OF A NEURAL NETWORK WITH RECURRENT INHIBITION

Description and Mathematical Treatment

A model of a neural network with recurrent inhibition will now be studied. The model, which will be called M-1, has also been described in a previous paper (Wigström, 1974). The original mathematical treatment was carried out completely only in a case with simplifying assumptions; therefore, the model will now be reconsidered under more general conditions. The model is composed of N_p pyramidal cell models and N_s stellate cell models having the properties described in the preceding paragraph. The cells and their connections are shown schematically in Fig. 1. The cloudlike formations denote a random selection of signals which is statistically symmetrical with respect to all individual signals within a signal pattern. It is seen that the input consists of the patterns X and S which are common to the pyramidal cells. The output y is formed by the pyramidal cell outputs y₁, …, y_{N_p}. Feedback is provided by the stellate cells which transform the output y into the inhibitory pattern U which affects the pyramidal cells. During the learning phase associations are established between combinations of X^(k) and U^(k) and the simultaneous output patterns y^(k). The latter can be considered as determined by the unspecified signals S^(k). As was mentioned before the index k is a numbering of the learning events. It is assumed that the absolute values of the vectors X^(k) and of the vectors y^(k) are the same in all learning events, i.e., for all values of k. The y^(k) vectors are assumed to be orthogonal. The interpretation of these assumptions is that the same amount of


Fig. 1. The neural network model M-1. The figure also illustrates the similar model M-2. Only three neuron models of each type are shown though their actual numbers are large. Reproduced from Wigström (1974).

neural activity is evoked at each learning event but that separate groups of cells are involved. During the association phase the network is activated by the excitatory input pattern X while the unspecified signals S are zero. In the following we will study the model's output during the association phase when a certain input pattern X is applied. As variables we shall use scalar products y^(k) · y describing the composition of the output pattern in terms of the output patterns which occurred during learning. In the previous study of the model (Wigström, 1974) the following equation was derived on the basis of (1):

(d/dt + 1) E[y^(k) · y] = C |y^(k)|² (X^(k) · X) E[A^(k) · A]   (6)

where k = 1, 2, …, K. Using the conventions mentioned in the preceding paragraph the equation can be written in the form

y^(k) · y = ℒ C |y^(k)|² (X^(k) · X)(A^(k) · A)   (7)

where k = 1, 2, …, K.

Since we are interested in the relative sizes of the scalar products X^(k) · X and y^(k) · y rather than their absolute magnitudes we introduce the normalized quantities ξ_k = X^(k) · X / |X^(k)|² and η_k = y^(k) · y / |y^(k)|².


The quantities ξ_k and η_k are called input components and output components, respectively, since they describe the composition of input and output patterns. They take on values in the interval [0, 1]. By using ξ_k and η_k equation (7) can be written

η_k = ℒ C |X^(k)|² |A^(k)|² ξ_k (A^(k) · A / |A^(k)|²)

where k = 1, 2, …, K. We have introduced the quantity |A^(k)|² at two places in the formula. The reason for doing this is that we can then identify the product C |X^(k)|² |A^(k)|². This was chosen equal to 1 in the original description of the model (Wigström, 1974). To make the present study more general the parameter c₀ = C |X^(k)|² |A^(k)|² is introduced, c₀ being a number which is not necessarily equal to 1. However, in the preliminary study described in this paragraph we will use the special choice c₀ = 1.
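The input and output components are ordinary normalized scalar products and are straightforward to compute for binary patterns; the small sketch below uses made-up patterns purely for illustration.

```python
import numpy as np

# Hypothetical binary learning patterns X^(k) (rows) and a probe pattern X.
X_learn = np.array([[1, 1, 1, 0, 0, 0],
                    [0, 0, 1, 1, 1, 0]])
X_probe = np.array([1, 1, 0, 0, 0, 0])

# Input components xi_k = (X^(k) . X) / |X^(k)|^2, each lying in [0, 1].
xi = X_learn @ X_probe / np.sum(X_learn, axis=1)
print(xi)   # -> [0.667, 0.0]: the probe is partially similar to the first pattern only
```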

It was shown (Wigström, 1974) that A^(k) · A (actually its expectation) can be written as a function of |y^(k)|², |y|², and y^(k) · y:

A^(k) · A = T(|y^(k)|², |y|², y^(k) · y)

The function T was called a pattern transformation function. The number of ones in y is equal to the sum of the numbers of ones which are common to y and each of the patterns y^(k). Output components greater than 1 could result if a value c₀ > 1 is used in the equations without any other changes. The simplest way to handle this is to put nonlinear limiting conditions on y and η_k. For example, (5) and (14) could be changed into the following equations:

E[y] = Φ(ℒ C Σ_{k=1}^{K} y^(k) (X^(k) · X)(A^(k) · A))   (21)

The function Φ is defined by

Φ(x) = x if 0 ≤ x ≤ 1, and Φ(x) = 1 if x > 1   (23)

It can in fact be shown (see Appendix 2) that the above equations provide the correct descriptions of the pyramidal cell model and the network model for any nonnegative value of c₀ (under one moderate condition). The network model so generalized will be called M-2 to distinguish it from M-1 which corresponds to the special choice c₀ = 1. The situation which is most interesting is the so-called overlearning case, that is, when c₀ > 1. In Appendix 2 it is also shown that with a slight change of the pyramidal cell model Φ can be chosen as any increasing function with values in the interval [0, 1]. However, because of its simplicity expression (23) will be used for Φ in the following. Finally, let it be said that equation (22) is completely determined by the four parameters c₀, c₁, c₂, and c₃.


Computer Simulation of the M-2 Model

To simulate the M-2 model equation (22) was used, rewritten as a difference equation. Computer simulations analogous to those of Fig. 2 were carried out but with the overlearning factor c₀ = 2 instead of 1. The results are shown in Fig. 5. It is seen that the slight overlearning produced a decision-making function for 10 parameter combinations, while for c₀ = 1 this was obtained only in one case.
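The sketch below indicates, in a purely schematic way, how such a difference-equation simulation can be organized: the low-pass operator ℒ becomes a first-order update and the limiting function Φ of equation (23) is applied to the filtered drive. The drive is written as c₀ ξ_k Θ_k, with the pattern transformation factor Θ_k replaced by a crude stand-in rather than the function determined by c₁, c₂, and c₃ in the paper, so the numbers produced are illustrative only.

```python
import numpy as np

def phi(x):
    """The limiting function of equation (23)."""
    return np.clip(x, 0.0, 1.0)

def simulate_m2(xi, c0, theta_fn, t_end=30.0, dt=0.1):
    """Schematic difference-equation version of the M-2 dynamics.

    Assumes a drive of the form c0 * xi_k * Theta_k passed through the
    low-pass operator (time constant 1) and the limiting function phi;
    theta_fn is only a stand-in for the pattern transformation factor."""
    eta = np.zeros_like(xi, dtype=float)
    v = np.zeros_like(xi, dtype=float)       # low-pass filter state
    for _ in range(int(t_end / dt)):
        drive = c0 * xi * theta_fn(eta)
        v += dt * (drive - v)                # discretized (d/dt + 1) v = drive
        eta = phi(v)
    return eta

# Toy run: four learning events and a crude Theta stand-in that favors the
# currently dominant output component (a caricature of pattern separation).
xi = np.array([0.9, 0.8, 0.5, 0.3])
theta_fn = lambda eta: np.where(eta >= eta.max(), 1.0, 0.3)
print(simulate_m2(xi, c0=2.0, theta_fn=theta_fn))
```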


Fig. 5. The M-2 model: Stable state solutions. The figure is analogous to Figs. 2 and 4. It is seen that the decision-making function is obtained in 10 cases. K = 4; (ξ₁, ξ₂, ξ₃, ξ₄) = (0.9, 0.8, 0.5, 0.3); (c₀, c₃) = (2, 8).

The output components as time functions are shown in Fig. 6 for a typical case with the decision-making function. The sharp knee in the curve for η₁ is due to a similar knee in the Φ-function. A smoother curve could have been obtained by choosing a different expression for Φ. It is seen that the complete pattern y^(1) is retrieved even though the entire pattern X^(1) is not present. This is a characteristic which cannot be achieved for the M-1 model since c₀ > 1 is required.


Fig. 6. The M-2 model: Input and output components as functions of time. It is seen that the complete y^(1)-pattern is retrieved even though the entire X^(1)-pattern is not present. K = 4; (c₀, c₁, c₂, c₃) = (2, 2, 2, 8).

It is seen that the parameter combinations which produce the decision-making function both for the M-1 and for the M-2 model are those which have approximately equal values of c₁ and c₂. This could be expected since in these cases a moderately large fraction of the stellate cells is active.

M-3: A MODEL OF A NEURAL NETWORK WITH BOTH RECURRENT INHIBITION AND EXCITATION

Description and Mathematical Treatment

We would like to study a model of a neural network with both recurrent inhibition and excitation. In order to construct such a model we start with the M-2 model previously developed and see how this can be modified to be used in the present case. To obtain the extra input which is now required it is assumed that the excitatory input pattern consists of two parts, X and X'. X is used as an input to the network while X' is connected to the output of N'_s excitatory stellate cells which provide the positive feedback. It is thus assumed that all positive feedback is mediated by excitatory interneurons. The network is shown schematically in Fig. 7. As a model for the excitatory stellate cells we choose a model identical to the one used for the inhibitory stellate cells but with construction parameters which may be different. The output X' of an excitatory stellate cell is thus given by

X' = 1 if Σ_{i=1}^{N_p} w'_i y_i > μ', and X' = 0 otherwise

where the w'_i are independent Poisson-distributed stochastic variables with the parameter λ', while μ' is a fixed threshold.



Fig. 7. The neural network model M-3. The model can be considered as derived from the M-2 model by addition of an excitatory feedback link. Only two neuron models of each type are shown, though their actual numbers are large.

The equation analogous to (7) in the case of the M-2 model is given by (A2.12) in Appendix 2. The method used to introduce positive feedback implies that the scalar product X^(k) · X in (A2.12) should be replaced by the sum X^(k) · X + X'^(k) · X'. As before we set ξ_k = X^(k) · X / |X^(k)|², η_k = y^(k) · y / |y^(k)|², and c₀ = C |X^(k)|² |A^(k)|². Further, if c'₀ is introduced to denote the expression C |X'^(k)|² |A^(k)|² we get the following equation:

where k = 1, 2, …, K. Previously we expressed A^(k) · A in terms of a pattern transformation function T. Similarly, X'^(k) · X' can be expressed in terms of a function T'. We thus have

A^(k) · A = T(|y^(k)|², |y|², y^(k) · y),  X'^(k) · X' = T'(|y^(k)|², |y|², y^(k) · y)


where |y|² is given by

|y|² = Σ_{n=1}^{K} y^(n) · y

T has been defined previously by (18), (19), and (20), while T' is defined by an expression analogous to equation (20) alone:

where k = 1, 2, …, K. The last quotient in (31) was denoted by Θ in the previous theory. Analogously we can use Θ' to indicate the first quotient. We then obtain the following equation for the M-3 model:

η_k = Φ(ℒ (c₀ ξ_k + c'₀ Θ'_k) Θ_k)   (32)

where k = 1, 2, …, K. We have previously defined the parameters c₁, c₂, and c₃ which describe the function Θ. It can easily be shown that Θ' can be described by only two parameters. In analogy to c₁ and c₂ we introduce c'₁ = λ' |y^(k)|² and c'₂ = μ'. Equation (32), which describes the M-3 model, is thus completely determined by seven construction parameters, namely, c₀, c₁, c₂, c₃, c'₀, c'₁, and c'₂. It is realized that the M-2 model can formally be obtained as a special case of M-3 by setting c'₀ = 0. Since M-1 is likewise a special case of M-2 it is obvious that equation (32) can be used to describe all the models considered in this report.
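As a rough illustration of how the excitatory feedback term can sustain activity after the input is removed, the following sketch uses an additive drive of the form (c₀ ξ_k + c'₀ Θ'_k) Θ_k, but with Θ and Θ' replaced by crude stand-ins rather than the paper's pattern transformation quotients; all values are illustrative and only the qualitative behavior (self-sustained activity for sufficiently large c'₀) is of interest.

```python
import numpy as np

def phi(x):
    return np.clip(x, 0.0, 1.0)

def simulate_m3(xi_of_t, c0, c0p, K=4, t_end=30.0, dt=0.1):
    """Schematic M-3 dynamics with an excitatory feedback term.

    The drive for component k is taken as (c0 * xi_k + c0p * Theta'_k) * Theta_k.
    Theta and Theta' are crude stand-ins (not the paper's functions): Theta'
    grows with the component itself, and Theta has a resting value so that an
    external input can start the activity."""
    eta = np.zeros(K)
    v = np.zeros(K)
    for step in range(int(t_end / dt)):
        xi = xi_of_t(step * dt)
        theta = 0.5 + 0.5 * eta          # stand-in for Theta_k
        theta_p = eta                    # stand-in for Theta'_k
        drive = (c0 * xi + c0p * theta_p) * theta
        v += dt * (drive - v)            # low-pass filtering, time constant 1
        eta = phi(v)
    return eta

# Input component 1 is on for the first 10 time units, then switched off.
xi_of_t = lambda t: np.array([0.9, 0.0, 0.0, 0.0]) if t < 10 else np.zeros(4)
print(simulate_m3(xi_of_t, c0=2.0, c0p=2.0))   # first component stays near 1
print(simulate_m3(xi_of_t, c0=2.0, c0p=0.5))   # activity dies out once the input stops
```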

Computer Simulation of the M-3 Model

The M-3 model was simulated on a computer on the basis of equation (32). Since


the equation depends on as many as seven parameters it is hardly possible to investigate the effects of all these. However, it appeared rather easy to construct models with the decision-making function by choosing c₁ ≈ c₂ and c'₁ ≈ c'₂ and giving moderate values to the rest of the parameters. Let us therefore consider only one typical example of the M-3 model performing a decision-making function. The example is shown in Fig. 8.


Fig. 8. The M-3 model: Input and output components as functions of time. It is seen that the stable state obtained remains even after the input has been turned off. This kind of stable state in a network is called a stable mode of activity. K = 4; (c₀, c₁, c₂, c₃, c'₀, c'₁, c'₂) = (2, 2, 2, 8, 2, 2, 2).

One interesting feature seen in Fig. 8 is that the network maintains its output even after the input has been turned off. This state remains indefinitely unless it is prevented in some way. Such a stable condition in a network will be called a stable mode of activity. It is true that this term was also used in the previous study (Wigström, 1974) in referring to a network with recurrent inhibition, but the term is probably more adequately used if restricted to situations with self-sustaining activity such as can be achieved for the M-3 model. It can be shown that self-sustaining activity is possible when c'₀ ≥ 1. Additional computer tests showed that the network will go into a stable mode of activity even if the input is turned off before the stable condition is reached. If the input is turned off too early, however, the activity of the network will return to zero.

The excitatory feedback link is in many respects similar to the inhibitory link and it is natural to ask whether the model could operate in the absence of the inhibitory link. In order to investigate this the model in Fig. 8 was tested once again, but with c₃ = 0. Setting c₃ = 0 is equivalent to cutting the inhibitory feedback link. From Fig. 9 it is seen that although only one nonzero input component was applied all four possible modes became active. Obviously inhibition is essential in M-3 to suppress modes which are not wanted. The question then arises whether it is possible to construct a model with weak inhibition in which several but not all possible modes can become active at the same time. By hand calculation aided by Rosenblatt's tables (Rosenblatt, 1960) parameters could be



Fig. 9. The M-3 model without inhibitory feedback link: Input and output components as functions of time. It is seen that although the network is activated by just one input component and only during a short period of time, all four possible modes become active. K = 4; (c₀, c₁, c₂, c₃, c'₀, c'₁, c'₂) = (2, 2, 2, 0, 2, 2, 2).

determined for a model in which a maximum of two modes could be activated independently of each other. This kind of operation is theoretically possible also for more than two modes but then requires very special and unrealistic parameter combinations. A computer simulation of the above case is shown in Fig. 10. It is seen how the first and second modes are activated in succession but how the attempt to activate also the third causes the first two to be extinguished. If ξ₃ is turned off at a slightly earlier instant it can even happen that all three modes are extinguished.


Fig. 10. The M-3 model designed for a maximum of two independently excitable modes: Input and output components as functions of time. It is seen how the first two modes are activated in succession but how activation of the third causes the first two to be extinguished. K = 4; (c₀, c₁, c₂, c₃, c'₀, c'₁, c'₂) = (2, 1.5, 2, 0.5, 4, 0.5, 3).


DISCUSSION

We have studied three neural network models called M-1, M-2, and M-3. These are actually not different models but merely a series of successive generalizations. Model M-1 is a model of a neural network with recurrent inhibition and was originally described in an earlier paper (Wigström, 1974). There it was shown that the model possesses a certain faculty which we have called the decision-making function in this text. As was mentioned in the introduction this means that the output obtained through an associative recall is made up of a particular pattern which occurred as an output during a certain learning event. This is the learning event characterized by the presence of an input pattern with maximal similarity to the pattern used to evoke the associative recall. It was concluded (Wigström, 1974) that this ability depends on the fact that patterns during learning become associated with themselves through a link in which pattern separation occurs. We have now shown that the M-1 model can perform its decision-making function even in the absence of the special assumptions introduced in the original description (Wigström, 1974), but rather special or unrealistic parameter combinations are required (see Figs. 2 and 4, and the discussion under Computer Simulation of the M-1 Model).

It was far easier to obtain the decision-making function for the M-2 model (see Fig. 5), which differs from M-1 only by the fact that overlearning is allowed (c₀ > 1). This means that learning is so efficient that a complete recall can be elicited by an incomplete though adequate input pattern. Although the model is very similar to M-1 the mathematical description had to be modified. The discovery that overlearning has a favorable influence on the decision-making function is perhaps not surprising. The psychological interpretation is obvious, but precautions should be taken against making far-reaching analogies. It should be remembered that the model is intended as a description of only a small piece of nerve tissue.

Let us now consider model M-3, which is provided with recurrent excitation as well as inhibition. This is presumably a more realistic model since it is known that self-sustaining activity can be induced in an isolated piece of cerebral cortex (Burns, 1958), and this probably requires the presence of positive feedback. This is also in agreement with the conclusions of Eccles (1969). The computer simulations showed that the decision-making function could be obtained also for the general M-3 model. It was shown that the network can become stabilized in alternative stable modes of activity which remain even after the input has been turned off (Fig. 8). The activity continues indefinitely unless it is prevented in some way. This is not the case for reverberative activity in real cortex (Burns, 1958). A more realistic operation could have been obtained if the neuron models had been given fatigue properties. The simulation experiments also showed that inhibition is just as essential as excitation for the M-3 model because in the absence of an inhibitory feedback link all stable modes easily become active at the same time (see Fig. 9). This can be compared with the fact that both recurrent excitation and inhibition are probably present in real cerebral cortex (Eccles, 1969). In one case of the model with weak inhibition, it was possible to evoke two modes simultaneously and independently of each other (see Fig. 10).
However, the construction parameters were very specifically chosen, so perhaps not too much attention should be paid to this variant of the model. On the other hand, this kind of action might be easier to obtain if, for instance, the simultaneously evoked modes had become associated with


each other through simultaneous presentation during the learning phase. In a previous paper (Wigström, 1974) we have pointed out the analogy between the M-1 model and Hebb's (1949) concept of the cell assembly. The analogy is perhaps more obvious in the case of the M-3 model which includes recurrent excitation because Hebb's idea is based upon reciprocal excitatory action between neurons that have previously been active at the same time. The cells within an assembly correspond to those cells in our model (the M-3 model) which have been active during the same learning event. The activation of an assembly corresponds to the formation of what we have called a stable mode of activity. It seems as if Hebb conceived the possibility of several cell assemblies being active at the same time within the same neural structure. We have shown that this situation could also happen for the M-3 model at least in the case of two simultaneous modes of activity (see Fig. 10). However, this is not the only way in which several modes could be active at the same time. This could also be achieved if the model were given a spatial structure (preferably two-dimensional) by letting connections between neighboring cells be more probable than connections between distant ones. It is likely that in a model so defined different stable modes can become active at the same time but at different spatial locations. Perhaps these modes could also interact with each other by moving in the spatial dimension. This may not be unrealistic since it is known that reverberating activity can travel across the surface of isolated cerebral cortex (Burns, 1958).

In the following discussion, which gives some further interpretations as well as some limitations of the present theory, the term "model" will refer to M-3, since this is the most general of the models considered and includes M-1 and M-2 as special cases. In analogy with Hebb's interpretation, we could say that the stable modes of activity in our model correspond to concepts in an organism's brain. According to our theory the formation of a set of possible stable modes of activity, or let us say concepts, is a result of learning in which certain signal patterns have become associated with themselves. The ability of certain input patterns to elicit certain concepts is also established through learning but in this case due to associations between the input patterns and those output patterns which represent the corresponding concepts. Thus, there are two kinds of learning which we can call concept learning and association learning, respectively, although this terminology is not completely adequate. When studying the model we assumed that both kinds of learning occurred simultaneously during each learning event. The same results could have been obtained had learning consisted of two parts, one period of concept learning and one subsequent period of association learning. If desired we could let several input patterns become associated with the same concept. The above is probably a more realistic approach since it is analogous to the human (animal) learning process and to its division into infantile and adult learning. However, by assuming that the output is controlled by the unspecified signals during learning, we avoided the problem of how the concepts are actually generated.

We have assumed that learning occurs at certain learning events. This contrasts with the real world in which the organism is affected by a continuous stream of stimuli during most of its life.
Since it is hardly possible that all information received is learnt, there must be mechanisms for making selections. One such selection occurs when the organism is rewarded or punished. This is the case of so-called instrumental learning. The brain’s strategy used during instrumental learning has been formulated in Thorndike’s so-called law of


effect (Thorndike, 1911). However, is it possible to describe these more complicated kinds of learning by use of the present model or by any model of associative learning? Or can one use a modified version of the present model in which, for example, the two-conditional facilitation principle is changed to three-conditional? These are intriguing questions remaining to be answered.

One other problem which has not been considered in the present study is concerned with the storage of spatio-temporal patterns in the brain. The temporal variations of these patterns can be relatively slow, as in the case of forming a word as a sequence of phonemes, or fast, as in the case of learning flickering stimuli (Morell and Jasper, 1956). It has been suggested (Wigström, 1973) that a temporal sequence of spatial patterns can be stored as a chain of associations. Whether the above two cases can be considered as the formation of a chain of associations remains to be investigated. In any case it should be clear that temporal variations of signals are features which can be stored in the brain, although the present report has only dealt with possible mechanisms for the storage of spatial patterns of signals. An extreme point of view is suggested by John (1967) who considers the temporal aspect of neural activity to be the only important one. The truth probably lies somewhere in between, which means that learning in the brain would involve the storage of both spatial and temporal characteristics of signal patterns.

REFERENCES

Amari, S. (1972). Characteristics of random nets of analog neuron-like elements. IEEE Trans. Syst., Man, Cybern. 2:643-657.
Burns, B.D. (1958). "The Mammalian Cerebral Cortex." London: Edward Arnold.
Eccles, J.C. (1969). "The Inhibitory Pathways of the Central Nervous System." Liverpool: Liverpool University Press.
Hebb, D.O. (1949). "The Organization of Behavior." New York: John Wiley.
John, E.R. (1967). "Mechanisms of Memory." New York, London: Academic Press.
Klemera, P. (1968). Random network as a model of associative region of cerebral cortex. Sborn. Ved. Prac. Lek. Fak. Karlov Univ. 11:559-565.
Morell, F., and Jasper, H.H. (1956). Electrographic studies of the formation of temporary connections in the brain. EEG Clin. Neurophysiol. 8:201-215.
Rosenblatt, F. (1960). Tables of Q-functions for two perceptron models. Cornell Aeronaut. Lab. (Buffalo) Rep. VG-1196-G-6.
Rosenblatt, F. (1962). "Principles of Neurodynamics." Washington, D.C.: Spartan.
Sholl, D.A. (1956). "The Organization of the Cerebral Cortex." New York: John Wiley.
Thorndike, E.L. (1911). "Animal Intelligence: Experimental Studies." New York: Macmillan.
Wigström, H. (1973). A neuron model with learning capability and its relation to mechanisms of association. Kybernetik 12:204-215.
Wigström, H. (1974). A model of a neural network with recurrent inhibition. Kybernetik 16:103-112.


APPENDIX 1

Calculating the Pattern Transformation Function of a Collection of Poisson-Type Linear Threshold Units

The input-output relationship of a Poisson-type linear threshold unit is given by (3), repeated below.

U = 1 if Σ_{i=1}^{N_p} w_i y_i > μ, and U = 0 otherwise   (A1.1)

The numbers y_i which are components of the vector y are assumed to take on only the values 0 and 1. The weighting factors w_i are equally distributed, independent, Poisson-distributed stochastic variables with the parameter λ. We are going to determine the pattern transformation function of a collection of such units. This will be done by first considering one unit and then generalizing the result. Using an auxiliary variable z the following two equations can be used instead of (A1.1).

z = Σ_{i=1}^{N_p} w_i y_i   (A1.2)

U = 1 if z > μ, and U = 0 otherwise   (A1.3)

Let us study the signals which occur at two different events. To distinguish the signals these are labeled by subscripts 1 and 2. Now, let us define the sets

α = {i : y_{1i} = 1 and y_{2i} = 0}   (A1.4)

β = {i : y_{1i} = 0 and y_{2i} = 1}   (A1.5)

γ = {i : y_{1i} = 1 and y_{2i} = 1}   (A1.6)

Using the above definitions we can define the sums

z_α = Σ_{i∈α} w_i   (A1.7)

z_β = Σ_{i∈β} w_i   (A1.8)

z_γ = Σ_{i∈γ} w_i   (A1.9)

Since α, β, and γ are disjoint sets it follows that z_α, z_β, and z_γ are independent Poisson-distributed stochastic variables with parameters λn_α, λn_β, and λn_γ, where n_α, n_β, and n_γ equal the number of elements in α, β, and γ, respectively, that is,

n_α = |y₁|² - y₁ · y₂   (A1.10)

n_β = |y₂|² - y₁ · y₂   (A1.11)

n_γ = y₁ · y₂   (A1.12)

It is realized that

z₁ = z_α + z_γ   (A1.13)

z₂ = z_β + z_γ   (A1.14)

Using this enables us to obtain an expression for the expectation E[U₁U₂]:

E[U₁U₂] = P(U₁ = 1 and U₂ = 1) = P(z₁ > μ and z₂ > μ) = P(z_α + z_γ > μ and z_β + z_γ > μ)
        = Σ_{p+r>μ, q+r>μ} P(z_α = p and z_β = q and z_γ = r)
        = Σ_{p+r>μ, q+r>μ} f_{z_α}(p) f_{z_β}(q) f_{z_γ}(r)   (A1.15)

where

f_{z_α}(p) = e^{-λn_α} (λn_α)^p / p!   (A1.16)

f_{z_β}(q) = e^{-λn_β} (λn_β)^q / q!   (A1.17)

f_{z_γ}(r) = e^{-λn_γ} (λn_γ)^r / r!   (A1.18)

The numbers p, q, and r are nonnegative integers. We have used the fact that the density function for the three-dimensional variable (z_α, z_β, z_γ) is equal to the product of the density functions for the single variables z_α, z_β, and z_γ, which are given by the usual expressions for Poisson-distributed stochastic variables. The expectation E[U₁ · U₂], where the vectors U₁ and U₂ denote the output patterns from a collection of N_s units, is now simply obtained by multiplying the result in (A1.15) by N_s.

E[U₁ · U₂] = N_s Σ_{p+r>μ, q+r>μ} f_{z_α}(p) f_{z_β}(q) f_{z_γ}(r)   (A1.19)

Combining with (A1.10), (A1.11), (A1.12), (A1.16), (A1.17), and (A1.18) yields

E[U₁ · U₂] = N_s Σ_{p+r>μ, q+r>μ} e^{-λ(|y₁|² + |y₂|² - y₁·y₂)} [λ(|y₁|² - y₁·y₂)]^p [λ(|y₂|² - y₁·y₂)]^q [λ y₁·y₂]^r / (p! q! r!)   (A1.20)

This is identical with equation (4) in the text, which is thereby verified. However, (A1.20) is not suitable for numerical calculations since infinite sums are used. The sums can be made finite in the following way. Let us start with (A1.19) instead of (A1.20) in order to make shorter notations. It is assumed that μ is a nonnegative integer, which is no serious restriction. We now get

E[U₁ · U₂] = N_s [1 - Σ_{p+r≤μ} f_{z_α}(p) f_{z_γ}(r) - Σ_{q+r≤μ} f_{z_β}(q) f_{z_γ}(r) + Σ_{p+r≤μ, q+r≤μ} f_{z_α}(p) f_{z_β}(q) f_{z_γ}(r)]

We have been using the fact that the summation of any of the density functions f_{z_α}, f_{z_β}, or f_{z_γ} over all possible values of the corresponding stochastic variable (i.e., from 0 to ∞) will give the result 1. The above expression, but with the appropriate functions of |y₁|², |y₂|², and y₁ · y₂ substituted for f_{z_α}, f_{z_β}, and f_{z_γ}, is the equivalent form of (A1.20) which was used in the computer simulations.
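A sketch of the finite-sum evaluation just described is given below, using the complement probabilities so that every sum runs only up to μ. The function and variable names are illustrative; the numbers chosen match the Monte Carlo sketch given earlier in the text, so the two can be compared directly.

```python
import math

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)

def expected_u1_u2(n_s, lam, y1_sq, y2_sq, y1_dot_y2, mu):
    """Finite-sum evaluation of E[U1.U2] for a collection of n_s units.

    Uses P(z1 > mu and z2 > mu) = 1 - P(z1 <= mu) - P(z2 <= mu)
                                  + P(z1 <= mu and z2 <= mu),
    with z1 = z_alpha + z_gamma, z2 = z_beta + z_gamma and the Poisson
    parameters lam * n_alpha, lam * n_beta, lam * n_gamma of Appendix 1."""
    n_a = y1_sq - y1_dot_y2
    n_b = y2_sq - y1_dot_y2
    n_g = y1_dot_y2
    p1 = sum(poisson_pmf(p, lam * n_a) * poisson_pmf(r, lam * n_g)
             for r in range(mu + 1) for p in range(mu + 1 - r))
    p2 = sum(poisson_pmf(q, lam * n_b) * poisson_pmf(r, lam * n_g)
             for r in range(mu + 1) for q in range(mu + 1 - r))
    p12 = sum(poisson_pmf(p, lam * n_a) * poisson_pmf(q, lam * n_b) *
              poisson_pmf(r, lam * n_g)
              for r in range(mu + 1)
              for p in range(mu + 1 - r) for q in range(mu + 1 - r))
    return n_s * (1.0 - p1 - p2 + p12)

# Same illustrative numbers as in the earlier Monte Carlo sketch.
print(expected_u1_u2(n_s=500, lam=0.05, y1_sq=60, y2_sq=60, y1_dot_y2=30, mu=2))
```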


APPENDIX 2

Modifications Needed in the Case of Overlearning

We are going to study how equations (5) and (14) should be modified to handle the case of an arbitrary nonnegative c₀-value and thus verify equations (21) and (22). Of special interest is the so-called overlearning case, that is, when c₀ > 1. We have to go back to the derivation of (1) which has been described in an earlier paper (Wigström, 1974). Equations (3), (4), and (5) in this paper are quoted below.

(A2.1)

(A2.2)

y = 1 if y_b > d, and y = 0 otherwise   (A2.3)

Using these equations and the assumption that d is a stochastic variable uniformly distributed on [0, 1], the auxiliary variables y_a and y_b could be eliminated, yielding equation (1) for the pyramidal cell model. In the derivation of (1) the fact that P(y_b > 1) is small during the association phase was used. It was in order to make this condition possible that the assumption c₀ = 1 was made, although the notation c₀ was not used. In order to handle the case c₀ > 1 we will assume that the stochastic variable y_a can be treated as fixed (nonstochastic). This can be obtained in a neuron model with a large number of dendrites (or dendritic branches). We can then write y_a instead of the expectation E[y_a] in (A2.1). Combining with (A2.2) and using the operator ℒ = (d/dt + 1)⁻¹ instead of d/dt + 1 yields

(A2.4)

In order to be more general we will assume that the distribution of d is not necessarily uniform but is described by an arbitrary density function φ. If we want to maintain the interpretation of c₀ > 1 as overlearning, we should require that the main part of the probability mass be located within the interval [0, 1]. The expectation E[y] can now be written

(A2.5)

where Φ is the cumulative distribution function corresponding to φ. Combining (A2.4) with (A2.5) and writing A^(k) · A instead of E[A^(k) · A] yields

E[y] = Φ(ℒ C Σ_{k=1}^{K} y^(k) (X^(k) · X)(A^(k) · A))   (A2.6)

Equation (21) is now verified. Let us now verify (22) which describes the operation of the network. In (A2.6) we can write out the index i which indicates a certain pyramidal cell:

E[y_i] = Φ(ℒ C Σ_{n=1}^{K} y_i^(n) (X^(n) · X)(A^(n) · A))   (A2.7)

The summation variable n has been used instead of k. We can now calculate the expectation of the scalar product y^(k) · y:

y^(k) · y = Σ_i y_i^(k) E[y_i] = Σ_i y_i^(k) Φ(ℒ C Σ_{n=1}^{K} y_i^(n) (X^(n) · X)(A^(n) · A))   (A2.8)

It will be shown in the following that

y_i^(k) Φ(ℒ C Σ_{n=1}^{K} y_i^(n) (X^(n) · X)(A^(n) · A)) = y_i^(k) Φ(ℒ C (X^(k) · X)(A^(k) · A))   (A2.9)

Equation (A2.8) can then be written

y^(k) · y = |y^(k)|² Φ(ℒ C (X^(k) · X)(A^(k) · A))   (A2.10)

Now, let us prove that (A2.9) is correct. If y_i^(k) = 0 the equality holds since both sides are then equal to zero. Now look at the case y_i^(k) ≠ 0, which is equivalent to y_i^(k) = 1. The right side is then equal to Φ(ℒ C (X^(k) · X)(A^(k) · A)). The left side can be simplified using the fact that y_i^(n) is equal to 1 for n = k but 0 for all other values of n. This is a consequence of the fact that the patterns y^(n) are orthogonal.
