Journal of Neuroscience Methods, 44 (1992) 47-57

© 1992 Elsevier Science Publishers B.V. All rights reserved 0165-0270/92/$05.00

Quantal analysis using maximum entropy noise deconvolution

Dimitri M. Kullmann

Department of Pharmacology, School of Medicine, University of California, San Francisco, CA 94143-0450 (USA)

(Received 6 January 1992) (Revised version received 3 April 1992) (Accepted 22 May 1992)

Key words: Noise deconvolution; Maximum likelihood estimator; Maximum entropy; E-M algorithm; Quantal analysis; Neurotransmitter release

When applying quantal analysis to synaptic transmission it is often unclear how much of the measured postsynaptic signal fluctuation arises from random sampling and noise rather than from the probabilistic transmitter release process. Unconstrained noise deconvolution methods do not overcome this because they tend to overfit the data, often giving a misleading picture of the underlying process. Instead, maximum entropy deconvolution provides a solution which is the smoothest, or most featureless, distribution that is still compatible with the data, taking noise and sample size into account. A simple way of achieving this is described, together with results of Monte Carlo simulations which show that the features present in the maximum entropy solution usually reflect the process underlying the data and not random sampling or noise.

Introduction

An important problem in quantal analysis of synaptic transmission is that the trial-to-trial fluctuation in the amplitude of the post-synaptic signal is generally not exclusively due to probabilistic transmitter release but also to other sources of variability of the post-synaptic membrane potential or current. These include electrode noise, membrane channel activity, spontaneous synaptic activity, non-stationarity in recording conditions and noise introduced by the amplifier circuit. These will be collectively referred to as noise, and in most cases they can be separately measured. As long as they interact linearly with the underlying synaptic signal, it should still be possible to recover the quantal parameters. This is the basis

Correspondence: D.M. Kullmann, Institute of Neurology, Queen Square, London WC1N 3BG, UK. Fax: (44-71) 2785069.

of noise deconvolution (reviewed in Redman, 1990). The deconvolution approach operates as follows. The observed synaptic signal amplitudes are a finite sample drawn from a probability density function f(v; θ), where v is the measured variable (potential, current, charge), and θ contains the unknown underlying quantal parameters. For convenience, the amplitudes can be binned to form a histogram a (a_i, i = 1,...,N, where a_i is the number of trials resulting in a signal of amplitude v_i ± δ/2, δ is the binning interval and N is the number of bins). Clearly, as the sample size increases, a becomes an increasingly faithful representation of f(v; θ). At the same time, noise amplitudes can be measured in the same way to give a noise histogram g (g_i, i = 1,...,k,...,N, where k indicates the bin at 0), drawn from a noise probability density function g(v). Since several noise measurements can generally be obtained for every evoked synaptic signal, g(v) can be sampled more than f(v; θ) and therefore estimated with high precision. In practice, g can be fitted by a continuous unimodal function, for instance a gaussian integral, or more generally the sum of 2 gaussians (Kullmann, 1989). If the synaptic signal and noise fluctuate independently and sum linearly, f(v; θ) is effectively a convolution of 2 functions: s(v), describing the underlying synaptic signal fluctuation in the absence of noise, and the noise g(v). The goal is to find s(v) from a and g. A further requirement is that s(v) should be positive everywhere and its integral should be 1 (since it is a probability density function). This makes it impossible to find an analytical solution, so the problem is not strictly speaking an inverse convolution. Instead it becomes one of optimization: to find s(v) such that when convolved with g(v) it gives the best fit to a.
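As a concrete illustration of the noise-fitting step mentioned above, the short Python sketch below (not from the original paper; the function name fit_noise_density and the use of scikit-learn's GaussianMixture are illustrative choices) fits the measured noise amplitudes with a sum of 2 gaussians and tabulates the resulting noise distribution g on the same amplitude grid as the signal histogram.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_noise_density(noise_amps, v):
    """Fit measured noise amplitudes with a sum of 2 gaussians and return the
    noise distribution g sampled on the amplitude grid v (bin probabilities,
    summing to 1). Illustrative sketch only."""
    noise_amps = np.asarray(noise_amps, dtype=float).reshape(-1, 1)
    gm = GaussianMixture(n_components=2).fit(noise_amps)
    delta = v[1] - v[0]                              # binning interval
    # evaluate the fitted mixture density at the bin centres
    dens = np.exp(gm.score_samples(np.asarray(v, dtype=float).reshape(-1, 1)))
    g = dens * delta
    return g / g.sum()
```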

Several approaches have been proposed for this task. The most powerful one is to maximize the likelihood function. For convenience again, let us define s (s_j, j = 1,...,N) as the discrete form of s(v) defined over the same interval as a and g. That is, s_j is the integrated probability density


Fig. 1. Example of Monte Carlo simulations illustrating different deconvolution approaches. A quantal model was simulated, with 4 release sites with probability = 0.5, quantal amplitude = 3, and quantal coefficient of variation = 17% (that is, the standard deviation of the component at 3 was 0.5). Gaussian noise with standard deviation = 1 was added. A: the probability density function from which the data were sampled, shown together with the underlying components. Each component is drawn as a gaussian curve, centred about its mean amplitude, with an integral equal to its probability of occurrence, and a standard deviation calculated from the quantal coefficient of variation. The components were summed together and convolved with a gaussian curve with standard deviation 1 (representing the noise) to yield the parent probability density function. B: 500 trials were randomly sampled and binned to form a histogram. C: unconstrained maximum likelihood solution, obtained using recursion (R1). This approach failed to resolve the underlying model correctly, and instead suggests that it is made up of 6 components. D: when the result was constrained to conform to a simple binomial model with 4 release sites, the underlying components were recovered almost exactly. The curves show the components which were resolved using the method described in Kullmann (1989). The free variables were the quantal amplitude and variability and the release probability. In general, however, the assumptions required to obtain this result cannot be made.


between v_j - δ/2 and v_j + δ/2. The likelihood function which describes the agreement between a and the convolution of s and g is given in its logarithmic form as:

L = \sum_{i=1}^{N} a_i \log \left( \sum_{j=1}^{N} s_j g_{i-j+k} \right)    (1)

(\sum_{j=1}^{N} s_j g_{i-j+k} may be recognized as the discrete form of the convolution integral.) The goal is to find the s which maximizes L. This can be done in a number of ways, and a very simple iterative approach is described below. Simply maximizing likelihood on its own, however, can often give misleading results because the problem as posed here is underconstrained: some of the features of the observed amplitude histogram reflect finite sampling of the data rather than the 'true' underlying model, and the maximum likelihood method cannot distinguish between them. As a result, the maximum likelihood solution tends to overfit the data: when s is reconvolved with g and compared to a, the agreement is better than would be expected from random sampling. A possible result of this bias is that when the 'true' underlying distribution is continuous, the maximum likelihood solution is still generally made up of discrete components. That is, the fluctuation is interpreted as being among a few amplitude levels with very little scatter about those levels. This could give rise to the impression that a quantal description of transmission was correct when in fact there was only a continuously graded signal underlying the data. Clamann et al. (1991) have moreover argued that this phenomenon invalidates several published estimates of quantal parameters underlying synaptic transmission in the spinal cord (reviewed by Redman, 1990). That is, the unknown 'true' underlying quantal amplitude may be considerably smaller than that estimated from the separation between the components of the maximum likelihood solution. This problem also manifests itself if there is appreciable quantal variability, that is, variability in the size of the signal elicited by a single release event, either from trial to trial at a given synaptic contact, or between the active contacts. Fig. 1C illustrates the result of applying the maximum likelihood method to simulated data

generated from a quantal model with quantal variability: the solution is made up of sharp peaks which do not correctly reproduce the underlying model. How can this problem be overcome? One approach which has been applied with some success is to perform Monte Carlo simulations to estimate confidence intervals for the optimization solution: simulated data samples are generated using a range of possible underlying models (continuous versus discrete distributions, different quantal amplitudes and variabilities, and different probabilistic models of release), but with a similar sample size and noise distribution to those determined experimentally. The maximum likelihood solution is obtained in each case, and if different competing models can give rise to solutions similar to that obtained for the experimental data, then the latter is rejected as non-unique. A solution such as that shown in Fig. 1C would be rejected using this method. The principal shortcomings of this approach are that the solution space must be explored in an exhaustive manner, and that it is difficult to present the results of the simulations in a readily accessible format. An alternative approach is to incorporate more constraints in the optimization to restrict its solution space and thereby improve its robustness. Two constraints already included are that s must be positive everywhere and sum to 1. What other constraints may reasonably be added? This is determined by what other information is available about the system under study. If it is known that the quantal description is correct, and that transmitter release conforms to a binomial model, then these assumptions can be incorporated into the analysis. This forces the solution to be made up of components occurring at equal spacing, and further, the probabilities corresponding to 0, 1,...,n quanta released to have a unimodal distribution. Another reasonable constraint is to assume that the quantal amplitude has a finite but unknown variability. Two approaches to incorporate binomial models and quantal variability into maximum likelihood methods have been described (Kullmann, 1989; Smith et al., 1991) and yield greatly improved resolution of quantal parameters. Fig. 1D illustrates the advantage of this method over the unconstrained maximum likelihood approach. What if there is no additional information about the system? That is, it cannot be assumed a priori that a quantal model, let alone a binomial one, underlies transmission. This is in fact a plausible situation, especially when the active synapses are located at several dendritic sites which present different impedances to the propagation of the electrical signal to the recording site. Presynaptic transmitter release events cannot then be expected to give rise to approximately equal signals. In this case, the quantal constraint clearly cannot be incorporated into the analysis. Instead, the important question to ask is whether there is any evidence from the observed amplitude distribution that the quantal model is correct and, if so, what is the quantal amplitude? The algorithm described below provides an answer to these questions without relying either on extensive Monte Carlo simulations or on untested assumptions about the mechanisms underlying transmission. Briefly, the method relies on maximizing two quantities simultaneously: on the one hand, the likelihood function (1) to obtain a good fit to the data and to detect clustering in the synaptic signal amplitudes which may indicate a quantal process; and on the other hand, the entropy (or 'smoothness') of the solution s to bias against overfitting minor features arising from sampling artifact. Maximizing likelihood alone gives rise to a solution with sharp peaks and troughs, which agrees 'too well' with the data. That is, the fit to the data is better than would be expected if the data were actually sampled from the convolution of s and g. Maximizing entropy alone, in contrast, gives a flat solution, which does not agree at all with the data. With the appropriate trade-off, however, a unique solution can be found: the maximum entropy noise deconvolution (MEND) solution, which is the smoothest, or most featureless, distribution which still gives an acceptable fit to the data. The important goal is not to fit the data precisely, but simply to find a solution which is still compatible with the data, according to some goodness-of-fit criterion.

If the MEND solution contains equally spaced peaks, with the first peak at 0, then this is strong evidence in favour of the quantal model. The spacing between the peaks then yields an estimate of the quantal amplitude, which can subsequently be incorporated in constrained deconvolution methods to estimate release probabilities and quantal variability.
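Before turning to the algorithm itself, the following Python sketch (illustrative only, not part of the original paper) shows how the log-likelihood of Eqn (1) can be evaluated for a candidate solution s, given the binned amplitude histogram a and the noise distribution g sampled on the same N bins with its zero bin at index k; this is the quantity that the recursions in the Methods section implicitly maximize.

```python
import numpy as np

def log_likelihood(a, s, g, k):
    """L = sum_i a_i * log(sum_j s_j * g_{i-j+k})  (Eqn 1).

    a : observed amplitude histogram (counts), length N
    s : candidate underlying distribution (probabilities summing to 1), length N
    g : noise distribution sampled on the same bins, zero bin at index k
    """
    a, s, g = (np.asarray(x, dtype=float) for x in (a, s, g))
    N = len(a)
    # f_i = sum_j s_j * g_{i-j+k}: the candidate s reconvolved with the noise
    f = np.zeros(N)
    for i in range(N):
        for j in range(N):
            idx = i - j + k
            if 0 <= idx < N:
                f[i] += s[j] * g[idx]
    mask = a > 0                                   # empty bins contribute nothing
    return np.sum(a[mask] * np.log(np.maximum(f[mask], 1e-300)))
```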

Materials and methods

Maximum likelihood

Maximizing likelihood alone generally yields a solution made up of a number of discrete components. As mentioned above, the form of this solution may not be identical with the 'true' underlying distribution because it may also show features which arise from finite sampling. A very simple way of maximizing likelihood relies on treating the data as a finitely sampled mixture of distributions, each of which has the form of the noise function. This allows the expectation-maximization (E-M) algorithm (Dempster et al., 1977) to be applied. This is a very powerful iterative algorithm which is insensitive to starting estimates and generally guarantees convergence to the global likelihood maximum. Each element of the solution s is updated according to the recursion:

s_j' = \frac{1}{\sum_{i=1}^{N} a_i} \sum_{i=1}^{N} a_i h_{i,j}    (R1)

where h_{i,j} is the posterior probability that trials of amplitude v_i ± δ/2 in fact arise from an underlying component of amplitude v_j,

h_{i,j} = \frac{s_j g_{i-j+k}}{\sum_{j=1}^{N} s_j g_{i-j+k}}

At each iteration, s is normalized to sum to 1. Starting with an arbitrary distribution (with the only proviso that each s_j > 0) the recursion is allowed to cycle until s changes by less than a very small amount (the convergence threshold).


The solution shown in Fig. 1C was obtained using this algorithm. (If the binning interval δ is sufficiently small this approach is very similar to that proposed by Hasselblad (1966) and further considered by Ling and Tolhurst (1983) and Kullmann (1989). Instead of describing the solution by a small number of components, each of whose amplitude and probability are allowed to move, in this case a large number of components are positioned at equal intervals on the amplitude axis, and only their probabilities are updated.)
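A minimal Python implementation of this unconstrained E-M recursion might look as follows (an illustrative sketch, not the author's original code; the function name em_deconvolve and the convergence settings are arbitrary). It assumes that a, s and g are defined on a common grid of N bins, with the zero bin of the noise distribution at index k.

```python
import numpy as np

def em_deconvolve(a, g, k, tol=1e-8, max_iter=5000):
    """Unconstrained maximum likelihood deconvolution via recursion (R1).

    a : amplitude histogram (counts), length N
    g : noise distribution on the same bins (sums to 1), zero bin at index k
    Returns s, the estimated underlying discrete distribution (sums to 1).
    """
    a = np.asarray(a, dtype=float)
    g = np.asarray(g, dtype=float)
    N = len(a)
    s = np.full(N, 1.0 / N)                        # arbitrary positive start
    # G[i, j] = g_{i-j+k}: probability of observing bin i given a component at bin j
    idx = np.arange(N)[:, None] - np.arange(N)[None, :] + k
    G = np.where((idx >= 0) & (idx < N), g[np.clip(idx, 0, N - 1)], 0.0)

    for _ in range(max_iter):
        f = np.maximum(G @ s, 1e-300)              # reconvolution: f_i = sum_j s_j g_{i-j+k}
        h = (G * s[None, :]) / f[:, None]          # h_{i,j}: posterior that a trial in bin i came from component j
        s_new = (a @ h) / a.sum()                  # (R1)
        s_new /= s_new.sum()                       # renormalize to sum to 1
        if np.max(np.abs(s_new - s)) < tol:
            return s_new
        s = s_new
    return s
```

The posterior matrix h corresponds to h_{i,j} in recursion (R1), and the reconvolution G @ s is the model histogram that is compared with a.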

Maximum entropy

Maximizing the entropy of the solution, in contrast, simply makes it flat. The conventional expression for entropy is -\sum_{j=1}^{N} s_j \log_e s_j, which reaches a maximum when each of the s_j is equal to e^{-1}. To satisfy the requirement that s must sum to 1, the entropy is instead maximized subject to \sum_{j=1}^{N} s_j = 1; it is then maximized when s_j = 1/N (j = 1,...,N). If entropy and likelihood are traded off in the appropriate ratio, a unique result can be found, the MEND solution. This is the smoothest distribution which could give rise to the data, given the size of the sample and noise function. If inflexions are still present in this solution, then they must reflect some features of the underlying data which are very unlikely to have arisen from sampling artifact. To trade off entropy and likelihood, the method proposed by Gull and Daniell (1978) to deal with related problems in astronomy is adapted. Each element of s is updated by a weighted product of the expressions for maximum likelihood and maximum entropy:

s_j' = \left( \frac{1}{\sum_{i=1}^{N} a_i} \sum_{i=1}^{N} a_i h_{i,j} \right)^{\lambda} \left( \frac{1}{N} \right)^{1-\lambda}    (R2)

where λ plays a role akin to the Lagrange multiplier, and serves to balance the degree to which likelihood or entropy is maximized. It can take a value between 0 (which gives the maximum entropy solution) and 1 (which results in the maximum likelihood solution). As for unconstrained likelihood maximization, the recursion is started with arbitrary starting values for the s_j, and allowed to cycle until it converges. How does one choose the appropriate value of λ? This is determined by the expected goodness-of-fit of the solution to the original data, given the size of the sample. If only a few synaptic signal amplitudes were measured, then λ must be small, to reflect the fact that one can say very little about the shape of s(v). As the size of the data sample increases, there is more information about s(v), and λ can then be made larger. A rational scheme for finding the correct value of λ was provided by Gull and Daniell (1978). Starting with a low value of λ, s is obtained from the above recursion, convolved with g, and compared to a using the χ² test. This tests the hypothesis that a was sampled from the convolution of s and g. Now, in an idealized random sampling situation the expected value of χ² is equal to the number of degrees of freedom (d.o.f.), equal to one less than the number of classes used for the χ² test. If the measured χ² is greater than d.o.f., then λ is increased and the recursion is repeated. These steps continue until χ² becomes approximately equal to d.o.f. In the above description the data were assumed to have been binned, but the maximum entropy method can also be applied to unbinned data. This is arguably a more rational approach because it allows the Kolmogorov-Smirnov test to be substituted for the χ² test. (The disadvantage of the latter is that the expected membership of each class should be greater than 5, so the solution space has to be divided up arbitrarily to satisfy this requirement.) The data in this case, d (d_l, l = 1,...,M), are the unbinned amplitudes of the synaptic signal on each of the M trials. It is assumed that the unbinned noise probability density function g(v) has been estimated accurately. The solution space can still be defined in the discrete form s (where s_j is the probability density integral between v_j - δ/2 and v_j + δ/2). The recursion above is rewritten:

s_j' = \left( \frac{1}{M} \sum_{l=1}^{M} h(d_l - v_j) \right)^{\lambda} \left( \frac{1}{N} \right)^{1-\lambda}    (R3)

where h(d_l - v_j) is now given by

h(d_l - v_j) = \frac{s_j g(d_l - v_j)}{\sum_{j=1}^{N} s_j g(d_l - v_j)}

The appropriate value of λ is found using the Kolmogorov-Smirnov test in the same way as with the χ² test. When χ² = d.o.f., there is a probability of 0.5 that a larger value of χ² would have been observed if the data were generated by the model. In other words, the equivalence corresponds to a significance level, α, of 0.5 in rejecting the hypothesis that a was drawn from the convolution of s and g. The expected value for the Kolmogorov-Smirnov statistic is chosen to satisfy the same condition. Starting with a low value of λ, s is obtained, reconvolved with g(v) and compared to d. If the Kolmogorov-Smirnov statistic gives a significance level less than 0.5 (α < 0.5), λ is increased and the optimization repeated. These steps continue until α reaches 0.5. The unbinned form of the algorithm was found to be marginally more robust, and all the illustrated simulation results were obtained using this version.
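A sketch of the unbinned MEND loop, under the simplifying assumption of gaussian noise, is given below (illustrative Python, not the original implementation; the λ search step, the grid v of candidate amplitudes and the use of scipy's Kolmogorov-Smirnov routine are choices made here for brevity). For each trial value of λ the weighted recursion (R3) is iterated to convergence, the solution is reconvolved with the noise density, and λ is increased until the Kolmogorov-Smirnov comparison with the raw amplitudes reaches the chosen significance level α.

```python
import numpy as np
from scipy.stats import norm, kstest

def mend_unbinned(d, noise_sd, v, alpha=0.5, lam_step=0.02,
                  tol=1e-7, max_iter=5000):
    """Maximum entropy noise deconvolution (MEND), unbinned form (sketch).

    d        : array of M raw synaptic amplitudes
    noise_sd : standard deviation of the (assumed gaussian) noise density g
    v        : grid of N candidate component amplitudes v_j
    alpha    : target significance level used to choose lambda (0.5 or 0.05)
    Returns (s, lam): the MEND solution on the grid v and the lambda used.
    """
    d = np.asarray(d, dtype=float)
    v = np.asarray(v, dtype=float)
    M, N = len(d), len(v)
    G = norm.pdf(d[:, None] - v[None, :], scale=noise_sd)   # g(d_l - v_j)

    def solve(lam):
        s = np.full(N, 1.0 / N)
        for _ in range(max_iter):
            f = np.maximum(G @ s, 1e-300)                   # f(d_l) = sum_j s_j g(d_l - v_j)
            h = (G * s[None, :]) / f[:, None]               # h(d_l - v_j)
            s_new = (h.mean(axis=0) ** lam) * (1.0 / N) ** (1.0 - lam)   # (R3)
            s_new /= s_new.sum()
            if np.max(np.abs(s_new - s)) < tol:
                break
            s = s_new
        return s

    def p_value(s):
        # CDF of the solution reconvolved with the noise, for the K-S test
        cdf = lambda x: np.sum(
            s * norm.cdf((np.atleast_1d(x)[:, None] - v[None, :]) / noise_sd), axis=1)
        return kstest(d, cdf).pvalue

    lam = lam_step
    s = solve(lam)
    while p_value(s) < alpha and lam < 1.0:    # increase lambda until the fit is acceptable
        lam = min(lam + lam_step, 1.0)
        s = solve(lam)
    return s, lam
```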

Simulations

The maximum entropy noise deconvolution method has been tested with extensive Monte Carlo simulations designed to reproduce a wide range of processes which could underlie synaptic transmission. The variables which have been explored are: sample size, discrete versus continuous underlying fluctuation, different numbers of quanta, and different quantal amplitudes and variability. For simplicity, most of the illustrated simulations were carried out on samples of 500 trials drawn from a simple binomial model, with all the release probabilities equal to 0.5, and gaussian noise with a standard deviation of 1. By varying the other parameters in the model, a wide range of possible situations could be explored. In particular, a continuous underlying distribution was approximated by making the quantal content very large, and the quantal amplitude very small in relation to the noise. This gives rise to amplitude distributions which are clearly beyond the limit of resolution of any optimization method, and is similar to some of the simulations used by Clamann et al. (1991) to argue that quantal amplitude estimates obtained from deconvolution could be inaccurate. A further variable which was explored was the level of significance used to find λ. In the exposition given above the approach proposed by Gull and Daniell (1978) was adopted; that is, the expected χ² value was equal to d.o.f., or equivalently α = 0.5. An even more stringent test to identify statistical features in the data would be to set α = 0.05. This would maximize entropy further at the expense of likelihood and put the MEND solution just at the borderline for rejection at the 5% significance level because it is too smooth to account for the inflexions in the data distribution. If peaks are still visible in this solution, it provides an even more compelling argument to interpret them as genuine features of the underlying process which gave rise to the data. In an experimental situation this would rule out the 'worst case' scenario, where as a result of particularly unfortunate sampling bias, a continuous underlying distribution happened to give rise to some clustering in the data which might otherwise be interpreted as a quantal process. All the simulations were repeated with α = 0.05 and α = 0.5.
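For reference, data samples of the kind described here could be generated along the following lines (an illustrative Python sketch, not the original simulation code; it assumes that quantal variability is independent across simultaneously released quanta).

```python
import numpy as np

def simulate_trials(n_trials=500, n_sites=4, p_release=0.5,
                    quantal_amp=3.0, quantal_cv=0.17, noise_sd=1.0, seed=0):
    """Draw simulated synaptic amplitudes from a simple binomial quantal model.

    Each trial: m ~ Binomial(n_sites, p_release) quanta are released; every quantum
    contributes a gaussian amplitude with mean quantal_amp and SD quantal_cv*quantal_amp;
    gaussian recording noise with SD noise_sd is added to the sum.
    """
    rng = np.random.default_rng(seed)
    m = rng.binomial(n_sites, p_release, size=n_trials)          # quanta per trial
    signal = np.array([
        rng.normal(quantal_amp, quantal_cv * quantal_amp, size=mi).sum() if mi > 0 else 0.0
        for mi in m
    ])
    return signal + rng.normal(0.0, noise_sd, size=n_trials)     # add recording noise

# e.g., an effectively continuous model as in Fig. 4A-B:
# d = simulate_trials(n_sites=9, p_release=0.5, quantal_amp=1.0, quantal_cv=0.0)
```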

Results

Two questions were asked for every simulation result. First, does the MEND solution contain peaks where there should be none? In an experimental situation this would imply that a quantal model pertains when the true underlying distribution could actually be continuous. Second, does the method yield an incorrect estimate of the quantal amplitude? In Fig. 2 the same data used in Fig. 1 were analyzed using the maximum entropy method, setting α at 0.05 (Fig. 2A) and 0.5 (Fig. 2B). In both cases peaks were identified at the correct amplitudes (approximately 3 and 6). The other peaks (at 0, 9 and 12) were not detected (although some suggestion of their positions is provided by the inflexions in the solutions, especially in Fig. 2B). This result can be taken to support the quantal hypothesis because peaks occurred at integral multiples of a fixed interval from the origin. The quantal amplitude, determined from the spacing between the peaks, was close to the correct value. This was highly reproducible, as witnessed by the results obtained from 29 independent random samples, drawn from the same parent distribution, shown in Fig. 2C (α = 0.05) and 2D (α = 0.5). In some cases additional peaks were detected, but always at approximately the correct amplitudes (0, 9 and 12). In no case did the MEND solution suggest an incorrect quantal amplitude (20% or greater error).

If peaks are present in the underlying model, this does not automatically entail that they should appear in the MEND solutions, because they could have been beyond the limit of resolution, given the sample size and noise. However, since the amount of information present in the data increases as more trials are added, it is expected that inflexions should eventually emerge with large samples. This is confirmed in Fig. 3 where a different model was used to generate the randomly sampled data (shown in Fig. 3A). The results in Fig. 3B were obtained from random samples of different sizes, showing that as the sample size increases the inflexions in the MEND solution become more pronounced. This reflects the tendency for λ to increase with the sample size for a constant value of α (0.05 in this case). With 100 trials, the solution is essentially featureless. It shows peaks roughly corresponding to the first two components in the underlying model (at 0 and 3) with 200 trials. The third component at 6 only begins to emerge with a sample size of 5000. When α was set at 0.5, peaks emerged with correspondingly smaller sample sizes (not shown). Again, the results were consistent between independent random samples of the same size


Fig. 2. Maximum entropy deconvolution. A: the same randomly sampled data analyzed in Fig. 1 were subjected to maximum entropy noise deconvolution, setting the expected level of significance, α, to 0.05 to find λ. Peaks are seen at approximately 3 and 6, indicating that a quantal process underlies the data and that the second and third components have been correctly identified. The rest of the distribution is not resolved. B: MEND solution obtained for the same data setting α = 0.5. The same peaks are visible. C: 29 different random samples of 500 trials analyzed as for A (α = 0.05). D: the same 29 samples analyzed as for B (α = 0.5). In many cases peaks were resolved not only at 3 and 6, but also at 0, 9 and 12. In no cases did peaks occur in a manner to suggest an incorrect quantal amplitude estimate.



Fig. 3. Maximum entropy deconvolution of another quantal model: in this case, there were 2 release sites, with probability = 0.5, quantal amplitude = 3 and quantal coefficient of variation = 33% (that is, the standard deviation of the component at 3 was 1). Gaussian noise with standard deviation = 1 was added. A: the parent probability density function and underlying components shown as in Fig. 1A (the component at 0, which has a probability of 0.25, is truncated). B: MEND solutions obtained for random samples of different sizes as indicated, setting α = 0.05. As the number of trials sampled increased, the resolution improved. C: MEND solutions for 29 different random samples of 500 trials, setting α = 0.05. D: MEND solutions obtained from the same 29 samples by setting α = 0.5. Only in one case was a small spurious side-lobe seen at approximately 9.

(Fig. 3C and D), and again the peaks were more clearly defined when α was set at 0.5. In this case, however, there was an additional spurious peak in one of the 29 random samples. This was a small peak at approximately 9, and was not present when α was set to 0.05. In this particular case the interpretation of the data would not have been seriously affected, since the MEND solution was still made up of more prominent peaks at 0 and 3, coinciding closely with the amplitudes of the underlying components in the model, but it does raise the question: do peaks ever appear when the underlying distribution is in fact continuous? This is addressed in Figs. 4A and B. In this case the underlying model was made up of a large number of closely spaced components, clearly beyond the resolution of any optimization method. In no case was the solution made up of more than one peak, whether α was set at 0.05 or 0.5. This implies that even with α set at 0.5, a continuous underlying distribution will not be mistakenly interpreted as a quantal one using the maximum entropy method. Finally, in Fig. 4C and D, the MEND solutions are shown for an underlying model similar to that illustrated in Figs. 1 and 2, but where the binomial release probability was reduced from 0.5 to 0.1 (this may be a more realistic simulation of transmission in the CNS). Again, when peaks occur in the solutions, they do so close to the 'true' positions of the underlying components in the model (0, 3 and 6). Taken together with the other simulations, this shows that the performance of the MEND method is insensitive to a wide range of features of the underlying model used to generate the data, and is therefore free of biases which may lead to an incorrect interpretation of experimental data.

Discussion

The MEND algorithm is a simple model-independent method to separate robust features of the amplitude distribution from sampling artifact. It gives a conservative interpretation of the data, that is, the solution contains peaks and troughs only if there is sufficient evidence for clustering in the amplitude distribution. If there is no such clustering, or if the sample size or signal-to-noise ratio does not justify the resolution of such features in the data, the solution is smooth, and in contrast to unconstrained methods (e.g., Fig. 1C), spurious peaks do not appear. The level of resolution is not as high as for constrained methods (e.g., Fig. 1D), but their use is restricted to cases where there is independent evidence to justify the assumptions on which they depend. The simulations confirm that setting the significance level, α, at 0.5 to find λ is generally safe; in only one case was a small spurious peak seen when it was not actually part of the underlying model, and this did not change the interpretation of that particular sample because the rest of the solution was essentially correct. Setting α at 0.05 leads to an unnecessarily conservative interpretation, but clearly if peaks are still present in the MEND solution obtained with this value, then one can conclude with even greater confidence that they genuinely reflect the underlying process. The procedure outlined here is only the first step in analyzing the data. If it has indeed confirmed that a quantal model is correct, then this yields an estimate of the quantal amplitude. The next step is to apply a constrained form of deconvolution, using the quantal amplitude as a constraint, to estimate the quantal variability and release probabilities. Some methods for this are described by Kullmann (1989) and Smith et al. (1991). Fig. 5A shows the result of this when


Fig. 4. A, B: MEND solutions for 29 random samples drawn from an effectively continuous underlying distribution. The binomial model had 9 release sites, with probability = 0.5, and quantal amplitude = 1 without quantal variability. Gaussian noise with standard deviation = 1 was added. 500 trials were sampled for each optimization. A: solutions obtained by setting α = 0.05. B: solutions obtained by setting α = 0.5. In every case there was only one peak in the solution, lending no support for a quantal model, and therefore no estimate of quantal amplitude. C, D: MEND solutions for 29 samples drawn from a model with four release sites, with probability = 0.1, quantal amplitude = 3 and quantal coefficient of variation = 17%. Gaussian noise with standard deviation = 1 was added. 500 trials were sampled for each optimization. C: solutions obtained by setting α = 0.05. D: solutions obtained by setting α = 0.5. When peaks were present in the solutions they always coincided closely with the positions of the first two or three underlying components (0, 3 and 6).



Fig. 5. A: result of applying constrained maximum likelihood optimization to the data illustrated in Fig. 1B, with the quantal amplitude set at 3.2, as estimated from the MEND solution shown in Fig. 2B. The resolved components are shown as continuous lines, with the 'true' underlying model used to generate the data shown as dotted lines. The constrained maximum likelihood solution was obtained without assumptions as to the number of components, the quantal variability, or whether transmitter release was described by an a priori probabilistic model such as a binomial distribution. B: histogram of excitatory post-synaptic currents evoked in a hippocampal pyramidal cell by stimulation of presynaptic afferents, together with the MEND solution. Clear peaks are seen in the latter, supporting a quantal model, and implying a quantal amplitude of approximately 3.5 pA.

applied to the data sample illustrated in Fig. 1B (analyzed in Figs. 1C-D and 2A-B). The only constraint was that the underlying distribution should conform to a quantal model, with a quantal amplitude estimated at 3.2 from the MEND solution in Fig. 2B. The number of quanta, quantal variability and relative probabilities of different numbers of quanta released were free parameters. The result is clearly closer to the underlying model used to generate the data than the unconstrained maximum likelihood solution (Fig. 1C). It is comparable to the result obtained by constraining the solution to conform to a simple binomial model, with the number of release sites set to 4 (Fig. 1D), although it was obtained without making the assumptions necessary for the latter. An example of the MEND method applied to experimental data is shown in Fig. 5B. Excitatory post-synaptic currents were recorded from a CA1 pyramidal cell in a hippocampal slice using whole-cell voltage clamp (for details, see Kullmann and Nicoll, 1992). A histogram of synaptic signal amplitudes, elicited by repeated stimulation of presynaptic fibres, shows peaks occurring at equal intervals from the origin. That these represent an underlying quantal process, rather than arising from sampling artifact and noise, is indicated by the MEND solution drawn through the histogram: clear peaks are seen at approximately 0, 3.5, 7 and 11 pA, implying a quantal amplitude of roughly 3.5 pA. If the entropy of the solution is increased to make the peaks disappear, then it has to be rejected: that is, the hypothesis that the data were sampled from the reconvolution of the solution with the noise fails the χ² or Kolmogorov-Smirnov test. The approach described here is significantly different from all the deconvolution methods which have been used in quantal analysis of synaptic transmission to date, because the goal is not to obtain the best fit to the data, but effectively to find the smoothest, most featureless distribution which could still give rise to the data. It is however similar in motivation to some methods used in spectroscopy, astronomy and computerized tomography, where there is a similar problem of inference: the data sample is finite, there is contaminating noise, and there is a danger of overinterpretation of spurious points. The precise details of the algorithm are however unique, because the nature of the measurement process here is different from that encountered in most other problems; rather than obtaining readings, with an error value, at each of a number of detectors, a single continuous variable is measured (viz. synaptic signal amplitude). This allows a remarkably powerful maximum likelihood


estimator, the E-M algorithm, to be applied, instead of less elegant methods, mostly relying on gradient descent, which have been used elsewhere. There is no fundamental reason why this method should be restricted to quantal analysis. It is equally applicable, with only small modifications, to any process where a continuous variable is measured a finite number of times in the presence of noise. The question of interest is whether there is evidence for some clustering of the variable other than could arise from random sampling. Among important examples in the neurosciences are channel kinetics, where some preferred values for channel conductance or channel lifetime may be contaminated by other sources of variability, and receptive field properties of cortical neurones, where, for instance, orientation tuning may be concealed by spontaneous variability in firing rates.

Acknowledgements

Supported by the American Epilepsy Society, the Human Frontiers Science Program, the Smith and Nephew Foundation, and the Fulbright Commission.

References

Clamann, H.P., Rioult-Pedotti, M.S. and Lüscher, H.R. (1991) The influence of noise on quantal EPSP size obtained by deconvolution in spinal motoneurons in the cat. J. Neurophysiol., 65: 67-75.
Dempster, A.P., Laird, N.M. and Rubin, D.B. (1977) Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. B, 39: 1-21.
Gull, S.F. and Daniell, G.J. (1978) Image reconstruction from incomplete and noisy data. Nature, 272: 686-690.
Hasselblad, V. (1966) Estimation of parameters for a mixture of normal distributions. Technometrics, 8: 431-440.
Kullmann, D.M. (1989) Applications of the expectation-maximization algorithm to quantal analysis of post-synaptic potentials. J. Neurosci. Methods, 30: 231-245.
Kullmann, D.M. and Nicoll, R.A. (1992) Long-term potentiation is associated with increases in both quantal content and quantal amplitude. Nature, 357: 240-244.
Ling, L. and Tolhurst, D.J. (1983) Recovering the parameters of finite mixtures of normal distributions from a noisy record: an empirical comparison of different estimating procedures. J. Neurosci. Methods, 8: 309-333.
Redman, S. (1990) Quantal analysis of synaptic potentials in neurons of the central nervous system. Physiol. Rev., 70: 165-198.
Smith, B.R., Wojtowicz, J.M. and Atwood, H.L. (1991) Maximum likelihood estimation of non-uniform transmitter release probabilities at the crayfish neuromuscular junction. J. Theor. Biol., 150: 457-472.
