ANALYTICAL BIOCHEMISTRY 94, 270-273 (1979)

A Nonparametric Method for Fitting a Single Exponential to Biological Data

I. A. NIMMO AND G. L. ATKINS

Department of Biochemistry, University of Edinburgh Medical School, Teviot Place, Edinburgh EH8 9AG, Scotland

Received October 25, 1978

A nonparametric (“median”) method for fitting a single exponential to biological data is described. Its performance compared with that of least-squares alternatives has been assessed by analyzing simulated experimental data. The method appears to be reliable and relatively efficient.

A single exponential describes the data from several kinds of biological experiment: for example, the time course of thermal inactivation of a protein, the relationship between the initial velocity of an enzymic reaction and the reciprocal of the absolute temperature, or the disappearance of tracer from a one-compartment system. The rate constants of these processes are usually found by plotting the data in semilog form and then fitting a straight line to them by eye or by unweighted least-squares regression. Neither method is entirely satisfactory unless the data contain little error: drawing a line by eye is subjective, and the least-squares regression requires the error in the transformed data to be normally distributed and of constant variance. There are at least two approaches that may be more reliable. One is to fit the exponential to the untransformed data by a correctly weighted nonlinear least-squares method; it is valid so long as the error is normally distributed and of known variance. The second approach is to transform the data to their semilog form and then find the “best” straight line by a nonparametric or “median” method (1,2). The advantage of this method is that it makes no assumptions about the variance and distribution of the error; its only requirements are that the individual errors be uncorrelated with one another and as likely to be positive as negative. In the present paper we compare the three numerical methods by using them to analyze both simulated and real experimental data.

0003-2697/79/060270-04$02.00/0
Copyright © 1979 by Academic Press, Inc.
All rights of reproduction in any form reserved.

METHODS

Simulated data. The perfect or error-free data (t, y) followed the exponential relationship $y = y_0 \exp(-kt)$. They were derived by setting $y_0 = 100$ and $k = 1$ and calculating y at 9 roughly geometrically spaced values of t between 0 and 3.5 inclusive. Experimental data (i.e., containing error) were simulated as previously described, using series of pseudorandom normally distributed numbers of known mean and SD (3). Four sorts of error were present; they were chosen to represent those likely to occur in practice (4). (i) N1: normally distributed error with a coefficient of variation of 10%. (ii) O1: as N1, except that 10% of the points (chosen at random) were outliers with a coefficient of variation of 25%. (iii) N2: normally distributed error with an SD of $y^{1/2}$. (iv) O2: as N2, except that 10% of the points had an SD of $2.5\,y^{1/2}$.
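The four error models can be sketched in Python as below. This is a minimal illustration under stated assumptions, not a reproduction of the authors' generator (3): the nine time points in `T_POINTS` are hypothetical (the text gives only their range and rough spacing), each point is made an outlier independently with probability 0.1, and non-positive draws are redrawn so that ln y exists, which slightly truncates the error distribution at small y. The text does not say how such values were handled; the function names are illustrative.

```python
import math
import random

# Hypothetical sampling times (assumption): the text specifies only nine
# roughly geometrically spaced values of t between 0 and 3.5 inclusive.
T_POINTS = [0.0, 0.05, 0.1, 0.2, 0.4, 0.8, 1.5, 2.5, 3.5]
Y0_TRUE = 100.0
K_TRUE = 1.0


def perfect_y(t, y0=Y0_TRUE, k=K_TRUE):
    """Error-free value of the single exponential y = y0 * exp(-k * t)."""
    return y0 * math.exp(-k * t)


def noisy_y(y, error_type, rng, outlier_fraction=0.1):
    """Return y plus pseudorandom normal error of type N1, O1, N2 or O2."""
    # Assumption: each point is an outlier independently with probability 0.1.
    is_outlier = error_type in ("O1", "O2") and rng.random() < outlier_fraction
    if error_type in ("N1", "O1"):
        # Constant coefficient of variation: 10%, or 25% for outliers.
        sd = (0.25 if is_outlier else 0.10) * y
    elif error_type in ("N2", "O2"):
        # SD equal to y**0.5, or 2.5 * y**0.5 for outliers.
        sd = (2.5 if is_outlier else 1.0) * math.sqrt(y)
    else:
        raise ValueError(f"unknown error type: {error_type!r}")
    # Assumption: non-positive draws are redrawn so that ln y is defined.
    while True:
        value = rng.gauss(y, sd)
        if value > 0:
            return value


def simulate_set(error_type, rng, times=T_POINTS):
    """One simulated experiment: a list of (t, y) pairs with added error."""
    return [(t, noisy_y(perfect_y(t), error_type, rng)) for t in times]
```

For example, `simulate_set("O2", random.Random(1979))` returns one O2-type data set of nine (t, y) pairs; calling it 500 times per error type gives groups of the size used in this study. Passing an explicit `random.Random` instance keeps each run reproducible.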


A total of 500 sets of data was generated with each sort of error. Since in practice the value of y at zero time may not be known, all the sets were analyzed both with and without this point.

Real data. The initial velocities of the reactions catalyzed by the glutathione S-transferases of rat and trout liver were measured at five temperatures between 7 and 36°C inclusive (5). The exponential was fitted to the initial velocity (y) as a function of the reciprocal of the absolute temperature (t).

Numerical methods. The exponential was fitted to these data in four ways. (i) LS: by calculating the slope and intercept of the unweighted least-squares regression of ln y on t. (ii) UNL: by an unweighted nonlinear least-squares regression of y on t, using the procedure of Davidon as modified by Fletcher and Powell and exploited by Atkins (6). (iii) WNL: by a weighted version of method (ii). The weighting factor was the reciprocal of $y^2$ (i.e., it assumed the error had a constant coefficient of variation). (iv) NP: by calculating the slope and intercept by the nonparametric method (1,2). Essentially, one takes every possible pair of points [n points giving $\tfrac{1}{2}n(n-1)$ pairs] and calculates a slope (b) and an intercept (a) from each pair. The equations are

$$b = \frac{\ln y_2 - \ln y_1}{t_2 - t_1} \qquad \text{and} \qquad a = \frac{t_2 \ln y_1 - t_1 \ln y_2}{t_2 - t_1}.$$

The slopes and intercepts given by replicate points (pairs with the same value of t, for which $t_2 = t_1$) are discarded, the remaining slopes and intercepts are ranked, and their medians are taken as the “best” estimates. Since $\ln y = \ln y_0 - kt$, these give $y_0 = e^{a}$ and $k = -b$. In addition, eight of our colleagues plotted the real data in semilog form and estimated the slopes and intercepts of the lines using pencil and ruler.
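Methods (i) and (iv) both work on the semilog form and are short enough to sketch directly. The Python below is an illustrative implementation, with hypothetical function names, of the unweighted least-squares line and of the pairwise-median rule as described above: the median of the pairwise slopes and, separately, the median of the pairwise intercepts, with pairs at equal t discarded. The slope rule is the pairwise-median slope analysed by Sen (1). The nonlinear fits (ii) and (iii), which used the Davidon-Fletcher-Powell procedure of ref. 6, are not reproduced here.

```python
import math
from itertools import combinations
from statistics import median


def semilog(data):
    """Transform (t, y) pairs to (t, ln y); every y must be positive."""
    return [(t, math.log(y)) for t, y in data]


def fit_ls(data):
    """Method (i): unweighted least-squares line ln y = a + b*t.

    Returns the estimates (y0, k) = (exp(a), -b).
    """
    pts = semilog(data)
    n = len(pts)
    t_mean = sum(t for t, _ in pts) / n
    z_mean = sum(z for _, z in pts) / n
    sxx = sum((t - t_mean) ** 2 for t, _ in pts)
    sxz = sum((t - t_mean) * (z - z_mean) for t, z in pts)
    b = sxz / sxx
    a = z_mean - b * t_mean
    return math.exp(a), -b


def fit_np(data):
    """Method (iv): medians of the pairwise slopes and intercepts of ln y on t.

    Returns the estimates (y0, k) = (exp(a), -b).
    """
    slopes, intercepts = [], []
    for (t1, z1), (t2, z2) in combinations(semilog(data), 2):
        if t1 == t2:
            # Replicate points define no slope, so the pair is discarded.
            continue
        slopes.append((z2 - z1) / (t2 - t1))
        intercepts.append((t2 * z1 - t1 * z2) / (t2 - t1))
    b = median(slopes)
    a = median(intercepts)
    return math.exp(a), -b
```

With error-free data both functions return $y_0 = 100$ and $k = 1$ to rounding error. If the last point of such a set is replaced by a gross outlier, the least-squares estimate of k shifts noticeably, whereas the median estimates are unchanged, because the pairs involving the outlier are too few to move either median.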


RESULTS

The results of the simulated experiments are given in Table 1. For each parameter the upper row of values was calculated including the point at zero time, whereas the lower row was calculated excluding it. It can be seen that there is little difference between the two rows, except that the lower values tended to be less precise. The unweighted linear regression of ln y on t gave good answers with data set N1 (constant relative error); this is not surprising, as the points were in fact correctly weighted. It gave poorer answers with the other data sets; for example, the estimates derived from sets N2 and O2 tended to be biased and were very imprecise. The unweighted nonlinear regression of y on t produced relatively imprecise values, and 4 out of 16 of them were biased as well. Curiously, when the regression was weighted the results were even worse, and the program also failed to converge with 30-40 of each group of 500 data sets. The nonparametric regression of ln y on t performed well throughout, in that its estimates were always relatively precise and tended to be unbiased (1 out of 16 estimates was biased; the expectation on the basis of chance alone is one out of 20). The efficiency of the method compared with correctly weighted least squares can be assessed from the results for data set N1; it is about 80% for the parameter $y_0$ and about 85% for $k$.

Real data were also used to compare the numerical methods with graphical analysis, as follows. The initial velocities of the reactions catalyzed by the two glutathione S-transferases (y) decreased exponentially as the reciprocal of the absolute temperature (t) increased (Fig. 1). The data were analyzed both graphically and numerically, and the values computed for the decay constant k were expressed as percentages of that derived graphically for the trout enzyme (Table 2). It is evident that, when there was scatter in the data [rat experiment; Fig. 1(b)], the eight graphical estimates differed from one another and from those derived numerically. In both instances the weighted version of the nonlinear regression failed to converge, thus demonstrating its requirement for “good” data.

TABLE 1
ANALYSIS OF SIMULATED EXPERIMENTAL DATA BY FOUR MATHEMATICAL METHODS^a

Columns: data sets N1, O1, N2, O2. Rows, each giving the parameters $y_0$ and $k$: (i) least-squares regression of ln y on t; (ii) nonlinear regression of y on t (unweighted); (iii) nonlinear regression of y on t (weighted); (iv) nonparametric regression of ln y on t. Each entry: mean ± SD (median).

99.4 99.5 1.003 1.003
±4.9 ±5.5 ±0.029 ±0.031
(99.5) (99.6) (1.001) (1.003)

100.2 100.6 1.015 1.018
±6.4 ±7.5 ±0.098 ±0.097
(100.0) (100.5) (1.006) (1.016)*

99.1 99.3 1.007 1.009
±5.6 ±6.2 ±0.049 ±0.049
(98.9)* (99.5) (1.006)* (1.0??)*

100.1 100.1 1.003 1.003
±6.5 ±6.8 ±0.034 ±0.036
(100.1) (100.0) (1.003) (1.002)

98.8 98.9 0.999 1.000
99.4 99.6 1.006 1.096
99.5 99.6 0.999 0.998
(98.5)* (98.7)* (0.998) (0.997)
104.9 107.3 1.093 1.098
±20.7 (99.9) ±29.0 (100.2) ±0.271 (1.022)* ±0.290 (1.017)*
104.4 106.9 1.099 1.10?
±22.3 ±30.8 ±0.293 ±0.312
(99.6) (100.0) (1.017)* (1.022)*
99.5 99.6 1.011 1.010
±5.9 (?)* ±7.7 (98.9)* ±0.120 (1.000) ±0.129 (0.995)
99.5 100.3 1.015 1.026
±7.0 ±8.8 ±0.139 ±0.156
(99.2) (100.1) (0.999) (1.008)
±6.1 (99.7) ±7.5 (100.4) ±0.177 (1.053)* ±0.146 (1.049)*
100.3 101.3 1.110 1.101
±8.4 ±10.0 ±0.240 ±0.228
(99.5) (100.1) (1.057)* (1.0??)*
±6.5 ±8.1 ±0.102 ±0.110
100.1 100.5 1.014 1.019
±7.5 ±8.6 ±0.109 ±0.118
(99.6) (100.3) (0.998) (1.???)
±6.0 (98.2)* ±6.8 (98.5)* ±0.059 (0.997) ±0.060 (1.001)
±6.3 ±6.8 ±0.038 ±0.040
±6.9 (99.1)* ±8.0 (99.3) ±0.108 (1.??2) ±0.108 (1.002)
98.2 98.4 1.002 1.002
±5.3 ±5.9 ±0.038 ±0.040
(99.1)* (99.6) (1.000) (0.999)
100.2 100.6 1.088 1.072
100.1 100.7 1.014 1.016
(99.9) (100.6) (0.997) (1.002)

^a See text for a description of the mathematical methods and the sorts of error incorporated into the data. The values of the parameters are the mean ± SD (median in parentheses) of 500 experiments [except method (iii); see text]. In each row the upper value was obtained when the point at t = 0 was included, and the lower one when it was omitted. The perfect values are: $y_0 = 100$, $k = 1.000$.
* P < about 0.05 that the median does not differ from its true value (7).

DISCUSSION

The analysis of the simulated experimental data suggests that the nonparametric regression of ln y on t gives parameter estimates that are unbiased and quite precise, even when the nature and distribution of the experimental error are not known. (As mentioned above, the regression depends on the individual errors being uncorrelated with one another and as likely to be positive as negative.) This is an appealing attribute because, in enzyme assays at least, little is known about the characteristics to be expected of the error. For example, the few detailed studies of this topic that have been published [e.g., (4,8-10)] came to rather different conclusions.

The classical least-squares regression of ln y on t performed satisfactorily when it was weighted correctly and tended to be more precise (“efficient”) than the nonparametric regression. It has a further advantage: it can give approximate standard errors for the parameters being estimated ($y_0$ and $k$). In contrast, the nonparametric method can give standard errors for either $y_0$ or $k$ (but not both) only under certain well-defined conditions (1,11). On the other hand, the classical least-squares regression suffers from a serious drawback: it produces biased answers if the data are incorrectly weighted or, to a lesser extent, if the errors are not normally distributed. The requirement for correct weighting is particularly difficult to meet in experiments following exponential processes, as the values of the dependent variable are likely to cover a wide range.

We were frankly surprised that the nonlinear regression of y on t gave such bad answers, especially when the data were weighted in approximately the correct manner. Overall, the unweighted version performs about as well as the classical linear regression, but it employs algorithms that are mathematically much more complex and is therefore not an attractive alternative.

To sum up, we have concluded that, as one usually has little quantitative idea about the random experimental error present in one’s assays, one should opt for the nonparametric regression. This is especially true if the presence of bias in the calculated parameters is likely to influence one’s interpretation of the data (for example, when measuring the thermal stability of an enzyme in the search for genetic variants).

FIG. 1. Temperature dependence of glutathione S-transferase activity. Initial velocities were measured between 7 and 36°C (5). The abscissa is linear in the reciprocal of absolute temperature (10³/K). (a) Trout liver; (b) rat liver.

TABLE 2
ANALYSIS OF REAL DATA^a

Source of enzyme   Graphical     LS (i)   UNL (ii)   WNL (iii)            NP (iv)
Trout liver        100.0 ± 1.6   100.9    98.8       Would not converge   100.9
Rat liver          97.4 ± 5.2    105.0    101.6      Would not converge   117.6

^a The slopes of the plots in Fig. 1 were estimated graphically by eight individuals; the mean values ± SD are expressed as percentages of the mean value for the trout enzyme [Fig. 1(a)]. The slopes were also computed by the numerical methods given in the text and expressed in the same way.

REFERENCES

1. Sen, P. K. (1968) J. Amer. Stat. Ass. 63, 1379-1389.
2. Porter, W. R., and Trager, W. F. (1977) Biochem. J. 161, 293-302.
3. Atkins, G. L., and Nimmo, I. A. (1975) Biochem. J. 149, 775-777.
4. Nimmo, I. A., and Mabood, S. F. (1979) Anal. Biochem. 94, 265-269.
5. Nimmo, I. A., Clapp, J. B., and Strange, R. C. Comp. Biochem. Physiol., in press.
6. Atkins, G. L. (1971) Biochim. Biophys. Acta 252, 405-420.
7. Campbell, R. C. (1967) Statistics for Biologists, p. 34, Cambridge Univ. Press, London/New York.
8. Storer, A. C., Darlison, M. G., and Cornish-Bowden, A. (1975) Biochem. J. 151, 361-367.
9. Siano, D. B., Zyskind, J. W., and Fromm, H. J. (1975) Arch. Biochem. Biophys. 170, 587-600.
10. Askelöf, P., Korsfeldt, M., and Mannervik, B. (1976) Eur. J. Biochem. 69, 61-67.
11. Cornish-Bowden, A., Porter, W. R., and Trager, W. F. (1978) J. Theor. Biol. 74, 163-175.
