Statistical analysis of enzyme kinetic data.


[6] Statistical Analysis of Enzyme Kinetic Data

By W. WALLACE CLELAND

Although graphical analysis is a quick and useful way to visualize enzyme kinetic data, for any serious study the data must be subjected to statistical analysis so that the precision of the derived kinetic constants can be evaluated. This is particularly important when one is trying to distinguish between possible patterns, such as competitive or noncompetitive inhibition. In this case the question is whether or not the intercepts of reciprocal plots are a function of inhibitor concentration; when such variation is small or absent, only statistical analysis can give one a quantitative estimation of the probability that the data represent one pattern or the other.

It is important to note at the start what statistics can and cannot do. Statistical analysis is a matter of calculating probabilities, and where there is no useful information present in the data, statistical analysis will not magically produce any. Likewise, when the data clearly define the nature of the rate equation, statistical analysis is not needed to do this, but will provide precision estimates of the fitted kinetic constants. Finally, one always has to use common sense and not be led blindly by the results of any statistical analysis.

Least Squares Method

We will discuss only one method of fitting data to rate equations; namely, the least squares method, which is the one most commonly used and is the proper one to use where errors in the data are normally distributed. We will assume the error to be present only in the experimental parameter (velocity, or kinetic constant derived from velocities, such as V, V/K, 1/Ki), and that the concentrations of substrates or inhibitors, or such variables as pH, are free of error, at least relative to the error in the

METHODS IN ENZYMOLOGY, VOL. 63

Copyright © 1979 by Academic Press, Inc. All rights of reproduction in any form reserved. ISBN 0-12-181963-9


INITIAL RATE METHODS


experimental parameters.¹ This is normally a fairly good assumption, and the errors in velocity are certainly greater than those in substrate concentrations, at least if one uses care in making up reaction mixtures. Even if the absolute concentration of the substrate is not accurately known, the relative concentrations of the substrate should be accurate if a single stock solution is used to prepare the reaction mixtures, and when one is trying to determine the kinetic pattern this will be sufficient. This point should be kept in mind when comparing the absolute values of kinetic constants; the standard errors reflect experimental variation, and the absolute error is determined by how well you know the concentration of the stock solution used.

In the least squares method, one picks values for the constants in the rate equation so that the sum of the squares of the differences between experimental and calculated velocities is minimized. Mathematically this is done by writing out the appropriate expression, taking partial derivatives with respect to each constant to be determined, and setting these partial derivatives equal to zero. Simultaneous solution of the resulting equations gives the desired constants. This procedure is quite straightforward when the equation being fitted is linear in all the constants, since it results in a set of linear simultaneous equations. The rate equations one deals with in enzyme kinetic studies are not of this type, however, but are always the ratio of two expressions, with the denominator at least (and in some cases the numerator) being the sum of a number of separate terms, each containing kinetic constants. Application of the procedure outlined above produces a set of nonlinear simultaneous equations that are not readily solved, and thus a trick is needed to generate a substitute equation to which the data can be fitted that is linear in the constants, or in parameters that will allow estimates of the constants.

Nonlinear Least Squares

A large number of strategies have been developed for nonlinear least squares analysis, but all involve picking preliminary estimates for the constants (in the most powerful methods, one can pick any values) and then calculating better estimates in some way so that the residual least square (sum of squares of differences between experimental and calculated velocities) is minimized. The most powerful nonlinear methods succeed simply because they are cautious: they do not adjust the preliminary estimates very much at each cycle of iteration, and thus the fit rarely diverges and tends to converge cleanly (if slowly) to the minimum point. The disadvantages of such methods are the large number of iterations required and the resulting expense. The method that we will describe here (which is used in all the computer programs included here) requires good preliminary estimates, but converges very rapidly to the final answer, and thus is very cheap to use. When the fit fails to converge, the fault is usually with the data, but construction of a least squares surface (see below) allows one to diagnose what the problem is.

¹ For a treatment of the general case where this is not true, see G. Johansen and R. Lumry, C. R. Trav. Lab. Carlsberg 32, 185 (1961).

Proper Weighting

Most rate equations for enzyme-catalyzed reactions can be made linear by inversion, and thus good preliminary estimates can be obtained by normal least squares fitting to these reciprocal equations. Such fits must be properly weighted, however, since the variance of 1/v will be the variance of v divided by $v^4$, and one uses the reciprocal of the variance as the weighting factor. The variances of the velocities (or other parameters being fitted) may be determined experimentally, which is a lot of work, or one can make a reasonable assumption about them.

Wilkinson² assumed the variance of the velocities to be constant, calling for $v^4$ weights in reciprocal fits, but no weights in the final iterative fits. This assumption is reasonable when the range of experimental velocities is not greater than a factor of 5 and corresponds to constant absolute errors (that is, 5 ± 0.2 and 1 ± 0.2). It is the assumption made in most of the computer programs included in this chapter.

When the range of experimental velocities is a power of 10 or more, however, the assumption of constant variance is not appropriate and causes the lower velocities to be ignored in the fitting process. In fitting pH profiles for V, V/K, and 1/Ki values this assumption is also inappropriate, since these parameters vary by a factor of 10 per pH unit above or below the pK which causes loss of activity or binding. In these cases it is better to assume that the variance of the velocities, or other parameter, is proportional to the square of the velocity. This corresponds to constant proportional error (that is, 5 ± 1 and 1 ± 0.2), and calls for $v^2$ weights in reciprocal plots. The final iterative fits may then be unweighted if the equation is expressed in log form, since the variance of log v is the variance of v divided by $v^2$, and the variance of v in this case is proportional to $v^2$. The programs for fitting pH profiles, and also the alternate one for simple reciprocal plots, make this assumption.

² G. N. Wilkinson, Biochem. J. 80, 324 (1961).
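The weighting rules above can be illustrated with a minimal Python sketch of a weighted fit to the reciprocal form 1/v = (K/V)(1/A) + 1/V. This is not a transcription of the chapter's FORTRAN programs; the function name `reciprocal_fit`, the `power` argument, and the data are hypothetical and chosen only for illustration.

```python
import numpy as np

def reciprocal_fit(v, A, power=4):
    """Weighted linear fit of 1/v = (K/V)*(1/A) + 1/V (hypothetical helper).

    power=4: constant variance in v, so weights are v**4.
    power=2: variance proportional to v**2, so weights are v**2.
    Returns preliminary estimates (V, K).
    """
    v = np.asarray(v, dtype=float)
    A = np.asarray(A, dtype=float)
    x, y, w = 1.0 / A, 1.0 / v, v ** power
    # Weighted normal equations for the straight line y = slope*x + intercept
    Sw, Sx, Sy = w.sum(), (w * x).sum(), (w * y).sum()
    Sxx, Sxy = (w * x * x).sum(), (w * x * y).sum()
    det = Sw * Sxx - Sx ** 2
    slope = (Sw * Sxy - Sx * Sy) / det
    intercept = (Sxx * Sy - Sx * Sxy) / det
    V = 1.0 / intercept      # 1/v intercept is 1/V
    K = slope * V            # slope is K/V
    return V, K

# Hypothetical error-free data generated with V = 10 and K = 2
A = np.array([0.5, 1.0, 2.0, 4.0, 8.0])
v = 10.0 * A / (2.0 + A)
V_est, K_est = reciprocal_fit(v, A, power=4)
print(V_est, K_est)
```

With real data the two choices of `power` will give slightly different preliminary estimates; which is appropriate depends on the error assumption discussed above.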

Iterative Fitting by the Gauss-Newton Method

When good preliminary estimates are available, either from properly weighted fits to the equation in reciprocal form, or from graphical analysis (necessary when inversion does not generate a linear form of the equation), the Gauss-Newton method of iteration converges rapidly and allows simple calculation of standard errors of the fitted constants. The basic equation used is

$$ v = F_0 + (a - a_0)\left(\frac{\partial F}{\partial a}\right)_0 + (b - b_0)\left(\frac{\partial F}{\partial b}\right)_0 + \cdots \tag{1} $$

where there are as many terms containing partial derivatives as there are nonlinear constants in the rate equation. The constants $a_0, b_0, \ldots$ are the preliminary estimates of the constants $a, b, \ldots$; $F_0$ is the function evaluated using these preliminary values $a_0, b_0, \ldots$. This is now a linear equation, and fitting data to it by the least squares method gives as constants (1) any linear constant in the original rate equation, and (2) correction factors such as $(a - a_0)$, which can be used to adjust the preliminary estimates of the nonlinear constants. The process is then repeated with the new preliminary estimates; after 3-5 cycles of iteration, no further change will occur. At this point, although $(a - a_0)$ has become zero, its variance is finite and is readily used to calculate the standard error of a.

The above brief description is designed to give the user some feel for what actually occurs during least squares analysis, but we will not present any further description of the mathematics here, in order to be able to include as many computer programs as possible. The theory and mathematics for the procedures used in these programs are given by Wilkinson² and Cleland.³ Those interested in the full statistical complexities involved in fitting enzyme kinetic data should consult the review by Garfinkel,⁴ which evaluates all these matters thoroughly. Since, however, the average biochemist will understand none of these articles, we believe it is more helpful simply to present the computer programs and show how to use them. Thus a number of programs are included at the end of this chapter and briefly described, and a list of other equations that have been fitted is included.

Least Squares Surfaces
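As a concrete illustration of the Gauss-Newton cycle of Eq. (1), the following Python sketch fits v = VA/(K + A). It is a generic two-parameter Gauss-Newton iteration written for this note, not the chapter's FORTRAN code: both V and K are treated as correction terms, the function name `gauss_newton_hyper` and the data are hypothetical, and equal variance of the velocities is assumed.

```python
import numpy as np

def gauss_newton_hyper(v, A, V0, K0, cycles=20, tol=1e-10):
    """Gauss-Newton fit of v = V*A/(K + A) from preliminary estimates V0, K0."""
    v = np.asarray(v, dtype=float)
    A = np.asarray(A, dtype=float)
    p = np.array([V0, K0], dtype=float)

    def jacobian(V, K):
        # Partial derivatives of F = V*A/(K + A) with respect to V and K
        return np.column_stack([A / (K + A), -V * A / (K + A) ** 2])

    for _ in range(cycles):
        V, K = p
        F0 = V * A / (K + A)
        # Linear least squares for the corrections (V - V0, K - K0) in Eq. (1)
        delta, *_ = np.linalg.lstsq(jacobian(V, K), v - F0, rcond=None)
        p = p + delta
        if np.all(np.abs(delta) <= tol * np.abs(p)):
            break

    V, K = p
    resid = v - V * A / (K + A)
    variance = resid @ resid / (len(v) - 2)   # average residual least square
    J = jacobian(V, K)
    cov = variance * np.linalg.inv(J.T @ J)   # variances of the corrections
    return p, np.sqrt(np.diag(cov)), np.sqrt(variance)

# Hypothetical data: V = 10, K = 2, with small fixed perturbations added
A = np.array([0.5, 1.0, 2.0, 4.0, 8.0])
v = 10.0 * A / (2.0 + A) + np.array([0.05, -0.04, 0.03, -0.05, 0.04])
(V_fit, K_fit), (se_V, se_K), sigma = gauss_newton_hyper(v, A, V0=9.0, K0=1.5)
print(V_fit, se_V, K_fit, se_K, sigma)
```

The standard errors come from the variance of the correction terms at convergence, which is the point made in the text: the corrections themselves become zero, but their variances remain finite.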

The computer programs included in this chapter will handle most decent data with little difficulty, but in some cases where data are bad, where the preliminary estimates cannot be obtained by fitting in reciprocal form, or where the equation is simply a difficult one to fit, the program may fail to converge on a position of minimum residual least square. This is indicated by the failure of the 3-5 lines after the printed title to show convergence, or if they are not even printed, by an error indication such as attempted division by zero or taking of a negative square root. In these cases it is useful to examine the actual shape of the least squares surface. For an equation with only two constants, such as Eq. (2)

$$ v = \frac{VA}{K + A} \tag{2} $$

the surface is simply a contour map of residual least square (that is, sum of squares of differences between experimental v and the value of v calculated with the assumed values of K and V) as a function of V and K (see Fig. 1). The elongated diagonal shape of the contours results from V being in the numerator and K in the denominator and shows that K and V can be raised or lowered together with a much smaller effect on residual least square than if one is raised and the other lowered.

³ W. W. Cleland, Adv. Enzymol. 29, 1 (1967).
⁴ L. Garfinkel, M. C. Kohn, and D. Garfinkel, Crit. Rev. Bioeng. 2, 329 (1977).

FIG. 1. Contours of equal residual least square for a set of data that fit Eq. (2). Axes: V (horizontal) and K (vertical), each marked at 0.95, 1.0, and 1.05.


The construction of such a least squares surface is very simple; one simply progresses through a grid of K and V values and calculates

$$ \sum_i \left[ v_i - \frac{V A_i}{K + A_i} \right]^2 \tag{3} $$
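A minimal Python sketch of the grid calculation in Eq. (3) follows. The function name `ls_surface`, the grid limits (taken from the axis range of Fig. 1), and the error-free data are assumptions made for illustration only.

```python
import numpy as np

def ls_surface(v, A, V_grid, K_grid):
    """Residual least square, Eq. (3), on a grid; rows follow K, columns follow V."""
    v = np.asarray(v, dtype=float)
    A = np.asarray(A, dtype=float)
    VV, KK = np.meshgrid(V_grid, K_grid)
    # Broadcast each (V, K) grid point against all data points
    pred = VV[..., None] * A / (KK[..., None] + A)
    return ((v - pred) ** 2).sum(axis=-1)

# Hypothetical error-free data with V = 1 and K = 1
A = np.array([0.25, 0.5, 1.0, 2.0, 4.0])
v = 1.0 * A / (1.0 + A)
V_grid = np.linspace(0.95, 1.05, 11)
K_grid = np.linspace(0.95, 1.05, 11)
S = ls_surface(v, A, V_grid, K_grid)

# Print in grid form (K decreasing down the page) so contours can be drawn by hand
for K, row in zip(K_grid[::-1], S[::-1]):
    print(f"K={K:.2f} " + " ".join(f"{s * 1e4:6.2f}" for s in row))
```

In the printed grid, points where K and V are raised together show a much smaller residual than points where one is raised and the other lowered, which is the diagonal elongation described for Fig. 1.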

This is easily done with a short computer program, and if the results are printed out in grid form, one can then draw contours directly on the printed sheet.

Where there are three constants in the equation, the least squares surface consists of contours in three dimensions and may be constructed by calculating two-dimensional contour maps for two constants at various levels of the third constant. The problem is further compounded with four or more constants. What is often more useful, however, is to vary two nonlinear constants and make a least squares fit to the equation in which only the remaining constants are now the variables. For example, consider Eq. (4),

$$ \log y = \log\left[\frac{c}{(1 + K_1/H)(1 + K_2/H)}\right] \tag{4} $$

which describes the drop in V or V/K at high pH when two groups ionizing independently both must be protonated for activity. To construct a least squares surface with $K_1$ and $K_2$ (or actually, $pK_1$ and $pK_2$) as the variables, we assume a grid of these, and at each set of pK values we make a least squares fit to Eq. (4) with c as the only variable. This gives:

$$ \log c = \frac{\sum \log (y_i D_i)}{n} \tag{5} $$

where

$$ D_i = (1 + K_1/H_i)(1 + K_2/H_i) \tag{6} $$

n is the number of data points, and $y_i$ and $H_i$ are the experimental data. When we substitute the value of log c from Eq. (5) into the expression for the residual least square

$$ \sum (\log y_i - \log c + \log D_i)^2 \tag{7} $$

we get

$$ \sum \left[\log (y_i D_i)\right]^2 - \frac{\left[\sum \log (y_i D_i)\right]^2}{n} \tag{8} $$

which is the value plotted on the contour map as a function of $pK_1$ and $pK_2$.

A more complex example is the equation for hyperbolic competitive inhibition, which can be fitted by the Gauss-Newton method only when good preliminary estimates are available:


$$ v = \frac{VA}{K\left[(1 + I/K_{in})/(1 + I/K_{id})\right] + A} \tag{9} $$

Since there is no way to make this equation linear by inversion, preliminary estimates of $K_{in}$ and $K_{id}$ must be obtained by analysis of the slopes of reciprocal plots by graphical methods, or by using the HYPRPLT program. When the data are good and properly placed this works nicely, but in some cases divergence has been observed. In these cases, examination of the least squares surface was a useful way to tell whether any minimum really existed. The procedure used was to vary $K_{in}$ and $K_{id}$ over a grid of values, and at each set of values to fit the data to Eq. (9) with $K_{in}$ and $K_{id}$ considered to be constants, and K and V the variables. The grid covered a power of 10 for both $K_{in}$ and $K_{id}$, with the values stepped by factors of 1.26 (the tenth root of 10). The computer program read in the starting values of $K_{in}$ and $K_{id}$ (as well as the experimental data), and these could be altered as needed to determine different parts of the least squares surface. The program then printed out for each set of $K_{in}$ and $K_{id}$ values the fitted values of K and V and the residual least square.

The results of this study were rather interesting. In some cases a minimum was found, and with the values of $K_{in}$ and $K_{id}$ from this minimum as preliminary estimates, the computer program using the Gauss-Newton method then did converge. In some of the cases, however, the minimum was reached at a negative value of $K_{in}$ or $K_{id}$, suggesting that this equation was not really the proper one for the data. In other cases no minimum was found, and the contours of residual least square formed a long shallow valley with one end closed, but the other end open, and the floor showing a very gradual drop toward the open end. This shows that one constant is reasonably well determined (set by the coordinate of the floor of the valley), but the other one has only a minimum value (set by the head of the valley) and is not significantly different from infinity (which corresponds to the open end of the valley). In such cases, one will get a good fit by assuming the constant to be infinity (that is, by leaving the appropriate inhibition term out of Eq. 9).

The use of least squares surfaces is thus of great value in stubborn cases in diagnosing why the data do not seem to fit the assumed rate equation, and should be strongly encouraged. Computer programs to accomplish the analysis are very easy to write, and the author will be happy to assist anyone desiring to perform this sort of analysis.
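The grid procedure described for Eq. (9) can be sketched in Python as follows. At fixed $K_{in}$ and $K_{id}$, Eq. (9) reduces to Eq. (2) with an effective substrate concentration A/f, where f = (1 + I/K_in)/(1 + I/K_id). As a simplification, this sketch obtains K and V at each grid point from a $v^4$-weighted reciprocal fit rather than from a full iterative fit; that substitution, the function names, and the data are all assumptions made for illustration.

```python
import numpy as np

def conditional_fit(v, A, I, Kin, Kid):
    """Fit K and V of Eq. (9) with Kin and Kid held constant (hypothetical helper)."""
    f = (1.0 + I / Kin) / (1.0 + I / Kid)
    Ae = A / f                       # effective substrate concentration
    # np.polyfit applies w to the unsquared residuals, so w = v**2 gives v**4 weights
    slope, intercept = np.polyfit(1.0 / Ae, 1.0 / v, 1, w=v ** 2)
    V = 1.0 / intercept
    K = slope * V
    rss = float(np.sum((v - V * Ae / (K + Ae)) ** 2))
    return K, V, rss

def surface(v, A, I, Kin_start, Kid_start, n=11):
    """Grid covering a power of 10 in Kin and Kid, stepped by the tenth root of 10."""
    rows = []
    for i in range(n):
        Kin = Kin_start * 10 ** (i / 10)
        for j in range(n):
            Kid = Kid_start * 10 ** (j / 10)
            K, V, rss = conditional_fit(v, A, I, Kin, Kid)
            rows.append((Kin, Kid, K, V, rss))
    return rows

# Hypothetical error-free data: V = 10, K = 1, Kin = 2, Kid = 8
A = np.repeat([0.5, 1.0, 2.0, 4.0], 4)
I = np.tile([0.0, 1.0, 4.0, 16.0], 4)
v = 10.0 * A / (1.0 * (1 + I / 2.0) / (1 + I / 8.0) + A)

rows = surface(v, A, I, Kin_start=2.0 * 10 ** -0.5, Kid_start=8.0 * 10 ** -0.5)
best = min(rows, key=lambda r: r[4])
print("Kin=%.3g Kid=%.3g K=%.3g V=%.3g residual=%.3g" % best)
```

With real data one would print the whole table of `rows` and look for a closed minimum, a minimum at a negative constant, or an open-ended valley, as described above.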

Evaluation of Results

Once one has fitted the data to a given rate equation (or to several possible rate equations), one has to evaluate the results and draw proper conclusions. First, let us consider the case where several possible rate equations have been considered. The following criteria are used in picking the best equation.

1. Residual least square, or its square root, SIGMA. Adding extra terms to a rate equation will lower SIGMA only when the fit is really improved; thus the fit with the lowest SIGMA is usually the best.

2. Standard errors of the constants. When the standard errors are less than 25% of the values, one can consider the values to be well determined, and thus the term containing this constant definitely present. On the other hand, values such as 2 ± 5 or -1 ± 3 show complete lack of significance, and suggest that the term may be absent. If no appreciable rise in SIGMA results from leaving the term out, it should be discarded. The meaning of this is that the data do not detect the presence of the term; it is possible of course that more precise experiments, or a different spread of data points, would establish the presence of the term.

3. Randomness of the residuals. The size and sign of the residuals (DIFF values in the table printed out) should be random and show no trends. Thus in a competitive inhibition experiment if the residuals for the middle line are all of one sign, and for the top line of opposite sign, one should suspect parabolic or hyperbolic inhibition (that is, nonlinearity of the slope replot). The size of the residuals can also serve as a check on proper weighting. Thus if residuals at high substrate levels are much larger than those at low levels, one should use log weighting.

Two examples of the use of these methods are shown in Tables I and II. In Table I two initial velocity patterns were run in which DPN was varied at different levels of either deuterated or nondeuterated cyclohexanol. The entire set of data were fitted to a rate equation which assumed that an isotope effect (that is, a value different from 1.00) was present on 4, 3, 2, or only 1 kinetic parameters, as shown.
TABLE I
ISOTOPE EFFECTS WITH CYCLOHEXANOL-1-D AND LIVER ALCOHOL DEHYDROGENASE (a)

Isotope effects on
V/K(cyclohexanol)   V             V/K(DPN)      Ki(DPN)       SIGMA
3.14 ± 0.30         1.08 ± 0.09   0.90 ± 0.18   0.43 ± 0.66   0.285
3.04 ± 0.23         1.10 ± 0.09   0.85 ± 0.20   (1.00)        0.283
3.04 ± 0.24         1.06 ± 0.08   (1.00)        (1.00)        0.282
3.19 ± 0.16         (1.00)        (1.00)        (1.00)        0.280

(a) Unpublished data of Dr. P. F. Cook.

In this case there are only


small changes in SIGMA, but it is clear from the size of the standard errors that isotope effects on Ki(DPN) and V/K(DPN) do not exist, and if there is an isotope effect on V it is very small.

Table II illustrates another example of computer fits to rate equations assuming isotope effects on V, V/K, or both.⁵ In this case TPN was held constant at a saturating level, and only the concentration of deuterated or nondeuterated isocitrate was varied. At pH 7.45, none of the values is significantly different from 1.00, and thus there are no isotope effects on either V or V/K(isocitrate). At pH 9.5, however, the much lower SIGMA value for the fit with effects on both V and V/K, as well as the small standard errors, show clearly that small but real effects on both parameters are present.

TABLE II
ISOTOPE EFFECTS WITH ISOCITRATE-2-D AND ISOCITRATE DEHYDROGENASE (a)

        Isotope effects on
pH      V               V/K(isocitrate)   SIGMA
7.45    0.98 ± 0.04     (1.00)            0.40
        (1.00)          0.99 ± 0.09       0.42
        0.95 ± 0.06     1.07 ± 0.16       0.44
9.5     1.07 ± 0.01     (1.00)            1.48
        (1.00)          1.12 ± 0.02       1.90
        1.04 ± 0.005    1.06 ± 0.008      0.47

(a) Unpublished data of Dr. P. F. Cook.

⁵ "Isotope Effects on Enzyme-Catalyzed Reactions" (W. W. Cleland, M. H. O'Leary, and D. B. Northrop, eds.), p. 261. Univ. Park Press, Baltimore, Maryland, 1977.

Description of Computer Programs

These programs are written in very simple FORTRAN and should run on any computer with minimal changes in input and output statements. All are designed for card input and will accept as many sets of data as desired (a blank card on the end of the data deck stops the program). Each program makes a least squares fit to a given rate equation and prints out the kinetic constants and various combinations thereof, together with standard errors. In addition, the average residual least square (VARIANCE) and its square root (SIGMA) are printed out, so that one can compare the results of fitting the data to different rate equations. Each program also prints out a table of residuals showing calculated as well as experimental values for each data point, and the differences. Several of the programs also examine the residuals, discard each data point


when the difference between the calculated and experimental value exceeds 2.6 × SIGMA, and make a revised fit to the remaining data points. This procedure is designed to throw out bad points and is based on a 99% probability that any difference greater than 2.6 × SIGMA is not the result of random error.

To use these programs, the data are placed on cards as follows. The first card is always a title card having in columns 1-3 the number of data points in I3 format (that is, a number with the decimal point assumed to lie between columns 3 and 4). Columns 4-20 are normally left blank, except that all programs except for replots and pH profiles accept reciprocal velocities as input when a number is punched in column 20. Columns 21-68 will be read verbatim from the title card and printed out as a title to identify the output.

The title card is followed by as many data cards as are indicated in columns 1-3 of the title card. There is one data card for each point. If replicates have been run, treat each as a separate data point, and do not average them first. In columns 1-10 of the data card place the velocity, or whatever experimental parameter is being fitted (V, V/K, 1/Ki for pH profiles, for example), using F10.5 format (a number with a decimal point but no exponent, such as 1.26, 101., or 0.013). The number can be placed anywhere in the 10 columns, and the decimal point should always be punched. Columns 11-20 are used for substrate concentration, pH, etc., depending on the program, and in some cases data will be placed in columns 21-30 and subsequent fields. All data are in F10.5 format.

The individual programs included at the end of this chapter are described in more detail below.

A. Programs for single reciprocal plots

1. HYPER. This program fits Eq. (2), assuming equal variance for the velocities. Data input: velocities in columns 1-10 (or reciprocal velocities if a number is placed in column 20 of the title card), substrate concentration in columns 11-20. Express concentrations as molar, millimolar, or micromolar, so that the size of the numbers is reasonable (between 0.01 and 100); the value of K will come out in the same units. Likewise, scale velocities to some reasonable units; V will come out in these units.

2. HYPERL. This program fits data to

$$ \log v = \log\left(\frac{VA}{K + A}\right) \tag{10} $$

Data input is the same as for HYPER. Because of the log fit, the units of SIGMA are in log v units, rather than in units of velocity, and thus SIGMA values for HYPER and HYPERL cannot be directly compared. If one is in doubt which fit to use, look at the residuals (DIFF column in table of residuals) for the two fits. If the residuals are randomly distributed and about the same size at high and low velocities, HYPER is the appropriate fit. If the DIFF values for v are low at low velocities and high for high velocities, but the DIFF values for log v are about the same size, HYPERL is the correct fit.

3. SIGMOIL. The rate equation fitted is

$$ \log v = \log\left(\frac{VA^2}{a + 2bA + A^2}\right) \tag{11} $$

Data input is similar to HYPER. The log fit is used because of the wider range of velocities commonly seen when the kinetics are of this form.

4. TWOONL. The rate equation fitted is

$$ \log v = \log\left[\frac{V(dA + A^2)}{c + bA + A^2}\right] \tag{12} $$

Data cards are similar to those for HYPER. Because this equation does not become linear upon inversion, it is necessary to supply preliminary estimates from graphical analysis in double-reciprocal form. The following are placed on a single card, which follows the title card and precedes the data cards (it is not counted in the total in columns 1-3 of the title card).

Columns 1-10: initial slope of reciprocal plot (when 1/A = 0).
Columns 11-20: slope of asymptote (when 1/A = ∞).
Columns 21-30: 1/v intercept of the reciprocal plot.
Columns 31-40: 1/v intercept of the asymptote.

All data are F format. Be careful to see that these parameters have the correct dimensions. If the fit diverges, construct a least squares surface for a grid of b and c values with d and V as the variables. Pick the approximate values of b and c to start with from the equations:

$$ d = \frac{SL_0 - SL_\infty}{INT_\infty - INT_0} \tag{13} $$

$$ b = d + \frac{SL_0}{INT_0} \tag{14} $$

$$ c = \frac{d \, SL_\infty}{INT_0} \tag{15} $$

where SL is slope, INT is 1/v intercept, and the subscripts 0 and ∞ refer to the values when 1/A is zero or infinity. If a minimum in residual least square is found, modify the program to accept preliminary estimates of b, c, and d directly on the extra card, and now a convergent fit should be obtained.

5. SUBIN. This program fits a reciprocal plot showing linear (that is, total) substrate inhibition

$$ v = \frac{VA}{K + A + A^2/K_i} \tag{16} $$

Data input same as for HYPER. It is important that the data cover both high and low concentrations of substrate so that the velocity falls at least a factor of 2 on both sides of its maximum observed value.

B. Programs for initial velocity patterns

6. SEQUEN. This program fits the rate equation for an intersecting initial velocity pattern

$$ v = \frac{VAB}{K_{ia}K_b + K_aB + K_bA + AB} \tag{17} $$

Data input: velocities in columns 1-10, concentrations of A in columns 11-20, concentrations of B in columns 21-30. The units of A and B do not need to be the same; the values of $K_{ia}$ and $K_a$ come out in the units of A, and the values of $K_b$ and $K_{ib}$ (calculated from the assumed relationship $K_{ia}K_b = K_aK_{ib}$) come out in the units used for B. Although one usually runs this sort of pattern as 3 or 4 lines with variable A at different B levels (or vice versa), the data do not have to fall on single reciprocal plots so long as a grid of values of A and B is used. All data are combined and processed with one single title card [the data may of course be separately fitted to Eq. (2) for each of the separate lines, if desired]. Either substrate may be A, since the equation is symmetrical with respect to A and B.

7. PINGPONG. This program fits the parallel initial velocity pattern

$$ v = \frac{VAB}{K_aB + K_bA + AB} \tag{18} $$

Data input is the same as for SEQUEN. Again, either substrate may be A.

8. EQORD. This program fits the equilibrium ordered initial velocity pattern

$$ v = \frac{VAB}{K_{ia}K_b + K_bA + AB} \tag{19} $$

Data input is similar to SEQUEN, except that, unlike SEQUEN and PINGPONG, it matters which reactant is considered to be A and which is B. This is the result of the equation not being symmetrical in A and B, with $K_a = 0$ and $K_{ia}K_b/K_a = \infty$. Thus for input, velocities go in columns 1-10, A concentrations in 11-20, and B concentrations in 21-30.

C. Programs for inhibition patterns

9. COMP. This program fits data to a linear competitive inhibition pattern

$$ v = \frac{VA}{K(1 + I/K_{is}) + A} \tag{20} $$

Data input: v in columns 1-10, A concentration in columns 11-20, I concentrations in columns 21-30. K comes out in the units of A, and $K_{is}$ in the units of I, and these units may be different. Note that all data are combined and processed together with a single title card, although each reciprocal plot at a separate I level can be fitted separately to Eq. (2).

10. UNCOMP. The fit is to a linear uncompetitive inhibition pattern

$$ v = \frac{VA}{K + A(1 + I/K_{ii})} \tag{21} $$

Data input is the same as for COMP.

11. NONCOMP. The fit is to a linear noncompetitive inhibition pattern

$$ v = \frac{VA}{K(1 + I/K_{is}) + A(1 + I/K_{ii})} \tag{22} $$

Data input is the same as for COMP.

D. Programs for replot analysis

12. LINE. This program fits kinetic parameters such as K/V or 1/V to a straight line as a function of inhibitor or reciprocal substrate concentration. It is thus useful for analysis of slope or intercept replots. All possible combinations of slope, vertical or horizontal intercepts are provided plus standard errors. Data input: K/V or 1/V in columns 1-10, inhibitor or reciprocal substrate concentration in columns 11-20, weights if desired in columns 21-30. If no weights are provided, they are set equal to 1. Programs of types A, B, and C above provide the reciprocal of the square of the standard error of each parameter as a possible weighting factor.

13. PARA. A kinetic parameter is assumed to be a parabola

$$ y = a + bX + cX^2 \tag{23} $$

where y is the kinetic parameter and X is inhibitor or reciprocal substrate concentration. Data input is the same as for LINE.

14. HYPRPLT. This program assumes hyperbolic variation of a kinetic parameter

$$ y = a\,\frac{1 + X/K_{in}}{1 + X/K_{id}} \tag{24} $$

where X is again inhibitor or reciprocal substrate or activator concentration. It can be used for analysis of replots or for any set of data where y changes hyperbolically to a new value as a function of X. If $K_{id} < K_{in}$, y decreases with increasing X, while if $K_{id} > K_{in}$, it increases to the new limiting value of $aK_{id}/K_{in}$. Data input: y in columns 1-10, X in columns 11-20, weights if desired in columns 21-30. The first data card after the title card must be one where X = 0. If no such point exists, estimate the value of y when X = 0 from graphical analysis and include this point with a very low weighting factor (.000000001). Weights do not need to be supplied for the rest of the data; if not supplied, they are assumed equal to 1. The reason for this procedure is that the preliminary estimates of $K_{in}$ and $K_{id}$ are obtained by raising the horizontal axis so that the curve goes through the origin (using the value of y on the first data card), and then making a double-reciprocal plot of the data in the normal manner.

E. Programs for pH profiles

15. BELL. This program fits kinetic constants, such as V, V/K, or 1/Ki from a series of inhibition experiments, to a pH profile where the kinetic constant drops at both high and low pH by a factor of 10 per pH unit:

$$ \log y = \log\left(\frac{c}{1 + H/K_a + K_b/H}\right) \tag{25} $$

where Ka and Kb are the dissociation constants of the groups that ionize, and c is the pH independent value of the parameter y. The fit is in the log form because of the wide range o f y values involved. Data input: y in columns 1-10 (scaled so that the numbers are of a reasonable size), pH in columns 11-20, weights if desired in columns 21-30. Although the reciprocal of the square of the standard error of y can be used as a weighting factor, such weights will have considerable random variation, and in practice we have preferred fits where no weights were used. Wilkinson discusses this problem at length. ~ When two groups ionizing independently are responsible for the form of Eq. (25), the true form of the denominator is: 1 + K~/K1 + H/K1 + Ks~H, where K1 and K2 are the dissociation constants of the two groups. The minimum separation between pKa and pKb in Eq. (25) is then log 4 = 0.6 (corresponding to pK1 = pK2), and the true values for the pK's of the two groups can be calculated from the apparent ones given by the fit to Eq. 25 (see Cleland 6 for a full discussion of this fitting problem). The separation of pK~ and PKb is very easy to establish when it is 2 or more pH units, but when the pK's are closer together than this, the curve has almost the same shape regardless of the pK's, and small experimental errors cause considerable uncertainty in the pK separation (although not in the sum pK~ + pKb). It is, in fact, common in such cases for pKa to appear higher than pKb in Eq. (25), a mathematical possibility, but a physical impossibility. The program thus calculates the values of pKI and pK2, unless pKb - pK~ < 0.6, in which case it assumes that pK1 = pK2 and computes this value instead. 16. H A B E L L . This program assumes a drop in kinetic constant only at low pH: 6 W. W. Cleland, Adv. Enzymol. 45, 273 (•977).


Data input is similar to BELL.

17. HBBELL. The drop in parameter with pH occurs only at high pH:

$$\log y = \log\left[\frac{c}{1 + K_b/H}\right] \tag{27}$$

Data input is similar to BELL.

18. WAVL. This program assumes that the parameter y decreases at high or low pH, but levels out at a new value:

$$\log y = \log\left[\frac{Y_L + Y_H (K/H)}{1 + K/H}\right] \tag{28}$$

where YL is the value of y at low pH, YH the value at high pH, and K is the dissociation constant of the group whose ionization or protonation decreases activity. Data input is similar to BELL, except that a card carrying preliminary estimates from graphical analysis must be placed between the title and data cards (it is not counted in the value placed in columns 1-3 of the title card). On this card, place preliminary estimates of pK in columns 1-10 (the pH at which log y has dropped 0.3 below its higher plateau value, regardless of whether this is YL or YH) and of YL/YH in columns 11-20.

Additional Programs

The programs included here, and those published previously, should allow statistical analysis of most kinetic data. A number of other programs have been written for less commonly encountered rate equations, and the author will make listings of these available to people who have data requiring them. Investigators should send graphs of their data and an explanation of what rate equations they think may fit their data to Dr. W. W. Cleland (Department of Biochemistry, University of Wisconsin, Madison, Wisconsin 53706). Include a phone number in case consultation is necessary. Requests for "all programs," requests from libraries, or requests unaccompanied by justifying data will not be honored. Programs for the following rate equations are available. 7

A. Programs for single-reciprocal plots

Programs to fit rate equations similar to Eqs. (11) and (12), except not in the log form, and Eq. (16) in the log form.

7 Existing programs can be used for other equations by altering the input. For example, 1/v vs I plots can be fitted by using HYPER or HYPERL with v in columns 1-10 and 1/I in columns 11-20. Ki is then given by the reciprocal of the printed K value.
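The input substitution described above for 1/v vs I plots can be checked numerically. A linear plot, 1/v = (1/V)(1 + I/Ki), rearranges to v = V(1/I)/(1/Ki + 1/I), a hyperbola in 1/I whose K equals 1/Ki. The Python sketch below fits that hyperbola by an unweighted Gauss-Newton iteration. It only illustrates the idea and is not the HYPER algorithm; `fit_hyperbola` and the synthetic data are hypothetical.

```python
def fit_hyperbola(x, v, iterations=20):
    """Unweighted least-squares fit of v = V*x/(K + x); returns (V, K)."""
    n = len(x)
    # Initial estimates from the line 1/v = 1/V + (K/V)(1/x).
    rx = [1.0 / xi for xi in x]
    rv = [1.0 / vi for vi in v]
    mx = sum(rx) / n
    mv = sum(rv) / n
    sxy = sum((a - mx) * (b - mv) for a, b in zip(rx, rv))
    sxx = sum((a - mx) ** 2 for a in rx)
    slope = sxy / sxx
    intercept = mv - slope * mx
    V = 1.0 / intercept
    K = slope / intercept
    # Gauss-Newton refinement in the untransformed (v) space.
    for _ in range(iterations):
        a11 = a12 = a22 = b1 = b2 = 0.0
        for xi, vi in zip(x, v):
            dV = xi / (K + xi)                # partial derivative w.r.t. V
            dK = -V * xi / (K + xi) ** 2      # partial derivative w.r.t. K
            r = vi - V * dV                   # residual
            a11 += dV * dV
            a12 += dV * dK
            a22 += dK * dK
            b1 += dV * r
            b2 += dK * r
        det = a11 * a22 - a12 * a12
        V += (b1 * a22 - b2 * a12) / det
        K += (a11 * b2 - a12 * b1) / det
    return V, K


# Synthetic, error-free data from 1/v = (1/V)(1 + I/Ki) with V = 2, Ki = 4.
inhibitor = [1.0, 2.0, 4.0, 8.0, 16.0]
velocity = [2.0 / (1.0 + i / 4.0) for i in inhibitor]

# Supply 1/I as the independent variable, as the text describes.
V_fit, K_fit = fit_hyperbola([1.0 / i for i in inhibitor], velocity)
Ki_fit = 1.0 / K_fit  # Ki is the reciprocal of the fitted K
```

Points with I = 0 cannot be entered this way, since 1/I is undefined for them.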

