[1] ANALYSIS OF RADIOLIGAND ASSAY DATA

[1] Statistical Analysis of Radioligand Assay Data

By D. RODBARD and G. R. FRAZIER

The widespread use of large-scale radioimmunoassays (RIA) and related techniques (saturation assays, competitive protein binding assays, or radioligand assays in general) has led to the development of numerous methods for routine data analysis. 1-19 Unfortunately, many persons still utilize graphical methods alone or linear interpolation between adjacent points on the dose-response curve. These methods do not provide efficient utilization of the data, do not provide estimates of the precision of unknowns, are subject to erratic behavior and to subjective biases, and forfeit important information about the assay system.

1 D. Rodbard, P. L. Rayford, and G. T. Ross, J. Clin. Endocr. Metab. 29, 352 (1968).
2 D. Rodbard, W. Bridson, and P. L. Rayford, J. Lab. Clin. Med. 74, 770 (1969).
3 D. Rodbard and J. A. Cooper, in "In Vitro Procedures with Radioisotopes in Medicine," p. 659. IAEA, Vienna, 1970.
4 D. Rodbard, P. L. Rayford, and G. T. Ross, in "Statistics in Endocrinology" (J. W. McArthur and T. Colton, eds.), p. 41. MIT Press, Cambridge, Massachusetts, 1970.
5 D. Rodbard and J. E. Lewald, Acta Endocrinol. (Copenhagen) 64, Suppl. 147, 79 (1970).
6 D. Rodbard, in "Principles of Competitive Protein Binding Assays" (W. D. Odell and W. H. Daughaday, eds.), p. 204. Lippincott, Philadelphia, Pennsylvania, 1971.
7 G. M. Brown, R. L. Boshans, and D. S. Schalch, Comput. Biomed. Res. 3, 212 (1970).
8 A. J. Valleron and G. E. Rosselin, Ann. Biol. Clin. (Paris) 29, 145 (1971).
9 W. G. Duddleson, A. R. Midgley, and G. D. Niswender, Comput. Biomed. Res. 5, 205 (1972).
10 C. L. Bliss, in "Statistics in Endocrinology" (J. W. McArthur and T. Colton, eds.), p. 431. MIT Press, Cambridge, Massachusetts, 1970.
11 C. L. Meinert and R. B. McHugh, Math. Biosci. 2, 319 (1968).
12 H. G. Burger, V. W. K. Lee, and G. C. Rennie, J. Lab. Clin. Med. 80, 302 (1972).
13 A. Arrigucci, G. Forti, G. Fiorelli, M. Pazzagli, and M. Serio, in "The Endocrine Function of the Human Testis" (V. H. T. James, L. Martini, and M. Serio, eds.), p. 73. Academic Press, New York, 1972.
14 M. J. R. Healy, Biochem. J. 130, 207 (1972).
15 R. P. Ekins and B. Newman, Acta Endocrinol. (Copenhagen) 64, Suppl. 147, 11 (1970).
16 I. B. Taljedal and S. Wold, Biochem. J. 119, 139 (1970).
17 R. Leclercq, I. B. Taljedal, and S. Wold, Clin. Chim. Acta 36, 257 (1972).
18 D. Wilson, G. Sarfaty, B. Clarris, M. Douglas, and K. Crawshaw, Steroids 18, 77 (1971).
19 S. R. Vivian and F. S. LaBella, J. Clin. Endocrinol. Metab. 33, 225 (1971).

HORMONE ASSAYS

The RIA dose-response curve presents two problems: nonlinearity and nonuniformity of variance (i.e., the scatter around the curve depends on the position on the dose-response curve). 1-6 Both of these problems have been handled by adaptation of conventional statistical methods similar to those used for analysis of bioassay results. 32

Response Variables

Either the "bound" or the "free" fraction may be counted. However, with the exception of the original, though now virtually obsolete, chromatoelectrophoresis method, it is unnecessary and undesirable to count both fractions, provided that the pipetting error for labeled antigen is 1% or less. Commonly employed response variables include: bound counts; free counts; B/T, fraction bound, or its reciprocal; F/T, fraction free; B/F (or R), bound-to-free ratio; F/B, free-to-bound ratio; and Y = B/Bo, counts bound relative to counts bound for zero dose. When using any of these response variables (with the exception of the first two, the "raw" counts), it is imperative that the "nonspecific" counts (N) be subtracted from both the numerator and the denominator. Nonspecific counts represent the counts recorded in the absence of specific antibody or in the presence of an "infinite" amount of ligand. That is:

$$B/T = \frac{B - N}{T - N} \tag{1A}$$

$$B/F = \frac{B - N}{T - B} \tag{1B}$$

$$Y = B/B_0 = \frac{B - N}{B_0 - N} = \frac{(B/T)}{(B/T)_0} \tag{1C}$$

where B is counts bound, Bo is counts bound for zero dose, N is nonspecific counts, and T is total counts. The symbols on the left in these equations correspond to common usage and are to be regarded as symbols (e.g., B/Bo), not as the ratio of two numbers. Usually N ranges up to 10% of the total counts. Either arithmetic or logarithmic scales may be used for the dose (X) axis. The latter facilitates dose interpolation in the low dose region and provides partial linearization.

Curve Fitting Based on the Mass Action Law

Several workers have attempted to predict the shape of the dose-response curve on the basis of the first-order mass action law and have attempted to use these equations as the basis for curve fitting (e.g., Meinert and McHugh 11 and Wilson et al. 18). However, the empirical dose-response curve is usually beset with problems of (a) heterogeneity of binding sites, 20 (b) nonidentical behavior of labeled and unlabeled ligand, 20,21 (c) failure to reach equilibrium (especially with staggered addition of reagents), 22-24 and (d) errors in separation of bound and free. 18,25 Although each of these effects can be included in the theoretical treatment, 20-25 the resulting expressions involve too many arbitrary or unknown parameters for routine use for dose interpolation. The parameter fitting programs of Baulieu and Raynaud 26 or Feldman 27 can handle problem a; the program of Feldman 27 can also handle problem b when adequate data are available. In our routine program for RIA data processing we employ parameter fit for four successive models, corresponding to (a) a single class of "specific" or saturable sites, (b) a single class of sites and a horizontal asymptote, (c) two classes of saturable sites, and (d) two classes of sites and a horizontal asymptote. Case d corresponds to: 26,28,29

$$B/F = \frac{K_{11} q_1}{1 + K_{11} F} + \frac{K_{12} q_2}{1 + K_{12} F} + K_3 \tag{2}$$

where K11 and K12 represent the affinity constants; q1 and q2 represent the binding capacities for the two classes of saturable sites; and K3, the horizontal asymptote, corresponds to Kq for a third class of sites (nonsaturable). Equation (2) implicitly defines B/F as a function of dose X since

$$F = \frac{X + X^*}{1 + B/F} = (F/T)(X + X^*) = (1 - B/T)(X + X^*) \tag{3}$$

where X* is the "dose" of labeled ligand. 2 The major purpose of this series of calculations is to attempt to characterize the antiserum or reaction system. Simpler and more reliable methods are available for dose interpolation.

20 H. Feldman, D. Rodbard, and D. Levine, Anal. Biochem. 45, 530 (1972).
21 R. P. Ekins, B. G. Newman, and J. L. H. O'Riordan, in "Radioisotopes in Medicine: In Vitro Studies" (R. L. Hayes, F. A. Goswitz, and B. E. P. Murphy, eds.), p. 59. U.S. At. Energy Comm., Oak Ridge, Tennessee, 1968.
22 D. Rodbard, H. J. Ruder, J. Vaitukaitis, and H. S. Jacobs, J. Clin. Endocrinol. Metab. 33, 343 (1971).
23 D. Rodbard and G. H. Weiss, Anal. Biochem. 52, 10 (1973).
24 G. Vassent and S. Jard, C. R. Acad. Sci. 272, 880 (1971).
25 D. Rodbard and K. J. Catt, J. Steroid Biochem. 3, 255 (1972).
26 E.-E. Baulieu and J.-P. Raynaud, Eur. J. Biochem. 13, 193 (1970).
27 H. A. Feldman, Anal. Biochem. 48, 317 (1972).
28 D. Rodbard, in "Receptors for Reproductive Hormones" (B. W. O'Malley and A. R. Means, eds.), pp. 289, 327, 342. Plenum, New York, 1973.
29 D. Rodbard and H. A. Feldman, Vol. 36 [1].
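As an illustration of how Eqs. (2) and (3) together define B/F for a given dose, the following is a minimal Python sketch, not the authors' Fortran program. It assumes that all parameters are already known, that the dose X and tracer X* are expressed in the same concentration units as the reciprocal of the affinity constants, and it solves for F by simple bisection. The function names (`bound_to_free`, `solve_free`, `predicted_b_over_f`) are hypothetical.

```python
def bound_to_free(free, sites, k3=0.0):
    """Eq. (2): B/F as a function of the free ligand concentration F.

    sites is a list of (K, q) pairs, one per class of saturable sites
    (affinity constant, binding capacity); k3 is the horizontal asymptote.
    """
    return sum(k * q / (1.0 + k * free) for k, q in sites) + k3


def solve_free(total_ligand, sites, k3=0.0, tol=1e-12):
    """Find F satisfying Eq. (3), F * (1 + B/F) = X + X*, by bisection.

    The left side increases monotonically with F, is 0 at F = 0 and is at
    least X + X* at F = X + X*, so the root lies in [0, X + X*].
    """
    lo, hi = 0.0, total_ligand
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        excess = mid * (1.0 + bound_to_free(mid, sites, k3)) - total_ligand
        if excess > 0.0:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def predicted_b_over_f(dose, tracer, sites, k3=0.0):
    """B/F predicted for unlabeled dose X and labeled 'dose' X*."""
    free = solve_free(dose + tracer, sites, k3)
    return bound_to_free(free, sites, k3)
```

For a single class of sites with K = 1 and q = 1 (arbitrary illustrative values), `predicted_b_over_f(0.5, 0.5, [(1.0, 1.0)])` gives a B/F close to 0.618, and the predicted B/F falls as the dose increases.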

"Empirical" Methods of Curve Fitting

1. Hyperbolas. A plot of B/F, B/T, B/Bo, or of "raw" counts bound vs. dose gives a "hyperbolic" curve. Many persons have attempted to "fit" this hyperbola. 10,16,17 Further, many persons have attempted to linearize this hyperbola by use of the reciprocal of the response variable. A plot of

$$\frac{1}{B/F}, \quad \frac{1}{B/T}, \quad \text{or} \quad \frac{1}{B/B_0}$$

vs. X will give a linear relationship for a limited region: The extent of this linearity varies depending on the assay. If the assays operate "at saturation" so that the amount bound remains constant, then this method is theoretically justified and performs well. 30 However, there may be serious departure from the model. These "hyperbolic" methods, especially those involving linearization of the curve by use of reciprocals, are readily adaptable to those (programmable) desk-top calculators which are not equipped to handle logs or antilogs readily. When using this method, it is important to use weighted least-squares linear regression, since the use of the reciprocal magnifies the error in the response variable for high doses; for example, if B/Bo were to have uniformity of variance, then 28

$$\mathrm{Var}(B_0/B) \approx \frac{1}{(B/B_0)^4}\,\mathrm{Var}(B/B_0)$$
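The weighting implied by this variance relation can be sketched as follows. This is an illustrative Python sketch under the stated assumption that B/Bo itself has uniform variance, so that the weight for each reciprocal is proportional to (B/Bo)^4. The function names are hypothetical and are not taken from any published program.

```python
def weighted_line(x, y, w):
    """Weighted least-squares fit of y = intercept + slope * x."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def fit_reciprocal(doses, b_over_b0):
    """Fit 1/(B/Bo) = intercept + slope * X with weights (B/Bo)**4.

    Since Var(1/Y) is approximately Var(Y) / Y**4, a constant Var(Y)
    gives weights proportional to Y**4 (the constant factor cancels).
    """
    reciprocals = [1.0 / y for y in b_over_b0]
    weights = [y ** 4 for y in b_over_b0]
    return weighted_line(doses, reciprocals, weights)


def dose_from_response(y, intercept, slope):
    """Read a dose from the fitted reciprocal line."""
    return (1.0 / y - intercept) / slope
```

Note how sharply the weights fall at high dose: a point at B/Bo = 0.1 receives 1/10,000 of the weight of a point at B/Bo = 1.0.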

2. Orthogonal Polynomials (Power Series). Many workers have used "multiple polynomials" or power series to describe the dose-response curve: 8,13,17 Either

$$Y = c_0 + c_1 X + c_2 X^2 + c_3 X^3 + \cdots + c_n X^n$$

or

$$X = c_0 + c_1 Y + c_2 Y^2 + c_3 Y^3 + \cdots + c_n Y^n$$

may be used. This is a powerful, general method for curve fitting. However, it has several drawbacks and in comparative studies 1~,17 does not perform as well as the logit-log or logistic models discussed below. First, the use of multiple polynomials involves the use of many (e.g., 6) parameters which must be fitted. These constants are very unstable and may fluctuate widely from week to week or from assay to assay. Also, the prediction of confidence limits for unknowns is complicated.

30 C. N. Hales and P. J. Randle, Biochem. J. 88, 137 (1963).

As noted by Weaver, 31 the use of the following series is more efficient than use of the usual power series:

$$X = \frac{1}{Y}\left(c_0 + c_1 Y + c_2 Y^2 + \cdots + c_n Y^n\right) \tag{4}$$

By use of the term 1/Y, this method represents an "adjustment" (by use of a power series) to the rectangular hyperbola. Brown et al. 7 have used a power series (with three terms) in terms of X vs. log(Y):

$$X = c_0 + c_1 \log(Y) + c_2 [\log(Y)]^2$$

However, the use of log(Y) introduces potentially severe problems of nonuniformity of variance, especially as Y approaches zero.

3. Truncation. The sigmoidal curve of B/F, B/T, B/Bo, F/T, or raw counts (either bound or free) vs. log(X) can be approximated by a straight line for the central region of the curve. Linear regression, together with truncation, has been used by a large number of workers with considerable success. However, even the central portion of the dose-response curve is not exactly linear, and this method forfeits the ability to use the very low dose region of the curve, which is often important.

4. The "Logit-Log" Method. The sigmoidal curve of B/Bo vs. log(X) suggests that a linear dose-response curve could be obtained by use of a "sigmoidal" or "S-shaped" transformation of the response variable. Either the logit, probit, or arc-sine transformation can be used. 1-6,14,19,32,33 The logit transformation is preferable: It is the easiest to calculate, provides the simplest expressions for weighting, and is theoretically justified. When Y is a decimal fraction, 0 < Y < 1, the logit transform is defined by

$$\mathrm{logit}(Y) = \log_e\left(\frac{Y}{1 - Y}\right) \tag{5}$$

Then, the RIA dose-response curve is described by the linear equation 1-6

$$Y' = \mathrm{logit}(Y) = a + b \log X \tag{6}$$

where Y = B/Bo, X is dose, and a and b are constants.

31 C. K. Weaver and C. M. Cargille, J. Lab. Clin. Med. 77, 661 (1971).
32 D. J. Finney, "Statistical Method in Biological Assay." Griffin, London, 1964.
33 D. J. Finney, "Probit Analysis," 3rd ed. Cambridge Univ. Press, London and New York, 1971.
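Equations (5) and (6) can be illustrated with a short Python sketch. It is an unweighted fit only, suitable at most for initial estimates, and it assumes that Y = B/Bo has already been corrected for nonspecific counts, that all standards have 0 < Y < 1 and nonzero dose, and that natural logarithms are used for dose. The function names are hypothetical.

```python
import math


def logit(y):
    """Eq. (5): natural-log logit of a fraction 0 < y < 1."""
    return math.log(y / (1.0 - y))


def fit_logit_log(doses, y_values):
    """Unweighted least-squares fit of Eq. (6): logit(Y) = a + b * ln(X)."""
    log_x = [math.log(x) for x in doses]
    logit_y = [logit(y) for y in y_values]
    n = len(log_x)
    x_bar = sum(log_x) / n
    y_bar = sum(logit_y) / n
    sxx = sum((x - x_bar) ** 2 for x in log_x)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(log_x, logit_y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def interpolate_dose(y, a, b):
    """Invert Eq. (6) to read a dose from a corrected response Y = B/Bo."""
    return math.exp((logit(y) - a) / b)
```

For standards that follow an exact rectangular hyperbola, Y = 1/(1 + X/c), this fit returns a slope of -1 and an intercept of ln(c), and `interpolate_dose(0.5, a, b)` returns c.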


Graph paper is available, with a logistic ruling on the vertical axis and a logarithmic ruling on the horizontal axis (from Heifers & Co., 26 King St., Cambridge, England, and TEAM, Box 25, Tamworth, N.H., USA). Plotting Y = B/Bo vs. X on this paper will immediately indicate whether this method provides linearity. Again, it is necessary that the nonspecific counts have been subtracted from both the numerator and the denominator. When the hyperbolic methods (discussed above) are successful, the logit-log method will also linearize, with a slope of -1 (when using loge X, i.e., natural logs) or with a slope of -2.303 (when using log10 X, i.e., common logs, on the abscissa). 2 In addition, the logit-log method will often provide excellent linearization even when the hyperbolic methods fail. The logit-log model is justified on the basis of the first-order mass action law: For several combinations of tracer and antibody concentration (especially for those near the "optimum"), the logit-log method shows no departure from linearity. 34 However, when there is marked disparity of tracer and antibody concentration, or when there is severe antibody heterogeneity, then the logit-log model may also fail to provide adequate linearization. 34 Ekins and Newman 15 have noted that there is also a linear relationship between logit(B/Bo) and log(X) or, equivalently, between log[F/B - (F/B)o] and log(X) (see Figs. 5 and 6 of Ekins and Newman, 15 respectively). The logit-log method used here has certain similarities to the Hill or Sips plot (see Rodbard 29), but it is not mathematically equivalent to or interconvertible with them. The logit-log model requires accurate estimation of both the "100%" (zero dose) and "0%" (infinite dose) response levels, designated Bo and N, respectively (N is usually measured by omission of the specific antibody). This is most readily achieved by use of replicates (e.g., quadruplicates) at these ends of the curve. Any serious error in either Bo or N may result in significant nonlinearity of the logit(Y) vs. log(X) plot. However, usually the error in Bo and N is negligible for practical purposes.

34 H. Feldman and D. Rodbard, in "Principles of Competitive Protein Binding Assays" (W. D. Odell and W. H. Daughaday, eds.), p. 158. Lippincott, Philadelphia, 1971.

The logit-log method has been used in dozens of different assays with excellent results (e.g., HCG, LH, FSH, cAMP, HGH, HTSH, insulin, cortisol, testosterone, estradiol, angiotensins I and II, progesterone, 17a-hydroxyprogesterone, DHA, folic acid, morphine, heroin, cAMP, and vitamin B12). In an appreciable number of these assay systems, the slope of the curve is significantly different from -1, indicating that the use of the hyperbolic methods would have been unsatisfactory. By providing linearity, the logit-log method greatly facilitates dose interpolation over the entire dose range. Graphical methods can be used, usually quite satisfactorily. The slopes and intercepts obtained graphically are usually in quite good agreement with computed values. Unweighted least-squares regression may be used. 2,19 These methods usually provide satisfactory curve fits; however, the unweighted method should not be used to predict confidence limits. Also, unweighted regression can only be used with truncation, e.g., at B/Bo = 0.2 and 0.8. Truncation may be regarded as a crude form of weighting. The major purpose of the unweighted regression is to obtain initial estimates for subsequent calculation of an iterative weighted regression 5 using a maximum likelihood method. A computer program is available for this purpose (Fortran IV, G) 35,36 and has been employed successfully in several laboratories. Notes on "weighting" will be given below. The necessary equations are summarized in Appendix II of Rodbard and Lewald. 5 Usually, only two or three iterations are necessary: The method converges very rapidly.
Our program 36 tests for linearity by two methods: (a) fitting straight lines to the two halves of the data and testing for identity of slope; and (b) fitting a parabola to the data, using the method of orthogonal polynomials (weighted). 37,38 If the parabola provides an improved fit to the data (i.e., a significant reduction in residual sum of squares or a coefficient of [log(X)]^2 significantly different from zero), then there is significant nonlinearity and a warning is printed. Indeed, the parabolic relationship between logit(Y) and log(X) has been used as the basis for dose interpolation in the method of Hansen. 39

35 N. L. McBride and D. Rodbard, "Radioimmunoassay Data Processing," Reports NIH-RIA-71-1 and NIH-RIA-71-2 (Accession No. PB205587 and PB205588). Nat. Tech. Inform. Serv., Springfield, Virginia, 1972.
36 G. R. Frazier and D. Rodbard, "Radioimmunoassay Data Processing," 2nd ed., Reports NIH-RIA-72-1 and NIH-RIA-72-2 (Accession No. PB217366 and PB217367). Nat. Tech. Inform. Serv., Springfield, Virginia, 1973.
37 K. A. Brownlee, "Statistical Theory and Methodology in Science and Engineering." Wiley, New York, 1960.
38 C. L. Bliss, "Statistics in Biology." McGraw-Hill, New York, 1967.
39 D. L. Hansen, personal communication.

Our program plots the RIA dose-response curve in coordinates of B/Bo vs. log(X), logit(B/Bo) vs. log(X), and B/Bo vs. arithmetic dose X. Indeed, the logit-log method may be regarded as a technique for nonlinear curve fitting in terms of the original nontransformed variables B/Bo vs. log dose or of counts vs. dose. The graphs show the points, the regression line, and the 95% confidence limits for a single observation around the line (ignoring the errors in the estimation of the regression line, which may be regarded as "fixed" within any one assay). 5,6,3~ Our program then proceeds with dose interpolation for unknowns and predicts the coefficient of variation for each potency estimate based on the behavior of the standards. If samples have been run in replicate, the mean, standard deviation, and coefficient of variation are provided. Corrections are made for variable sample volumes and/or variable recoveries, if appropriate. (Corrections for recoveries are often necessary in assays involving a preliminary extraction, thin-layer chromatography, etc.) The unknowns are plotted consecutively. Also, the upper and lower 95% confidence limits, expressed as a fraction of the potency estimate, are shown as a function of the position on the dose-response curve. Finally, a Scatchard plot analysis is performed to estimate the affinity constants and binding capacities of the various classes of sites present. 26-29 In addition, this program provides an optimization routine similar to that of Ekins and Newman 15,21,40 to predict the concentrations of labeled antigen and antibody that provide the optimal sensitivity (minimal least detectable dose).

The program also makes it possible to compare two preparations which have been studied at three or more "points" involving two or more dose levels. 2,32 A regression line is calculated for each curve. The residual variances are compared and combined. The lines are tested for parallelism (identity of slopes). The residual variance is adjusted.
The intercepts are tested for identity. Then the log-potency estimate and the potency estimate are obtained together with their 95% confidence limits. Finally, this answer is multiplied by an arbitrary constant (if desired) to convert the relative potency into an absolute potency. This is a standard operating procedure in bioassay statistics and is adapted from Finney 32 with trivial modification. However, the weighting function used here was developed specifically for radioimmunoassays. 5,6 When two curves are not parallel (or if either is nonlinear), then the interpretation of the relative potency is subject to doubt. In this case, one can calculate the ratio of the midranges (ED50, ID50, or 50% intercept, i.e., the dose resulting in B/Bo = 0.50) for the two preparations and the confidence limits for this ratio. Also, in some cases of nonparallelism, one may convert the ratio of the ED50's into the ratio of the affinity constants for the two preparations (Vivian and LaBella, 19 Appendix III of Rodbard et al. 5,28,41).

40 J. Albano and R. P. Ekins, in "In Vitro Procedures with Radioisotopes in Medicine," p. 491. IAEA, Vienna, 1970; R. P. Ekins and G. B. Newman, in "Protein and Polypeptide Hormones" (M. Margoulies, ed.), Int. Congr. Ser. No. 161, p. 329ff and pp. 672-682. Excerpta Med. Found., Amsterdam, 1969.

5. A Generalized Logistic Model. The logit-log approach requires accurate and precise estimates of Bo and N, which are then regarded as constants in the remainder of the curve-fitting procedure and dose interpolation. However, we can use statistical methods to "estimate" or "adjust" our initial estimates of Bo and N. This provides greater generality but also introduces greater complexity. Bliss, 10 Leclercq et al., 17 and others have suggested the use of curve fitting for the "nonspecific" counts N, i.e., the lower "horizontal asymptote," but in so doing, they forced the slope of the logit-log relationship (or the exponent of X) to be -1 (i.e., they used the hyperbolic model). These workers 16,17 did not appreciate that the nonspecific counts had already been subtracted in the logit-log method. Burger et al. 12 and Serio et al. 13 have suggested the use of curve fitting to readjust and refine the estimate of Bo, using information from all of the points on the curve. However, these authors regarded the nonspecific counts as known, fixed, and already subtracted from the response variable. It appears that one can subsume all of the above models by use of the equation:

$$y = \frac{a - d}{1 + (X/c)^b} + d \tag{7}$$

where a corresponds to the (predicted) response when X = 0 (viz., Bo); d corresponds to N, the response when X = ∞; c = dose when B/Bo = 0.5 (previously designated as the "50% intercept" or the "midrange") 16; and b, the exponent, corresponds to (-1) times the slope of logit(Y) vs. log(X), Eq. (6). This is a four-parameter "logistic" model. The use of this approach was apparently first publicly suggested by D. J. Finney during the Fourth International Biometrics Congress, Hannover, 1970. Healy 14 has recently proposed this approach, using the identical model in a slightly different nomenclature. Extensive literature is available for similar methods used for statistical analysis of bioassay results 32 and for growth curves. 42

41 D. Rodbard, J. Clin. Endocrinol. Metab. 32, 92 (1971).
42 E. Marubini, L. F. Resele, J. M. Tanner, and R. H. Whitehouse, Hum. Biol. 44, 511 (1972); E. Marubini and L. F. Resele, Comput. Programs Biomed. 2, 16 (1971).

The two-parameter logit-log model [Eq. (6)] may be used to obtain initial estimates of b and c. 13 Then, a general nonlinear weighted curve-fitting program is used for calculation of parameters a, b, c, and d. Of course, three-parameter models with either a or d fixed and the other "floating" can also be used. Note that changing the sign of b is equivalent to reversing the roles of a and d. Numerous programs are available for this purpose, using either the Newton-Raphson method or the Marquardt-Levenberg iteration. The latter provides both stability and rapid convergence, is less dependent on the availability of good initial estimates for parameters, and is currently in vogue. However, since we have excellent starting estimates, the Newton-Raphson and Gauss-Newton methods are also quite satisfactory. In contrast to the unweighted methods of Burger, 12 Serio, 13 Healy, 14 and Taljedal, 16,17 weighted regression must be used, unless one can demonstrate homogeneity of variance for Y (see below). Indeed, the data of several studies, 12-14 and particularly of Taljedal and Wold, 16 indicate a 10-fold, systematic variation of the standard deviation of counts with the position on the dose-response curve. This would correspond to a 100-fold range for the variance and for the weights assigned to the various points on the curve. The four-parameter logistic model [Eq. (7)] avoids the need for the log and logit transformations. However, it does require the use of exponentials and employs a nonlinear regression (in matrix notation) in lieu of the relatively simple and familiar methods of linear regression. Thus, it is a moot point as to which method provides greater "simplicity."
Also, the four-parameter model may give difficulty in convergence, the errors in the four parameters are interdependent (ideally, the joint confidence regions for the four parameters should be used, but these are quite difficult to obtain), and calculation of the confidence limits or standard error for potency estimates is extremely complicated, unless one assumes that the regression curve is fixed. (Incidentally, even if this model [Eq. (7)] is chosen, we prefer the use of a log scale for dose in graphical display of the curve.) The general four-parameter logistic model (above) can be used with either the free or bound counts, B/T, B/F, F/T, or B/Bo as the response variable. Further, this method promises to be useful for curve fitting for immunoradiometric (labeled antibody) assays 43 and for the two-site immunoradiometric assay. 44 In addition, it should be useful in certain enzymatic assays and in vitro bioassays and in describing the dose-response curves for cAMP production stimulated by hormones in vitro. 45,46 Further, the same programs can be used to find parameters for the Sips or Hill plot when these are linear. 29

43 L. E. M. Miles and A. S. Hartree, J. Endocrinol. 51, 91 (1971).
44 D. Rodbard and D. M. Hott, in "Symposiums on Radioimmunoassay and Related Procedures in Clinical Medicine and Research." IAEA, Vienna, 1974.

This method can result in an increase in the degrees of freedom, and thus the reliability, of the residual variance: we have two additional fitted parameters, a and d, but the replicate values for Bo and N are included as observations in the regression analysis.

When performing "parallel-line" analyses using the four-parameter model, it is necessary to (a) compare a, b, and d for the two curves (i.e., do the two curves have the same upper and lower asymptotes, and are they parallel?); (b) obtain combined estimates of a, b, and d; (c) reestimate residual variance; and (d) reestimate c values and obtain the relative potency and its confidence limits. Alternatively, each "point" for the unknown may be "read" from the dose-response curve individually and corrected for sample volume (dose or dilution). These values are then averaged (each one weighted according to its precision), and a significance test can be used to detect systematic trend (nonparallelism). Alternatively, one can plot the observed potency vs. dose and test whether this relationship can be described by a straight line going through the origin.
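The four-parameter logistic of Eq. (7), and the alternative of "reading" each point of an unknown individually from the fitted curve, can be sketched in Python as follows. The sketch assumes that the parameters a, b, c, and d have already been fitted elsewhere and that a variance is available for each potency estimate; it does not perform the nonlinear regression itself. All function names are hypothetical.

```python
def four_pl(x, a, b, c, d):
    """Eq. (7): response at dose x for the four-parameter logistic."""
    return (a - d) / (1.0 + (x / c) ** b) + d


def inverse_four_pl(y, a, b, c, d):
    """Solve Eq. (7) for the dose; valid only for y strictly between d and a."""
    return c * ((a - d) / (y - d) - 1.0) ** (1.0 / b)


def potency_estimates(responses, volumes, params):
    """Read each response from the curve and correct for sample volume."""
    a, b, c, d = params
    return [inverse_four_pl(y, a, b, c, d) / v
            for y, v in zip(responses, volumes)]


def weighted_mean(estimates, variances):
    """Average the estimates, each weighted by the reciprocal of its variance."""
    weights = [1.0 / v for v in variances]
    return sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
```

With this form, the response at X = c lies exactly halfway between a and d, which is one quick check on a set of fitted parameters. A systematic trend of the individual potency estimates with sample volume would suggest nonparallelism.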

Notes on Nonuniformity of Variance and Weighting

In the analysis of bioassay data, a large degree of "nonuniformity of variance" may go unnoticed. Variance is large, replication is difficult and expensive, and there is usually no way to predict the magnitude of the scatter of the response variable around the curve as a function of its position on the dose-response curve.

45 K. J. Catt, W. Watanabe, and M. L. Dufau, Nature (London) 239, 5370 (1972); M. L. Dufau, K. J. Catt, and T. Tsuruhara, Proc. Nat. Acad. Sci. U.S. 69, 2414 (1972).
46 G. Sayers, R. J. Beall, and S. Seelig, Science 175, 1131 (1972); T. Barth, S. Jard, F. Morel, and M. Montegut, Experientia 28, 967 (1972); I. D. Goldfine, J. Roth, and L. Birnbaumer, J. Biol. Chem. 247, 1211 (1972).

In radioimmunoassay, variance is small, replication is easy, and we can predict the size of the scatter (variance) as a function of Y (or as a function of X). Certainly, we know that the magnitude of the counting error is directly related to the number of counts recorded. Similarly, we can calculate the error in the response variable attributable to a 1% error (or any other percent error, vp) in the pipetting of standard (or unknown) antigen, a 1% error in the pipetting of the labeled antigen, a 1% error in the pipetting of the antibody, and a 1% error in the K value (or in the total volume of the reaction mixture). Finally, we can calculate the random errors resulting from the incomplete separation of bound and free fractions (see Model III of Rodbard 6). The results of this analysis have been confirmed by Monte-Carlo computer simulation and by direct comparison with experiment. 2 In general, there will be a systematic relationship between σY² (the variance of the response variable) and the response level in the form

$$\sigma_Y^2 = a_0 + a_1 Y + a_2 Y^2 \tag{8}$$

A three-term (three-parameter) model of this form is adequate. Many workers 12-14,16-19 have assumed a1 = a2 = 0. Bliss 10 and Duddleson et al. 9 have assumed that a0 = a2 = 0. However, when attempting to simplify this relationship, we have preferred to use a linear model obtained by setting a2 = 0. The magnitude of a0, a1, and a2 depends on the total counts T, nonspecific counts N, Bo, pipetting error (vp), and the misclassification errors (vi and vH). Accordingly, these may vary from assay to assay, from one type of RIA to another, and between laboratories. Thus, it is not surprising that some workers find significant nonuniformity of variance, whereas others report this problem to be nonexistent or minimal. However, the onus of proof is on those who claim that there is uniformity of variance. The best way to establish a0, a1, and a2 is empirically. This may be done as follows: (a) Run ten replicates at each of 10 dose levels (including 0 and ∞). (b) Calculate sY² at each dose level. (c) Plot sY² vs. Y. (d) Attempt to fit this scattergram by (1) a horizontal line (i.e., uniformity of variance, with a1 = a2 = 0); (2) a straight line forced through the origin (i.e., a Poisson-like variance with a0 = a2 = 0); (3) a straight line not forced through the origin (i.e., a2 = 0) (this is the method used with greatest success by the present authors); and (4) if justified by the data, the complete quadratic form of Eq. (8) with a0, a1, and a2.
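Steps (b) through (d)(3) above can be sketched in Python as follows. This is a minimal, unweighted illustration of the linear case a2 = 0, assuming the replicate responses are already grouped by dose level; it is not the authors' program, and the function names are hypothetical.

```python
from statistics import mean, variance


def variance_profile(replicates_by_dose):
    """Step (b): (mean Y, sample variance of Y) for each dose level."""
    return [(mean(reps), variance(reps)) for reps in replicates_by_dose]


def fit_linear_variance(profile):
    """Step (d)(3): least-squares line s^2 = a0 + a1 * Y, i.e., a2 = 0."""
    ys = [m for m, _ in profile]
    s2 = [v for _, v in profile]
    n = len(ys)
    y_bar = sum(ys) / n
    s_bar = sum(s2) / n
    syy = sum((y - y_bar) ** 2 for y in ys)
    sys2 = sum((y - y_bar) * (s - s_bar) for y, s in zip(ys, s2))
    a1 = sys2 / syy
    a0 = s_bar - a1 * y_bar
    return a0, a1
```

A fitted a1 near zero would be consistent with the horizontal-line case (1), and a fitted a0 near zero with the Poisson-like case (2); a plot of the profile remains the most direct check.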

Comments

1. In lieu of ten replicates at 10 dose levels, one can use duplicates and then "pool" results over assays in order to obtain a sufficient number of degrees of freedom (df) to have reliable estimates of sample variance.

2. Most reports of homogeneity of variance result from use of insufficiently sensitive methods to detect nonuniformity of variance; for example, Bartlett's test is very inefficient and insensitive to systematic trends of the type readily revealed by the graphical analysis just described.

3. The variance at one "end" of the response scale may be, for example, four times the variance at the other end. This corresponds to a twofold difference in the standard deviation. Although this difference is small, it means that the size of the confidence limits will vary by a factor of two and the weights assigned to different points will vary by a factor of four.

4. In contrast to the Poisson-like model of Bliss 10 and Duddleson et al., 9 the sY² may actually decrease as Y increases. This has been seen repeatedly in several assays, especially those in which the free, rather than the bound, fraction is counted.

5. The same type of analysis (of σY² vs. Y) just described for the standards should be repeated using the unknowns, provided that the unknown samples have been run in duplicate (or higher replication) at exactly the same dose (or volume) level and with the same percent recovery (if applicable). Thus, if we have 200 samples, each run in duplicate, we will have 200 estimates of sY², each with one degree of freedom:

$$s_Y^2 = (Y_1 - Y_2)^2 / 2 \tag{9}$$

By pooling these estimates of $s_Y^2$ (e.g., for $0 \le Y < 0.1$, $0.1 \le Y < 0.2$, etc.) we can construct a graph of $s_Y^2$ vs. Y and obtain very reliable estimates of $a_0$, $a_1$, and $a_2$ in Eq. (8). The results (in terms of the plot of $s_Y^2$ vs. Y or of $a_0$, $a_1$, and $a_2$) for the unknowns should be compared with those for the standards. If any discrepancy is noted, one should investigate its source and use the results based on the performance of the unknowns. If the results (in terms of $s_Y^2$ vs. Y) for the standards and for the unknowns are comparable, these results can be pooled, thereby increasing the degrees of freedom, the sensitivity of our ability to detect nonuniformity of variance, and the precision of our estimates of $a_0$, $a_1$, and $a_2$. Ideally, one should use weighted least-squares regression to compute the parameters ($a_0$, $a_1$, and $a_2$) of Eq. (8). The weights will be (approximately) inversely related to $s_Y^2$. The values of these parameters should be compared with results from previous assays. With these values in hand, we then proceed to compute the parameters of Eq. (6) or Eq. (7). In the curve-fitting methods of several authors, l-~-~,16,17 one should then assign a "weight" to each point, which is the reciprocal of the variance predicted by Eq. (8):

$$ \text{weight} = \frac{1}{\operatorname{Var}(Y)} = \frac{1}{s_Y^2} = \frac{1}{a_0 + a_1 Y + a_2 Y^2} \tag{10} $$
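The estimation steps described above (Eq. (9) for duplicates, pooling within response bins, fitting the variance model, and forming the weights of Eq. (10)) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function names are hypothetical, the quadratic term $a_2$ is set to zero so that a straight line suffices, and each pooled point is weighted by its degrees of freedom, which is a simplification rather than the weighting scheme of any particular program.

```python
from collections import defaultdict


def duplicate_variances(pairs):
    """Eq. (9): each duplicate pair gives (mean Y, s^2) with one degree of freedom."""
    return [((y1 + y2) / 2.0, (y1 - y2) ** 2 / 2.0) for y1, y2 in pairs]


def pool_by_bin(mean_var, width=0.1):
    """Average the 1-df variances within response bins [0, 0.1), [0.1, 0.2), ..."""
    bins = defaultdict(list)
    for y_mean, s2 in mean_var:
        bins[int(y_mean // width)].append((y_mean, s2))
    pooled = []
    for key in sorted(bins):
        items = bins[key]
        n = len(items)
        pooled.append((sum(y for y, _ in items) / n,   # mean Y in the bin
                       sum(s for _, s in items) / n,   # pooled variance
                       n))                             # degrees of freedom
    return pooled


def fit_linear_variance(pooled):
    """Weighted line s^2 = a0 + a1*Y; weights = df per bin (an assumption)."""
    sw = sum(n for _, _, n in pooled)
    ybar = sum(n * y for y, _, n in pooled) / sw
    vbar = sum(n * v for _, v, n in pooled) / sw
    sxx = sum(n * (y - ybar) ** 2 for y, _, n in pooled)
    sxy = sum(n * (y - ybar) * (v - vbar) for y, v, n in pooled)
    a1 = sxy / sxx
    return vbar - a1 * ybar, a1


def weight(y_pred, a0, a1, a2=0.0):
    """Eq. (10): reciprocal of the variance predicted by Eq. (8)."""
    return 1.0 / (a0 + a1 * y_pred + a2 * y_pred ** 2)


# Hypothetical duplicate responses (B/B0) for six unknowns.
pairs = [(0.12, 0.14), (0.16, 0.18), (0.52, 0.56),
         (0.55, 0.51), (0.86, 0.92), (0.84, 0.90)]
pooled = pool_by_bin(duplicate_variances(pairs))
a0, a1 = fit_linear_variance(pooled)
print(pooled, a0, a1, weight(0.5, a0, a1))
```

With real data one would want many more pairs per bin than in this toy input, since each pair contributes only one degree of freedom.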

Then, on each iteration of the nonlinear regression of Y on X using Eq. (7), the value of the weight for each point is adjusted by recalculating $s_Y^2$ as a function of the Y value predicted on the basis of the dose X for that point using the parameters a, b, c, and d obtained on the previous iteration. Thus, even though Eq. (10) specifies weights in terms of Y, the weights implicitly become a function of dose X. This has several desirable properties; for example, replicate values of Y at the same X receive the same weight. In the logit-log method [fitting parameters using Eq. (6)] the weight assigned to each point is given by$^{5,6}$

$$ w = \frac{1}{\operatorname{Var}[\operatorname{logit}(Y)]} \tag{11A} $$

where

$$ \operatorname{Var}[\operatorname{logit}(Y)] \cong \frac{\operatorname{Var}(Y)}{\hat{Y}^2 (1 - \hat{Y})^2} \tag{11B} $$

Here, Var(Y) is not constant; it must be evaluated from Eq. (8). The Y values on the right side of Eq. (11B) represent the Y values predicted for a given X on the basis of the previous iteration. Also, in this method, it is advisable to use a "working logit" analogous to the "working probit."$^{5,32}$

The use of the logit transformation greatly increases the severity of nonuniformity of variance. Even if the original Y value were to have uniform variance, there would be severe nonuniformity of variance of logit(Y) (or of probit(Y) or arcsin(Y)). However, this nonuniformity of variance is well taken into account by Eq. (11B). Thus, the weights have a different meaning in the logit-log method than in the methods using a Y variable without the logit transformation. Similarly, the residual variance has a different meaning and magnitude. Consequently, comparisons of residual variances obtained in methods using different response variables$^{13,16,17}$ (or weighting functions) are not valid.

When faced with a choice between a weighted linear regression and an unweighted nonlinear regression, most workers prefer the former approach. Certainly, a weighted linear regression is easier to handle than a weighted nonlinear regression. For these reasons, the original two-parameter logit-log method still remains the first choice for most routine RIA data processing.
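The iterative reweighting described for the logit-log method can be illustrated with a short sketch. It assumes the model logit(Y) = a + b ln(X) with natural logarithms (the base of the logarithm is an assumption here), takes the variance parameters of Eq. (8) as already known, and omits the "working logit" refinement mentioned above. The function names are hypothetical and the code is not taken from any of the cited programs.

```python
import math


def logit(y):
    return math.log(y / (1.0 - y))


def weighted_line(xs, ys, ws):
    """Weighted least-squares line ys = a + b*xs; returns (a, b)."""
    sw = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    b = sxy / sxx
    return ybar - b * xbar, b


def fit_logit_log(doses, responses, a0, a1, a2=0.0, n_iter=10):
    """Iteratively reweighted fit of logit(Y) = a + b*ln(X)."""
    xs = [math.log(d) for d in doses]
    zs = [logit(y) for y in responses]
    ws = [1.0] * len(xs)                     # first pass is unweighted
    a = b = 0.0
    for _ in range(n_iter):
        a, b = weighted_line(xs, zs, ws)
        ws = []
        for x in xs:
            y_hat = 1.0 / (1.0 + math.exp(-(a + b * x)))          # predicted Y
            var_y = a0 + a1 * y_hat + a2 * y_hat ** 2               # Eq. (8)
            var_logit = var_y / (y_hat ** 2 * (1.0 - y_hat) ** 2)   # Eq. (11B)
            ws.append(1.0 / var_logit)                              # Eq. (11A)
    return a, b


# Hypothetical standards: dose X and response Y = B/B0.
doses = [0.1, 0.3, 1.0, 3.0, 10.0]
responses = [0.90, 0.78, 0.52, 0.24, 0.10]
print(fit_logit_log(doses, responses, a0=1e-4, a1=1e-3))
```

Because the weights depend only on the predicted response at each dose, replicate responses at the same dose receive identical weights, as the text requires.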

Empirical Quality Control

The analyses discussed above enable us to predict the precision of a potency estimate for any position on the dose-response curve in any given assay. Usually, this is based on the performance of the standards (as in most of the presently available computer programs), but this may be done equally well or better on the basis of the behavior of the unknowns. Nevertheless, it is essential to have an independent check on the precision of the assay system, both in terms of variability within assays and variability between assays. To accomplish this, a sample (or group of samples, one at the low dose, one at the midrange, and one at the high dose position on the dose-response curve) is run in replicate both within a given assay and on several different assays. These data are analyzed as follows:

1. Plot the individual potency estimates (and the mean of the replicates for each sample in each assay) vs. assay number (or date) on the abscissa. Use of a logarithmic scale on the ordinate facilitates examination of samples over a wide dose range. In effect, this permits us to look at the relative or percentage error in the original observation.

2. Examine the scatter for replicates within the latest assay. How does this compare with the scatter in previous assays?

3. If samples have been run at different dose or dilution levels (e.g., 50 and 100 µl), how do the answers from the two dilutions compare? Is there any evidence that one dilution gives a significantly or consistently higher or lower answer than the other dilution(s)? If many samples are run in singlicate at two dose levels, compare the potency estimates (after correction for dose, dilution, or volume) and score a plus for each sample in which dilution No. 1 gives a higher answer than dilution No. 2, and a minus if the reverse is true. Then, a systematic departure from parallelism will be indicated by a fraction of pluses significantly different from 0.50. This can be tested most easily by the sign test, a chi-square test, or the t test for proportions. When samples are run in duplicate (or higher replication) at each of two dose levels, a formal analysis of variance (ANOVA) should be employed to test for "homogeneity" of results at the two dose levels.
One should obtain estimates of the various "mean squares" (MS), designated as MS within dose 1, MS within dose 2, and MS between doses. Then compare the two estimates of the MS (variance) within doses, pool if appropriate, and test for any significant difference between doses. This is comparable to, or could be done by, an unpaired Student's t test with uniformity of variance. Because of the low degree of replication with small df, this test will be very insensitive to departure from parallelism. However, by pooling results from several different samples in an analysis of variance, one can obtain a very reliable estimate of whether, overall, there is significant departure from parallelism. Finally, a parallel-line bioassay type of analysis should be used$^{32,36}$ whenever the unknown is "run" at three or more "points" involving two or more dose levels (i.e., whenever regression analysis is applicable). Results of the t test for parallelism should be pooled for similar types of samples (i.e., an analysis of covariance) to permit detection of heterogeneity of slopes with improved sensitivity.

4. Plot the cumulative average for all values of a given sample in all assays to date. How does the mean of the replicates on the latest assay compare with the previous cumulative mean? Is there any evidence of a systematic trend in the cumulative mean? If a discrepancy is noted for any one sample, then the other quality control samples should be checked for the same trend (or other form of discrepancy). If several of the quality control samples are behaving in like manner, one may have sufficient reason to reject that assay.

5. The standard deviation $s_X$ for each quality control sample run in replicate is then calculated for the latest assay. One should calculate $s_X^2$ (the mean square) for each sample, within each assay. This result ($s_X^2$) may be pooled (averaged) with results from previous assays and with results for other samples from the same general region on the dose-response curve. In this manner, one can obtain a very reliable estimate of within-assay variance, with expenditure of very few extra "tubes." When pooling results from multiple samples, it may be desirable to use the log transform to improve uniformity of variance.$^{47,48}$ Alternatively, the coefficient of variation $v_X$ may be calculated for each quality control specimen and plotted as a function of either the X or Y coordinate of the dose-response curve. Then a smooth curve (e.g., a parabola) may be used to describe this relationship. If all (or most) of the samples within an assay are run in duplicate (or replicate), one can obtain an empirical $s_X$, i.e., the standard deviation of the duplicates for each sample.
Then one can plot $s_X$ vs. X (e.g., with X on a logarithmic scale) or $v_X$ vs. X. This may be compared with the relationship between error and dose level predicted on the basis of the behavior of the standards.$^{2,6}$ Because of the small number of degrees of freedom for $s_X^2$ for duplicates, triplicates, etc., it is advisable to combine results from similar samples to obtain at least 10 df for each point on the plot of $s_X$ vs. X.

6. Both the "local" and the cumulative within-assay variance should be plotted on a control chart.$^{1,4,6}$ In lieu of $s_X^2$, one can use $s_X$ or the coefficient of variation $v_X$. Use of a semilog scale for $s_X$ or $s_X^2$ permits examination of $s_X$ over a very wide range. This will indicate whether the precision of the present assay is in line with previous assay performance.

47 E. Cotlove, E. K. Harris, and G. Z. Williams, Clin. Chem. 16, 1028 (1970).
48 E. K. Harris and D. L. Demets, Clin. Chem. 18, 244 (1972).
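The "local" and cumulative within-assay variances of items 5 and 6 can be tabulated with a few lines of code before plotting. The sketch below is illustrative only: it assumes one quality control sample with replicate potency estimates in each assay, pools the per-assay mean squares by their degrees of freedom, and uses a hypothetical function name and made-up potency values.

```python
from statistics import variance


def within_assay_chart(assays):
    """Rows of (assay number, local s^2, cumulative pooled s^2, cumulative df).

    `assays` holds one list of replicate potency estimates per assay,
    all for the same quality control sample.
    """
    rows = []
    ss_total = 0.0   # running sum of squares within assays
    df_total = 0     # running degrees of freedom
    for number, reps in enumerate(assays, start=1):
        df = len(reps) - 1
        local = variance(reps)        # mean square within this assay
        ss_total += local * df
        df_total += df
        rows.append((number, local, ss_total / df_total, df_total))
    return rows


# Hypothetical duplicate potency estimates from three successive assays.
qc = [[10.0, 12.0], [11.0, 11.0], [9.0, 13.0]]
for row in within_assay_chart(qc):
    print(row)
```

The cumulative df column makes it easy to see when a plotted point rests on the 10 or more degrees of freedom recommended in the text.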


7. Between-assay variance can be evaluated by direct inspection of the graph of the values obtained for each quality control sample in repetitive assays.$^{4,6,50}$ Inspection of a table in a notebook is far less satisfactory. However, for testing whether an assay is "in control," it is desirable and necessary to perform an ANOVA with examination of the "components of variance."$^{47-49}$ Between-assay variance is simply the square of the standard deviation for a given sample run in a series of different assays.$^{6}$ If routine samples are run in duplicate and then averaged, one should do the same for the quality control sample: run the sample in duplicate, calculate the average (for each assay), and then calculate the standard deviation of this average between assays. Results are commonly expressed in terms of a coefficient of variation, which may then be compared with the within-assay coefficient of variation. Also, the latest assay may be compared with all previous assays by a "contrast" with one df. This contrast may be compared with the previous, cumulative between-assay variance (e.g., by an F test). If the results are compatible, one is justified in using the new, cumulative between-assay variance as a measure of variation. If unknown samples are studied in singlicate, triplicate, etc., then computations will be simplified if the quality control samples are handled similarly.

Alternatively, one can use an analysis of variance to calculate both within-assay variance and between-assay variance. In previous treatments$^{1,4,6}$ we have used a straightforward ANOVA, and the between-assay variance was a mean square $MS_b$. However, one should use a components of variance approach to calculate the underlying $\sigma$'s both within and between assays; for example, if quality control samples are run in replicate (r), then the expectation of the between-assay variance or $MS_b$$^{1,4,6}$ is

$$ E(MS_b) = \sigma_w^2 + r\,\sigma_b^2 \tag{12A} $$

where $\sigma_w$ is the "true" standard deviation within assays, and $\sigma_b$ is the "true" standard deviation for a sample run in different assays, if we could eliminate the "measurement" error within assays (as by use of a high degree of replication within assays). Then the expected standard deviation for an unknown sample run in singlicate (r = 1) in each of several assays is $(\sigma_w^2 + \sigma_b^2)^{1/2}$. The intrinsic variation between assays (over and above measurement errors within one assay) is given by the component of variance

$$ \sigma_b^2 = \frac{MS_b - \sigma_w^2}{r} \tag{12B} $$
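As a worked illustration of Eqs. (12A) and (12B), the following sketch computes the within-assay mean square, the between-assay mean square, and the between-assay component of variance for one quality control sample. It assumes a balanced design (the same number of replicates r in every assay), estimates $\sigma_w^2$ by the pooled within-assay mean square, and truncates a negative component at zero; the truncation, the function name, and the data are assumptions made for the example, not part of the text.

```python
import math
from statistics import mean, variance


def variance_components(assays):
    """Return (MS_w, MS_b, sigma_b^2) for a balanced set of replicate results."""
    r = len(assays[0])
    if any(len(reps) != r for reps in assays):
        raise ValueError("balanced design assumed: same replication in every assay")
    ms_w = mean([variance(reps) for reps in assays])        # estimates sigma_w^2
    ms_b = r * variance([mean(reps) for reps in assays])    # E = sigma_w^2 + r*sigma_b^2
    var_b = max((ms_b - ms_w) / r, 0.0)                     # Eq. (12B), truncated at 0
    return ms_w, ms_b, var_b


# Hypothetical duplicate potency estimates of one sample in three assays.
qc = [[10.0, 12.0], [14.0, 16.0], [12.0, 14.0]]
ms_w, ms_b, var_b = variance_components(qc)
sd_singlicate = math.sqrt(ms_w + var_b)   # SD of one singlicate result across assays
print(ms_w, ms_b, var_b, sd_singlicate)
```

For these made-up data the within-assay variances are all 2.0 and the assay means are 11, 15, and 13, so $MS_b = 2 \times 4 = 8$ and $\sigma_b^2 = (8 - 2)/2 = 3$.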

49 S. W. Roberts, Technometrics 8, 411 (1966).
50 E. Amador, Amer. J. Clin. Pathol. 50, 360 (1968); R. Saracci, ibid. 52, 161 (1969).


One should plot the between-assay variance [either as $s^2$, s, log(s), or coefficient of variation] vs. time, showing both the "local" and the cumulative values; this can serve as the basis for objective rules as to when to reject an assay. Also, one should attempt to pool this information from several samples. Again, use of a log transform may provide sufficient uniformity of variance to make this possible. These quality control charts may be used to "reject" an assay. However, the rules for "rejection" when using multiple, simultaneous criteria are very complex in general, and they must be worked out for each particular laboratory and assay system, depending on its application.$^{4,6,50}$

8. It is convenient to construct a quality control chart with a number of other variables, including$^{1,4,6}$:

a. Concentration of labeled ligand
b. Specific activity
c. Total counts
d. (B/T) at zero dose = $(B/T)_0$
e. Nonspecific counts (expressed as fraction of total)
f. Slope [b in Eqs. (6) or (7)]
g. Intercept (or midrange, or dose corresponding to $B/B_0 = 0.5$)
h. Residual variance (should be 1.0 within random sampling error, if the correct model and parameters $a_0$, $a_1$, and $a_2$ were used to predict the variance of the response variable and the weight for each point)
i. Minimal detectable concentration, i.e., that dose level which produces a response which is t standard errors of the difference away from the response level for zero dose, where t is the one-tailed Student's t value for the desired percentile. Note: This should include the uncertainty in the $B_0$ or 100% level as well as the error for the unknown. However, for practical purposes, one can regard the zero dose response as "known" or fixed, and then calculate the dose level giving an expected response at 1, 2, or t standard deviations away from this initial value.
j. Pipetting error. This is calculated as

$$ v_p = \frac{100\,(s^2 - \bar{T})^{1/2}}{\bar{T}} \tag{13} $$

where $s^2$ is the square of the standard deviation of total counts and $\bar{T}$ is the mean total counts. $\bar{T}$ and $s^2$ must be based on the counts accumulated over the total counting time, which is usually, but not necessarily, 1 minute. Usually, pipetting error is approximately 1%. This is one of the major factors contributing to within-assay error. In order for this measure of $v_p$ to be reliable, it is necessary to have at least 50 tubes (49 df) counted for total counts. If only counting errors were present, then $v_p$ for replicate counts on a single tube should be exactly zero. This can be checked (thus checking the stability of the radiation counter) by counting the same tube 50 or 100 times and checking that, within the limits of sampling error, the variance is equal to the mean (i.e., $s^2 = \bar{T}$; thus, $v_p = 0.00$). Comment: Use an isotope with a long half-life.

The above items (a-j) are provided by our computer program for routine data processing.$^{36}$
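The pipetting-error calculation of Eq. (13) is simple enough to check by hand or with a short script. The sketch below is a minimal illustration, assuming raw total counts (not count rates) from replicate "total count" tubes; returning zero when the observed variance does not exceed the Poisson expectation is a convention chosen for this example, and the function name is hypothetical.

```python
import math
from statistics import mean, variance


def pipetting_error_percent(total_counts):
    """Eq. (13): percent pipetting error from replicate total-count tubes."""
    t_bar = mean(total_counts)       # mean total counts
    s2 = variance(total_counts)      # observed variance of total counts
    excess = s2 - t_bar              # variance in excess of Poisson counting error
    if excess <= 0:
        return 0.0                   # no detectable error beyond counting statistics
    return 100.0 * math.sqrt(excess) / t_bar


# Toy input: two tubes only, to make the arithmetic easy to follow.
# Mean = 10000, variance = 20000, excess = 10000, so v_p = 100 * 100 / 10000.
print(pipetting_error_percent([9900, 10100]))
```

The same function can be applied to repeated counts of a single tube as the counter stability check described above; in that case the result should be close to zero. As the text notes, a reliable estimate requires about 50 tubes, far more than in the toy input.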

Summary and Conclusions

1. The RIA dose-response curves may be described by the "first-order mass action law" equations to characterize the assay system in terms of affinity constants, binding capacity(ies), and binding site homogeneity.

2. The logit-log method, viz., logit(Y) = a + b log(X), where $Y = B/B_0$, provides a simple, general model for linear curve fitting and dose interpolation. Weighting (or truncation, a crude form of weighting) must be used. Also, iteration should be used to adjust weights. Confidence limits for unknowns can then be calculated, and "parallel-line" analyses are available.

3. A general four-parameter logistic model may be used to generalize and extend the properties of the logit-log model. This requires nonlinear regression, and weighting is still (at least usually) necessary or desirable, although not as important as for the logit-log method.

4. In general, the variance of the response variable ($B/B_0$, B/T, or counts bound) is given by $\sigma_Y^2 = a_0 + a_1 Y + a_2 Y^2$. Simple methods are available to estimate $a_0$, $a_1$, and $a_2$. Usually, $a_2$ may be ignored; thus, a simple linear relationship between $\sigma_Y^2$ and Y is sufficient.

5. Methods are available for predicting the variance (and standard deviation and coefficient of variation) for a potency estimate as a function of its position on the dose-response curve. By pooling information (over dose levels and over assays) we obtain very reliable estimates of both within-assay variance and between-assay variance with the expenditure of very little additional effort in terms of number of tubes or computation. Likewise, empirical estimates of precision are readily obtained for both within- and between-assay variance. These, combined with quality control charts, permit establishment of appropriate criteria for "rejection" of an assay.

6. All of the necessary calculations can be done by hand or by desktop calculators.
However, a programmable calculator or a high-speed computer greatly facilitates these calculations and makes them readily and economically available for routine data processing. Programs for most of these computations are now available.

Acknowledgments

This work is the outgrowth of long-standing collaborative efforts with H. A. Feldman, N. L. McBride, P. S. Vogel, J. E. Lewald, and J. A. Cooper. D. J. Finney first suggested the use of a weighted four-parameter logistic model. R. Saracci has made several useful suggestions. M. Serio made the manuscript available$^{13}$ prior to publication.

[2] General Considerations for Radioimmunoassay of Peptide Hormones

By DAVID N. ORTH

This chapter is only a brief introduction to the general subject of peptide hormone radioimmunoassay. It is based on three assumptions: (1) the reader is more or less completely unfamiliar with the subject; (2) certain general considerations can be applied to the immunoassay of all polypeptide hormones; and (3) the special characteristics of each antibody and each hormone dictate that the specific idiosyncrasies of each radioimmunoassay must be taken into account, supplementing the general principles applicable to them all.

The technique of polypeptide radioimmunoassay was developed by the late Dr. Solomon A. Berson and Dr. Rosalyn S. Yalow, who first observed that diabetic patients who received injections of insulin developed antibodies which bound 131I-labeled insulin. More importantly, they found that binding of labeled insulin by the antibodies could be competitively inhibited by addition of unlabeled insulin.$^{1}$ Recognition that the fraction of labeled insulin bound by the antibodies was a quantitative function of the amount of unlabeled insulin added to the reaction mixtures, when the concentration of antibody was held constant, formed the basis for the radioimmunoassay of insulin$^{2}$ and, by analogy, of all the peptide hormones.

The general principle involved in the radioimmunoassay is summarized by the following reactions, where Ab stands for specific antibody, Ag* for labeled hormonal antigen, Ag for unlabeled antigen,

1 S. A. Berson, R. S. Yalow, A. Bauman, M. A. Rothschild, and K. Newerly, J. Clin. Invest. 35, 170-190 (1956).
2 S. A. Berson and R. S. Yalow, Advan. Biol. Med. Phys. 6, 349-430 (1958).
