ANALYTICAL BIOCHEMISTRY 87, 477-495 (1978)

Automatic Collection and Processing of Data from the Ultracentrifuge Using a Programmable Desk Calculator¹

DANIEL INNERS, STEPHEN H. TINDALL, AND KIRK C. AUNE²

Marrs McLean Department of Biochemistry, Baylor College of Medicine, Houston, Texas 77030

Received August 22, 1977; accepted February 3, 1978

Previous investigators [Trautman, R., Spragg, S. P., and Halsall, H. B. (1969) Anal. Biochem. 28, 396-415] have published a detailed protocol for the analysis of sedimentation velocity measurements which is adaptable to data generated by an ultraviolet scanning system. The advent of programmable desk calculators capable of sampling the output of digital measuring devices has made it possible to develop inexpensive and highly convenient systems for collecting and processing scanner data. Basing our approach on the referenced protocol, we have developed algorithms for dealing with real data, that is, data characterized by a relatively high level of noise. The techniques are applicable to both sedimentation equilibrium and sedimentation velocity measurements using the scanning system and multicell rotors. With known concentration dependence, valid estimates of weight-average sedimentation coefficients, diffusion coefficients, and heterogeneity parameters have been obtained for both simulated and actual sedimenting and diffusing macromolecular solutes. We find, however, that concentration dependence derived internally from a single sedimentation velocity measurement is unreliable.

The ultraviolet scanning accessory (1-3) for the analytical ultracentrifuge has made possible the routine collection of a great deal of information describing the hydrodynamic behavior of sedimenting and diffusing macromolecules. Several reports (4,5) have described the on-line use of computers for the collection and analysis of data generated by at least two types of photoelectric scanning systems. Many users of the ultracentrifuge, however, undoubtedly still discard much valuable information because of the tedium and difficulties associated with the analysis of boundary spreading or the expense of using a sophisticated computer to process the data automatically. Shapiro et al. (6) have reviewed the application of programmable desk calculators to common laboratory tasks including the collection and processing of ultracentrifuge data.

Described here is the application of a relatively inexpensive computational system consisting of a programmable desk calculator, digital voltmeter, and magnetic tape cassette memory units.³ The system has the virtue of requiring interfacing of only the most rudimentary kind. In operation, the system collects, unattended, the data generated by the scanner and multiplexer operating in the automatic mode and carries out a complete boundary spreading and sedimentation analysis on the stored data. Depending on the amount of data collected, it is possible to carry out preliminary or cursory analysis while the centrifuge continues to operate.

Trautman et al. (7) have published a detailed computational technique which yields the weight-average sedimentation coefficient, concentration dependence of the sedimentation rate, heterogeneity parameter, diffusion coefficient, and apparent weight-average molecular weight. The successful application of this computational strategy in an automatic processing system depends greatly on algorithms capable of providing reliable estimates of the absorbance at the meniscus and in the plateau region. Particular attention has been given here to the development of efficient algorithms which extend the approach of Trautman et al. (7) to practical situations. We have used the system in measurements on well-characterized molecular species and on simulated data containing various levels of superimposed random noise.

¹ This work was supported in part by the National Institutes of Health (GM22244) and the Robert A. Welch Foundation (Q-592). A preliminary report was presented at the 61st Annual Meeting of the Federation of American Societies of Experimental Biologists in Chicago, April 1977.

² Recipient of an NIH Research Career Development Award (K04-GM00071).

0003-2697/78/0872-0477$02.00/0 Copyright © 1978 by Academic Press, Inc. All rights of reproduction in any form reserved.

BIOLOGICAL MATERIALS

Bovine pancreas α-chymotrypsinogen-A (Type II, Sigma Chemical Co., St. Louis, Mo.) was obtained commercially and used without further purification. The solvent used for the protein solutions was 0.10 M NaCl (pH = 2.5). Protein concentrations were determined spectrophotometrically using E₂₈₀ = 2.0 ml/mg cm (8).

Escherichia coli Q13 cells were grown at 37°C in Nomura broth (9) and harvested using a Sharples centrifuge. Ribosomes were prepared from the frozen cells by a modification of the procedure employed by Hindennach et al. (10). The 30 S subunits were isolated by the method of Eickenberry et al. (11), modified for use with a zonal rotor. RNA was extracted from the 30 S subunits at 0°C using the Macaloid-phenol technique (12). The solvent for 16 S rRNA was 0.35 M KCl, 0.03 M Tris-HCl, 0.02 M MgCl₂, 0.01 M 2-mercaptoethanol (pH 7.5). The absorptivity of rRNA in this buffer at 25°C is 22.5 ml/mg cm at 260 nm (12,13).

³ The programs in Hewlett-Packard 9810A machine language occupy about half of a 300-ft cassette. We will be pleased to discuss making available the programs and instructions for their use to interested investigators; however, it should be emphasized that the programs as written require two cassette units and will function without modification only on the fully expanded 9810A calculator.

INSTRUMENTAL COMPONENTS

Our centrifuge is a standard Beckman Model E ultracentrifuge equipped with the ultraviolet scanning accessory, multiplexer, monochromator, and high intensity light source. Double sector Kel-F-coated aluminum centerpieces and sapphire windows were employed in all experiments.

The calculator employed in this work is a Hewlett-Packard 9810A programmable desk calculator extended to include 2036 program steps and 111 data registers. With appropriate read-only memory units, this calculator is equivalent to a 3K byte (6 bit/byte) computer with a cycle time of 1 msec, is programmed in machine language, and is able to communicate with four peripheral devices. The calculator is interfaced with a digital voltmeter (HP 34703A) equipped with a BCD (ASCII) output module (HP 34721B).

The scanner output is sampled by the DVM just prior to the Dynograph recorder (T2-19). The signal at this point is log amplified and relative to the machine (ultracentrifuge) ground. The potential differences measured are typically about -360 mV when the photomultiplier is positioned in the image of the outer reference hole (air-air potential) and approximately -3900 mV in the dark space at the top and bottom of the cell image (dark potential). With clean optics and a new lamp, the noise level in the air-air region is less than 2%, measured by the standard deviation of the sampled voltages. A twofold increase in the noise level is noted when the calculator is grounded through its power cord. This increase in noise is attributed to a ground loop. If the calculator frame is isolated from ground, the problem is eliminated.

The limited memory of the calculator dictates the use of at least one magnetic tape cassette unit (HP 9865A) for data storage, while automatic processing requires a second for the program sequence and for storage of intermediate results.

The approximate total cost of the system, including calculator, ROM inserts, interfacing, two magnetic tape cassette drives, and digital voltmeter, was about $10,000. Using slightly more modern components at current prices, the cost could probably be reduced to about $6000 for an equivalent system.

COLLECTION AND PROCESSING OF DATA

In order to analyze a sedimenting boundary, the calculator must convert the scanner output to absorbance and relate each value of the absorbance to a radial position in the ultracentrifuge cell. Information for the first requirement is provided by calibrating the scanner photometric system with K₂Cr₂O₇ solutions of known concentration and absorbance (14). This calibration appears somewhat sensitive to optical pulse rate; consequently it is advisable to perform the calibration at each angular velocity that is expected to be used in subsequent experiments. The second requirement is met by assuming the scanning velocity is constant and, therefore, that the distance traveled by the analyzing slit and phototube is proportional to the time elapsed since encountering the inner edge of the outer reference hole. All distances are measured relative to the outer reference position. Since this position is sensed while the DVM is sampling at its maximum rate, it is considered to be the most precisely located position in the cell image. Measurement of the time required to scan between the reference positions confirms that the average scanner velocity is highly reproducible. For the standard uv counterbalance employed in this study at the intermediate scanning speed, this time is 63.4 sec with a reliability of about ±0.05 sec or better. In the reference frame of the rotor, the intermediate scanning speed is 2.523 × 10⁻² cm/sec.

Timing between scans, or between a sequence of scans if multiple cells are in use, is effected by the automatic photo sequence of the ultracentrifuge; the calculator is passive in this respect but keeps track of the time. Data acquisition is initiated when the calculator detects a potential difference greater than a preselected value, e.g., the mean of the dark potential and the air-air potential. Further acquisition is timed by a loop operation of constant, measured frequency in the collection program. Collection continues slightly longer than the time required to scan reference-to-reference in order to sample the air-air signal in the inner reference hole.
The collection rate is 3.71 points/sec, which is nearly equal to the maximum rate of which this BCD module is capable. Two hundred sixty-one voltage samples are collected and stored in core as four-digit numbers, three channels of data being stacked in a single data register. Upon completion of a scan the raw data and corollary information (run identification number, time, reference voltages, data file location, etc.) are automatically recorded on a 48,000-byte cassette, and the calculator returns to its waiting status, marking time until the photomultiplier carriage has returned and advanced into the outer reference hole on the next scan.

The rate of collection may be put into perspective by comparing the pulse rate of light through a cell to the distance traveled by the photomultiplier during the voltage sampling interval. The DVM is capable of readout at intervals of 200 msec, but only about a 20-msec integration period is devoted to the actual sampling of the potential difference. At 60,000 rpm, a given cell passes through the light beam 20 times in this 20-msec period, and the photomultiplier, moving at the intermediate scan speed, scans the equivalent of about 5 µm in the rotor frame. Thus the width of a data point is 5 µm, and the channel width or spacing between points is approximately 70 µm. The periodicity of the noise in the scanner output is of the order of 1 msec as observed by oscilloscope. Integration times shorter than 20 msec would therefore seem to be of doubtful value.

The coordinates of a point in the raw data set are (X, Y), where X is the channel number, and Y is the potential difference sensed by the DVM. The lower limit of channel 1 is assigned as the outer reference position, the upper limit of the inner reference channel ($X_I$) as the inner reference position. The magnification factor or channel width is therefore given by

$\Delta r = 1.61\ \text{cm} / X_I$    [1]

At a wavelength of 280 nm, this parameter is usually between 66.9 and 67.2 µm. The actual radial coordinate of a point in the rotor reference frame is related to the channel number by the following:

$r_i = 7.30 - (X_i - \tfrac{1}{2})\,\Delta r$    [2]
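As a rough numerical illustration of Eqs. [1] and [2], the channel-to-radius conversion might be sketched as follows. This is a minimal sketch and not the authors' calculator program: the function and constant names are assumptions made for the example, and the inner reference channel number of 240 is a hypothetical value chosen only so that the channel width falls in the range quoted in the text. The last function compares the result with the spacing implied by the scan speed and sampling rate.

```python
# Illustrative sketch of Eqs. [1] and [2]. Names are assumptions made for this
# example; the numerical constants are the ones quoted in the text.

REFERENCE_SEPARATION_CM = 1.61     # distance between reference positions (Eq. [1])
OUTER_REFERENCE_RADIUS_CM = 7.30   # radial position of the outer reference (Eq. [2])
SCAN_SPEED_CM_PER_SEC = 2.523e-2   # intermediate scanning speed, rotor frame
SAMPLE_RATE_PER_SEC = 3.71         # collection rate


def channel_width_cm(inner_reference_channel):
    """Eq. [1]: channel width from the inner reference channel number."""
    return REFERENCE_SEPARATION_CM / inner_reference_channel


def radius_cm(channel, width_cm):
    """Eq. [2]: radius of the middle of a channel (channels numbered from 1)."""
    return OUTER_REFERENCE_RADIUS_CM - (channel - 0.5) * width_cm


def nominal_spacing_cm():
    """Point spacing implied by scan speed divided by sampling rate."""
    return SCAN_SPEED_CM_PER_SEC / SAMPLE_RATE_PER_SEC


if __name__ == "__main__":
    width = channel_width_cm(240.0)  # 240 is a hypothetical channel number
    print(f"channel width from Eq. [1]: {width * 1e4:.1f} um")
    print(f"spacing from speed/rate:    {nominal_spacing_cm() * 1e4:.1f} um")
    print(f"radius of channel 180:      {radius_cm(180, width):.4f} cm")
```

Because the channel number increases as the scan moves inward from the outer reference, the computed radius decreases with increasing channel number.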

The term ½ arises from the convention that the point is located in the middle of the channel. Once the data are recorded in acceptable form, the problem is reduced to analysis, which can be carried out on- or off-line. The sequence of programs which carry out the editing, scaling, and computational steps is represented as a flow chart in Fig. 1.

NORM I is a program block which locates the inner reference position and searches for the solution meniscus. Location of the inner reference is effected by scanning the voltages in the inner reference hole toward the bottom of the cell until a preselected voltage, e.g., the mean of the air-air and dark voltages, is exceeded. The channel where this occurs is defined as $X_I$. The meniscus is found by searching the voltage values in the opposite direction beginning at a position slightly above the bottom of the cell. When a voltage is found which exceeds its immediate predecessor by a fixed increment, e.g., 60 mV, the local maximum voltage is regarded as corresponding to the peak absorbance observed at the solution meniscus, and the channel number of this point is defined as $X_A$. The increment of 60 mV which signals the presence of the local peak in the data corresponds to about 5-10 times the standard deviation of the voltages sampled in the air-air regions.

EDIT prints the table of data file number, $X_I$, and $X_A$ created by the preceding program. The operator has the option of editing this table and listing any of the raw data which may be of help in making a decision in doubtful cases. Once this task is completed, the magnification factor and the radial position of the meniscus ($r_a$) are calculated from Eqs. [1] and [2], respectively. The mean values of $\Delta r$ and $r_a$ are then made an integral part of the data sets.
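The two searches performed by NORM I can be sketched as below. This is an illustrative reading of the description and not the authors' program. Since the scanner voltages are negative, the sketch assumes the data have been converted to magnitudes (mV, increasing toward the dark level), so that "exceeds" means "is closer to the dark level"; that sign convention, the function names, and the synthetic scan are all assumptions made for the example.

```python
# Sketch of the NORM I searches on voltage magnitudes (assumption: larger
# magnitude = less light). Channels are numbered from 1 at the outer reference.

def find_inner_reference(magnitudes, threshold):
    """Scan from the end of the record (inside the inner reference hole)
    toward the cell bottom until the threshold is exceeded."""
    for index in range(len(magnitudes) - 1, -1, -1):
        if magnitudes[index] > threshold:
            return index + 1
    raise ValueError("threshold never exceeded")


def find_meniscus(magnitudes, start_channel, increment=60.0):
    """Scan from just above the cell bottom toward the top of the cell; when
    a value exceeds its predecessor by `increment`, follow the rise to the
    local maximum and return its channel number."""
    for index in range(start_channel, len(magnitudes)):
        if magnitudes[index] - magnitudes[index - 1] >= increment:
            while (index + 1 < len(magnitudes)
                   and magnitudes[index + 1] > magnitudes[index]):
                index += 1
            return index + 1
    raise ValueError("no meniscus peak found")


if __name__ == "__main__":
    # Hypothetical scan: outer hole, dark, plateau, boundary, solvent,
    # meniscus spike, air space, dark, inner hole.
    scan = ([360] * 8 + [3900] * 5 + [1000] * 20 + [900, 700, 500]
            + [400] * 15 + [700, 1200, 800] + [370] * 10 + [3900] * 5
            + [360] * 6)
    threshold = (360 + 3900) / 2  # mean of air-air and dark magnitudes
    print(find_inner_reference(scan, threshold), find_meniscus(scan, 16))
```

A fixed increment well above the noise level, as in the text, keeps ordinary fluctuations in the solvent region from being mistaken for the meniscus peak.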

FIG. 1. Flow chart showing the program sequence for the collection and processing of ultracentrifuge data from sedimentation velocity experiments. (Legible flow chart blocks: COLLECTION; MENISCUS & PLATEAU CONC.; SEDIMENTATION COEFF. CALC.; BOUNDARY SPREADING.)

At this point the raw data may be smoothed by calling the SMOOTH program. It should be emphasized that during the collection process, neither electronic damping nor digital averaging is employed. Furthermore, SMOOTH does not alter the raw data records, which thus preserve the maximum resolution of which the system is capable. The smoothing process employs a parabolic least squares fit of 11 points in succession and steps through the data point by point, replacing the middle point with the voltage predicted by the fitted curve at that point. The five points at either end of the data set, adjacent to the meniscus and cell bottom, are left undisturbed. If a point in the 11-membered subset deviates from the computed curve by more than a fixed number of standard deviations (2.5 used here), that value is replaced by the computed value, and the subset is refitted. This procedure eliminates the bias of shot noise, which is occasionally observed. Particularly noisy data, such as are obtained when the solvent contains a uv-absorbing component, e.g., 2-mercaptoethanol, are rendered more tractable by this smoothing process. Ordinarily, however, smoothing may be omitted without any loss of precision.

NORM II converts the raw data (voltages stored on magnetic tape) to absorbance using the photometric calibration data and determines the approximate plateau value ($A_p$) by sampling the absorbance near the bottom of the cell and near the meniscus. The calculator next edits the data automatically by the following procedure. The points in the boundary which correspond to one-fourth $A_p$, one-half $A_p$, and three-fourths $A_p$ are found, and the number of points ($N_1$) lying between the levels one-fourth $A_p$ and three-fourths $A_p$ is calculated. Using the assumption that the boundary is approximated by a Gaussian error function, $N_1$ is converted to the number of points ($N_2$) which comprise the middle 95% of the boundary. Fifteen points are then added to $N_2$ on the meniscus side of the boundary and 30 points on the plateau side. If the total exceeds 78 points, the scan is converted to "low resolution"; that is, only alternate points are considered, and the channel width is multiplied by a factor of 2. Whether low or high resolution is maintained, the final normalized data set consists of a 78-point window (Fig. 2), which moves with the boundary. Five additional constants specify the radial positions of the meniscus and the first point in the data set, the location in the data set of the point which is the approximate midpoint of the boundary, the time, the channel width, and a scan identification index.
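The smoothing step performed by SMOOTH, as described earlier in this section, can be sketched as follows. This is a minimal sketch under stated assumptions rather than the authors' program: each window is fitted against the unsmoothed values (the text does not say whether previously smoothed points are reused), the residual standard deviation uses n - 3 degrees of freedom, and outliers are replaced and refitted once. All names are illustrative.

```python
from math import sqrt

# 11-point window with offsets -5..5; closed-form normal equations for a
# parabola y = c0 + c1*x + c2*x^2 on a symmetric grid.
HALF = 5
OFFSETS = list(range(-HALF, HALF + 1))
S0 = len(OFFSETS)
S2 = sum(x ** 2 for x in OFFSETS)
S4 = sum(x ** 4 for x in OFFSETS)
DET = S0 * S4 - S2 * S2


def fit_parabola(window):
    """Least-squares parabola through the 11 values of `window`."""
    sy = sum(window)
    sxy = sum(x * y for x, y in zip(OFFSETS, window))
    sx2y = sum(x * x * y for x, y in zip(OFFSETS, window))
    c0 = (S4 * sy - S2 * sx2y) / DET
    c1 = sxy / S2
    c2 = (S0 * sx2y - S2 * sy) / DET
    return c0, c1, c2


def smooth_point(window, n_sd=2.5):
    """Fitted value at the window centre, with one outlier-replacement pass."""
    c0, c1, c2 = fit_parabola(window)
    fitted = [c0 + c1 * x + c2 * x * x for x in OFFSETS]
    resid = [y - f for y, f in zip(window, fitted)]
    sd = sqrt(sum(r * r for r in resid) / (S0 - 3))
    cleaned = [f if abs(r) > n_sd * sd else y
               for y, f, r in zip(window, fitted, resid)]
    if cleaned != list(window):
        c0, _, _ = fit_parabola(cleaned)  # refit after replacing outliers
    return c0


def smooth(values, n_sd=2.5):
    """Return a smoothed copy; five points at each end are left undisturbed."""
    out = list(values)
    for i in range(HALF, len(values) - HALF):
        out[i] = smooth_point(values[i - HALF:i + HALF + 1], n_sd)
    return out
```

Returning a copy mirrors the statement that the raw data records are not altered by smoothing.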
This normalized data set of 83 registers is then recorded on a separate tape. All subsequent programs in the series operate on normalized data.

Before analysis can proceed further, the absorbance at the meniscus and in the plateau region must be determined more reliably and accurately than the values found by NORM II. Experimentation with several techniques demonstrated the practicality of the following procedures. The meniscus absorbance ($A_m$) is estimated by a linearized exponential fit of the data points from the region immediately adjacent to the meniscus to a point located one-fourth of the way up the boundary. The fitted curve is then extrapolated to the meniscus position, and the predicted absorbance at $r_a$ is taken as the best estimate of the absorbance baseline for the particular scan being processed. The plateau absorbance ($A_p$) is estimated by compiling a cumulative mean and standard deviation point by point starting from the data points closest to the bottom of the cell and progressing toward the boundary. After more than 10 points have been accumulated, the mean is calculated and compared to the mean of the last 10 points included in the calculation. The next point is then added into the calculation, and the comparison is repeated after updating the cumulative mean and the mean of the 10 most recently added points. When the two means differ by more than 2.5 times the standard error of the cumulative mean (signaling that the boundary has been reached), the contribution of the last 10 points is deleted, and the mean of the points remaining is regarded as the best estimate of the plateau absorbance. A table is printed in which $A_m$ and $A_p$ are listed along with their respective standard errors and other information (e.g., number of points fitted) which might help in judging their validity. This table may be edited by the operator if inconsistencies appear or if the standard errors on some of the estimates are unacceptably large.

FIG. 2. A plot of normalized, nonsmoothed data from a single scan of a sedimentation velocity experiment on 16 S RNA. The abscissa is arbitrary and represents the sequential addresses where contiguous absorbance values are stored in core; the register number is proportional to r as explained in the text. The upper line is a schematic representation of the relative positions in the centerpiece, showing the traveling "window" (hatched) of 78 data points which constitute the normalized data set. The raw data extend from the outer reference (OR) to slightly beyond the inner reference (IR). $R_T$, $R_M$, and $R_B$ are the positions, respectively, of the top of the sector, the solution meniscus, and the bottom of the sector. The solid curve through the data points near the meniscus is the exponential curve fitted by the procedure described in the text. The line through the points in the plateau region marks the mean plateau absorbance found by the technique described.

PROBIT carries out the task of fitting the boundary with a Gaussian error function applying the formalism of "probit analysis" (15). Each scan is first normalized relative to its plateau absorbance (Eq. [3]), and the data set is condensed to those points constituting the central 94% of the boundary, the extreme points in the leading and trailing edges of the boundary being neglected.

$f = (A - A_m)/(A_p - A_m)$    [3]
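The plateau absorbance that enters Eq. [3] comes from the running-mean comparison described earlier in this section. A minimal sketch of that comparison is given below; it is an interpretation of the prose, not the authors' code, and the function name, argument names, and the fallback used when the boundary is never detected are assumptions.

```python
from math import sqrt
from statistics import mean, stdev


def plateau_absorbance(absorbances, recent=10, n_se=2.5):
    """Estimate the plateau absorbance.

    `absorbances` must be ordered from the cell bottom toward the boundary.
    Returns (estimate, number_of_points_used).
    """
    for n in range(recent + 1, len(absorbances) + 1):
        used = absorbances[:n]
        std_err = stdev(used) / sqrt(n)  # standard error of cumulative mean
        if abs(mean(used) - mean(used[-recent:])) > n_se * std_err:
            kept = used[:-recent]        # delete the last 10 points
            return mean(kept), len(kept)
    # Assumed fallback: boundary never detected, use every point.
    return mean(absorbances), len(absorbances)


if __name__ == "__main__":
    # Hypothetical scan: 30 plateau points followed by the falling boundary.
    plateau = [0.6 + 0.002 * (-1) ** i for i in range(30)]
    boundary = [0.59, 0.57, 0.54, 0.50, 0.45, 0.40, 0.34, 0.28, 0.22, 0.16]
    estimate, count = plateau_absorbance(plateau + boundary)
    print(f"plateau = {estimate:.4f} from {count} points")
```

Discarding the 10 most recent points once the two means diverge removes the values that have already begun to fall into the boundary.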

The inverse error function of f is fitted with a weighted least squares straight line, with r as the independent variable (Eq. [4]).

$\operatorname{erf}^{-1}(f) = a_0 + a_1 r$    [4]
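The linearization behind Eq. [4] can be illustrated with a short sketch. It rests on assumptions that should be stated loudly: the inverse error function is taken here to be the inverse of the standard normal cumulative distribution (so that the slope is the reciprocal of the boundary width and the fitted line crosses zero at the boundary midpoint), and the fit is unweighted for brevity, whereas the text uses a weighted fit. The normalization convention of reference (7) may differ, and the names are illustrative.

```python
from statistics import NormalDist


def probit_line(radii, fractions, central=0.94):
    """Fit u = a0 + a1*r, where u is the inverse normal CDF of f.

    Only points in the central `central` fraction of the boundary are used.
    Returns (a0, a1).
    """
    low = (1.0 - central) / 2.0
    high = 1.0 - low
    standard_normal = NormalDist()
    points = [(r, standard_normal.inv_cdf(f))
              for r, f in zip(radii, fractions) if low < f < high]
    n = len(points)
    mean_r = sum(r for r, _ in points) / n
    mean_u = sum(u for _, u in points) / n
    a1 = (sum((r - mean_r) * (u - mean_u) for r, u in points)
          / sum((r - mean_r) ** 2 for r, _ in points))
    a0 = mean_u - a1 * mean_r
    return a0, a1


def boundary_midpoint_and_width(a0, a1):
    """Under the convention assumed here: midpoint = -a0/a1, width = 1/a1."""
    return -a0 / a1, 1.0 / a1


if __name__ == "__main__":
    # Hypothetical noise-free Gaussian boundary centred at 6.5 cm.
    true_boundary = NormalDist(mu=6.5, sigma=0.05)
    radii = [6.3 + 0.0067 * i for i in range(60)]
    fractions = [true_boundary.cdf(r) for r in radii]
    print(boundary_midpoint_and_width(*probit_line(radii, fractions)))
```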

The second moment of the boundary ($\bar{r}$), the apparent boundary width ($\sigma^*$), and their respective standard errors are then calculated from the fitted straight line. So that Baldwin's (16) analysis of boundary spreading may be applied, values of $\sigma^*$ are converted to $\sigma'$, the equivalent boundary width which would be observed in a constant radial field in a rectangular cell (17). Other functions of the PROBIT block are the calculation of the absorbance at zero time from the radial dilution law and the test of linearity of the transformed data. The latter function is effected by fitting the probit data with a quadratic equation instead of Eq. [4] and performing a t test on the quadratic coefficient to test the significance of its difference from zero (7).

An additional correction to the primary data is made at this point. Since the scanner requires a finite time to scan from a point at the bottom of the cell to one near the top, the time coordinate of the moving boundary relative to the scan initiation time should be corrected for the travel time of the photomultiplier carriage; otherwise the time between the first and last scans may appear too long by as much as 0.8 min. This would lead to a systematic underestimation of the sedimentation coefficient. The corrected time is given by Eq. [5],

$t = (7.3 - r)/v + k + t'$    [5]

where $t'$ is the time at which the scan is initiated, $v$ is the velocity of the scanner in the rotor frame of reference, $k$ is a constant which equals the time required to scan the air space in the outer reference hole (about 15 sec), and $t$ is the corrected time in seconds. It should be pointed out that the finite time required to scan the boundary itself results in a slight apparent sharpening of the boundary. No attempt has been made at this time to correct analytically the latter systematic error, but we are currently studying ways in which this might be done. Cohen et al. (26) have solved this problem by averaging the absorbances for forward and reverse scans, a procedure which generates an "instantaneous" boundary corresponding to the time at which the scanner reverses its scanning direction. This procedure cannot be used with the standard Beckman scanner unless it is modified to allow data collection during the reset cycle.

A useful procedure at this stage of the analysis is to check the corrected plateau absorbances ($A_p - A_m$) against the square law of radial dilution. This is done by plotting $A_p - A_m$ for each frame against the corresponding values of $(r_a/\bar{r})^2$ and fitting the points by linear least squares. Visual inspection of this plot usually suffices to identify aberrant scans which should be deleted from the data. Since concentration in the plateau region is changing at twice the rate of the second moment position, a certain amount of scatter in the plotted points is acceptable.

Subsequent program blocks compute time-dependent and concentration-dependent hydrodynamic parameters. The weight-average sedimentation coefficient (s) is calculated by fitting the second moment positions with Eq. [6] using both weighted and unweighted linear least squares algorithms, and the effective zero time is obtained from the abscissa intercept.

$\ln(\bar{r}/r_a) = a_0 + a_1\omega^2 t$    [6]

In order to estimate the change in sedimentation coefficient with increasing time and decreasing concentration, the data are then fitted with Eq. [7]. The tangent to this curve at a time $t_i$ yields the sedimentation coefficient at $t_i$.

$\ln \bar{r} = b_0 + b_1\omega^2 t + b_2\omega^2 t^2$    [7]

Both weighted and unweighted fitting procedures are carried out, and the calculations having the lower standard errors are chosen as the best representations of the data.
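The time correction of Eq. [5] and the unweighted form of the fit in Eq. [6] can be combined in a short sketch. It is an illustration only: the names are assumptions, the default scanner speed and the 15-sec constant are the approximate values quoted in the text, and the synthetic sedimentation coefficient, meniscus radius, and zero time in the usage example are hypothetical.

```python
from math import exp, log, pi


def corrected_time(scan_start_sec, radius_cm, speed_cm_per_sec=2.523e-2,
                   hole_time_sec=15.0):
    """Eq. [5]: time at which the scanner reaches `radius_cm`."""
    return (7.3 - radius_cm) / speed_cm_per_sec + hole_time_sec + scan_start_sec


def sedimentation_fit(times_sec, positions_cm, meniscus_cm, rpm):
    """Unweighted least-squares fit of Eq. [6].

    Returns (s in sec, effective zero time in sec), the latter taken as the
    abscissa intercept of the fitted line.
    """
    omega_sq = (2.0 * pi * rpm / 60.0) ** 2
    x = [omega_sq * t for t in times_sec]
    y = [log(r / meniscus_cm) for r in positions_cm]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    a1 = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
          / sum((xi - mean_x) ** 2 for xi in x))
    a0 = mean_y - a1 * mean_x
    return a1, -a0 / (a1 * omega_sq)


if __name__ == "__main__":
    # Hypothetical boundary: s = 2.6e-13 sec, zero time 120 sec, 60,000 rpm.
    omega_sq = (2.0 * pi * 60000 / 60.0) ** 2
    times = [1200.0, 2400.0, 3600.0, 4800.0]
    positions = [6.0 * exp(2.6e-13 * omega_sq * (t - 120.0)) for t in times]
    print(sedimentation_fit(times, positions, 6.0, 60000))
```

In practice the times passed to the fit would be the corrected values from Eq. [5] rather than the scan initiation times.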
The plateau absorbance at $t_i$ is used to evaluate the concentration dependence of s and the sedimentation coefficient at zero concentration ($s^0$). This is done by fitting Eqs. [8] and [9] using unweighted linear least squares. As used in these equations, g and k are unitless but, with the appropriate absorptivity, can be converted to reciprocal concentration units.

$s = s^0[1 - g(A_p - A_m)]$    [8]

$1/s = 1/s^0 + (k/s^0)(A_p - A_m)$    [9]
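Both Eq. [8] and Eq. [9] are straight lines in the corrected plateau absorbance, so the parameters follow from the slope and intercept of an ordinary linear regression. The sketch below shows that bookkeeping; the function names are illustrative assumptions, and the numbers in the usage example are hypothetical.

```python
def linear_fit(x, y):
    """Unweighted least-squares line; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
             / sum((xi - mean_x) ** 2 for xi in x))
    return mean_y - slope * mean_x, slope


def fit_eq8(delta_absorbance, s_values):
    """Eq. [8]: s = s0 - s0*g*dA, so s0 = intercept and g = -slope/intercept."""
    intercept, slope = linear_fit(delta_absorbance, s_values)
    return intercept, -slope / intercept


def fit_eq9(delta_absorbance, s_values):
    """Eq. [9]: 1/s = 1/s0 + (k/s0)*dA, so s0 = 1/intercept, k = slope/intercept."""
    intercept, slope = linear_fit(delta_absorbance,
                                  [1.0 / s for s in s_values])
    return 1.0 / intercept, slope / intercept


if __name__ == "__main__":
    # Hypothetical data generated from Eq. [8] with s0 = 2.6 and g = 0.05.
    d_abs = [0.60, 0.55, 0.50, 0.45]
    s_obs = [2.6 * (1.0 - 0.05 * a) for a in d_abs]
    print(fit_eq8(d_abs, s_obs))
    print(fit_eq9(d_abs, s_obs))
```

The two forms agree closely when the concentration dependence is small, since Eq. [9] is the reciprocal expansion of Eq. [8] to first order.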

The final program block performs the boundary spreading analysis. The initial step is the correction of boundary distortion due to concentration-dependent sedimentation. We have used a procedure which differs from that proposed by Trautman et al. (7). Following reference (7) we fit the $(\sigma')^2$ values with a weighted least squares quadratic equation forced through the origin (Eq. [10]). Using the predicted values of $\sigma'$, the integral $\int \sigma'\,dt$ is then calculated by numerical integration under the curve using small values of $\Delta t$.

$(\sigma')^2 = p_1 t + p_2 t^2$    [10]
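A sketch of the fit of Eq. [10] and of the numerical integration of the predicted boundary width is given below. It is illustrative only: the trapezoidal rule and the default step of 1 sec are assumptions (the text says only that small values of the time step are used), and the names and test values are hypothetical.

```python
from math import ceil, sqrt


def fit_quadratic_through_origin(times, values, weights):
    """Weighted least-squares fit of Eq. [10]; returns (p1, p2)."""
    s_t2 = sum(w * t ** 2 for w, t in zip(weights, times))
    s_t3 = sum(w * t ** 3 for w, t in zip(weights, times))
    s_t4 = sum(w * t ** 4 for w, t in zip(weights, times))
    s_ty = sum(w * t * y for w, t, y in zip(weights, times, values))
    s_t2y = sum(w * t * t * y for w, t, y in zip(weights, times, values))
    det = s_t2 * s_t4 - s_t3 * s_t3
    p1 = (s_ty * s_t4 - s_t2y * s_t3) / det
    p2 = (s_t2 * s_t2y - s_t3 * s_ty) / det
    return p1, p2


def integral_of_width(p1, p2, t_end, dt=1.0):
    """Trapezoidal estimate of the integral of the predicted width, 0 to t_end."""
    steps = max(1, ceil(t_end / dt))
    h = t_end / steps

    def width(t):
        # Guard against a slightly negative radicand from a noisy fit.
        return sqrt(max(0.0, p1 * t + p2 * t * t))

    total = 0.5 * (width(0.0) + width(t_end))
    total += sum(width(i * h) for i in range(1, steps))
    return total * h


if __name__ == "__main__":
    times = [1200.0, 2400.0, 3600.0, 4800.0]
    widths_sq = [2.0e-6 * t + 1.0e-10 * t * t for t in times]
    p1, p2 = fit_quadratic_through_origin(times, widths_sq, [1.0] * 4)
    print(p1, p2, integral_of_width(p1, p2, 4800.0))
```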

This integral is then used in Eq. [25] of reference (7) to convert $\sigma'$ to the corrected equivalent boundary width ($\sigma$); i.e., the boundary width which would be observed in a rectangular cell, under a uniform radial field with zero concentration dependence. The boundary spreading is next resolved into a diffusion-dependent part with $t^{1/2}$ dependence and a sedimentation heterogeneity-dependent part with linear t dependence (18). The corrected equivalent boundary width is fitted by weighted and unweighted least squares using Eq. [11]. Since the boundary width is zero at zero time, the effective $t_0$ calculated by Eq. [6] is added as a point and the degrees of freedom are increased by one as suggested in reference (7).

$\sigma^2 = k_0 + k_1 t + k_2 t^2$    [11]

This point is assigned a weight equal to the maximum weight in the weighted least squares fit. Finally, the diffusion coefficient (D) and standard deviation of the sedimentation coefficient distribution (p) are calculated from the linear and quadratic coefficients of Eq. [11] (7).

STATISTICAL METHODS

The probit analysis is based on the transformation

$u = \operatorname{erf}^{-1}(f)$    [12]

The weighting factor for the subsequent least-squares fit of Eq. [4] used by Trautman et al. (7) is given by Eq. [13]. This factor works satisfactorily with noise-free data, but in the case of noisy data, a few points near the extremes of the sigmoid boundary which are displaced upward or downward from the baseline or plateau, respectively, are able to bias the fit so that the resulting $\sigma^*$ is obviously too large.

$w(f) = \exp[-u^2]/[(1 - f)(1 + f)]$    [13]

For example, in the plateau region a point with a lower ordinate (smaller f and u) receives greater weight than one with a higher ordinate regardless of the corresponding value of r. To offset this bias, the additional factor given by Eq. [14] is used, and the final weight is given by Eq. [15]. The rationale for using a weight based on r is that we wish to fit the region of the boundary adjacent to the inflection point.

$w(r) = \exp[-(r - r_{av})^2/2\sigma_{av}^2]$    [14]

$w = w(f)\,w(r)$    [15]

Thus the weighting factor is made proportional to $df/dr$. Reconstruction of the sigmoid boundary from the theoretical probit ordinates confirms that the use of Eq. [15] results in a better fit of the points lying near the center of the boundary. Furthermore, smaller standard errors in $\bar{r}$ and smaller residuals in the probit fit result when Eq. [15] is used instead of Eq. [13].

In all other cases, the weighting factors are proportional to the reciprocal of the variance of the dependent variable in the most fundamental equation (19). Where the fundamental equations have been linearized or otherwise mathematically transformed to facilitate curve fitting, the variances of the fitted parameters have been computed according to the error analysis given by Trautman et al. (7).

SIMULATION OF SEDIMENTING AND DIFFUSING BOUNDARIES

To test the extensive and complicated working sections of the programs it was necessary to have data available which were derived from known parameters. Synthetic data for a hypothetical homogeneous solute were constructed by expanding the test data of reference (7) to 78 position-absorbance values for each of four frames spaced 20 min apart. This was accomplished by fitting the published points with an inverse error function and interpolating absorbance values at 0.006-cm increments in radial position. These data are essentially noise-free; therefore, it was necessary to superimpose random fluctuations in absorbance in order to mimic the effect of instrumental noise. A program was developed which generates random fluctuations with a predetermined standard deviation. These fluctuations were added to the theoretical absorbance values to form data sets corresponding to 0.2 and 5.0% background noise.

In order to test calculations on heterogeneous systems, data were simulated using the methods discussed by Cox (20), modified to suit the specific requirements of the Hewlett-Packard computing system. The simulation technique is able to reproduce the Trautman et al. (7) data and is therefore considered to be an acceptable approach for heterogeneous systems as well. These latter examples were obtained either from direct simulations of associating systems or the simple additive result of two different concentration distributions (emphasizing concentration-independent sedimentation and diffusion).

RESULTS AND DISCUSSION

Although the most important nonsystematic source of experimental errors in scanner operation is undoubtedly in the absorbance measurement, the precision with which radial positions can be measured is of obvious interest. For example, the assumption that r is error-free is implicit in the statistical protocol described above. This assumption was

PROCESSING

ULTRACENTRIFUGE

489

DATA

tested in various runs with protein solutions. In the most extensive of these tests the centrifuge was operated at 60,000 rpm for 3 hr, during which 15 scans were successfully recorded and processed using NORM I. The wavelength of the light in this experiment was 280 nm. The inner reference was located at a mean position of 237.07 channels. The standard error of this mean value is 0.07 channel width, corresponding to a distance in the rotor reference frame of 4.5 µm. In the same run, the meniscus was found at a mean position of 6.06288 cm, with a standard deviation of 0.00279 cm. The standard error of this position is 7.2 µm. This represents a precision of the order of 0.01% in r, which compares favorably with the precision reported for other automated systems (5) and approaches that obtainable with a microcomparator. Since the relative error in absorbance is at least 2.5 times greater (see below), the assumption that r is relatively error-free is justified.

Table 1 lists the results calculated from synthetic and real data for homogeneous solutes. The sedimentation coefficients were obtained from unweighted linear least squares regression of ln r̄ (referred to a fixed reference radius) on ω²t. Diffusion coefficients and heterogeneity factors were calculated from weighted quadratic least squares analyses using zero width at zero time as an additional point. The tabulated errors are the standard errors (7) in the case of s, D, and p. The errors given for A⁰ (the absorbance at zero time in the ultracentrifuge cell) are the standard deviations of the absorbance fluctuations in the plateau region. Runs 99-1, 99-2, and 99-3 are synthetic data sets which have random noise superimposed at 0, 0.2, and 5.0% of the plateau concentrations, respectively. The observed standard deviations of the absorbance in the plateau regions accurately reflect the imposed levels of noise. Also listed in Table 1 are typical results for an actual homogeneous solute, α-chymotrypsinogen A. Two sets of results are listed: one for the complete data set of 19 scans and one for a set of four scans of the same run, spaced approximately 20 min apart, abstracted from the complete data set.

TABLE 1

HYDRODYNAMIC AND OPTICAL PARAMETERS OBTAINED FOR HOMOGENEOUS SOLUTES WITH SYNTHETIC AND REAL DATA

Run No.                             No. of scans   s (10^-13 sec)   D (10^-7 cm^2/sec)   pᵃ (10^-13 sec)   A⁰ (280 nm)
99-1 (synthetic, 0 noise)ᵇ          4              1.297 ± 0.003    11.22 ± 0.02         0.05 ± 0.01       1.000 ± 0.000
99-2 (synthetic, 0.2% noise)ᵇ       4              1.284 ± 0.001    11.32 ± 0.10         0.11i ± 0.02      0.997 ± 0.002
99-3 (synthetic, 5.0% noise)ᵇ       4              1.315 ± 0.037    10.78 ± 1.40         0.38 ± 0.10       0.993 ± 0.050
1412 (α-chymotrypsinogen A)ᶜ        4              2.548 ± 0.071    9.83 ± 0.79          0.15i ± 0.18      0.593 ± 0.008
                                    19             2.615 ± 0.013    8.78 ± 0.40          0.26 ± 0.03       0.602 ± 0.007

ᵃ The symbol "i" appearing in a value for p indicates that p was calculated from a negative quadratic coefficient and is therefore imaginary.
ᵇ The theoretical values for the synthetic data are s = 1.297 S, D = 11.31 F, p = 0, g = 0.20, A⁰ = 1.000, 59780 rpm. The first frame is at 60 min, and the time interval between frames is 20 min.
ᶜ The run conditions were as follows: T = 22.9°C; 60,000 rpm; solvent, 0.1 M NaCl, pH 2.5; C = 0.30 mg/ml. The first scan is at 22.34 min, and the time interval between scans is variable (3-5 min). An assumed value of g = 0.0018 was used in the boundary spreading analysis (see text for details).

The diffusion and sedimentation heterogeneity contributions to the boundary spreading may be evaluated at arbitrary times during the run. The desired time is taken as the origin of the time coordinate in Eq. [11], and the appropriate concentration at which the resulting D and p are valid is calculated from the plateau absorbance at that time. Trautman et al. (7) suggest using the mean time for diffusion as the time origin on the grounds that the calculations show minimal standard errors when this practice is followed. We have used a slightly different procedure in obtaining the boundary spreading parameters in Table 1. The time origin is the t₀ measured by extrapolation of Eq. [6]. The concentration to which the computed D and p correspond is then the concentration at zero time, i.e., the original concentration of the protein in the solution. The simulated data in Table 1 illustrate the general agreement, even in the presence of considerable noise, between the calculated parameters and their theoretical values.
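The unweighted regression used for s can be illustrated with a short sketch. It is not the calculator program described in this paper; it simply fits the slope of ln r̄ against ω²t by ordinary least squares, which gives s whatever fixed reference radius divides r̄. The function name, the starting radius of 6.0 cm, and the noise-free test positions are assumptions made for illustration; the rotor speed, frame times, and theoretical s are those given for the synthetic runs in Table 1.

```python
import math


def sedimentation_coefficient(times_sec, r_bar_cm, rpm):
    """Unweighted least-squares slope of ln(r_bar) against omega^2 * t."""
    omega = 2.0 * math.pi * rpm / 60.0           # angular velocity, rad/sec
    x = [omega ** 2 * t for t in times_sec]      # abscissa: omega^2 * t
    y = [math.log(r) for r in r_bar_cm]          # ordinate: ln(r_bar)
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    return s_xy / s_xx                           # slope = s, in sec


# Noise-free test data built from the synthetic-run settings of Table 1.
S_TRUE = 1.297e-13        # sec (theoretical s of the synthetic runs)
RPM = 59780
R_START_CM = 6.0          # hypothetical starting radius, for illustration only
OMEGA_SQ = (2.0 * math.pi * RPM / 60.0) ** 2

times_sec = [(60 + 20 * k) * 60.0 for k in range(4)]   # frames at 60, 80, 100, 120 min
r_bar_cm = [R_START_CM * math.exp(S_TRUE * OMEGA_SQ * t) for t in times_sec]

s_fit = sedimentation_coefficient(times_sec, r_bar_cm, RPM)
print(f"s = {s_fit / 1e-13:.3f} S")
```

With noise-free input the fit should return the theoretical s; adding random scatter to the positions would show how the standard error of the slope grows with noise.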
Imposition of 5.0% random noise results in a discrepancy of only 1.4% in s compared to the theoretical value. On the other hand, this level of background increases the standard error about 10-fold over the zero-noise value. The relative error in D in this example is 4.8% compared to the theoretical value. It is readily apparent that the standard error in D is greatly affected by background noise. Therefore, for data characterized by large random fluctuations in the plateau region, calculated values of D will, in general, be imprecise. The calculated magnitudes of p depart significantly from the theoretical value, which is zero for a homogeneous solute. As indicated in Table 1, it is possible in some instances to observe small imaginary values of p due to a negative quadratic coefficient in Eq. [11]. Since this occurs even in the simulated data which are Gaussian and contain only random errors, small values of p occurring in actual data may be regarded as indicative of negligible heterogeneity. Random noise appears to introduce


spurious curvature into the time-dependence of σ² without inflating the standard error of p. Apparently, at this late stage in the calculation, the error propagation analysis is inadequate for the prediction of the precision of the heterogeneity parameter.

As a test case for the application of the data collection and computational techniques, we chose α-chymotrypsinogen A, which is a well-characterized, homogeneous protein. Wilcox et al. (21) have reported the following hydrodynamic data at pH 3.0 in 0.10 ionic strength glycine buffer: s20,w = 2.49 S, D20,w = 9.01 × 10^-7 cm²/sec. These authors calculate a hydrodynamic molecular weight of 24,400 under the stated solvent conditions. The work of Schwert (22) and Dreyer et al. (23) demonstrated a small, negative concentration dependence of s at acid pH. From the latter two references, we estimate the value of g in acidic solution to be about 0.00365 liters/g, or 0.0018 in absorbance units. The data for α-chymotrypsinogen A in Table 1 closely parallel the behavior seen in the results for the simulated runs. As expected, increasing the number of scans in a given run increases the precision of the hydrodynamic parameters calculated. The noise level in the plateau region is 1.2%, which is typical at 280 nm for the conditions of this run as given in the footnote to the table. The time corresponding to two-thirds the final rotor speed agrees with the calculated t₀ to within 53 sec. Values of r̄ from the probit analysis were determined with a precision in the range 3-12 µm, as estimated by the standard error of r̄. With a channel width of 67 µm, the precision of r̄ is of the order of 1/8 of a channel width in this case. Adjustment of the 19-scan result to water at 20°C yields an s20,w of 2.49 S, in excellent agreement with the result of Wilcox et al. (21).
The molecular weight calculated from the Svedberg equation, using v̄20 = 0.721 ml/g (22) and the s and D at 22.9°C, is 26,400, which agrees quite well with the sequence molecular weight of 25,656 (24). This kind of agreement suggests very strongly the practicality of obtaining molecular weights and diffusion coefficients from a single moving boundary run in the ultracentrifuge, particularly when the solute is homogeneous and the background electronic noise is low.

In Table 2 are presented results for some heterogeneous systems and their component macromolecular species. All synthetic data for these results were simulated by the method of Cox (20) and are essentially noise-free data. Of particular interest is Run No. 33, which is a linear sum of the data for pure 3.8 S and pure 6.2 S material. The calculation yields the correct weight-average sedimentation coefficient with good precision, whereas the diffusion coefficient is in error by 25%. Nevertheless, this is fairly good agreement in the case of the diffusion coefficient, for the components in a two-species system such as this rapidly begin to be resolved to form two separate boundaries as the simulation proceeds to longer sedimentation and diffusion times. Again, the value found for p is quite reasonable in view of the behavior of the system.
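The Svedberg calculation quoted above for α-chymotrypsinogen A can be checked with a few lines of arithmetic. The sketch below uses the 19-scan values of s and D from Table 1 and the partial specific volume given in the text. The solvent density of 1.002 g/ml is an assumption (a rough figure for dilute NaCl near 23°C), since the density actually used is not stated, and the function name is hypothetical.

```python
R_GAS = 8.314e7  # gas constant, erg / (mol K)


def svedberg_molecular_weight(s_sec, d_cm2_per_sec, v_bar_ml_per_g,
                              rho_g_per_ml, temp_celsius):
    """Molecular weight from the Svedberg equation M = s R T / (D (1 - v_bar * rho))."""
    temp_kelvin = temp_celsius + 273.15
    buoyancy = 1.0 - v_bar_ml_per_g * rho_g_per_ml
    return s_sec * R_GAS * temp_kelvin / (d_cm2_per_sec * buoyancy)


# Run No. 1412, 19 scans (Table 1); rho is an assumed solvent density.
mol_weight = svedberg_molecular_weight(
    s_sec=2.615e-13,
    d_cm2_per_sec=8.78e-7,
    v_bar_ml_per_g=0.721,
    rho_g_per_ml=1.002,      # assumption, not taken from the paper
    temp_celsius=22.9,
)
print(f"M = {mol_weight:.0f}")
```

Under this density assumption the result falls close to the 26,400 reported in the text; the buoyancy term makes the answer moderately sensitive to the density chosen.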

TABLE 2

HYDRODYNAMIC PARAMETERS OBTAINED FOR HETEROGENEOUS SYSTEMS AND SYSTEM COMPONENTS

Run No.                                                  No. of scans   s (10^-13 sec)    D (10^-7 cm^2/sec)   p (10^-13 sec)
84 (synthetic, pure 3.8 S)ᵃ                              4              3.804 ± 0.002     3.88 ± 0.02          0.13 ± 0.02
24 (synthetic, pure 6.2 S)ᵇ                              4              6.200 ± 0.000     3.88 ± 0.02          0.13 ± 0.02
33 (synthetic, linear combination of No. 84 and 24)ᶜ     4              4.983 ± 0.003     2.99 ± 0.33          1.55 ± 0.05
40 (synthetic, associating system)ᵈ                      4              6.852 ± 0.30      4.44 ± 0.27          0.88 ± 0.13
99-0 (synthetic, pure 16 S)ᵉ                             4              16.042 ± 0.004    1.62 ± 0.01          0.68i ± 0.04
1345 (16 S RNA)ᶠ                                         6              17.937 ± 0.166    5.38 ± 1.51          4.45 ± 0.33

ᵃ s = 3.8 S; D = 4.0 F; p = 0; g = 0; A⁰ = 1; 60,000 rpm; Δt = 6.67 min; t₁ = 6.67 min. Δt = time interval between frames; t₁ = time elapsed between reaching two-thirds the final rotor velocity and time of initiating first scan, or (for simulated data) time between theoretical zero time and time of first "scan."
ᵇ s = 6.2 S; D = 4.0 F; p = 0; g = 0; A⁰ = 1; 60,000 rpm; Δt = 6.67 min; t₁ = 6.67 min.
ᶜ s̄w = 5.0 S; D̄w = 4.0 F; σs = 1.70 S; g = 0; A⁰ = 2; 60,000 rpm; Δt = 6.67 min; t₁ = 6.67 min.
ᵈ s (monomer) = 5.0 S; D = 4.0 F; g = 0; A⁰ = 1; 48,000 rpm (see text); Δt = 6.67 min; t₁ = 6.67 min.
ᵉ s = 16.0 S; D = 1.57 F; p = 0; g = 0; A⁰ = 1; 20,000 rpm; Δt = 5 min; t₁ = 5 min.
ᶠ 30,000 rpm; T = 18.3°C; g = -0.0035; C = 0.037 mg/ml; λ = 275 nm; Δt = 5-12 min; t₁ = 12 min.

Table 2 also contains the results for a monomer-dimer associating system characterized by an association constant (k') equal to 4.0 liters/g, where

k' = c_{A_2} / c_{A}^{2}        [16]

At zero concentration the observed s will be that of pure monomer, s_A = 5.0 S, while at infinite concentration the solute consists of pure dimer with s_{A_2} = 7.94 S.
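The weight-average sedimentation coefficient expected for this system at a given total concentration follows from Eq. [16] together with conservation of mass. The sketch below assumes weight concentrations in g/liter, an initial concentration of 1.0 g/liter (A⁰ = 1, as in footnote d of Table 2), and the monomer and dimer values of s given above; the function name is hypothetical.

```python
import math


def monomer_dimer_weight_average_s(c_total, k_assoc, s_monomer, s_dimer):
    """Weight-average s for a monomer-dimer equilibrium.

    Solves k' = c_A2 / c_A^2 with c_A + c_A2 = c_total, i.e. the quadratic
    k' * c_A^2 + c_A - c_total = 0, then weights each species by its
    weight concentration.
    """
    c_monomer = (-1.0 + math.sqrt(1.0 + 4.0 * k_assoc * c_total)) / (2.0 * k_assoc)
    c_dimer = c_total - c_monomer
    return (c_monomer * s_monomer + c_dimer * s_dimer) / c_total


# k' = 4.0 liters/g, s_A = 5.0 S, s_A2 = 7.94 S, total concentration 1.0 g/liter
s_weight_avg = monomer_dimer_weight_average_s(1.0, 4.0, 5.0, 7.94)
print(f"weight-average s = {s_weight_avg:.2f} S")
```

Lowering c_total in the call shows the value tending toward the monomer s of 5.0 S, which is the effect of radial dilution on the boundary composition.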

The composition of the boundary is not unique over the span of the simulated measurement because of the effect of radial dilution, but at the initial concentration of 1.0 mg/ml (A⁰ = 1), the weight-average sedimentation coefficient will be 6.79 S. The observed values of s, D, and p agree well with the expected values. It is noted that D is estimated with greater accuracy in this example than in the preceding, probably because there is no discontinuity in the sedimentation coefficient distribution. The results in Table 2 for 16 S RNA show a degree of precision which


is comparable to that observed for α-chymotrypsinogen. Because of the nature of the specimen, the sedimenting boundary deviates appreciably in shape from a Gaussian error function. Nevertheless, the standard error of r̄ is maintained at 9-20 µm, or about 1/3 of a channel width. This result supports the view of Trautman et al. (7) that the boundary position and velocity are relatively insensitive to deviations of the shape of the boundary from a perfect Gaussian profile. The s20,w derived from Run No. 1345 is 19.42 S, which is in very good agreement with the findings of Stanley and Bock (12) for 16 S RNA at an ionic strength of 0.428. The boundary spreading was corrected for concentration dependence of s using a value of -0.049 liters/g for g, which was estimated from data in reference (12). In absorbance units, the appropriate value is -0.0035. A short-column sedimentation equilibrium measurement (25) was performed at 6000 rpm, 20°C, λ = 280 nm on the same specimen used in Run No. 1345. The weight-average molecular weight calculated for the 16 S RNA was 575,000 (molecular weight averages in the ratio 1.00 : 1.03 : 1.12). From the s̄w measured at 18.3°C, using v̄ = 0.53 ml/g (12) and the measured weight-average molecular weight, D may be predicted from the Svedberg equation to have the value 1.65 F at 18.3°C in the solvent employed. The observed value (Table 2) is about three times larger. This result is typical of many experiments on heterogeneous samples of rRNA in which consistently high values of D are observed. The deviations from Gaussian boundary profile referred to above seem to be responsible for the apparent discrepancies. This explanation is supported by the results of synthetic boundary runs on the same RNA specimen carried out at very low angular velocities. For example, in a synthetic boundary run at 1500 rpm, 19.2°C, analyzed with the boundary spreading program, the observed diffusion coefficient (calculated from a weighted linear least squares analysis) was 2.17 ± 0.08 F.
The boundary profiles were observed to be Gaussian in such cases, indicating that the non-Gaussian shape results primarily from sedimentation heterogeneity.

Although the application of Eqs. [8] and [9] should in principle yield useful information about the concentration dependence of the sedimentation coefficient, we have found that attempts to extract this information from the radial dilution and time-dependence of s are futile when dealing with real data. Even in the case of low-noise synthetic data (Run No. 99-2, Table 1) the value of g is 0.13, which is already a deviation of 35% from the theoretical value of 0.20. The high-noise data of Run No. 99-3 give rise to a g of 1.23, which is in error by 500%. Run No. 1412, which has been seen in the discussion above to generate precise and valid estimates of s and D, yields a g of 0.58, or 1.16 liters/g. This is grossly in error when compared with published data (22,23) for this protein. It should be pointed out here that the calculation of g will be extremely sensitive to systematic experimental errors, such as a small difference in rotor temperature at the end of an experiment relative to the initial rotor


temperature. As in the case of the heterogeneity parameter p, one observes unexpectedly small standard errors on the coefficient g. This emphasizes the caveat of Trautman et al. (7) that the standard errors reflect only the internal consistency of the data and are not to be construed as measures of the validity of a particular result.

We conclude from this investigation that the relatively inexpensive instrumental array of digital voltmeter, programmable calculator, and magnetic tape cassette units is a practical vehicle for the computational strategy of Trautman et al. (7). When supplemented with efficient algorithms for estimating limiting concentrations, magnification factors, and radial position, the system is capable of measuring sedimentation rates with a high degree of accuracy and precision. In the case of homogeneous solutes, reasonably accurate and precise values of the diffusion coefficient can be readily obtained if the background noise is not higher than a few percent. For heterogeneous macromolecular solutes, reliable and accurate weight-average sedimentation coefficients can be found; however, values of D and p should be treated with caution, particularly if the boundary shape deviates appreciably from a Gaussian profile.

ACKNOWLEDGMENTS

The authors thank Linda Talbert for her expert technical assistance, Dr. Louis Meharg for his time and helpful suggestions with the electronic problems, Dr. Goldschmidt for the growth stock of E. coli Q-13 strain, and Dr. Harris Busch for providing a zonal rotor necessary for RNA preparation.

REFERENCES

1. Schachman, H. K. (1963) Biochemistry 2, 887-905.
2. Schachman, H. K., and Edelstein, S. J. (1966) Biochemistry 5, 2681-2705.
3. Spragg, S. P., Travers, S., and Saxton, T. (1965) Anal. Biochem. 12, 259-270.
4. Crepeau, R. H., Edelstein, S. J., and Rehmar, M. J. (1972) Anal. Biochem. 50, 213-233.
5. Spragg, S. P., Barnett, W. A., Wilcox, J. K., and Roche, J. (1976) Biophys. Chem. 5, 43-53.
6. Shapiro, M. B., Schultz, A. R., and Jennings, W. H. (1976) Annu. Rev. Biophys. Bioeng. 5, 177-204.
7. Trautman, R., Spragg, S. P., and Halsall, H. B. (1969) Anal. Biochem. 28, 396-415.
8. Guy, O., Gratecos, D., Rovery, M., and Desnuelle, P. (1966) Biochim. Biophys. Acta 115, 404-422.
9. Held, W. A., Mizushima, S., and Nomura, M. (1973) J. Biol. Chem. 248, 5720-5730.
10. Hindennach, I., Stoffler, G., and Wittman, H. G. (1971) Eur. J. Biochem. 23, 7-11.
11. Eickenberry, E. F., Bickle, T. A., Traut, R. R., and Price, C. A. (1970) Eur. J. Biochem. 12, 113-116.
12. Stanley, W. M., Jr., and Bock, R. M. (1965) Biochemistry 4, 1302-1311.
13. Kurland, C. G. (1960) J. Mol. Biol. 2, 83-91.
14. Handbook of Biochemistry (Sober, H. A., ed.), 2nd ed., p. K-79, Chemical Rubber Co., Cleveland, 1970.
15. Finney, D. J. (1952) Probit Analysis: A Statistical Treatment of the Sigmoid Response Curve, 2nd ed., Cambridge University Press, London.
16. Baldwin, R. L. (1957) Biochem. J. 65, 490-502.
17. Trautman, R. (1963) in Ultracentrifugal Analysis in Theory and Experiment (J. W. Williams, ed.), p. 203, Academic Press, New York.
18. Williams, J. W., Baldwin, R. L., Saunders, W. M., and Squire, P. G. (1952) J. Amer. Chem. Soc. 74, 1542-1548.
19. Brownlee, K. A. (1965) Statistical Theory and Methodology in Science and Engineering, 2nd ed., p. 97, Wiley, New York.
20. Cox, D. J. (1967) Arch. Biochem. Biophys. 119, 230-239.
21. Wilcox, P. E., Kraut, J., Wade, R. D., and Neurath, H. (1957) Biochim. Biophys. Acta 24, 72-78.
22. Schwert, G. W. (1951) J. Biol. Chem. 190, 799-806.
23. Dreyer, W. J., Wade, R. D., and Neurath, H. (1955) Arch. Biochem. Biophys. 59, 145-156.
24. Croft, L. R. (1973) Handbook of Protein Sequences, p. 9, Joynson-Bruvvers, Ltd., Oxford.
25. Van Holde, K. E., and Baldwin, R. L. (1958) J. Phys. Chem. 62, 734-743.
26. Cohen, R., Cluzel, J., Cohen, H., Male, P., Moignier, M., and Soulie, C. (1976) Biophys. Chem. 5, 77-96.
