Comput. Biol. Med. Pergamon Press 1975. Vol. 5, pp. 21-X. Printed in Great Britain

NORMALIZATION OF CHROMOSOME MEASUREMENTS: A NEW METHOD

DAN H. MOORE II
Biomedical Division, University of California, Lawrence Livermore Laboratory, Livermore, California 94550, U.S.A.

(Received 7 January 1974 and in revised form 6 August 1974)

Abstract-A new method is proposed for normalizing chromosome measurements based on the maximum likelihood principle. The method can be applied to any number of chromosomes and does not require measurements on all the chromosomes of a cell. Thus it is ideally suited for normalizing measurements from cells with partial or aberrant karyotypes. Several measures of performance for normalizers are proposed and used to evaluate the new method. Data from human metaphase chromosomes are used to illustrate the new method.

Normalization    Chromosomes    Automatic karyotyping

INTRODUCTION

Chromosome measurements such as length, area and DNA stain content vary widely from cell to cell due to differential staining and contraction of the chromosomes. Therefore, if one wishes to compare measurements on chromosomes from different individuals, it is necessary first to remove this (within-individual) cell-to-cell variation. The process of removing this variation is called “normalization”. The usual method is to divide each chromosome measurement by the total of all the measurements for the cell from which it arose. However, this method has serious limitations that have been discussed in several papers (e.g. Ledley et al. [1] and Hilditch and Rutovitz [2]); basically, the problem is that the method can be applied only to cells that contain the normal complement of 46 chromosomes. Thus, it cannot be used on measurements from abnormal cells that contain other than 46 chromosomes. Nor can it be applied to measurements from normal cells for which it was not possible to obtain usable measurements on all 46 chromosomes.

Various other methods for normalization have been proposed and reviewed by Hilditch and Rutovitz [2]. They also proposed a normalization method of their own, which they show to be superior to any of the traditional methods.

This paper reviews the problem of normalization from a mathematical-statistical viewpoint and presents a new method for normalization. This method is based on the maximum likelihood principle and minimizes the cell-to-cell variation in the measurements. In addition, several criteria are proposed for evaluating the effectiveness of normalization procedures. The method and its performance are illustrated on some human chromosome data.

NORMALIZATION METHOD

To simplify explanation of the normalization method, assume that measurements have been made on $n$ metaphase cells, each containing $m$ chromosomes. Let $X_{ij}$ denote the measurement on the $i$th chromosome ($i = 1, 2, \ldots, m$) from the $j$th metaphase cell ($j = 1, 2, \ldots, n$). Let $C_j$ represent the multiplicative normalizer which is applied to all $m$ chromosome measurements of the $j$th cell. Also, let $\mu_i$ and $\sigma_i^2$ be the true (population) mean and variance of the normalized measurements ($C_j X_{ij}$) on the $i$th chromosome. The goal of normalization is to find $C_j$ so that the differences between the normalized measurements and the true means are as small as possible. Algebraically, these differences can be written

$$d_{ij} = C_j X_{ij} - \mu_i.$$

In general, the size of the difference $d_{ij}$ increases with the size of the mean $\mu_i$, so that it is appropriate to determine $C_j$ so that the weighted sum

$$S = \sum_{i=1}^{m} (C_j X_{ij} - \mu_i)^2 / \sigma_i^2 \qquad (1)$$

is minimized. It can be shown (see Appendix) that minimizing $S$ also maximizes the log of the likelihood function when the measurements are assumed to be normally and independently distributed random variables. Thus we call this method maximum likelihood (ML) normalization. Setting the derivative of $S$ (with respect to $C_j$) equal to zero yields the solution

$$C_j = \frac{\sum_i (X_{ij}\,\mu_i/\sigma_i^2)}{\sum_i (X_{ij}/\sigma_i)^2}. \qquad (2)$$
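As a quick numerical illustration of equations (1) and (2), the following Python sketch computes the closed-form normalizer for a single cell and compares it with a brute-force search over trial values. The function names and all numbers are hypothetical; they are chosen only so that every measurement in the cell runs 25 per cent above the assumed means.

```python
import numpy as np

def weighted_sum(c, x, mu, sigma2):
    """Equation (1): weighted sum S for one cell and a trial normalizer c."""
    return float(np.sum((c * x - mu) ** 2 / sigma2))

def ml_normalizer(x, mu, sigma2):
    """Equation (2): the value of C_j that minimizes S for one cell."""
    return float(np.sum(x * mu / sigma2) / np.sum(x ** 2 / sigma2))

# Hypothetical means and variances for four chromosomes (not from the paper).
mu = np.array([4.0, 3.0, 2.0, 1.0])
sigma2 = np.array([0.04, 0.03, 0.02, 0.01])
x = 1.25 * mu  # one cell whose measurements all run 25 per cent high

c_closed = ml_normalizer(x, mu, sigma2)

# Brute-force check: evaluate S on a grid of trial normalizers.
grid = np.linspace(0.5, 1.5, 10001)
c_grid = grid[np.argmin([weighted_sum(c, x, mu, sigma2) for c in grid])]

print(c_closed, c_grid)
```

Because this cell is an exact multiple of the means, both values come out at about 0.8 (that is, 1/1.25); with real measurements the closed form and the grid search should still agree to within the grid spacing.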

In practical applications the true means $\mu_i$ and the variances $\sigma_i^2$ are unknown and must be estimated from data. This leads to the following iterative procedure for normalizing measurements (see flowchart in Fig. 1):

1. Find the sample means

$$\bar{X}_i = \sum_j X_{ij}/n$$

and the sample variances

$$S_i^2 = \sum_j (X_{ij} - \bar{X}_i)^2/(n - 1),$$

for each chromosome ($i = 1, 2, \ldots, m$).

2. Using $\bar{X}_i$ and $S_i^2$ as estimates of $\mu_i$ and $\sigma_i^2$, find $C_j$ according to equation (2).

3. Form corrected values

$$X_{ij}^{(1)} = C_j^{(1)} X_{ij},$$

where the superscript 1 refers to the results of the first iteration. Now calculate corrected means

$$\bar{X}_i^{(1)} = \sum_j X_{ij}^{(1)}/n$$

and corrected variances

$$(S_i^2)^{(1)} = \sum_j (X_{ij}^{(1)} - \bar{X}_i^{(1)})^2/(n - 1).$$
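Steps 1 to 3, repeated until the per-pass normalizers settle near 1.0, can be sketched in Python as follows. This is a minimal illustration rather than the author's program: the function name, the tolerance and iteration cap, the use of NaN to mark a missing measurement, and the synthetic data are all assumptions of the sketch.

```python
import numpy as np

def ml_normalize(X, tol=0.01, max_iter=20):
    """Iterative ML normalization of an (m chromosomes x n cells) array.

    NaN marks a missing measurement (an assumption of this sketch).
    Returns the corrected values and the overall normalizer for each cell.
    """
    X = np.asarray(X, dtype=float)
    Y = X.copy()
    for _ in range(max_iter):
        # Step 1 (and the corrected means/variances on later passes).
        mean = np.nanmean(Y, axis=1, keepdims=True)
        var = np.nanvar(Y, axis=1, ddof=1, keepdims=True)
        # Step 2: equation (2), one normalizer per cell (column).
        c = np.nansum(Y * mean / var, axis=0) / np.nansum(Y ** 2 / var, axis=0)
        # Step 3: corrected values.
        Y = Y * c
        if np.max(np.abs(c - 1.0)) < tol:
            break
    # Overall normalizer: corrected value divided by original value.
    C = np.nanmean(Y / X, axis=0)
    return Y, C

# Synthetic data: 10 chromosomes, 8 cells, 2 per cent measurement noise.
rng = np.random.default_rng(0)
mu = np.linspace(1.0, 4.0, 10)                               # hypothetical means
scale = np.array([0.7, 0.8, 0.9, 1.0, 1.0, 1.1, 1.2, 1.3])   # cell-to-cell factors
X = np.outer(mu, scale) * (1 + 0.02 * rng.standard_normal((10, 8)))
X[0, 1] = np.nan                                             # one missing value

Y, C = ml_normalize(X)
cv_before = np.nanstd(X, axis=1, ddof=1) / np.nanmean(X, axis=1)
cv_after = np.nanstd(Y, axis=1, ddof=1) / np.nanmean(Y, axis=1)
print(cv_before.max(), cv_after.max())
```

In this synthetic example the coefficient of variation of each chromosome drops from roughly the spread of the cell factors to roughly the measurement noise. The sketch does not guard against a zero sample variance, and it omits any final rescaling of the normalized values.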

Steps 2 and 3 are repeated until the corrected values $X_{ij}^{(k)}$ stabilize (i.e. $X_{ij}^{(k)}$ differ little from $X_{ij}^{(k-1)}$ or, equivalently, $C_j^{(k)}$ converge to 1.0). In practice this usually occurs after two or three iterations. Once the corrected values have stabilized, the value for the normalizer $C_j$ may be found by dividing the final corrected value $X_{ij}^{(k)}$ for any chromosome $i$ by the original, unnormalized value $X_{ij}$. Finally, all measurements are rescaled so that the sum of the autosome means is 100 units.

This procedure for normalization is similar to one proposed by Hilditch and Rutovitz [2], although theirs was derived from a different criterion. Their method was derived by minimizing the variance of the normalizer $C_j$ rather than by minimizing the variance of the normalized measurements. Also, their derivation requires an assumption that the measurements be independent. This restriction is not required in the present derivation, although independence among the measurements does ensure that the normalizer given in equation (2) maximizes the likelihood (see Appendix). Finally, since Hilditch and Rutovitz do not discuss estimation of $\mu_i$ and $\sigma_i^2$, it is not clear how to use their procedure when these parameters are unknown.

NORMALIZATION IN ABNORMAL OR INCOMPLETE CELLS

In the preceding section it was assumed that $m$ chromosomes were measured in each cell. For normal cells, $m = 46$, but in practice only the 44 autosomes are measured for the determination of $C_j$, so as to avoid problems with the differences between male and female sex chromosomes. However, measurements on any chromosome can be eliminated from the calculations to determine the normalization constants $C_j$. For example, if it is known that some (or all) cells of a certain individual contain an abnormal chromosome, measurements on this chromosome should not be used in determining $C_j$. In addition, measurements are not required on an equal number of chromosomes from each metaphase cell. The index $m$ can be replaced with $m_j$ to denote the number of measurements in the $j$th cell. For example, if a measurement on one chromosome in cell $j = 1$ is missing, then $m_1 = 43$, not 44, and normalization is still possible. Thus we see that the minimum requirements for this procedure are:

1. at least one chromosome must be identified in each cell;
2. there must be at least two measurements, among all $n$ cells, of each identified chromosome.

The first requirement is no serious limitation since in most preparations the number 1 chromosomes are easily identified by their size and shape. The second requirement is easily met by sampling cells until the minimum number of measurements is obtained on identified chromosomes. Obviously, chromosomes that cannot be identified positively can still be normalized once the constants $C_j$ are determined from measurements on the remaining (identified) chromosomes.

CORRECTIONS TO NORMALIZER

It may be necessary to apply corrections to the normalizer $C_j$ when there is evidence that the variable being normalized is affected by differential chromosomal contraction. For example, Neurath [3] and Ledley [1] report a significant regression of length on total length of all chromosomes in a cell. Regression effects may be removed by the addition or subtraction of another constant, $R_i$, whose size is determined by the regression of chromosome $i$ length on total cell length. An explanation of the method for determining the $R_i$'s can be found in either report. Fortunately, regression effects are insignificant for optical density measurements (e.g. less than 0.2 per cent of the variation in o.d. measurements on number 1 chromosomes is accounted for by regression on total cellular o.d.).
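One way to gauge whether such a correction is worth making is to compute the fraction of variation in a chromosome's measurements that is accounted for by a straight-line regression on the cell total, which is the ordinary coefficient of determination. The paper does not spell out this computation, so the Python sketch below is only an assumed reading of it; the function name and the data are hypothetical.

```python
import numpy as np

def fraction_explained(y, total):
    """Fraction of the variation in y accounted for by linear regression on total."""
    y = np.asarray(y, dtype=float)
    total = np.asarray(total, dtype=float)
    slope, intercept = np.polyfit(total, y, 1)   # least-squares straight line
    resid = y - (slope * total + intercept)
    return 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

# Hypothetical cell totals and number 1 chromosome measurements for five cells.
total = np.array([90.0, 95.0, 100.0, 105.0, 110.0])
y = np.array([4.31, 4.18, 4.25, 4.29, 4.22])
print(fraction_explained(y, total))
```

A value near zero suggests that no regression correction is needed for that variable, whereas a value near one indicates a strong dependence on the cell total.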


EVALUATION METHODS

A simple measure of the effectiveness of normalization is the within-group variance

$$S_i^2 = \sum_j (Y_{ij} - \bar{Y}_i)^2/(n_i - 1),$$

where $Y_{ij}$ is the normalized measurement on the $i$th chromosome from the $j$th cell, $\bar{Y}_i$ is the mean of the normalized measurements on the $i$th chromosome and $n_i$ is the number of measurements on the $i$th chromosome. With the constraint that $\sum_i \bar{Y}_i = 100$ for the 44 autosome means, smaller values of $S_i^2$ indicate better normalization. An overall measure is provided by the weighted mean standard deviation

$$\bar{S} = \sum_i n_i S_i \Big/ \sum_i n_i.$$

A second measure of normalization efficiency is the total log likelihood (see Appendix)

$$TLL = -\tfrac{1}{2} \sum_i \sum_j \left[ (Y_{ij} - \bar{Y}_i)^2/S_i^2 + \log S_i^2 \right].$$

This measure will, in general, increase with the total sample size $N = \sum_i n_i$. Thus a more useful means for comparing normalization in different experiments is provided by the average log likelihood per measurement, $\overline{LL} = TLL/N$. Since one of the main reasons for normalizing is to increase the ability to distinguish between chromosomes of different karyotype number, a third measure of effectiveness is the ratio

$$F = \frac{\sum_i \sum_j (\bar{Y}_i - \bar{Y})^2 / 21}{\sum_i \sum_j (Y_{ij} - \bar{Y}_i)^2 \big/ \sum_i (n_i - 1)},$$

where

$$\bar{Y} = \sum_i n_i \bar{Y}_i \Big/ \sum_i n_i.$$

Those familiar with the analysis of variance technique will recognize $F$ as the ratio used to test whether the 22 autosome means are from the same population. The test itself is not very informative since $F$ is significant even for the unnormalized measurements, due to the large differences between the large and small chromosomes. Again, the value for $F$ will increase with total sample size $N$, so that the suggested evaluation measure is the average $F$ for separability, $\bar{F} = F/N$.
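The three evaluation measures of this section can be computed together, as in the Python sketch below. It assumes the normalized measurements are supplied as one array per chromosome type, takes the log likelihood with a factor of $-\tfrac{1}{2}$ and without the constant term of expression (A3), and uses (number of groups $- 1$) as the numerator degrees of freedom so that the toy example runs with two groups; the function name and the data are hypothetical.

```python
import numpy as np

def evaluation_measures(groups):
    """Weighted mean S.D., average log likelihood and average F.

    groups: sequence of 1-D arrays, one per chromosome type, holding the
    normalized measurements Y_ij (each needs at least two values).
    """
    groups = [np.asarray(g, dtype=float) for g in groups]
    n = np.array([g.size for g in groups])                 # n_i
    means = np.array([g.mean() for g in groups])           # mean of group i
    var = np.array([g.var(ddof=1) for g in groups])        # S_i^2
    ss_within = np.array([np.sum((g - g.mean()) ** 2) for g in groups])
    N = n.sum()

    avg_sd = np.sum(n * np.sqrt(var)) / N                  # weighted mean S.D.
    tll = -0.5 * np.sum(ss_within / var + n * np.log(var)) # total log likelihood
    grand = np.sum(n * means) / N                          # grand mean
    between = np.sum(n * (means - grand) ** 2) / (len(groups) - 1)
    within = ss_within.sum() / np.sum(n - 1)
    f_ratio = between / within

    return {"avg_sd": avg_sd, "avg_ll": tll / N, "avg_f": f_ratio / N}

# Toy data: two chromosome types, three normalized measurements each.
groups = [[1.0, 1.2, 0.8], [2.0, 2.2, 1.8]]
stats = evaluation_measures(groups)
print(stats)
```

Under these assumptions a smaller weighted S.D., a larger average log likelihood and a larger average F all point to better normalization, which is how the measures are used in the comparisons that follow.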

COMPARISONS ON ACTUAL DATA

Blood samples were obtained from four males who had no known medical defects. Metaphase chromosomes were prepared, identified by banding, and restained and scanned by CYDAC (see Mendelsohn et al. [4,5]). The optical density was measured for each chromosome; these measurements are the unnormalized measurements. The measurements in each complete cell (i.e. cells with usable measurements on all 44 autosomes) are then normalized by the autosome total for that cell (the “standard” method for normalizing measurements). The results here are called autosome normalized measurements. The unnormalized data also are normalized by the maximum likelihood (ML) procedure described earlier, and given the name ML normalized measurements.

Summary statistics for the three types of measurements are shown in Table 1. First, with either method of normalization, note the large reduction in cell-to-cell variability. For these data, the average standard deviation over the four experiments is reduced by 50.4 per cent for normalization by autosome total and 51.6 per cent for normalization by the ML procedure. After normalization, the data more closely follow a normal distribution. This is shown by the increase in the average log likelihood in the table: this average increase is 49.6 per cent for autosome normalization and 49.0 per cent for ML normalization over the four experiments. Normalization also increases the ability to distinguish the chromosomes using only optical density measurements, as is shown by the large increase in the average F for separability. The average increase over the four data sets was 289 per cent for autosome normalization and 309 per cent for ML normalization. In addition, the ML procedure allows all 1177 chromosome measurements to be used, whereas for normalization by autosome totals, only 827 (70 per cent) of the sample is usable. Convergence of the ML procedure is rapid, with at most three iterations required for $C_j^{(k)}$ to be within 0.01 of 1.0.

Table 1. Summary statistics for unnormalized and normalized chromosome data

Subject              Total cells  Method      N*     Avg. S.D.  Avg. LL  Avg. F
BHM                  6            Unnorm.**   274    .119       1.75     2.52
                                  Auto.†      230    .085       2.10     4.97
                                  M.L.†       274    .082       2.11     5.33
DHM                  6            Unnorm.**   268    .173       1.37     1.17
                                  Auto.†      138    .069       2.32     7.54
                                  M.L.†       268    .072       2.25     6.84
MLM                  5            Unnorm.**   228    .188       1.33     0.93
                                  Auto.†      184    .097       2.05     3.30
                                  M.L.†       228    .096       2.04     3.44
JM                   9            Unnorm.**   407    .179       1.34     1.11
                                  Auto.†      275    .075       2.19     6.44
                                  M.L.†       407    .075       2.16     6.73
Totals and           26           Unnorm.**   1177   .165       1.44     1.42
weighted averages                 Auto.†      827    .082       2.16     5.52
                                  M.L.†       1177   .080       2.15     5.79

* N = total number of usable chromosomes.
** Unnormalized data for each individual are scaled so that the sum of the 44 autosome means is 100; this allows direct comparisons with the normalized data.
† Normalized data: by autosome total (Auto.) and by maximum likelihood (M.L.).

The summary statistics in Table 1 show that when all of the chromosomes are identifiable, the ML normalization procedure compares favorably with the usual method for normalization. To test the effectiveness of ML normalization when very few of the chromosomes are identifiable, normalization was accomplished using only information from the number 1 chromosomes. This simulates the performance of ML normalization when only the number 1 chromosomes are identifiable and used to determine the values for $C_j$. Once the $C_j$ are determined, measurements on all chromosomes can then be normalized and

summary statistics calculated. The results are shown in Table 2 for the same data as used for Table 1. As expected, the average standard deviation is increased (10.8 per cent) when normalization is based on fewer chromosomes. There are corresponding decreases in average log likelihood (5.1 per cent) and average F for separability (18.9 per cent). The last two columns of the table show the average and maximum differences in the computed values for $C_j$. On the average, the value for $C_j$ based on number 1 chromosomes alone differs

Table 2. Summary statistics for M.L. normalization

                                                                                 Differences in
                                                                                 normalization
Subject              Total cells  Autosomes used  N      Avg. S.D.  Avg. LL  Avg. F  Avg.   Max.
BHM                  6            All             274    .082       2.11     5.33    1.4%   (illegible)
                                  #1              11     .087       2.05     4.64
DHM                  6            All             268    .072       2.25     6.84    1.4%   3.4%
                                  #1              12     .083       2.11     4.94
MLM                  5            All             228    .096       2.04     3.44    1.7%   5.7%
                                  #1              10     .109       1.87     2.83
JM                   9            All             407    .075       2.16     6.73    1.5%   3.4%
                                  #1              18     .082       2.09     5.60
Totals and           26           All             1177   .080       2.15     5.79    1.5%
weighted averages                 #1              51     .089       2.04     4.69

Table 3. Mean and standard deviations for M.L. normalized optical density measurements

BHM Chrome.

Mea*

deviations for M.L. normalized measurements DHM

Mean

s. D.

Mean

4.22

.ll

4.30

1

4.31

.lO

2

4.24

s. D.

Mean

density

Pooled

JM

MIA4

s. D.

optical

s. D.

Mean

s. D

.24

4.29

.08

4. 38

.16

4.21

.lO

4.22

.13

14

3

3. 53

.14 4.25 .lO 3.46

.08

3.55

3.51

.12

3.51

.11

4

3.35

.11

3.30

.08

3.33

.11

3.34

.09

3.33

.I0

5

3.19

.I3

3. 16

.I2

3. 16

.12

3.23

.I,

3.19

.12

6

3.02

.08

3.02

.oa

2.95

.18

2.99

.lO

3.00

.11

7

2.73

.ll

2.77

2.78

.06

2.79

.09

2.77

.08

8

2.57

.ll

2. 56

2.48

.10

2.56

.10

2. 55

.10

2.37

.lO

2.34

.07

2.34

.08

2.39

.14

2.35

.08

2. 36

.09

.lO

2.33

.06

2. 34

.09 .09 .09 .07 .07 .06 .06 .05 .05

12 4.18

14

9

2.33

.08

2. 34

10

2.33

.07

2. 36

.06 .09 .05 .07

11

2.33

.08

2.33

.11

2.35

12

2. 36

.08

2.33

.07

2.31

.08

7.. 31

.lO

2.32

13

1. 86

1.88

.08

1.98

.09

1.85

.07

1.88

14

1.80

.I2 .05

1.79

1.80

.I1

1.73

.07 I. 77

15

1.68

.08

1. 72

16

1. 55

1.61

17

1.48

18

1.41

1.18

.06 .06 .03 .05 .03 . 06 .04

1.70

.09

1.71

1.59

.05

1.62

.07

1. 59

1.45

.07

1.48

.05

1.47

1.40

.06

1.43

.06

1.41

1.06

.05

1.08

1.16

.02

1.16

.06

.07

0.81

.04 0.80

0.89

.05

0. 89

1.74

19

1.07

20

1.18

.08 .09 .05 .05 .03

21

0.81

.08

0. 80

.05

0.77

22

0. 87

.04

0. 88

0.91

x

2.68

. 02

2.60

Y

0.92

.06

0.94

.06 .06 .06

1.48 1.39 1.11

04

@4

04

1.08 1. 17

2.71

.05

2.63

.07

2. 65

1.10

.08

0.92

.O5

0. 96

04 06

.05 .05 .06

from its value based on all 44 autosomes by less than 2 per cent. The changes shown in Table 2 are small and show that ML normalization is robust to changes in the number of chromosomes used in determining the normalization constants, $C_j$. Finally, to facilitate comparisons with data from other studies, Table 3 shows means and standard deviations for our optical density measurements.

SUMMARY

A new procedure for minimizing cell-to-cell variation in chromosome measurements is derived and is tested on samples from four normal male subjects. Three statistics are suggested for evaluating the effectiveness of normalization procedures and are used to show that the new procedure is superior to the usual method (dividing each measurement by the total for the cell from which it arose). The new procedure is iterative and can easily be used to normalize data from incomplete and/or abnormal cells. It can be applied to any of the measurements now being made on chromosomes such as length, area and DNA stain content.

Acknowledgements-This work was performed under the auspices of the U.S. Atomic Energy Commission and USPHS GRANT 7 R01 GM 20291.

REFERENCES

1. R. S. Ledley, H. A. Lubs and F. H. Ruddle, Introduction to automatic chromosome analysis, Comput. Biol. Med. 2, 107-128 (1972).
2. C. J. Hilditch and D. Rutovitz, Normalization of chromosome measurements, Comput. Biol. Med. 2, 167-179 (1972).
3. P. W. Neurath, B. Kess and D. A. Low, Individualized human karyotyping through quantitative analysis, Comput. Biol. Med. 2, 181-193 (1972).
4. M. L. Mendelsohn and B. H. Mayall, Chromosome identification by image analysis and quantitative cytochemistry, in Human Chromosome Methodology, J. Yunis, Ed., Academic Press, New York (1973).
5. M. L. Mendelsohn, B. H. Mayall, E. Bogart, D. H. Moore II and B. H. Perry, DNA content and DNA-based centromeric index of the 24 human chromosomes, Science 179, 1126-1129 (1973).

ABOUT THE AUTHOR

DAN HOUSTON MOORE II earned his B.A. at the University of California Santa Barbara in 1963. He received a Ph.D. in biostatistics from the Univ. of Calif. Berkeley in September 1970. From 1970 to 1972 he worked as a research associate in the Department of Radiology at the University of Pennsylvania. While there he developed a statistical test for distinguishing between pairs of homologous chromosomes. He joined the staff as a biostatistician in the Biomedical Division of the Lawrence Livermore Laboratory in September 1972, where he is continuing work on the development and application of statistical methods to chromosome research.

APPENDIX

Under the assumption that the normalized measurements $Y_{ij} = C_j X_{ij}$ are normally and independently distributed with known means $\mu_i$ and known variances $\sigma_i^2$, the likelihood for $Y_{ij}$ is given by

$$L(Y_{ij}) = (2\pi\sigma_i^2)^{-1/2} \exp\left[-\tfrac{1}{2}(Y_{ij} - \mu_i)^2/\sigma_i^2\right]. \qquad (A1)$$

If the observations are independent, the joint likelihood for all of the observations $Y_{ij}$ is given by

$$\prod_i \prod_j L(Y_{ij}). \qquad (A2)$$

The log likelihood is obtained by taking the log of expression (A2) which, upon substitution of expression (A1) into (A2), is

$$\sum_i \sum_j \left\{ -\tfrac{1}{2}(Y_{ij} - \mu_i)^2/\sigma_i^2 - \tfrac{1}{2}\log(2\pi) - \log\sigma_i \right\}. \qquad (A3)$$

Maximizing the likelihood is accomplished by minimizing the negative of expression (A3). Substitution of $C_j X_{ij}$ for $Y_{ij}$ in expression (A3) followed by differentiation with respect to $C_j$ and setting this derivative equal to zero leads to equation (2) of the text.
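The final step can be written out explicitly. Only the quadratic term of (A3) depends on $C_j$, and only through the measurements of cell $j$, so a worked version of the differentiation described above, using the same symbols and treating $\mu_i$ and $\sigma_i^2$ as known, is:

```latex
\begin{align*}
\frac{\partial}{\partial C_j}
  \left[-\frac{1}{2}\sum_i \frac{(C_j X_{ij}-\mu_i)^2}{\sigma_i^2}\right]
  &= -\sum_i \frac{X_{ij}\,(C_j X_{ij}-\mu_i)}{\sigma_i^2} = 0 \\
\Longrightarrow\quad
  C_j \sum_i \frac{X_{ij}^2}{\sigma_i^2}
  &= \sum_i \frac{X_{ij}\,\mu_i}{\sigma_i^2} \\
\Longrightarrow\quad
  C_j &= \frac{\sum_i (X_{ij}\,\mu_i/\sigma_i^2)}{\sum_i (X_{ij}/\sigma_i)^2}.
\end{align*}
```

The second derivative with respect to $C_j$ is $-\sum_i X_{ij}^2/\sigma_i^2$, which is negative, so this stationary point is a maximum of (A3) in $C_j$ and agrees with equation (2).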
