9, 120-123 (1991)

A Simple Method for Ordering Loci Using Data from Radiation Hybrids

C. T. FALK

The Lindsley F. Kimball Research Institute of The New York Blood Center, New York, New York 10021

Received July 11, 1990; revised September 21, 1990

A method is presented for ordering loci on a chromosome based on data generated from radiation hybrids. All loci are tabulated as being present, absent, or not scored in a series of clones. Correlation coefficients are calculated for all pairs of loci, indicating how often they are retained or lost together in the clones. On the assumption that a high positive correlation implies closely linked loci, a distance score, d, equal to one minus the correlation coefficient, is obtained for each locus pair, and an order is generated that minimizes the sum of the adjacent distances [the MDMAP method of Falk ("Multipoint Mapping and Linkage Analysis Based upon Affected Pedigree Members: Genetic Analysis Workshop 6," pp. 17-22, A. R. Liss, New York, 1989)]. Two sets of data, with information on 13 and 16 loci mapped to chromosome 21q, have been ordered using this method. The results are in very good agreement with other ordering methods used on the same data and with physical mapping data.

Copyright © 1991 by Academic Press, Inc. All rights of reproduction in any form reserved.

INTRODUCTION

A useful method for mapping chromosomes has been developed by Cox et al. (1990) based on earlier work of Goss and Harris (1977). The method involves exposing chromosomes to a high dose of X rays in order to break them into several fragments. These fragments can be recovered in rodent cells, and a set of such rodent-human hybrid clones can be scored for the presence or absence of specific human DNA markers. The closer two loci are on a chromosome, the more likely it is that they will be retained on the same fragment; the farther apart they are, the more likely it is that they will be on separate fragments. Thus the frequency of breakage can be used as an estimate of the distance between two markers. For a more detailed discussion of the method, see Cox et al. (1990).

Because of the nature of the procedure, inferences can be made about the distance between any two loci based on information about retention or loss of the loci in the clones. Cox et al. (1990) have developed a method for mapping loci using data generated from radiation hybrids which involves estimating a parameter $\theta$, comparable to a recombination frequency, and several retention frequencies. The method both orders the loci and provides map distances, using information not only on locus pairs but also on sets of four loci simultaneously. Here, we take a very different approach, based not on a model of the system and its resulting parameters, but on a statistical measure between pairs of loci, coupled with an ordering algorithm. Our nonparametric approach not only provides a straightforward, relatively rapid method for ordering, but also provides a way of comparing results with the more extensive method of Cox et al. After presenting the method we will illustrate its use on two sets of loci and compare the resulting maps with those of Cox et al.

METHODS

Consider N loci that have been scored as being present or absent in a set of M clones. For any given pair of loci we can construct a 2 × 2 table showing how many clones retained both, one, or neither of the loci (see Table 1). The correlation coefficient, r, for these two loci is then simply

$$ r = \pm\sqrt{\chi^2 / M}, $$

where $\chi^2$ is the standard value for a 2 × 2 table (Li, 1961). If $ad > bc$, the correlation is positive; if $ad < bc$, it is negative. Since the coretention of two loci in the same clones implies closely linked loci, we can use the correlation coefficient as a measure of how close or how distant two loci are, relative to the other loci tested. Thus a high positive correlation implies close linkage, and a value near zero implies loose (or no) linkage. Significant negative correlations would not be expected. Let $d = 1 - r$, which we will use as our measure of "distance."
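To make the calculation concrete, the following Python sketch tabulates the counts of Table 1 for each pair of loci, converts them to $r$ and $d = 1 - r$, and applies the minimum-distance criterion (MDMAP; Falk, 1989) that the Methods go on to describe. It is an illustration under stated assumptions, not the MDMAP program: the function names (`pair_counts`, `correlation_from_counts`, `distance_table`, `min_distance_map`) are hypothetical, scores are assumed to be encoded as `True` (present), `False` (absent), or `None` (not scored), pairs with a zero margin are treated as undefined, and the best order is found by exhaustive enumeration of the $N!/2$ orders. That enumeration is practical only for a handful of loci; for larger N the MDMAP program searches with simulated annealing instead.

```python
# Illustrative sketch only: names and the score encoding are hypothetical.
from itertools import permutations
from math import sqrt


def pair_counts(scores1, scores2):
    """Tabulate the 2 x 2 counts (a, b, c, d) of Table 1 for one locus pair.

    Assumed encoding: True = present, False = absent, None = not scored.
    Only clones scored for both loci are counted, so a + b + c + d = M.
    """
    a = b = c = d = 0
    for s1, s2 in zip(scores1, scores2):
        if s1 is None or s2 is None:
            continue  # clone not scored for at least one of the two loci
        if s1 and s2:
            a += 1  # both loci retained
        elif s2:
            b += 1  # locus 2 retained, locus 1 lost
        elif s1:
            c += 1  # locus 1 retained, locus 2 lost
        else:
            d += 1  # both loci lost
    return a, b, c, d


def correlation_from_counts(a, b, c, d):
    """r = +/- sqrt(chi^2 / M), positive when ad > bc."""
    m = a + b + c + d
    margins = (a + c) * (b + d) * (a + b) * (c + d)
    if margins == 0:
        # Assumption: the text does not say how such pairs are handled,
        # so r is treated as undefined here.
        raise ValueError("zero margin: r is undefined for this pair")
    chi2 = (a * d - b * c) ** 2 * m / margins
    sign = 1.0 if a * d >= b * c else -1.0
    return sign * sqrt(chi2 / m)


def distance_table(scores):
    """d = 1 - r for every pair; `scores` maps locus name -> list of scores."""
    names = sorted(scores)
    table = {}
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            r = correlation_from_counts(*pair_counts(scores[x], scores[y]))
            table[frozenset((x, y))] = 1.0 - r
    return table


def min_distance_map(loci, dist):
    """Exhaustively find the order with the smallest sum of adjacent d.

    An order and its reverse have the same total, so only orders whose
    first locus sorts before their last are scored (N!/2 candidates).
    """
    best_order, best_total = None, float("inf")
    for order in permutations(loci):
        if order[0] > order[-1]:
            continue  # mirror image of an order already considered
        total = sum(dist[frozenset(pair)] for pair in zip(order, order[1:]))
        if total < best_total:
            best_order, best_total = order, total
    return best_order, best_total


if __name__ == "__main__":
    # Three-locus example from the text: r_AB = 0.9, r_AC = 0.8, r_BC = 0.2.
    r = {("A", "B"): 0.9, ("A", "C"): 0.8, ("B", "C"): 0.2}
    dist = {frozenset(pair): 1.0 - value for pair, value in r.items()}
    order, total = min_distance_map(["A", "B", "C"], dist)
    print(order, round(total, 3))  # ('B', 'A', 'C') 0.3
```

Run as a script, the example reproduces the three-locus case worked in the text, selecting B-A-C with a total distance of 0.3. Because unscored clones are dropped pair by pair, M can differ from one pair to the next, as it does in the chromosome 21 data.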

TABLE 1

Two by Two Table Showing Retention or Loss of Loci in M Clones Produced by Radiation Hybrid Methods

               Locus 1 +    Locus 1 −
Locus 2 +          a            b
Locus 2 −          c            d

Note. a, b, c, and d represent the number of clones that fall into the four classes shown in the table, where $a + b + c + d = M$, and $\chi^2 = (ad - bc)^2 M / [(a + c)(b + d)(a + b)(c + d)]$.

Falk (1989) outlines a preliminary ordering scheme that can then be used to order the loci based on the pairwise d values. The scheme is based on the assumption that, given the true order of a set of, say, three loci, the distance between the two flanking loci will be greater than the distance between either pair of adjacent loci. If the true order of three loci were A-B-C, then we would expect $d_{AC} > d_{AB}$ and $d_{AC} > d_{BC}$. Therefore the most likely order of the three loci would be the one that minimizes the sum of the distances for the two adjacent intervals. Similar arguments can be used to show that a reasonable preliminary ordering for any number of loci can be obtained by looking for a "minimum distance map," MDMAP (Falk, 1989). This requires finding, from among the $N!/2$ possible orderings of N loci, the map with the shortest total distance, defined as the sum of all $N - 1$ adjacent distances. For example, if we have three loci, A, B, and C, with r values $r_{AB} = 0.9$, $r_{AC} = 0.8$, and $r_{BC} = 0.2$, the sums of adjacent d values for the three orders would be

Order    Total distance
A-B-C    0.1 + 0.8 = 0.9
A-C-B    0.2 + 0.8 = 1.0
B-A-C    0.1 + 0.2 = 0.3

The best order would then be B-A-C, represented by the minimum total distance, d = 0.3. The same reasoning can be used for a set of N loci, with the shortest total distance giving the most likely map order. For moderate values of N the ordering can be done "by hand," but for large N the number of distinct orders to be considered becomes very large. In such cases a computer algorithm such as the one proposed by Falk (1989), which makes use of simulated annealing (Kirkpatrick et al., 1983), can be used to search for the minimum distance map.

EXAMPLE

Two sets of loci on the long arm of chromosome 21 were studied using radiation hybrid techniques: a proximal set of 14 loci (Cox et al., 1990) and a distal set of 19 loci (Burmeister et al., 1991). In these experiments, 103 independent somatic cell hybrid clones were scored for the presence of each locus. Not all hybrids were scored for all markers; the number of clones scored ranged from 65 to 96 per marker in the first set and from 51 to 92 per marker in the second. The number of clones scored for a given marker pair ranged from 53 to 95 in the first set and from 22 to 82 in the second. Of the loci scored, 13 in the first set and 16 in the second were separated by X-ray breakage and thus were informative for ordering. Using the raw data, indicating the presence or absence of each locus in the hybrid cells (if scored), we calculated pairwise correlation coefficients for the separable loci in each data set and generated ordered maps as described above, using the computer program MDMAP (Falk, 1989) to search for the best order. The results are shown in Tables 2 and 3. The end result appears to be quite stable, in that the same final order is attained from several arbitrary starting orders. In addition to the orders and distances produced by the correlation method (CORMAP), the orders given by Cox et al. and Burmeister et al. are shown. CORMAP predicts the same order as Burmeister et al. for the 16 distal loci (Table 3) and agrees with Cox et al. on the proximal set (Table 2) except for one nearest-neighbor pair (APP, S8), where the two methods give opposite orders. Cox et al. have estimated the odds in favor of their order over the reverse to be only 43:1. The Cox order is ranked second by CORMAP, with a total distance of 2.852, compared with 2.784 for the CORMAP order. Thus CORMAP appears to be a very good predictor of order, given information generated by radiation hybrid experiments.

TABLE 2

Comparison of the Ordering of 13 Loci in the Proximal Portion of Chromosome 21q

Source    Total distance    Order of loci
COX       2.852             S16 S48 S46 S4 S52 ... (S8 APP) ...
CORMAP    2.784             S16 S48 S46 S4 S52 ... (APP S8) ...

Note. COX represents the order presented in Cox et al. (3); CORMAP represents the order obtained using the method presented here. Loci in parentheses indicate those loci for which the COX order and the CORMAP order differ.

TABLE 3

Comparison of the Ordering of 16 Loci in the Distal Portion of Chromosome 21q

Source    Total distance    Order of loci
BUR       2.351             ...
CORMAP    2.351             ...

Note. BUR represents the order presented in Burmeister et al. (2); CORMAP represents the order obtained using the method presented here.

DISCUSSION

Reliably ordering a set of loci on a chromosome becomes increasingly difficult as the number of loci grows. This is true whether one is using family data for multilocus mapping by computer, pairwise lod score results, or data generated from a totally different approach, such as radiation hybrid mapping. It is important to obtain both the correct order of loci and the relative distances between them. Sometimes both can be done in a single step, for example when using a computer program such as CRIMAP (Lander and Green, 1987) or MAPMAKER (Lander et al., 1987) to analyze family data. However, for more than a moderate number of loci the computational overhead becomes very high, and in practice loci are often ordered in smaller sets and then combined into one complete map (see, e.g., Warren et al., 1989). Another approach is to first make a reasonable attempt to order the entire set of loci based on, say, pairwise information and then refine both the order and the relative positions by using more statistically precise methods and/or information on several loci simultaneously. Several approaches to initial ordering have been suggested, including those by Buetow and Chakravarti (1987), Weeks and Lange (1987), and Falk (1989). These methods are based on information about all locus pairs generated by classical family linkage analysis, but each has a different algorithm for ordering.

In radiation hybrid mapping the data generated are quite different from those in family studies, but a similar principle holds: the closer two loci are on a chromosome, the less likely it is that there will be a break between them. In radiation hybrids, the break is caused by exposing human chromosomes to radiation. If the loci are closely linked, the chance of a break between them is low, and thus the probability of seeing them retained or lost together in subsequently scored clones is high. Cox et al. (1990) have modeled the procedure with a parameter $\theta$ that is analogous to the recombination fraction and several parameters representing retention frequencies of single loci or sets of loci. Using estimates of these parameters they are able to predict the most likely order of the loci as well as the relative distances between loci. Using these parameters they have also devised a method to compare the relative odds of different orders, by flipping small sets of loci from the original order in a systematic fashion.

The intent here is to present an alternative, simple, model-free method of determining a likely order for the loci, based on the premise that the correlation coefficient, measuring the presence or absence of pairs of loci in all of the clones, will be high for closely linked loci and will decrease for loci that are more distant. A simple measure of distance between two loci can then be expressed as one minus the correlation. Once this distance measure is defined, an ordering algorithm is applied to the distances. Here the minimum distance map (MDMAP) method (Falk, 1989) has been used, defining as the best preliminary order the one that minimizes the sum of the adjacent pairwise distances. This algorithm has also been used with classical family data with good results (Olson and Boehnke, 1990; Falk, unpublished) and has been tested using as the distance measure the $\theta$ values generated by Cox et al. on radiation hybrid data. In the latter case, the MDMAP algorithm chooses precisely the same order for both chromosome 21 data sets as that obtained using the correlation coefficients. This is not unexpected, as the $\theta$ values and the distances estimated by CORMAP are not independent measures. The CORMAP method thus appears to provide a simple alternative way to order loci from radiation hybrid data. As currently designed, CORMAP provides a preliminary ordering procedure but has not yet been extended to estimate the odds of one particular order over another or to estimate genetic distances. Its strength is that it does not require modeling a breakage parameter and the underlying retention frequencies, but instead uses a nonparametric approach. It therefore gives a good starting point for fine tuning order and distances, as well as an alternative method for comparison with other ordering techniques.

ACKNOWLEDGMENTS

This work is supported by Grant GM29177 from the National Institutes of Health. I thank David R. Cox for many helpful and stimulating discussions and for generously sharing his data with me prior to publication.

REFERENCES

1. BUETOW, K. H., AND CHAKRAVARTI, A. (1987). Multipoint gene mapping using seriation. I. General methods. Amer. J. Hum. Genet. 41: 180-188.

2. BURMEISTER, M., KIM, S., PRICE, E. R., DE LANGE, T., TANTRAVAHI, U., MYERS, R. M., AND COX, D. R. (1991). A map of the distal region of the long arm of human chromosome 21 constructed by radiation hybrid mapping and pulsed-field gel electrophoresis. Genomics 9: 19-30.

3. COX, D. R., BURMEISTER, M., PRICE, E. R., KIM, S., AND MYERS, R. M. (1990). Radiation hybrid mapping: A somatic cell genetic method for constructing high resolution maps of mammalian chromosomes. Science 250: 245-250.

4. FALK, C. T. (1989). A simple scheme for preliminary ordering of multiple loci: Application to 45 CF families. In "Multipoint Mapping and Linkage Analysis Based upon Affected Pedigree Members: Genetic Analysis Workshop 6" (R. C. Elston, M. A. Spence, S. E. Hodge, and J. W. MacCluer, Eds.), pp. 17-22. A. R. Liss, New York.

5. GOSS, S. J., AND HARRIS, H. (1977). Gene transfer by means of cell fusion: II. The mapping of 8 loci on human chromosome 1 by statistical analysis of gene assortment in somatic cell hybrids. J. Cell Sci. 25: 39-57.

6. KIRKPATRICK, S., GELATT, C. D., AND VECCHI, M. P. (1983). Optimization by simulated annealing. Science 220: 671-680.

7. LANDER, E., AND GREEN, P. (1987). Construction of multilocus genetic linkage maps in humans. Proc. Natl. Acad. Sci. USA 84: 2363-2367.

8. LANDER, E., GREEN, P., ABRAHAMSON, J., BARLOW, A., DALY, M., LINCOLN, S., AND NEWBURG, L. (1987). MAPMAKER: An interactive computer package for constructing primary genetic linkage maps of experimental and natural populations. Genomics 1: 174-181.

9. LI, C. C. (1961). "Human Genetics: Principles and Methods," pp. 8?-83. McGraw-Hill, New York.

10. OLSON, J. M., AND BOEHNKE, M. (1990). Monte Carlo comparison of preliminary methods for ordering multiple genetic loci. Amer. J. Hum. Genet. 47: 470-482.

11. WARREN, A. C., SLAUGENHAUPT, S. A., LEWIS, J. G., CHAKRAVARTI, A., AND ANTONARAKIS, S. E. (1989). A genetic linkage map of 17 markers on human chromosome 21. Genomics 4: 579-591.

12. WEEKS, D., AND LANGE, K. (1987). Preliminary ranking procedures for multilocus ordering. Genomics 1: 236-242.