GENOMICS 9, 120-123 (1991)

A Simple Method for Ordering Loci Using Data from Radiation Hybrids

C. T. FALK

The Lindsley F. Kimball Research Institute of The New York Blood Center, New York, New York 10021

Received July 11, 1990; revised September 21, 1990

A method is presented for ordering loci on a chromosome based on data generated from radiation hybrids. All loci are tabulated as being present, absent, or not scored in a series of clones. Correlation coefficients are calculated for all pairs of loci indicating how often they are retained or lost together in the clones. On the assumption that a high positive correlation implies closely linked loci, a distance score, d, equal to one minus the correlation coefficient, is obtained for each locus pair and an order is generated that minimizes the sum of the adjacent distances [the MDMAP method of Falk (“Multipoint Mapping and Linkage Analysis Based upon Affected Pedigree Members: Genetic Analysis Workshop 6,” pp. 17-22, A. R. Liss, New York, 1989)]. Two sets of data, with information on 13 and 16 loci mapped to chromosome 21q, have been ordered using this method. The results are in very good agreement with other ordering methods used on the same data and with physical mapping data. © 1991 Academic Press, Inc.

0888-7543/91 $3.00
Copyright © 1991 by Academic Press, Inc.
All rights of reproduction in any form reserved.

INTRODUCTION

A useful method for mapping chromosomes has been developed by Cox et al. (1990) based on earlier work of Goss and Harris (1977). The method involves exposing chromosomes to a high dose of X rays in order to break the chromosomes into several fragments. These fragments can be recovered in rodent cells and a set of such rodent-human hybrid clones scored for the presence or absence of specific human DNA markers. The closer two loci are on a chromosome, the more likely it is that they will be retained on the same fragment. The farther apart, the more likely it is that they will be on separate fragments. Thus the frequency of breakage can be used as an estimate of distance between two markers. For a more detailed discussion of the method, see Cox et al. (1990).

Because of the nature of the procedure, inferences can be made about the distance between any two loci based on information about retention or loss of the loci in the clones. Cox et al. (1990) have developed a method for mapping loci using data generated from radiation hybrids which involves estimating a parameter θ, comparable to a recombination frequency, and several retention frequencies. The method both orders the loci and provides map distances using not only information on locus pairs, but also on sets of four loci simultaneously. Here, we take a very different approach, based not on a model of the system and its resulting parameters, but on a statistical measure between pairs of loci, coupled with an ordering algorithm. Our nonparametric approach not only provides a straightforward, relatively rapid method for ordering, but also provides a way of comparing results with the more extensive method of Cox et al. After presenting the method we will illustrate its use on two sets of loci and compare the resulting maps with those of Cox et al.

METHODS

Consider N loci that have been scored as being present or absent in a set of M clones. For any given pair of loci we can construct a 2 × 2 table showing how many clones retained both, one, or neither of the loci (see Table 1). We can calculate the correlation coefficient, r, for these two loci, which is simply

$$ r = \pm\sqrt{\chi^{2}/M}, $$

where χ² is the standard value for a 2 × 2 table (Li, 1961). If ad > bc, the correlation is positive; if ad < bc, it is negative. Since the coretention of two loci in the same clones implies closely linked loci, we can use the correlation coefficient as a measure of how close or how distant two loci are, relative to the other loci tested. Thus a high positive correlation implies close linkage; a value near zero implies loose (or no) linkage. Significant negative correlations would not be expected. Let d = 1 - r, which we will use as our measure of “distance.”
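As a rough illustration of the distance score just defined, the following Python sketch computes r and d = 1 - r for one pair of loci from their presence/absence scores, skipping clones in which either locus was not scored. It assumes the usual 2 × 2 layout (a = both loci retained, b and c = only one retained, d = neither retained) and the standard chi-square for such a table, χ² = (ad - bc)²M/[(a + b)(c + d)(a + c)(b + d)]. It also includes an exhaustive search for the order that minimizes the sum of adjacent distances, which is the criterion stated in the abstract. The exhaustive search is a stand-in chosen here for brevity and is not the MDMAP program; all function names and the encoding of scores (1, 0, None) are hypothetical.

```python
from itertools import permutations
from math import sqrt


def phi_correlation(a, b, c, d):
    """Return r = +/- sqrt(chi2 / M) for a 2 x 2 table with cells a, b, c, d."""
    m = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A locus retained in all clones or in none carries no information.
        raise ValueError("a row or column total of the 2 x 2 table is zero")
    chi2 = (a * d - b * c) ** 2 * m / denom
    r = sqrt(chi2 / m)
    # The sign follows ad versus bc, as in the text.
    return r if a * d >= b * c else -r


def pair_distance(scores_1, scores_2):
    """Return d = 1 - r using only clones scored for both loci.

    Assumed encoding: 1 = present, 0 = absent, None = not scored.
    """
    a = b = c = d = 0
    for s1, s2 in zip(scores_1, scores_2):
        if s1 is None or s2 is None:
            continue
        if s1 and s2:
            a += 1
        elif s1:
            b += 1
        elif s2:
            c += 1
        else:
            d += 1
    return 1.0 - phi_correlation(a, b, c, d)


def minimum_distance_order(loci, dist):
    """Exhaustively find the order with the smallest sum of adjacent distances.

    `dist` maps frozenset({locus_1, locus_2}) to the pairwise distance d.
    Only practical for a small number of loci.
    """
    best_order, best_total = None, float("inf")
    for order in permutations(loci):
        if order[0] > order[-1]:
            continue  # an order and its mirror image are the same map
        total = sum(dist[frozenset(pair)] for pair in zip(order, order[1:]))
        if total < best_total:
            best_order, best_total = order, total
    return best_order, best_total


if __name__ == "__main__":
    # Illustrative correlations for three loci A, B, and C.
    r_values = {("A", "B"): 0.9, ("A", "C"): 0.8, ("B", "C"): 0.2}
    distances = {frozenset(k): 1.0 - v for k, v in r_values.items()}
    print(minimum_distance_order(["A", "B", "C"], distances))
```

Keying the distances by unordered pairs keeps the lookup symmetric, so the same table serves whichever direction an order is read in. For more than about ten loci the number of orders grows too quickly for enumeration and a heuristic search would be needed instead.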
TABLE 1

Two by Two Table Showing Retention or Loss of Loci in M Clones Produced by Radiation Hybrid Methods

             Locus 2 +    Locus 2 -
Locus 1 +        a            b
Locus 1 -        c            d

Note. a, b, c, and d represent the number of clones that fall into the four classes shown in the table, where a + b + c + d = M. χ² = (ad - bc)²M/[(a + c)(b + d)(a + b)(c + d)].

Falk (1989) outlines a preliminary ordering scheme that can then be used to order the loci based on the pairwise d values. The scheme is based on the assumption that, given the true order of a set of, say, three loci, the distance between the two flanking loci will be greater than the distance between the corresponding pairs of adjacent loci. If the true order of three loci were A-B-C, then we would expect d_AC > d_AB and d_AC > d_BC. Therefore the most likely order of the three loci would be the one that minimizes the sum of the distances for the two adjacent intervals. Similar arguments can be used to show that a reasonable preliminary ordering for any number of loci can be obtained by looking for a “minimum distance map,” MDMAP (Falk, 1989). This requires looking for the map with the shortest total distance, defined by the sum of all N - 1 adjacent distances, from among the N!/2 possible orderings of N loci. For example, if we have three loci, A, B, and C, with r values r_AB = 0.9, r_AC = 0.8, and r_BC = 0.2, the sum of adjacent d values for the three orders would be

Order     Total distance
A-B-C     0.1 + 0.8 = 0.9
A-C-B     0.2 + 0.8 = 1.0
B-A-C     0.1 + 0.2 = 0.3

The best order would then be B-A-C, represented by the minimum total distance, d = 0.3. The same reasoning can be used for a set of N loci, with the shortest total distance giving the most likely map order. For moderate values of N the ordering can be done “by hand,” but for large N, the number of distinct orders to be considered becomes quite large. In such cases a computer algorithm such as the one proposed by Falk (1989), which makes use of simulated annealing (Kirkpatrick et al., 1983), can be used to “search” for the minimum distance map.

EXAMPLE

Two sets of loci on the long arm of chromosome 21 were studied using radiation hybrid techniques, a proximal set of 14 loci (Cox et al., 1990) and a distal set of 19 loci (Burmeister et al., 1990). In these experiments, 103 independent somatic cell hybrid clones were scored for the presence of each locus. Not all hybrids were scored for all markers, the number of clones scored ranging from 65 to 96 per marker in the first set and 51 to 92 per marker in the second. The number of clones scored for a given marker pair ranged from 53 to 95 in the first set and from 22 to 82 in the second. Of the loci scored, 13 in the first set and 16 in the second were separated by X-ray breakage and thus were informative for ordering.

Using the raw data, indicating the presence or the absence of each locus in the hybrid cells, if scored, we calculated pairwise correlation coefficients for the separable loci in each data set and generated ordered maps as described above, using the computer program MDMAP (Falk, 1989) to search for the best order. The results are shown in Tables 2 and 3. The end result appears to be quite stable in that the same final order is attained from several arbitrary starting orders. In addition to the orders and distances produced by the correlation method (CORMAP), the orders given by Cox et al. and Burmeister et al. are shown. CORMAP predicts the same order as Burmeister et al. for the 16 distal loci (Table 3) and agrees with Cox et al. on the proximal set (Table 2) except for one nearest-neighbor pair (APP, S8) where the two methods give opposite orders. Cox et al. have estimated the odds in favor of their order over the reverse to be only 43:1. The Cox order is ranked second by CORMAP, with a distance of 2.852. Thus CORMAP appears to be a very good predictor of order, given information generated by radiation hybrid experiments.

DISCUSSION

The ability to reliably order a set of loci on a chromosome becomes increasingly difficult as the number of loci grows. This is true whether one is using family data for multilocus mapping by computer, pairwise lod score results, or data generated from a totally different approach, such as that produced by radiation hybrid mapping. It is important to obtain both the correct order of loci and the relative distances between them. Sometimes these two can be done in a single step, for example when using a computer program such as CRIMAP (Lander and Green, 1987) or MAPMAKER (Lander et al., 1987) to analyze family data. However, for more than a moderate number of loci the computer overhead becomes very high and in practice loci are often ordered in smaller sets and then combined into one complete map (see, e.g., Warren et al., 1989). Another approach is to first make a reasonable attempt to order the entire set of loci based on, say, pairwise information and then refine both the
order and the relative positions by using more statistically precise methods and/or information on several loci simultaneously. Several approaches to initial ordering have been suggested, including those by Buetow and Chakravarti (1987), Weeks and Lange (1987), and Falk (1989). These methods are based on information about all locus pairs generated by classical family linkage analysis, but each has a different algorithm for ordering.

In radiation hybrid mapping the data generated are quite different from those in family studies, but a similar principle holds. The closer two loci are on a chromosome, the less likely it is that there will be a break between them. In radiation hybrids, the break is caused by exposing human chromosomes to radiation. If the loci are closely linked, the chance of a break between them is low and thus the probability of seeing them retained or lost together in subsequently scored clones is high. Cox et al. (1990) have modeled the procedure based on a parameter θ that is analogous to the recombination fraction, and several parameters representing retention frequencies of single loci or sets of loci. Using estimates of these parameters they are able to predict the most likely order of the loci as well as the relative distances between loci. Using these parameters they have also devised a method to compare relative odds of different orders, by flipping small sets of loci from the original order in a systematic fashion.

It is the intent here to present an alternate, simple, model-free method of determining a likely order for the loci, based on the logical premise that the correlation coefficient, measuring absence or presence of pairs of loci in all of the clones, will be high for closely linked loci and will decrease for loci that are more distant. A simple measure of distance between two loci can then be expressed as one minus the correlation. After defining the measure of distance, an ordering algorithm is then used based on those distances. Here the minimum distance map (MDMAP) method (Falk, 1989) has been used, defining as the best preliminary order the one that minimizes the sum of the adjacent pairwise distances. This algorithm has also been used with classical family data with good results (Olson and Boehnke, 1990; Falk, unpublished) and has been tested using as the distance measure the θ values generated by Cox et al. on radiation hybrid data. In the latter case, the MDMAP algorithm chooses precisely the same order of both chromosome 21 data sets as that obtained using the correlation coefficients. This is not unexpected, as the θ values and distances estimated by CORMAP are not independent measures.

The CORMAP method thus appears to provide a simple alternative way to order loci from radiation hybrid data. As currently designed CORMAP provides a preliminary ordering procedure but has not yet been extended to estimate odds of one particular order over another or to estimate genetic distances. Its strength is that it does not require the modeling of a breakage parameter and the underlying retention frequencies, but utilizes instead a nonparametric approach. It does, therefore, give a good starting point for fine tuning order and distances as well as an alternative method for comparison with other ordering techniques.

TABLE 2

Comparison of the Ordering of 13 Loci in the Proximal Portion of Chromosome 21q

Source    Dist     Order of loci
COX       2.852    S16 S48 S46 S4 S52 S11 S1 S18 (S8 APP) S12 S47 SOD
CORMAP    2.784    S16 S48 S46 S4 S52 S11 S1 S18 (APP S8) S12 S47 SOD

Note. COX represents the order presented in Cox et al. (3); CORMAP represents the order obtained using the method presented here. Locus names in parentheses indicate those loci for which the COX order and the CORMAP order differ.

TABLE 3

Comparison of the Ordering of 16 Loci in the Distal Portion of Chromosome 21q

Source    Dist     Order of loci
BUR       2.351    SOD S58 S17 S55 S3 S101 S39 S40 S49 S141 CD18 S26 S44 COL2 COL1 S100
CORMAP    2.351    SOD S58 S17 S55 S3 S101 S39 S40 S49 S141 CD18 S26 S44 COL2 COL1 S100

Note. BUR represents the order in Burmeister et al. (2); CORMAP represents the order obtained using the method presented here.

ACKNOWLEDGMENTS

This work is supported by Grant GM29177 from the National Institutes of Health. I thank David R. Cox for many helpful and stimulating discussions and for generously sharing his data with me prior to publication.
REFERENCES

1. BUETOW, K. H., AND CHAKRAVARTI, A. (1987). Multipoint gene mapping using seriation. I. General methods. Amer. J. Hum. Genet. 41: 180-188.
2. BURMEISTER, M., KIM, S., PRICE, E. R., DE LANGE, T., TANTRAVAHI, U., MYERS, R. M., AND COX, D. R. (1991). A map of the distal region of the long arm of human chromosome 21 constructed by radiation hybrid mapping and pulsed-field gel electrophoresis. Genomics 9: 19-30.
3. COX, D. R., BURMEISTER, M., PRICE, E. R., KIM, S., AND MYERS, R. M. (1990). Radiation hybrid mapping: A somatic cell genetic method for constructing high resolution maps of mammalian chromosomes. Science 250: 245-250.
4. FALK, C. T. (1989). A simple scheme for preliminary ordering of multiple loci: Application to 45 CF families. In “Multipoint Mapping and Linkage Analysis Based upon Affected Pedigree Members: Genetic Analysis Workshop 6” (R. C. Elston, M. A. Spence, S. E. Hodge, and J. W. MacCluer, Eds.), pp. 17-22, A. R. Liss, New York.
5. GOSS, S. J., AND HARRIS, H. (1977). Gene transfer by means of cell fusion: II. The mapping of 8 loci on human chromosome 1 by statistical analysis of gene assortment in somatic cell hybrids. J. Cell Sci. 25: 39-57.
6. KIRKPATRICK, S., GELATT, C. D., AND VECCHI, M. P. (1983). Optimization by simulated annealing. Science 220: 671-680.
7. LANDER, E., AND GREEN, P. (1987). Construction of multilocus genetic linkage maps in humans. Proc. Natl. Acad. Sci. USA 84: 2363-2367.
8. LANDER, E., GREEN, P., ABRAHAMSON, J., BARLOW, A., DALY, M., LINCOLN, S., AND NEWBURG, L. (1987). MAPMAKER: An interactive computer package for constructing primary genetic linkage maps of experimental and natural populations. Genomics 1: 174-181.
9. LI, C. C. (1961). “Human Genetics: Principles and Methods,” pp. 8%83, McGraw-Hill, New York.
10. OLSON, J. M., AND BOEHNKE, M. (1990). Monte Carlo comparison of preliminary methods for ordering multiple genetic loci. Amer. J. Hum. Genet. 47: 470-482.
11. WARREN, A. C., SLAUGENHAUPT, S. A., LEWIS, J. G., CHAKRAVARTI, A., AND ANTONARAKIS, S. E. (1989). A genetic linkage map of 17 markers on human chromosome 21. Genomics 4: 579-591.
12. WEEKS, D., AND LANGE, K. (1987). Preliminary ranking procedures for multilocus ordering. Genomics 1: 236-242.