Am. J. Hum. Genet. 46:1156-1157, 1990

Genetic Loci Ordering Instability: An Example

H. Falk* and C. T. Falk†

*Department of Physics, City College of the City University of New York, and †Lindsley F. Kimball Research Institute, New York Blood Center, New York

Summary

In attempting to establish the order of genetic loci by constructing a map from pairwise linkage data, one assumes that the loci satisfy a linear-order relation. If the data utilized in the construction are not consistent with the linear-order assumption, then a very small change in the data may lead to a large qualitative change in the map. An example of such an instability is presented in this paper.

Introduction

Various methods have been proposed for the ordering of multiple linked loci. Several of the methods utilize information based on pairwise linkage data. In these methods (e.g., see Buetow and Chakravarti 1987; Weeks and Lange 1987; Wilson 1988; Falk 1989) the assumption is made that loci satisfy a linear-order relation (Simmons 1963), as do the real numbers. The methods present a procedure for determining (in a specified sense) the best order of the loci. However, the data available for the methods may or may not be consistent with the ordering assumption. For example, if one had (pairwise) distance data for all points on a two-dimensional, 5 × 5 square grid, then any attempt to order the 25 points on a line would be misguided. The purpose of the present paper is to give an explicit example which demonstrates that, if one has a data set not consistent with linear order, then a small change in the data can lead to quite different conclusions as to the best ordering of the loci. Since sampling fluctuations (e.g., chance fluctuations in recombination fraction estimates), incomplete data, and imprecise data (e.g., errors in the marker data) may indeed lead to violation of the assumption of linear order, such a simple example, contrived as it is, may nevertheless be of interest.

The example is based on minimizing the sum of adjacent recombination frequencies to form a "minimum-distance" map (akin to the shortest tour in the "traveling salesman problem"). The minimum-distance map method (Falk 1989) for preliminary ordering requires estimates of all $N(N-1)/2$ pairwise recombination frequencies for a set of $N$ linked loci. For a given configuration (order) of $N$ loci one calculates the above sum. In principle, the sum is recalculated for each of the $N!/2$ distinct configurations. The configurations that yield the minimum value of the sum are candidates for a minimum-distance map. For small values of $N$, say for $N < 10$, one is able to perform an explicit computer comparison of all the configurations. For large $N$ one uses a computer algorithm, such as simulated annealing (Kirkpatrick et al. 1983), to search the large set of configurations for an approximate minimum-distance map.

Received November 30, 1989; revision received February 7, 1990.
Address for correspondence and reprints: Dr. Catherine T. Falk, New York Blood Center, 310 East 67th Street, New York, NY 10021.
© 1990 by The American Society of Human Genetics. All rights reserved. 0002-9297/90/4606-0016$02.00

The Example

Consider a set of five points (loci) arranged such that four of the points lie on a circle of radius $r = 1$ and the fifth point is at the center of the circle. Define the angles $\theta$ and $\phi$ as shown in figure 1. Thus, the (pairwise) distance between any two points can be expressed in terms of trigonometric functions of the angles $\theta$ and $\phi$. The distance between any pair of points $i$, $j$ is given by the element $R_{ij}$ of the following symmetric matrix:

\[
R = \begin{bmatrix}
0 & d & 1 & k & m \\
d & 0 & 1 & m & f \\
1 & 1 & 0 & 1 & 1 \\
k & m & 1 & 0 & d \\
m & f & 1 & d & 0
\end{bmatrix}
\]
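A minimal sketch, in Python, of how the entries of $R$ can be evaluated and the 5!/2 = 60 tours compared by direct enumeration. It assumes the chord-length expressions for d, f, m, and k and the parameter values $\theta = \pi/12$, $\sin\phi = 0.5 \pm 10^{-12}$ stated in the text. The function names, the tie tolerance `tol`, and the convention of keeping only the orientation of a tour whose first label is smaller than its last are illustrative choices, not part of the original method.

```python
from itertools import permutations
from math import asin, cos, pi, sin, sqrt


def distance_matrix(theta, sin_phi):
    """Build the 5x5 matrix R; locus i is stored at index i - 1."""
    phi = asin(sin_phi)
    d = 2.0 * sin(theta)                                # R12 = R45
    f = 2.0 * sin_phi                                   # R25
    m = sqrt(2.0 - 2.0 * cos(2.0 * phi + 2.0 * theta))  # R15 = R24
    k = sqrt(2.0 - 2.0 * cos(2.0 * phi + 4.0 * theta))  # R14
    return [
        [0.0, d, 1.0, k, m],
        [d, 0.0, 1.0, m, f],
        [1.0, 1.0, 0.0, 1.0, 1.0],   # locus 3 is the center of the circle
        [k, m, 1.0, 0.0, d],
        [m, f, 1.0, d, 0.0],
    ]


def distinct_tours(n):
    """All n!/2 orders of loci 1..n, counting an order and its reverse once."""
    return [p for p in permutations(range(1, n + 1)) if p[0] < p[-1]]


def tour_length(R, order):
    """Sum of distances between adjacent loci in the given order."""
    return sum(R[a - 1][b - 1] for a, b in zip(order, order[1:]))


def shortest_tours(R, tol=1e-13):
    """Return (minimum length, sorted list of tours within tol of it).

    tol is an assumed tolerance: large enough to absorb floating-point
    rounding, small enough to resolve the 2e-12 gap in this example.
    """
    lengths = {p: tour_length(R, p) for p in distinct_tours(len(R))}
    best = min(lengths.values())
    return best, sorted(p for p, length in lengths.items() if length - best <= tol)


EPS = 1e-12
for sign, sin_phi in (("-", 0.5 - EPS), ("+", 0.5 + EPS)):
    best, tours = shortest_tours(distance_matrix(pi / 12, sin_phi))
    print(f"sin(phi) = 0.5 {sign} 1e-12: minimum length = {best:.12f}")
    for tour in tours:
        print("   ", *tour)
```

Under these assumptions the sketch finds two shortest tours, both with locus 3 at an end, for $\sin\phi = 0.5 - \epsilon$, and four shortest tours, all with locus 3 in the middle, for $\sin\phi = 0.5 + \epsilon$. Exhaustive enumeration is practical only for small numbers of loci, as the text notes.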

In the matrix $R$, $d = 2 \sin\theta$ and $f = 2 \sin\phi$, and the law of cosines gives $m = \sqrt{2 - 2\cos(2\phi + 2\theta)}$ and $k = \sqrt{2 - 2\cos(2\phi + 4\theta)}$. For a given $\theta$ and $\phi$ one can calculate the (pairwise) distances and then calculate the lengths of the 5!/2 = 60 distinct "tours" to determine the set of tours with minimum length. We find for a particular choice of $\theta$ and $\phi$ (given below) that very small changes in those chosen values lead to an abrupt change in the order of the points in the tours of minimum length, even though the minimum tour length necessarily changes by a very small amount. That instability in the order of the points translates into an instability in the ordering of loci.

Numerical Details

Select $\theta = \pi/12$ and consider values of $\phi$ such that $\sin\phi = 0.5 \pm \epsilon$, where $\epsilon = 10^{-12}$. The results are given in table 1. It is seen for $\sin\phi = 0.5 - \epsilon$ that the shortest tours always have point 3 as an endpoint, whereas for $\sin\phi = 0.5 + \epsilon$ the shortest tours always have point 3 in the middle of the five points. Thus, a very small change in $\phi$ results in a small change in the total tour length but in a significant shift in the "best" order, with point 3 moving from an end to the middle.

Table 1

Tours of Minimum Length

$\sin\phi$            Minimum Tour Length    Tour
$0.5 - \epsilon$      3.035276180408         1 2 5 4 3
$0.5 - \epsilon$      3.035276180408         3 1 2 5 4
$0.5 + \epsilon$      3.035276180410         1 2 3 4 5
$0.5 + \epsilon$      3.035276180410         1 2 3 5 4
$0.5 + \epsilon$      3.035276180410         2 1 3 4 5
$0.5 + \epsilon$      3.035276180410         2 1 3 5 4

Figure 1

Five loci arranged such that four are on the circle of unit radius and one is at the center of the circle. In the text the distance between loci $i$, $j$ is denoted by $R_{ij}$, where $R_{12} = R_{45} = d$, $R_{13} = R_{23} = R_{34} = R_{35} = 1$, $R_{14} = k$, $R_{15} = R_{24} = m$, and $R_{25} = f$. (One line ending at locus 5 has been intentionally omitted from the diagram, for simplicity.) The equations relating $d$, $k$, $m$, and $f$ to the angles $\theta$ and $\phi$ are given in the text. A "minimum-distance" map is associated with any order of loci such that a tour of the loci in the prescribed order corresponds to the minimum path length of the 5!/2 distinct tours. (On any tour each and every locus is visited once and only once.)

Discussion

The example we have considered was constructed to provide data violating linear order, so as to illustrate a potential problem in interpreting actual data, which is typically imprecise. One must simply be cautious in interpreting and accepting results obtained from methods of this kind. But the methods, at least, provide useful preliminary guidelines in seeking solutions to very difficult problems.

Acknowledgments

This work was supported, in part, by National Institutes of Health grant GM29177 to C.T.F.

References

Buetow KH, Chakravarti A (1987) Multipoint gene mapping using seriation. I. General methods. Am J Hum Genet 41:180-188

Falk CT (1989) A simple scheme for preliminary ordering of multiple loci: application to 45 CF families. In: Elston RC, Spence MA, Hodge SE, MacCluer JW (eds) Multipoint mapping and linkage based upon affected pedigree members: Genetic Analysis Workshop 6. Alan R Liss, New York, pp 17-22

Kirkpatrick S, Gelatt CD, Vecchi MP (1983) Optimization by simulated annealing. Science 220:671-680

Simmons GF (1963) Introduction to topology and modern analysis. McGraw-Hill, New York

Weeks D, Lange K (1987) Preliminary ranking procedures for multilocus ordering. Genomics 1:236-242

Wilson S (1988) A major simplification in the preliminary ordering of linked loci. Genet Epidemiol 5:75-80
