
Ann. Hum. Genet. (1992), 56, 155-158 Printed in Great Britain

The effect of mutation on linkage disequilibrium

A. D. CAROTHERS AND A. F. WRIGHT

Medical Research Council Human Genetics Unit, Western General Hospital, Crewe Road, Edinburgh EH4 2XU

SUMMARY

The standard formula for the approach to linkage equilibrium between two diallelic loci, initially at disequilibrium, is expressed in terms of their probability of recombination (Li, 1955). By a simple extension, we show how to incorporate the effects of mutation at one or both loci. It can thereby be inferred that in general these effects are unlikely to be of major importance, contrary to some recent suggestions.

INTRODUCTION

The apparently anomalous situation in which linkage disequilibrium exists between two loci, Z and A say, but not between Z and B, where B is a third locus known to be closely linked to A, can be explained in various ways. For example, mutations at Z may have occurred independently on several occasions in gametes that by chance carried identical alleles of A, but differing alleles of B. Alternatively, if the order of loci is Z-A-B, there may be a recombinational ‘hot spot’ between A and B. Over periods of many generations, measures of disequilibrium may become distorted as a result of changes in allele frequencies through selection, migration or random drift. Some authors have recently speculated that the anomaly might be explained, at least in part, by mutation at B (Snell et al. 1989; Theilmann et al. 1989; Gusella, 1991). The present study was motivated by a wish to investigate this suggestion quantitatively.

MODEL FOR THE EFFECTS OF MUTATION

Suppose there are two diallelic loci, A with alleles $a_1$ and $a_2$, and B with alleles $b_1$ and $b_2$. Let $x_{i1}$, $x_{i2}$, $x_{i3}$ and $x_{i4}$ be the respective population frequencies of gametes $a_1b_1$, $a_1b_2$, $a_2b_1$ and $a_2b_2$ in the $i$th generation (where $\sum_{j=1}^{4} x_{ij} = 1$). Then the linkage disequilibrium in the $i$th generation, $D_i$, is defined as $|x_{i1}x_{i4} - x_{i2}x_{i3}|$. To enable us to ignore the effects of sampling variation, which are not relevant to the present argument, we effectively assume an infinite population of gametes. For convenience, we also suppose that each generation of gametes undergoes mutation, then recombination to produce the next generation. Denote the mutation rates $a_1 \to a_2$, $a_2 \to a_1$, $b_1 \to b_2$ and $b_2 \to b_1$ by $\alpha_1$, $\alpha_2$, $\beta_1$ and $\beta_2$ respectively. Then, letting $y_{ij}$ denote the frequency (in generation $i$) of gamete $j$ after mutation but before recombination (ignoring second order terms in the mutation rates),

$$
\begin{aligned}
y_{i1} &= x_{i1}(1 - \alpha_1 - \beta_1) + x_{i2}\beta_2 + x_{i3}\alpha_2, \\
y_{i2} &= x_{i2}(1 - \alpha_1 - \beta_2) + x_{i1}\beta_1 + x_{i4}\alpha_2, \\
y_{i3} &= x_{i3}(1 - \alpha_2 - \beta_1) + x_{i1}\alpha_1 + x_{i4}\beta_2, \\
y_{i4} &= x_{i4}(1 - \alpha_2 - \beta_2) + x_{i2}\alpha_1 + x_{i3}\beta_1.
\end{aligned}
$$


By simple algebra it then follows, again ignoring second order terms, that

$$E_i = (1 - \gamma) D_i,$$

where

$$E_i = |y_{i1}y_{i4} - y_{i2}y_{i3}|, \qquad \gamma = \alpha_1 + \alpha_2 + \beta_1 + \beta_2.$$

Since (assuming random mating)

$$D_i = (1 - \theta) E_{i-1} \qquad \text{(Li, 1955)},$$

where $\theta$ denotes the recombination probability between A and B, it follows that

$$D_i = (1 - \theta)(1 - \gamma) D_{i-1} \approx (1 - \theta - \gamma) D_{i-1} = (1 - \theta - \gamma)^i D_0. \tag{1}$$

Thus the effect of introducing mutation is simply to replace $\theta$ in the standard formula by $(\theta + \gamma)$.
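Result (1) can be checked numerically. The Python sketch below iterates the first-order mutation step followed by a recombination step, and compares the resulting disequilibrium with the prediction of result (1). The recombination step is an assumption of this sketch: it uses the standard random-mating update, in which each gamete frequency shifts by the recombination probability times the signed disequilibrium. The function names, starting frequencies and rates are illustrative and do not come from the paper.

```python
def mutate(x, a1, a2, b1, b2):
    """First-order mutation step: gamete frequencies x -> y."""
    x1, x2, x3, x4 = x  # gametes a1b1, a1b2, a2b1, a2b2
    y1 = x1 * (1 - a1 - b1) + x2 * b2 + x3 * a2
    y2 = x2 * (1 - a1 - b2) + x1 * b1 + x4 * a2
    y3 = x3 * (1 - a2 - b1) + x1 * a1 + x4 * b2
    y4 = x4 * (1 - a2 - b2) + x2 * a1 + x3 * b1
    return (y1, y2, y3, y4)


def recombine(y, theta):
    """Random-mating recombination step (assumed form); scales D by (1 - theta)."""
    d = y[0] * y[3] - y[1] * y[2]  # signed disequilibrium
    return (y[0] - theta * d, y[1] + theta * d, y[2] + theta * d, y[3] - theta * d)


def disequilibrium(x):
    """Absolute linkage disequilibrium |x1*x4 - x2*x3|."""
    return abs(x[0] * x[3] - x[1] * x[2])


def simulate(x0, theta, rates, generations):
    """Return ([D_0, ..., D_generations], final gamete frequencies)."""
    x = x0
    history = [disequilibrium(x)]
    for _ in range(generations):
        x = recombine(mutate(x, *rates), theta)
        history.append(disequilibrium(x))
    return history, x


# Illustrative values only (hypothetical, not from the paper).
x0 = (0.4, 0.1, 0.1, 0.4)
theta = 0.01
rates = (1e-5, 1e-5, 1e-5, 1e-5)  # alpha1, alpha2, beta1, beta2
gamma = sum(rates)

history, x_final = simulate(x0, theta, rates, 50)
predicted = (1 - theta - gamma) ** 50 * history[0]
print(history[-1], predicted)
```

With rates this small, the simulated and predicted values should agree closely, because the terms dropped in the approximation are products of small rates.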

APPLICATION: DISEASE LOCUS LINKED TO TWO MARKERS

In deriving the above result, we have assumed that gametes have been sampled at random from the population. However, in looking for evidence of linkage disequilibrium between a disease locus, Z say, and a DNA marker, A say, it is usual to compare a random sample of gametes carrying the disease allele ($z_1$) with a random sample carrying the normal allele ($z_2$). In this situation, the above formula needs to be modified by defining

$$x_{i1} = \frac{p_i\, n_{i1}}{n_{i1} + n_{i2}}, \quad x_{i2} = \frac{p_i\, n_{i2}}{n_{i1} + n_{i2}}, \quad x_{i3} = \frac{(1 - p_i)\, n_{i3}}{n_{i3} + n_{i4}}, \quad x_{i4} = \frac{(1 - p_i)\, n_{i4}}{n_{i3} + n_{i4}},$$

so that

$$x_{i1} + x_{i2} = p_i \quad \text{and} \quad x_{i3} + x_{i4} = 1 - p_i,$$

where $p_i$ denotes the frequency of $z_1$ in the population at generation $i$, and $n_{i1}$, $n_{i2}$, $n_{i3}$ and $n_{i4}$ the observed numbers of alleles of types $z_1a_1$, $z_1a_2$, $z_2a_1$ and $z_2a_2$. The disequilibrium between Z and A in generation $i$ then becomes

$$D_i(ZA) = \frac{p_i (1 - p_i)\, |n_{i1}n_{i4} - n_{i2}n_{i3}|}{(n_{i1} + n_{i2})(n_{i3} + n_{i4})}.$$

Note that this quantity depends upon $p_i$. However, if a second DNA marker, B say, is available, then the quantity $D_i(ZA)/D_i(ZB)$ is independent of $p_i$ and, applying result (1) above, then taking logarithms and approximating to first order terms in the Taylor series expansion, we get

$$\ln\frac{D_i(ZA)}{D_i(ZB)} - \ln\frac{D_0(ZA)}{D_0(ZB)} \approx i\,[(\theta_{ZB} - \theta_{ZA}) + (\gamma_{ZB} - \gamma_{ZA})], \tag{2}$$

where $i$ here represents the number of generations, $\theta_{ZA}$ the recombination probability between Z and A, $\gamma_{ZA}$ the sum of the four mutation rates at Z and A, and similarly for $\theta_{ZB}$ and $\gamma_{ZB}$. A numerical example is provided by Snell et al. (1989), who reported linkage disequilibrium between the Huntington’s disease locus and the MboI enzyme restriction site within the marker D4S95, but not between the disease locus and the TaqI site within the same marker. The published figures were

                            MboI            TaqI
                          C1     C2       E1     E2
    Huntington’s allele   46      5       14     14
    Normal allele         57     33       25     24

By making the assumptions (i) that there was a single original Huntington’s mutation that occurred on a chromosome carrying the marker alleles with which the Huntington’s allele is presently associated (i.e. C1 for MboI and either allele for TaqI), (ii) that the relative frequencies of marker alleles on chromosomes carrying the ‘Normal’ allele at the Huntington’s locus have remained unchanged (though the conclusions that follow are robust to departures from this), we arrive at the following estimate of relative numbers in generation 0:

                            MboI            TaqI
                          C1     C2       E1     E2
    Huntington’s allele    1      0        1      0
    Normal allele         57     33       25     24

Substituting these values in the left-hand side of equation (2) and rearranging gives

$$i \approx \frac{3.6}{(\theta_{ZB} - \theta_{ZA}) + (\gamma_{ZB} - \gamma_{ZA})}.$$

Although the order of, and distances between, the Huntington’s (Z), MboI (A) and TaqI (B) loci are not known precisely, it can be inferred that the latter pair are separated by less than 5000 base pairs (Wasmuth et al. 1988). Assuming average recombination rates, this corresponds to a recombination probability, $(\theta_{ZB} - \theta_{ZA})$, of less than $5 \times 10^{-5}$ per gamete per generation. Furthermore, mutation rates have been estimated to be typically in the range $10^{-9}$ to $10^{-8}$ per base per year (Salser, 1977) and hence for an ‘average’ four-base site would be expected not to exceed $10^{-6}$ per site per generation of 25 years. Thus, even if the mutation rate were 100-fold greater than average at the TaqI site, $(\gamma_{ZB} - \gamma_{ZA})$ would be unlikely to exceed $10^{-4}$ per gamete per generation. These considerations suggest an upper limit of about $1.5 \times 10^{-4}$ for the denominator of the above expression, and hence a lower limit of about 25000 for the number of generations. Even were it possible for the Huntington’s mutation to have existed for this length of time, it is of course wholly implausible to suppose that the various assumptions involved in this derivation could remain valid over such a period.
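The arithmetic of this example can be retraced with a short Python sketch. It computes the disequilibrium between the disease locus and each marker up to the common factor p(1 - p), which cancels in the ratio, and then forms the left-hand side of equation (2). Two things are assumptions of the sketch and are not stated in this form in the paper: the assignment of the published counts to table cells, and the two upper bounds of 5 x 10^-5 and 10^-4 per gamete per generation, chosen to be consistent with the stated lower limit of about 25000 generations. The function and variable names are hypothetical.

```python
import math


def scaled_disequilibrium(n1, n2, n3, n4):
    """|n1*n4 - n2*n3| / ((n1 + n2) * (n3 + n4)), i.e. D divided by p(1 - p)."""
    return abs(n1 * n4 - n2 * n3) / ((n1 + n2) * (n3 + n4))


# Counts ordered as: (disease & allele 1, disease & allele 2,
#                     normal & allele 1,  normal & allele 2).
# The cell assignment below is an assumption of this sketch.
mbo_now = (46, 5, 57, 33)    # MboI, present generation
taq_now = (14, 14, 25, 24)   # TaqI, present generation
mbo_gen0 = (1, 0, 57, 33)    # MboI, assumed generation 0
taq_gen0 = (1, 0, 25, 24)    # TaqI, assumed generation 0

log_ratio_now = math.log(
    scaled_disequilibrium(*mbo_now) / scaled_disequilibrium(*taq_now)
)
log_ratio_gen0 = math.log(
    scaled_disequilibrium(*mbo_gen0) / scaled_disequilibrium(*taq_gen0)
)
lhs = log_ratio_now - log_ratio_gen0  # left-hand side of equation (2)

# Assumed upper bounds, per gamete per generation.
max_theta_diff = 5e-5   # recombination difference between the two markers
max_gamma_diff = 1e-4   # mutation-rate difference between the two markers

min_generations = lhs / (max_theta_diff + max_gamma_diff)
print(round(lhs, 2), round(min_generations))
```

Under these assumptions the left-hand side comes out close to the 3.6 used in the text, and the implied minimum number of generations is in the low tens of thousands. Placing the original Huntington's chromosome on the other TaqI allele changes the result only slightly.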

DISCUSSION

By substituting various realistic combinations of parameter values in equations (1) and (2), we have not been able to determine any plausible circumstances in which mutation would have a noticeable influence on measures of disequilibrium. For loci separated by distances greater than $10^5$ base pairs, the effects of mutation are negligible compared with those of recombination. For more closely-separated loci where the effects of mutation and recombination may be comparable, the number of generations required to explain the observations becomes extremely large, and alternative explanations, of which there is no dearth (see above), become potentially of greater importance than either. These considerations would exclude mutation as a general explanation for anomalous findings such as those discussed above. Any particular case might of course be explained as a consequence of a mutation occurring at a marker site (B) in a gamete carrying the rarer allele of Z, at a stage when only a few copies of the latter were present in the population. However, by its nature such an event must be highly improbable.
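The comparison between mutation and recombination can be made concrete with a rough Python sketch based on result (1). It reports, for several physical distances, the share of the per-generation decay rate that is due to mutation and the number of generations needed for the disequilibrium to halve. Both constants are assumptions made for illustration and are not parameters given in the paper: a recombination probability of 10^-8 per base pair per generation, and a combined mutation rate made up of four rates of 10^-6 each. The function names are hypothetical.

```python
import math

# Assumed constants (illustrative, not taken from the paper).
RECOMBINATION_PER_BP = 1e-8  # recombination probability per base pair per generation
GAMMA = 4e-6                 # sum of four mutation rates of 1e-6 per generation


def mutation_share(distance_bp, gamma=GAMMA):
    """Fraction of the decay rate (theta + gamma) that is due to mutation."""
    theta = distance_bp * RECOMBINATION_PER_BP
    return gamma / (theta + gamma)


def half_life(distance_bp, gamma=GAMMA):
    """Generations for D to halve when D_i = (1 - theta - gamma)**i * D_0."""
    theta = distance_bp * RECOMBINATION_PER_BP
    return math.log(0.5) / math.log(1 - theta - gamma)


for distance in (1e3, 1e4, 1e5, 1e6):
    print(int(distance), round(mutation_share(distance), 4), round(half_life(distance)))
```

Under these assumed rates, mutation accounts for well under 1% of the decay at 10^5 base pairs. Its share becomes appreciable only at short distances, where the halving time runs to tens of thousands of generations.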

REFERENCES

GUSELLA, J. F. (1991). Huntington’s disease. In Advances in Human Genetics 20 (ed. H. Harris and K. Hirschhorn), pp. 125-151. New York: Plenum.

LI, C. C. (1955). Population Genetics. Chicago: University of Chicago Press.

SALSER, W. (1977). Globin mRNA sequences: analysis of base pairing and evolutionary implications. Cold Spring Harbor Symposia on Quantitative Biology XLII, 985-1002.

SNELL, R. G., LAZAROU, L. P., YOUNGMAN, S., QUARRELL, O. W. J., WASMUTH, J. J., SHAW, D. J. & HARPER, P. S. (1989). Linkage disequilibrium in Huntington’s disease: an improved localisation for the gene. J. Med. Genet. 26, 673-675.

THEILMANN, J., KANANI, S., SHIANG, R., ROBBINS, C., QUARRELL, O., HUGGINS, M., HEDRICK, A., WEBER, B., COLLINS, C., WASMUTH, J. J., BUETOW, K. H., MURRAY, J. C. & HAYDEN, M. R. (1989). Non-random association between alleles detected at D4S95 and D4S98 and the Huntington’s disease gene. J. Med. Genet. 26, 676-681.

WASMUTH, J. J., HEWITT, J., SMITH, B., ALLARD, D., HAINES, J. L., SKARECKY, D., PARTLOW, E. & HAYDEN, M. R. (1988). A highly polymorphic locus very tightly linked to the Huntington’s disease gene. Nature 332, 734-736.
