THEORETICAL POPULATION BIOLOGY 8, 117-126 (1975)

Linkage Disequilibrium among Multiple Neutral Alleles Produced by Mutation in Finite Population

WILLIAM G. HILL

Institute of Animal Genetics, West Mains Road, Edinburgh, EH9 3JN, Scotland

Received September 20, 1974

An analysis is undertaken for a finite random mating population of the linkage disequilibrium between two loci, at both of which all alleles are neutral, all mutant alleles differ from existing ones and several may be segregating at any time. Formulae are derived for the expected total squared disequilibrium, measured as the sum of squares of disequilibria between all pairs of alleles. The ratio of this quantity to the expected value of the product of the heterozygosities at the two loci is similar to that obtained previously by Ohta and Kimura for two nucleotide sites at each of which not more than two mutant types can segregate at any time.

A model of mutation was introduced by Kimura and Crow (1964) in which each mutant allele at a locus is assumed never to have existed previously in the population, and several may be segregating at any time. This has become known as the “infinite alleles” model and was used by Kimura and Crow (1964) to compute expected homozygosities and the effective number of segregating alleles in a finite population, assuming all alleles had no effect on fitness. An alternative model is one of many nucleotide sites at which there are molecular mutations, with mutation at each site so rare that new mutants only occur at nonsegregating sites and not more than two mutants are present at any site at one time. This “infinite sites” model, originally of Karlin and McGregor (1967), was used by Kimura (1969) to find the expected number of heterozygous sites in a finite population. Ewens (1974) has recently contrasted the two models. Using the infinite sites model, Ohta and Kimura (1969) computed the variance of linkage disequilibrium between pairs of sites. In this note the equivalent result is obtained for the infinite alleles model, in which account has now to be taken of several alleles segregating at each locus. The results, however, can be expressed in a simple way.

Copyright © 1975 by Academic Press, Inc. All rights of reproduction in any form reserved.


ANALYSIS

The following definitions and assumptions are made: $t$ is the generation number; $p_h$, $p_i$ are the frequencies of alleles $A_h$, $A_i$ at the A locus; $q_j$, $q_k$ are the frequencies of alleles $B_j$, $B_k$ at the B locus; $f_{ij}$ is the frequency of the chromosome $A_iB_j$; $D_{ij} = f_{ij} - p_iq_j$ is the disequilibrium between alleles $A_i$ and $B_j$; $\alpha$ and $\beta$ are the numbers of alleles at loci A and B, respectively, which have existed by generation $t$; $N$ is the number of monoecious diploids in the population, which is assumed to be random mating; $u$ and $v$ are the mutation rates per generation at loci A and B, respectively, with all mutations being to new alleles; $c$ is the recombination fraction between loci A and B; for simplicity $N$ is assumed to be sufficiently large and $u$, $v$ and $c$ sufficiently small that $1/N^2$, $u^2$, $v^2$ and $c^2$ can be ignored relative to $1/N$, i.e., $u$, $v$ and $c$ are $O(1/N)$, but the effects of relaxing this assumption are discussed; $U = Nu$, $V = Nv$, $C = Nc$.

A haploid model is used in which mutation and recombination are assumed to change chromosome frequencies deterministically, and from these new frequencies a sample of $2N$ chromosomes is taken from a multinomial distribution. All existing and mutant alleles are assumed to be neutral, so no frequency changes occur from selection, and the expected values of all disequilibria are zero. Let us consider the vector $\mathbf{y}_{hi,jk}(t)$ of moments associated with alleles $A_h$, $A_i$, $B_j$ and $B_k$ at generation $t$, defined by

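$$
\mathbf{y}_{hi,jk}(t) =
\begin{pmatrix}
E(p_h p_i q_j q_k)_{(t)} \\
E(p_h q_j D_{ik} + p_h q_k D_{ij} + p_i q_j D_{hk} + p_i q_k D_{hj})_{(t)} \\
E(D_{hj} D_{ik} + D_{hk} D_{ij})_{(t)}
\end{pmatrix},
$$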

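where $E(\cdot)_{(t)}$ denotes the expected value of the quantity at generation $t$. Denoting by $\mathbf{D}$, $\mathbf{R}$ and $\mathbf{M}$ the transition matrices for changes in these moments due to drift, recombination and mutation, respectively, we have for the haploid model

$$
\mathbf{y}_{hi,jk}(t+1) = \mathbf{D}\mathbf{R}\mathbf{M}\,\mathbf{y}_{hi,jk}(t). \tag{1}
$$

Using, for example, the methods of Hill (1974) it can be shown that $\mathbf{D}$ takes the form of Eq. (2), with elements expressed in terms of $x = (2N)^{-1}$.

[Equation (2): the exact drift matrix $\mathbf{D}$; the matrix itself is not legible in this copy.]

The matrix $\mathbf{D}$ is essentially that obtained for two alleles at each locus by Hill and Robertson (1968) and subsequently others, and has been given recently by Weir and Cockerham (1974), for a vector $\mathbf{y}$ defined slightly differently, in the more general multiple allele case. From (2),

$$
\mathbf{D} = \mathbf{I} - \frac{1}{N}
\begin{pmatrix}
1 & -1/2 & 0 \\
0 & 5/2 & -1 \\
-1 & -1 & 3/2
\end{pmatrix}
+ O(N^{-2}),
$$

where $\mathbf{I}$ is the identity matrix. The recombination and mutation matrices are, respectively,

$$
\mathbf{R} =
\begin{pmatrix}
1 & 0 & 0 \\
0 & 1-c & 0 \\
0 & 0 & (1-c)^2
\end{pmatrix}
= \mathbf{I} - \frac{1}{N}
\begin{pmatrix}
0 & 0 & 0 \\
0 & C & 0 \\
0 & 0 & 2C
\end{pmatrix}
+ O(N^{-2}),
$$

and

$$
\mathbf{M} = (1-u)^2(1-v)^2\,\mathbf{I} = \mathbf{I} - (1/N)(2U + 2V)\,\mathbf{I} + O(N^{-2}).
$$

Therefore,

$$
\mathbf{D}\mathbf{R}\mathbf{M} = \mathbf{I} - \frac{1}{N}
\begin{pmatrix}
1 + 2U + 2V & -1/2 & 0 \\
0 & 5/2 + C + 2U + 2V & -1 \\
-1 & -1 & 3/2 + 2C + 2U + 2V
\end{pmatrix}
+ O(N^{-2})
= \mathbf{I} - (1/N)\mathbf{P} + O(N^{-2}), \tag{3}
$$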

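say. Now consider variation from all alleles present at the two loci and define

$$
\mathbf{x}(t) = \sum\sum_{h \ne i}\ \sum\sum_{j \ne k} \mathbf{y}_{hi,jk}(t).
$$

Let us denote by $H_A = \sum\sum_{h \ne i} p_h p_i$ and $H_B = \sum\sum_{j \ne k} q_j q_k$ the heterozygosities at the A and B loci. Also, since $\sum_i D_{ij} = \sum_j D_{ij} = 0$, we have

$$
\sum_{h \ne i}\sum_{k \ne j} D_{hk} = D_{ij}.
$$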
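The definitions of $p_i$, $q_j$, $D_{ij}$, $H_A$ and $H_B$ can be checked on a small numerical case. The following is a minimal sketch only: the chromosome frequency matrix `f` is a hypothetical example (three alleles at A, two at B) and the function names are illustrative, not taken from the paper. Exact fractions are used so that the identity for sums of disequilibria holds without rounding error.

```python
from fractions import Fraction


def disequilibria(f):
    """Return (p, q, D) for a matrix f[i][j] of chromosome frequencies A_i B_j."""
    p = [sum(row) for row in f]            # allele frequencies at locus A
    q = [sum(col) for col in zip(*f)]      # allele frequencies at locus B
    D = [[f[i][j] - p[i] * q[j] for j in range(len(q))] for i in range(len(p))]
    return p, q, D


def heterozygosity(freqs):
    """Sum of p_h * p_i over all ordered pairs of unlike alleles (h != i)."""
    return sum(a * b
               for h, a in enumerate(freqs)
               for i, b in enumerate(freqs) if h != i)


def sum_excluding(D, i, j):
    """Sum of D[h][k] over h != i and k != j."""
    return sum(D[h][k]
               for h in range(len(D))
               for k in range(len(D[0])) if h != i and k != j)


# Hypothetical chromosome frequencies, chosen only for illustration.
f = [[Fraction(3, 10), Fraction(1, 10)],
     [Fraction(1, 10), Fraction(2, 10)],
     [Fraction(1, 10), Fraction(2, 10)]]

p, q, D = disequilibria(f)
H_A, H_B = heterozygosity(p), heterozygosity(q)
total_sq_D = sum(d * d for row in D for d in row)   # sum of squared disequilibria

print("H_A =", H_A, " H_B =", H_B, " sum D^2 =", total_sq_D)
```

Because every row and column of the $D_{ij}$ matrix sums to zero, `sum_excluding(D, i, j)` returns `D[i][j]` for every pair of indices, which is the relationship used in rewriting $\mathbf{x}(t)$.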


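Using these relationships, $\mathbf{x}(t)$ can be rewritten into more meaningful quantities:

$$
\mathbf{x}(t) =
\begin{pmatrix}
E(H_A H_B)_{(t)} \\
4E(\sum\sum p_i q_j D_{ij})_{(t)} \\
2E(\sum\sum D_{ij}^2)_{(t)}
\end{pmatrix}. \tag{4}
$$

Using (1) and (4),

$$
\mathbf{x}(t+1) = \mathbf{D}\mathbf{R}\mathbf{M}\,\mathbf{x}(t) + \mathbf{w}(t), \tag{5}
$$

and substituting in (3),

$$
\mathbf{x}(t+1) = [\mathbf{I} - (1/N)\mathbf{P}]\,\mathbf{x}(t) + \mathbf{w}(t) + O(N^{-2}), \tag{6}
$$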

where $\mathbf{w}(t)$ denotes the vector of expected increments due to mutation to new alleles. Before considering the magnitude of $\mathbf{w}(t)$ it is helpful to review the analysis of Kimura and Crow (1964) for single loci. Then $E(H_A)$ is reduced by a factor

$$
(1 - 1/2N)(1 - u)^2 = 1 - 1/2N - 2u + O(N^{-2})
$$

due to drift and mutation from old alleles, and increased by the quantity (expected number of mutants × heterozygosity from a new mutant), which equals $2Nu \times 2/2N = 2u$, again ignoring terms in $N^{-2}$. Hence,

$$
E(H_{A(t+1)}) = E(H_{A(t)})(1 - 1/2N - 2u) + 2u,
$$

and at equilibrium

$$
E(H_A) = 4Nu/(4Nu + 1) = 4U/(4U + 1), \tag{7}
$$

corresponding to a value of $1/(4U + 1)$ for the homozygosity, derived by Kimura and Crow.

The increment in $E(H_AH_B)$ due to new mutation is, by extending the single locus arguments,

$$
w_1(t) = 2vE(H_{A(t)}) + 2uE(H_{B(t)}) + O(N^{-2}).
$$

The values of disequilibria associated with a new mutant, $A_{\alpha+1}$, at the A locus, occurring on a chromosome containing $B_j$ in some population, are

$$
D_{\alpha+1,j} = 1/2N - q_j/2N = (1 - q_j)/2N, \qquad D_{\alpha+1,k} = -q_k/2N, \quad k \ne j.
$$
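The single-locus recursion reviewed above can be iterated numerically to confirm that it settles at the equilibrium value (7). This is a minimal sketch; the parameter values $N = 100$ and $u = 0.001$ (so $U = 0.1$) are chosen only for illustration, and the recursion used is the approximate one in the text, which ignores terms in $N^{-2}$.

```python
def iterate_heterozygosity(N, u, generations, H0=0.0):
    """Iterate E(H) <- E(H) * (1 - 1/(2N) - 2u) + 2u from a starting value H0."""
    H = H0
    for _ in range(generations):
        H = H * (1.0 - 1.0 / (2 * N) - 2.0 * u) + 2.0 * u
    return H


def equilibrium_heterozygosity(N, u):
    """Equilibrium value 4Nu / (4Nu + 1) from Eq. (7)."""
    return 4.0 * N * u / (4.0 * N * u + 1.0)


N, u = 100, 0.001          # illustrative values: U = N * u = 0.1
H_iter = iterate_heterozygosity(N, u, generations=5000)
H_eq = equilibrium_heterozygosity(N, u)
print(H_iter, H_eq)
```

With these values the iterated and closed-form heterozygosities agree, and both are close to the entry 0.286 given for $U = 0.1$ in Table I.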


The probability that the mutation is associated with $B_j$ is $q_j$; hence the total increment in $\sum\sum D_{ij}^2$ in this population due to the single mutation at the A locus is expected to be

$$
(2N)^{-2}\sum_j q_j\Big[(1 - q_j)^2 + \sum_{k \ne j} q_k^2\Big] = (2N)^{-2}\Big(1 - \sum_j q_j^2\Big) = (2N)^{-2}H_B.
$$

The expected increment in $E(\sum\sum D_{ij}^2)$ over all populations is thus $2Nu(2N)^{-2}E(H_B)$, and is $O(N^{-2})$, as is that due to mutation at B. A similar argument holds for the element $w_2(t)$, which is also $O(N^{-2})$. Therefore,

$$
\mathbf{w}'(t) = \big(2vE(H_{A(t)}) + 2uE(H_{B(t)}),\ 0,\ 0\big) + O(N^{-2}).
$$
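The expected increment in $\sum\sum D_{ij}^2$ from one new mutant at the A locus can be verified by direct enumeration over the B allele on which the mutant arises. The sketch below uses hypothetical B-locus frequencies and $N = 50$ purely as an illustration; it evaluates the weighted sum term by term and compares it with $(2N)^{-2}H_B$ using exact fractions.

```python
from fractions import Fraction


def expected_increment(q, N):
    """Expected increase in sum_ij D_ij^2 caused by one new mutant at locus A."""
    two_n = 2 * N
    total = Fraction(0)
    for j, qj in enumerate(q):
        # The mutant arises on a B_j chromosome with probability q_j.
        on_bj = ((1 - qj) / two_n) ** 2                     # D = (1 - q_j) / 2N
        elsewhere = sum((qk / two_n) ** 2                   # D = -q_k / 2N, k != j
                        for k, qk in enumerate(q) if k != j)
        total += qj * (on_bj + elsewhere)
    return total


# Hypothetical B-locus allele frequencies, for illustration only.
q = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
N = 50
H_B = 1 - sum(x * x for x in q)

print(expected_increment(q, N), H_B / (2 * N) ** 2)
```

Both printed values are the same fraction, which is the identity used to show that the mutational increment in the disequilibrium terms is of order $N^{-2}$.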

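Using (7), the steady flux value of $\mathbf{w}(t)$ is

$$
\mathbf{w}' = (8UV/N)\big((4U + 1)^{-1} + (4V + 1)^{-1},\ 0,\ 0\big), \tag{8}
$$

and using (6), the steady flux value of $\mathbf{x}(t)$ is

$$
\mathbf{x} = N\mathbf{P}^{-1}\mathbf{w}. \tag{9}
$$

From (3), (8) and (9),

$$
\begin{aligned}
\mathbf{x} = {} & 8UV\big[(4U + 1)^{-1} + (4V + 1)^{-1}\big] \\
& \times \big[9 + 26C + 54(U+V) + 8C^2 + 76C(U+V) + 80(U+V)^2 + 16C^2(U+V) + 48C(U+V)^2 + 32(U+V)^3\big]^{-1} \\
& \times
\begin{pmatrix}
11 + 26C + 32(U+V) + 8C^2 + 24C(U+V) + 16(U+V)^2 \\
4 \\
10 + 4C + 8(U+V)
\end{pmatrix},
\end{aligned} \tag{10}
$$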

where $E(H_AH_B) = x_1$, $E(\sum\sum p_iq_jD_{ij}) = \tfrac{1}{4}x_2$ and $E(\sum\sum D_{ij}^2) = \tfrac{1}{2}x_3$. Steady flux values of $E(H_AH_B)$ and $E(\sum\sum D_{ij}^2)$, together with functions of them, are given in Table I for some examples of $U$, $V$ and $C$. These results, obtained from (6), (9) and (10), rest on the assumption that $u$, $v$ and $c$ are $O(N^{-1})$ and $N$ is large. In particular, removal of the restriction of $c$ to small values would be desirable. Equation (5) can be used directly, without the large-$N$ approximation, to give

$$
\hat{\mathbf{x}} = (\mathbf{I} - \mathbf{D}\mathbf{R}\mathbf{M})^{-1}\mathbf{w}, \tag{11}
$$

but an explicit form for $\hat{\mathbf{x}}$ has not been obtained, although presumably it could be after much manipulation (cf. Littler, 1973). The approximation seems satisfactory for most purposes, however: for example, with $N = 100$ and $U = V = 0.01$ the values of $E(\sum\sum D_{ij}^2)$ (multiplied by 100, as in Table I) obtained using (11) are 0.06281, 0.02390, 0.00366, 0.00165 and 0.00098, compared with the values obtained from (10) of 0.06273, 0.02369, 0.00345, 0.00144 and 0.00073 for $C$ = 0.1, 1, 10, 25 and 50 respectively, i.e., recombination fractions up to 0.5, where the disequilibrium is trivial.
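The steady flux solution can also be obtained numerically by solving the linear system $\mathbf{P}\mathbf{x} = N\mathbf{w}$ of Eq. (9) rather than using the closed form. The sketch below is an illustration under the large-$N$ approximation of the text: it builds $\mathbf{P}$ from (3) and $N\mathbf{w}$ from (8), and solves the 3 × 3 system by Cramer's rule. The function names are hypothetical, and the exact drift matrix needed for Eq. (11) is not used.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve3(m, rhs):
    """Solve the 3x3 linear system m x = rhs by Cramer's rule."""
    d = det3(m)
    solution = []
    for col in range(3):
        replaced = [list(row) for row in m]
        for r in range(3):
            replaced[r][col] = rhs[r]
        solution.append(det3(replaced) / d)
    return solution


def steady_flux(U, V, C):
    """Return E(H_A H_B), E(sum p q D) and E(sum D^2) from P x = N w."""
    s = 2.0 * (U + V)
    P = [[1.0 + s, -0.5, 0.0],
         [0.0, 2.5 + C + s, -1.0],
         [-1.0, -1.0, 1.5 + 2.0 * C + s]]
    # N w from Eq. (8): only the first element is non-zero.
    n_w = [8.0 * U * V * (1.0 / (4.0 * U + 1.0) + 1.0 / (4.0 * V + 1.0)), 0.0, 0.0]
    x1, x2, x3 = solve3(P, n_w)
    return x1, x2 / 4.0, x3 / 2.0


# Values of 100 * E(sum D^2) for U = V = 0.01, as discussed in the text.
for C in (0.1, 1, 10, 25, 50):
    _, _, ss_d2 = steady_flux(0.01, 0.01, C)
    print(C, round(100 * ss_d2, 5))
```

Solving the system numerically avoids expanding the determinant by hand and gives an independent check on the coefficients in (10).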

TABLE I

Computation of Heterozygosities and Disequilibria for a Range of Parameters (see text for definitions)

U      V      C     E(H_A)  E(H_B)  E(H_A)E(H_B)  E(H_A H_B)  Ratio   E(ΣΣD²)   σ_d*²
                                    × 100         × 100               × 100
→0     →0     0     4U      4V      16UV          19.56UV     1.222   8.89UV    0.455
              0.1                                 18.73UV     1.171   7.12UV    0.380
              1                                   16.74UV     1.047   2.60UV    0.156
              10                                  16.03UV     1.002   0.37UV    0.023
0.01   0.01   0     0.038   0.038   0.1479        0.1772      1.198   0.0773    0.436
              0.1                                 0.1708      1.154   0.0627    0.367
              1                                   0.1544      1.043   0.0237    0.153
              10                                  0.1482      1.002   0.0035    0.023
0.1    0.1    0     0.286   0.286   8.1633        8.865       1.086   2.850     0.322
              0.1                                 8.753       1.072   2.477     0.283
              1                                   8.374       1.026   1.149     0.137
              10                                  8.174       1.001   0.187     0.023
0.01   0.19   0     0.038   0.432   1.6608        1.804       1.086   0.580     0.322
              0.1                                 1.781       1.072   0.504     0.283
              1                                   1.704       1.026   0.234     0.137
              10                                  1.663       1.001   0.038     0.023
1      1      0     0.8     0.8     64.000        64.185      1.003   6.003     0.094
              0.1                                 64.175      1.003   5.783     0.090
              1                                   64.116      1.002   4.352     0.068
              10                                  64.015      1.000   1.258     0.020

Column definitions: E(H_A) = 4U/(4U + 1); E(H_B) = 4V/(4V + 1); Ratio = E(H_A H_B)/[E(H_A)E(H_B)]; E(ΣΣD²) is the expected sum of squared disequilibria over all pairs of alleles. Entries in the row U, V → 0 are limiting expressions and are not multiplied by 100.

Limits: as U, V → ∞ (any C), E(H_A), E(H_B), E(H_A)E(H_B), E(H_A H_B) and the ratio → 1, while E(ΣΣD²) and σ_d*² → 0. As C → ∞ (any U, V), the ratio → 1, while E(ΣΣD²) and σ_d*² → 0.
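The numerical rows of Table I follow directly from the closed form (10). The sketch below evaluates (10) for given $U$, $V$ and $C$ and derives the tabulated quantities; the function and dictionary key names are illustrative only. The standardized measure is taken as the ratio $E(\sum\sum D_{ij}^2)/E(H_AH_B)$, which is the ratio described in the abstract.

```python
def closed_form(U, V, C):
    """Evaluate Eq. (10) and return the vector elements (x1, x2, x3)."""
    s = U + V
    pre = 8.0 * U * V * (1.0 / (4.0 * U + 1.0) + 1.0 / (4.0 * V + 1.0))
    den = (9 + 26 * C + 54 * s + 8 * C ** 2 + 76 * C * s + 80 * s ** 2
           + 16 * C ** 2 * s + 48 * C * s ** 2 + 32 * s ** 3)
    x1 = pre * (11 + 26 * C + 32 * s + 8 * C ** 2 + 24 * C * s + 16 * s ** 2) / den
    x2 = pre * 4 / den
    x3 = pre * (10 + 4 * C + 8 * s) / den
    return x1, x2, x3


def table_row(U, V, C):
    """Quantities tabulated in Table I for one parameter combination."""
    x1, _, x3 = closed_form(U, V, C)
    e_ha = 4.0 * U / (4.0 * U + 1.0)
    e_hb = 4.0 * V / (4.0 * V + 1.0)
    ss_d2 = x3 / 2.0                      # E(sum D^2) = x3 / 2
    return {
        "E(H_A)": e_ha,
        "E(H_B)": e_hb,
        "E(H_A)E(H_B) x100": 100 * e_ha * e_hb,
        "E(H_A H_B) x100": 100 * x1,
        "ratio": x1 / (e_ha * e_hb),
        "E(sum D^2) x100": 100 * ss_d2,
        "sigma_d*^2": ss_d2 / x1,
    }


for U, V in ((0.01, 0.01), (0.1, 0.1), (0.01, 0.19), (1.0, 1.0)):
    for C in (0, 0.1, 1, 10):
        row = table_row(U, V, C)
        print(U, V, C, {k: round(v, 4) for k, v in row.items()})
```

The rows for $(U, V) = (0.1, 0.1)$ and $(0.01, 0.19)$ give the same standardized measure at each $C$, since that ratio depends on $U$ and $V$ only through $U + V$.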

DISCUSSION

Let us contrast our result (10) with that obtained by Ohta and Kimura (1971) for the multiple site model. Not more than two types can be segregating at either of the two sites at any time, so let the frequency of type $A_1$ at the first site be $p$ and of type $B_1$ at the second be $q$, and $D$ the disequilibrium between $A_1$ and $B_1$. Ohta and Kimura found (their Eqs. 7, substituting our $C$ for their $R$)

$$
\begin{aligned}
E[p(1-p)q(1-q)] &= N_eK(11 + 26C + 8C^2 + 2/N)/(9 + 26C + 8C^2), \\
E[(1-2p)(1-2q)D] &= 4N_eK(1 + 1/N)/(9 + 26C + 8C^2), \\
E[D^2] &= N_eK(1 + 1/N)(5 + 2C)/(9 + 26C + 8C^2),
\end{aligned} \tag{12}
$$

where $N_e$ is the effective population size, used also to define $C$, and $K = \nu_s/[4N(\log_e 2N + 1)]$, where $\nu_s$ is “the number of pairs of nucleotide sites that start segregating simultaneously in the entire population each generation, considering only those pairs of sites that are separated by a distance corresponding to a recombination fraction $c$.” In Ohta and Kimura’s model, $N$ is the population size when the mutant occurs at the site which is not previously segregating. With only two alleles segregating at a locus in our model,

$$
D_{11} = -D_{12} = -D_{21} = D_{22} = D,
$$

say,

$$
H_AH_B = 4p(1-p)q(1-q), \qquad \sum\sum p_iq_jD_{ij} = (1-2p)(1-2q)D
$$

and

$$
\sum\sum D_{ij}^2 = 4D^2.
$$

If $U$ and $V$ are both small relative to unity, the expected heterozygosity is small at each locus and we can assume only two alleles are segregating; (10) gives

$$
\begin{aligned}
E[p(1-p)q(1-q)] &= \tfrac{1}{4}x_1 = 4UV(11 + 26C + 8C^2)/(9 + 26C + 8C^2), \\
E[(1-2p)(1-2q)D] &= \tfrac{1}{4}x_2 = 16UV/(9 + 26C + 8C^2), \\
E[D^2] &= \tfrac{1}{8}x_3 = 4UV(5 + 2C)/(9 + 26C + 8C^2).
\end{aligned} \tag{13}
$$

These Eqs. (13) are the same as (12) above of Ohta and Kimura, providing a term of $1/N$ is ignored relative to unity (as it is in the rest of their diffusion analysis) and with $4UV$ replacing $N_eK = (N_e/N)\nu_s/[4(\log_e 2N + 1)]$. Both $4UV$ and $N_eK$ are proportional to the number of new mutant alleles (types) occurring simultaneously in any generation at the two loci (sites), but differ by a scalar multiplier. Of these quantities $UV$ seems more tangible than $\nu_s$; although $U$ and $V$ are products of mutation rate and population size, and thus can never be estimated directly without full past knowledge of the population, they can be estimated from the marginal heterozygosities $H_A$ and $H_B$. The quantity $\sigma_d^2$, the squared “standard linkage disequilibrium” given by

$$
\sigma_d^2 = E[D^2]/E[p(1-p)q(1-q)],
$$


was also discussed by Ohta and Kimura (1971). Both in their infinite site model (again ignoring terms of order $N^{-1}$) and in this infinite allele model with $U$ and $V$ small,

$$
\sigma_d^2 = (5 + 2C)/(11 + 26C + 8C^2) \tag{14}
$$

from (12) and (13), and $\sigma_d^2$ approaches 5/11 if $Nc$ ($= C$) is small and $1/(4Nc)$ if $Nc$ is large. A multiple allele equivalent to $\sigma_d^2$, say $\sigma_d^{*2}$, can be defined by

$$
\sigma_d^{*2} = E\Big(\sum\sum D_{ij}^2\Big)\Big/E(H_AH_B).
$$

This reduces to $\sigma_d^2$ when not more than two alleles are segregating at each locus, and is then given by (14). Thus for very small values of $U$ and $V$ (i.e., population size × mutation rate) the values of $\sigma_d^{*2}$ are the same as in the infinite sites model. With more mutation $E(H_AH_B)$ increases faster than $E(\sum\sum D_{ij}^2)$ and values of $\sigma_d^{*2}$ are smaller (Table I); the change in $\sigma_d^{*2}$ with an increase of $U$ and $V$ from 0.01 to 0.1, corresponding to heterozygosities increasing from 0.038 to 0.286, is, however, only about 25% (from 0.436 to 0.322) for $C = 0$, about 10% for $C = 1$ and is negligible for $C = 10$. As shown by (10) and Table I, $\sigma_d^{*2}$ is a function of $U + V$ and not of $U$ or $V$ separately.

As a consequence of linkage the expectation of the product of the heterozygosities, $E(H_AH_B)$, exceeds the product of their expectations, i.e., they are correlated. A measure of their association, $E(H_AH_B)/[E(H_A)E(H_B)]$, is shown in Table I, but does not exceed unity by more than 22%. For large $C$, $E(H_AH_B)$ given by $x_1$ in (10) approaches $16UV(4U + 1)^{-1}(4V + 1)^{-1} = E(H_A)E(H_B)$, and is always close to this value for $C > 10$.

The model of Kimura and Crow (1964) of an infinite number of neutral alleles has been criticised in that it predicts too many segregating alleles in large populations and that the observed range of heterozygosity in nature corresponds to a very narrow range, say 0.015 to 0.057, of population size × mutation rate values (see e.g. Lewontin, 1974). Ohta and Kimura (1973) proposed a new model of electrophoretically detectable alleles, which gave a predicted heterozygosity of $1 - (1 + 8U)^{-1/2}$ rather than $1 - (1 + 4U)^{-1}$ as in (7). Although heterozygosities approach unity at much higher values of $U$ with this model, the range over which heterozygosity is very sensitive to changes in $U$ is not greatly affected. If the new model were incorporated into the analysis of this paper the total disequilibrium would be reduced at higher values of $U$ and $V$, but would only change in proportion to the heterozygosities, so $\sigma_d^{*2}$ is unlikely to be substantially affected.

There are several alternative ways of describing multiple-allele linkage disequilibria; that used here ($\sum\sum D_{ij}^2$) is one of the simplest but perhaps too great a condensation of the information, certainly if the $D_{ij}$ do not have a mean of zero. In an analysis of chromosomes taken from a population the standard test for disequilibrium would be by chi-square in a two-way contingency table. The expected frequencies in samples of size $n$ are $np_iq_j$ and the observed frequencies are $nf_{ij}$, so the chi-square statistic is $n\sum\sum(D_{ij}^2/p_iq_j)$, with degrees of freedom dependent on the number of alleles segregating. With two alleles at each locus this statistic equals $nD^2/[p(1-p)q(1-q)]$ with 1 df (Hill and Robertson, 1968). The moment formulation gives the ratio of expectations of numerator and denominator, which approximates the required expectation of the ratio where it has been examined (Ohta and Kimura, 1969; Hill, unpublished), and Littler (1973) discusses the conditions where this is likely to occur. The same simplification appears to hold less well with multiple alleles, but analysis of the behaviour of the chi-square statistic requires Monte Carlo simulation. There has been little theoretical study yet of disequilibrium between multiple alleles at loci which have an effect on fitness, yet such disequilibrium can occur, as in the HL-A system of man (e.g., Cavalli-Sforza and Bodmer, 1971, Section 5.11). Thus comparisons between the predictions from neutral and selective models cannot be made at this stage.
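The equivalence between the contingency-table chi-square and $nD^2/[p(1-p)q(1-q)]$ for two alleles at each locus can be confirmed on a small case. The sketch below uses a hypothetical 2 × 2 table of chromosome frequencies and a sample size of 100 chosen only for illustration; it treats the observed frequencies as exact, so it illustrates the algebraic identity rather than any sampling behaviour.

```python
from fractions import Fraction


def chi_square(f, n):
    """n * sum_ij D_ij^2 / (p_i q_j) for chromosome frequencies f[i][j]."""
    p = [sum(row) for row in f]
    q = [sum(col) for col in zip(*f)]
    return n * sum((f[i][j] - p[i] * q[j]) ** 2 / (p[i] * q[j])
                   for i in range(len(p)) for j in range(len(q)))


# Hypothetical two-allele chromosome frequencies, for illustration only.
f = [[Fraction(4, 10), Fraction(1, 10)],
     [Fraction(2, 10), Fraction(3, 10)]]
n = 100

p = f[0][0] + f[0][1]          # frequency of the first allele at locus A
q = f[0][0] + f[1][0]          # frequency of the first allele at locus B
D = f[0][0] - p * q            # disequilibrium between the first alleles
two_allele_form = n * D ** 2 / (p * (1 - p) * q * (1 - q))

print(chi_square(f, n), two_allele_form)
```

The general function also accepts tables with more than two alleles per locus, where the statistic no longer reduces to a function of a single $D$.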

REFERENCES

CAVALLI-SFORZA, L. L. AND BODMER, W. F. 1971. “The Genetics of Human Populations,” Freeman, San Francisco.

EWENS, W. J. 1974. A note on the sampling theory for infinite alleles and infinite sites models, Theor. Pop. Biol. 6, 143-148.

HILL, W. G. 1974. Disequilibrium among several linked neutral genes in finite population. II. Variances and covariances of disequilibria, Theor. Pop. Biol. 6, 184-198.

HILL, W. G. AND ROBERTSON, A. 1968. Linkage disequilibrium in finite populations, Theor. Appl. Genet. 38, 226-231.

KARLIN, S. AND MCGREGOR, J. L. 1967. The number of mutant forms maintained in a population, Proc. Fifth Berkeley Symp. Math. Statist. Prob. 4, 415-438.

KIMURA, M. 1969. The number of heterozygous nucleotide sites maintained in a finite population due to steady flux of mutations, Genetics 61, 893-903.

KIMURA, M. AND CROW, J. F. 1964. The number of alleles that can be maintained in a finite population, Genetics 49, 725-738.

LEWONTIN, R. C. 1974. “The Genetic Basis of Evolutionary Change,” Columbia University Press, New York.

LITTLER, R. A. 1973. Linkage disequilibrium in two-locus, finite, random mating models without selection or mutation, Theor. Pop. Biol. 4, 259-275.

OHTA, T. AND KIMURA, M. 1969. Linkage disequilibrium due to random genetic drift, Genet. Res. 13, 47-55.

OHTA, T. AND KIMURA, M. 1971. Linkage disequilibrium between two segregating nucleotide sites under the steady flux of mutations in a finite population, Genetics 68, 571-580.

OHTA, T. AND KIMURA, M. 1973. A model of mutation appropriate to estimate the number of electrophoretically detectable alleles in a finite population, Genet. Res. 22, 201-204.
