Hum Genet (1992) 88 : 267-272

© Springer-Verlag 1992

Heterozygote deficiency, population substructure and their implications in DNA fingerprinting

Ranajit Chakraborty and Li Jin

Center for Demographic and Population Genetics, The University of Texas Graduate School of Biomedical Sciences, P. O. Box 20334, Houston, TX 77225, USA

Received July 2, 1991 / Revised October 17, 1991

Summary. Substructured populations exhibit an overall deficiency of heterozygosity whose proportional magnitude depends on the nature of substructuring, i.e., the number of subpopulations (s), their time of divergence (t) from the ancestral population, and the rate of gene flow amongst them (m). Since apparent heterozygote deficiency can be caused by many factors other than population substructuring, inferring substructuring from an observed heterozygote deficiency requires examining the nature of substructuring that could produce a deficiency of the observed extent. Using the equivalence of proportional heterozygote deficiency and the coefficient of gene differentiation ($G_{ST}$), we can generate isolines of $G_{ST}$ as functions of s, t (in units of $2N_e$ generations, $N_e$ being the effective population size) and m. Analytical results suggest that large $G_{ST}$ values cannot be reached by substructuring alone, unless the number of subpopulations is large and they remain isolated over a long period of time. Application of the theory to population data on six variable number of tandem repeats (VNTR) loci in US Caucasians and US Blacks demonstrates that the observed heterozygote deficiencies at these loci cannot be explained by substructuring within these populations alone. This is so because such large values of $G_{ST}$ (3%-10%) would require an absence of gene exchange between the subpopulations and a divergence time of at least 25000 years, neither of which is compatible with the demography and ethnohistory of US Caucasians and Blacks. In contrast, the inability to detect extreme-sized alleles and/or incomplete resolution of nearly similar-sized alleles following Southern gel electrophoresis could easily explain the observed heterozygote deficiencies. The implications of these results are discussed in the context of the forensic use of DNA-typing data, and justify the employment of population genetic principles in forensic genetics.

Offprint requests to: R. Chakraborty

Introduction

Population substructuring is an important concept in studying the genetic make-up of a population. A population is said to be substructured when it consists of components (subpopulations) amongst which gene flow is restricted. There could be a complete absence of gene flow between subpopulations because of social, geographic, ecological, or even biological barriers. In such cases, evolutionary changes between subpopulations occur under complete isolation (no gene flow). New mutations arising in certain subpopulations remain "private" (Neel 1973), and genetic differentiation between subpopulations occurs at a speed governed by the mutation rate and the effective subpopulation sizes. However, partial gene flow between subpopulations retards the process of genetic differentiation between them (Nei and Feldman 1972; Chakraborty and Nei 1974; Li 1976; Slatkin 1985).

When substructuring exists and the subpopulations have been separated sufficiently long to permit differentiation, genotypic proportions in the total population cannot be predicted merely from the frequencies of different alleles in the total population, because the genotypic proportions in a subdivided population are also functions of variances and covariances of allele frequencies over all subpopulations (Nei 1965; Li 1969). When each subpopulation is known without ambiguity, the genetic structure of the total population can be specified in full detail, even though the number of parameters needed (e.g., the number of subpopulations and the frequency of each allele in every subpopulation) is generally large. With some summary measures of genetic variation, however, a smaller number of parameters may determine equivalent classes of the genetic structure of a subdivided population. Nei (1973) proposed an analysis that measures the extent of population substructuring by the coefficient of gene diversity ($G_{ST}$). The evaluation of $G_{ST}$ makes no assumption regarding the process or mechanism of evolution, nor does it assume that the subpopulations are replicates of each other.
Nei (1977) also showed that $G_{ST}$ can be expressed as the proportional deficiency of heterozygosity in the total population when the expected heterozygosity is predicted from the allele frequencies employing Hardy-Weinberg expectations (HWE). This single-parameter representation of population substructure also holds for any number of alleles, irrespective of the selective differentials of the alleles (Nei 1975, 1987). The equivalence of $G_{ST}$ with the proportional deficiency of heterozygosity has important implications in

studies other than evolutionary genetics. A possible impact could be on forensic applications of genetic markers for identification purposes. Recently, criticisms have been raised against the use of variable number of tandem repeats (VNTR) polymorphisms for forensic identification of individuals; specifically, it has been argued that the heterozygote deficiencies observed at VNTR loci are caused by population substructuring, which in turn would invalidate the prediction of genotype frequencies from allele frequencies (Lander 1989, 1991; Cohen 1990). However, even though heterozygote deficiency in the total population is an important consequence of population substructuring, the converse may not necessarily be true. Theoretical and empirical studies have repeatedly demonstrated that apparent heterozygote deficiency can be caused by factors other than population substructuring (Milkman and Beatty 1970; Gart and Namm 1984; Skibinski et al. 1983; Devlin et al. 1990). The purpose of this paper is to demonstrate that this reverse logic (i.e., heterozygote deficiency implies population substructuring) is not necessarily correct.

We show this, first, by establishing the nature of substructuring (in terms of the number of subpopulations and their extent of differentiation) that would produce a given level of proportional heterozygote deficiency. Secondly, using data from six VNTR loci studied in US Caucasians and Blacks (Budowle et al. 1991a), we show that the observed heterozygote deficiencies noted at these loci are too high to be explained by population substructuring alone, because the nature of substructuring within US Caucasians and Blacks would be too extreme to be compatible with the history and demography of these human populations. In contrast, other studies show that two simple technical phenomena associated with the Southern blot technique of detecting VNTR polymorphisms (incomplete resolution of similar-sized alleles [Devlin et al. 1990], and the presence of non-detectable alleles [Chakraborty et al. 1992]) can easily explain and predict the observed levels of apparent deficiencies of heterozygosity in such surveys.

Theory

Consider a population that is subdivided into s subpopulations. Let $x_{ik}$ be the frequency of the ith allele in the kth subpopulation at a locus. If $H_k$ represents the proportion of heterozygote individuals in the kth subpopulation, the observed proportion of heterozygote individuals in the total population ($H_0$) is given by

$$H_0 = \sum_k N_k H_k / N, \qquad \text{(Eq. 1)}$$

where $N_k$ is the size of the kth subpopulation and $N = \sum_k N_k$ is the size of the total population. Let $\bar{x}_i = \sum_k N_k x_{ik}/N$ represent the frequency of the ith allele in the total population. Then, under HWE, the expected proportion of heterozygote individuals ($H_E$) in the total population is given by

$$H_E = 1 - \sum_i \bar{x}_i^2. \qquad \text{(Eq. 2)}$$

Nei's coefficient of gene diversity ($G_{ST}$; Nei 1973) is mathematically equivalent to

$$G_{ST} = (H_E - H_0)/H_E, \qquad \text{(Eq. 3)}$$

the proportional heterozygote deficiency in the total population (Nei 1977). It is important to note here that this formulation is in terms of population parameters (number of subpopulations, their sizes and allele frequencies in each subpopulation), although the identity also holds when these parameters are replaced with their respective unbiased estimators (Nei and Chesser 1983; Chakraborty and Danker-Hopfe 1991). This identity also holds irrespective of the number of alleles and selection differentials of alleles (Nei 1986). The implication of this identity is that although substructuring may not be the primary cause of an observed heterozygote deficiency (in relation to HWE expectation) in a population, should it be so, then we can examine what equivalent classes of substructuring (in terms of number of subpopulations and their evolutionary divergence) can produce the observed level of deficiency.

To determine the maximum expected effect of substructuring, under the infinite allele model of neutral mutations, let us consider that the s subpopulations (each of which is at Hardy-Weinberg equilibrium) diverged from their common ancestral population t generations ago, and that they have since then remained in isolation from each other (no gene flow). Furthermore, let us assume that the effective sizes of the subpopulations are the same ($N_e$) as that of the ancestral population, and that each subpopulation is at mutation-drift balance. Under these assumptions, Nei (1975) showed that

$$G_{ST} = \frac{(1 - 1/s)\,J_0(\infty)\,(1 - e^{-2vt})}{1 - J_0(\infty) + (1 - 1/s)\,J_0(\infty)\,(1 - e^{-2vt})}, \qquad \text{(Eq. 4)}$$

where $J_0(\infty)$ is the gene identity within each subpopulation (the probability of identity of two randomly chosen genes from a subpopulation) and v is the mutation rate per generation. The equilibrium assumption within subpopulations further implies that the within-subpopulation gene diversity ($H_S$) can be written as

$$H_S = 1 - J_0(\infty) = 4N_e v/(1 + 4N_e v).$$
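As a rough numerical check of how the identity in Eq. 3 leads to Eq. 4, the following Python sketch builds the observed and expected heterozygosities from gene identities and compares the result with the closed form. It assumes equal subpopulation sizes, treats all quantities as expectations, and takes the identity between two isolated subpopulations to be the within-subpopulation identity multiplied by exp(-2vt); that form, the function names and the parameter values are illustrative assumptions, not taken from the text.

```python
import math


def gst_from_identities(s, n_e, v, t):
    """G_ST via Eq. 3, with H_0 and H_E built from gene identities."""
    # Equilibrium gene identity within a subpopulation, 1 - H_S.
    j_within = 1.0 / (1.0 + 4.0 * n_e * v)
    # Assumed identity between two subpopulations isolated for t generations.
    j_between = j_within * math.exp(-2.0 * v * t)
    # Identity of two genes drawn at random from the pooled population
    # (s equally sized subpopulations).
    j_total = j_within / s + (1.0 - 1.0 / s) * j_between
    h_obs = 1.0 - j_within  # H_0: every subpopulation is at HWE
    h_exp = 1.0 - j_total   # H_E: HWE expectation from pooled frequencies
    return (h_exp - h_obs) / h_exp


def gst_eq4(s, n_e, v, t):
    """Closed form of Eq. 4."""
    j = 1.0 / (1.0 + 4.0 * n_e * v)
    num = (1.0 - 1.0 / s) * j * (1.0 - math.exp(-2.0 * v * t))
    return num / (1.0 - j + num)


if __name__ == "__main__":
    # Hypothetical parameter values chosen only for illustration.
    args = dict(s=5, n_e=10_000, v=1e-4, t=2_000)
    print(gst_from_identities(**args), gst_eq4(**args))
```

Under these assumptions the two functions should agree up to floating-point error, which is one way to see that Eq. 4 is the proportional heterozygote deficiency of Eq. 3 evaluated for the complete-isolation model.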

This leads to $J_0(\infty) = 1 - H_S$ and $2v = H_S/[2N_e(1 - H_S)]$, which, substituted in Eq. 4, yield

$$G_{ST} = \frac{(1 - 1/s)(1 - H_S)\left[1 - e^{-\{H_S/(1 - H_S)\}\,t/2N_e}\right]}{H_S + (1 - 1/s)(1 - H_S)\left[1 - e^{-\{H_S/(1 - H_S)\}\,t/2N_e}\right]}. \qquad \text{(Eq. 5)}$$
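A minimal Python sketch of Eq. 5 may help in getting a feel for the magnitudes involved. The function name and the value chosen for the divergence time are hypothetical; the four levels of within-subpopulation gene diversity are the ones used for the panels of Fig. 1.

```python
import math


def gst_isolation(s, h_s, tau):
    """Eq. 5: G_ST after tau = t/(2*N_e) units of complete isolation."""
    theta = h_s / (1.0 - h_s)  # equals 4*N_e*v at mutation-drift balance
    a = (1.0 - 1.0 / s) * (1.0 - h_s) * (1.0 - math.exp(-theta * tau))
    return a / (h_s + a)


if __name__ == "__main__":
    tau = 0.01  # illustrative divergence time in units of 2*N_e generations
    for h_s in (0.50, 0.75, 0.90, 0.95):
        row = [round(gst_isolation(s, h_s, tau), 4) for s in (2, 5, 10, 30)]
        print(f"H_S = {h_s:.2f}, t/2Ne = {tau}: {row}")
```

Only s, the within-subpopulation gene diversity and the scaled divergence time enter the function, which mirrors the point that the mutation rate and the effective size need not be known separately.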

Therefore, for a given level of gene diversity within subpopulations, the coefficient of gene diversity ($G_{ST}$) is specified by the number of subpopulations (s) and their time of divergence ($t/2N_e$) in units of $2N_e$ generations. Alternatively, Eq. 5 can be written in the form

$$\frac{t}{2N_e} = -\frac{1 - H_S}{H_S}\,\log_e\!\left[1 - \frac{s}{s - 1}\cdot\frac{H_S}{1 - H_S}\cdot\frac{G_{ST}}{1 - G_{ST}}\right], \qquad \text{(Eq. 6)}$$

from which isolines of $G_{ST}$ can be drawn for various combinations of s and $t/2N_e$ for a given level of $H_S$. Equation 6 is further instructive because it also specifies that, for given values of s and $H_S$, not all values of $G_{ST}$ are permissible. This is so because the argument of the logarithmic term of Eq. 6 must be positive, i.e.,

$$\frac{G_{ST}}{1 - G_{ST}} < \frac{s - 1}{s}\cdot\frac{1 - H_S}{H_S}. \qquad \text{(Eq. 7)}$$
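The way Eqs. 6 and 7 work together when tracing an isoline can be sketched in Python as follows. This is only an illustration under the complete-isolation model; the function names are hypothetical, and returning None for combinations excluded by Eq. 7 is a design choice made here, not something prescribed by the text.

```python
import math


def max_gst_odds(s, h_s):
    """Right-hand side of Eq. 7: upper bound on G_ST / (1 - G_ST)."""
    return (s - 1.0) / s * (1.0 - h_s) / h_s


def divergence_time(s, h_s, gst):
    """Eq. 6: t/(2*N_e) needed to reach gst, or None if Eq. 7 excludes it."""
    odds = gst / (1.0 - gst)
    bound = max_gst_odds(s, h_s)
    if odds >= bound:
        return None  # the argument of the logarithm would not be positive
    return -(1.0 - h_s) / h_s * math.log(1.0 - odds / bound)


if __name__ == "__main__":
    # Points on the G_ST = 1% isoline for H_S = 90% (illustrative choice).
    for s in (2, 5, 10, 20, 30):
        print(s, divergence_time(s, 0.90, 0.01))
```

Sweeping s for a fixed target value gives the points of one isoline, and a None result marks the region where no divergence time, however long, produces that level of differentiation.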

This inequality is reasonable, since for a given number of subpopulations (each of which is at mutation-drift balance), as $t \to \infty$ each of them will accumulate new mutations, eventually reaching a state when they share no allele with each other. In this case, the gene diversity in the total population becomes $1 - (1 - H_S)/s$, so that the asymptotic value of $G_{ST}$ becomes

$$G_{ST} \to \frac{(s - 1)(1 - H_S)}{s - 1 + H_S}. \qquad \text{(Eq. 8)}$$
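To illustrate the ceiling that Eq. 8 places on differentiation, the short Python sketch below tabulates the asymptotic value for a few numbers of subpopulations. The function name and the grid of s values are hypothetical choices; the gene diversity levels are those of the Fig. 1 panels.

```python
def gst_limit(s, h_s):
    """Eq. 8: asymptotic G_ST as t -> infinity under complete isolation."""
    return (s - 1.0) * (1.0 - h_s) / (s - 1.0 + h_s)


if __name__ == "__main__":
    for h_s in (0.50, 0.75, 0.90, 0.95):
        limits = {s: round(gst_limit(s, h_s), 4) for s in (2, 5, 10, 30)}
        print(f"H_S = {h_s:.2f}: {limits}")
```

Because s - 1 is always smaller than s - 1 + H_S, the value returned stays below 1 - H_S for any finite s, so highly diverse loci admit only small asymptotic values under this model.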

Since $G_{ST}$ is a monotonically increasing function of t, the inequality in Eq. 7 is always satisfied. Equations 5 and 8 further indicate that, other parameters remaining the same, $G_{ST}$ should generally become smaller as $H_S$ becomes large. In other words, loci with larger mutation rates will generally exhibit higher within-population gene diversity ($H_S$), but the coefficient of gene diversity among subpopulations for such loci will be smaller compared with loci with a lower mutation rate. If the observed proportional heterozygote deficiency is to be ascribed to population substructure alone, this observation must be carefully considered.

Fig. 1a-d. Isolines of $G_{ST}$ for various combinations of number of subpopulations (s) and their time of divergence ($t/2N_e$) under the model of complete isolation. Panels are labeled $H_S$ = 95%, 90%, 75% and 50%; the vertical axis is $t/2N_e$. Isolines are drawn for $G_{ST}$ = 0.1%, 1.0%, 5.0% and 10.0%.

Figure 1 shows isolines of $G_{ST}$ for four selected levels of within-subpopulation gene diversity ($H_S$). If we assume that the subpopulations meet HWE, then these chosen values of $H_S$ also represent the within-subpopulation heterozygosity. The isolines drawn are for four levels of $G_{ST}$, but the figures clearly show that, for $H_S$ > 90%, $G_{ST}$ of the level of 5%-10% cannot be reached with s
