Copyright © 1992 by the Genetics Society of America

The Evolution of Tandemly Repetitive DNA: Recombination Rules

Rosalind M. Harding,* A. J. Boyce† and J. B. Clegg*

*MRC Molecular Haematology Unit, Institute of Molecular Medicine, University of Oxford, John Radcliffe Hospital, Headington, Oxford OX3 9DU, England, and †Department of Biological Anthropology, University of Oxford, Oxford OX2 6QS, England

Manuscript received April 3, 1992
Accepted for publication August 3, 1992

ABSTRACT

Variable numbers of tandem repeats (VNTRs), which include hypervariable regions, minisatellites and microsatellites, can be assigned together with satellite DNAs to define a class of noncoding tandemly repetitive DNA (TR-DNA). The evolution of TR-DNA is assumed to be driven by an unbiased recombinational process. A simulation model of unequal exchange is presented and used to investigate the evolutionary persistence of single TR-DNA lineages. Three different recombination rules are specified to govern the expansion and contraction of a TR-DNA lineage from an initial array of two repeats to, finally, a single repeat allele, which cannot participate in a misalignment and exchange process. In the absence of amplification or selection acting to bias array evolution toward expansion, the probability of attaining a target array size is a function only of the initial number of repeats. We show that the proportions of lineages attaining a targeted array size are the same irrespective of recombination rule and rate, demonstrating that our simulation model is well behaved. The time taken to attain a target array size, the persistence of the target array, and the total persistence time of repetitive array structure, are functions of the initial number of repeats, the rate of recombination, and the rules of misalignment preceding recombinational exchange. These relationships are investigated using our simulation model. While misalignment constraint is probably greatest for satellite DNA it also seems important in accounting for the evolution of VNTR loci including minisatellites. This conclusion is consistent with the observed nonrandom distributions of VNTRs and other TR-DNAs in the human genome.

Genetics 132: 847-859 (November, 1992)

NONCODING DNA sequences with “variable numbers of tandem repeats,” termed VNTRs, include those loci called hypervariable regions (HVRs), minisatellites and microsatellites. Grouping VNTRs together with satellite DNA creates a class of noncoding tandemly repetitive DNA, hereafter denoted as TR-DNA. Satellite TR-DNA regions may also have variable numbers of tandem repeats, but this is difficult to determine. Because VNTRs are extensively polymorphic they can be used to address a wide range of problems in forensic science (JEFFREYS et al. 1991), the determination of family relationships (JEFFREYS, TURNER and DEBENHAM 1991), human gene mapping (NAKAMURA et al. 1987), and population genetics (BAIRD et al. 1986; BALAZS et al. 1989; CHAKRABORTY et al. 1991; DEKA, CHAKRABORTY and FERRELL 1991; FLINT et al. 1989).

The many applications of VNTR loci rest on assumptions about their evolutionary dynamics. Interestingly, a minisatellite VNTR model presented by GRAY and JEFFREYS (1991) seems to sit at odds with general models for TR-DNA, developed before the explosion of interest in VNTRs. Are VNTRs, minisatellites in particular, qualitatively different from other TR-DNAs? The aim of the study reported here is to review the biology of TR-DNA, in particular the VNTR loci, and on this foundation, build a simulation model which enables VNTRs to be examined within a context of TR-DNA evolution.

The evolution of TR-DNA can be viewed within the even broader context of the evolution of repetitive DNA, for which there are a large number of analytical and simulation models. Much attention has been given to the evolutionary forces acting on multigene families (HARDISON 1991; HUGHES 1991; LOOMIS and GILPIN 1986; MAEDA and SMITHIES 1986) and on interspersed repeats such as transposable elements (CHARLESWORTH and LANGLEY 1991; MARUYAMA and HARTL 1991), SINEs and LINEs (BUCHETON 1990; SINGER 1982; ZUCKERKANDL, LATTER and JURKA 1989). Questions about the evolution of repetitive DNA have addressed two broad issues. First, how does recombinational exchange, resolving as gene conversion, promote the evolutionary persistence and spreading of particular repeat lineages across the genome of a species (NAGYLAKI and PETES 1982; NAGYLAKI 1984a,b, 1990; OHTA 1978, 1989; OHTA and DOVER 1983)? Second, what processes of unequal exchange promote or control the variation in numbers of repeats within lineages (KRUGER and VOGEL 1975; PERELSON and BELL 1977; SMITH 1976; TAKAHATA 1981)? The studies addressing the second question are pertinent to the development of a TR-DNA model which can account for VNTR dynamics.


A major aim of these earlier theoretical studies was to determine the balance of evolutionary forces permitting the accumulation and stabilization of large arrays of repeats. The dual parameters of genetic drift and unequal exchange, unbiased toward array expansion by amplification or selection, cannot account for equilibrium distributions of large numbers of repeats (WALSH 1987). Amplification is a generalized term for any mutational process that expands DNA length. We mean by unequal exchange a mutational process of recombination either between homologous chromosomes or within chromosomes between sister chromatids. To counteract the loss of repeats by drift and unequal exchange it is necessary to posit a balancing rate of amplification or positive selection. Unless these evolutionary forces are ongoing, the probability of finding a large tandem array at a potential TR-DNA locus is low. Selection is important in studies applicable to multigene families (TAKAHATA 1981), as is amplification in those of transposable elements and retroposons (CHARLESWORTH and LANGLEY 1991). Either selection or amplification, or both, have also been incorporated into evolutionary models of TR-DNA lineages to account for the accumulation of large stable tandem arrays of satellite DNA (STEPHAN 1986, 1987, 1989; WALSH 1987). An important conclusion of these analyses is that the evolutionary persistence of large arrays critically depends on a balance between a moderate, or at least equivalent, rate of amplification relative to a low rate of unequal exchange.

GRAY and JEFFREYS (1991) have alternatively emphasized that sufficient rates of amplification may be low, even when rates of unequal exchange are high. In their model for the evolutionary dynamics of a minisatellite, MS32, amplification is simulated as a duplication of unique sequence creating an initial array of two tandem repeats. The dynamics subsequent to the single amplification event are modeled as a random walk via a path of unbiased array expansion and contraction to a single repeat evolutionary dead end. Rates of unequal exchange are proportional to allele size and orders of magnitude higher than the low rates determined for stable, persistent TR-DNA by analytical modeling (WALSH 1987). Since there is a moderate probability (in the range of 0.001-0.05) that a long sequence of unequal exchange events is initiated by a single duplication, it seems that a low rate of amplification is sufficient. In this model, short arrays of tandem repeats are generally stable but a tandem-repeat lineage has a reasonable likelihood of amplifying explosively into hypervariability and, as rapidly, decaying. Instead of specifying equilibrium conditions for infinite persistence time, a rate and rule of recombination are defined to generate a short phase of expansion to hypervariability within a much longer period for the existence of at least two repeats.

This model is shown to be compatible with the expansion times of two minisatellite VNTRs in humans from homologous TR-DNA loci in a primate ancestor. The justification for this model is that, while the probability of generating a large and hypervariable tandem array at any particular locus is small, given the vast number of potential TR-DNA loci in the genome, many arrays should be large.

BIOLOGICAL BACKGROUND TO A TR-DNA MODEL

Structure: Minisatellite DNA, HVRs and other VNTRs, including simple-sequence or microsatellite DNA (TAUTZ 1989; WEBER and MAY 1989), share common structural features with highly repeated DNA such as telomeric (BLACKBURN 1990, 1991) and satellite (WILLARD and WAYE 1987) sequences. Their similarities suggest that these DNAs are subject to the same general evolutionary processes of recombinational mutation. Within this class of polymorphic noncoding TR-DNA loci, however, there is a range of repeat unit size and number. The oligonucleotide repeat unit sequences of several to tens of base pairs (JEFFREYS, WILSON and THEIN 1985; WONG et al. 1987) in minisatellite loci are shorter than the typical motifs of satellite DNA (SINGER 1982), but larger than the di- and trinucleotide repeats of microsatellite loci, such as CA repeats (TAUTZ 1989). Also, the average copy numbers per minisatellite allele, in the tens to hundreds, are intermediate between the thousandfold copies of satellite sequences and the few copies comprising a microsatellite allele. This structural variation among TR-DNAs points to differences in the recombinational rules acting on them.

Rules and rates of recombination: Recombination of minisatellite alleles probably occurs during, or after, replication, as DNA slippage or unequal sister chromatid exchange (USCE). The evidence against a role for homologous recombination within minisatellite arrays derives from the nonrandom association of minisatellite alleles on different haplotypes. This has been observed for the insulin 5′HVR (ROTWEIN et al. 1986), the VNTR locus, YNZ22 (WOLFF, NAKAMURA and WHITE 1988), the minisatellite (MS1) locus, D1S7 (WOLFF et al. 1989), the HRAS1 3′VNTR (KASPERCZYK, DIMARTINO and KRONTIRIS 1990) and α-globin 3′HVR alleles (MARTINSON 1991). On the other hand, new evidence from internally mapped minisatellite VNTR alleles suggests that localized recombination between nonidentical homologous alleles does occur, generating both sequence conversion and new length variation (JEFFREYS et al. 1991). However, most new length mutant alleles probably result from misalignment or slippage between sister chromatids. There is no evidence that minisatellite mutation is biased towards the production of either expansion or contraction mutants by intrastrand deletion or amplification (ARMOUR et al. 1989). Also, length changes corresponding to a mean change in allele repeat copy number of 5%, and of up to 200 repeat units, described for the minisatellite MS1 locus (JEFFREYS et al. 1988), indicate that recombinational mutation is probably a consequence of USCE.

An assumption of unbiased USCE is also applicable to the evolutionary dynamics of satellite DNA. DURFY and WILLARD (1989) inferred mutation by unequal exchange from patterns of sequence variation in alpha satellite DNA, a family of centromeric TR-DNAs, sharing a fundamental monomer repeat unit of about 171 bp. Notably, the majority of misalignments were found to be of the order of a few copies of the higher order repeat unit and they concluded that large misalignments were uncommon, occurring at least no more frequently than observed by JEFFREYS et al. (1988) and ARMOUR et al. (1989) for minisatellite DNA. A prevalence of small misalignments, measured in numbers of repeats, can be produced by mechanisms other than unbiased USCE: mechanisms which may be biased toward either contraction or amplification, such as slipped strand mispairing (LEVINSON and GUTMAN 1987) at the replication fork or during DNA strand repair. However, while biased replication or repair slippage is a likely amplification mechanism for generating the small motifs of microsatellite DNA, satellite repeats are probably too large to be subject to this process.

Rates of mutation at TR-DNA loci vary. They are very high for some minisatellite VNTRs, in the range of 10^-4 to 10^-[?] events per allele (JEFFREYS et al. 1988). Rates of mutation also appear to vary for differently sized alleles at the same locus, with larger VNTR arrays being less stable than smaller arrays (JARMAN and WELLS 1989). Also, the most variable and unstable minisatellite loci do seem to be those with the greatest average array length measured in numbers of repeats (JARMAN et al. 1986; JARMAN and WELLS 1989; JEFFREYS, WILSON and THEIN 1985).

Function: Allelic length variation for most noncoding TR-DNA is assumed to be neutral, although this may be contained within functional limits (ZUCKERKANDL, LATTER and JURKA 1989). An architectural role in providing binding sites for protein scaffolding during replication or transcription has been suggested for TR-DNA such as satellite DNA (WELCH et al. 1989), and this may apply to VNTR arrays. An alternative, and much debated, function for minisatellite VNTRs as a recombination signal was first proposed by JEFFREYS, WILSON and THEIN (1985). However, a sequence-specific signaling function shared by all VNTRs seems unlikely given the findings of VERGNAUD (1989), who reported that random short oligonucleotide motifs can detect polymorphic VNTR loci,

and of VASSART et al. (1987) who found that the insert-free, wild-type M13 bacteriophage DNA detects hypervariable TR-DNA loci in the DNA of humans and other animals. Reviews by DOVER (1989) and JARMAN and WELLS (1989) concluded that minisatellite VNTRs are most probably found in regions of high homologous recombination as products, rather than as enhancers, of recombination. Nonetheless, detection of a novel minisatellite-specific DNA-binding protein (COLLICK and JEFFREYS 1990) and the new evidence for exchange between homologous minisatellite alleles (JEFFREYS et al. 1991) revitalize the hypothesis that minisatellite VNTRs have a functional role in recombination. This raises the question of selective maintenance of array size. However, if selection is driving the evolution of minisatellite VNTRs they would persist, along with genes and other functional sequences, in different species lineages after divergence from a common ancestor.

VNTR evolutionary persistence: GRAY and JEFFREYS (1991) investigated several species of primates for the presence of loci homologous with the human MS32 minisatellite locus D1S8 and the MS1 minisatellite locus D1S7. These loci have large unstable tandem arrays in humans, but cross-hybridize to loci with relatively short arrays of several diverged repeats in great apes and Old World monkeys. MS32, but not MS1, also cross-hybridizes to repeats in New World monkeys. Both minisatellites failed to cross-hybridize with prosimian DNA. The prevalence of short arrays indicates that this was the likely ancestral state and that the MS1 and MS32 minisatellites began to expand after the hominid lineage diverged from great apes (GRAY and JEFFREYS 1991). Interestingly, at the MS1 homologous locus in the Colobus monkey, an Old World monkey species, there has been a presumably independent array expansion (GRAY and JEFFREYS 1991).

Other VNTRs, for which there is no indication of a functional role in recombination, show long evolutionary persistence. TR-DNA sequences homologous to the human ζ-globin IVS1 HVR have been found in chimpanzee, goat, horse and mouse, but not in chicken or duck (FLINT, TAYLOR and CLEGG 1988). This suggests that this TR-DNA locus probably originated in the mammalian ancestor after divergence from the avian lineage. Further evidence for a long persistence time derives from the evident homology between the human ζ-globin IVS1 HVR and the tandem-repeat locus found in the IVS1 (first intron) of the pseudo-ζ-globin gene. Also, four 14-bp repeats in IVS1 are identical between horse ζ and pseudo-ζ and a subsequent four repeats are similar. The duplication of the ζ-globin gene is presumed to predate the mammalian radiation (FLINT, TAYLOR and CLEGG 1988). The homology of the ζ-globin IVS1 repeat motif

between human and goat, and also between the functional gene and the pseudogene in humans, implies that, although array expansions may be independent and recent, a precursor array of some several repeats probably existed in a common mammalian ancestor. Whatever the functional significance of TR-DNA arrays, their variable persistence times and array sizes are not consistent with evolution by strong directional or optimizing selection.

TR-DNA chromosomal location: While TR-DNA loci are found on all human chromosomes, minisatellite VNTRs in humans appear to have a biased chromosomal distribution toward telomeres as against interstitial locations (NAKAMURA et al. 1988; ROYLE et al. 1988). However, there are other VNTR families which are widely distributed, such as the M13 family (CHRISTMANN, LAGODA and ZANG 1991) and that of the simple repeat (CAC)5 (NURNBERG et al. 1989). The patterns of “minisatellite” chromosomal distribution in mice (KELLY et al. 1989) and cattle (GEORGES et al. 1991) also show random genomic distributions. These distributions are comparable with those for microsatellite polymorphisms, the class of TR-DNA most greatly dispersed across the human genome. A third pattern of distribution is seen for satellite DNA, which is mainly found in centromeric heterochromatin (WILLARD 1990).

A SIMULATION MODEL FOR TR-DNA

Mutation-driven evolution in lineages of TR-DNA is investigated here using a Monte Carlo simulation model coded in FORTRAN and run on an IBM PS/2 386 computer. The evolutionary process is simulated as a random walk. This model is not formulated numerically, with a predefined probability transition matrix and Markov recurrence relations, but rather iterates through a decision-making process using random numbers. This is because it is easier to specify recombination rules at a fundamental level than to define the probability of mutation between each and every pair of possible allelic states, for each different set of recombination rules. Although a model can only represent an abstracted and simplified biological reality, the important advantage of this Monte Carlo simulation model compared with a numerical simulation or analytical model is that its limitations are due less to assumptions introduced for tractability of implementation than to those introduced because the complex mechanics of recombinational mutation operating at the molecular level are not fully understood.

While primarily used in this study to investigate evolution by unbiased recombinational exchange in single chromosomal lineages, the simulation model is not limited to this purpose. It is a more general tool for the exploratory study of the evolution of TR-DNA and incorporates other features including the flexibility to bias recombination by amplification or intrastrand deletion, and the capacity to follow evolutionary change in populations. The evolutionary dynamics of TR-DNA subject to population processes will be presented elsewhere.

Our simulation model has the constraint that the number of misaligned repeats can never be greater than the length of the progenitor array, n repeats, minus a minimal length necessary for alignment, set at 1 repeat. As the initial arrays are set at 2 repeats and duplication of unique sequence to create repetitive structure is not incorporated in the simulation runs for this study, recombinational mutation ceases in lineages reduced to a single repeat. The three recombination rules which delimit possible misalignments are also further regulated by a probability function for recombinational exchange given an allele array of n repeats and a misalignment of k repeats.

The first rule limits misalignment, k, to a single repeat and is referred to as “single-repeat misalignment,” abbreviated as SR-M. The probability of recombinational exchange equals 1 for misalignments of 1, and 0 for misalignments 2 ≤ k ≤ n - 1, for all allele arrays, n ≥ 2.

The second rule limits maximal misalignment to a constant number of repeats, the “target,” t, set at the beginning of a simulation, and is referred to as “target-maximum misalignment” (TM-M). The probabilities of exchange are a step function, specifying a uniform probability for misalignments 1 ≤ k ≤ t, or 1 ≤ k ≤ n - 1 if n ≤ t, and 0 for misalignments k > t, for all allele arrays, n ≥ 2.

The third rule allows maximal misalignment of up to the total allele length, n, in array size minus 1 repeat and is referred to as “allele-maximum misalignment” (AM-M). This rule was used by GRAY and JEFFREYS (1991) in their study of the evolutionary dynamics of the minisatellite MS32.
They derived from MS32 data a specific probability function for recombinational exchange with a misalignment of i repeats, normalized over all permitted misalignments k where 1 ≤ k ≤ n - 1:

$$\Pr(i) = \frac{(n - i)^{3.4}}{\sum_{k=1}^{n-1} (n - k)^{3.4}}$$
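The three recombination rules can each be read as a probability distribution over the misalignment k for an array of n repeats. The following Python sketch illustrates that reading. It is not the FORTRAN code used in the study; the function name `exchange_probabilities`, its arguments and the dictionary return type are assumptions made for the example.

```python
def exchange_probabilities(rule, n, t=None, exponent=3.4):
    """Return {k: Pr(k)} over misalignments k for an array of n repeats.

    rule     -- "SR-M", "TM-M" or "AM-M" (labels taken from the text)
    t        -- target size, used only as the TM-M misalignment ceiling
    exponent -- power in the MS32 weighting, used only for AM-M
    """
    if n < 2:
        return {}  # a single repeat cannot misalign
    if rule == "SR-M":
        # only a one-repeat misalignment is allowed
        weights = {1: 1.0}
    elif rule == "TM-M":
        # uniform over 1..t, truncated at n - 1 for short arrays
        weights = {k: 1.0 for k in range(1, min(t, n - 1) + 1)}
    elif rule == "AM-M":
        # MS32 weighting: proportional to (n - k)^3.4 over 1..n-1
        weights = {k: float(n - k) ** exponent for k in range(1, n)}
    else:
        raise ValueError(f"unknown rule: {rule}")
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}
```

For example, `exchange_probabilities("AM-M", 3)` weights k = 1 by 2^3.4 and k = 2 by 1, so the one-repeat misalignment is the more probable of the two.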

This MS32 probability function makes small misalignments more likely than large ones. To enable comparison between the results of this study and those of GRAY and JEFFREYS (1991), the MS32 probability function is used here with the AM-M rule.

For each recombination rule, tandem-repeat lineages are followed as they expand from initial arrays of 2 repeats toward a preset target array size and contract until a single repeat allele is generated. Persistence time statistics were calculated on all lineages attaining a target of 20 in a set of 1,500 simulation runs, on all lineages attaining a target of 50 in a set of 3,000 runs, on all lineages attaining a target of 200 in 15,000 runs and on all lineages attaining a target of 500 in 30,000 runs. The total numbers of simulation runs were chosen to ensure approximately 50 target-attaining lineages. These targets represent typical average array sizes for VNTR loci (JEFFREYS et al. 1988). For the target-attaining lineages, the maximum array size attained is recorded.

Lag, gain, dwell and decay phases of persistence time are as described by GRAY and JEFFREYS (1991). The lag is the first phase of expansion to the target, from the start to the last generation at which the array is equal to two repeats, and the gain is the second phase of rapidly increasing array size. The duration of the dwell phase is the number of generations between when an array first equals a target size and the last time in its evolutionary history it is ever equally as large. Contraction of an array is described by the decay from the target to the first time it returns to a size of two repeats and the end phase between when an array of two repeats is finally reduced to a single repeat and becomes extinct. The end phase is the difference between the decay and extinction phases described by GRAY and JEFFREYS (1991). We refer to the “gain-dwell-decay” period as a “dynamic” phase for brevity in the context of describing the simulations.

The simulation model controls the frequency of USCE deterministically, but uses random numbers, first, to generate a misalignment of some variable number of repeats between the replicated sister chromatid arrays, and secondly, to choose the outcome of a sister chromatid exchange with equal probability as an expansion or contraction. The size of the array is updated at each recombination event. The allelic recombination rate, p, is modelled as a linear function of allele array size by accumulating the rate set per repeat, λ, across repeats and generations until it is equal to 1.
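The event scheduling and phase definitions described above can be sketched in Python as follows. This is an illustrative reading only, under several assumptions not stated in the text: the accumulator is reset to zero after each event, runs are capped by a hypothetical `max_events` argument, the default misalignment sampler is SR-M (k = 1), and phase boundaries treat "two repeats" as "two or fewer" and "target" as "target or more" so that rules with larger jumps are also handled. The names `simulate_lineage` and `phases` are invented for the example.

```python
import random

def simulate_lineage(lam, rng, sample_k=lambda n, rng: 1, n0=2,
                     max_events=100_000):
    """Follow one lineage from n0 repeats; return [(generation, size), ...].

    A record is kept for the start and for every recombination event.
    """
    n, gen, acc, events = n0, 0, 0.0, 0
    history = [(0, n)]
    while n > 1 and events < max_events:
        gen += 1
        acc += n * lam            # allelic rate is linear in array size
        if acc >= 1.0:            # deterministic trigger for one exchange
            acc = 0.0             # assumed reset; one crossover per generation
            k = sample_k(n, rng)  # misalignment, expected in 1..n-1
            n += k if rng.random() < 0.5 else -k   # unbiased outcome
            events += 1
            history.append((gen, n))
    return history

def phases(history, target):
    """Durations of lag, gain, dwell, decay and end phases in generations.

    Returns None unless the lineage reached the target and ended at one
    repeat. Assumes the lineage started at two repeats and target > 2.
    """
    gens = [g for g, _ in history]
    sizes = [s for _, s in history]
    hits = [i for i, s in enumerate(sizes) if s >= target]
    if not hits or sizes[-1] != 1:
        return None
    first, last = hits[0], hits[-1]
    # event at which the array leaves two repeats for the last time
    left_two = max(i for i in range(first) if sizes[i] <= 2) + 1
    # first return to two (or fewer) repeats after the target is last held
    back_two = min(i for i in range(last + 1, len(sizes)) if sizes[i] <= 2)
    return {
        "lag": gens[left_two] - gens[0],
        "gain": gens[first] - gens[left_two],
        "dwell": gens[last + 1] - gens[first],
        "decay": gens[back_two] - gens[last + 1],
        "end": gens[-1] - gens[back_two],
    }
```

With these boundary choices the five phases sum to the total persistence time of the lineage, which is a convenient consistency check on any alternative definition.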
Although it is possible for a tandem array to become very large, generating a recombination rate per allele greater than 1, recombination events are constrained to single crossovers per lineage per generation to maintain consistency with WALSH’S (1987) analytical TR-DNA model. The simulations run for this study are based on the high recombination rate per repeat of λ = 9 × 10^-[?] because the random number generators are called in every generation, and a run executes faster if, for the same number of recombination events, they occur over fewer generations. Since persistence times in generations are inversely proportional to mutation rate they can easily be rescaled for lower mutation rates. Multiplying by 1,000 rescales the persistence times to time spans comparable with those computed from the minisatellite VNTR model (GRAY and JEFFREYS 1991). An extra set of simulations was run with the AM-M rule for a recombination rate per repeat twice as high (i.e., λ = 1.8 × 10^-[?] per generation) to check that persistence times were half as long.

Some results based on multiple lineages are initially presented to demonstrate that the persistence time estimates from our simulation model are consistent with those of the analytical TR-DNA model developed by WALSH (1987). In these multiple lineage simulations, as with the single lineage simulations, there are no population dynamics due to stochastic fluctuation in mutation rate or reproductive success (genetic drift). In WALSH’S (1987) analytical TR-DNA model, persistence times are finite, not because of drift, but because recombinational exchange is biased toward contraction by intrastrand deletion, requiring these same conditions in our simulation model. Persistence time is computed for three ratios of a rate of unequal exchange per repeat, λ = 9 × 10^-[?], to rates of intrastrand deletion per repeat, ε = 9 × 10^-[?], ε = 9 × 10^-[?] and ε = 9 × 10^-[?]. The analytical TR-DNA model gives expected persistence times as numbers of recombination events. We too compute persistence times as numbers of recombination events to enable comparison, but more generally report persistence times in numbers of generations to permit comparison with GRAY and JEFFREYS’ (1991) minisatellite VNTR model.

FIGURE 1. Persistence time computed in numbers of recombination events, z, for three ratios of USCE to intrastrand deletion. Solid shaded bars: expected value from analytical TR-DNA model of WALSH (1987). Stippled bars: average over 100 simulation runs using the SR-M rule.

RESULTS

Comparison of our simulation model with WALSH’S (1987) analytical TR-DNA model: Figure 1 demonstrates the concordance between persistence times observed for our simulation model and those expected from WALSH’S (1987) analytical TR-DNA model. Persistence times are measured in recombination events for lineages evolving subject to SR-M and exchange biased by different rates of intrastrand deletion. Persistence time decreases with increased rates of intrastrand deletion. (For all subsequent results, there is no intrastrand deletion and USCE is simulated as an unbiased process.)

FIGURE 2. Lineages which attain targets of 20, 50, 200 and 500 as percentages of total simulation attempts and compared for recombination rule. (Horizontal axis: target array size.)

Array expansion: Figure 2 shows that percentages of lineages attaining preset targets do not vary substantially between different recombination rules, in accordance with the expectations of an unbiased random walk model (KARLIN and TAYLOR 1975). Approximately 4-5% of tandem-repeat lineages starting from 2 repeats attain preset targets of at least 20, and an order of magnitude fewer (0.4-0.5%) lineages attain targets of at least 200 repeats. The same proportionality is evident for targets of 50 and 500 repeats with 1-2% of lineages attaining 50 and 0.1-0.2% attaining 500 repeats. The numbers of simulation attempts taken to attain each target and the percentages that the target-attaining lineages comprise out of the total attempts are given in Table 1.

Persistence time with different recombination rates: Mean persistence time in numbers of recombination events, and means, standard deviations and medians of lag, gain, dwell and end phases in generations are reported in Table 1. Note that the distributions of phase duration are positively skewed. For the AM-M rule, statistics are given for the set recombination rate doubled (AM-M′) as well as for the set rate (AM-M*). Comparing persistence times T(z) for the set and doubled rates indicates that they are slightly reduced by decreasing the mutation rate, consistent with expected rounding error. However, the effect is minimal and the phase estimates are comparable to those reported by GRAY and JEFFREYS (1991). Assuming AM-M, the average persistence time T(z) for lineages attaining 500 repeats is less than 300 recombination (USCE) events. Since this lineage expansion occurs with a probability of approximately 0.2% (Table 1), for one such lineage expansion there have also been 500 single-repeat duplications. This is consistent with the findings of WALSH (1987) that rates per repeat of amplification must be greater than, or at least equal to, rates per repeat of USCE.
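The proportions quoted above can be checked against the standard result for an unbiased random walk with absorbing boundaries. For single-repeat steps starting at 2 repeats, with absorption at 1 repeat and at a target of T repeats, the probability of reaching the target first is (2 - 1)/(T - 1). The Python sketch below evaluates this expression and compares it with a small Monte Carlo run; the function names are hypothetical, the simulation covers only the SR-M case, and for rules with larger misalignments the formula should be read as an approximate upper bound because a lineage can overshoot the target.

```python
import random

def p_attain(target, start=2):
    """Probability that an unbiased +/-1 walk from `start` reaches
    `target` before being absorbed at a single repeat."""
    return (start - 1) / (target - 1)

def simulate_fraction(target, runs, seed=1):
    """Monte Carlo estimate of the same probability (SR-M steps only)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        n = 2
        while 1 < n < target:
            n += 1 if rng.random() < 0.5 else -1
        hits += n >= target
    return hits / runs

def duplications_per_success(target, start=2):
    """Expected number of founding duplications per target-attaining lineage."""
    return 1.0 / p_attain(target, start)
```

Evaluating `p_attain` for targets of 20, 50, 200 and 500 gives 1/19, 1/49, 1/199 and 1/499, that is, about 5.3%, 2.0%, 0.50% and 0.20%, in line with the simulated percentages. `duplications_per_success(500)` is 499, which matches the figure of roughly 500 duplications per lineage attaining 500 repeats.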
Persistence time with different recombination rules: With a recombination rate of λ = 9 × 10^-[?] the AM-M rule generates a persistence time of approximately 1,470 generations and there is not much variation between targets (Figure 3). A trade-off that allows the number of recombination events to a target to be less if the number of repeats per misalignment step is greater, equalizes the expansion and contraction times for differently targeted lineages. Consequently, persistence time does not appear to greatly differ between targets (Figure 3A). However, dwell times at a targeted array size are proportionately smaller for larger targeted TR-DNA lineages (Figure 3B). This is because for bigger arrays, larger misalignments in a single recombination event are possible, increasing the probability of contraction to a single repeat. The brief duration of the dwell phase in the evolutionary history of a TR-DNA lineage attaining a target of 200 repeats, assuming AM-M, is pictured in Figure 3C. The dwell phase is the number of generations between the first and last times that the array is larger than 200 repeats.

Persistence times assuming TM-M vary greatly between targets. Whereas for the targets, 20 and 200, TR-DNA persistence times are similar to each other, for the targets 50 and 500, respectively, a very much longer, and a very much shorter, average persistence time has resulted (Table 1). Figure 4, comparing A and B, shows that the considerable variation is mainly in the dwell phase at the target. It can be explained as a consequence of the equal probability weighting of TM-M. Allowing greater misalignment with uniform probabilities for misalignment length, TR-DNA array persistence is highly unpredictable, and may be short, as for the set of lineages subject to constraint at 500 repeats, but may be very long, as occurred in the set of lineages constrained at 200 repeats. The duration of the dwell phase in the evolutionary history for one of the short TR-DNA lineages attaining the target of 200 repeats by TM-M is shown in Figure 4C.

Figure 5A shows that for SR-M, unlike the AM-M and TM-M rules, lineages attaining larger target sizes generally take longer to do so. This is because there is no trade-off for the number of recombination events to a target with misalignments constrained to one repeat. Persistence times are one to two orders of magnitude longer for SR-M compared with AM-M and TM-M. In comparison with AM-M, dwell time assuming maximal constraint is longest for the largest target (Figure 5B). Figure 5C shows a typical evolutionary history of array expansion to a target of 200 repeats with SR-M.

As duplicate arrays are vulnerable to contraction to one repeat with consequent extinction of the TR-DNA lineage, their evolutionary life span is short unless they expand. Of lineages that do expand to targets, the greatest part of their evolutionary history, regardless of recombination rule, is dynamic with

TABLE 1. Numbers of simulation attempts, percentages of target-attaining lineages, mean persistence time in numbers of recombination events, and means, standard deviations and medians of lag, gain, dwell and end phases in generations, for each recombination rule and target array size. [Table values not recoverable from the source.]

m-

'p!

'p!

-

o m

corn--

a"

c o m a 0 P-P-vm C? m

- 0

? ? 'J

P-*Q,o P - - m m

m a o o

e

-

3=- m-

e v

m m 2: m m -m*

22-22

h

'p!??, m - m 0

a!":y,

4 " -

.c

r

P-

& k & & "mm m m a Or-002 * m m 0 x

2 2 2e .i 5E m$ 'p!

2"

*-?m-

E % n

S?D: m*-m

- - h-- h

?g2

2 g 5 $.;$ & v ?'p! 2 52

P-000-

m-

-4 3E %$

v

*mo&

a*mm -ma* m* 00

3

.-8

"

m

In -.

- & m m m m a m - m

gY

2%- m

4

PW

-g

m m

" "

"

h

'p!

m a

m a - a m m m

m a - a m m m

m a - a m m m

-

": ": c? moo00 0-c?

$22

m m

- m

0 0 0 0 0

m -*-mm -P- m o m r - a m o*m 0 0 m m

gg

bmgk o z u w

="+$E

.r mUx s m

ca,

:$a's E yi

5.2'J

$zF: E F E L 3 j k Z L % % T E !%% EE 9 L% % E E E $ : 8 Z Z E $ : & E E E & &bz E S & .c o m . 5

c

O,c1!0

tri

$ d d t - m $ < < k m

$ < < + m

t-

F

k

$ d e+ m " y u e

t-

2 2 &

~

R. M. Harding, A. J. Boyce and J. B. Clegg

854

500

500

f

II

200

t 50

II

20 "

1000 0

500

1500

200025000 20000 15000 0 10000 5000

generations

generations

0 dynamic 0 end

lag

0 dynamic 0 end

lag

,B 500

f

I

0

1000

500

1500

2000

0

25000 20000 5000 15000 10000

generations

w expand

0 dwell

generations

w expand

0 contract

,c

611

0 dwell

contract

.c

51 11

41 21

31

71

81

71 1 61 1151

21 41

31

generations

generations

FIGURE3.-Persistence times in generations averaged over sets o f lineages attaining each target. 1, with recombinational mutation subject to AM". (A) Length of dynamic (gain dwell decay) phase highlighted against shaded lag and end phases. (B) Length of dwell phase at target array size highlighted against shaded expansion (lag + gain) and contraction (decay end) phases. (C) Array size evolution of a typical lineage in the set attaining a targetof 200 repeats.

FIGURE4.-Persistence times in generations averaged over sets of lineages attaining each target, f, with recombinational mutation subject to TM-M. (A) Length of dynamic (gain + dwell + decay) phase highlighted against shaded lag and end phases. (B) Length of dwell phase at target array size highlighted against shaded expansion (lag + gain) and contraction (decay + end) phases. ( C ) Array size evolution of a typical lineage in the set attaining a target of 200 repeats.

varying numbers of three or more repeats (Figure 6A). The duration of evolutionary time that alineage can dwell at a large target size is,however, dependent on recombination rule,and is consistently greatest with SR-M. Dwell times may, however, also dominate with TM-M (Figure 6B).
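Why larger targets take longer under SR-M can be illustrated with a minimal sketch that counts exchange events rather than generations. It assumes that every event adds or removes exactly one repeat with equal probability, starting from two repeats; the function names are hypothetical, and converting events to generations would need the per-repeat exchange rate, which is not modelled here. The simulated mean is compared with the textbook conditional duration of a symmetric one-step walk, (N^2 - i^2)/3, where N = target - 1 and i = start - 1.

```python
import random
from typing import Optional


def events_to_target(target: int, rng: random.Random, start: int = 2) -> Optional[int]:
    """Count single-repeat exchange events until the array reaches `target`.

    Each event adds or removes exactly one repeat with equal probability
    (an assumed reading of SR-M). Returns None if the array contracts to a
    single repeat first, which ends the lineage.
    """
    size, events = start, 0
    while 1 < size < target:
        size += 1 if rng.random() < 0.5 else -1
        events += 1
    return events if size >= target else None


def mean_events_to_target(target: int, lineages: int, seed: int = 0) -> float:
    """Average event count over the simulated lineages that attain the target."""
    rng = random.Random(seed)
    counts = [events_to_target(target, rng) for _ in range(lineages)]
    attained = [c for c in counts if c is not None]
    return sum(attained) / len(attained)


def expected_events(target: int, start: int = 2) -> float:
    """Conditional duration of a symmetric +/-1 walk: (N^2 - i^2) / 3."""
    n, i = target - 1, start - 1
    return (n * n - i * i) / 3


for t in (5, 10, 20):
    print(t, round(mean_events_to_target(t, 20000), 1), round(expected_events(t), 1))
```

Under these assumptions the number of events grows roughly with the square of the target size, which is consistent with the absence of any trade-off between step size and number of steps when misalignment is fixed at one repeat.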

FIGURE 5.-Persistence times in generations averaged over sets of lineages attaining each target, t, with recombinational mutation subject to SR-M. (A) Length of dynamic (gain + dwell + decay) phase highlighted against shaded lag and end phases. (B) Length of dwell phase at target array size highlighted against shaded expansion (lag + gain) and contraction (decay + end) phases. (C) Array size evolution of a typical lineage in the set attaining a target of 200 repeats.

FIGURE 6.-Proportional persistence times comparing recombination rules. (A) Length of dynamic (gain + dwell + decay) phase highlighted against shaded lag and end phases. (B) Length of dwell phase at target array size highlighted against shaded expansion (lag + gain) and contraction (decay + end) phases.

DISCUSSION

In this study we have presented a simulation model for the evolution of TR-DNA that treats both recombination rule and rate as variables. The aim of the study was to investigate different recombination rules and we have used a computer simulation model because of the flexibility allowed to explore alternative molecular constraints on misalignment. However, simulations provide results less amenable to critical inspection and replication than analytical models, and so we have built upon a foundation established by others. As well as reporting results to show concordance with preceding studies, we have also presented some results regarding distributions of array size, although they could be more elegantly demonstrated by an analytical approach. Since we have reached the same conclusions by simulation as can be done analytically, credence is given to our conclusions on the consequences of recombination rule for persistence time. The latter remain unconfirmed by analytical modeling.

For the record, comparable results of both our TR-DNA simulation model and GRAY and JEFFREYS' (1991) simulation model for the minisatellite MS32 show concordance with WALSH'S (1987) analytical TR-DNA model. GRAY and JEFFREYS' (1991) assumption of a duplication event to initiate VNTR evolution is consistent with the conclusions of WALSH'S (1987) analytical TR-DNA model that amplification, possibly as duplication of unique sequence, is a critical process in TR-DNA evolution. Despite GRAY and JEFFREYS' (1991) conclusion that under their stochastic model of unbiased USCE it is not necessary to invoke additional processes such as saltatory amplification, it is necessary, in accordance with WALSH (1987) and others, and they did invoke one by assuming an initiating duplication event. The important conclusion to be made from GRAY and JEFFREYS' (1991) model is that using generations as units of evolutionary time, instead of numbers of recombination events, clearly demonstrates that the evolutionary persistence of a random walk process can trace back to distant ancestors in the phylogenetic history of an extant species. The evolutionary persistence of accumulated TR-DNA, even though greater in duration than the evolutionary span of a species, may nonetheless be transient, and does not have to be modelled by an equilibrium balance between recombination and either amplification or selection.

Distributions of array size suggest a test for biased array expansion: Varying the rules on misalignment constraint alters only the persistence times over which TR-DNA arrays accumulate and decay and not the array sizes that TR-DNA lineages may attain. That the probability of an array exceeding some size is independent of the misalignment step is a result demonstrable by the theory of stochastic processes using Wald's Identity (KARLIN and TAYLOR 1975). Let the change of array size in time, x(t), be modeled by a nonsymmetrical random walk beginning at x(0) with an absorbing boundary at one. In the absence of selection or amplification, we expect the probabilities of expansion and contraction to be equal, P = 1/2, which allows us to represent x(t) by its mean value, x(0). If at time 0, x(0) = 2, the initial array size, we can calculate the probability that the array at some future time will attain the target before contracting to 1 as: [x(0) - 1]/[target - 1]. The probabilities of attaining the targets, 20, 50, 200 and 500 are 5.2%, 2%, 0.5% and 0.2%, respectively. Both our simulation probabilities and those calculated by GRAY and JEFFREYS (1991) are biased downwards by rounding error. Nonetheless, an unbiased stochastic model, however formulated, gives an expected ratio of small to large TR-DNA arrays in the genome. This ratio has the form of 2,000 2-repeat TR-DNA loci : 100 20-repeat VNTRs : 10 200-repeat VNTRs : 1 2,000-repeat VNTR locus.

PERELSON and BELL (1977) also used an unbiased random-walk model to calculate the probabilities of attaining array sizes greater than an initial size. They assumed an absorbing boundary at zero rather than one and formulated the probability of finding an expanded array as a function of time. This probability was shown to be maximized soon after beginning the random walk. At this time point the probability of finding an array of at least 10, 50, 100 or 500 repeats is 3.9%, 0.74%, 0.37% or 0.074%, respectively.

The observed proportionality for unbiased stochastic models could be used as a null expectation in tests for amplification or selection bias. For instance, an experimental approach that enabled observation of array expansion of the same small initial TR-DNA locus in multiple clonal lineages of yeast or some other model organism would enable an estimate of the proportion of lineages that expand. An alternative test, which seems reasonable in view of the nondependence of array expansion probabilities on recombination rule, would be to survey TR-DNA loci across a genome.
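The attainment probabilities and the genome-wide ratio quoted above follow directly from the expression [x(0) - 1]/[target - 1]. The short sketch below evaluates it with exact fractions; the function name `prob_attain` is an assumption introduced for illustration, and the values it prints agree with the percentages in the text only up to rounding.

```python
from fractions import Fraction


def prob_attain(target: int, start: int = 2) -> Fraction:
    """Chance that an unbiased walk from `start` reaches `target` before 1."""
    return Fraction(start - 1, target - 1)


# Probability of attaining each simulated target from a two-repeat array.
for target in (20, 50, 200, 500):
    print(f"target {target}: {float(prob_attain(target)) * 100:.2f}%")

# Expected number of loci of each size per single 2,000-repeat locus.
reference = prob_attain(2000)
for size in (2, 20, 200, 2000):
    print(size, round(float(prob_attain(size) / reference)))
```

A survey of TR-DNA loci that departed strongly from these proportions, with an excess of large or intermediate arrays, would be the kind of evidence for amplification or selection bias that the null expectation is meant to expose.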
A much higher proportionality of large and intermediate VNTR arrays to duplicate TR-DNA loci than expected for a stochastic model would indicate the importance of amplification or selection bias in tandem array expansion.

Although array amplification has been modelled by the duplication of unique motifs in this and other (WALSH 1987; GRAY and JEFFREYS 1991) studies, it is expected that the arrays themselves are subject to duplication and perhaps also to multiplication (possibly by rolling-circle replication). These processes may permit amplification of a large amount of TR-DNA by sporadic and widely dispersed expansion events in the genome. Low rates of saltatory amplification could inject the same amount of TR-DNA as a moderate rate of ongoing duplication. A model that incorporates an ongoing rate of duplication requires two boundaries, the first reflecting to amplify single repeats and the second, either reflecting or absorbing, to prevent arrays from expanding infinitely. This barrier can be accounted for by selective constraint on array expansion. But, before we build upon this kind of equilibrium model, it would be appropriate to confirm by observation the hypothesis that amplification is a continuous process of duplication occurring at rates similar to rates of recombination. Molecular biology has not yet provided any such evidence.

Persistence times are a function of recombination rule: While array size is not dependent on recombination rule, evolutionary persistence time is. Regardless of the array size, our simulations suggest that tandem arrays having long persistence times are under much greater misalignment constraint than tandem arrays with short persistence times. AM-M allows misalignments to be very large when tandem arrays are large. If large arrays can greatly misalign, they can expand to large sizes in short times, but they can also go extinct very easily, and large array sizes are transient. These evolutionary dynamics were observed despite modification of the AM-M rule by a probability function which decreases recombinational exchange for increasing misalignment length. This was the rule used by GRAY and JEFFREYS (1991) to represent the transience of hypervariable minisatellites using the example of MS32. They showed that if a small array of tandem repeats at the MS32 locus began to expand after divergence of the Homo lineage from the other great apes, the 700,000 generation time span since is long enough for a large array of 200 repeats or more to have been generated, assuming a rate of exchange per repeat of λ = 9 × [...]. However, since a 200 repeat array mutating by AM-M is not very stable, if this rule is operating, MS32 arrays are probably now contracting.

The TM-M rule uses a step function for describing the probabilities of different misalignment lengths.
Only arrays smaller than the preset target are vulnerable to one-step extinction. As with AM-M, short arrays are unstable and may either go extinct or expand quickly. However, arrays that expand to sizes greater than the target are protected from one-step extinction and may remain large for longer dwell phases compared with the AM-M rule. There has been time since the divergence of humans and great apes for a minisatellite locus to have expanded by TM-M, and if so still to be in a dwell phase. TM-M works as an alternative and simpler rule to model the transience of hypervariable minisatellites. The occurrence of misalignment constraint would be consistent with observations of asymmetry in the location of deletion events in minisatellite arrays. JEFFREYS, NEUMANN and WILSON (1990) observed that the internal repeat structures of MS32 minisatellite alleles indicated greater stability of the 5' ends and a gradient of increasing variability toward the 3' ends. However, more data on the size and nature of mutations in minisatellite alleles are needed to evaluate whether a step function is a reasonable approximation of misalignment distributions.

With the SR-M rule, misalignment is maximally constrained. Consequently TR-DNA expansion to, and contraction from, a large size takes a long time. The gain plus dwell time consequently far exceeds the expected persistence time for the MS32 minisatellite, unless the mutation rate per repeat is three orders of magnitude greater than that suggested by GRAY and JEFFREYS (1991). Assuming recombination is constrained to SR-M, an average generation span of 10 years, and a rate of exchange per repeat of λ = 9 × [...], persistence times would be of the order of 50 million years. This result indicates recombination constrained to SR-M may account for the evolutionary persistence of the TR-DNA loci in introns of the ζ-globin gene and pseudogene. With moderate rates of recombination but maximal constraint on misalignment, a TR-DNA locus may not only have a long persistence time, but also become a polymorphic VNTR.

The persistence of satellite DNA (WALSH 1987) is also consistent with great constraint on misalignment. Our TR-DNA model predicts that satellite arrays would mutate in steps of single or few repeats. Also, the USCE rate, while not as high as that at minisatellite VNTRs, may be orders of magnitude higher than nucleotide substitution rates. These rates may indeed be high enough to generate VNTR polymorphism in satellite arrays. Our model applied to satellite DNA contrasts with that of WALSH (1987) which predicts a very low rate of unequal exchange consistent with observations that satellite DNA is found in regions of low homologous recombination. Both our model and that of WALSH (1987) assume amplification as a duplication process.
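The contrast between the three misalignment rules discussed above can be sketched as a function that draws a misalignment length for an array of a given size. This is an illustrative reading only, not the authors' code: the function names, the 1/k weighting used for AM-M, and the exact cap used for TM-M (target - 2, chosen so that only arrays strictly smaller than the target can be lost in one step) are all assumptions.

```python
import random


def tm_cap(target: int) -> int:
    """Assumed longest misalignment under TM-M (hypothetical choice)."""
    return max(1, target - 2)


def misalignment_length(rule: str, size: int, target: int, rng: random.Random) -> int:
    """Draw the number of repeats by which two arrays misalign before exchange.

    Illustrative readings of the three rules:
      SR-M: always one repeat.
      TM-M: uniform over 1 .. min(size - 1, cap), a step function.
      AM-M: any length 1 .. size - 1, weighted 1/k so long shifts are rarer.
    """
    longest = size - 1  # a shift of size - 1 can leave a single repeat
    if longest < 1:
        raise ValueError("a single repeat cannot misalign")
    if rule == "SR-M":
        return 1
    if rule == "TM-M":
        return rng.randint(1, min(longest, tm_cap(target)))
    if rule == "AM-M":
        lengths = list(range(1, longest + 1))
        weights = [1 / k for k in lengths]  # assumed decreasing weighting
        return rng.choices(lengths, weights=weights)[0]
    raise ValueError(f"unknown rule: {rule}")


def one_step_extinction_possible(rule: str, size: int, target: int) -> bool:
    """True if a single exchange could shrink the array to one repeat."""
    if rule == "SR-M":
        return size == 2
    if rule == "TM-M":
        return size - 1 <= tm_cap(target)
    return True  # AM-M: any array can misalign along its whole length


rng = random.Random(0)
for rule in ("SR-M", "TM-M", "AM-M"):
    print(rule, misalignment_length(rule, 300, 200, rng),
          one_step_extinction_possible(rule, 300, 200))
```

Under this reading, an array of 300 repeats with a target of 200 can be lost in a single exchange only under AM-M, which is the qualitative difference the text uses to explain the longer dwell phases seen with TM-M and SR-M.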
Arguably, a general TR-DNA model would better account for satellite DNA if the amplification rule was also a variable and could alternatively occur as a saltatory burst generating a large initial array. However, given that large satellite arrays exist, choosing the most appropriate rule for their current dynamics may be resolved as follows. If satellite arrays are experiencing very little recombination they should show relatively uniform sequence divergence between any two member repeats of the array. On the other hand, if satellite DNA is evolving by a moderate rate of unequal exchange, assuming SR-M, then repeats close to each other should show greater similarity than repeats that are far apart. No doubt, molecular data pertaining to these expectations will soon be available to test between them.

Consequences of chromosomal location for TR-DNA evolution: There are different structural levels at which to look for the key factors in the instability of a short TR-DNA locus and its potential to expand quickly as a VNTR. At a local level, the sequence of the repeat motif may be important in conferring instability and increasing the rate of recombinational mutation (MITANI, TAKAHASHI and KOMINAMI 1990). Alternatively, the general chromosomal location of a TR-DNA locus may account for differences in misalignment constraint. Chromosomal location has been shown to be critical for the activation of a recombinational hot spot in the fission yeast Schizosaccharomyces pombe (PONTICELLI and SMITH 1992). The importance of misalignment constraints for the evolution of satellite and minisatellite DNAs, as shown by simulation modelling, and the differential patterns of genomic distributions of satellite DNA at centromeres and minisatellite DNA near telomeres, suggest that misalignment constraint varies with chromosomal location. In fact, chromosomal neighbourhood may be at least as important as motif sequence for the expansion of some TR-DNA loci as hypervariable minisatellite VNTRs, while other TR-DNA is much more stable.

Assuming a stochastic model for TR-DNA evolution with an absorbing boundary at arrays reduced to one repeat implies that there must be large numbers of potential TR-DNA sites in the genome, and that those which expand as VNTRs will be unrelated by sequence similarity. However, if there are source TR-DNA sequences with long persistence times due to SR-M constraint, many descendant VNTRs may show sequence relationships. The motif similarities shared by families of minisatellite VNTRs suggest their common descent from an old and persisting ancestral TR-DNA locus. A related VNTR occurring within an intron and constrained to SR-M by its location would be a candidate for the ancestral locus.
The dispersal of the minisatellite-related VNTRs across chromosomes suggests the action of DNA-mediated transposition within and between chromosomes, particularly near chromosome telomeres (WONG, ROYLE and JEFFREYS 1990). A strategy for the detection of multilocus minisatellites for DNA fingerprinting may be to start with a TR-DNA probe which is known to have had a long persistence time in the genome of the species of interest, rather than a minisatellite probe from a different species.

Conclusions: Our simulation model places minisatellite VNTR evolution within the general context of TR-DNA evolution. This has been achieved by incorporating recombination rule as a variable equally important as the rate of recombinational exchange. By subjecting misalignment to varying degrees of constraint the evolution of all classes of TR-DNA can be accounted for. It is suggested that satellite DNA evolves under greatest misalignment constraint but at a moderate rate with most mutation steps consisting of single repeats. Hypervariable minisatellite VNTRs may be characterized by their release from misalignment constraint, yet even so, the region of exchange and mutability within the array is probably small. Unconstrained USCE as modelled by the AM-M rule may be less likely than a constrained TM-M rule. We emphasize the concordance of results achieved by both analytical and simulation modelling of TR-DNA evolution. The abstraction of rule as well as rate suggests new ways of understanding the chromosomal distribution of, and relationships between, TR-DNA loci in the genome.

We thank J. FLINT, J. J. MARTINSON and T. E. A. PETO for discussion throughout the study and our anonymous reviewers for their criticisms and suggestions.

LITERATURE CITED

ARMOUR, J. A. L., I. PATEL, S. L. THEIN, M. F. FEY and A. J. JEFFREYS, 1989a Analysis of somatic mutations at human minisatellite loci in tumours and cell lines. Genomics 4: 328-334.
BAIRD, M., I. BALAZS, A. GIUSTI, L. MIYAZAKI, L. NICHOLAS, K. WEXLER, E. KANTER, J. GLASSBERG, F. ALLEN, P. RUBINSTEIN and L. SUSSMAN, 1986 Allele frequency distribution of two highly polymorphic DNA sequences in three ethnic groups and its application to the determination of paternity. Am. J. Hum. Genet. 39: 489-501.
BALAZS, I., M. BAIRD, M. CLYNE and E. MEADE, 1989 Human population genetic studies of five hypervariable DNA loci. Am. J. Hum. Genet. 44: 182-190.
BLACKBURN, E. H., 1990 Telomeres and their synthesis. Science 249: 489-490.
BLACKBURN, E. H., 1991 Structure and function of telomeres. Nature 350: 569-573.
BUCHETON, A., 1990 I transposable elements and I-R hybrid dysgenesis in Drosophila. Trends Genet. 6: 16-21.
CHAKRABORTY, R., M. FORNAGE, R. GUEGUEN and E. BOERWINKLE, 1991 Population genetics of hypervariable loci: analysis of PCR based VNTR polymorphism within a population, pp. 127-143 in DNA Fingerprinting: Approaches and Applications, edited by T. BURKE, G. DOLF, A. J. JEFFREYS and R. WOLFF. Birkhauser Verlag, Basel, Switzerland.
CHARLESWORTH, B., and C. H. LANGLEY, 1991 Population genetics of transposable elements in Drosophila, pp. 150-176 in Evolution at the Molecular Level, edited by R. K. SELANDER, A. G. CLARK and T. S. WHITTAM. Sinauer Associates, Sunderland, Mass.
CHRISTMANN, A., P. J. L. LAGODA and K. D. ZANG, 1991 Nonradioactive in situ hybridization pattern of the M13 minisatellite sequences on human metaphase chromosomes. Hum. Genet. 86: 487-490.
COLLICK, A., and A. J. JEFFREYS, 1990 Detection of a novel minisatellite-specific DNA-binding protein. Nucleic Acids Res. 18: 625-629.
DEKA, R., R. CHAKRABORTY and R. E. FERRELL, 1991 A population genetic study of six VNTR loci in three ethnically defined populations. Genomics 11: 83-92.
DOVER, G. A., 1989 DNA fingerprints: victims or perpetrators of DNA turnover? Nature 342: 347-348.
DURFY, S. J., and H. F. WILLARD, 1989 Patterns of intra- and interarray sequence variation in alpha satellite from the human X chromosome: evidence for short range homogenization of tandemly repeated DNA sequences. Genomics 5: 810-821.
FLINT, J., A. M. TAYLOR and J. B. CLEGG, 1988 Structure and evolution of the horse zeta globin locus. J. Mol. Biol. 199: 427-437.
FLINT, J., A. J. BOYCE, J. J. MARTINSON and J. B. CLEGG, 1989 Population bottleneck in Polynesia revealed by minisatellites. Hum. Genet. 83: 257-263.
GEORGES, M., A. GUNAWARDANA, D. W. THREADGILL, M. LATHROP, I. OLSAKER, A. MISHRA, L. L. SARGEANT, A. SCHOEBERLEIN, M. R. STEELE, C. TERRY, D. S. THREADGILL, X. ZHAO, T. HOLM, R. FRIES and J. E. WOMACK, 1991 Characterization of a set of variable number of tandem repeat markers conserved in Bovidae. Genomics 11: 24-32.
GRAY, I. C., and A. J. JEFFREYS, 1991 Evolutionary transience of hypervariable minisatellites in man and the primates. Proc. R. Soc. Lond. B 243: 241-253.
HARDISON, R. C., 1991 Evolution of globin gene families, pp. 272-289 in Evolution at the Molecular Level, edited by R. K. SELANDER, A. G. CLARK and T. S. WHITTAM. Sinauer Associates, Sunderland, Mass.
HUGHES, A. L., 1991 Evolutionary origin and diversification of the mammalian CD1 antigen genes. Mol. Biol. Evol. 8: 185-201.
JARMAN, A. P., and R. A. WELLS, 1989 Hypervariable minisatellites: recombinators or innocent bystanders? Trends Genet. 5: 367-371.
JARMAN, A. P., R. D. NICHOLLS, D. J. WEATHERALL, J. B. CLEGG and D. R. HIGGS, 1986 Molecular characterisation of a hypervariable region downstream of the human α-globin gene cluster. EMBO J. 5: 1857-1863.
JEFFREYS, A. J., R. NEUMANN and V. WILSON, 1990 Repeat unit sequence variation in minisatellites: a novel source of DNA polymorphism for studying allelic variation and mutation by single molecule analysis. Cell 60: 473-485.
JEFFREYS, A. J., M. TURNER and P. DEBENHAM, 1991 The efficiency of multilocus DNA fingerprint probes for individualization and establishment of family relationships, determined from extensive casework. Am. J. Hum. Genet. 48: 824-840.
JEFFREYS, A. J., V. WILSON and S. L. THEIN, 1985 Hypervariable 'minisatellite' regions in human DNA. Nature 314: 67-73.
JEFFREYS, A. J., N. J. ROYLE, V. WILSON and Z. WONG, 1988 Spontaneous mutation rates to new length alleles at tandem-repetitive hypervariable loci in human DNA. Nature 332: 278-281.
JEFFREYS, A. J., A. MACLEOD, K. TAMAKI, D. L. NEIL and D. G. MONCKTON, 1991 Minisatellite repeat coding as a digital approach to DNA typing. Nature 354: 204-209.
KARLIN, S., and H. M. TAYLOR, 1975 A First Course in Stochastic Processes, Ed. 2. Academic Press, San Diego.
KASPERCZYK, A., N. A. DIMARTINO and T. G. KRONTIRIS, 1990 Minisatellite allele diversification: the origin of rare alleles at the HRAS1 locus. Am. J. Hum. Genet. 47: 854-859.
KELLY, R., G. BULFIELD, A. COLLICK, M. GIBBS and A. J. JEFFREYS, 1989 Characterization of a highly unstable mouse minisatellite locus: evidence for somatic mutation during early development. Genomics 5: 844-856.
KRUGER, J., and F. VOGEL, 1975 Population genetics of unequal crossing over. J. Mol. Evol. 4: 201-247.
LEVINSON, G., and G. A. GUTMAN, 1987 Slipped-strand mispairing: a major mechanism for DNA sequence evolution. Mol. Biol. Evol. 4: 203-221.
LOOMIS, W. F., and M. E. GILPIN, 1986 Multigene families and vestigial sequences. Proc. Natl. Acad. Sci. USA 83: 2143-2147.
MAEDA, N., and O. SMITHIES, 1986 The evolution of multigene families: human haptoglobin genes. Annu. Rev. Genet. 20: 81-108.
MARTINSON, J. J., 1991 Genetic variation in South Pacific Islanders. Ph.D. Thesis, University of Oxford, Oxford, U.K.
MARUYAMA, K., and D. L. HARTL, 1991 Evolution of the transposable element mariner in Drosophila species. Genetics 128: 319-329.
MITANI, K., Y. TAKAHASHI and R. KOMINAMI, 1990 A GGCAGG motif in minisatellites affecting their germline stability. J. Biol. Chem. 265: 15203-15210.
NAGYLAKI, T., 1984a Evolution of multigene families under interchromosomal gene conversion. Proc. Natl. Acad. Sci. USA 81: 3796-3800.
NAGYLAKI, T., 1984b The evolution of multigene families under intrachromosomal gene conversion. Genetics 106: 529-548.
NAGYLAKI, T., 1990 Gene conversion, linkage, and the evolution of repeated genes dispersed among multiple chromosomes. Genetics 126: 261-276.
NAGYLAKI, T., and T. D. PETES, 1982 Intrachromosomal gene conversion and the maintenance of sequence homogeneity among repeated genes. Genetics 100: 315-337.
NAKAMURA, Y., M. LEPPERT, P. O'CONNELL, R. WOLFF, T. HOLM, M. CULVER, C. MARTIN, E. FUJIMOTO, M. HOFF, E. KUMLIN and R. WHITE, 1987 Variable number of tandem repeat (VNTR) markers for human gene mapping. Science 235: 1616-1622.
NAKAMURA, Y., M. CARLSON, K. KRAPCHO, M. KANAMORI and R. WHITE, 1988 New approach for isolation of VNTR markers. Am. J. Hum. Genet. 43: 854-859.
NURNBERG, P., L. ROEWER, H. NEITZEL, K. SPERLING, A. POPERL, J. HUNDRIESER, H. PÖCHE, C. EPPLEN, H. ZISCHLER and J. T. EPPLEN, 1989 DNA fingerprinting with the oligonucleotide probe (CAC)5/(GTG)n: somatic stability and germline mutations. Hum. Genet. 84: 75-78.
OHTA, T., 1978 Theoretical population genetics of repeated genes forming a multigene family. Genetics 88: 845-861.
OHTA, T., 1989 Time for spreading of compensatory mutations under gene duplication. Genetics 123: 579-584.
OHTA, T., and G. A. DOVER, 1983 Population genetics of multigene families that are dispersed into two or more chromosomes. Proc. Natl. Acad. Sci. USA 80: 4079-4083.
PERELSON, A. S., and G. I. BELL, 1977 Mathematical models for the evolution of multigene families by unequal crossing over. Nature 265: 304-310.
PONTICELLI, A. S., and G. R. SMITH, 1992 Chromosomal context dependence of a eukaryotic recombinational hot spot. Proc. Natl. Acad. Sci. USA 89: 227-231.
ROTWEIN, P., S. YOKOYAMA, D. K. DIDIER and J. M. CHIRGWIN, 1986 Genetic analysis of the hypervariable region flanking the human insulin gene. Am. J. Hum. Genet. 39: 291-299.
ROYLE, N. J., R. E. CLARKSON, Z. WONG and A. J. JEFFREYS, 1988 Clustering of hypervariable minisatellites in the proterminal regions of human autosomes. Genomics 3: 352-360.
SINGER, M. F., 1982 Highly repeated sequences in mammalian genomes. Internat. Rev. Cytol. 76: 63-112.
SMITH, G. P., 1976 Evolution of repeated DNA sequences by unequal crossover. Science 191: 528-535.
STEPHAN, W., 1986 Recombination and the evolution of satellite DNA. Genet. Res. 47: 167-174.
STEPHAN, W., 1987 Quantitative variation and chromosomal location of satellite DNAs. Genet. Res. 50: 41-52.
STEPHAN, W., 1989 Tandem-repetitive noncoding DNA: forms and forces. Mol. Biol. Evol. 6: 198-212.
TAKAHATA, N., 1981 A mathematical study on the distribution of the number of repeated genes per chromosome. Genet. Res. 38: 97-102.
TAUTZ, D., 1989 Hypervariability of simple sequences as a general source for polymorphic DNA markers. Nucleic Acids Res. 17: 6463-6472.
VASSART, G., M. GEORGES, R. MONSIEUR, H. BROCAS, A. S. LEQUARRE and D. CHRISTOPHE, 1987 A sequence in M13 phage detects hypervariable minisatellites in human and animal DNA. Science 235: 683-684.
VERGNAUD, G., 1989 Polymers of random short oligonucleotides detect polymorphic loci in the human genome. Nucleic Acids Res. 17: 7623-7630.
WALSH, J. B., 1987 Persistence of tandem arrays: implications for satellite and simple-sequence DNAs. Genetics 115: 553-567.
WEBER, J. L., and P. E. MAY, 1989 Abundant class of human DNA polymorphisms which can be typed using the polymerase chain reaction. Am. J. Hum. Genet. 44: 388-396.
WELCH, H. M., J. K. DARBY, A. J. PILL, C. M. KO and B. CARRITT, 1989 Transposition, amplification, and divergence in the origin of the DNF28 loci, a polymorphic repetitive sequence family on chromosomes 1 and 3. Genomics 5: 423-430.
WILLARD, H. F., 1990 Centromeres of mammalian chromosomes. Trends Genet. 6: 410-416.
WILLARD, H. F., and J. S. WAYE, 1987 Hierarchical order in chromosome-specific human alpha satellite DNA. Trends Genet. 3: 192-198.
WOLFF, R. K., Y. NAKAMURA and R. WHITE, 1988 Molecular characterization of a spontaneously generated new allele at a VNTR locus: no exchange of flanking DNA sequence. Genomics 3: 347-351.
WOLFF, R. K., R. PLAETKE, A. J. JEFFREYS and R. WHITE, 1989 Unequal crossingover between homologous chromosomes is not the major mechanism involved in the generation of new alleles at VNTR loci. Genomics 5: 382-384.
WONG, Z., N. J. ROYLE and A. J. JEFFREYS, 1990 A novel human DNA polymorphism resulting from transfer of DNA from chromosome 6 to chromosome 16. Genomics 7: 222-234.
WONG, Z., V. WILSON, I. PATEL, S. POVEY and A. J. JEFFREYS, 1987 Characterization of a panel of highly variable minisatellites cloned from human DNA. Ann. Hum. Genet. 51: 269-288.
ZUCKERKANDL, E., G. LATTER and J. JURKA, 1989 Maintenance of function without selection: Alu sequences as cheap genes. J. Mol. Evol. 29: 504-512.

Communicating editor: W-H. LI
