Ann. Hum. Genet., Lond. (1978), 42, 239


Printed in Great Britain.

Ancestral inference II. The founders of Tristan da Cunha

BY E. A. THOMPSON

King’s College, Cambridge CB2 1BT

I. INTRODUCTION

The theory developed by Cannings, Thompson & Skolnick (1978) and the algorithms implementing this theory (Thompson, 1977a, b) allow the probability of a joint set of phenotypes for members of a large pedigree of arbitrary complexity to be computed. The probability that the total set of phenotypes is as observed, conditional on assumed genotypes for some or all of the founder members of the pedigree, is the likelihood for that set of original founder genotypes. As described in paper I (preceding), this observation allows us to use the methods of Cannings et al. (1978) to find the likelihood for the genotype combinations of the original ancestors of a small population. We shall apply the methods to derive and investigate likelihood functions for the genotypes of the original founders of the population of Tristan da Cunha. We shall consider three single-locus autosomal characteristics which both exemplify the methods and are of interest in themselves. The first is a three-genotype co-dominant blood-group system (MN), the second is a rare autosomal recessive (retinitis pigmentosa), and the third is the problem, discussed in the preceding paper, of the extinction probabilities of various specified combinations of original founder genes.
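A minimal sketch of the statement that the probability of the observed phenotypes, given assumed founder genotypes, is a likelihood for those genotypes: in the smallest possible case, one founder couple whose children are typed at a co-dominant locus such as MN, the likelihood of each founder genotype pair is a product of Mendelian segregation probabilities. The Python below enumerates all nine pairs. The function names and the data are hypothetical, and brute-force enumeration is used only because the example is tiny; the methods discussed here handle large, looped pedigrees by peeling instead.

```python
from itertools import product

GENOTYPES = ("MM", "MN", "NN")


def transmit(genotype):
    """Probability that a parent with `genotype` passes on each allele."""
    return {allele: genotype.count(allele) / 2 for allele in "MN"}


def child_prob(child, father, mother):
    """Mendelian probability that a child of (father, mother) has genotype `child`."""
    pf, pm = transmit(father), transmit(mother)
    return sum(
        pf[a] * pm[b]
        for a, b in product("MN", repeat=2)
        if "".join(sorted(a + b)) == child
    )


def founder_likelihood(children):
    """L(father, mother) = P(children's observed genotypes | founder genotypes).

    Children are conditionally independent given their parents, so the
    likelihood is a product of segregation probabilities.
    """
    table = {}
    for father, mother in product(GENOTYPES, repeat=2):
        lik = 1.0
        for child in children:
            lik *= child_prob(child, father, mother)
        table[(father, mother)] = lik
    return table


# Hypothetical data: three typed children of a single founder couple.
lik = founder_likelihood(["MN", "MN", "NN"])
best = max(lik.values())
for pair, value in sorted(lik.items(), key=lambda kv: -kv[1]):
    print(pair, round(value / best, 3))
```

Here (MN, NN) and (NN, MN) tie for the maximum, (MN, MN) has half that value, and the other six pairs are impossible: a toy version of the equal terms that arise for a couple with identical descendants, as discussed for Table 1.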

II. THE POPULATION OF TRISTAN DA CUNHA

The early history of the geographically and genetically isolated population of Tristan da Cunha has been described in detail by Roberts (1971). Although discovered in 1506, the island did not receive a permanent population until 1815, when a garrison was placed on the island in the somewhat obscure belief that it would help to forestall any attempt to rescue Napoleon from St Helena, some 1350 miles to the north. The first birth on the island occurred in the following year, and, although the garrison was withdrawn in 1817, one man (a Scot), his Dutch wife from South Africa, their two children and two other members of the garrison were granted permission to remain as settlers. Although the two latter men did not remain long, by 1827 there were, in addition to the Scot and his rapidly increasing family, two English ex-navy men and three other men. At this point it was realized that a permanent settlement required a more balanced sex ratio, and several women were obtained from St Helena. Those having present descendants are two sisters and the daughter of one of them, probably half-negro, all almost certainly having some negro ancestry. The three latter men mentioned above have no current descendants, but two Americans and a Dutchman who also joined the settlement in its early period do. There were three immigrants later in the century, a Cape-coloured woman and two Italian men, who have substantial numbers of current descendants, and these bring the number of founders to 13. Also, early this century, there were five immigrants (two Irish sisters and three English men), but the genetic contributions of these individuals are not yet inextricably entangled with those of the earlier founders; their families can be separated off from the total pedigree, and hence they are irrelevant to the ancestral inference problem. Since, for the simplest genetic characteristics, one of the two Italians also falls into this category, and because of the relationships between the three St Helena women, at any single autosomal locus there are only 23 distinct genes we wish to make inferences about: 23 genes of very varied origin. The population grew, with some ups and downs (Roberts, 1971), until by 1961 there were 268 islanders. These were evacuated when the volcano erupted. Almost all came to Britain, and 243 of these (the exceptions being mainly those under 3 years old) were sampled for a variety of blood groups and other genetic and medical characteristics. Further sampling was done in 1971-2, but the most complete data are those for 1961, and we shall restrict attention to the population sampled at that date. The data have been checked for consistency with the pedigree, and show remarkable accuracy of typing and pedigree information (Thompson, 1976).

For making inferences about ancestral types, even more important than the small number of founders is the structure of the genealogy. On Tristan da Cunha almost any pair of individuals are related to each other in several different ways. One original couple (the parents of the two St Helena sisters) are ancestors of 242 of the 243 individuals sampled in 1961, one individual of 235, another couple of 233, and three more individuals of over 170. Thus this population perhaps provides a sufficiently complex structure for data on current individuals to provide substantial information on ancestral gene types. On the other hand, if the structure of a pedigree is too complex, the number of individuals on whom we must consider joint functions in order to work back through the pedigree to the ancestors (see previous paper) may become too large for computational feasibility.

III. THE TRISTAN DA CUNHA PEDIGREE

The total genealogy of the Tristan da Cunha population consists of 700 individuals and is extremely complex. However, for the problem of ancestral inference some simplifications can be made. As discussed in the preceding paper, when genotypes are observable, sampled individuals both of whose parents were also sampled are irrelevant. So also are unsampled individuals without sampled descendants, while individuals whose genotypes are known may be duplicated in order to break loops in the pedigree. Owing to two bottlenecks in the Tristan da Cunha population (a large emigration in the 1850s and a boating disaster in 1885; Roberts, 1971), the ancestral pedigree is relatively small. This is the pedigree of sampled individuals having at least one unsampled parent (the senior sampled members) and all their direct ancestors, and is the relevant pedigree for ancestral inference when genotype observations are available (see previous paper). The marriage node graph of this pedigree is given in Fig. 1. The individuals named (i.e. the arcs identified by the index numbers assigned to individuals by Roberts and co-workers) are the unsampled ancestors. Sampled individuals are denoted by 'L'; '3L', for example, denotes three sampled offspring, while descendants of these offspring are omitted. The network of Fig. 1 is a gene flow network, genes from the original founders arriving via the arcs representing intermediate ancestors finally at the sampled individuals. Conversely, it may be considered as an information network, the information entering through the individuals denoted L and allowing us finally to make inferences about the original ancestors. Although the genealogy has only 31 relevant marriage nodes, it is very complex. The reader may like to note that there are ten distinct interlocking loops involving the marriage node of 963 and the founder 885, because of the marriages of their six offspring into all the other major sibships on the island.

In addition to the main pedigree of Fig. 1, there are separate sections involving more recent founders which may be separated from the main part. These are shown in Fig. 2. In the main section of the pedigree there are 12 founders principally of interest. Obviously a complete joint likelihood function is not feasible, since with three possible genotypes for each individual it would have 3^12 (531441) terms. However, the founders fall naturally into two groups, marked in Fig. 1 by the two ancestral sinks (see previous paper). The occurrence of one individual in both groups provides a useful check. Thus our aim will be to derive a joint likelihood for each group separately, putting in prior genotype probabilities for the other set. Thompson (19773) gives feasible peeling sequences for this pedigree. Although it is necessary to use complex peeling sequences (Cannings et al., 1978, and previous paper), and to work with functions defined jointly on eight individuals over some sections of the pedigree, the analysis is within the limits of the current computer program. Some of the cut sets used in any peeling sequence which will achieve the minimum value 8 for the size of the maximal cut set are shown in Fig. 1. Two individuals (marked * and + in Fig. 1) were 83 and 85 respectively when sampled in 1961. It is clear that, had these two women not been sampled, the problem of deriving a feasible peeling sequence to compute ancestral likelihoods would have been much more difficult. Thus Tristan da Cunha is the ideal type of structure for ancestral inference, and moreover was sampled at an ideal time. At this time many alleles could be, and were, tested for, but the structure, although complex, was still within the limits of current programs.

Fig. 1. Marriage node graph of the major section of the ancestral pedigree of the 1961 sampled Tristan da Cunha population. L denotes a sampled individual; L* and L+ are specific sampled individuals (see text).

Fig. 2. Graphs of the separable fragments of the ancestral pedigree of the 1961 sampled Tristan da Cunha population. Note: unsampled individuals 881, 919 and 797 can be inferred to be necessarily MN from the types of their offspring, allowing founders 866, 793 and 795 to be separated from the remainder of the pedigree for this particular characteristic, although not in general. L+ denotes the same individual as denoted L+ in Fig. 1.
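The pruning rules given at the start of this section (drop sampled individuals whose parents were both sampled, and unsampled individuals without sampled descendants) amount to keeping the senior sampled members and all their direct ancestors. A small Python sketch of that selection follows, assuming the pedigree is stored as a dictionary from each individual to a (father, mother) tuple, or None for a founder. The identifiers are hypothetical, and the third device, duplicating typed individuals to break loops, is not attempted.

```python
def ancestral_pedigree(parents, sampled):
    """Return (senior, relevant) for ancestral inference.

    parents:  dict mapping each individual to a (father, mother) tuple,
              or None for a founder.
    sampled:  iterable of sampled (typed) individuals.
    senior:   sampled individuals with at least one unsampled (or unknown) parent.
    relevant: the senior sampled members plus all their direct ancestors.
    """
    sampled = set(sampled)
    senior = {
        person
        for person in sampled
        if parents.get(person) is None
        or any(parent not in sampled for parent in parents[person])
    }
    relevant = set(senior)
    stack = list(senior)
    while stack:  # walk upwards, collecting every direct ancestor once
        person = stack.pop()
        for parent in parents.get(person) or ():
            if parent not in relevant:
                relevant.add(parent)
                stack.append(parent)
    return senior, relevant


# Hypothetical toy pedigree (identifiers are illustrative only).
parents = {
    "F1": None, "F2": None, "B": None,
    "A": ("F1", "F2"),
    "U": ("F1", "F2"),  # unsampled, no sampled descendants: dropped
    "C": ("A", "B"),    # sampled, parents unsampled: senior
    "D": ("A", "B"),    # sampled, parents unsampled: senior
    "E": ("C", "D"),    # sampled, both parents sampled: dropped
}
senior, relevant = ancestral_pedigree(parents, {"C", "D", "E"})
print(sorted(senior), sorted(relevant))
```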
The ancestors about whom we wish to make inferences are, however, in some cases five or six generations removed from even the oldest members of the current population. Several were born before 1790, and the prospect of making any very definite inference about their genotypes must seem remote.
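The peeling sequences described above work back through the pedigree by repeatedly replacing a set of individuals by a function (an R-function) on a smaller cut set. As a minimal sketch of the simplest such step, assuming a single sibship whose members' own sub-pedigrees do not interlock, the Python below collapses the children onto a function of the parental genotype pair. The function names and numbers are hypothetical; loops, and the cut sets of up to eight individuals needed for Tristan da Cunha, are beyond this sketch. The last line shows why a joint function over all 12 founders is out of reach while one over a cut set of eight is not.

```python
from itertools import product

GENOTYPES = ("MM", "MN", "NN")


def transmission(child, father, mother):
    """Mendelian P(child genotype | parental genotypes) at a two-allele locus."""
    pf = {a: father.count(a) / 2 for a in "MN"}
    pm = {a: mother.count(a) / 2 for a in "MN"}
    return sum(
        pf[a] * pm[b]
        for a, b in product("MN", repeat=2)
        if "".join(sorted(a + b)) == child
    )


def peel_sibship(child_functions):
    """Collapse a sibship onto its parents.

    Each child contributes a function g -> P(data on the child and below | g):
    an indicator for a typed child, or an earlier R-function for an untyped one.
    Returns R(f, m) = product over children of sum_g T(g | f, m) * child(g).
    """
    r = {}
    for f, m in product(GENOTYPES, repeat=2):
        value = 1.0
        for child in child_functions:
            value *= sum(transmission(g, f, m) * child[g] for g in GENOTYPES)
        r[(f, m)] = value
    return r


def typed(genotype):
    """Indicator function for a child whose genotype was observed."""
    return {g: 1.0 if g == genotype else 0.0 for g in GENOTYPES}


# Hypothetical sibship: two typed children and one untyped child whose own
# descendants have already been peeled into a function of its genotype.
untyped = {"MM": 0.0, "MN": 0.5, "NN": 1.0}
R = peel_sibship([typed("MN"), typed("NN"), untyped])
print(R[("MN", "NN")])

# Number of terms in a joint function: all 12 founders vs. a cut set of 8.
print(3 ** 12, 3 ** 8)  # 531441 6561
```

For the (MN, NN) parental pair the function is 0.5 × 0.5 × 0.75 = 0.1875, the three factors coming from the two typed children and the untyped one.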

Fig. 3. MN-blood-type likelihoods for the founders of group R (see Fig. 1) as a function of the allele frequency (or MM-genotype frequency) assumed for the non-reference founders. The solid line is the total probability of all the MN data observed on the current population. Broken lines are the MM-marginal likelihoods for each individual, and the dotted lines are NN-marginal likelihoods. These are expressed, for each individual, relative to an MN-heterozygote likelihood value of 1. (Horizontal axis: MM-genotype frequency, 0.09 to 0.36.)

IV. THE MN BLOOD TYPES OF THE FOUNDERS

The MN blood types are an ideal characteristic for ancestral inference, in that they form a three-genotype co-dominant system for which data are available for all the 1961 sampled population. However, the two alleles occur on Tristan da Cunha, as in most populations, in approximately equal frequencies; they are approximately equally represented in each family now, and were presumably approximately equally represented amongst the original founders. Since it is not feasible to derive a joint likelihood function over all original founders, a maximum-likelihood estimate of founder allele frequency must be found (see previous paper). This is the value of the allele frequency which maximizes the overall probability of all the phenotypes observed on the pedigree. Owing to the complexity of the pedigree, it is only feasible to compute this probability for a few alternative values of the M-allele frequency. Those chosen were 0.3, 0.46 (the current observed frequency), 0.5, 0.54 (for symmetry) and 0.6. It is clear (Figs. 3, 4) that the maximum-likelihood estimate must be very close to 0.5. The complete joint likelihood functions for each of the two sets of original ancestors, at each of the above five allele frequencies, have been computed. The variations between different ancestral genotype combinations, for the likelihood functions evaluated at an M-allele frequency of 0.5, are substantial. Since it is impossible to display the complete functions, these having 2187 and 729 terms respectively for the two groups of founders, we give the ten largest and ten smallest non-zero terms in each case (Table 1(a), (b)). Marginal likelihoods for each ancestor separately are given in Table 2; although these will of course display less variation, some conclusions may still be drawn. The likelihoods in Table 1 are expressed relative to the maximum term, and those in Table 2 relative to each individual's heterozygote (MN) likelihood. (For example, 974 is 0.8 times as likely to have been MM as MN, and is marginally more likely to have been NN.) We note that the two Italians (994 and 995), each of whom still has several living children, must have been MN.

From the similarities and differences between Table 1(a) and (b) several general conclusions emerge. In both cases the range of non-zero likelihoods is of the order of 700-fold, and although the likelihood for founders in group R has a much higher proportion of zero terms, this is caused only by the fact that 994 is necessarily MN. The coefficient of variation is also larger for the founders of group R, and this reflects the fact that whereas for this group the tenth largest term is nearly 70% of the largest, for group G there are only seven other terms within a likelihood ratio of 2 of the maximum. This indicates that firmer inferences can be drawn for the founders of group G, and indeed the most likely term stands out as substantially above the second, which is in turn substantially above any other. In the case of group R the first term is 25% above the second, but this is followed by a group of only marginally less likely ancestral combinations.

The symmetries resulting in terms of identical likelihood are also of some interest. Those in Table 1(b) result simply from the fact that (975, 798) are a couple having the same descendants via the same lines of descent. So also does the second pair of equal terms in Table 1(a). The third pair results from the fact that the founder 791 contributes only to the single child 867, who is the spouse of founder 973. This will cause several symmetries of the same form as the one appearing here: a symmetry between the types of the couple 973 and 867. Individual 867 receives her paternal gene from 791, and her maternal gene via 792 from 696 or 695, and we see that for the combination appearing here this latter is necessarily an N allele. The final symmetry, the first in Table 1(a), is of the most interest. We see that it corresponds to a complete reversal of the homozygous types of 696, 695 and 791 on the one hand and 976, 973 and 974 on the other. Again a study of the pedigree shows why this is so. There is a complete symmetry in terms of the genes that these two subsets of this group of founders pass to their descendants. This is, however, a little disappointing, in that it means that firm inferences regarding the genotype of any individual founder in this group cannot be drawn, although conclusions regarding combinations within the group may be.

We see that for every founder in group R (except 994) there are terms in the top ten for which he has genotype MM and terms for which he has genotype NN, and that the same is true of the ten smallest non-zero terms. For some individuals there is a 'consensus' genotype; for example, 976 is MM for most of the largest terms and NN for most of the smallest. However, no very definite conclusions can be drawn (particularly in view of the fact that for 976 considered jointly with the founders of the second group (Table 1(b)) this 'consensus' disappears). This is reflected also in the marginal likelihoods for the founders of group R; few of these differ significantly from one. For the founders of group G the opposite is true.

Fig. 4. MN-blood-type likelihoods for the founders of group G (see Fig. 1), as functions of allele frequency. Notation as for Fig. 3. (Horizontal axis: MM-genotype frequency, 0.09 to 0.36.)

Table 1(a). MN-likelihoods for founders attached to ancestral sink R (see Fig. 1): each value is expressed as a fraction of the maximum term
Founders: 974, 696, 695, 994, 973, 791, 976
Largest terms: 1, 0.80330, 0.75189, 0.74305, 0.73214, 0.71007, 0.68926, 0.68622, 0.68584
Smallest non-zero terms: 0.01306, 0.01149, 0.01052, 0.00850, 0.00826, 0.00603, 0.00282, 0.00111
Number of zero terms = 1490 out of 2187. Coefficient of variation of non-zero terms = 1.802.

Table 1(b). MN-likelihoods for founders attached to ancestral sink G (see Fig. 1): each value is expressed as a fraction of the maximum term
Founders: 970, 975, 798, 971, 885, 976
Largest terms: 1, 0.73413, 0.57568, 0.56796, 0.54511, 0.53640, 0.48710, 0.47416
Smallest non-zero terms: 0.00383, 0.00381, 0.00279, 0.00253, 0.00234, 0.00224, 0.00174
Number of zero terms = 108 out of 729. Coefficient of variation of non-zero terms = 1.0832.

Table 2. Marginal homozygote MN-likelihoods for the founders of Tristan da Cunha, relative to the individual heterozygote values

Founder   Origin                 MM-likelihood   NN-likelihood
Group R
  974     English                0.800           1.037
  696     St Helena              0.853           1.075
  695     St Helena              0.853           1.075
  994     Italian                0               0
  973     Dutch
  791     St Helena
  976     English
Sink G
  970     U.S.A.                 1.320           0.680
  975     Scotland               0.743           1.138
  798     S. Africa (Dutch)      0.743           1.138
  971     U.S.A.                 1.246           0.782
  885     S. Africa (coloured)   1.475           0.173
  976     English                1.034           1.125
Recent founders
  995     Italian                0               0
  795     English                1               1
  790     English                1.004           1.125
  794     English                1               1
  793     Irish                  2               0
  866     Irish                  1.5             1.5

The four values printed for 973, 791 and 976 in group R are, in order, 0.944, 0.996, 1.034 and 0.982.
Here several founders have, with very few exceptions, one genotype amongst the terms of maximum likelihood, and another amongst the smallest and zero terms. Individuals 970, 971 and 885 can be fairly firmly concluded to have been MM, both on this basis and on the basis of the marginal likelihoods. Interestingly, 975 and 798 have predominantly N genes both amongst the smallest and the largest terms, but this is an artifact of the different combinations of genes of the other founders with which these are in conjunction. Both an overall consideration of the likelihood function, including the zero terms, and the values of the marginal likelihoods indicate that they are likely to have been NN.

It is also of interest to examine the changes in these joint ancestral likelihood functions with changes in the allele frequency assumed for the non-reference founders. Since there are again difficulties in displaying joint likelihood functions, we consider only the individual marginal likelihoods. The homozygote likelihoods, again expressed relative to the heterozygote values, are shown in Figs. 3 and 4. As the assumed M-allele frequency increases, so also does the probability that non-reference founders brought in M alleles, and hence the likelihood that the reference founders did so decreases. Similarly, the likelihood that they carried N alleles increases. However, the degree, and even the form, of the change can differ markedly between individuals. In group R of the founders (Fig. 3) all the marginal likelihoods are similarly affected. At the maximizing value NN-likelihoods tend to be higher than MM-likelihoods, but the difference is not large, and no very firm conclusions can be drawn. It is, however, of interest to note that it is for founder 791, who has only a single child and relatively few current descendants, that least information is available, not only by the criterion of the marginal likelihoods at the maximizing allele frequency (Table 2), but also by the slightness of the change in likelihood with allele frequency. In group G (Fig. 4) there is much more variability both within and between individuals. Founders 970, 971 and 885 are more likely to have been MM than NN over the whole range of genotype frequencies considered, providing yet more convincing evidence that these individuals carried M alleles. The two graphs for each of the three other individuals do cross, but at an allele frequency of 0.5 (MM-genotype frequency 0.25) we see that they are substantially more likely to have contributed N alleles.

Finally, there is a further set of functions whose values convey information about the effect of different sections of the currently observed data upon the ancestral likelihood functions. In working through the pedigree, any intermediate R-function, defined on any cut set of individuals and resulting from the corresponding subset of the current data, may be examined. Yet again, it is not possible to display complete functions, since these consist of many terms, and furthermore there is a difficulty in interpretation, since the functions may be likelihoods with respect to some individuals but probabilities incorporating genotype frequencies with respect to others in the same cut set (see previous paper). However, some points emerge. First is the way in which the surprisingly powerful discrimination between the couple (975, 798) and their sons-in-law 970 and 971 emerges as a result of the data from the descendants of their son 967. The couple are inferred to be NN, but 970 and 971 to be MM; this discrimination is not apparent until the predominantly NN descendants of 967 are incorporated. A second point of interest is to consider the symmetries in likelihoods between individuals in any cut set, and to examine how these result in the final symmetries that have been previously discussed.
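The horizontal axes of Figs. 3 and 4 give MM-genotype frequencies from 0.09 to 0.36, corresponding to the five M-allele frequencies at which the pedigree probability was computed. Assuming Hardy-Weinberg proportions for the non-reference founders (an assumption consistent with those axis values, since 0.46 squared is 0.2116), the mapping can be sketched in Python as follows; the function name is illustrative.

```python
def hardy_weinberg(p):
    """Genotype frequencies (MM, MN, NN) for an M-allele frequency p,
    assuming Hardy-Weinberg proportions."""
    q = 1.0 - p
    return {"MM": p * p, "MN": 2 * p * q, "NN": q * q}


# The five M-allele frequencies at which the pedigree probability was computed.
for p in (0.3, 0.46, 0.5, 0.54, 0.6):
    freqs = hardy_weinberg(p)
    print(f"p = {p:.2f}: MM {freqs['MM']:.4f}, MN {freqs['MN']:.4f}, NN {freqs['NN']:.4f}")
```

The MM column reproduces 0.09, 0.2116, 0.25, 0.2916 and 0.36; these are the prior genotype probabilities that would be put in for founders outside the group under study.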
The MN-blood-type results have been described in some detail, not only for their intrinsic interest, but also because they display particularly well the type of problems which arise, the type of conclusions that we may hope to draw, and the type of considerations that may be employed in examining the likelihood functions.
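Table 1 summarizes each joint likelihood by its terms relative to the maximum, the number of zero terms and the coefficient of variation of the non-zero terms, while Table 2 gives marginal homozygote likelihoods relative to the heterozygote. A Python sketch of these summaries follows. Two points are assumptions rather than statements of the procedure actually used: the coefficient of variation uses the population standard deviation, and the marginal likelihood for one founder averages over the other founders' genotypes with prior weights (here Hardy-Weinberg at p = 0.5). The two-founder table is a hypothetical toy.

```python
import statistics
from itertools import product

GENOTYPES = ("MM", "MN", "NN")


def summarize(joint):
    """Summaries of a joint founder likelihood of the kind quoted under Table 1.

    joint: dict mapping a tuple of founder genotypes to its likelihood.
    Returns (terms relative to the maximum, number of zero terms,
    coefficient of variation of the non-zero terms).
    """
    top = max(joint.values())
    relative = {combo: lik / top for combo, lik in joint.items()}
    nonzero = [v for v in relative.values() if v > 0]
    # Assumption: population standard deviation; the CV is unaffected by scaling.
    cv = statistics.pstdev(nonzero) / statistics.mean(nonzero)
    return relative, len(relative) - len(nonzero), cv


def marginal_relative(joint, index, prior):
    """Homozygote marginal likelihoods of founder `index`, relative to MN.

    Assumption: the other founders are averaged over with weights `prior`.
    """
    totals = dict.fromkeys(GENOTYPES, 0.0)
    for combo, lik in joint.items():
        weight = 1.0
        for j, g in enumerate(combo):
            if j != index:
                weight *= prior[g]
        totals[combo[index]] += lik * weight
    return {"MM": totals["MM"] / totals["MN"], "NN": totals["NN"] / totals["MN"]}


# Hypothetical joint likelihood for a couple with three typed children MN, MN, NN.
joint = {combo: 0.0 for combo in product(GENOTYPES, repeat=2)}
joint.update({("MN", "NN"): 0.125, ("NN", "MN"): 0.125, ("MN", "MN"): 0.0625})
relative, zeros, cv = summarize(joint)
marg = marginal_relative(joint, 0, {"MM": 0.25, "MN": 0.5, "NN": 0.25})
print(zeros, round(cv, 4), marg)
```

The toy gives six zero terms out of nine, a coefficient of variation of about 0.283 for the non-zero terms, and marginal ratios of 0 (MM) and 1 (NN) for the first founder.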

Fig. 5. The descent of the individuals affected by retinitis pigmentosa to the unsampled parents of senior sampled members. A double underlining denotes an affected individual, and a single line a necessarily carrier parent. Note that at least one individual in each of the couples (20, 19), (990, 213) and (983, 62) must be a carrier, and that at least one of each pair is sampled.

V. RETINITIS PIGMENTOSA ON TRISTAN DA CUNHA

The occurrence of retinitis pigmentosa on Tristan da Cunha is discussed by Sorsby (1963). This defect affects four members of the 1961 sampled Tristan da Cunha population, the eldest of whom is a grandparent of two others and is the child of a brother-sister mating. The form of retinitis pigmentosa occurring on Tristan da Cunha is an autosomal recessive, and in view of the rarity of this characteristic it is likely that there was only one copy of the gene amongst the original founders. It would be of some practical interest both to confirm this single-origin hypothesis and to infer the original carrier of this gene, since this would enable us to determine those families in the current population potentially at greatest risk. Sorsby (1963) noted that, on the assumption of a single ancestral gene, either the couple (792, 976) or the couple (975, 798) must be responsible, since only they are ancestors of all the necessarily carrier parents of the affected individuals. Calculation of ancestral contributions has previously focused attention on 792, but, since she is an ancestor of all except one of the individuals sampled in 1961, this may be an artifact of the procedure; joint likelihood functions are required.

The major new problem created by the retinitis pigmentosa data is that, since this is a recessive characteristic, the genotypes of the senior sampled members are not known. The relationship of the affected individuals to the unsampled parents of senior sampled members is shown in Fig. 5; we see that there are three sampled couples having a necessarily carrier offspring, and hence in which at least one member must be a heterozygote. As described in Thompson (1977b) and in the previous paper, it is convenient to use the 'current section' of the pedigree to derive likelihoods for the genotypes of the unsampled parents of the senior sampled members, either individually or jointly, and to input these to the program deriving ancestral likelihoods, rather than to include living individuals themselves in the already long and complicated peeling sequence required to resolve even the ancestral section of the pedigree. Unfortunately, it is clearly not feasible to incorporate R-functions defined jointly over several parental couples of senior sampled members without substantially increasing cut set sizes on the ancestral pedigree, but such joint functions necessarily result from an analysis of the current population. It is necessary to make some approximation.

To resolve first the problem of the three sampled couples, at least one member of each being necessarily a carrier, we can, for each allele frequency and each ancestral sink, perform eight runs, each computer run assuming one of the eight possible combinations of obligatory heterozygotes, no assumption being made about the genotype of the other member of each couple. For other parents of senior sampled members we may incorporate R-functions first on the basis that their offspring are not carriers, second on the basis of their offspring's unaffected (but not necessarily non-carrier) status, and third on the basis of the unaffected status of all their descendants, but disregarding that these are the descendants also of other senior sampled members. This last presumably provides the best approximation, while the other two are extremes which provide limits on the effects of the approximations. In fact, no difference in the overall conclusion emerges from the three approaches, so we present only results from the third. We may consider the ancestral R-functions resulting from each of the eight runs separately, and concentrate our attention on those runs which provide the highest overall probability of the phenotypic data and assumed carrier combinations. We may also combine them to give an overall picture. A joint distribution for the genotypes of the six members of the three couples implicated by the data, on the basis of their ancestral interrelationships, may also be derived.

Table 3. Probabilities of observing the current retinitis pigmentosa phenotypes (normal and affected) jointly with the event of heterozygote genotypes for the stated parental combinations (see Fig. 5) and original founders. (Example: Prob.(observed data and 990, 52, 20, 976 heterozygotes) = 2.03 × 10^-6.) Overall probabilities, and prior probabilities arising from ancestral interrelationships only, are also given for each parental combination, and an overall relative likelihood for having introduced the allele is given for each founder.
Columns: the eight combinations of the assumed carrier in each parent couple (990 or 213; 52 or 983; 20 or 19).
Rows: prior probability of carrier parent combination; overall probability of phenotype data; heterozygote marginal probabilities (× 10^6) for founders 696, 695, 976, 975, 798, 971 and 973; overall likelihood relative to maximum term.
Overall likelihood relative to maximum term: 975, 798: 1; 696, 695: 0.71; 976: 0.39; 973: 0.26; 971: 0.22.
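The eight computer runs described above correspond to choosing one assumed obligatory heterozygote from each of the three couples. The Python sketch below only enumerates those choices, using the couples as listed in the Fig. 5 caption; the names are illustrative, and each run's ancestral R-functions would come from the peeling program, which is not reproduced here.

```python
from itertools import product

# Couples from the Fig. 5 caption: at least one member of each must carry
# the retinitis pigmentosa allele.
COUPLES = [(990, 213), (983, 62), (20, 19)]


def carrier_runs(couples):
    """One run per choice of a single assumed obligatory carrier in each couple.

    The other member of each couple is left unconstrained in that run.
    """
    return list(product(*couples))


runs = carrier_runs(COUPLES)
print(len(runs))  # 2 ** 3 = 8
for run in runs:
    print("assumed carriers:", run)
```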
The results are summarized in Table 3. The allele frequency assumed in computing these results is 0.01; this assumption will be discussed below. The 'prior' probability is the probability that each set of three individuals would be carriers on the basis of ancestral relationships, and we note that this is almost an order of magnitude larger in the cases where 990 is assumed a carrier than where his wife 213 is so assumed. Furthermore, the marginal ancestral probabilities show the same phenomenon, and it is also clear from the data on the current population (Fig. 5) that 990 is necessarily a carrier, and thus that there is no reason to believe that 213 is so. The values of the heterozygote marginal posterior probabilities are given for each of the seven founders for which these are largest, the values being overall substantially larger than for any other founder. These probabilities are the marginal ancestral likelihoods, weighted by the heterozygote genotype frequency. Each of the two members of the couples (696, 695) (the parents of individual 792) and (975, 798) have the same marginal likelihoods, since they have the same children. The combined likelihoods are given relative to the maximal term, which is that for either member of the couple (975, 798). Overall we see that each member of the couple (975, 798) is 2½ times as likely to have been responsible for the introduction of the retinitis pigmentosa allele as individual 976. The parents (696, 695) of 792 show a higher likelihood (71% of that for (975, 798)) because of the descent of individual 20 from the second daughter (802) of this couple, but if it is 19 and not 20 (Fig. 5) who is the carrier, then 975 and 798 rise to more than three times as likely as 696, 695 or 976. For the term providing maximal overall probability (the combination 990, 983 and 19), 975 and 798 have a posterior probability (or equivalently likelihood) more than six times that for 976. Although no definite conclusion can be drawn, it is thus very clearly 975 and 798 who are most strongly implicated by the data.
A likelihood ratio of three is a substantial indication, and the result is of particular interest since it is in contrast to previous expectations. The founders so far discussed are the only ones who could be responsible for the introduction of the allele, in the event that a single original copy has resulted in all the current affected individuals. Nevertheless, there are two other founders (971 and 973) who show likelihoods of the same order of magnitude for having introduced the allele. In view of the rarity of the characteristic, and the overall distribution of the trait in the Tristan da Cunha population, this is unlikely. The large likelihoods obtained for these two founders are partly a result of ignoring the joint descent of the younger current members of the population from the senior sampled members. Had there been originally two copies of the allele, we would expect to see more affected younger current members of the population. It is, however, of some interest both that our approximation has this effect, and that these are the two founders who show sufficiently close relationship to the known carrier parents of affected individuals to be affected in this way. To resolve this problem we may consider not only the marginal probabilities presented in Table 3, but also the terms of the joint ancestral R-functions. The terms of particular interest are of course those where a single founder is a carrier, the others having normal alleles. These terms are zero for founders other than 696, 695, 976, 975 and 798, but for these founders are amongst the maximal terms. Terms in which one of these founders carries two retinitis pigmentosa alleles also provide high likelihoods, but of course low posterior probabilities on weighting by the genotype frequency.
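The last point, that a founder carrying two copies can have a high likelihood yet a low posterior probability, follows from weighting by genotype frequency. A minimal Python sketch, assuming Hardy-Weinberg frequencies at the allele frequency of 0.01 used for Table 3 and purely hypothetical likelihood values, shows the size of the effect: with equal likelihoods for one and two copies, the heterozygote is favoured by the prior ratio 2(1 - q)/q.

```python
def genotype_prior(q):
    """Hardy-Weinberg frequencies indexed by the number of copies (0, 1, 2)
    of a recessive allele with frequency q."""
    return {0: (1 - q) ** 2, 1: 2 * q * (1 - q), 2: q ** 2}


def posterior(likelihood, q):
    """Weight genotype likelihoods by their prior frequencies and normalise."""
    prior = genotype_prior(q)
    weighted = {k: likelihood[k] * prior[k] for k in prior}
    total = sum(weighted.values())
    return {k: w / total for k, w in weighted.items()}


# Hypothetical likelihoods for one founder, relative to the non-carrier value:
# carrying two copies explains the data as well as carrying one.
likelihood = {0: 1.0, 1: 50.0, 2: 50.0}
post = posterior(likelihood, q=0.01)
for copies, prob in post.items():
    print(copies, round(prob, 4))
print(round(post[1] / post[2]))  # 2(1 - q)/q = 198
```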
Table 4. Marginal extinction probabilities of autosomal genes for the founders of Tristan da Cunha

Founder    Prob. (neither gene extinct)    Prob. (precisely one gene extinct)    Prob. (both genes extinct)

Founders of group R
974        0.2761                          0.6672                                0.0567
696        0.2841                          0.6898                                0.0261
994        0.9844                          0.0156                                0.0
973        0.3047                          0.6500                                0.0453
791        0.0                             0.6299                                0.3701
976        0.8029                          0.1971                                4 × 10⁻⁶

Founders of group G
970        0.0                             0.9310                                0.0690
798        0.4434                          0.5412                                0.0154
971        0.3098                          0.6564                                0.0338
885        0.9572                          0.0428                                0.0
976        0.8029                          0.1971                                4 × 10⁻⁶

Although it is not practicable to investigate several allele frequencies for all eight possible heterozygote-carrier combinations, a variety of runs at other frequencies for specific combinations indicates that the value 0.01 gives at least a value close to the maximum overall probability of all the phenotypic data observed on the pedigree. Such a frequency is probably higher than the actual population frequencies in the populations whence the founders originated, but since at least one of the 26 genes of our 13 founders must have been of this allelic type, the actual frequency amongst the founding genes was at least 1/26, or nearly 4%. The compromise figure of 1% makes it improbable that two retinitis pigmentosa alleles were initially introduced.
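Both numbers in this argument can be checked directly. The sketch below assumes, for illustration only, that the 26 founding genes are a binomial sample from a population with allele frequency q; it ignores the pedigree data altogether, so it is a prior calculation and not the likelihood computation used in this paper. The function names are hypothetical.

```python
from math import comb

N_GENES = 26  # 13 founders, two genes each at an autosomal locus


def min_founding_frequency(n_genes=N_GENES):
    """Lowest possible frequency among founding genes if at least one
    founding gene carried the allele."""
    return 1 / n_genes


def prior_odds_two_vs_one(q, n_genes=N_GENES):
    """Binomial prior odds that exactly two, rather than exactly one, of the
    founding genes carried an allele of population frequency q. The pedigree
    data are ignored entirely."""
    p_one = comb(n_genes, 1) * q * (1 - q) ** (n_genes - 1)
    p_two = comb(n_genes, 2) * q ** 2 * (1 - q) ** (n_genes - 2)
    return p_two / p_one  # simplifies to (n_genes - 1) / 2 * q / (1 - q)


print(round(min_founding_frequency(), 4))  # about 0.0385, "nearly 4%"
print(round(prior_odds_two_vs_one(0.01), 3))
print(round(prior_odds_two_vs_one(1 / 26), 3))
```

Under this prior alone, lowering q from 1/26 to 0.01 reduces the odds of two founding copies against one from 0.5 to about 0.13, which is consistent with the remark that the compromise frequency makes a second original copy improbable.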

VI. EXTINCTION PROBABILITIES; THE GENES OF THE POPULATION

Table 5. Examples of probabilities that, at a particular autosomal locus, no gene from the given set of founders survives in the current (1961) Tristan da Cunha population

Set of founders    Extinction probability

The joint distribution of any genetic trait observed over all the members of a population derives from the joint distribution of actual genes for that trait inherited by the current population from the original founders. The complete distribution of joint gene identities over the whole population cannot, of course, be explicitly derived, but this distribution has two distinct components which may be considered. First, there are the numbers of copies in which genes are replicated. An indication of this is given by the ancestral contributions of founders: the expected proportions of the genes present in current individuals that derive from each founder. We may consider either joint ancestral contributions or contributions for each founder separately. As regards expected contributions there is no difference, and these have been given for all the original founders of Tristan da Cunha by Roberts (1968). The other component of the gene distribution is the presence or absence of particular original founder genes in the current population. The evolution of a population is the joint survival of its genes, and the extinction probabilities of specific genes, either singly or, more importantly, jointly with others, are parameters of the pedigree structure which can lead to a clearer understanding of observed genetic data. As described in the previous paper, the methods of ancestral inference provide for the computation of extinction probabilities of any subset of original founder genes amongst founders attached to a single ancestral sink. Some of these extinction probabilities are given in Tables 4 and 5. Table 4 gives, for each founder separately, the probabilities that neither, precisely one, or both of its genes at a particular autosomal locus are extinct. We can see the non-independence of the extinction of the genes of a single founder. This non-independence arises also between founders, particularly for the Tristan da Cunha pedigree, because of the very limited number of paths of descent (Fig. 1). Joint extinction probabilities must therefore also be considered, and a few of these are given in Table 5. We note that the chance survival of specific individuals in the 1961 population has an important effect on whether or not an extinction probability is zero. The more recent founder 994 has sampled children; 885 also has a sampled child, the 83-year-old woman denoted L* in Fig. 1. Hence arise the zeros of Table 4. Note also that 791 and 970 cannot contribute more than one gene at a given locus to the current population, since each has only one child with current descendants. The couple (975, 798) have a surviving grandchild, the 85-year-old woman denoted L+, and hence at every locus there must be a gene from one member of this couple (Table 5). We see that the probability that there is no gene from either 696 or 695 is very small, though not zero; there may well be some locus at which this event occurs. It is possible that no genes remain from any of the five individuals 974, 791, 696 (or 695), 976 and 973, although the probability is so small that there is probably no locus at which this event occurs. As parameters of pedigree structure, the relative values of the non-zero extinction probabilities are more important, since these are much less influenced by such chance events. The relative values in Table 4 show several interesting features. In all founders the fates of the two genes at a single locus are strongly negatively correlated, most strongly for those founders having few children with surviving descendants. Although both 970 and 791 have zero probability of contributing both genes to the current population, 791 has only 5 great-grandchildren while 970 has 22, resulting in markedly different single-gene extinction probabilities.
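The structural effects just described, namely that a founder with a single contributing child can pass on at most one gene and that the fates of a founder's two genes are negatively dependent, already appear in a deliberately simple toy model. In this sketch, which is an assumption for illustration and not the ancestral R-function computation behind Tables 4 and 5, a founder has k children, each child receives one of the founder's two genes with probability 1/2, and each child's line independently carries the received gene to the present with a hypothetical probability s. The function name and the parameters k and s are illustrative.

```python
from itertools import product


def extinction_distribution(k, s):
    """Exact distribution of the number of a founder's two genes (A and B)
    that are extinct, under a toy model: the founder has k children, each
    child receives A or B with probability 1/2, and each child's line
    independently carries its received gene to the present with
    probability s."""
    # Per-child outcomes: (gene received, line carries it on?, probability).
    outcomes = [("A", True, s / 2), ("A", False, (1 - s) / 2),
                ("B", True, s / 2), ("B", False, (1 - s) / 2)]
    dist = {0: 0.0, 1: 0.0, 2: 0.0}  # number of extinct genes -> probability
    for combo in product(outcomes, repeat=k):
        p = 1.0
        surviving = set()
        for gene, carried, prob in combo:
            p *= prob
            if carried:
                surviving.add(gene)
        dist[2 - len(surviving)] += p
    return dist


# One contributing child: at most one gene can survive (dist[0] is 0).
print(extinction_distribution(1, s=1.0))

# Negative dependence: P(both extinct) versus the product of the
# single-gene extinction probabilities, each (1 - s/2)**k.
k, s = 3, 0.6
both = extinction_distribution(k, s)[2]
print(both, ((1 - s / 2) ** k) ** 2)  # the joint value is the smaller one
```

Exhaustive enumeration over the 4^k per-child outcomes keeps the sketch exact for small k; in this model the joint probability (1 - s)^k is always below the independence value (1 - s/2)^(2k) for 0 < s ≤ 1.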
The probability that neither gene from individual 696 (or, equivalently, 695) is extinct is low: in spite of their 242 current descendants, this couple had only two children contributing to the current population. The probability that both genes from either member of this couple are extinct is substantially lower than for founders 974, 973 and 971, who form a group of broadly equivalent individuals. The probability that both genes from both members are extinct is very much smaller again (Table 5), indicating the strong effect of limited numbers of descent paths on the non-independence of gene survival. The couple (975, 798) show much higher probabilities that both genes from either individual survive, mainly by virtue of the fact that they have three children contributing substantially to the current population. Table 5 gives only a few selected terms of the ancestral R-functions; it is not possible to present a complete picture. The main conclusion to be drawn is again the strong (negative) dependence between gene extinctions. To take a specific example, there is a probability of 0.37 that both genes of 791 have become extinct, yet adding 791 to the set (974, 696, 695) reduces the joint extinction probability by a factor of 10, rather than by the factor of about 2.7 (1/0.37) that independence would imply. Probabilities that genes from several founders jointly are totally absent from the population are not only small, but are much smaller than single-founder extinction probabilities would indicate.

SUMMARY

The Tristan da Cunha data for the MN blood types and for retinitis pigmentosa are analysed according to the methods for ancestral inference developed in the preceding paper. Joint and marginal likelihoods for the MN data show that it is possible to make inferences about the types of original founder genes, although relative values are in some cases not large. Although relative likelihoods are robust against small changes in gene frequency, they can be distorted by assuming values that differ widely from the maximum likelihood estimate. In the case of retinitis pigmentosa, under an assumed allele frequency of 0.01, we conclude, contrary to expectations, that the couple (975, 798) are substantially the most likely founders to have been responsible for the introduction of the allele. The inferences for this trait are complicated by the fact that it is recessive, and genotypes are therefore not observable. Extinction probabilities for a variety of sets of original founder genes on the Tristan da Cunha ancestral pedigree are also computed and discussed.

I am grateful to Drs D. F. Roberts and J. C. Bear for answering all my questions about details of the Tristan da Cunha data.

REFERENCES

CANNINGS, C., THOMPSON, E. A. & SKOLNICK, M. H. (1978). Probability functions on complex pedigrees. Adv. Appl. Prob. 10, 26-61.

ROBERTS, D. F. (1968). Genetic effects of population size reduction. Nature, Lond. 220, 1084-8.

ROBERTS, D. F. (1971). The demography of Tristan da Cunha. Population Studies 25, 465-79.

SORSBY, A. (1963). Retinitis pigmentosa in Tristan da Cunha islanders. Trans. Roy. Soc. Trop. Med. Hyg. 57, 15-18.

THOMPSON, E. A. (1976). Inference of genealogical structure. III. The reconstruction of genealogies. Soc. Sci. Inform. 15, 507-26.

THOMPSON, E. A. (1977a). Peeling programs for zero-loop pedigrees. Tech. Rep., Dep. Biophys., Univ. Utah, no. 5.

THOMPSON, E. A. (1977b). Peeling programs for pedigrees of arbitrary complexity. Tech. Rep., Dep. Biophys., Univ. Utah, no. 6.

THOMPSON, E. A., CANNINGS, C. & SKOLNICK, M. H. (1978). Ancestral inference. I. The problem and the method. Ann. Hum. Genet., Lond. 42, 95-108.
