Ann. Hum. Genet., Lond. (1978), 42, 219

Printed in Great Britain

Choice of ascertainment model I Discrimination between single-proband models by means of birth order data

BY JON STENE
Institute of Statistics, University of Copenhagen

In segregation analysis of human family data two models have played a dominant role, namely the so-called complete-ascertainment (or complete selection) model and single-ascertainment (or single selection) model, both suggested by Weinberg (1912a, b). In later years a third model, the incomplete multiple-ascertainment model, also suggested by Weinberg (1928), has been applied. These models are based on different sets of assumptions regarding the data collection procedure or, as it is often called, the method of ascertainment, a term introduced by Fisher (1934). The estimation and test procedures for the segregation parameter, the probability that a child is affected, are determined by the chosen model.

Since Weinberg introduced these models the choice of ascertainment model has been a major problem for human geneticists and statisticians working with human family data. Information about the actual method of ascertainment is rarely given in such a form that the choice of model follows immediately. Suitable methods of utilizing available information for choice of model do not seem to have existed for the limited amounts of data on which most human genetical investigations are based.

That the choice of ascertainment model is highly important for inference about the segregation parameter is clearly demonstrated in examples provided by Haldane (1938) and Selvin (1975). Their examples imply that a wrong choice of ascertainment model may in certain cases lead to a wrong conclusion regarding the mode of inheritance. This state of affairs has given rise to much concern and has been discussed at length by, among others, Fisher (1934), Haldane (1938) and Smith (1959). Haldane (1938) proposed a method for testing a hypothesis about the segregation parameter without making any choice between the two models.
This test has been studied by Selvin (1975), who found it to have a very low power against alternatives which were not distant from the hypothetical value, e.g. if the hypothetical value was 0.25, then the power was very low for all alternatives between 0.16 and 0.35. Some other ascertainment models have also been proposed by Weinberg (1928), Morton (1959, 1962, 1969), Rao (1965) and Stene (1978), but the problem of discriminating between these models has still remained.

The aim of the present paper is to provide new insights regarding the assumptions of different ascertainment models and to develop methods for choosing an ascertainment model for samples of the sizes which are normal in human genetics. It has already been demonstrated (Stene, 1977) that the three models mentioned above hold under much wider sets of assumptions than previously assumed. Their names are connected with rather special sampling procedures, and since the models hold in a number of other situations, their names are quite misleading and should be avoided.

BASIC ASSUMPTIONS AND DEFINITIONS

The following assumptions are made throughout this paper unless otherwise stated.

Assumptions regarding inheritance

(1) The mode of inheritance is a single locus with two alleles exhibiting complete dominance and penetrance at birth. There are two types of individuals, dominant (or normal) and recessive (or affected). These can be clearly distinguished from each other at birth.
(2) The probability for a child to be affected is θ, which depends on mating type.
(3) Births represent independent trials.
(4) The birth of an affected child does not cause family limitation or overcompensation.

From these assumptions we derive the following probability for a sibship of size s which includes r affected children:

$$\binom{s}{r}\theta^{r}(1-\theta)^{s-r} \qquad (r = 0, 1, \ldots, s). \tag{1}$$
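As a small numerical illustration of (1), the sketch below evaluates the binomial probability of r affected children in a sibship of size s. The function name `sibship_prob` and the values s = 4 and θ = 0.25 are illustrative assumptions and are not taken from the paper.

```python
from math import comb

def sibship_prob(r, s, theta):
    """Probability (1): r affected among s children, each affected with probability theta."""
    return comb(s, r) * theta**r * (1 - theta)**(s - r)

# Hypothetical example: sibship of size 4 with segregation parameter 0.25.
probs = [sibship_prob(r, 4, 0.25) for r in range(5)]
```

The list `probs` covers r = 0, ..., 4 and sums to one, since unascertained sibships (r = 0) are still included at this stage.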

In the sequel s will denote the number of children in a sibship, often called family size or sibship size. Only two-generation families, i.e. unrelated sibships and their parents, will be considered.

Definition 1. By 'method of ascertainment' of families we mean a sampling procedure by which data about traits in families are collected. The focus of the argument will be on the ascertainment of families rather than of individuals. In all situations considered in this paper the families are ascertained with certain probabilities through information about affected children.

Definition 2. A 'proband' is an affected individual who has been detected in some way other than via the other members of the family and through whom the family can be ascertained.

Definition 3. The 'ascertainment probability of an individual' is the conditional probability that an affected child is a proband.

Definition 4. The 'ascertainment probability of a family' of size s with r affected children is the conditional probability that the family will be ascertained. In a real situation this may depend on social and economic factors, on where the family lives and on willingness to co-operate. For simplicity, assume that it can reasonably (plausibly) be expressed by a single number.

General assumptions regarding method of ascertainment

(1) All families in our data have been ascertained through the same method of ascertainment.
(2) All families have been ascertained independently of each other.

ASSUMPTIONS FOR SINGLE-PROBAND ASCERTAINMENT MODELS

Assumptions for the ascertainment models hitherto available in the literature seem to have been based almost exclusively on family size and number of affected children in the family (Elandt-Johnson, 1971). However, Li (1964, 1965, 1970) has pointed out that ascertainment often takes place through the first appearance of an affected child. Assumptions based on this idea have been formulated by Stene (1977). Otherwise, information regarding birth order does not seem to have been utilized for the choice of ascertainment model.

We want to demonstrate for a number of ascertainment models that two different sets of assumptions can be formulated, one referring to the number of affected children and the other referring to birth order of the proband among affected children. This latter type of assumption may throw some new light on how the ascertainment in fact takes place.

Ascertainment through first-affected child. Stene (1977) has formulated the following two different sets of assumptions for the so-called complete ascertainment model and discussed the ascertainment method more thoroughly. These two assumptions lead to the same distribution (2) below.

(A) The conditional probability that a family is ascertained is independent of the number of affected children if there is at least one such child.
(B) The family is ascertained through the first (or eldest) affected child, if it is ascertained.

Stene (1977) demonstrates that from either of these assumptions the conditional probability that a family has r affected children, given that it has s children altogether and has been ascertained in accordance with one of these assumptions, is

$$\binom{s}{r}\frac{\theta^{r}(1-\theta)^{s-r}}{1-(1-\theta)^{s}} \qquad (r = 1, \ldots, s). \tag{2}$$

Under the latter of the two assumptions it is evident that the ascertainment takes place through a single proband. Ascertainment probability independent of birth order among affected children. For the so-called single ascertainment model, a term which we also want to avoid, Stene (1977) formulated the following assumption:

(A) The conditional probability that a family is ascertained is proportional to the number of affected children in the family.

This assumption refers exclusively to number of affected children. Another set of assumptions leading to the same distribution and referring to birth order is the following one:

(B) (i) The family is ascertained through a single proband. (ii) The probability $q_i$ that the family is ascertained through the ith affected child is independent of i, i.e. $q_i = \pi$ (i = 1, ..., r), where π is a parameter referring to family size and family specific properties. (In the notation of Stene (1977): π = h,o,).

From this pair of assumptions we get that the probability that the family is ascertained equals $\pi r$, i.e. the ascertainment probability of the family is proportional to the number of affected children. By the arguments given in Stene (1977) we get that the conditional probability that the family has r affected children, given that it has s children altogether and has been ascertained in accordance with either A or B, is

$$\frac{r\binom{s}{r}\theta^{r}(1-\theta)^{s-r}}{s\theta} = \binom{s-1}{r-1}\theta^{r-1}(1-\theta)^{s-r} \qquad (r = 1, \ldots, s). \tag{3}$$

Ascertainment probability of an affected child depending monotonically on its birth order among affected children. The model to be considered here was originally proposed by Rao (1965), but does not seem to have been used later either in human genetics or otherwise. For this model a similar pair of assumptions as for the two previous models can be formulated. The former one is almost the same as that originally given by Rao (1965), and refers to the number of affected children.

(A) If r ≥ 1 is the number of affected children in the family, the conditional probability that the family is ascertained is proportional to $r^{a}$, where the parameter a ≥ 0 is independent of s.

The second set of assumptions refers to birth order.

(B) (i) The family is ascertained through a single proband. (ii) The probability $q_i$ that the family is ascertained through the ith affected child is

$$q_1 = \pi, \qquad q_i = \pi\,[\,i^{a} - (i-1)^{a}\,] \quad (i = 2, \ldots, r), \tag{4}$$

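where π is a parameter referring to family size and family specific properties. From this pair of assumptions we get that the probability that a family with r affected children is ascertained is $\pi r^{a}$. From either A or B we obtain, by similar arguments to those given in Stene (1977), that the conditional probability that the family has r affected children, given that it has s children altogether and has been ascertained in accordance with one of these assumptions, is

$$\frac{r^{a}\binom{s}{r}\theta^{r}(1-\theta)^{s-r}}{\sum_{j=1}^{s} j^{a}\binom{s}{j}\theta^{j}(1-\theta)^{s-j}} \qquad (r = 1, \ldots, s), \tag{5}$$

where the denominator is the sum of the numerator over all possible numbers of affected children.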

From (4) we get that the conditional probability that the family has been ascertained through the ith affected child, given that the family has been ascertained and has r affected children, is

$$p_{ri} = \frac{q_i}{\sum_{j=1}^{r} q_j} = \frac{i^{a} - (i-1)^{a}}{r^{a}} \qquad (i = 1, \ldots, r). \tag{6}$$
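The proband probabilities in (6) are easy to tabulate. The following is a minimal sketch, assuming the form $p_{ri} = [i^{a} - (i-1)^{a}]/r^{a}$; the function name `proband_probs` and the example values r = 3, a = 0.75 are illustrative and not part of the paper.

```python
def proband_probs(r, a):
    """Return [p_r1, ..., p_rr] from (6): (i^a - (i-1)^a) / r^a."""
    if a == 0:
        # Limiting case: the eldest affected child is the proband with certainty.
        # Handled separately because Python evaluates 0**0 as 1.
        return [1.0] + [0.0] * (r - 1)
    return [(i**a - (i - 1)**a) / r**a for i in range(1, r + 1)]

# Hypothetical example: three affected children, a = 0.75.
p = proband_probs(3, 0.75)
```

For 0 < a < 1 the entries of `p` decrease with birth order, so the eldest affected child is the most likely proband.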

From (6) we notice that if a = 0 the ascertainment takes place through the eldest affected child with probability one, and by putting a = 0 in (5) we also get (2). If we let a = 1 in (6) we get that each affected child has the same probability, i.e. 1/r, of being the proband, and with a = 1 (5) is reduced to (3). Although there is no conclusive evidence available, it would seem plausible that this model might give a reasonable and practical approximation to real situations, with some value of a. In practice it is unlikely that a < 0, while a > 1 may indicate that the standard of diagnosis has improved during the time-span when the children in the family have been born. However, usually one would expect that 0 ≤ a ≤ 1, with a < 1 indicating that the probands are more likely to be found among the eldest affected children than among the youngest ones. (What is of practical interest here is again the ascertainment probability of the family rather than the proband.)
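The two reductions just described can be checked numerically. The sketch below assumes that (5) weights the binomial probability (1) by $r^{a}$ and renormalizes over r = 1, ..., s, as assumption A implies; the function name `single_proband_pmf` and the values s = 4, θ = 0.25 are illustrative assumptions.

```python
from math import comb

def single_proband_pmf(s, a, theta):
    """Distribution of r = 1..s when ascertainment probability is proportional to r^a."""
    weights = [r**a * comb(s, r) * theta**r * (1 - theta)**(s - r)
               for r in range(1, s + 1)]
    total = sum(weights)
    return [w / total for w in weights]

s, theta = 4, 0.25  # hypothetical sibship size and segregation parameter

# a = 0: binomial truncated at r = 0, i.e. the distribution called (2) in the text.
truncated = [comb(s, r) * theta**r * (1 - theta)**(s - r) / (1 - (1 - theta)**s)
             for r in range(1, s + 1)]

# a = 1: binomial on the s - 1 children other than the proband, i.e. (3).
proband_removed = [comb(s - 1, r - 1) * theta**(r - 1) * (1 - theta)**(s - r)
                   for r in range(1, s + 1)]

at_zero = single_proband_pmf(s, 0.0, theta)
at_one = single_proband_pmf(s, 1.0, theta)
```

Comparing `at_zero` with `truncated` and `at_one` with `proband_removed` entry by entry shows agreement up to rounding error.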

CHOICE OF SINGLE-PROBAND MODEL

From (6) we notice that the conditional probability that the ith affected child is the proband, given that there are r affected children, depends only on r, i and a. If we have a number of families each with exactly r affected children and the families are ascertained in the same way, which means that a is the same for all families, the number of these families for which the ith affected child (i = 1, ..., r) is the proband is multinomially distributed with probabilities given by (6). For families ascertained through a single proband the parameter a characterizes the ascertainment method, and if a = 0 or a = 1 we get the two well-known simple models. If a = 0 the proband is the eldest affected child in each family, and if a = 1 the birth number of the proband has an even distribution among the affected children.

It should be noticed that only families with at least two affected children provide information about birth order among affected children. Therefore families with only a single affected child are omitted from the data when we want to choose the ascertainment model. By means of data where the birth number of the proband among the affected children is recorded, we want to construct a statistical test procedure for choosing the ascertainment model. In this connexion we introduce the following notation.

Notation

$r_0$: Maximum number of affected children in any family in the material.
$i$: Birth number of proband among affected children.
$b_{ri}$: Number of families with r affected children and with affected number i as proband ($b_{11} = 0$ according to the observation above), assuming that there is only one proband in each family.
$b_{\cdot i} = \sum_{r=i}^{r_0} b_{ri}$: Number of families with affected number i as proband.
$b_{r\cdot} = \sum_{i=1}^{r} b_{ri}$: Number of families with r affected children.

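The connexion between the different quantities follows from Table 1.

Model for the data. By the notation introduced, for given $b_{r\cdot}$, we have that $b_{r1}, \ldots, b_{rr}$ are multinomially distributed with probabilities $p_{r1}, \ldots, p_{rr}$, where $p_{ri}$ is given in (6). Hence we get for fixed r:

$$P(b_{r1}, \ldots, b_{rr} \mid b_{r\cdot}) = \frac{b_{r\cdot}!}{\prod_{i=1}^{r} b_{ri}!}\prod_{i=1}^{r} p_{ri}^{\,b_{ri}}. \tag{7}$$

Since families with different r's are considered separately, the probability for the whole material is

$$\prod_{r=2}^{r_0}\left[\frac{b_{r\cdot}!}{\prod_{i=1}^{r} b_{ri}!}\prod_{i=1}^{r} p_{ri}^{\,b_{ri}}\right]. \tag{8}$$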

Table 1. Data classified after r (number of affected children) and i (birth order of proband among affected children)

                                   i
  r        1            2            3            ...    r_0            Total
  2        b_{21}       b_{22}                                          b_{2.}
  3        b_{31}       b_{32}       b_{33}                             b_{3.}
  ...      ...          ...          ...          ...                   ...
  r_0      b_{r_0 1}    b_{r_0 2}    b_{r_0 3}    ...    b_{r_0 r_0}    b_{r_0 .}
  Total    b_{.1}       b_{.2}       b_{.3}       ...    b_{.r_0}       b_{..}
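For data arranged as in Table 1, the part of the log likelihood that depends on a can be sketched as follows. The sketch assumes the multinomial model with $p_{ri} = [i^{a} - (i-1)^{a}]/r^{a}$ and drops the multinomial coefficients, which do not involve a; the function name `log_likelihood` and the counts in `b` are hypothetical.

```python
from math import log

def log_likelihood(b, a):
    """Sum of b_ri * ln(p_ri) for a > 0.

    b maps (r, i) to the number of families with r affected children
    whose proband is affected child number i.
    """
    total = 0.0
    for (r, i), count in b.items():
        if count:
            total += count * (log(i**a - (i - 1)**a) - a * log(r))
    return total

# Hypothetical table: four families with r = 2 and three with r = 3.
b = {(2, 1): 3, (2, 2): 1, (3, 1): 1, (3, 2): 1, (3, 3): 1}
value_at_one = log_likelihood(b, 1.0)
```

At a = 1 every $p_{ri}$ equals 1/r, so `value_at_one` is simply minus the sum of ln r over the families.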

Estimators and tests for a. From (8) we can estimate a by the maximum-likelihood method. Using Fisher's scoring method for the maximum-likelihood estimators (see Rao, 1973, pp. 366-74), a is estimated by the expression

$$\hat{a} = a^{*} + L'(a^{*})/I(a^{*}), \tag{9}$$

where $a^{*}$ is a trial value, $L'(a)$ is the derivative of the log likelihood of (8) and $I(a)$ is the Fisher information (see Rao, 1973, pp. 329-32) given the $b_{r\cdot}$'s,

$$L'(a) = \sum_{i=2}^{r_0} b_{\cdot i}\,\frac{i^{a}\ln(i) - (i-1)^{a}\ln(i-1)}{i^{a} - (i-1)^{a}} - \sum_{r=2}^{r_0} b_{r\cdot}\ln(r),$$

$$I(a) = \sum_{r=2}^{r_0}\frac{b_{r\cdot}}{r^{a}}\sum_{i=2}^{r}\frac{[\,i^{a}\ln(i) - (i-1)^{a}\ln(i-1)\,]^{2}}{i^{a} - (i-1)^{a}} - \sum_{r=2}^{r_0} b_{r\cdot}[\ln(r)]^{2}.$$
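The scoring iteration (9) needs only the column totals $b_{\cdot i}$ and the row totals $b_{r\cdot}$. The sketch below is one possible implementation, assuming the score and information take the forms $L'(a) = \sum_i b_{\cdot i}\, g_i(a) - \sum_r b_{r\cdot}\ln r$ and $I(a) = \sum_r b_{r\cdot}\,[\,r^{-a}\sum_{i=2}^{r} h_i(a) - (\ln r)^2\,]$ implied by the multinomial model. The function names, the stopping tolerance and the counts used are illustrative assumptions.

```python
from math import log

def score(col_totals, row_totals, a):
    """L'(a) for a > 0; col_totals[i] = b.i and row_totals[r] = br."""
    gain = sum(
        n * (i**a * log(i) - (i - 1)**a * log(i - 1)) / (i**a - (i - 1)**a)
        for i, n in col_totals.items() if i >= 2
    )
    return gain - sum(n * log(r) for r, n in row_totals.items())

def information(row_totals, a):
    """Fisher information I(a) given the row totals, for a > 0."""
    total = 0.0
    for r, n in row_totals.items():
        inner = sum(
            (i**a * log(i) - (i - 1)**a * log(i - 1))**2 / (i**a - (i - 1)**a)
            for i in range(2, r + 1)
        )
        total += n * (inner / r**a - log(r)**2)
    return total

def fisher_scoring(col_totals, row_totals, a_start, tol=1e-3, max_iter=100):
    """Repeat a <- a + L'(a)/I(a) until the correction is smaller than tol."""
    a = a_start
    for _ in range(max_iter):
        step = score(col_totals, row_totals, a) / information(row_totals, a)
        a += step
        if abs(step) < tol:
            break
    return a

# Illustrative counts: four families with two affected children each,
# the elder affected child being the proband in two of them.
cols = {1: 2, 2: 2}
rows = {2: 4}
a_hat = fisher_scoring(cols, rows, a_start=0.8)
sd_a_hat = information(rows, a_hat) ** -0.5
```

With these counts the elder and younger affected child are probands equally often, so the iteration settles at a value of `a_hat` close to 1, and `sd_a_hat` is the corresponding standard deviation estimate.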

A trial value $a^{*}$ may be found in the following way. The conditional probability that the eldest affected child is the proband, given that one of the two eldest affected children is so, is

$$p_{r1}/(p_{r1} + p_{r2}) = 2^{-a},$$

which is independent of r. Therefore the same probability, $2^{-a}$, applies to the pooled material, whatever the values of r in the individual families.

An unbiased estimator for this expression is $b_{\cdot 1}/(b_{\cdot 1} + b_{\cdot 2})$. By putting this expression equal to $2^{-a}$ and solving for a we get the trial value

$$a^{*} = [\ln(b_{\cdot 1} + b_{\cdot 2}) - \ln(b_{\cdot 1})]/\ln 2.$$

The variance of $\hat{a}$ can be estimated by $1/I(\hat{a})$. It is reasonable to test the hypotheses $H_0$: a = 0 and $H_1$: a = 1 against appropriate alternatives. These hypotheses are of interest in themselves. They are also of interest because the estimators and tests regarding θ are much simpler in these cases than otherwise. An ε-level test for the hypothesis

$$H_0: a = 0 \quad \text{against} \quad a > 0$$

is carried out by rejecting the hypothesis if $(\hat{a} - 0)\sqrt{I(\hat{a})} \geq c_0$, where $c_0$ is the (1 − ε) point in the standardized normal distribution. An ε-level test for the hypothesis

$$H_1: a = 1 \quad \text{against} \quad a < 1$$

is carried out by rejecting the hypothesis if $(\hat{a} - 1)\sqrt{I(1)} < c_1$, where $c_1$ is the ε-point in the standardized normal distribution.

A multiple-test procedure for choice of ascertainment model. When nothing is known about a except that 0 ≤ a ≤ 1, the choice of ascertainment model, i.e. the value of a, may be made by the following multiple-decision procedure where both hypotheses $H_0$ and $H_1$ are tested against the alternatives specified above:

(i) If both $H_0$ and $H_1$ are rejected, conclude that 0 < a < 1 and use the model (5) with a = $\hat{a}$.
(ii) If $H_0$ is rejected, but not $H_1$, conclude 0 < a ≤ 1 and use the model (3).
(iii) If $H_1$ is rejected, but not $H_0$, conclude 0 ≤ a < 1 and use the model (2).
(iv) If neither $H_0$ nor $H_1$ is rejected, conclude 0 ≤ a ≤ 1; no choice between the extreme models can be made with the chosen ε.

If we reach decision (iv), i.e. no decisive conclusion, it may be possible to get a conclusion by choosing a higher value of ε. The reasons for suggesting this procedure are based on the following arguments. The methods for estimation of θ and for testing hypotheses about θ are quite simple when a = 0 and a = 1, while they are much more complicated for any other a, as will be demonstrated in a future paper. By numerical investigations which may be published later, it has been found that if a is not far from either a = 0 or a = 1, the estimate of θ we get from the model with a = 0 or a = 1, respectively, will not deviate very much from the estimate of θ we will obtain by the more complicated procedure based on the model with the true value of a. The procedure proposed above is a method to determine neighbourhoods of a = 0 and a = 1 in which we can use the simple models with sufficiently good approximation.

If we reach decision (i), θ is estimated from (5) with a = $\hat{a}$, the estimated value. The practical procedure will be demonstrated in a forthcoming paper. If we reach decision (ii), θ is estimated by the relative frequency of affected children when the probands are omitted from the material, and if we reach decision (iii), we may estimate θ by, for example, Mantel's estimator (see Li & Mantel, 1968).
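The decision rule and the simple estimator used after decision (ii) can be sketched as below. The sketch takes the two one-sided tests to reject when $(\hat{a} - 0)\sqrt{I(\hat{a})}$ is at least the (1 − ε) normal point and when $(\hat{a} - 1)\sqrt{I(1)}$ is below the ε normal point. The function names, the return labels and the numerical inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def choose_model(a_hat, info_at_a_hat, info_at_one, eps=0.05):
    """Return the label of decision (i)-(iv) in the multiple-test procedure."""
    c0 = NormalDist().inv_cdf(1 - eps)  # (1 - eps) point of the standard normal
    c1 = NormalDist().inv_cdf(eps)      # eps point of the standard normal
    reject_h0 = (a_hat - 0) * sqrt(info_at_a_hat) >= c0  # H0: a = 0 against a > 0
    reject_h1 = (a_hat - 1) * sqrt(info_at_one) < c1     # H1: a = 1 against a < 1
    if reject_h0 and reject_h1:
        return "i"    # 0 < a < 1: general model with a = a_hat
    if reject_h0:
        return "ii"   # model with a = 1
    if reject_h1:
        return "iii"  # model with a = 0
    return "iv"       # no choice possible at this level

def theta_hat_after_decision_ii(affected, children, families):
    """Relative frequency of affected children once one proband per family is omitted."""
    return (affected - families) / (children - families)

# Hypothetical inputs: an estimate well inside (0, 1) with large information.
decision = choose_model(a_hat=0.4, info_at_a_hat=30.0, info_at_one=20.0)
```

Raising `eps` widens both rejection regions, which is how a decision (iv) may turn into one of the other three.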

The properties of this multiple-test procedure will be studied elsewhere. One important point should be noticed. The analysis considered in this section is a conditional analysis given the r's of the different families. The segregation parameter θ does not occur in the conditional distribution (8). Therefore the same analysis can be carried out even if θ may vary from family to family. This may sometimes be the case in human genetical investigations where clinically indistinguishable disorders in some families are caused by a recessive gene, in others by a dominant gene and in a third group of families by a sex-linked gene, giving rise to different values of θ. If the birth order is given, then the choice of ascertainment model can be carried out for the whole material in a single analysis.

Example 1. In Fig. 1 there are given 36 families, each ascertained through a single proband. In Table 2 the same data are classified as in Table 1. It will be noticed that by no means all families have been ascertained through the eldest affected child. Therefore the model specified by equation (2) is not the relevant one. Neither is the relative frequency of probands among the eldest affected, the second eldest one, etc., equal for each r; therefore it is not immediately apparent that the model specified by equation (3) is the relevant one.

Fig. 1. Data on 36 families ascertained through a single proband, indicated by an arrow. Black symbols indicate affected individuals and white normal ones. The data are constructed by simulation with θ = 0.5, a = 0.75 and the sex ratio equal to 0.5.

Table 2. The data in Fig. 1 classified after r and i

                    i
  r        1     2     3     4     Total
  2        9     4                 13
  3        1     4     2           7
  4        1     0     0     0     1
  Total    11    8     2     0     21
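The starting value for the scoring iteration on these data can be reproduced from two column totals. The sketch below takes those totals to be $b_{\cdot 1} = 11$ and $b_{\cdot 2} = 8$, which is how the table reads in the source and which agrees with the trial value 0.78849 quoted for this example; the function name `trial_value` is an illustrative assumption.

```python
from math import log

def trial_value(b_dot1, b_dot2):
    """Starting value a* = [ln(b.1 + b.2) - ln(b.1)] / ln 2."""
    return (log(b_dot1 + b_dot2) - log(b_dot1)) / log(2)

# 11 families with the eldest and 8 with the second-eldest affected child as proband.
a_star = trial_value(11, 8)
```

The value `a_star` is only a starting point; the maximum-likelihood estimate is obtained by iterating (9) from it.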

Hence we get $\hat{a} = a^{*} + L'(a^{*})/I(a^{*}) = 0.78849 - 0.03548 = 0.75301$. We iterate the procedure until the correction $|L'(a^{*})/I(a^{*})| < 0.001$. The final result is that our maximum likelihood estimate of a is $\hat{a} = 0.75$, with two significant figures. For this value of a we get $I(\hat{a}) = 17.05$, and the variance var($\hat{a}$) = $1/I(\hat{a})$ = 0.059 and SD($\hat{a}$) = 0.24.

Then we want to test the hypotheses $H_0$ and $H_1$, the latter against the alternative a < 1. In order to test the latter we need I(1) = 11.52. We choose level ε = 0.05 for each test. Then we reject $H_0$ if $(\hat{a} - 0)\sqrt{I(\hat{a})} > 1.645$ and $H_1$ if $(\hat{a} - 1)\sqrt{I(1)} < -1.645$. We find that $(\hat{a} - 0)\sqrt{I(\hat{a})} = 3.11$, and that $(\hat{a} - 1)\sqrt{I(1)} = -0.84$. Hence $H_0$ is rejected, but not $H_1$, and we conclude that 0 < a ≤ 1, and use the model (3). The 36 families in Fig. 1 have 96 children altogether, 66 of whom are affected. By the usual estimator for θ in model (3) (see Elandt-Johnson, 1971) we get $\hat{\theta} = (66 - 36)/(96 - 36) = 0.50$. There will be further discussion of this example in a future paper.

Example 2. In Stene & Stengel-Rutkowski (1977) 11 families with familial translocations were considered. Some of these were pedigrees consisting of several nuclear families. All these 11 families were ascertained through a single proband in a single index sibship, which is a sibship containing one or more probands. We have to distinguish between choosing an ascertainment model for the index sibships and building models for the ascertained pedigrees. The models considered in this paper are, in such a context, ascertainment models for index sibships. The choice of ascertainment model for the index sibships can only be based on those sibships which have two or more affected children, as mentioned before. Only 4 of the 11 index sibships in Stene & Stengel-Rutkowski (1977) have two affected children each; the remaining seven have only a single affected child each, namely the proband. In two of the four index sibships with two affected children the elder child was the proband; in the two others the proband was the younger affected child. By means of these data we estimate $\hat{a} = 1$ and I(1) = 1.921, giving SD($\hat{a}$) = 0.72. If we choose ε = 0.10 we reject $H_0$ since $(\hat{a} - 0)\sqrt{I(1)} = 1.39$, and conclude 0 < a ≤ 1 and use model (3) as the ascertainment model for the index sibships. (See further Stene & Stengel-Rutkowski, 1977.) How to build models for ascertained pedigrees has been discussed by Stene (1970a, b, c, 1975) for familial translocations and by Cannings & Thompson (1977), and in references given there, for pedigrees with inherited genes.

COMMENTS ON THE TERMINOLOGY

As mentioned in the introduction, the models (2) and (3) are usually denoted as the complete ascertainment (or complete selection) model and single ascertainment (or single selection) model respectively. In this paper it is demonstrated that both models can arise in sampling of family data where each family has a single proband. This fact demonstrates that the common names are misleading. For the type of ascertainment situations considered in this paper it might be reasonable to denote the model specified by equation (5) as the general single-proband model. Models (2) and (3) are special cases of single-proband models, the former one for data ascertained through the eldest affected child and the latter one for data where all affected children in a sibship have the same probability of being the proband.

CONCLUDING REMARKS

In the present paper we have only considered the case where the ascertainment took place without regard to the sex of the affected child. This is often not the case. In that situation families with a girl proband and families with a boy proband should be considered separately, and it should be tested whether the parameter a is equal for both sexes. Regarding the segregation analysis, i.e. inference about the parameter θ, in the models considered here, information about the number of affected children and the total number of children will be sufficient. This stage of the analysis will be considered in a future paper, where other analyses based on such data will be discussed.

SUMMARY

A statistical method for choosing between ascertainment models for human family data has been constructed. The families are assumed to have been ascertained through a single proband each and the birth number of the proband among the affected children is recorded. If the proband is the eldest affected child in each family the so-called complete ascertainment model has to be used. If each affected child has the same probability of being the proband, the so-called single ascertainment model should be used. In intermediate cases a third ascertainment model should be used.

The author wants to thank Dr S. Stengel-Rutkowski for presenting him the problem which resulted in the method presented in this paper, his wife, Dipl.-Ing. E. Stene, for a number of constructive discussions and Professor C. A. B. Smith for a number of valuable comments to a previous version of the paper.


REFERENCES

CANNINGS, C. & THOMPSON, E. A. (1977). Ascertainment in the sequential sampling of pedigrees. Clin. Genet. 12, 208-12.
ELANDT-JOHNSON, R. E. (1971). Probability Models and Statistical Methods in Genetics, ch. 17. New York: J. Wiley.
FISHER, R. A. (1934). The effect of methods of ascertainment upon the estimation of frequencies. Ann. Eug., Lond. 6, 13-25.
HALDANE, J. B. S. (1938). The estimation of the frequencies of recessive conditions in man. Ann. Eug., Lond. 8, 256-62.
LI, C. C. (1964). Estimate of recessive proportion by first appearance time. Ann. Hum. Genet., Lond. 28, 177-80.
LI, C. C. (1965). Segregation of the Ellis-van Creveld syndrome as analyzed by first appearance method. Am. J. Hum. Genet. 17, 343-51.
LI, C. C. (1970). The incomplete binomial distribution. In Mathematical Topics in Population Genetics (ed. K. Kojima), pp. 337-66. Berlin: Springer-Verlag.
LI, C. C. & MANTEL, N. (1968). A simple method of estimating the segregation ratio under complete ascertainment. Am. J. Hum. Genet. 20, 61-81.
MORTON, N. E. (1959). Genetic tests under incomplete ascertainment. Am. J. Hum. Genet. 11, 1-16.
MORTON, N. E. (1962). Segregation and linkage. In Methodology in Human Genetics (ed. J. Burdette), pp. 17-52. San Francisco: Holden-Day.
MORTON, N. E. (1969). Segregation analysis. In Computer Applications in Human Genetics (ed. N. E. Morton), pp. 129-39. Honolulu: University of Hawaii Press.
RAO, C. R. (1965). On discrete distributions arising out of methods of ascertainment. In Classical and Contagious Discrete Distributions (ed. G. P. Patil), pp. 320-32. Reprinted in Sankhya 27, A, 311-23.
RAO, C. R. (1973). Linear Statistical Inference and Its Applications, 2nd ed. New York: J. Wiley.
SELVIN, S. (1975). Testing the Mendelian segregation ratio under incomplete ascertainment. Hum. Hered. 25, 194-203.
SMITH, C. A. B. (1959). A note on the effects of method of ascertainment on segregation ratios. Ann. Hum. Genet., Lond. 23, 311-23.
STENE, J. (1970a). Analysis of segregation patterns between sibships within families ascertained in different ways. Ann. Hum. Genet., Lond. 33, 261-83.
STENE, J. (1970b). Comparisons of segregation ratios for families ascertained in different ways. Ann. Hum. Genet., Lond. 33, 395-412.
STENE, J. (1970c). Statistical inference on segregation ratios for D/G-translocations, when the families are ascertained in different ways. Ann. Hum. Genet., Lond. 34, 93-115.
STENE, J. (1975). Sampling of pedigrees. Adv. Appl. Prob. 7, 18-23.
STENE, J. (1977). Assumptions for different ascertainment models. Biometrics 33, 523-7.
STENE, J. (1978). Ascertainment models for human families selected through several affected children. Biometrics 34 (to appear).
STENE, J. & STENGEL-RUTKOWSKI, S. (1977). Risk for short arm 10 trisomy. A segregation analysis of eleven families with different translocations. Hum. Genet. 39, 7-13.
WEINBERG, W. (1912a). Methode und Fehlerquellen der Untersuchung auf Mendelsche Zahlen beim Menschen. Arch. Rass.- u. Ges. Biol. 6, 165-74.
WEINBERG, W. (1912b). Zur Vererbung der Anlage der Bluterkrankheit mit methodologischen Ergänzungen meiner Geschwistermethode. Arch. Rass.- u. Ges. Biol. 6, 694-709.
WEINBERG, W. (1928). Mathematische Grundlage der Probandenmethode. Z. indukt. Abstamm.- u. VererbLehre 48, 179-228.
