Genetic Epidemiology 8:209-216 (1991)

Power of the Admixture Test to Detect Genetic Heterogeneity

Steven A. Narod

Division of Medical Genetics, McGill University, Montreal, Quebec, Canada

Several dominant genetic diseases which appear to be homogeneous are the expression of genetic mutations at several loci. The power to detect linkage by likelihood methods is diminished for heterogeneous, as compared to genetically homogeneous, disorders. Using a simulation approach and two pedigrees typical of those available for the study of a dominant disease (with expected lod scores of 0.43 and 1.00 at θ = 0.05 and PIC = 0.59), I have evaluated the power to detect genetic heterogeneity by using the admixture test. Linkage power was determined by varying the number of families available for study, the recombination fraction (θ), the informativity of the hypothetical marker, and the proportion of linked families, α. For moderate and small values of α it is feasible to detect genetic heterogeneity once linkage has been established; rarely will it be possible to detect linkage and heterogeneity simultaneously given a limited number of small or moderate pedigrees.

Key words: linkage heterogeneity, likelihood methods, pedigrees

Received for publication December 18, 1990; revision accepted April 22, 1991.

Address reprint requests to Dr. Steven A. Narod, Division of Medical Genetics, McGill University, 1650 Cedar Avenue, Montreal, Quebec, Canada, H3G 1A4.

© 1991 Wiley-Liss, Inc.

INTRODUCTION

Several genetic diseases have now been shown to be heterogeneous at the gene locus. A population of families with a genetic disease may be composed of two (or more) sub-populations, each associated with a predisposing mutation at a different chromosomal location. When linkage data are available, this possibility can be evaluated statistically by asking if the observed data are consistent with a common recombination fraction between the disease phenotype and a particular marker locus in all families studied.

The admixture test [Smith, 1963] is the statistical test most commonly used for the detection of linkage heterogeneity. It is assumed that a proportion of families, α, are linked to a marker locus and that the remainder, 1 − α, are unlinked. The likelihood of an observed family, or group of families, depends on the genetic model of the disease, and is a function of both α and the recombination fraction θ. Both α and θ are estimated by maximizing the likelihood of the observed pedigree and marker data with respect to the two parameters simultaneously. Heterogeneity of linkage is inferred from the ratio of the maximum likelihood allowing α to vary and the maximum likelihood with α fixed at 1.0 (all families linked). Twice the natural logarithm of this likelihood ratio is taken to follow a χ² distribution on 1 degree of freedom.

Similarly, the statistical test for linkage can be constructed by the conventional method, where the likelihood is maximized on θ alone, or by allowing θ and α to vary simultaneously. In the latter situation upward adjustment of the significance level from the conventional value assuming homogeneity is required because of the additional degree of freedom. Risch [1989] proposes the equivalence of type I error rates of a conventional lod score, LOD = 3.00, and a lod score assuming heterogeneity, LOD2 = 3.70. Even if homogeneity is falsely assumed, the conventional LOD can be used to study genetic traits [Cavalli-Sforza and King, 1986], but the practice generally leads to over-estimation of the recombination fraction. For small pedigrees, depending on the degree of heterogeneity and the recombination fraction, either the conventional method or the heterogeneity linkage test will be the more powerful [Risch, 1989; Martinez and Goldin, 1990].

Linkage power may be evaluated through estimating the expected number of families required to detect linkage or by estimating the probability of exceeding a critical value of the lod score for a given collection of families. Study power has been evaluated analytically with nuclear families (generally restricted to fully informative backcross matings) [Ott, 1983; Cavalli-Sforza and King, 1986; Risch, 1988] and by simulating genotypes for extended pedigrees by using Monte Carlo methods [Ott, 1989; Narod and Amos, 1990; Martinez and Goldin, 1990].
The latter approach is useful when a specific set of pedigrees is to be investigated. The generated pedigrees will be representative of the entire range of mating types and individuals with missing linkage information are permissible. In order to estimate the probability of detecting genetic heterogeneity for a dominant disease under conditions commonly encountered in linkage studies I have simulated replicate sets of a moderately sized and of a large pedigree and have analysed the generated sets by using the admixture test. I have also used the generated pedigree sets to compare the performance of the conventional and the heterogeneity linkage tests under these circumstances.
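
The admixture test described above can be illustrated with a short sketch. It assumes that each family's conventional lod score Z_i(θ) is already available at a few test values of θ, so that a family's likelihood ratio under admixture is α·10^Z_i(θ) + (1 − α). The function names, the grid of α values, and the toy lod scores are hypothetical and are not taken from the study or from any linkage program.

```python
import math

LN10 = math.log(10.0)


def heterogeneity_lod(family_lods, alpha):
    """Lod score of a family set under admixture, for one test value of theta.

    family_lods holds each family's conventional lod score Z_i(theta).
    A family is linked with probability alpha, so its likelihood ratio
    against theta = 0.5 is alpha * 10**Z_i + (1 - alpha).
    """
    return sum(math.log10(alpha * 10.0 ** z + (1.0 - alpha)) for z in family_lods)


def admixture_test(lods_by_theta, alphas):
    """Grid-search the admixture model; alphas should include 1.0."""
    lod2, alpha_hat, theta_het = max(
        (heterogeneity_lod(lods, a), a, theta)
        for theta, lods in lods_by_theta.items()
        for a in alphas
    )
    # Conventional lod score: all families assumed linked (alpha = 1).
    lod, theta_hom = max((sum(lods), theta) for theta, lods in lods_by_theta.items())
    # Likelihood-ratio statistic for the null hypothesis alpha = 1 (homogeneity).
    chi2 = 2.0 * LN10 * max(lod2 - lod, 0.0)
    return {"LOD": lod, "LOD2": lod2, "alpha": alpha_hat,
            "theta": theta_het, "chi2": chi2}


# Hypothetical lod scores for four families at two test values of theta.
toy = {0.05: [1.2, 1.0, -1.5, -2.0], 0.10: [1.0, 0.9, -0.9, -1.3]}
result = admixture_test(toy, alphas=[i / 20 for i in range(21)])
print(result)
```

Because the search runs over a discrete grid, the maximum found in this way can fall slightly below the true maximum of the likelihood surface.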

METHODS

The two pedigrees selected for analysis were chosen to represent a moderately informative and a highly informative family of the type generally available for the analysis of a dominant trait (Fig. 1). Marker genotypes were generated for the pedigrees by using the SIMLINK program, version 2.0 [Boehnke, 1986]. All living persons are considered available for sampling.

Fig. 1. Pedigrees representing a moderately informative family (family A) and a highly informative family (family B) of the type generally available for the analysis of a dominant trait. Family B represents the structure of pedigree 4136 in the group of tuberous sclerosis families studied for heterogeneity by Sampson et al. [1989].

In order to simulate linked and unlinked families, m of a total of N (N = 5, 10, or 20) pedigrees were generated at a recombination fraction of θ < 0.50 (either 0.01, 0.05, or 0.10) and the remainder, N − m, were generated at θ = 0.50. The entire set of N pedigrees was then analysed with the LINKAGE program, version 5.03 [Lathrop and Lalouel, 1984]. Because the proportion of linked families in a sample of size N will not always agree with the expected value α, sets of N pedigrees were generated with the number of linked families, m, varying from 0 to N. For a given α the proportion expected to contain exactly m linked families was determined from the binomial distribution. For example, when N = 20 and α = 0.50 we would expect to have exactly 10 linked and 10 unlinked families 17.6% of the time, 9 linked families 16.0% of the time, 8 linked 12.0% of the time, etc. These proportions were used to construct representative samples for different levels of α.

Lod scores were generated for each pedigree set at test recombination fractions of 0.01, 0.05, 0.10, 0.20, and 0.30. These lod scores become the input data for the HOMOG program [Ott, 1985]. Maximum lod scores are provided at several values of
α, including 1.0. In both the generating and analytic models the disease penetrance was 100%, the gene frequency was 0.001, and the mutation rate was 0.0. Two marker polymorphisms were simulated, one with two and one with three equally frequent alleles (PIC = 0.37 and PIC = 0.59 respectively). Estimates were based on a minimum of 500 replicate pedigree sets. The nominal significance level for the homogeneity test was set at 5%. The conventional lod score, LOD, was considered significant for linkage at values greater than or equal to 3.0, and the lod score under heterogeneity, LOD2, was considered significant at values greater than or equal to 3.70.

RESULTS

The effects of family structure, of α, and of the number of available families on the power to detect genetic heterogeneity and linkage are demonstrated in Table I. Based on the simulation, the average expected lod score for family A (three alleles; θ = 0.05) is 0.43; for family B the expected lod score is 1.00 with this hypothetical marker.

TABLE I. Probabilities of Detecting Heterogeneity and Linkage for Different Linked Fractions of Families*

                         Power to detect
α       Heterogeneity   Heterogeneity      Heterogeneity   Linkage assuming   Linkage assuming
                        assuming linkage   and linkage     homogeneity        heterogeneity

Family A (N = 20)
1.00    .036            .036               .036            1.00               1.00
0.90    .139            .138               .133            .965               .953
0.75    .260            .277               .219            .792               .764
0.50    .308            .381               .126            .330               .322
0.25    .169            .381               .013            .032               .031

Family B (N = 20)
1.00    .028            .028               .028            1.00               1.00
0.90    .390            .390               .390            1.00               1.00
0.75    .697            .698               .693            .933               .??3
0.50    .826            .835               .652            .781               .861
0.25    .666            .785               .135            .172               .284

Family B (N = 10)
1.00    .024            .024               .024            1.00               .997
0.90    .223            .215               .208            .968               .959
0.75    .428            .382               .321            .840               .847
0.50    .503            .557               .238            .427               .486
0.25    .397            .571               .044            .077               .113

Family B (N = 5)
1.00    .020            .016               .014            .880               .784
0.90    .131            .085               .061            .719               .653
0.75    .238            .161               .079            .490               .452
0.50    .280            .223               .044            .197               .180
0.25    .196            .195               .008            .041               .036

*All replicates have been generated assuming a marker with three alleles at θ = 0.05. The proportion of linked families for a given α is based on the binomial distribution (see Methods). Under the column "Heterogeneity assuming linkage" I have included only those family sets for which the sum of LOD scores exceeded 3.00.
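
The binomial weighting referred to in the footnote to Table I (and described in Methods) can be reproduced in a few lines. This is a minimal sketch; the function name is illustrative and is not part of SIMLINK, LINKAGE, or HOMOG.

```python
from math import comb


def linked_family_weights(n_families, alpha):
    """Probability that exactly m of n_families are linked, for m = 0..n_families."""
    return [
        comb(n_families, m) * alpha ** m * (1.0 - alpha) ** (n_families - m)
        for m in range(n_families + 1)
    ]


weights = linked_family_weights(20, 0.50)
for m in (10, 9, 8):
    # Percentage of samples of 20 families expected to contain exactly m linked families.
    print(m, round(100 * weights[m], 1))
# prints:
# 10 17.6
# 9 16.0
# 8 12.0
```

These values agree with the worked example given in Methods for N = 20 and α = 0.50.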

TABLE II. Probabilities of Detecting Heterogeneity and Linkage With Polymorphisms of Two and Three Alleles at Various Recombination Fractions With a Sample of 10 Linked and 10 Unlinked Families

                         Power to detect
θ       Heterogeneity   Heterogeneity      Heterogeneity   Linkage assuming   Linkage assuming
                        assuming linkage   and linkage     homogeneity        heterogeneity

Family A: two alleles
0.01    .172            .191               .036            .188               .152
0.05    .102            .111               .012            .108               .070
0.10    .050            .091               .004            .044               .020

Family A: three alleles
0.01    .413            .496               .290            .585               .581
0.05    .300            .435               .134            .308               .298
0.10    .174            .293               .044            .150               .104

Family B: two alleles
0.01    .770            .738               .542            .?34               .820
0.05    .580            .573               .290            .506               .544
0.10    .336            .303               .074            .244               .228

Family B: three alleles
0.01    .970            .970               .940            .969               .998
0.05    .860            .861               .736            .855               .942
0.10    .610            .580               .336            .579               .650
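
The two hypothetical markers compared in Table II differ only in their polymorphism information content. The paper does not state which formula was used for PIC; the sketch below assumes the usual definition, PIC = 1 − Σ p_i² − Σ_{i<j} 2 p_i² p_j², and the function name is illustrative.

```python
from itertools import combinations


def pic(allele_freqs):
    """Polymorphism information content for a codominant marker.

    Assumes the standard definition: one minus the homozygosity, minus a
    correction for matings in which the transmitted allele cannot be traced.
    """
    homozygosity = sum(p * p for p in allele_freqs)
    correction = sum(2.0 * (p * q) ** 2 for p, q in combinations(allele_freqs, 2))
    return 1.0 - homozygosity - correction


two_alleles = pic([0.5, 0.5])      # two equally frequent alleles
three_alleles = pic([1 / 3] * 3)   # three equally frequent alleles
print(two_alleles, three_alleles)
```

Under this assumption the two markers give values of 0.375 and about 0.593, which is consistent with the PIC values of 0.37 and 0.59 quoted in Methods.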

The effects of varying the polymorphism and the underlying recombination fraction are seen in Table II. The probability of detecting heterogeneity is clearly greater with the larger family (Table I). The probability is estimated twice, first using all of the generated pedigree sets and then using only those sets for which the total LOD exceeded 3.00. The two estimates vary little when the proportion of linked families is substantial. (When the probability of detecting linkage is low the figure for heterogeneity assuming linkage is based on relatively few family sets and is less reliable.) As expected, the probability for detecting heterogeneity without respect to linkage peaks at α = 0.50, but the probability for detecting both linkage and heterogeneity for a pedigree set is maximal at greater values of α.

On inspection, family A appears to be suitable for studying linkage of a marker to a completely penetrant disease gene. However, the power for detecting both heterogeneity and linkage by using a tri-allelic marker (PIC = 0.59) given 20 such families is small (e.g., 22% at α = 0.75, Table I). With a less informative polymorphism (PIC = 0.37) and 10 linked and 10 unlinked families the power to detect both linkage and heterogeneity is extremely small at any of the recombination fractions (because both the power of detecting linkage and the power of detecting heterogeneity given linkage are individually low, Table II). The results with the larger family are much more encouraging, although the power of the heterogeneity test is low for markers at greater than 5 centiMorgans distance (Table II).

A false positive result is one where homogeneity is rejected although all pedigrees are generated at the same recombination fraction. At the nominal significance level of 5%, the false positive rate (based on 500 replicates) varied from 2.0% to 3.6% (Table I). Results were similar when all families were unlinked. For example, among
1,000 replicate sets of family B of size 20, with all pedigrees unlinked, 33 of 1,000 replicates (3.3%) were positive for heterogeneity. The false positive rate for heterogeneity was not greater among family sets with inadequate evidence for linkage. Among a total of 2,700 sets of linked pedigrees generated for family A (N = 10; α = 1.0; θ = 0.10; three alleles) the false positive rates were: 1.4% for LOD < 1.00; 3.3% for LOD 1.00 to 1.99; 4.4% for LOD 2.00 to 2.99; and 3.0% for LOD ≥ 3.0.

The examples in Table II show how difficult it is to identify linkage with a heterogeneous group of pedigrees if the average family size is small or if the marker is not tightly linked. There was no apparent advantage to using the heterogeneity linkage test for the smaller family under any of the models studied. For the large family, when α was 0.50 or less the power of detecting linkage was generally greater with the heterogeneity test. The power of the two tests was roughly similar at α = 0.75 and the conventional test was superior at α = 0.90 and above. The linkage test under heterogeneity performed relatively better when the more informative marker was used and when θ was small (Table II). In addition, the relative power of the heterogeneity linkage test increased with the number of families available for study (Table I; family B).

There were no instances of false positive linkages among 2,500 replicates at the standard significance levels for the conventional test of LOD = 3.00 and the heterogeneity linkage test LOD2 = 3.70. When LOD was reduced to 1.5 for significance there were 4/1,000 false positives with the conventional test, but none of 1,000 replicates were positive with the heterogeneity test at the proposed equivalent of LOD2 = 2.1.
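
An observed false positive rate of this kind can be compared with an asymptotic approximation in which 2 ln(10) Z is treated as the square of a standard normal deviate and the test is one-sided [Ott, 1985]. The sketch below assumes that reading of the approximation; the function name is illustrative.

```python
import math


def lod_to_significance(lod):
    """Approximate one-sided significance level for a critical lod score."""
    # Standard normal deviate whose square equals 2 ln(10) times the lod score.
    x = math.sqrt(2.0 * math.log(10.0) * lod)
    # Upper-tail probability of the standard normal distribution.
    return 0.5 * math.erfc(x / math.sqrt(2.0))


print(lod_to_significance(1.5))
print(lod_to_significance(3.0))
```

For a critical lod score of 1.5 this gives roughly 0.004, and for the conventional threshold of 3.0 roughly 0.0001.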
The conventional-test rate of 4/1,000 corresponds well with the approximate significance level of 0.0044 asymptotically associated with a critical LOD score of 1.5, derived by using the formula x² = 2 ln(10) Z, where Z is the lod score observed and x is a standard normal variate [Ott, 1985].

DISCUSSION

Several dominant genetic diseases are now believed to be alternate expressions of mutations of several genes. For some of these, including osteogenesis imperfecta [Sykes et al., 1986] and tuberous sclerosis [Janssen et al., 1990], two different chromosomal regions have been mapped. For others, including adult polycystic kidney disease [Kimberling et al., 1988] and hereditary breast cancer [Hall et al., 1990], the presence of a second susceptibility locus is inferred from a proportion of families which appear to be unlinked to the initial locus of assignment. A detailed example of the use of the admixture test to establish genetic heterogeneity for Charcot-Marie-Tooth disease, CMT 1, is presented in Ott [1985].

When a collection of pedigrees is concluded to be a genetically homogeneous sample it is useful to have an idea of the statistical power of the test employed. Linkage power has been calculated analytically for nuclear families where all offspring are the product of informative matings [Ott, 1986; Cavalli-Sforza and King, 1986] but it is difficult to use these analyses to predict the chance of success in common linkage situations. In reality not all subjects will be typed at the marker locus and the majority of matings will be less than fully informative. Techniques based on simulation allow for both possibilities.

Martinez and Goldin [1990] have used a similar approach to study the power of the admixture test in small pedigrees by using a highly informative marker for a disorder linked to two independent loci. In contrast to the present results they conclude that it is feasible to use small pedigrees to detect linkage when as few as 25% of the families are linked. However, they have restricted their analysis to a highly polymorphic marker (PIC = 0.70) and an optimistically large number of available families (50). They further assume that genotype information is available on all individuals. Under the more realistic circumstances of the present study the probability of establishing both linkage and heterogeneity with only 25% of families linked is shown to be uniformly poor. It is hoped that as more numerous and more informative markers become available the investigator will not be limited by marker informativity and that the power of these analyses will improve.

The observed rate of false positive heterogeneity tests was smaller than the nominal significance level under which the admixture test was performed. The conservative nature of the test has been reported previously [Ott, 1989]. The false positive rate did not vary with the lod score at LOD levels above 1.0. (For pedigree sets where the LOD was below 1.0 the rate was only 1.4%.) Similarly, in the presence of heterogeneity the test sensitivity was not higher for those pedigree sets which were significant for linkage.

A lod score of 3.00 may be associated with different significance levels, depending on the type and number of families analyzed, and therefore resists a standard interpretation [Ott, 1985]. Simulation permits the estimation of the actual significance level of the lod score achieved (the type I error) for a particular set of pedigrees. In family B (θ = 0.50; N = 20; three alleles) a LOD score of 2.0 was not reached in 1,000 replicates.
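
The precision of error rates estimated from a finite number of replicates can be gauged with elementary binomial calculations. The sketch below is illustrative only: it assumes independent replicates, the function names are hypothetical, and the count of 18 in 500 is an assumed example chosen to correspond to a rate of 3.6%.

```python
import math


def proportion_with_se(hits, n_replicates):
    """Estimated rate and its binomial standard error."""
    p = hits / n_replicates
    return p, math.sqrt(p * (1.0 - p) / n_replicates)


def zero_event_upper_bound(n_replicates, confidence=0.95):
    """Exact one-sided upper bound on a rate when no events are observed.

    Solves (1 - rate) ** n_replicates = 1 - confidence for rate.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / n_replicates)


rate, se = proportion_with_se(18, 500)   # assumed count, i.e. 3.6% of 500 replicates
bound = zero_event_upper_bound(1000)     # no events observed in 1,000 replicates
print(rate, se, bound)
```

With no events in 1,000 replicates the upper 95% bound on the underlying rate is about 0.003, so a simulation of this size cannot distinguish between very small type I error rates.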
Because the significance levels are generally more stringent for inferring linkage (e.g., a type I error of 0.01% is common) than for rejecting homogeneity (e.g., type I error of 5.0%) the situation may occur where the heterogeneity test is significant but the linkage test is not. However, it is reasonable to speak about heterogeneity only when linkage can be assumed; otherwise θ = 0.50 for all families and the sample is homogeneous. If there is a majority of families in which a gene causing disease is linked to a marker locus it may be a long time after linkage is established before an unlinked pedigree is encountered. On the other hand, if the families are more or less evenly divided among two or more loci, linkage will be difficult to establish and it may be more efficient to search for linkage and locus heterogeneity simultaneously by using the heterogeneity linkage test.

Because of the rarity of false positive linkages, it is not possible to assess critically the equivalence of the significance levels of the two types of lod scores at the values proposed by Risch [1989]. If anything, the linkage test assuming heterogeneity is the more conservative of the two. The HOMOG test is commonly applied to a data set consisting of LOD scores calculated at discrete values of α and θ, and therefore the true maximum under heterogeneity is not obtained. For this and other reasons it is possible that the LOD2 scores proposed are too stringent in most situations.

This simulation extends the analytic results of Risch [1989] based on fully informative matings. There appears to be an advantage to using the heterogeneity linkage test when the linked fraction is small, when the families studied are large, and when the marker used is closely linked and highly informative. These results also show that the heterogeneity linkage test may be preferred when a large number of families are available for study.
In practice, only the size and number of the families and the marker polymorphism information content will be known to investigators beforehand.

Accurate knowledge of α is important in mapping because of the effects of misspecification of the parameter on the estimation of the recombination fraction. If the proportion of linked families in a population is known, and if a family presenting for genetic counselling is representative of the population, the figure can be used as the prior probability of linkage and will permit the computation of the most precise individual risks possible.

ACKNOWLEDGMENTS

This study was funded in part by the Ontario Ministry of Health and the Association de Recherche contre le Cancer. It was done at the International Agency for Research on Cancer, 150 Cours Albert-Thomas, Lyon, France.

REFERENCES

Boehnke M (1986): Estimating the power of a proposed linkage study: a practical computer simulation approach. Am J Hum Genet 39:513-527.
Cavalli-Sforza LL, King MC (1986): Detecting linkage for genetically heterogeneous diseases and detecting heterogeneity with linkage data. Am J Hum Genet 38:599-616.
Hall JM, Lee MK, Newman B, Morrow JE, Anderson LA, Huey B, King M-C (1990): Linkage of early-onset breast cancer to chromosome 17q21. Science 250:1684-1689.
Janssen LAJ, Sandkuyl LA, Merkens EC, Maat-Kievit JA, Sampson JR, Fleury P, Hennekam RCM, Grosveld GC, Lindhout D, Halley DJJ (1990): Genetic heterogeneity in tuberous sclerosis. Genomics 8:237-242.
Kimberling WJ, Fain PR, Kenyon JB, Goldgar D, Sujansky E, Gabow P (1988): Linkage heterogeneity of autosomal dominant polycystic kidney disease. N Engl J Med 319:913-918.
Lathrop GM, Lalouel JM (1984): Easy calculations of lod scores and genetic risks on small computers. Am J Hum Genet 36:460-465.
Martinez M, Goldin LR (1990): Power of the linkage test for a heterogeneous disorder due to two independent inherited causes: A simulation study. Genet Epidemiol 7:219-230.
Narod SA, Amos C (1990): Estimating the power of linkage analysis in hereditary breast cancer. Am J Hum Genet 46:266-272.
Ott J (1983): Linkage analysis and family classification under heterogeneity. Ann Hum Genet 47:311-320.
Ott J (1985): "Analysis of Human Genetic Linkage." Baltimore: Johns Hopkins University Press.
Ott J (1986): The number of families required to detect or exclude linkage heterogeneity. Am J Hum Genet 39:159-165.
Ott J (1989): Computer simulation methods in human linkage analysis. Proc Natl Acad Sci USA 86:4175-4178.
Risch N (1988): A new statistical test for heterogeneity. Am J Hum Genet 42:353-364.
Risch N (1989): Linkage detection test under heterogeneity. Genet Epidemiol 6:473-480.
Sampson JR, Yates JRW, Pirrit LA, Fleury P, Winship I, Beighton P, Connor JM (1989): Evidence for genetic heterogeneity in tuberous sclerosis. J Med Genet 26:511-516.
Smith CAB (1963): Testing for heterogeneity of recombination fraction values in human genetics. Ann Hum Genet 27:175-182.
Sykes B, Ogilvie D, Wordsworth P, Anderson J (1986): Osteogenesis imperfecta is linked to both type 1 collagen structural genes. Lancet 2:69-72.

Edited by G.P. Vogler
