Behav Genet (2015) 45:573–580 DOI 10.1007/s10519-015-9729-3

ORIGINAL RESEARCH

Estimating Twin Pair Concordance for Age of Onset

Thomas H. Scheike1 • Jacob B. Hjelmborg2 • Klaus K. Holst1

Received: 22 December 2014 / Accepted: 30 June 2015 / Published online: 15 July 2015
© Springer Science+Business Media New York 2015

Abstract Twin and family data provide a key source for evaluating inheritance of specific diseases. A standard analysis of such data typically involves the computation of prevalences and different concordance measures such as the casewise concordance, that is, the probability that one twin has the disease given that the co-twin has the disease. Most diseases have a varying age of onset that will lead to age-specific prevalence. Typically, this aspect is not considered, and this may lead to severe bias as well as make it very unclear exactly which population quantities we are estimating. In addition, one will typically need to deal with censoring in the data, that is, the fact that for some subjects we only know that they are alive at a specific age without having the disease. These subjects need to be considered age specifically, and clearly if they are young there is still a risk that they will develop the disease. The aim of this contribution is to show that the standard casewise concordance and standard prevalence estimators do not work in general for age-of-onset data. We show that a simple alternative is available even with censored data. The key is to take age into account when analysing such data.

Keywords Age of onset · Casewise concordance function · Concordance function · Cumulative incidence probability · Prostate cancer · Recurrence risk ratio

Edited by Gitta Lubke.

Thomas H. Scheike [email protected]

1 Department of Biostatistics, University of Copenhagen, Øster Farimagsgade 5, 1014 Copenhagen, Denmark

2 Department of Epidemiology and Biostatistics, and Demography, University of Southern Denmark, J. B. Winsløwsvej 9B, 5000 Odense, Denmark

Introduction

In twin research a standard measure to describe the degree of association within a twin pair is the casewise concordance, that is, the probability that a twin experiences the disease of interest given that the co-twin has the disease. Several authors have discussed how to estimate this quantity, see for example Witte et al. (1999), Smith (1974), Hannah et al. (1983), Hannah et al. (1985), and Hopper et al. (1990).

We start by considering the case where dichotomous traits are observed for a twin pair. To be specific we let $Y_j$ be one if twin $j$ in a twin pair ($j = 1, 2$) has "cancer" and zero otherwise. A typical analysis might start by computing the standard estimators of prevalence and casewise concordance. Focusing on the same-sex twin case we start by assuming symmetry in Table 1, so that $p_{10} = p_{01}$, thus leading to equivalent marginal rates $p_0 = p_{0\cdot} = p_{\cdot 0}$, and $p_1 = 1 - p_0$. For opposite-sex twins we do not have symmetry, but results similar to those presented here hold. Given observations from a cohort of size $n$ (with binary observations $(Y_{k1}, Y_{k2})$) we thus have the counts $n_{ij} = \sum_{k=1}^{n} I(Y_{k1} = i, Y_{k2} = j)$ for $i, j = 0, 1$, and define $n_d = n_{01} + n_{10}$ as the number of discordant pairs. Here $I(\cdot)$ is the indicator that is one when the condition is fulfilled and zero otherwise, so that $n_{ij}$ counts the number of pairs where we see $(i, j)$ for the twin pair. The casewise concordance is defined as

$$P_c = P(Y_1 = 1 \mid Y_2 = 1) = \frac{p_{11}}{p_1} = \frac{p_{11}}{p_{01} + p_{11}},$$

and the maximum likelihood estimators (MLEs) under symmetry are

$$\hat P_c = \frac{n_{11}}{n_{11} + (1/2)\, n_d}, \qquad \hat p_1 = \frac{n_{11} + (1/2)\, n_d}{n_{11} + n_d + n_{00}} = \frac{2 n_{11} + n_d}{2n},$$


Table 1 Bivariate dichotomous trait for twin pair

                     Twin 2, no-cancer   Twin 2, cancer   Total
Twin 1, no-cancer    $p_{00}$            $p_{01}$         $p_{0\cdot}$
Twin 1, cancer       $p_{10}$            $p_{11}$         $p_{1\cdot}$
Total                $p_{\cdot 0}$       $p_{\cdot 1}$    1

see for example Witte et al. (1999). Estimators of the related standard errors also exist, but we omit these to save space and refer to Witte et al. (1999). We later refer to these estimators as the binary-MLEs. We remind the reader that the probandwise concordance is an estimator of the casewise concordance in the case of incomplete ascertainment, see Witte et al. (1999). We return to the case of incomplete ascertainment in the Conclusion, but for now assume that we have complete ascertainment. Similarly, one may also compute the pairwise concordance, that is,

$$P_p = P(Y_1 = 1 \text{ and } Y_2 = 1 \mid Y_1 = 1 \text{ or } Y_2 = 1) = \frac{p_{11}}{1 - p_{00}}. \tag{1}$$

We here focus on the casewise concordance for simplicity, but the developed results can also be applied when estimating the pairwise concordance. The marginal probability of cancer $p_1$ is often denoted as the prevalence, and the casewise concordance should be seen in the light of its size. It is also quite common and sensible to compute the relative recurrence risk ratio that compares the casewise concordance to the prevalence, i.e. the excess risk of disease for a twin given that the co-twin got the disease compared to the marginal (population) risk. It is defined as

$$\lambda_R = \frac{P_c}{p_1} \tag{2}$$

and by the transformation invariance of the MLE can be estimated as $\hat\lambda_R = \hat P_c / \hat p_1$.

We apply these methods to the population-based cohort of Danish male twins born from 1870 with follow-up on death status until January 1st 2009. Prostate cancer occurrences were identified from the national cancer registry, which began registration on January 1st 1943. Among the 31,498 male twins eligible for study we identified 473 prostate cancer cases. In total 25 monozygotic (MZ) and 14 dizygotic (DZ) pairs were concordant for prostate cancer. We note that both twins in a pair had to be alive on January 1st 1943, at which time cancer registration starts. We had a total of 11,135 MZ twins and 20,363 DZ twins in the cohort, with just a few singletons. A full analysis of prostate cancer based on all Nordic twin registries was reported in Hjelmborg et al. (2014).


A first look at the data consists of computing the number of cancer concordant and discordant pairs, which are shown in Table 2. Based on these we can apply the standard estimators and compute the prevalence, casewise concordance and recurrence risk ratio, which are shown in Table 3. The population parameters can also be estimated using a liability threshold model that also provides MLEs of these quantities. These are not shown here but are very similar. We note that the estimated prevalence without taking the timing of the events into account is around 1.5 %, and that the casewise concordance and relative recurrence risk ratio are larger for MZ twins than for DZ twins. A more careful interpretation of the numbers is that the prevalence is 1.6 % for DZ twins with an age distribution similar to what we have in our cohort. This prevalence may go from 0, when all the twins are young, up to something like 7 %, when all the twins are at least 100 years old, and thus depends critically on the specific age distribution of the cohort. Similarly, the observed prevalence is 1.4 % for the MZ twins; this reflects the age distribution of the MZ twins in our cohort. The age distributions of the MZ and DZ twins in our cohort are different, with the MZ twins being 1 year older on average. This is a consequence of the slight increase in DZ twins over time. The casewise concordance estimates are estimates of the casewise probability for a cohort with an age distribution equivalent to the one we see in our cohort for DZ and MZ twins, respectively. Since all these estimates relate directly to the age distribution in the specific cohorts, they are not comparable across cohorts, and not even within the same cohort when the data are updated. These problems carry over to subsequent analyses that, for example, try to decompose the correlation for MZ and DZ twins into genetic and environmental parts, as was also shown in Scheike et al. (2014), see also Holst et al. (2015).
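The binary-MLEs defined in the Introduction can be reproduced from the pair counts reported in Table 2. The following is a minimal sketch in Python, not the code used for the paper; the function name `binary_mle` and the dictionary layout are hypothetical, and only complete pairs are used, so the values need not agree with Table 3 to the last digit.

```python
def binary_mle(n11, n_d, n00):
    """Binary-MLEs under symmetry, computed from pair counts.

    n11: concordant (cancer, cancer) pairs
    n_d: discordant pairs (n01 + n10)
    n00: pairs in which neither twin has cancer
    """
    n = n11 + n_d + n00                      # number of pairs
    casewise = n11 / (n11 + 0.5 * n_d)       # estimate of P_c
    prevalence = (2 * n11 + n_d) / (2 * n)   # estimate of p_1
    recurrence = casewise / prevalence       # estimate of lambda_R
    return prevalence, casewise, recurrence


# Pair counts read off Table 2: (concordant, discordant, no cancer)
counts = {"MZ": (25, 109, 5354), "DZ": (14, 286, 9721)}
estimates = {zyg: binary_mle(*c) for zyg, c in counts.items()}

for zyg, (prev, cw, rr) in estimates.items():
    print(f"{zyg}: prevalence={prev:.4f} casewise={cw:.3f} recurrence={rr:.1f}")
```

The sketch returns point estimates only; the confidence intervals of Table 3 require the standard errors discussed in Witte et al. (1999).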
Clearly, if MZ and DZ twins have different age distributions we cannot compare the casewise concordance or recurrence risk ratio for these. To remedy these problems one solution is to consider age-specific prevalence and age-specific concordance, and base further calculations on such quantities. This in reality amounts to taking the age timings into consideration when analysing the data. We show in the next section that this is easy to do, and it is of course the natural thing to do when considering diseases with variable age of onset.

Table 2 Concordant and discordant pairs for MZ and DZ twin pairs based on Danish twin cohort

             MZ                      DZ
             No-cancer   Cancer      No-cancer   Cancer
No-cancer    5354        0           9721        0
Cancer       109         25          286         14

Table 3 Prevalence, casewise concordance and recurrence risk ratio based on binary twin data of cancer occurrences in Danish twin data

      Prevalence   95 % CI          Casewise   95 % CI          Recurrence   95 % CI
MZ    0.0143       (0.012–0.0171)   0.313      (0.228–0.413)    21.8         (15.4–28.3)
DZ    0.0157       (0.014–0.0175)   0.0892     (0.0542–0.143)   5.69         (2.97–8.42)

All estimates with 95 % confidence intervals (CI)

Smith (1974) also pointed out that the standard estimators of concordance cannot be applied directly when there is variable age of onset. He points out that a sensible analysis should indeed be age-specific. This is what we suggest doing in an extended setting where some data may also be censored, that is, for some subjects we only know they are alive without the disease when the study is closed. A common misconception is that one can simply adjust for age in the liability threshold model to get correct age-adjusted estimates. This is not so. The correct way to do this, as we show in the next section, is to do censoring adjustments in the analyses. Many authors have pointed out that one needs to deal with the censorings in such an analysis, see for example Thomas et al. (2004) and references therein. To deal with censorings, event history methods must be applied, and here there are various options as pointed out in Scheike et al. (2014). One possibility is to work on the hazard scale and apply frailty models, see for example Thomas et al. (1990), Gorfine and Hsu (2011), Eriksson and Scheike (2015). Working with frailty models does not directly link to the concordance. We here work directly on the risk scale, trying to estimate the probabilities directly as suggested in Scheike et al. (2014) and Holst et al. (2015).

Considering age of onset information

To more formally describe what the casewise concordance estimator actually estimates, and thus how it can be interpreted, we describe in further detail what is observed from the cohort study. Let $\tilde T_j$ denote the time where twin $j$ ($j = 1, 2$) within a twin pair got cancer or died (before getting cancer). If $\tilde T_j$ was the time where twin $j$ got cancer, we let the cause indicator be $\tilde\epsilon_j = 1$. If $\tilde T_j$ was the time of death before experiencing cancer we let $\tilde\epsilon_j = 2$. All subjects will die or get cancer eventually. This type of data is called competing risks data, see for example Kalbfleisch and Prentice (1980). In reality we typically have data collected from different cohorts, and typically the data are then summarized at some point in time where we extract the data from the registries.

This means that we do not have full follow-up time on all cohorts, and some subjects are still alive without the disease when we close the study. To combine the different cohorts we might then assume stationarity, which we do here for simplicity of presentation. Due to the limited follow-up we observe the full data $(\tilde T_j, \tilde\epsilon_j)$, $j = 1, 2$, subject to what is called right censoring, with censoring time $C$. The data we actually see are thus $(T_j, \epsilon_j)$, where $T_j = \min(\tilde T_j, C)$ and $\epsilon_j = \tilde\epsilon_j I(\tilde T_j < C)$ (with $I(\cdot)$ the indicator function, which is one if the condition is fulfilled and 0 otherwise). If the subjects are censored at time $T_j$ we have that $\epsilon_j = 0$. The censoring times depend primarily on the length of follow-up, and could also be due to emigration or the fact that subjects leave the study for other reasons. One consequence of this sampling scheme is that a twin pair is typically censored at the same time $C$ because they are from the same cohort. This assumption is often met in register-based studies. If this is not the case we censor them at the first censoring time and then they are in fact censored at the same time. This leads to technical simplification, but the general discussion is still valid. In general, if the censorings follow a bivariate distribution, $G_c(s, t) = P(C_1 > s, C_2 > t)$, one can estimate the concordance by inverse probability of censoring weighted estimators or the estimator suggested by Cheng et al. (2007). We assume that the censoring times are independent of the event times $\tilde T_j$ and the event indicators $\tilde\epsilon_j$. Based on this assumption it is possible to estimate the quantities of interest to us, such as the age-dependent prevalence and concordance of cancer. The age-dependent prevalence is defined as

$$F_1(t) = P(\tilde T_j \le t, \tilde\epsilon_j = 1), \tag{3}$$

and is thus the proportion of the cohort that experiences the disease (before dying) before time $t$. This population-defined quantity can be estimated based on censored data from our twin cohort consisting of twin pairs $(T_{k,1}, \epsilon_{k,1}, T_{k,2}, \epsilon_{k,2})$, $k = 1, \ldots, n$, that are independent and identically distributed, under independent censoring given by $C_k$, $k = 1, \ldots, n$. The censoring times follow a distribution that we denote by $G_c(t) = P(C_k > t)$. Similarly, we define the age-dependent concordance as

$$F_{1,1}(t) = P(\tilde T_1 \le t, \tilde\epsilon_1 = 1, \tilde T_2 \le t, \tilde\epsilon_2 = 1). \tag{4}$$


This is the population probability that both twins have had cancer before a given age $t$. We define the age-specific casewise concordance, $P_c(t)$, as

$$P_c(t) = P(\tilde T_2 \le t, \tilde\epsilon_2 = 1 \mid \tilde T_1 \le t, \tilde\epsilon_1 = 1) = \frac{F_{1,1}(t)}{F_1(t)}.$$

In the binary version of the data we considered earlier, $Y_j = I(T_j < \tau, \epsilon_j = 1)$, which is exactly one if the subject got cancer, and 0 otherwise. To be precise, it is the event that a subject gets cancer and is not censored before time $\tau$. We here select a specific time point $\tau$, such as for example $\tau = 100$ (lifetime risk). To understand what the binary-MLE estimates we compute the limit of the estimator. First computing $E(Y_j) = P(Y_j = 1)$ we get

$$P(Y_j = 1) = E\big(I(T_j < \tau, \epsilon_j = 1)\big) = E\big(I(\tilde T_j < \tau, \tilde\epsilon_j = 1, \tilde T_j < C)\big) = E\big(E(I(\tilde T_j < \tau, \tilde\epsilon_j = 1, \tilde T_j < C) \mid \tilde T_j, \tilde\epsilon_j)\big) = \int_0^\tau G_c(s)\, dF_1(s). \tag{5}$$

Similarly, we also get that $E(I(Y_1 = 1, Y_2 = 1)) = P(Y_1 = 1, Y_2 = 1) = \int_0^\tau G_c(s)\, dF_{1,1}(s)$. This relies on the fact that the two twins are censored at the same time, due to the administrative censoring in the cohorts. If the data had been sampled in a different way, such that for example the two twins were censored independently of each other, then $P(Y_1 = 1, Y_2 = 1) = \int_0^\tau (G_c(s))^2\, dF_{1,1}(s)$, and the binary casewise concordance estimator would be considerably more biased as an estimator of the lifetime casewise concordance, which we show below. In the case without any censoring in the cohort, such that $G_c(\tau) = 1$, we have that $P(Y_j = 1) = F_1(\tau)$ and $P(Y_1 = 1, Y_2 = 1) = F_{1,1}(\tau)$. When there is censoring, such that $G_c(\tau) < 1$, then $P(Y_j = 1) < F_1(\tau)$ and $P(Y_1 = 1, Y_2 = 1) < F_{1,1}(\tau)$. The binary-MLE, $\hat P_c$, is an estimator of the ratio of $P(Y_1 = 1, Y_2 = 1)$ and $P(Y_1 = 1)$, that is

$$\frac{P(Y_1 = 1, Y_2 = 1)}{P(Y_1 = 1)} = \frac{\int_0^\tau G_c(s)\, dF_{1,1}(s)}{\int_0^\tau G_c(s)\, dF_1(s)} = \frac{f_{1,1}(x)}{f_1(x)}$$

for some $x \in [0, \tau]$ by the Cauchy mean value theorem, where $f_{1,1}(\cdot)$ and $f_1(\cdot)$ are the densities related to $F_{1,1}(s)$ and $F_1(s)$. An important consequence of this is that when the ratio $f_{1,1}(s)/f_1(s)$ is constant, this ratio is equal to $P_c(\tau)$, the lifetime casewise concordance. This is seen by also applying the Cauchy mean value theorem to the quantity $P_c(\tau)$. In this situation the casewise concordance will be constant over time. When the ratio is not constant over time, $\hat P_c$ can take any value of the age-specific casewise concordance $P_c(t)$, $t \in [0, \tau]$, and


depends crucially on the censoring pattern, that is, the age distribution of the cohort. We give an example of this in the simulation example below. In contrast, if the two twins are censored independently of each other, then the binary-MLE, $\hat P_c$, would estimate

$$\frac{P(Y_1 = 1, Y_2 = 1)}{P(Y_1 = 1)} = \frac{\int_0^\tau G_c^2(s)\, dF_{1,1}(s)}{\int_0^\tau G_c(s)\, dF_1(s)} = \frac{G_c(x) f_{1,1}(x)}{f_1(x)}$$

for some $x \in [0, \tau]$, again by the Cauchy mean value theorem. There is even larger potential for severe bias in this quantity as an estimator of the lifetime casewise concordance $P_c(\tau)$ due to the censoring distribution. Even when $P_c(t)$ is approximately constant there will be bias when there are censorings present in the data. To compute a sensible concordance or prevalence, and then subsequently casewise concordance or recurrence risk ratio, we suggest making the computations age specific. This makes it more sensible to compare across studies, because the censoring, and thus the specific age distribution, is then removed from the estimates. This turns out to be quite easy to do.

Estimation of age specific casewise concordance

First we take a more detailed look at the data where we also show the censored observations specifically (Table 4). We note that the data are heavily censored. This explains the very low prevalence when censoring is ignored: a non-negligible part of those without the disease have a high chance of getting prostate cancer later in life. To estimate the age-specific prevalence, which in the field of survival analysis is called the cumulative incidence, we use the standard product limit estimator with cluster corrected standard errors, see Chen et al. (2008), Scheike et al. (2010). In the remainder of this paper we denote this quantity as the transition probability, and its product limit estimator as $\hat F_1(t)$. The 1.6 % prevalence for DZ twins, based on the binary estimator found earlier, reflects the specific age distribution among the DZ twins, and is thus not equal to, for example, the lifetime prevalence.
We note that DZ and MZ twins have almost the same transition probabilities. Similarly, the age-specific concordance $F_{1,1}(t)$ can be estimated using the same estimator on the pairs when the two twins are censored at the same time, see Scheike et al. (2014). This is the case for almost all the twin pairs except for a few that we then artificially censor at the same time. This estimator we denote as $\hat F_{1,1}(t)$. Estimates based on the product limit estimator for the concordance function are shown in Fig. 2. We note that MZ twins have a higher concordance probability for all ages compared to the DZ twins. In addition, the DZ twins also seem to have a positive dependence that results in a slightly higher concordance than under independence ($\hat F_1^2(t)$). This does not suggest a large shared environmental component for DZ twins.

Given the two well-defined age-specific population-based quantities we can subsequently estimate the casewise concordance, $\hat F_{1,1}(t)/\hat F_1(t)$, and the age-specific recurrence risk ratio, $\hat F_{1,1}(t)/(\hat F_1(t))^2$. We first report the prevalence, the casewise concordance and the recurrence risk ratio for 100-year-old twins, see Table 5. Comparing with Table 3, we note that the prevalence is severely biased when we do not properly adjust for censoring. The lifetime casewise concordance is not terribly different from the binary-MLE computed earlier. This is due to the fact that the age-specific casewise concordance probabilities are rather constant for this cancer type (see Fig. 3a, below). We also carried out a formal test for constant casewise concordance based on testing if the change in the casewise concordance was significant. We did this by fitting a linear model to the casewise concordance and then reporting its slope and standard error. For the DZ twins the slope was 0.0042 with an estimated standard error of 0.021, leading to a p value of 0.84. The MZ twins had a slope estimate of -0.022 with an estimated standard error of 0.035, leading to a p value of 0.54. We note that the slopes are both non-significant, thus suggesting that the simple binary-MLE will work reasonably for these data. It is also evident that the actual slope is rather small for both MZ and DZ twins. The recurrence risk ratio is also severely biased without adjustment for the censorings. The prevalence, casewise concordance and recurrence risk ratio can also be computed adjusting for censorings using the liability threshold model. This maximum likelihood estimator provides a slightly more efficient estimator of the concordance probabilities, but the results are not shown here to save space, see Holst et al. (2015).

We also used the age-specific product limit estimators to compute the age-specific casewise concordance and the age-specific recurrence risk ratio. These estimates are shown in Fig. 3. We note that the casewise concordance seems rather constant for both MZ and DZ twins. In contrast, the recurrence risk ratio appears to change quite dramatically over the age time scale, starting at around 55 at age 65 and declining to around 6 at age 100 for MZ twins. DZ twins in contrast have a recurrence risk ratio that starts out around 5 and decreases towards around 2. This may suggest a stronger genetic component in the occurrence of cancer in the early ages, but clearly we have rather limited information for the early ages, and no firm conclusion can be made because of uncertainty in the estimates. As reported earlier, we tested if there was a trend in the casewise concordance estimates, and found that neither MZ nor DZ twins had a significant trend in these estimates. All the calculations done in this paper are directly available in the R package mets (Holst and Scheike 2014), which can be downloaded from CRAN, but after manipulating the data any survival analysis program can be used.

Table 4 Concordant, discordant and censored twins for MZ and DZ twin pairs based on Danish twin cohort

                         No cancer and alive   No cancer and dead   Cancer
MZ
  No cancer and alive    4019                  0                    0
  No cancer and dead     492                   843                  0
  Cancer                 39                    70                   25
DZ
  No cancer and alive    6708                  0                    0
  No cancer and dead     1319                  1694                 0
  Cancer                 108                   178                  14

Table 5 Lifetime prevalence, casewise concordance and recurrence risk ratio based on censored time to event data of cancer occurrences in Danish twin data using product limit estimators from Figs. 1 and 2

      Lifetime   95 % CI           Casewise   95 % CI         Recurrence   95 % CI
MZ    0.0649     (0.0531–0.0767)   0.335      (0.189–0.483)   5.16         (2.91–7.40)
DZ    0.0644     (0.0566–0.0722)   0.117      (0.035–0.201)   1.82         (0.543–3.11)
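The product limit estimators used in this section can be sketched in a few lines of Python. This is a minimal illustration and not the implementation in the mets package: the function names `cumulative_incidence` and `pair_data` are hypothetical, standard errors are omitted, the toy data are invented, and the reduction of a pair to a single competing-risks observation assumes that both twins share one censoring time.

```python
import numpy as np


def cumulative_incidence(time, cause, t_eval, k=1):
    """Product limit (Aalen-Johansen) estimate of P(T <= t, cause = k).

    time  : observed times min(T, C)
    cause : 0 = censored, 1 = cancer, 2 = death without cancer
    """
    time = np.asarray(time, dtype=float)
    cause = np.asarray(cause, dtype=int)
    jump_times, values = [-np.inf], [0.0]   # estimate is 0 before the first event
    surv, cif = 1.0, 0.0                    # event-free probability, cumulative incidence
    for t in np.unique(time[cause > 0]):
        at_risk = np.sum(time >= t)
        d_any = np.sum((time == t) & (cause > 0))
        d_k = np.sum((time == t) & (cause == k))
        cif += surv * d_k / at_risk         # uses the event-free probability just before t
        surv *= 1.0 - d_any / at_risk
        jump_times.append(t)
        values.append(cif)
    idx = np.searchsorted(jump_times, np.atleast_1d(t_eval), side="right") - 1
    return np.asarray(values)[idx]


def pair_data(t1, c1, t2, c2):
    """Reduce each twin pair to one observation for estimating F_{1,1}(t).

    Assumes both twins in a pair are censored at the same time.
    Pair cause: 1 = both twins had cancer (time = later cancer),
                2 = a twin died without cancer (concordance ruled out),
                0 = censored before the pair outcome was determined.
    """
    t1, t2 = np.asarray(t1, dtype=float), np.asarray(t2, dtype=float)
    c1, c2 = np.asarray(c1, dtype=int), np.asarray(c2, dtype=int)
    both = (c1 == 1) & (c2 == 1)
    dead = (c1 == 2) | (c2 == 2)
    first_death = np.minimum(np.where(c1 == 2, t1, np.inf),
                             np.where(c2 == 2, t2, np.inf))
    time = np.where(dead, first_death, np.maximum(t1, t2))
    cause = np.where(both, 1, np.where(dead, 2, 0))
    return time, cause


# Invented toy data: ages at event or censoring and causes for five pairs
t1 = [70, 82, 90, 75, 86]; c1 = [1, 1, 0, 2, 1]
t2 = [78, 85, 90, 80, 88]; c2 = [1, 2, 0, 1, 0]

age = 85
F1 = cumulative_incidence(np.r_[t1, t2], np.r_[c1, c2], age)[0]
F11 = cumulative_incidence(*pair_data(t1, c1, t2, c2), age)[0]
casewise = F11 / F1         # age-specific casewise concordance
recurrence = F11 / F1**2    # age-specific recurrence risk ratio
```

Evaluating the two estimates on a grid of ages gives curves of the kind shown in Figs. 1 to 3; confidence intervals would additionally need cluster corrected standard errors, which the sketch does not attempt.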

Fig. 1 Probability of getting cancer as a function of age for MZ (dotted line), DZ (broken line) and overall probability with 95 % confidence limits (solid lines). Axes: age (years) against probability.

Fig. 2 Concordance with 95 % confidence intervals based on product limit estimator as a function of age for MZ (broken lines), DZ (solid lines), marginal prevalence (broken-dotted line) and under independence (dotted line). Axes: age (years) against concordance probability.

Fig. 3 Casewise concordance and recurrence risk ratio with 95 % confidence intervals based on product limit estimator as a function of age for MZ (broken lines), DZ (solid lines) and marginal prevalence (dotted line). Panels: casewise concordance and recurrence risk ratio, each against age (years).

Simulation example: bias of binary-MLE for variable age of onset

We also carried out a small simulation that illustrates that the binary-MLE of the casewise concordance is generally biased. We generated data with a transition probability (dotted line) given in Fig. 4a. For this prevalence function we considered positive dependence leading to the concordance function given by the solid line, and compared it to the case of independence given by the broken line. We censored approximately 15 % of the observations by uniform censoring on the interval [0.6, 1.6], and both twins were censored at the same time, thus making sure that we observe the early ages without censoring; we call this "late censoring" below. When we censored subjects with a uniform [0.0, 1.0] distribution we termed it "early censoring". The data were generated from the random effects model described in Scheike et al. (2010), but the key point is really that it produces the prevalence and concordance function depicted in Fig. 4a. We computed the true quantities under the specified distributions; the binary-MLE estimates the ratio $\left(\int_0^1 G_c(s)\, dF_{1,1}(s)\right) / \left(\int_0^1 G_c(s)\, dF_1(s)\right)$, which we computed explicitly for the specific choice of the censoring distribution. In Fig. 4 panel (b) we show the true age-specific casewise concordance (solid line), the limit of the binary-MLE (upper broken line, censoring 15 %, late censoring), the casewise concordance at time 1 (dotted line), and the true transition probability (broken-dotted line). We note that the binary-MLE estimates something different from the lifetime concordance (at time 1), and is thus severely biased. Without any censoring the binary-MLE would estimate the lifetime casewise concordance. With censoring it is the casewise probability for the particular age distribution given by the specified censoring distribution. Increasing the amount of censoring will give a different limit, as indicated by the second binary-MLE (lower broken line, censoring 75 %, early censoring). Generally we can move it around the entire range of the true age-specific casewise concordance depending on which age distribution we see in the data.

Fig. 4 Panel a concordance (solid line), prevalence (dotted line) and under independence (broken line). Panel b casewise concordance function (solid line), casewise concordance under independence (broken-dotted line), binary-MLE under two different censoring schemes (broken lines) and lifetime casewise concordance (dotted line). Horizontal axis in both panels: time.
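The dependence of the binary-MLE limit on the censoring distribution can be checked numerically. The sketch below is an illustration under stated assumptions and does not reproduce the simulation above: the densities `f1` and `f11` are hypothetical choices (not the random effects model of Scheike et al. 2010), picked so that the casewise concordance increases with age, and the helper names are invented. The resulting censoring percentages therefore differ from the 15 % and 75 % quoted in the text.

```python
import numpy as np


def trapezoid(y, x):
    """Trapezoidal rule, written out to avoid depending on a NumPy version."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)


def binary_mle_limit(f11, f1, G, tau=1.0, m=20001):
    """Limit of the binary-MLE when both twins share one censoring time:
    int_0^tau G(s) f11(s) ds / int_0^tau G(s) f1(s) ds."""
    s = np.linspace(0.0, tau, m)
    return trapezoid(G(s) * f11(s), s) / trapezoid(G(s) * f1(s), s)


def uniform_survival(a, b):
    """G_c(s) = P(C > s) for a censoring time C uniform on [a, b]."""
    return lambda s: np.clip((b - s) / (b - a), 0.0, 1.0)


# Hypothetical model on [0, 1]: F1(t) = 0.1 t^2 and F11(t) = 0.05 t^3,
# so the casewise concordance P_c(t) = 0.5 t increases with age.
f1 = lambda s: 0.2 * s
f11 = lambda s: 0.15 * s**2

lifetime = 0.05 / 0.1                                          # P_c(1) = F11(1) / F1(1)
late = binary_mle_limit(f11, f1, uniform_survival(0.6, 1.6))   # "late censoring"
early = binary_mle_limit(f11, f1, uniform_survival(0.0, 1.0))  # "early censoring"
no_cens = binary_mle_limit(f11, f1, lambda s: np.ones_like(s))

print(f"lifetime={lifetime:.3f} late={late:.3f} early={early:.3f} none={no_cens:.3f}")
```

In this hypothetical setting the limit without censoring coincides with the lifetime casewise concordance, while both censoring schemes pull the limit below it, the early scheme more so than the late one, in line with the qualitative pattern described for Fig. 4b.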

Conclusion

The concordance and casewise concordance are routinely computed to characterize the dependence for twin data. Unfortunately, this is rarely done in a time-to-event setting that takes the censoring into account. We here show that this is in fact simple to do, as standard programs can be used when a univariate censoring condition is satisfied. The assumption that both twins are censored at the same time is in fact fulfilled for many registries based on cohort sampling, and should be used when the aim is to estimate the concordance function. If the univariate censoring condition is not fulfilled for a few individuals it can be enforced by simply censoring the data prior to the analyses. When the twins are not censored at the same time it can be inefficient to enforce it. In this situation, however, other estimators can be used to estimate the concordance, see Cheng et al. (2007). This is an issue that could use further research.

When the aim is to consider dependence in diseases with variable age of onset, which is the case for almost all diseases, one needs to consider age specifically in the models. We have shown how this can be done quite easily based on censored data of the timings of the disease status. Here it is crucial to recognize that subjects can also die, which will prevent them from getting the disease. This is called competing risks. We analysed the binary-MLE of the casewise concordance, prevalence and recurrence risk ratio. The prevalence and recurrence risk ratio will typically be dramatically biased as estimates of the lifetime prevalence and lifetime recurrence risk ratio. Luckily, the binary-MLE of the casewise concordance can provide a consistent estimator of the lifetime casewise concordance when the age-specific casewise concordance is constant as a function of age. We stress that this needs to be validated by doing something along the lines of what we suggest here.
When it is not constant this estimator will also be biased compared to the lifetime casewise concordance. The size of the bias depends on the age distribution present in the data. With somewhat similar age distributions (censorings) among MZ and DZ twins the classical estimator will reveal a correct trend. But this, as well as the estimates, depends directly on the censoring distribution. Therefore, these are very hard to interpret and compare across studies, and even in updated versions of the same studies. In the case where the twins are censored independently of each other, the bias will typically be even worse. One consequence of this is that it is not generally useful to use the binary-MLE, even in an attempt to make results comparable to earlier calculated binary-MLEs of the casewise concordance. However, if the twins are censored at the same time and the casewise concordance is constant over the age time scale, then the earlier estimate and new censoring-adjusted estimates will be comparable as they estimate the same quantity.

Several other aspects of incomplete or mismeasured data may also lead to additional complications. One problem that often arises in, for example, cancer or other disease registries is some sort of delayed entry, due to the early cohorts only having cancer occurrences recorded from the start of the cancer registries. This type of incompleteness will typically lead to a slight overestimation of the dependence measures. Dealing with truncation is considerably more complicated, but the basic structure of the problem is the same. For the Danish twin data considered here we have seen that this gives a slight increase in the estimates, see Scheike et al. (2014). Another misconception is that one can simply correct for age in the classical binary analysis to get correct estimates of, for example, tetrachoric correlation or odds-ratio dependence measures. An additional complication not dealt with here is ascertained data. For the Nordic registries it is often possible to get the entire cohort, and ascertainment is therefore not a big problem for data from these registries. For ascertained data the standard estimators cannot be used either; again it is crucial to include age (the time scale) in the analysis.
We are currently studying these issues in detail. When dealing with more general random effects models, such as the liability threshold model, to try to separate genetic and environmental effects it is still clear that the censorings must be dealt with. The details of this have been developed in Holst et al. (2015) and applied in the context of prostate cancer in a combined Nordic twin study in Hjelmborg et al. (2014). We finally remark that the methods not only provide a correction for possible bias due to censorings, but also provide insight into the dynamics of the studied diseases.

Acknowledgments We are truly grateful to two referees and the editor for their careful reading of our manuscript and their very useful suggestions.

Conflict of Interest Thomas H. Scheike, Jacob B. Hjelmborg and Klaus K. Holst declare that they have no conflict of interest.

Human and Animal Rights and Informed Consent The paper only works on data collected by the Danish twin registry, which complies with the rules of Human and Animal rights.

References

Chen BE, Kramer JL, Greene MH, Rosenberg PS (2008) Competing risks analysis of correlated failure time data. Biometrics 64:172–79
Cheng Y, Fine JP, Kosorok MR (2007) Nonparametric association analysis of bivariate competing-risks data. J Am Stat Assoc 102:1407–1415
Eriksson F, Scheike T (2015) Additive gamma frailty models with applications to competing risks in related individuals. Biometrics. doi:10.1111/biom.12326
Gorfine M, Hsu L (2011) Frailty-based competing risks model for multivariate survival data. Biometrics 67:415–426
Hannah MC, Hopper JL, Mathews JD (1983) Twin concordance for a binary trait. I. Statistical models illustrated with data on drinking status. Acta Genet Med Gemellol 32(2):127–137
Hannah MC, Hopper JL, Mathews JD (1985) Twin concordance for a binary trait. II. Nested analysis of ever-smoking and ex-smoking traits and unnested analysis of a "committed-smoking" trait. Am J Human Genet 37(1):153
Hjelmborg J, Scheike T, Holst K, Skytthe A, Christensen K, Adami H-O, Czene K, Harris JR, Kaprio J, Mucci L et al (2014) The heritability of prostate cancer in the Nordic Twin Study of Cancer. Cancer Epidemiol Biomarkers Prev 23(11):2303–2310. doi:10.1158/1055-9965
Holst KK, Scheike T (2014) Mets: analysis of multivariate event times. R package version 0.2.8.1
Holst KK, Scheike TH, Hjelmborg JB (2015) The liability threshold model for censored twin data. Comput Stat and Data Anal. doi:10.1016/j.csda.2015.01.014
Hopper JL, Hannah MC, Macaskill GT, Mathews JD, Rao D (1990) Twin concordance for a binary trait: III. A bivariate analysis of hay fever and asthma. Genet Epidemiol 7(4):277–289
Kalbfleisch JD, Prentice RL (1980) The statistical analysis of failure time data. Wiley, New York
Scheike TH, Holst K, Hjelmborg JV (2014) Estimating heritability for cause specific mortality based on twin studies. Lifetime Data Anal 20:210–233
Scheike TH, Sun Y, Zhang MJ, Jensen TK (2010) A semiparametric random effects model for multivariate competing risks data. Biometrika 97:133–145
Smith C (1974) Concordance in twins: methods and interpretation. Am J Human Genet 26(4):454
Thomas DC et al (2004) Statistical methods in genetic epidemiology. Oxford University Press, Oxford
Thomas DC, Langholz B, Mack W, Floderus B, Rao D, Vogler G (1990) Bivariate survival models for analysis of genetic and environmental effects in twins. Genet Epidemiol 7(2):121–135
Witte J, Carlin J, Hopper J (1999) Likelihood-based approach to estimating twin concordance for dichotomous traits. Genet Epidemiol 16(3):290–304
