HHS Public Access Author manuscript Author Manuscript

Int J Biostat. Author manuscript; available in PMC 2017 November 01. Published in final edited form as: Int J Biostat. 2016 November 01; 12(2): . doi:10.1515/ijb-2016-0001.

Semiparametric Regression Estimation for Recurrent Event Data with Errors in Covariates under Informative Censoring Hsiang Yu1, Yu-Jen Cheng1, and Ching-Yun Wang2 1Institute

of Statistics, National Tsing-Hua University, General Building III, Hsin-Chu 300, Taiwan

2Division

Author Manuscript

of Public Health Sciences, Fred Hutchinson Cancer Research Center, P.O. Box 19024, Seattle, WA 98109-1024, U.S.A

SUMMARY

Author Manuscript

Recurrent event data arise frequently in many longitudinal follow-up studies. Hence, evaluating covariate effects on the rates of occurrence of such events is commonly of interest. Examples include repeated hospitalizations, recurrent infections of HIV, and tumor recurrences. In this article, we consider semiparametric regression methods for the occurrence rate function of recurrent events when the covariates may be measured with errors. In contrast to the existing works, in our case, the conventional assumption of independent censoring is violated because the recurrent event process is interrupted by some correlated events, which is called informative dropout. Furthermore, some covariates may be measured with errors. To accommodate for both informative censoring and measurement error, the occurrence of recurrent events is modeled using an unspecified frailty distribution and accompanied with a classical measurement error model. We propose two corrected approaches based on different ideas, and we show that they are numerically identical when estimating the regression parameters. The asymptotic properties of the proposed estimators are established, and the finite sample performance is examined via simulations. The proposed methods are applied to the Nutritional Prevention of Cancer trial to assess the effect of the plasma selenium treatment on the recurrence of squamous cell carcinoma.

Keywords Informative censoring; Measurement error; Surrogate covariate; Recurrent event data

Author Manuscript

1 Introduction In many longitudinal follow-up studies, recurrent event data are collected when subjects experience an event multiple times. For example, patients with superficial bladder cancer may experience tumor recurrence many times; patients with cystic fibrosis may experience repeated lung exacerbations; and patients with chronic granulomatous disease may experience repeated pyogenic infections (Morgan, Butler, Johnson, Colin, FitzSimmons, Geller, Konstan, Light, Rabin, Regelmann et al., 1999, Fleming and Harrington, 1991).

Supplementary Materials The Supplementary Information referenced in this paper is available at the website of the Journal.

Yu et al.

Page 2

Author Manuscript

Models for recurrent event data can be categorized into two different classes: time-to-event or gap time models. In time-to-event models, interest focuses on the occurrence rate of an event over time (Lawless, Hu, and Cao, 1995, Hu and Lawless, 1996, Hu, Lagakos, and Lockhart, 2009). In gap time models, interest lies in the gap time between two consecutive events (Lin, Sun, and Ying, 1999).

Author Manuscript Author Manuscript

In this study, we focus on the time-to-event models. The time-to-event models may be constructed based on an intensity function (Prentice, Williams, and Peterson, 1981) or a rate function (Hu and Lagakos, 2007, Hu et al., 2009). The intensity function uniquely determines the probability structure of the recurrent event process. However, it needs to correctly specify the occurrence of an event given the prior event history. On the other hand, the rate function allows for arbitrary dependence among the recurrent events and provides a direct interpretation of the occurrence rate without conditioning the prior event history. Our primary focus is to assess the average effects of treatments or risk factors, that is, we are mainly interested in the inference of the rate function. Lawless and Nadeau (1995) estimated the cumulative rate function nonparametrically and applied their approach to industrial warranty data. In addition, Hu and Lagakos (2007) proposed a nonparametric method to study the rate function of the viral load changing process for HIV-infected patients. Nevertheless, all of the above approaches need to assume non-informative censoring, that is, the observation mechanism is independent of the recurrent process. In practice, the assumption is usually violated, for example, when the recurrent event process is interrupted by some terminal events that are related to the recurrent events. A potential remedy is to consider a frailty model, which allows dependence between the recurrent event process and the informative drop-out through a non-negative frailty variable. In general, the distribution of the frailty variable is assumed to be known (Lancaster and Intrator, 1998) and thus the likelihood-based approach (Nielsen, Gill, Andersen, and Sørensen, 1992) is preferred. More recently, Kalbfleisch, Schaubel, Ye, and Gong (2013) proposed a weighted estimating equation approach, with the weight specified by a gamma frailty distribution. However, in general it is not easy to verify the frailty distribution due to invisibility of the frailty variable. To avoid specification of the frailty distribution, Wang, Qin, and Chiang (2001) and Wang and Huang (2014) considered a conditional likelihood approach, where the unobserved frailty variables are “conditioned away” in their proposed estimating equations.

Author Manuscript

The aforementioned approaches, nevertheless, require that the covariates are correctly measured. In many epidemiologic or medical studies, the covariates may suffer from measurement errors. For example, the baseline plasma selenium level is an important predictor for the occurrence of skin cancers in the Nutritional Prevention of Cancer (NPC) trial study (Clark, Combs, Turnbull, Slate, Chalker, Chow, Davis, Glover, Graham, Gross et al., 1996). However, the true value of the plasma selenium level can never be measured because of intrinsic biological variability or limited instrumental precision. Instead, the values we observed are contaminated by measurement errors. The most convenient approach is to treat the observed covariates as the true covariates in the regular estimating procedure, which is also referred to as the naive approach. However, the naive estimator obtained from this approach is generally known to be inconsistent (Carroll, Ruppert, Stefanski, and Crainiceanu, 2006, Chapter 3). In survival and longitudinal data analysis, intensive research has been performed to address measurement error problems. For Cox regression, Prentice Int J Biostat. Author manuscript; available in PMC 2017 November 01.

Yu et al.

Page 3

Author Manuscript

(1982) proposed a likelihood approach with normal measurement error and rare disease assumptions. Wang, Hsu, Feng, and Prentice (1997) applied regression calibration to the partial score function and investigated the performance of the regression calibration estimator through simulation studies, whereas Nakamura (1992) constructed unbiased estimating equations based on the concept of corrected scores. For nonlinear mixed models, Wu (2002), Liu and Wu (2007) and Wu, Liu, and Hu (2010) proposed estimating approaches for longitudinal response data when the covariates are measured with errors, which can also account for censoring in the response and missing data. In recurrent event analysis, nonetheless, little has been addressed regarding measurement error problems. Under a normal measurement error assumption, Jiang, Turnbull, and Clark (1999) proposed a moment corrected method to adjust for the bias of a naive estimator under a semi-parametric model. However, their approach not only requires the assumption of non-informative censoring but also assumes that the censoring distribution is independent of the covariates.

Author Manuscript Author Manuscript

The present study is motivated by the NPC trial study, which aimed to assess the efficacy of the oral supplement of plasma selenium in preventing the development of skin cancer, such as squamous cell carcinoma (SCC). This clinical trial began in 1983 and included approximately 1300 patients with dermatologic cancer histories. Nearly half of the patients in the NPC trial were randomly assigned to either the placebo or treatment groups. Patients in the treatment arm were supposed to take 200μg of plasma selenium supplement per day. In the study period, the patients might experience SCC events repeatedly. Each incidence of a new SCC was diagnosed and recorded by the certified doctors. The medical records were re-viewed by the clinical coordinators at each semi-annual visit, annual contact or by selfreport to ensure the completeness of the data. At the time of randomization, many prognostic risk factors of SCC were recorded, including the baseline plasma selenium level. As mentioned, the plasma selenium level may include errors. In the original study, Clark et al. (1996) did not take measurement error into account and found a nonsignificant negative plasma selenium effect on developing SCC. The result contradicted the evidence of the previous studies, which showed high correlation between the plasma selenium level and several types of cancer. Later, many studies focused on the effect of plasma selenium level on the recurrences of SCC by assuming an independent censoring assumption, some of which also took measurement error into account (Jiang et al., 1999). However, we found a negative relationship between the censoring time and the SCC occurrence rate. This implies that the independent censoring assumption is not satisfied. Therefore, the existing methods are not appropriate for the NPC trial data.

Author Manuscript

This paper is organized as follows. In Section 2, statistical models for recurrent events and measurement errors are given. In Section 3, we propose a regression calibration method and a moment corrected method to correct the measurement errors in the presence of informative censoring. The simulation results are given in Section 4 to investigate the finite sample performance of the proposed methods. Then, we applied the proposed methods to the NPC trial data to evaluate the effect of selenium on the recurrence of SCC in Section 5. Finally, we concluded with a discussion in Section 6. The regularity conditions and technical proofs are provided in the appendices and the Supplementary Information.

Int J Biostat. Author manuscript; available in PMC 2017 November 01.

Yu et al.

Page 4

Author Manuscript

2 Model illustration 2.1 Recurrent event model Assume that there are n independent individuals in the cohort. Let subscript i be the index for a subject, where i = 1, …, n. For the ith subject, let Ni(t) denote the number of recurrent events occurring up to t within a fixed time period [0,τ], where the recurrent event process could be observed beyond τ. Let Zi be a q × 1 vector of covariates that is precisely measured and Xi be a p × 1 vector of covariates that can be measured with errors. Let ℰ denote expectation over the samples, νi be the unobserved frailty variable with mean ℰ (νi | Xi,Zi) = μν, which does not depend on (Xi,Zi), and Ci be the informative censoring time. Suppose that conditional on (νi,Xi,Zi), Ni(t) follows a Poisson process with a multiplicative intensity function

Author Manuscript

(1)

Author Manuscript

where λ0(t) is a baseline function and is a vector of regression parameters. Note that when ν is given, model (1) is also a rate function due to the assumption of the Poisson process. In general, regression parameters can be estimated either by a likelihood-based approach or by solving a set of unbiased estimating equations. If the distribution of ν is assumed and the true covariates are observed, then the standard procedure of the likelihoodbased approaches can be conducted by integrating out ν (Cook and Lawless, 2007, Chapter 3). There are several popular choices for the frailty distribution, such as gamma, log-normal, and positive stable distribution. Balakrishnan and Peng (2006) advocated using the generalized gamma distribution as the frailty distribution because it includes many distributions (e.g., Weibull, log-normal, gamma, positive stable distribution) as special cases. Recently, Mazroui, Mathoulin-Pelissier, Soubeyran, and Rondeau (2012) and Zeng, Ibrahim, Chen, Hu, and Jia (2014) proposed a joint frailty model with two independent frailty variables to distinguish the dependence within the recurrent events and the association between the recurrent event process and terminal events. However, the determination of the frailty distribution usually depends on computational convenience instead of on biological reasons or data characteristics. Balakrishnan and Peng (2006) noted that an inappropriate frailty distribution may result in a large bias in the estimation. Alternatively, we can construct a set of unbiased estimating equations based on the cumulative rate function. According to model (1), the cumulative rate function up to time t is

Author Manuscript

(2)

where and α0 = log(μν). An advantage of using estimating equations over a likelihood-based approach is that one can avoid misspecifying the frailty distribution. However, to solve estimating equations based on (2), Λ0(t) needs to be known and the true Int J Biostat. Author manuscript; available in PMC 2017 November 01.

Yu et al.

Page 5

Author Manuscript

covariates need to be observed. Both deficiencies motivate us to consider the recurrent event process with an unspecified distribution of the frailty variable and an unknown Λ0(t) in this article. 2.2 Measurement error model For subject i, let Wi j be the jth replicated surrogate measurement of the true covariate vector Xi and ki be the number of replicates of Wi. Assume that the surrogate measurement satisfies the classical measurement error model

Author Manuscript

where Ui j are random errors. Suppose that Ui j are independent of (νi,Xi,Zi) and Ci, which implies that the measurement errors are non-differential. In other words, Wi provides no additional information regarding the event process when the true covariate Xi is given (Carroll et al., 2006, Chapter 2). Let μs and Σs be the mean and covariance matrix of a random vector s, Σsh be the covariance matrix of two random vectors (s,h), and γ = (μX,μZ,ΣU,ΣX,ΣZ,ΣXZ) be the parameter of the distribution of X given (W,Z). We assume that X given (W̄,Z) follows a multivariate normal distribution with mean

Author Manuscript

and variance

As in Carroll et al. (2006, Chapter 4), the formula given above is the best linear approximation of ℰ (X | W̄,Z,γ), and it can also be applied when Z is discrete.

3 Correction for errors-in-variable

Author Manuscript

Assume that the observed data {(Ci, (Ti1, …,Timi),{Wi1, …,Wiki},Zi), i=1, …,n} are independent and identically distributed (iid), where Ti j denotes the observed event time for j = 1, …,mi, and mi denotes the number of recurrent events that occurred before Ci. As mentioned in Section (2.1), C is conditionally independent of the recurrent event process N(t) given (ν,X,Z). Then, by (2), we have

Int J Biostat. Author manuscript; available in PMC 2017 November 01.

Yu et al.

Page 6

If Λ0(t) and X are known, the estimating equations

Author Manuscript

for (βX,βZ) are unbiased. In practice, they cannot be implemented because Xi is unobserved and Λ0(t) is unknown. To deal with the unknown function Λ0(t), we start with the conditional likelihood function of (Ti1, …,Timi) given (Ci,νi,mi,Xi,Zi). Under the assumption of the Poisson process, such a conditional likelihood can be constructed from a set of iid random variables with truncated density

λ0(t)/Λ0(τ) and

. Define a rescaled baseline function ϕ (t) ≡ for t ∈ [0,τ], where Φ(τ)=1. The conditional , which is proportional to

likelihood is given by

Author Manuscript

. As noted by Wang et al. (2001), the conditional likelihood shares the same form as the nonparametric likelihood for right-truncated data. Thus, Φ(t) can be consistently estimated using the product limit estimator

where {T(l)} are the ordered and distinct values of {Ti j}i=1,…,n; j=1,…,mi, n(l) is the number of events that occurred at T(l), and N(l) is the number of events that satisfy Ti j ≤ T(l) ≤Ci. Note that the non-parametric estimation of Φ does not require any information from the covariates and the unobserved frailty variable. Hence, Φ̂(t) is a consistent estimator even if X is measured with errors or the frailty distribution is unspecified.

Author Manuscript

For the issue of identifiability, let μν = 1 without loss of generality. The expectation of the event number divided by the rescaled baseline function before time C is

where β0 = log(Λ0(τ)). With the above equation, we can construct the unbiased estimating equations by using Φ(t) instead of the unknown Λ0(t). After replacing the unknown X with the average of the replicates equations

, we can obtain the naive estimating

Author Manuscript

(3)

Int J Biostat. Author manuscript; available in PMC 2017 November 01.

Yu et al.

Page 7

Author Manuscript

Then, the naive estimator

is obtained by solving equation (3) and

Λ0(t) can be estimated by

. Due to the measurement errors, it can be

̂ does not converge to the true parameter . Based on (3), we shown that βN develop a regression calibration method and a moment corrected method to adjust for the measurement errors in the following subsections. 3.1 Regression calibration approach

Author Manuscript

The regression calibration (RC) method is based on the assumption that the induced model of the response conditioning on (W̄,Z) can be well approximated by the underlying model, with X being replaced by the conditional mean ℰ (X | W̄,Z). The RC estimator is obtained by treating ℰ (X | W̄,Z) as the true covariate X in the standard estimating procedure (Carroll et al., 2006, Chapter 4). Although the RC method generally leads to inconsistent estimation in nonlinear models, it is still valuable, with the advantages of computational efficiency and limited bias under some conditions (Carroll et al., 2006, Prentice, 1982). Under our framework, the RC method substitutes W̄ with ℰ (X | W̄,Z,γ) in equation (3). If the measurement error covariance matrix ΣU is known, we can estimate the other components of γ using the observed data without replicates. If not, replicates are needed to estimate ΣU (Wang et al., 1997, Wang, 1999, Carroll et al., 2006). By using the method of moments, the estimator γ̂ of γ can be obtained by solving is given in Appendix A. Then, the RC estimator solving the equations

, where Ψi(γ) is obtained by

Author Manuscript

(4) Coincidently, the conditional expectation of mΦ−1(C), given the observed covariate (W̄,Z), is . Thus, the RC estimator β̂R converges . The result implies that the RC estimator is to a limit consistent for the regression coefficients but not for the intercept.

Author Manuscript

̂ 0 converges to Note that βR, calculated as

, where

. Let Σ̂ be the estimator of Σ(γ), which is

. The RC estimator of Λ0(t) can be adjusted as , which converges to Λ0(t). In the Supplementary

Int J Biostat. Author manuscript; available in PMC 2017 November 01.

Yu et al.

Page 8

Author Manuscript

Information, we show that is asymptotically normally distributed with mean −1 −1 zero and variance A Σg{A }′; A and Σg are defined in Proposition 1 in Appendix A. The covariance matrix estimation of the RC estimator is also given in Appendix B. 3.2 Moment corrected approach The moment corrected (MC) method is motivated by the bias-correction method proposed by Stefanski (1985). Under the classical measurement error model, Stefanski (1985) showed that the naive estimator converges to a limit that is a function of the true parameter and the error variance. Accordingly, the bias of the naive estimator can be corrected based on the relationship between the limit of the naive estimator and the true parameter. Based on this idea, we show that the naive estimator β̂N converges to a limit

Author Manuscript

, which satisfies

(5)

In the Supplementary Information, we have shown that the root of (5) is unique. As described in Section (2.2), we assume that X given (W̄,Z) follows a multivariate normal distribution. For the convenience of derivation, we re-parametrize the conditional mean as ℰ (X | W̄,Z,γ) =η0+ηWW̄+ηZZ, where Ip denotes an identity matrix of size p, η0 = (Ip −ηW)μX

Author Manuscript

− ηZμZ,

, and . Based on

the non-differential error assumption, it follows that . Thus, we can easily show that the unique root βN of (5) is related to the true parameter β as and

. Specifically, βN =

D(β, η) is a one-to-one function of the true parameter when the nuisance parameter η = (η0,ηW,ηZ) is given. Therefore, substituting the estimates of bN and η in the inverse function D−1 results in the moment corrected estimator

Author Manuscript Int J Biostat. Author manuscript; available in PMC 2017 November 01.

Yu et al.

Page 9

̂ =(β̂M,0, βM,X ̂ , β̂M,Z) and η̂0 =(Ip−η̂W)μ̂X −η̂Zμ̂Z, where βM

Author Manuscript

. Because ̂ 0 is consistent for the true intercept β0, Λ0(t) can also be consistently estimated by βM, . In summary, the estimating procedure of the MC method is 1.

2.

Solve equation (3) and and γ̂.

illustrated in Appendix A to obtain β̂N

Apply β̂N and η̂ = η(γ̂) to the function D−1 to obtain the MC estimator β̂M = D−1(β̂N, η̂).

Author Manuscript

In the Supplementary Information, we show that is asymptotically normally distributed with mean zero and covariance matrix B−1Σh{B−1}′; B and Σh are defined in Proposition 2 in Appendix A. The covariate matrix estimation of the MC estimator is also illustrated in Appendix C. An important feature of the MC estimator is that it is numerically identical to the RC estimator for the regression parameter but not for the intercept β0. That is, the estimating equations for the two estimators will have exactly the same roots for the ̂ ̂ regression parameters. The proof of βM,X = β̂R,X and βM,Z = β̂R,Z is provided in Appendix D.

Author Manuscript

4 Simulation study In this section, we evaluate the performance of the RC and MC methods with the naive approach via the simulation studies. Additionally, the corrected partial likelihood (CPL) approach, proposed by Jiang et al. (1999), is also listed for comparison. The CPL estimator takes measurement error into account but assumes non-informative and covariateindependent censoring. We consider a regression model with a continuous covariate X and a discrete covariate Z. Let be an error-prone covariate that is unobserved and Z ~ Bin(0.5) be a random treatment assignment that is precisely obtained. For subject i, we generate ki repeated surrogates Wi j = Xi+Ui j for Xi, where ki follows a discrete uniform distribution

Author Manuscript

ranging from 1 to 4 and nuisance parameter γ by solving

. With the repeated surrogates, we estimate the , where Ψ is shown in the appendices.

We conduct the simulations with reliability ratio (RR) and 0.5. The reliability ratio is used to represent the magnitude of the error contamination; lower RR indicates higher error contamination. We generate from a mixture model, in which ν* follows a uniform distribution ranging from 0.5 to 1.5 when Zi =0 and follows a uniform distribution ranging from 1.5 to 4 otherwise. Then, the frailty variable is

Int J Biostat. Author manuscript; available in PMC 2017 November 01.

Yu et al.

Page 10

Author Manuscript

. When (νi,Xi,Zi) is given, the recurrent event process {Ni(t)} is generated with the corresponding intensity function λ (t | νi,Xi,Zi) = νiλ0(t)exp(βXXi+βZZi), in which λ0(t) = (t − 6)3=360+0.6, t ∈ [0,τ], τ = 10. We consider two distinct coefficient parameters (βX,βZ) = (log(1.5), log(1.5)) and (βX,βZ) = (log(3), log(1.5)) in each scenario. The first two scenarios are conducted under different censoring time settings. In Scenario 1, we let the censoring time C depend on W. If Wi1 > 0, Ci is generated from an exponential distribution with mean

and is truncated after τ = 10;

and is otherwise, Ci is generated from an exponential distribution with mean truncated after τ = 10. In Scenario 2, let the censoring time C depend on X. We generate Ci from the mixed exponential distribution in the same way as in Scenario 1, with Wi replaced by Xi. Next, we investigate the sensitivity of the proposed methods to the conditionally normal assumption imposed on X. In Scenario 3, X is uniformly distributed over the interval

Author Manuscript

(

) and Z is allowed to be correlated with X. Let Z* = X + ε, where

; Z = 1 if Z* ≤ 0 and Z = 0 otherwise. The other variables are generated in the same manner as those in Scenario 2. A non-normal measurement error case is considered in Scenario 4. We generate measurement error U from a skew normal distribution with mean 0,

Author Manuscript

variance and skewness parameter α = −2 and X from . The remaining variables are generated in the same manner as those in Scenario 3. A total of 200 replicates with sample sizes n = 300 and n = 600 are generated in each simulation configuration. In the tables, BIAS denotes the average bias, ASE denotes the average standard error estimation, ESD denotes the empirical sample standard deviation, and CP and CL denote, respectively, the coverage probability and average interval length of the 95% confidence interval based on the 200 runs. The standard errors of the proposed estimators are obtained by taking the square roots of the diagonal elements from the sandwich variance estimators given in Appendices B and C. The results of Scenarios 1 to 4 are demonstrated in Tables 1 to 4. In general, the naive estimator for βX has a bias problem with low coverage probabilities, as shown in all tables. This phenomenon is due to the common attenuation effect. The degree of bias becomes critical when βX is large and RR is low. In Scenarios 1 and 2, the naive estimation of βZ is not affected by the measurement errors because X and Z are generated to be mutually independent. When X and Z are correlated (as shown in Tables 3 and 4), the naive estimator for βZ also has a bias problem. Moreover, the numerical equivalence of the RC and MC estimators is seen in the simulation results.

Author Manuscript

In Table 1, we can see that the CPL estimator has ignorable biases, with coverage probabilities close to 95% when C depends on W. However, when C depends on X, the coverage probabilities of the CPL estimator for βX dramatically decline due to the substantial biased problem, which is presented in Table 2. The bias problem becomes more serious as βX increases and RR decreases. In Table 3, when X follows a uniform distribution, the CPL estimator has large biases and low coverage probabilities, especially when βX is large and RR is low. In contrast, the proposed methods have good performance with at least 92% coverage probabilities and limited biases. In Table 4, it can be seen that the proposed estimators still have good performance in terms of bias and coverage probability Int J Biostat. Author manuscript; available in PMC 2017 November 01.

Yu et al.

Page 11

Author Manuscript

compared to the CPL estimator. However, when the sample size increases to n = 2000, the coverage probabilities of the 95% confidence intervals for the proposed estimators may be lower than 90%. To summarize, the simulation study reveals that the proposed methods can effectively correct the bias due to measurement errors even when the conditionally normal assumption of X is violated. However, the CPL estimator is biased when C depends on X and is sensitive to the distributional assumption imposed on X. The simulation study also shows that the naive approach generally has a serious bias problem. We note that the proposed estimators are not consistent in Scenarios 3 and 4 because of a violation of the normal assumption imposed on X given (W,Z). Hence, the corresponding coverage probabilities obtained from the 95% confidence intervals may be lower than 90% when the sample size is large (such as n = 2000), especially under a skewed measurement error distribution.

Author Manuscript

5 Data analysis In this section, we apply the proposed methods to the NPC trial dataset to assess the effect of plasma selenium treatment on SCC recurrences. This randomized, double-blinded clinical trial recruited 1312 patients with histories of skin cancer, including 653 and 659 patients in the treatment and placebo groups, respectively. The study period lasted up to 12 years.

Author Manuscript

Many critical risk factors for SCC were recorded at the baseline, particularly the plasma selenium level. As mentioned, the plasma selenium level is measured with error due to the measuring instrument or temporary biological fluctuation. Some patients in the placebo group had more than one plasma selenium measurement, which can be treated as replicates. However, patients in the treatment group had only one baseline plasma selenium measurement because successive measurements cannot represent the baseline values. A new incidence of SCC was diagnosed and recorded during the follow-up time; thus, the times of SCC occurrences were available.

Author Manuscript

In this analysis, we consider two covariates: the baseline plasma selenium measurement and the treatment assignment indicator. The latter is our primary covariate of interest, whereas the former is an important predictor for adjusting the model but is contaminated with measurement errors. Let X be the logarithm of the baseline plasma selenium value (abbreviated as log(selenium)) and Z be the treatment assignment. We assume that the recurrence of SCC follows a non-homogeneous Poisson process, with intensity function λ (t | ν,X,Z)=νλ0(t)exp(βXX + βZZ), where the frailty variable ν accounts for the correlations among the SCC recurrences and between the SCC event process and informative censoring time. Here, X is independent of Z because the NPC trial is a randomized clinical trial. Assume that X, given W, follows a conditional normal distribution. By using the replicates data, the variance of X, given W, is estimated by . To verify the distributional assumptions imposed on the covariates, a subset consisting of 292 placebo-grouped patients with 10 or more selenium measurements is used. Because the numbers of replicates of these patients are large enough, the average of replicates should be

Int J Biostat. Author manuscript; available in PMC 2017 November 01.

Yu et al.

Page 12

very close to the true value of the plasma selenium level. We estimate Xi by

Author Manuscript

and Ui by Ûi = Wi1 − X̂i for the ith patient in this subset. Figure 1 shows the histograms of X̂ and Û, which suggest the marginal normal distributions for X and U. Moreover, the correlation between X̂ and Û is −0.069, with P-value=0.234. Under the assumption of normality, the non-significant correlation implies the independence between X and U. Hence, the conditional normal assumption of X is appropriate in the NPC dataset.

Author Manuscript

The patients in the trial were arranged to receive the dermatologic examination periodically. Define the censoring time C as the last examination time from the randomization and τ = 149.5 (months) as the maximum time among the C’s. The existing recurrent event studies (Clark et al., 1996, Jiang et al., 1999) for the NPC data assumed that the censoring is noninformative, which might be improper. Figure 2 shows the weighted average of the SCC recurrences versus time for subjects in the four selected risk sets (t1 = 54.9, t2 = 86.3, t3 = 115.5, t4 = 135.2). Note that the number of SCC recurrences for time t is calculated as Ni(t ∧ Ci) for subject i, where a ∧ b = min(a,b). If the censoring time is independent of the SCC recurrence, we expect that all lines should be close to each other. However, it can be observed that the subjects who stayed in the trial longer (censoring time after 115.5 months and 135.2 months) tended to have fewer SCC recurrences in the early and middle stages. The result implies that the independent censoring assumption is not satisfied and the proposed methods are necessary.

Author Manuscript

After excluding 55 patients without any records of examination and SCC events and 2 without baseline plasma selenium measurements, we included 1255 patients in the analysis to fit the semi-parametric model for the SCC recurrences. Among these patients, 473 had at least one SCC occurrence. The result of the fitted model is presented in Table 5. Because the RC and MC estimates are identical, only the RC estimate is shown in the table. The results in the table show that the treatment effect estimates of all approaches are positive but statistically non-significant. That is, the supplement of plasma selenium has no significant effect on preventing the recurrence of SCC. This result is consistent with those of the previous studies (Clark et al., 1996, Jiang et al., 1999). Moreover, the phenomenon of attenuation can also be observed in the naive estimate of log(selenium). Under the 95% confidence level, the adjusted estimates obtained from the RC and MC methods are significant, with values equal to −1.502. This implies that patients with higher plasma selenium level at baseline have fewer SCC recurrences.

6 Discussion Author Manuscript

To identify the population risk factor in the recurrent event analysis, inference of the rate function is commonly preferred. The existing methods depend on the assumptions of either accurately measured covariates or independent censoring, which may not always be realistic. In this article, we consider statistical methods for recurrent event data with measurement error and informative censoring. When the error-prone covariates are conditionally normally distributed, our proposed estimators are consistent. In our estimating procedure, we do not need additional assumptions of the frailty distribution or of the censoring time. The numerical results show that the naive method, which ignores measurement errors in the

Int J Biostat. Author manuscript; available in PMC 2017 November 01.

Yu et al.

Page 13

Author Manuscript

covariates, leads to a large biased estimator and that the CPL method strongly depends on the independence between the covariates and censoring time. Meanwhile, our proposed methods correct measurement errors effectively and give accurate confidence intervals under different scenarios.

Author Manuscript

The corrected methods considered in this paper are developed under parametric distributions for the covariates and measurement errors. In the NPC data example, these distributional assumptions can be validated via adequate replicates. In practice, we may not have enough information to validate these distributional assumptions of the errors and covariates. To relax such assumptions, a non-parametric correction method similar to Huang and Wang (2000) for Cox regression with measurement error might be further developed. However, the extension of nonparametric correction to the regression analysis of recurrent event data is not straight-forward; hence, future research is warranted. The idea of measurement error correction can be applied not only to recurrent event data but also to panel count data, of which the number of events can only be observed at several random times.

Supplementary Material Refer to Web version on PubMed Central for supplementary material.

Acknowledgments We thank the editor and referees for their very helpful comments and suggestions that greatly improved the paper. This research was partially supported by Taiwan Ministry of Science and Technology MOST 104-2118-M-007-002 (Cheng and Yu), National Institutes of Health grants CA53996, ES017030, HL121347, and MH105857 (Wang), and a travel award from the Mathematics Research Promotion Center of National Science Council of Taiwan (Wang).

Author Manuscript

References

Author Manuscript

Balakrishnan N, Peng Y. Generalized gamma frailty model. Statistics in Medicine. 2006; 25:2797– 2816. [PubMed: 16220516] Carroll, RJ., Ruppert, D., Stefanski, LA., Crainiceanu, CM. Measurement error in nonlinear models: a modern perspective. London: Chapman & Hall; 2006. Clark LC, Combs GF, Turnbull BW, Slate EH, Chalker DK, Chow J, Davis LS, Glover RA, Graham GF, Gross EG, et al. Effects of selenium supplementation for cancer prevention in patients with carcinoma of the skin: a randomized controlled trial. Journal of the American Medical Association. 1996; 276:1957–1963. [PubMed: 8971064] Cook, RJ., Lawless, JF. The statistical analysis of recurrent events. New York: Springer; 2007. Fleming, TR., Harrington, DP. Counting processes and survival analysis. New York: John Wiley & Sons; 1991. Hu XJ, Lagakos SW. Nonparametric estimation of the mean function of a stochastic process with missing observations. Lifetime Data Analysis. 2007; 13:51–73. [PubMed: 17195105] Hu XJ, Lagakos SW, Lockhart RA. Generalized least squares estimation of the mean function of a counting process based on panel counts. Statistica Sinica. 2009; 19:561–580. [PubMed: 20221323] Hu XJ, Lawless JF. Estimation of rate and mean functions from truncated recurrent event data. Journal of the American Statistical Association. 1996; 91:300–310. Huang Y, Wang CY. Cox regression with accurate covariates unascertainable: a nonparametriccorrection approach. Journal of the American Statistical Association. 2000; 95:1209–1219. Huber, PJ. Robust statistics. New Jersey: John Wiley & Sons; 2009. Jiang W, Turnbull BW, Clark LC. Semiparametric regression models for repeated events with random effects and measurement error. Journal of the American Statistical Association. 1999; 94:111–124.

Int J Biostat. Author manuscript; available in PMC 2017 November 01.

Yu et al.

Page 14

Author Manuscript Author Manuscript Author Manuscript Author Manuscript

Kalbfleisch JD, Schaubel DE, Ye Y, Gong Q. An estimating function approach to the analysis of recurrent and terminal events. Biometrics. 2013; 69:366–374. [PubMed: 23651362] Lancaster T, Intrator O. Panel data with survival: hospitalization of hiv-positive patients. Journal of the American Statistical Association. 1998; 93:46–53. Lawless JF, Hu J, Cao J. Methods for the estimation of failure distributions and rates from automobile warranty data. Lifetime Data Analysis. 1995; 1:227–240. [PubMed: 9385103] Lawless JF, Nadeau C. Some simple robust methods for the analysis of recurrent events. Technometrics. 1995; 37:158–168. Lin DY, Sun W, Ying Z. Nonparametric estimation of the gap time distribution for serial events with censored data. Biometrika. 1999; 86:59–70. Liu W, Wu L. Simultaneous inference for semiparametric nonlinear mixed-effects models with covariate measurement errors and missing responses. Biometrics. 2007; 63:342–350. [PubMed: 17688487] Mazroui Y, Mathoulin-Pelissier S, Soubeyran P, Rondeau V. General joint frailty model for recurrent event data with a dependent terminal event: application to follicular lymphoma data. Statistics in medicine. 2012; 31:1162–1176. [PubMed: 22307954] Morgan WJ, Butler SM, Johnson CA, Colin AA, FitzSimmons SC, Geller DE, Konstan MW, Light MJ, Rabin HR, Regelmann WE, et al. Epidemiologic study of cystic fibrosis: design and implementation of a prospective, multicenter, observational study of patients with cystic fibrosis in the us and canada. Pediatric Pulmonology. 1999; 28:231–241. [PubMed: 10497371] Nakamura T. Proportional hazards model with covariates subject to measurement error. Biometrics. 1992; 48:829–838. [PubMed: 1420844] Nielsen GG, Gill RD, Andersen PK, Sørensen TI. A counting process approach to maximum likelihood estimation in frailty models. Scandinavian Journal of Statistics. 1992; 19:25–43. Prentice RL. Covariate measurement errors and parameter estimation in a failure time regression model. Biometrika. 1982; 69:331–342. Prentice RL, Williams BJ, Peterson AV. On the regression analysis of multivariate failure time data. Biometrika. 1981; 68:373–379. Stefanski LA. The effects of measurement error on parameter estimation. Biometrika. 1985; 72:583– 592. Wang CY. Robust sandwich covariance estimation for regression calibration estimator in cox regression with measurement error. Statistics and Probability Letters. 1999; 45:371–378. Wang CY, Hsu L, Feng ZD, Prentice RL. Regression calibration in failure time regression. Biometrics. 1997; 53:131–145. [PubMed: 9147589] Wang MC, Huang CY. Statistical inference methods for recurrent event processes with shape and size parameters. Biometrika. 2014; 101:553–566. [PubMed: 26412863] Wang MC, Qin J, Chiang CT. Analyzing recurrent event data with informative censoring. Journal of the American Statistical Association. 2001; 96:1057–1065. Wu L. A joint model for nonlinear mixed-effects models with censoring and covariates measured with error, with application to AIDS studies. Journal of the American Statistical Association. 2002; 97:955–964. Wu L, Liu W, Hu XJ. Joint inference on HIV viral dynamics and immune suppression in presence of measurement errors. Biometrics. 2010; 66:327–335. [PubMed: 19673859] Zeng D, Ibrahim J, Chen M, Hu K, Jia C. Multivariate recurrent events in the presence of multivariate informative censoring with applications to bleeding and transfusion events in myelodysplastic syndrome. Journal of biopharmaceutical statistics. 2014; 24:429–442. [PubMed: 24605978]

Int J Biostat. Author manuscript; available in PMC 2017 November 01.

Yu et al.

Page 15

Author Manuscript

Appendices A Asymptotic properties Let ℬR, ℬ be any compact neighborhoods of βR and β, respectively, which are the roots of the limits of the RC and MC estimating equations. Additionally, denote

and

. To prove the asymptotic properties of the proposed estimators, we impose the following regularity conditions:

Author Manuscript

(a1)

Λ0(τ) > 0;

(a2)

Pr(C ≥ τ, ν > 0) > 0;

(a3)

G(u) ≡ ℰ[νI(C ≥ u)] is a continuous function for u ∈ [0,τ];

(a4)

ℰ{supb∈ℬ

′ exp(D′(b,η) )} and ℰ {supb∈ℬℛ

Moreover, ℰ { singular.

′ exp(D′(β,η) )} and

′ exp(b′ )} are bounded. are non-

Note that condition (a4) can be satisfied under the normality assumption imposed on the covariates. Define Q1(t) ≡ G(t)Λ0(t), Wang et al. (2001) showed that

. Under conditions (a1) through (a3),

Author Manuscript

(6)

where

are iid

terms with zero expectations. Based on the central limit theorem, to a multivariate normal distribution with mean zero and variance

converges .

By using the method of moments, the nuisance parameter estimator γ̂ is obtained by solving

Author Manuscript Int J Biostat. Author manuscript; available in PMC 2017 November 01.

Yu et al.

Page 16

Author Manuscript

where Ψi(γ) are iid terms. With the same techniques as for these in M-estimators (Huber, 2009), it can be shown that γ̂ converges in probability to γ. Let R≡ℰ {−∂Ψi(γ)/∂γ′}, where R is non-singular under condition (a4); thus, via a Taylor expansion,

(7)

Based on the central limit theorem, converges to a normal distribution with mean zero and a covariance-matrix R−1ℰ {Ψi(γ)Ψi(γ)′}{R−1}′.

Author Manuscript

With the consistencies of γ̂ and Φ̂(t),∀t ∈ [0,τ], we can prove the following propositions, of which the proofs are given in the Supplementary Information. Define V as the joint density of (W,Z,m, c), Π ≡ ∂ /∂γ′, and Γ ≡ ∂D/∂γ′. Let

and

Author Manuscript

Proposition 1 ̂ converges in probability to βR. Furthermore, Under conditions (a1) through (a4), βR asymptotically follows a normal distribution with mean zero and a covariance matrix A−1Σg{A−1}′, where

.

Proposition 2 Under conditions (a1) through (a4), β̂M converges in probability to β. Furthermore, is asymptotically normally distributed with mean zero and a covariance matrix

Author Manuscript

B−1Σh{B−1}′, where B = ℰ (−∂hi/∂β′),

.

B Covariance estimation of RC To develop covariance estimation of the RC estimator, we first illustrate the covariance estimation of

and

, ∀t ∈ [0, τ].

Int J Biostat. Author manuscript; available in PMC 2017 November 01.

Yu et al.

Page 17

Author Manuscript

and Π̂i = ∂ i/∂γ′ |γ=γ̂. The covariance matrix of

Let can be estimated by

. Define , and

where T(l) are ordered and distinct values of {Ti j}i=1,…,n; j=1….mi. Based on Wang et al. (2001), we can show that the covariance matrix of

can be consistently

Author Manuscript

.

estimated by

Denote ⊗ as a Kronecker product and Ia as an identity matrix with size a. Let . Finally, the covariance matrix of consistently estimated by

, where

can be and

, with

Author Manuscript

C Covariance estimation of MC ̂ , η̂), Γ̂ =∂D/∂γ′ |γ=γ̂. The covariance matrix of Let D̂ = D(βM consistently estimated by

Author Manuscript

and

, where

, with

Int J Biostat. Author manuscript; available in PMC 2017 November 01.

can be

Yu et al.

Page 18

Author Manuscript

D Proof of RC = MC for regression parameters Recall that ℰ (Xi |W̄i,Zi,γ) =η0+ηWW̄i+ηZZi, where η0,ηW and ηZ are functions of γ. Let 0r×s be a r×s matrix of zeros. With simple algebra, we can write i = H i,∀i = 1, …,n, where

Because H remains the same for i = 1, …,n, for any fixed γ, equation (4) can be written as

Author Manuscript

(8) Recall that b̂N is the unique root of equations, with the form of

Author Manuscript

It is easy to see that equation (8) has the same form as the equations given above. Thus, we have H′β̂R = b̂N. Moreover, by definition, b̂N = D(b̂M,γ) for any fixed γ. Therefore, we have

for any fixed γ. The above equations imply that ̂ ,Z = β̂M,Z. Hence, the proof is complete. and βR

Author Manuscript Int J Biostat. Author manuscript; available in PMC 2017 November 01.

, β̂R,X = β̂M,X

Yu et al.

Page 19

Author Manuscript Author Manuscript Figure 1.

Author Manuscript

Histograms of estimated true covariate {X̂i} and estimated error terms {Ûi} by using 292 placebo grouped patients with more than 10 plasma selenium measurements.

Author Manuscript Int J Biostat. Author manuscript; available in PMC 2017 November 01.

Yu et al.

Page 20

Author Manuscript Author Manuscript Author Manuscript

Figure 2.

Weighted average of the SCC recurrences versus time (month since randomization) for subjects in the four selected risk sets (t1 = 54.9, t2 = 86.3, t3 = 115.5, t4 = 135.2), where the weighted average of the SCC recurrences for subjects in the rth risk set at time t is calculated by

, 0 ≤ t ≤ τ = 149.5 where r = 1,2,3,4.

Author Manuscript Int J Biostat. Author manuscript; available in PMC 2017 November 01.

Author Manuscript

Author Manuscript

Author Manuscript

Int J Biostat. Author manuscript; available in PMC 2017 November 01.

βX

βZ

βX

βZ

βX

157 0.96

157 0.97 643

×103

CP

CL ×103

103

×103

169 0.94

168 0.94 650

−241 130

ESD ×103

CP

CL ×103

×103

ASE

Naive

RC

MC

644

0.96

157

164

−2

676

0.97

167

172

−1

416

0.96

102

106

13

497

0.96

120

127

24

446

0.96

120

114

−4

375

0.91

94

96

−71

447

0.96

120

114

−4

470

0.94

117

120

13

13

447

0.96

120

114

−4

470

0.94

117

120

ASE ×103

655

0.94

169

164

7

869

0.92

211

209

−20

427

0.94

107

108

14

650

0.89

139

150

−89

458

0.96

115

117

7

304

0.23

69

78

−198

460

0.96

117

117

8

612

0.96

142

156

11

163

−24 163

−24

3 136

92

−214

7 115

11

115

7

460

0.96

117

117

8

613

0.96

142

156

(βX,βZ) = (log (3), log (1.5)); RR=0.8

655

163

7

5 162

×103

BIAS

868

424

0.92

211

209

×103

BIAS

CPL

(βX,βZ) = (log (1.5), log (1.5)); RR=0.5 −20

CL ×103

0.43

CP

ESD

102

ASE ×103

BIAS

−216

×103

644

164

164

ASE ×103

−2

675

0.97

167

172

−1

−2

ESD

MC

n = 600

(βX,βZ) = (log (1.5), log (1.5)); RR=0.8

RC

×103

BIAS

537

ESD

CL ×103

133

×103

ASE

0.93

137

×103

CP

−83

BIAS ×103

Naive

n = 300

100

23

300

0.93

79

77

12

460

0.925

115

117

49

292

0.95

75

74

5

343

0.91

86

87

39

CPL

Censoring time depends on W; X follows a normal distribution, and X and Z are independent.

Author Manuscript

Table 1 Yu et al. Page 21

867

0.94

228

221

34

163 0.94

157 0.95 607

×103

CP

CL ×103

ESD

155

ASE ×103

622

159

7

8

574

0.95

148

146

25

639

0.95

161

MC

433

0.94

116

110

10

532

0.94

139

CPL

402

0.95

99

103

8

360

0.34

94

Naive

RC

403

0.95

102

103

9

451

0.96

119

623

0.94

163

159

8

868

0.94

228

222

34

511

0.94

138

130

15

887

0.95

234

226

67

430

0.94

119

110

−7

295

0.00

76

75

−551

438

0.94

121

112

−8

606

0.96

146

154

−4

439

0.94

121

112

−8

606

0.96

146

155

−4

403

0.95

102

103

9

451

0.96

119

MC

(βX,βZ) = (log (3), log (1.5)); RR=0.5

574

0.95

148

146

25

639

0.95

×103

BIAS

CL

417

ESD 0.00

111

×103

ASE

×103

106

×103

CP

−539

BIAS ×103

573

ESD

CL ×103

147

×103

ASE

0.95

146

×103

CP

22

BIAS ×103

RC 161

371

0.95

92

95

2

619

0.96

151

158

12

311

0.94

79

79

2

392

0.96

99

CPL

Note: BIAS denotes the average of β̂ − β from 200 samplings, ASE denotes the average standard error from 200 samplings, ESD denotes the empirical standard deviation from 200 samplings, CP denotes the coverage probability of Wald 95% confidence interval, CL denotes the average length of Wald 95% confidence interval from 200 samplings.

βZ

βX

βZ

508

×103

CL

0.53

CP

Author Manuscript 129

Naive

Author Manuscript

ESD ×103

Author Manuscript

n = 600

Author Manuscript

n = 300

Yu et al. Page 22

Int J Biostat. Author manuscript; available in PMC 2017 November 01.

Author Manuscript

Author Manuscript

Author Manuscript

Int J Biostat. Author manuscript; available in PMC 2017 November 01.

βX

βZ

βX

βZ

βX

162 0.95

161 0.95 632

×103

CP

CL ×103

103

×103

169 0.94

168 0.94 523

−225 126

ESD ×103

CP

CL ×103

×103

ASE

Naive

RC

MC

632

0.95

161

161

15

653

0.95

171

167

3

412

0.89

118

105

13

462

0.92

131

118

−14

451

0.97

109

115

−20

375

0.73

93

96

−102

451

0.97

108

115

−19

470

0.97

118

120

−25

451

0.97

110

115

−21

471

0.97

117

120

−20

ASE ×103

653

0.94

169

164

7

821

0.92

211

209

−20

462

0.94

107

108

14

587

0.89

139

150

−89

375

0.96

115

117

7

289

0.23

69

78

−198

470

0.96

117

117

8

586

0.96

142

156

11

−8 159

−8 159

133

−91 88

−216

4 111

11

111

4

471

0.96

117

117

8

586

0.96

142

156

(βX, βZ) = (log (3), log (1.5)); RR=0.8

655

163

7

5 162

×103

BIAS

821

401

0.92

211

209

×103

BIAS

CPL

(βX, βZ) = (log (1.5), log (1.5)); RR=0.5 −20

CL ×103

0.43

CP

ESD

102

ASE ×103

BIAS

−216

×103

632

161

161

ASE ×103

15

655

0.94

171

167

2

16

ESD

MC

n = 600

(βX,βZ) = (log (1.5), log (1.5)); RR=0.8

RC

×103

BIAS

523

ESD

CL ×103

136

×103

ASE

0.93

133

×103

CP

−80

BIAS ×103

Naive

n = 300

94

−95

331

0.93

79

77

12

409

0.925

115

117

49

294

0.97

74

75

−6

331

0.90

87

84

−30

CPL

Censoring time depends on X; X follows a normal distribution, and X and Z are independent.

Author Manuscript

Table 2 Yu et al. Page 23

389

812

0.97

202

207

−6

158 0.95

154 0.93 591

×103

CP

CL ×103

ESD

151

ASE ×103

603

154

6

6

563

0.94

146

144

4

622

0.96

157

MC

429

0.94

109

109

7

520

0.87

141

CPL

398

0.98

98

101

3

346

0.33

84

Naive

RC

398

0.98

97

102

3

434

0.95

106

604

0.95

158

154

6

814

0.97

202

208

−6

489

0.96

115

125

11

763

0.74

186

195

−238

416

0.97

110

106

−10

273

0.00

68

70

−552

424

0.94

115

108

−12

565

0.95

147

144

−1

424

0.94

115

108

−12

565

0.95

147

144

−1

398

0.98

97

102

3

434

0.95

106

MC

(βX, βZ) = (log (3), log (1.5)); RR=0.5

563

0.94

146

144

4

622

0.96

×103

BIAS

CL

0.00

×103

ESD

CP

99 100

×103

ASE

−558

×103

BIAS ×103

562

ESD

CL ×103

147

×103

ASE

0.95

143

×103

CP

3

BIAS ×103

RC 157

345

0.92

95

88

1

546

0.59

133

139

−229

305

0.93

82

78

4

368

0.83

89

CPL

Note: BIAS denotes the average of β̂ − β from 200 samplings, ASE denotes the average standard error from 200 samplings, ESD denotes the empirical standard deviation from 200 samplings, CP denotes the coverage probability of Wald 95% confidence interval, CL denotes the average length of Wald 95% confidence interval from 200 samplings.

βZ

βX

βZ

496

×103

CL

0.58

CP

Author Manuscript 124

Naive

Author Manuscript

ESD ×103

Author Manuscript

n = 600

Author Manuscript

n = 300

Yu et al. Page 24

Int J Biostat. Author manuscript; available in PMC 2017 November 01.

Author Manuscript

Author Manuscript

Author Manuscript

Int J Biostat. Author manuscript; available in PMC 2017 November 01.

βX

βZ

βX

βZ

βX

239 0.93

216 0.92 785

×103

CP

CL ×103

125

×103

303 0.92

195 208 0.85 766

−351 138

ESD ×103

CP

CL ×103

×103

ASE

Naive

RC

875

0.93

239

223

12

834

0.94

230

213

6

541

0.92

138

138

81

519

0.88

142

132

−71

552

0.91

139

141

81

422

0.83

104

108

−119

616

0.96

157

157

−3

595

0.95

147

152

−2

ASE ×103

1077

0.92

303

275

−7

1168

0.93

340

298

2

524

0.75

122

134

178

595

0.76

153

152

−170

537

0.76

134

137

180

302

0.10

74

77

−244

744

0.94

186

190

−13

811

0.95

207

207

24

197

−42 197

−42 141

−306 98

−364

139

−66

(βX,βZ) = (log (3), log (1.5)); RR=0.8

1077

275

−7

178

×103

BIAS

1166

426

0.93

340

297

×103

BIAS

CPL

139

−66

744

0.94

186

190

−13

811

0.95

207

207

24

616

0.96

157

157

−3

595

0.95

147

152

−2

MC

(βX,βZ) = (log (1.5), log (1.5)); RR=0.5 2

CL ×103

0.37

CP

ESD

109

ASE ×103

BIAS

−253

×103

875

223

200

ASE ×103

12

834

0.94

230

213

6

98

ESD

MC

n = 600

(βX,βZ) = (log (1.5), log (1.5)); RR=0.8

RC

×103

BIAS

591

ESD

CL ×103

163

×103

ASE

0.88

151

×103

CP

−114

BIAS ×103

Naive

n = 300

100

−324

372

0.59

91

95

153

419

0.69

100

107

−152

370

0.84

94

94

88

363

0.91

85

93

−78

CPL

Censoring time depends on X; X follows a uniform distribution, and X and Z are correlated.

Author Manuscript

Table 3 Yu et al. Page 25

377

1071

0.94

254

273

−89

248 0.95

190 0.31 738

×103

CP

CL ×103

ESD

188

ASE ×103

1021

261

470

11

819

0.93

235

209

15

772

0.93

199

MC

581

0.70

142

148

209

552

0.42

140

CPL

527

0.58

138

134

240

385

0.05

96

Naive

RC

577

0.94

149

147

26

544

0.95

133

1021

0.96

248

261

11

1073

0.93

254

274

−89

636

0.40

138

162

349

664

0.11

159

169

−548

522

0.05

150

133

487

263

0.00

69

67

−715

707

0.94

193

180

35

732

0.91

191

187

−85

(βX,βZ) = (log (3), log (1.5)); RR=0.5

819

0.93

235

209

15

771

0.93

×103

BIAS

CL

0.00

ESD

×103

87

CP

96

×103

ASE

−722

×103

BIAS ×103

743

ESD

CL ×103

215

×103

ASE

0.75

189

×103

CP

237

BIAS ×103

RC 199

705

0.94

193

180

35

733

0.92

191

187

−85

577

0.94

149

147

26

545

0.95

133

MC

449

0.10

107

115

368

476

0.01

117

121

−554

398

0.46

102

101

215

391

0.08

87

CPL

Note: BIAS denotes the average of β̂ − β from 200 samplings, ASE denotes the average standard error from 200 samplings, ESD denotes the empirical standard deviation from 200 samplings, CP denotes the coverage probability of Wald 95% confidence interval, CL denotes the average length of Wald 95% confidence interval from 200 samplings.

βZ

βX

βZ

542

×103

CL

0.25

CP

Author Manuscript 141

Naive

Author Manuscript

ESD ×103

Author Manuscript

n = 600

Author Manuscript

n = 300

Yu et al. Page 26

Int J Biostat. Author manuscript; available in PMC 2017 November 01.

Author Manuscript

Author Manuscript

Author Manuscript

Int J Biostat. Author manuscript; available in PMC 2017 November 01.

βX

βZ

βX

βZ

βX

217 0.95

197 0.89 777

×103

CP

CL ×103

114

×103

252 0.94

191 187 0.84 747

−271 137

ESD ×103

CP

CL ×103

×103

ASE

Naive

RC

851

0.95

217

217

26

811

0.95

208

207

−10

511

0.90

124

130

84

499

0.95

118

127

−43

541

0.94

133

138

50

406

0.84

103

104

−99

588

0.94

146

150

−25

558

0.95

142

142

16

ASE ×103

998

0.94

252

255

16

1118

0.94

296

285

5

508

0.83

120

130

149

606

0.92

137

155

−109

522

0.85

128

133

134

302

0.16

79

77

−231

681

0.95

174

174

−36

763

0.93

202

195

32

189

40 190

40 142

−115 96

−261

133

51

(βX, βZ) = (log (3), log (1.5)); RR=0.8

997

254

16

179

×103

BIAS

1115

437

0.94

296

285

×103

BIAS

CPL

133

51

682

0.95

174

174

−36

764

0.94

202

195

32

588

0.94

146

150

−25

559

0.95

142

142

16

MC

(βX, βZ) = (log (1.5), log (1.5)); RR=0.5 5

CL ×103

0.42

CP

ESD

111

ASE ×103

BIAS

−244

×103

851

217

198

ASE ×103

26

811

0.95

208

207

−10

97

ESD

MC

n = 600

(βX, βZ) = (log (1.5), log (1.5)); RR=0.8

RC

×103

BIAS

589

ESD

CL ×103

150

×103

ASE

0.89

150

×103

CP

−118

BIAS ×103

Naive

n = 300

100

−121

346

0.68

88

88

139

432

0.82

103

110

−115

349

0.88

89

89

75

357

0.94

84

91

−44

CPL

Censoring time depends on X; U follows a skew normal distribution, and X and Z are correlated.

Author Manuscript

Table 4 Yu et al. Page 27

269 0.93

189 0.39 725

×103

CP

CL ×103 998

255

185

ASE ×103

−45

1139

0.95

305

290

98

419

ESD

790

0.96

188

201

−46

743

0.92

195

MC

534

0.83

122

136

148

558

0.87

144

CPL

512

0.74

131

131

173

377

0.21

92

Naive

RC

556

0.95

144

142

−29

522

0.94

129

1000

0.93

269

255

−45

1142

0.95

305

291

98

576

0.36

140

147

342

791

0.70

172

202

−295

509

0.11

128

130

411

309

0.00

88

79

−615

703

0.93

185

179

−56

805

0.93

220

215

99

(βX, βZ) = (log (3), log (1.5)); RR=0.5

790

0.96

188

201

−46

742

0.93

×103

BIAS

CL

428

ESD 0.00

112

×103

ASE

×103

109

×103

CP

−623

BIAS ×103

725

ESD

CL ×103

172

×103

ASE

0.89

185

×103

CP

157

BIAS ×103

RC 195

705

0.93

185

180

−56

808

0.93

220

216

99

556

0.95

144

142

−29

522

0.94

129

MC

405

0.08

88

103

335

582

0.48

150

148

−273

373

0.59

93

95

166

394

0.78

93

CPL

Note: BIAS denotes the average of β̂ − β from 200 samplings, ASE denotes the average standard error from 200 samplings, ESD denotes the empirical standard deviation from 200 samplings, CP denotes the coverage probability of Wald 95% confidence interval, CL denotes the average length of Wald 95% confidence interval from 200 samplings.

βZ

βX

βZ

536

×103

CL

0.50

CP

Author Manuscript 144

Naive

Author Manuscript

ESD ×103

Author Manuscript

n = 600

Author Manuscript

n = 300

Yu et al. Page 28

Int J Biostat. Author manuscript; available in PMC 2017 November 01.

Yu et al.

Page 29

Table 5

Author Manuscript

Regression analysis of the SCC recurrences in the NPC trial

log(Selenium)

Naive

RC

CPL

−0.555

−1.502

−1.109

0.292

0.790

0.842

−1.897

−1.902

−1.317

EST

0.185

0.223

0.125

SE

0.140

0.141

0.125

Z-value

1.317

1.581

1.002

EST SE Z-value

Treatment

Note: EST denotes the estimate, SE denotes the standard error which is estimated by the square root of the asymptotic variance estimator. The MC estimates are identical to the RC estimates.

Author Manuscript Author Manuscript Author Manuscript Int J Biostat. Author manuscript; available in PMC 2017 November 01.

Semiparametric Regression Estimation for Recurrent Event Data with Errors in Covariates under Informative Censoring.

Recurrent event data arise frequently in many longitudinal follow-up studies. Hence, evaluating covariate effects on the rates of occurrence of such e...
1MB Sizes 0 Downloads 5 Views