STATISTICS IN MEDICINE, VOL. 11, 1615-1618 (1992)

A NOTE ON LAST CASE INFLUENCE USING THE PARTIAL LIKELIHOOD APPROACH UNDER HEAVY CENSORING

PAUL OMAN
Department of Mathematics and Statistics, Ellison Building, Newcastle upon Tyne Polytechnic, NE1 8ST, U.K.

SUMMARY

The Cox proportional hazards model always treats the last observation in a time-ordered data set as an inexact survival time. In certain situations this can lead to this observation always having a low case influence and consequently never being highlighted by a case influence analysis. Thus an influential value can become non-influential through a miscalculation of survival time and this, in turn, can affect conclusions of an analysis based on statistical significance (P-values).

1. INTRODUCTION

In survival studies the data for the $i$th patient are of the form $(t_i, \delta_i, z_i)$, where $t_i$ is the survival time, $\delta_i$ is an indicator of whether the survival time is exact ($\delta_i = 1$) or censored ($\delta_i = 0$) and $z_i$ is a $p \times 1$ vector of covariates ($i = 1, 2, \ldots, n$). Here the survival times are assumed to be in increasing order with no ties. A popular method of analysis of such data is the proportional hazards model of Cox,¹ defined by

$$\lambda(t; z) = \lambda_0(t) \exp(\beta^T z)$$

where $\lambda(t; z)$ is the hazard at time $t$ for a patient with covariate vector $z$, $\lambda_0(t)$ is the unspecified baseline hazard function and $\beta$ is a $p \times 1$ vector of regression coefficients. An estimate of $\beta$, denoted $\hat\beta$, is found by maximizing the partial likelihood, given in Section 2. A feature of this partial likelihood approach is that the last case in the data set (the patient with the longest survival time) is always treated as a censored or inexact survival time, even when failure was observed.

Since the work of Cook² it has been widely acknowledged that any case with a disproportionate influence on the estimate of $\beta$ should be identified and examined. One method of detecting such cases is called case deletion and examines the differences between $\hat\beta$, the estimate of the regression coefficient for the complete data set, and $\hat\beta_{(i)}$, the estimate with case $i$ removed. Cases for which the differences are most extreme are deemed the most influential. Detection of such cases is obviously important if, for example, they are found to have errors in either their covariates or survival time.

In this note, we examine how the influence of the last case in a data set is affected by always being treated as a censored survival time. In Section 2, we demonstrate that in certain situations the influence of the last case will always be low. In Section 3, an illustration of this effect is given and we see how a case with low influence can still be important.


0277-6715/92/121615-04$07.00 © 1992 by John Wiley & Sons, Ltd.

Received December 1990 Revised April 1992


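2. LAST CASE INFLUENCE

The partial likelihood $L(\beta)$ for the proportional hazards model can be written

$$L(\beta) = \prod_{i=1}^{n} \left( \frac{w_i}{\sum_{j \in R_i} w_j} \right)^{\delta_i}$$

where $w_i = \exp(\beta^T z_i)$ and $R_i$ is the risk set of patients $j$ for whom $t_j \ge t_i$. By differentiating $\log L(\beta)$ with respect to $\beta$ we get the score vector $U(\beta)$, given by

$$U(\beta) = \sum_{i=1}^{n} \delta_i U_i(\beta) \qquad (1)$$

where

$$U_i(\beta) = z_i - \frac{\sum_{j \in R_i} w_j z_j}{\sum_{j \in R_i} w_j}.$$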
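The score contributions defined above can be computed directly, which makes the special role of the last case easy to see. The following is a minimal sketch in plain Python, not code from the paper: the function names and the five-case toy data set are hypothetical, and the times are assumed to be sorted in increasing order with no ties. Because the risk set of the last case contains only that case, its contribution $U_n(\beta)$ vanishes for any $\beta$, so the score is the same whether the last case is recorded as a failure or as censored.

```python
import math

def score_contributions(times, z, beta):
    """Return U_i(beta) = z_i - sum_{j in R_i} w_j z_j / sum_{j in R_i} w_j for every case.

    times: survival times, assumed sorted increasing with no ties
    z:     one covariate vector (list of floats) per case
    beta:  coefficient vector (list of floats)
    """
    n, p = len(times), len(beta)
    w = [math.exp(sum(b * x for b, x in zip(beta, zi))) for zi in z]
    contributions = []
    for i in range(n):
        risk_set = [j for j in range(n) if times[j] >= times[i]]
        s0 = sum(w[j] for j in risk_set)
        s1 = [sum(w[j] * z[j][c] for j in risk_set) for c in range(p)]
        contributions.append([z[i][c] - s1[c] / s0 for c in range(p)])
    return contributions

def score(times, delta, z, beta):
    """U(beta) = sum_i delta_i U_i(beta); censored cases enter only through risk sets."""
    u = score_contributions(times, z, beta)
    return [sum(d * ui[c] for d, ui in zip(delta, u)) for c in range(len(beta))]

# Hypothetical toy data: five cases, two covariates, last case an observed failure.
times = [1.0, 2.5, 3.0, 4.2, 6.0]
delta = [1, 0, 1, 0, 1]
z = [[0.3, -1.0], [1.2, 0.4], [-0.7, 0.9], [0.1, 0.1], [2.2, 1.0]]
beta = [-0.4, -0.1]

u = score_contributions(times, z, beta)
print("U_n(beta):", u[-1])  # zero up to floating-point rounding

# Recording the last case as censored instead leaves the score unchanged.
flipped = delta[:-1] + [0]
gap = max(abs(a - b) for a, b in zip(score(times, delta, z, beta),
                                     score(times, flipped, z, beta)))
print("largest change in U(beta):", gap)
```

The quadratic-time risk-set loop is chosen for readability; for large data sets the risk-set sums would normally be accumulated from the last case backwards.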

$\hat\beta$ is found by solving the equation $U(\beta) = 0$ using an iterative process such as the Newton-Raphson method. Notice that equation (1) is the sum of $n - 1$ quantities since $U_n = 0$. The influence for case $k$ is given by $\hat\beta - \hat\beta_{(k)}$, but since there is no explicit formula for $\hat\beta_{(k)}$ we cannot examine such terms directly. Instead we focus on the term $U(\hat\beta) - U_{(k)}(\hat\beta)$, where $U_{(k)}(\hat\beta)$ is the score vector calculated by omitting case $k$ and evaluated at $\hat\beta$. We know that for small case influence, $\hat\beta_{(k)}$ is near $\hat\beta$ and so $U_{(k)}(\hat\beta)$ will be near $U(\hat\beta)$ ($= 0$). We can write explicitly

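$$U(\hat\beta) - U_{(k)}(\hat\beta) = \delta_k U_k(\hat\beta) + \sum_{i < k} \delta_i B_i (\bar z_i - z_k) \qquad (2)$$

where

$$B_i = \frac{w_k}{\sum_{j \in R_i} w_j - w_k}, \qquad \bar z_i = \frac{\sum_{j \in R_i} w_j z_j}{\sum_{j \in R_i} w_j},$$

with all weights $w_j$ evaluated at $\hat\beta$.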

Notice that when $k = n$, the first term in equation (2) disappears. Note also that if there are few failures in the data set, with a good proportion of these failures appearing early (as in many transplant survival data sets), the risk sets will be large at most failure times. If, further, $\hat\beta^T z_k < 0$, then the terms $B_i$ will be small and hence the second term in equation (2) will be small also. Hence, if the last observation in a data set corresponds to a failure with $\hat\beta^T z_k < 0$, we expect this observation to have little influence on $\hat\beta$ irrespective of the value of $z_k$, and in particular irrespective of whether or not this case is unusual by comparison with the others.

3. ILLUSTRATION

We illustrate with an analysis of a set of kidney transplant data on 81 patients. The data for each patient consist of a graft survival time, an indicator of graft failure and two covariates: the time (in months) from the start of dialysis to transplant (dialysis time) and the time from removal of the kidney from the donor until transplantation to the recipient (total ischaemic time). Both covariates were standardized to zero mean and unit variance. Survival times ranged from 1 day to 40.83 months and there was 73 per cent censoring.

Fitting the Cox proportional hazards model to the data gives the results in Table I. Notice that the coefficient for dialysis time is significant. The longest observed time of 40.83 months corresponded to a graft failure with $z_{81} = (2.2074, 0.9768)^T$, for which $\hat\beta^T z_{81} = -1.08$. Calculating the case influences using an overall measure akin to that in Cook² indicated that this case had only the 32nd largest influence on $\hat\beta$.

Table I. Proportional hazards analysis of a set of kidney transplant data with an incorrect survival time

                                      Dialysis time    Total ischaemic time
Coefficient ($\hat\beta$)             -0.4255          -0.1471
(Standard error)                      (0.2121)         (0.2201)
P-value                               0.0455           0.5051
Relative risk                         0.6534           0.8632
(95 per cent confidence interval)     (0.43, 0.99)     (0.56, 1.33)

Suppose now that the survival time for the last case should have been 30.83 months instead of 40.83 months, so that the true position of this case was 74th instead of last. Refitting the proportional hazards model gives the coefficient estimates in Table II, and the case now has the second largest influence on $\hat\beta$. Notice that the coefficient for the first covariate is no longer significant at P = 0.05. Thus, had the case not been identified as having an erroneous survival time, a spuriously significant relationship with one of the covariates would have resulted (and may have been published) despite a case influence analysis.

Table II. Proportional hazards analysis of a set of kidney transplant data with all survival times correct

                                      Dialysis time    Total ischaemic time
Coefficient ($\hat\beta$)             -0.3200          -0.0977
(Standard error)                      (0.2069)         (0.2130)
P-value                               0.1219           0.6526
Relative risk                         0.7261           0.9069
(95 per cent confidence interval)     (0.48, 1.09)     (0.86, 1.38)

For reference, a parametric analysis of these data using a Weibull model indicated that the original 40.83 month survival time for the patient in question was certainly worth checking: case influence analysis under this model ranked the observation second out of 81.

Examination of relative risks and the associated 95 per cent confidence intervals (given in Tables I and II) produces the same conclusions as the Cox regression analysis. From Table I, the relative risk confidence intervals calculated from the incorrect data imply that increasing dialysis time is significantly beneficial, whereas no effect is demonstrable for total ischaemic time at P = 0.05. Results for the corrected data (Table II) are inconclusive for both covariates.
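The kidney transplant data are not reproduced in this note, but the mechanics of the comparison can be tried on simulated data. The sketch below is an illustration under stated assumptions, not the analysis reported above: it uses a single covariate, a synthetic heavily censored sample of 81 cases, a Newton-Raphson fit with a capped step, and the absolute change in the coefficient under case deletion as the influence measure (the paper's overall measure is not specified in enough detail to copy). All names are hypothetical. It prints the influence rank of a case with an unusually large covariate when its failure time is the longest in the sample, and again when the same case is moved to 74th position. The ranks depend on the simulated sample and are not expected to match the 32nd and second places reported for the real data.

```python
import math
import random

def score_and_info(beta, times, delta, z):
    """Score U(beta) and observed information for a single covariate."""
    w = [math.exp(beta * zj) for zj in z]
    u = info = 0.0
    for i, ti in enumerate(times):
        if not delta[i]:
            continue  # censored cases enter only through other cases' risk sets
        risk = [j for j, tj in enumerate(times) if tj >= ti]
        s0 = sum(w[j] for j in risk)
        s1 = sum(w[j] * z[j] for j in risk)
        s2 = sum(w[j] * z[j] ** 2 for j in risk)
        u += z[i] - s1 / s0
        info += s2 / s0 - (s1 / s0) ** 2
    return u, info

def fit(times, delta, z, max_iter=100):
    """Newton-Raphson estimate of beta, each step capped at 1 in absolute value."""
    beta = 0.0
    for _ in range(max_iter):
        u, info = score_and_info(beta, times, delta, z)
        step = max(-1.0, min(1.0, u / info))
        beta += step
        if abs(step) < 1e-10:
            break
    return beta

def influence_rank(times, delta, z, k):
    """Rank of case k (1 = most influential) by |beta_hat - beta_hat_(i)| over all i."""
    n = len(times)
    full = fit(times, delta, z)
    change = []
    for i in range(n):
        keep = [j for j in range(n) if j != i]
        reduced = fit([times[j] for j in keep], [delta[j] for j in keep],
                      [z[j] for j in keep])
        change.append(abs(full - reduced))
    return 1 + sum(c > change[k] for c in change)

# Synthetic sample (hypothetical, NOT the kidney transplant data).
random.seed(1)
n = 81
z = [random.gauss(0.0, 1.0) for _ in range(n)]
failure = [random.expovariate(math.exp(-0.4 * zi)) for zi in z]
censor = [random.expovariate(2.5) for _ in range(n)]  # heavy censoring
times = [min(f, c) for f, c in zip(failure, censor)]
delta = [int(f <= c) for f, c in zip(failure, censor)]

# Make the longest time an observed failure with an unusually large covariate.
last = max(range(n), key=lambda j: times[j])
delta[last], z[last] = 1, 2.2

beta_hat = fit(times, delta, z)
censored_last = delta[:last] + [0] + delta[last + 1:]
print("beta_hat:", beta_hat, "with last case censored:", fit(times, censored_last, z))
print("influence rank when last:", influence_rank(times, delta, z, last))

# Move the same case to 74th position, mirroring the corrected data.
ordered = sorted(times)
moved = list(times)
moved[last] = 0.5 * (ordered[-9] + ordered[-8])
print("influence rank when 74th:", influence_rank(moved, delta, z, last))
```

The first printed line shows the point made in Section 2 at the level of the fit: recording the last case as censored rather than failed leaves the estimate unchanged up to rounding, because that case contributes nothing to the score or the information.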
In summary, the proportional hazards model is of course extremely important in survival studies, but reliance on P-values alone should be avoided and case influence analyses should be undertaken routinely. However, special care should be taken before concluding that the largest observation time has little or no influence: this could be an artefact of the fitting process.

ACKNOWLEDGEMENT

I thank Dr. R. Henderson for his advice and many useful discussions.

REFERENCES

1. Cox, D. R. 'Regression models and life tables (with discussion)', Journal of the Royal Statistical Society, Series B, 34, 187-220 (1972).
2. Cook, R. D. 'Detection of influential observations in linear regression', Technometrics, 19, No. 1, 15-18 (1977).
