Simulation program for estimating statistical power of Cox's proportional hazards model assuming no specific distribution for the survival time.

Computer Methods and Programs in Biomedicine, 35 (1991) 203-212

203

© 1991 Elsevier Science Publishers B.V. All rights reserved 0169-2607/91/$03.50

C O M M E T 01199

Section II. Systems and programs

Simulation program for estimating statistical power of Cox's proportional hazards model assuming no specific distribution for the survival time Kouhei Akazawa 1, Tsuyoshi N a k a m u r a 2, Sunao Moriguchi l, Mitsuo Shimada and Yoshiaki Nose i 1 Department of Medical Informatics, Kyushu Unicersity Hospital, Higashi-ku, Fukuoka, Japan and 2 School of Allied Medical Sciences, Nagasaki Unit'ersity, Nagasaki, Japan

Small sample properties of the m a x i m u m partial likelihood estimates for Cox's proportional hazards model depend on the sample size, the true values of regression coefficients, covariate structure, censoring pattern and possibly baseline hazard functions. Therefore, it would be difficult to construct a formula or table to calculate the exact power of a statistical test for the treatment effect in any specific clinical trial. The simulation program, written in S A S / I M L , described in this paper uses Monte-Carlo m e t h o d s to provide estimates of the exact power for Cox's proportional hazards model. For illustrative purposes, the program was applied to real data obtained from a clinical trial performed in Japan. Since the program does not assume any specific function for the baseline hazard, it is, in principle, applicable to any censored survival data as long as they follow Cox's proportional hazards model.

Simulation; Proportional hazards; Hazard; Survival analysis; Statistical power; Censored survival analysis

1. Introduction

Cox's proportional hazards model [1] has been used frequently in censored survival analysis to test equality of treatment effects on survival times, adjusting for covariate effects. The model quantitatively represents the effect of covariates on survival times without assuming any specific form for the distribution of survival times. When applying this model to a clinical trial, it is important to be aware of the relationship between sample size and testing power. This can

Correspondence. K. Akazawa, D e p a r t m e n t of Medical Informatics, Kyushu University Hospital, 3-1-1 Maidashi, Higashiku, Fukuoka 812, Japan.

help in determining the reliability of the results of the clinical trial [2]. Several authors have studied the statistical power of Cox's proportional hazards model [3-5]. They have generally assumed that the survival times are exponentially distributed. Johnson et al. [3] evaluated small-sample performances of the maximum partial likelihood estimates of the regression coefficients in two-covariate case. They concluded that, in a rather 'ideal' case of balanced covariates and no censoring, sample sizes of 40 yield reasonable results in terms of bias, agreement between asymptotic and finite-sample variances, and power, but degraded results occur with either unbalanced covariates or censoring. Amini et al. [4] determined the empirical size and power of the four statistics (likelihood ratio, Wald,

204

conditional and unconditional score) for misspecified proportional hazards models. Schoenfeld derived an asymptotic formula for the sample size required for the comparison of two groups with exponential survival curves when covariates are balanced [5]. The results of these studies indicate that the size and power of a hypothesis test under Cox's proportional hazards model depends heavily on the sample size, the true values of coefficients, covariate structure, censoring pattern, and possibly baseline hazard functions. Unfortunately, there is no reliable formula to determine the power under any specific set of design conditions. This led us to develop the following computer program. This paper presents a S A S / I M L simulation program for evaluating the statistical power of specific proportional hazards models. The user of this program can assign sample size, Type I or Type II censoring pattern, number of covariates, type of covariate (dichotomous, polytomous, or continuous), and covariate structure. The baseline hazard function is left completely arbitrary in this program, unlike others, which normally assume an exponential distribution. Thus, the resuits of the program will be correct even if data fail to fit any classical distributions such as Weibull or exponential.

2. Theoretical background Let A(t,z) represent the hazard function at time t for an individual with a row vector z of covariates. The proportional hazards model treated by Cox is A ( t , z ) = Ao(t ) exp(z/3),

(1)

where A0(t) is an arbitrary unspecified baseline hazard function, and /3 is a column vector of unknown parameters [1]. Eqn. 1 leads to the partial likelihood k L(/3) = 1-I ,=,

exp(z(i)/3) E j~Ri

exp(zfl3)'

(2)

proposed by Cox [1], where k is the number of failures, zj is the vector of covariates associated with the jth individual, (i) denotes the individual who fails at ith failure time and R i is the risk set of all individuals who are in the study and alive just before the ith failure. We assume no tied failures. Under certain regularity conditions, /3 that maximizes Eqn. 2 is asymptotically normally distributed with mean /3, a true value, and under the null hypothesis that the true value /3i is zero, or that there is no effect of z~ on the survival, /3ffS.E.(/3z) is asymptotically distributed as normal with mean 0 and variance 1 [6]. In Cox's proportional hazards model, the underlying hazard rate A0(t) is left unspecified. Generally, only the rank statistic, the order of failures, can carry information about /3 when A0(t) is completely unknown [6]. To generate the rank statistic following Cox's proportional hazards model, the program does not assume any distribution for A0(t).

3. Program description The source code of the program is listed in the Appendix. The simulation program consists of the following three modules: a data module for generating sample data, an estimation module for the estimation of the coefficient/3 and an asymptotic variance, and a test module for the estimation of testing power.

3.1. Data module Table 1 describes the input parameters required to run the program; the user is supposed to assign values to these parameters, as determined from the relevant testing problem. The s HAP E function shapes a new matrix from a matrix with different dimensions. For example, MTRX=SHAPE

({0

1},

25,2);

results in a 25 × 2 matrix of which every row is { 0 1}. R=l(n); sets up an identity matrix with n-dimensional rows and columns. A= R E P E A T (x,m,n) ;

205

MT5=UNIFORM(REPEAT(SEED,100,1)); MTRXI=MT1 //MT2; MTRX2=MT3/ /MT4/ /MT3/ /MT4; Z=MTRX1 I MTRX2 I MT5;.

TABLE 1 Description of input parameters Parameters

Description

NS NF NV BT NR SEED/ (i = 1,2) NC

Number of individuals Number of failures Number of covariates True value of/3 Number of repetitions Prime number used to initialize random number generators NC(k ) is the number of individuals censored between kth and (k + 1)th death

sets up a new m × n matrix by repeating the value x. The UNI F0 RM function is a scalar function returning one or more pseudo-random numbers with a uniform distribution over 0 to 1. For instance;

MT1=SHAPE(0,50,1); MT2=SHAPE(1,50,1) ; MT3=SHAPE(0,25,1); MT4=SHAPE(1,25,1);

The last statement above creates the 100 x 3 covariate matrix z which consists of two dichotomous covariates and one continuous covariate. A row and a column represent an individual and a covariate, respectively. The first column of z is called the treatment covariate, and equals one or zero according to whether or not the individual is given a treatment. It has elements 0 for the first through 50th rows and 1 for the 51st through 100th rows. The second column of z is called a prognostic variable. It has 0 for the first through 25th and 51st through 75th rows, and 1 for the 26th through 50th and 76th through 100th rows. The elements of the third column of z, a different prognostic variable, are uniform random numbers over 0 to 1. Table 2 describes the parameters used in the data module and Fig. 1 illustrates how the ranks and censored individuals are determined; in words, when individuals for the first m - 1 failures

TABLE 2 Description of parameters used in the data module Parameter

Size

Description

FT

I×NS

FO

1 × NF

Y

NSxNF

RZ

NS x 1

RISK

NS x 1

TRK HIT

Scalar Scalar

PRISK

Jx 1

PTR TC

Scalar Scalar

PTC

Scalar

FT(j) > 0 indicates jth individual died at FT(j)th failure time, but FT(j) = 0 indicates the jth individual was censored. F O ( k ) denotes the label of the individual who died at kth failure time. Y ( j , k ) = 1 if the jth individual is at risk at the kth failure time, and 0 otherwise. R Z ( j ) denotes the relative risk exp(zj/3) for jth individual. RISK(j) = exp(zfl3) if jth individual is at risk. Otherwise, it is 0. TRK ~.NS1 exp (zj/3). H I T = r I x T R K , where r I is a uniform random number in (0,1). PRISK(i) = exp(zi/3) if the ith individual is at risk, and 0 otherwise, for i = 1. . . . . J. Sum of elements of PRISK. The number of individuals at risk.bHICScalarHIC = r e XTC, where r e is a uniform random number in (0,1). PTC = Y~{=~Y(i,k + 1). =

206

Assign Input parameters NS, NF, NV, BT, NR, SEED/, NC, Z.

4' K=0

K=K+I

1

( mo)

I

Compute RISK(j) = e x p ( Z i ~ ) for jth individual at risk and TRK = Z RISK(j)

4'

HIT = r, x TRK, r, being a uniform random number in (0,1)

4' Choose jo such that I j0 = Min{ j I S~= 1 exp(Zq/9 ) > HIT} FT (jo) = K FO (k) = j o Y(jo,~) = 0 for ~ such as K+I =10E -5 THEN DO; CORR=SOLVE(V,R)*U; B-B+CORR'; RCORR-CORR'*CORRI(B*B'); VAR-SOLVE(V,R); CNTNO=I; END; ELSE DO; B~REPEAT(999,I,NV]; VAR-REPEAT(999,NV,NV); CNTNO-O; RCORR=0; END; IF SQRT(RCORR)>0.01 THEN GO TO REPEAT; I* ELSE PRINT B; */ FINISH; /***** MODULE FOR THE ESTIMATION OF POWER *****/ START TEST; BA-REPEAT(0,NR,NV); TT-REPEAT(O,NR,NV); T0-REPEAT(0,NR,NV); VARMT-REPEAT(0,NR,NV); COUNT-REPEAT(I,NR,I); DO IT-I TO NR; /* PRINT IT; */ RUN DATA; RUN ESTIMATION; PWRP-REPEAT(O,NR,NV); BA(IIT,I)=B; COUNT(IITI)-CNTNO; IF COUNT(lIT1)=1 THEN DO JTI=I TO NV; SDJ=SQRT(VAR(IJTI,JTII)); TT(IIT,JTII)=(B(IJTII)-BT(IJTII))/SDJ; T0=(IIT,JTII)=B(IJTII)/SDJ; VARMT(IIT,JTII)=VAR(IJTI,JTII); END; ELSE DO JT2=I TO NV; TT(IIT,JT21)=0; T0(IIT,JT21)=999; VARMT(IIT,JT21)=0; END; END; DO IS=I TO N-R; DO JS=I TO NV; IF T0(IIS,JSI)999 ~ ABS(T0(IIS,JSI))>=I.645 THEN PWRP(IIS,JSI)=I; END; END; PWRT=PWRP ( I+, I) ; CNTT=COUNT(I+I); POWER=PWRT~CNTT; BAT-BA#COUNT; BMEAN-BAT(I+,I)ICNTT; TTMEAN-TTT(I+,I)/CNTT; TOMEAN=TOT(I+,I)/CNTT; VARMTT=VARMT ( I+, I ) ; VMEAN=VARMTT/CNTT; PRINT NS, NF, BT, PWRT, CNTT, POWER, BMEAN, VMEAN; FINISH; RUN TEST; QUIT;

212 [6] J.D. Kalbfleisch and R.L. Prentice, The Statistical Analysis of Failure Time Data (Wiley, New York, 1980). [7] W.J. Dixon, M.B. Brown, L. Engelman, M.A. Hill and R.I. Jennrich, BMDP Statistical Software Manual Vol. 2, (University of California Press, Berkeley, 1988). [8] SAS Manual, SAS/IML User's Guide, Version 5 Edition, (SAS Institute Inc., Cary, NC, 1985). [9] M.C. Bryson and M.E. Johnson, The Incidence of Monotone Likelihood in the Cox Model, Technometrics 23 (4) (1981) 381-383.

[10] W.M. Gregory, Adjusting Survival Curves for Imbalances in Prognostic Factors, Br. J. Cancer 58 (1988) 202-204. [11] J.M. Lachin, Statistical Properties of Randomization in Clinical Trials, Controlled Clin. Trials 9 (1988) 289-311. [12] K. Akazawa et al., Prediction of prognosis in advanced gastric carcinoma in Proc. Soc. Survival Time Stud. Cancer Patients (155W 0289-3312; 1988); in Japanese (English abstract).

GOFCOX: a computer program for the goodness-of-fit analysis of the Cox proportional hazards model.

On estimation of covariate-specific residual time quantiles under the proportional hazards model.

Three measures of explained variation for correlated survival data under the proportional hazards mixed-effects model.

Proportional exponentiated link transformed hazards (ELTH) models for discrete time survival data with application.

Continuous-Time Proportional Hazards Regression for Ecological Monitoring Data.

A marginal cure rate proportional hazards model for spatial survival data.

Regularization Paths for Cox's Proportional Hazards Model via Coordinate Descent.

A Bayesian proportional hazards model for general interval-censored data.

A statistical design for estimating functional survival.

A Proportional Hazards Regression Model for the Sub-distribution with Covariates Adjusted Censoring Weight for Competing Risks Data.

Survival analysis 1982-1991: the second decade of the proportional hazards regression model.

Tests for equivalence of two survival functions: Alternative to the tests under proportional hazards.

Goodness-of-fit test of the stratified mark-specific proportional hazards model with continuous mark.

Single-Arm Phase II Survival Trial Design Under the Proportional Hazards Model.

The consequences of proportional hazards based model selection.

Bayesian Variable Selection under the Proportional Hazards Mixed-effects Model.

Flexible covariate effects in the proportional hazards model.

Covariance adjustment of survival curves based on Cox's proportional hazards regression model.

Mediation Analysis with Survival Outcomes: Accelerated Failure Time vs. Proportional Hazards Models.

Use of the normal generated distribution for estimating population survival.

Nanos gigantium humeris insidentes: the awarded Cox proportional hazards model.

Testing goodness-of-fit for the proportional hazards model based on nested case-control data.

Predicting the Survival Time for Bladder Cancer Using an Additive Hazards Model in Microarray Data.

MODELLING COUNTY LEVEL BREAST CANCER SURVIVAL DATA USING A COVARIATE-ADJUSTED FRAILTY PROPORTIONAL HAZARDS MODEL.