Research Article Received 19 March 2013,

Accepted 26 November 2013

Published online 18 December 2013 in Wiley Online Library

(wileyonlinelibrary.com) DOI: 10.1002/sim.6075

A spatial scan statistic for survival data based on Weibull distribution

Vijaya Bhatt*† and Neeraj Tiwari

The spatial scan statistic has been developed as a geographical cluster detection analysis tool for different types of data sets such as Bernoulli, Poisson, ordinal, normal and exponential. We propose a scan statistic for survival data based on Weibull distribution. It may also be used for other survival distributions, such as exponential, gamma, and log normal. The proposed method is applied on the survival data of tuberculosis patients for the years 2004–2005 in Nainital district of Uttarakhand, India. Simulation studies reveal that the proposed method performs well for different survival distribution functions. Copyright © 2013 John Wiley & Sons, Ltd.

Keywords: cluster detection; geographical surveillance; spatial scan statistic; survival data; tuberculosis; Weibull model

1. Introduction

Spatial and space–time scan statistics are widely used in disease surveillance and in the detection of disease outbreaks. Early detection of outbreaks enables public health officials to implement disease control and preventive measures at the earliest possible time. In epidemiology, a large number of methods have been proposed for detecting disease clusters after adjusting for spatial variation in the density of the background population at risk. Among them, the spatial scan statistic proposed by Kulldorff and Nagarwalla [1] is frequently used for geographical disease cluster detection and for the evaluation of geographical disease cluster alarms. Besides epidemiology, cluster detection is commonly used in many other fields, including criminology, genetics, mining, and astronomy [2].

With the spatial scan statistic, a circular search window moves across the whole study region. For any given position of the circle centroid, the radius of the window varies continuously between zero and an upper limit. All the subregions whose centroids lie inside the circular search area make up a candidate cluster area Z, a subset of all the subregions in the study area. The spatial scan statistic partitions the study region into a potential cluster and the rest of the region, and conducts a likelihood ratio test. Different types of discrete or continuous data can be analyzed using spatial scan statistics for Bernoulli and Poisson [3], ordinal [4], exponential [5], normal [6], and multivariate [7] models. The Bernoulli and Poisson models are among the most popular models for discrete data in geographical disease surveillance, such as disease prevalence, incidence, or mortality. The ordinal model is used for categorical data. The exponential model has been developed for survival data (with or without censoring) and the normal model for continuous outcomes.

Cook et al. [8] developed a spatial scan statistic for censored outcome data using a score statistic from Cox's proportional hazards model [9] instead of a likelihood ratio test, which avoids specifying the baseline hazard function. However, it is computationally burdensome, requires the strong assumption that the Cox proportional hazards model is correctly specified so that exchangeability holds and the permutation test is valid, and is limited in its ability to identify more than one significant cluster. Huang et al. [5] proposed a spatial scan statistic based on the exponential model to analyze geographic variations in health events measured on a continuous scale, such as survival time and disease


Statist. Med. 2014, 33 1867–1876


Department of Statistics, Kumaun University, S.S.J. Campus, Almora, India *Correspondence to: Vijaya Bhatt, Department of Statistics, Kumaun University, S.S.J. Campus, Almora, India. † E-mail: [email protected]

V. BHATT AND N. TIWARI

progression. The exponential model is the simplest parametric model and assumes a constant risk over time, which reflects the 'lack of memory' property of the distribution. The model is very sensitive to even modest variation because it has only one adjustable parameter, the inverse of which is both the mean and the standard deviation. The Weibull model is an important generalization of the exponential model with two positive parameters. The second parameter gives the model great flexibility and allows different shapes of the hazard function. The convenience of the Weibull model for empirical work stems on the one hand from this flexibility and on the other from the simplicity of its hazard and survival functions. The primary advantage of Weibull analysis is its ability to provide reasonably accurate failure analysis and failure forecasts with extremely small samples. These advantages of the Weibull distribution over the exponential distribution inspired us to propose a spatial scan statistic based on the Weibull distribution.

In this paper, we propose a spatial scan statistic for survival data based on the Weibull model to determine whether there are geographical clusters of people with shorter than expected survival time, which may reflect inadequate treatment or health practices. Similarly, locations with longer than expected survival time may reveal good treatment and health practices in the observed cluster area. As with the spatial scan statistics for other distributions, the proposed test statistic is based on a likelihood ratio test and is evaluated using Monte Carlo hypothesis testing. A simulation study has been carried out to evaluate the performance of the proposed method, using survival data generated from the Weibull model together with other survival distributions such as gamma, log normal, and exponential. An application of the proposed method is demonstrated for tuberculosis (TB) patients in the years 2004–2005 in Nainital district of Uttarakhand, India. The Weibull model is used to determine areas of exceptionally high or low survival within the study area. For the detected clusters, power and sensitivity analyses were performed using a simulation study.

2. Spatial scan statistic for Weibull distribution

2.1. Test statistic

Let G be the whole study area, let Z be any sub area in G (inside the cluster), and let $Z^c$ be the sub area in G outside the cluster ($Z, Z^c \subset G$). Now, let the survival time for each individual inside zone Z be distributed according to the Weibull distribution with scale parameter $\theta_{in}$ and shape parameter $p_{in}$, while outside the zone (in $Z^c$) the survival times are distributed with scale parameter $\theta_{out}$ and shape parameter $p_{out}$. The null hypothesis $H_0: \theta_{in} = \theta_{out}$ for any Z is contrasted with the alternative $H_a: \theta_{in} > \theta_{out}$ for at least one zone when one wants to detect clusters with longer survival, with $H_a: \theta_{in} < \theta_{out}$ for at least one zone when one wants to detect clusters with shorter survival, and with $H_a: \theta_{in} \neq \theta_{out}$ for at least one zone when one wants to find clusters with either shorter or longer survival.

Let there be N individuals in the study area, the $i$th individual being associated with a lifetime $T_i$ and a fixed censoring time $L_i$. For the time being, the $T_i$ are assumed to be independently and identically distributed (i.i.d.) with Weibull probability density function

\[ f(T_i) = \frac{1}{\theta}\, p\, T_i^{p-1}\, e^{-T_i^{p}/\theta}. \]

The lifetime $T_i$ of an individual is observed only if $T_i \le L_i$; if $T_i > L_i$, the survival time is right censored. We define the observed time $t_i = \min(T_i, L_i)$. Let $\delta_i = 1$ if $T_i \le L_i$ and $\delta_i = 0$ if $T_i > L_i$, so that $\delta_i$ indicates whether the lifetime $T_i$ is observed or censored [8]. We define $r_{in} = \sum_{i \in z} \delta_i$ (the number of non-censored individuals inside zone Z) and $r_{out} = \sum_{i \notin z} \delta_i$ (the number of non-censored individuals outside zone Z). Let $n_{in}$ and $n_{out}$ be the total number of individuals inside and outside the zone Z, respectively. The total number of individuals in G is $N = n_{in} + n_{out}$, and the total number of non-censored individuals is $R = r_{in} + r_{out}$.

The circular spatial scan statistic is defined through a large number of overlapping circles [10]. For each circle z, a log likelihood ratio LLR(z) is calculated, and the test statistic is defined as the maximum LLR over all circles. We use the set of circular zones centered at the patient locations. The collection of scanning windows is a parameter space for the cluster, over which the likelihood ratio is maximized. The likelihood function [11] for a Weibull model can be written as

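\[
\begin{aligned}
L(Z, \theta_{in}, \theta_{out})
&= \prod_{i \in z} \left[ \frac{1}{\theta_{in}}\, p_{in}\, T_i^{(p_{in}-1)}\, e^{-T_i^{p_{in}}/\theta_{in}} \right]^{\delta_i} \left[ e^{-L_i^{p_{in}}/\theta_{in}} \right]^{1-\delta_i}
\times \prod_{i \notin z} \left[ \frac{1}{\theta_{out}}\, p_{out}\, T_i^{(p_{out}-1)}\, e^{-T_i^{p_{out}}/\theta_{out}} \right]^{\delta_i} \left[ e^{-L_i^{p_{out}}/\theta_{out}} \right]^{1-\delta_i} \\
&= \prod_{i \in z} \frac{1}{(\theta_{in})^{\delta_i}}\, p_{in}^{\delta_i}\, T_i^{\delta_i (p_{in}-1)}\, e^{-\{\delta_i T_i^{p_{in}} + (1-\delta_i) L_i^{p_{in}}\}/\theta_{in}}
\times \prod_{i \notin z} \frac{1}{(\theta_{out})^{\delta_i}}\, p_{out}^{\delta_i}\, T_i^{\delta_i (p_{out}-1)}\, e^{-\{\delta_i T_i^{p_{out}} + (1-\delta_i) L_i^{p_{out}}\}/\theta_{out}} \\
&= \frac{1}{(\theta_{in})^{r_{in}}}\, p_{in}^{r_{in}} \prod_{i \in z} t_i^{(p_{in}-1)}\, e^{-\sum_{i \in z} t_i^{p_{in}}/\theta_{in}}
\times \frac{1}{(\theta_{out})^{r_{out}}}\, p_{out}^{r_{out}} \prod_{i \notin z} t_i^{(p_{out}-1)}\, e^{-\sum_{i \notin z} t_i^{p_{out}}/\theta_{out}}.
\end{aligned}
\]

In the last line, the products of $t_i^{(p-1)}$ are understood to run over the non-censored individuals, as implied by the exponent $\delta_i$ in the preceding line, whereas the sums in the exponential terms run over all individuals.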

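Taking log on both sides,

\[
\ln L(z) = r_{in}\ln(p_{in}) + r_{out}\ln(p_{out}) + (p_{in}-1)\sum_{i \in z}\ln(t_i) + (p_{out}-1)\sum_{i \notin z}\ln(t_i) - r_{in}\ln(\theta_{in}) - r_{out}\ln(\theta_{out}) - \frac{\sum_{i \in z} t_i^{p_{in}}}{\theta_{in}} - \frac{\sum_{i \notin z} t_i^{p_{out}}}{\theta_{out}}.
\]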
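As an illustration only, the log likelihood of this section can be sketched in Python for right-censored data. The function and argument names (`weibull_loglik`, `zone_loglik`, `in_zone`) are hypothetical and do not come from the paper, whose calculations were done in R. The sketch also assumes that the sum of log times is taken over non-censored individuals only, which is what the exponent δ_i in the per-individual likelihood implies.

```python
import math

def weibull_loglik(times, events, theta, p):
    """Right-censored Weibull log likelihood for one group of individuals.

    times  : observed times t_i = min(T_i, L_i)
    events : censoring indicators delta_i (1 = death observed, 0 = censored)
    theta  : scale parameter; p : shape parameter
    """
    r = sum(events)  # number of non-censored individuals
    # Assumption: log times enter only for non-censored individuals.
    log_t_sum = sum(math.log(t) for t, d in zip(times, events) if d == 1)
    # Every individual, censored or not, contributes t_i^p / theta.
    power_sum = sum(t ** p for t in times)
    return (r * math.log(p) - r * math.log(theta)
            + (p - 1.0) * log_t_sum - power_sum / theta)

def zone_loglik(times, events, in_zone, theta_in, p_in, theta_out, p_out):
    """Log likelihood for a candidate zone: inside part plus outside part."""
    inside = [(t, d) for t, d, z in zip(times, events, in_zone) if z]
    outside = [(t, d) for t, d, z in zip(times, events, in_zone) if not z]
    ll_in = weibull_loglik([t for t, _ in inside], [d for _, d in inside],
                           theta_in, p_in)
    ll_out = weibull_loglik([t for t, _ in outside], [d for _, d in outside],
                            theta_out, p_out)
    return ll_in + ll_out
```

With p = 1 the expression reduces to the censored exponential log likelihood, which gives a quick sanity check of the sketch.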

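The related likelihood ratio test statistic for a test with the alternative $H_a: \theta_{in} \neq \theta_{out}$ for at least one zone Z is

\[
\lambda = \frac{\max_{z,\ \theta_{in} \neq \theta_{out}} L(Z, \theta_{in}, \theta_{out})}{\max_{z,\ \theta_{in} = \theta_{out}} L(Z, \theta_{in}, \theta_{out})} = \frac{L(\hat{Z})}{L_0},
\]

where $\hat{Z}$ is the zone that maximizes $L(Z, \theta_{in}, \theta_{out})$ under $H_a$ and $L_0$ is the maximum of $L(Z, \theta_{in}, \theta_{out})$ under $H_0$. Given an arbitrary zone Z, the maximum likelihood estimators (MLE's) of $\theta_{in}$, $\theta_{out}$, $p_{in}$, and $p_{out}$ can be obtained as the solutions of the following equations, respectively:

\[ \frac{\partial \ln L(z)}{\partial \theta_{in}} = -\frac{r_{in}}{\theta_{in}} + \frac{\sum_{i \in z} t_i^{p_{in}}}{\theta_{in}^2} = 0 \]

\[ \frac{\partial \ln L(z)}{\partial \theta_{out}} = -\frac{r_{out}}{\theta_{out}} + \frac{\sum_{i \notin z} t_i^{p_{out}}}{\theta_{out}^2} = 0 \]

\[ \frac{\partial \ln L(z)}{\partial p_{in}} = \frac{r_{in}}{p_{in}} + \sum_{i \in z} \ln(t_i) - \frac{1}{\theta_{in}} \sum_{i \in z} t_i^{p_{in}} \ln(t_i) = 0 \]

\[ \frac{\partial \ln L(z)}{\partial p_{out}} = \frac{r_{out}}{p_{out}} + \sum_{i \notin z} \ln(t_i) - \frac{1}{\theta_{out}} \sum_{i \notin z} t_i^{p_{out}} \ln(t_i) = 0 \]

Thus, the MLE's of the scale parameters $\theta_{in}$ and $\theta_{out}$ are

\[ \hat{\theta}_{in} = \frac{\sum_{i \in z} t_i^{\hat{p}_{in}}}{r_{in}} \quad \text{and} \quad \hat{\theta}_{out} = \frac{\sum_{i \notin z} t_i^{\hat{p}_{out}}}{r_{out}}, \]

respectively. The MLE's of the shape parameters $p_{in}$ and $p_{out}$ are obtained from the equations

\[ \frac{r_{in}}{\hat{p}_{in}} + \sum_{i \in z} \ln(t_i) - \frac{1}{\hat{\theta}_{in}} \sum_{i \in z} t_i^{\hat{p}_{in}} \ln(t_i) = 0 \]

and

\[ \frac{r_{out}}{\hat{p}_{out}} + \sum_{i \notin z} \ln(t_i) - \frac{1}{\hat{\theta}_{out}} \sum_{i \notin z} t_i^{\hat{p}_{out}} \ln(t_i) = 0, \]

respectively. However, we have used R software for finding the MLE's of both shape and scale parameters. Now we have

\[
\ln L(\hat{Z}) = \max_z \left\{ r_{in}\ln(\hat{p}_{in}) + r_{out}\ln(\hat{p}_{out}) + (\hat{p}_{in}-1)\sum_{i \in z}\ln(t_i) + (\hat{p}_{out}-1)\sum_{i \notin z}\ln(t_i) - r_{in}\ln(\hat{\theta}_{in}) - r_{out}\ln(\hat{\theta}_{out}) - \frac{\sum_{i \in z} t_i^{\hat{p}_{in}}}{\hat{\theta}_{in}} - \frac{\sum_{i \notin z} t_i^{\hat{p}_{out}}}{\hat{\theta}_{out}} \right\}.
\]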
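The paper reports only that R was used to find the MLE's. As a hedged sketch of one way the score equations of this section could be solved, the scale estimate can be substituted into the shape equation, leaving a single equation in the shape parameter that is solved here by bisection. The names `shape_score` and `weibull_mle` are hypothetical, the bisection bracket is an assumption, and the log-time sum is restricted to non-censored individuals (also an assumption, consistent with the exponent δ_i in the likelihood).

```python
import math

def shape_score(p, times, events):
    """Shape score equation with theta replaced by sum(t_i^p) / r."""
    r = sum(events)
    logs = [math.log(t) for t in times]
    m = max(logs)
    # Weights proportional to t_i^p, rescaled by the largest time to avoid overflow.
    w = [math.exp(p * (lt - m)) for lt in logs]
    weighted_mean_log = sum(wi * lt for wi, lt in zip(w, logs)) / sum(w)
    # Assumption: only non-censored individuals contribute to the log-time sum.
    uncensored_log_sum = sum(lt for lt, d in zip(logs, events) if d == 1)
    return r / p + uncensored_log_sum - r * weighted_mean_log

def weibull_mle(times, events, n_iter=200):
    """Return (theta_hat, p_hat) for one group; needs at least one observed death."""
    if sum(events) == 0:
        raise ValueError("no non-censored individuals in this group")
    lo, hi = 1e-6, 1.0
    # The score is positive near zero and decreasing in p; expand hi until it turns negative.
    while shape_score(hi, times, events) > 0:
        hi *= 2.0
        if hi > 1e6:
            raise ValueError("could not bracket the shape estimate")
    for _ in range(n_iter):  # plain bisection on [lo, hi]
        mid = 0.5 * (lo + hi)
        if shape_score(mid, times, events) > 0:
            lo = mid
        else:
            hi = mid
    p_hat = 0.5 * (lo + hi)
    theta_hat = sum(t ** p_hat for t in times) / sum(events)
    return theta_hat, p_hat
```

Applying `weibull_mle` separately to the individuals inside and outside a zone would give the four estimates used in the log likelihood; a production implementation would more likely rely on an established survival-analysis routine.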

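Under $H_0: \theta_{in} = \theta_{out}$, we have

\[
\ln \hat{L}_0 = \max_z \left\{ r_{in}\ln(\hat{p}_{in}) + r_{out}\ln(\hat{p}_{out}) + (\hat{p}_{in}-1)\sum_{i \in z}\ln(t_i) + (\hat{p}_{out}-1)\sum_{i \notin z}\ln(t_i) - R\ln\hat{\theta}_G - R \right\}.
\]

As the test statistic, we use the maximum likelihood ratio $\max_z L(Z)/\hat{L}_0$ or, more conveniently, the maximum LLR $\max_z \ln\{L(Z)/\hat{L}_0\}$. The censoring information is incorporated into the likelihood functions $L$ and $L_0$ through $r_{in}$, $r_{out}$ and $R$, which are functions of the censoring indicator $\delta$. For the alternative $\theta_{in} \neq \theta_{out}$, we can now write the test statistic as

\[
\lambda = \max_z \frac{(\hat{\theta}_G)^{R}}{(\hat{\theta}_{in})^{r_{in}} (\hat{\theta}_{out})^{r_{out}}}
= \max_z \frac{\left( \dfrac{\sum_{i \in G} t_i^{p}}{R} \right)^{R}}{\left( \dfrac{\sum_{i \in z} t_i^{p_{in}}}{r_{in}} \right)^{r_{in}} \left( \dfrac{\sum_{i \notin z} t_i^{p_{out}}}{r_{out}} \right)^{r_{out}}}.
\]

For the alternative $\theta_{in} < \theta_{out}$, this function is multiplied by $I\left( \dfrac{r_{in}}{\sum_{i \in z} t_i^{p_{in}}} > \dfrac{r_{out}}{\sum_{i \notin z} t_i^{p_{out}}} \right)$, and for the alternative $\theta_{in} > \theta_{out}$, it is multiplied by $I\left( \dfrac{r_{in}}{\sum_{i \in z} t_i^{p_{in}}} < \dfrac{r_{out}}{\sum_{i \notin z} t_i^{p_{out}}} \right)$, where $I(\cdot)$ is the indicator function.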
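The likelihood ratio for a single candidate zone can be sketched as follows, on the log scale. This is a minimal illustration under stated assumptions, not the authors' R code: the function name `weibull_llr` and the `alternative` labels are hypothetical, the shape estimates are taken as given inputs, the pooled scale is computed from the two zone-specific sums, and zones excluded by the indicator function (or with no observed deaths on one side) are assigned an LLR of 0.

```python
import math

def weibull_llr(times, events, in_zone, p_in, p_out, alternative="two-sided"):
    """Log likelihood ratio for one zone, given shape estimates p_in and p_out.

    alternative: "two-sided", "longer" (theta_in > theta_out) or
                 "shorter" (theta_in < theta_out).
    """
    s_in = sum(t ** p_in for t, z in zip(times, in_zone) if z)
    s_out = sum(t ** p_out for t, z in zip(times, in_zone) if not z)
    r_in = sum(d for d, z in zip(events, in_zone) if z)
    r_out = sum(d for d, z in zip(events, in_zone) if not z)
    if r_in == 0 or r_out == 0:
        return 0.0  # assumption: zones without observed deaths on one side are skipped
    r_total = r_in + r_out
    theta_in = s_in / r_in
    theta_out = s_out / r_out
    theta_g = (s_in + s_out) / r_total  # assumed form of the pooled scale estimate
    # One-sided alternatives: the indicator function removes zones in the wrong direction.
    if alternative == "longer" and not theta_in > theta_out:
        return 0.0
    if alternative == "shorter" and not theta_in < theta_out:
        return 0.0
    return (r_total * math.log(theta_g)
            - r_in * math.log(theta_in)
            - r_out * math.log(theta_out))
```

The scan statistic would then be the maximum of `weibull_llr` over all circular zones, with the shape estimates recomputed for each zone.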


2.3. Power and precision of cluster detection


To test the performance of the proposed method, we performed a simple simulation study. We simulated survival data of 66 non-censored individuals. For the geographical locations, we used real data, consisting of the home residence locations of individuals treated for TB in Nainital district of Uttarakhand, India during the years 2004–2005. To test the robustness of the developed method, different data sets from exponential, gamma, log normal, and Weibull distributions were generated. The power of the test to detect a cluster depends on a number of factors, so data were generated under different scenarios using different values of the scale and shape parameters inside and outside the true cluster. We first generated 1000 random data sets under the null hypothesis to obtain the critical values at the significance level ($\alpha$) of 0.05 for each distribution. Then 1000 random data sets were generated under each alternative hypothesis to estimate power, sensitivity, and positive predictive value (PPV). Power was estimated as the proportion of the 1000 data sets for which the null hypothesis was rejected at $\alpha = 0.05$. To evaluate the precision of the detected cluster, we estimate sensitivity as the proportion of individuals in the true cluster that are correctly detected, and PPV as the proportion of individuals in the detected cluster that belong to the true cluster. We estimate sensitivity and PPV, respectively, by

\[ \frac{1}{M} \sum_{m=1}^{M} \frac{\text{number of individuals in both true and detected clusters in the } m\text{th simulation}}{\text{number of individuals in true cluster in the } m\text{th simulation}} \]

and

\[ \frac{1}{M} \sum_{m=1}^{M} \frac{\text{number of individuals in both true and detected clusters in the } m\text{th simulation}}{\text{number of individuals in detected cluster in the } m\text{th simulation}}. \]

The values of PPV and sensitivity lie between 0 and 1, and large values of both together mean high precision in detecting the right location of the cluster. However, a large PPV together with a small sensitivity does not mean good precision, and vice versa.
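The three performance measures defined above can be computed from simulation output as in the following sketch. The function names and the representation of clusters as collections of individual identifiers are assumptions made for illustration; a simulation in which no cluster is detected is given a PPV of 0 here, which is also an assumption.

```python
def estimate_power(llr_values, critical_value):
    """Proportion of simulated data sets whose maximum LLR exceeds the critical value."""
    rejected = sum(1 for value in llr_values if value > critical_value)
    return rejected / len(llr_values)

def sensitivity_and_ppv(true_cluster, detected_clusters):
    """Average sensitivity and PPV over M simulations.

    true_cluster      : identifiers of individuals in the true cluster
    detected_clusters : one collection of identifiers per simulation
    """
    true_set = set(true_cluster)
    sens_sum = 0.0
    ppv_sum = 0.0
    for detected in detected_clusters:
        detected_set = set(detected)
        overlap = len(true_set & detected_set)
        sens_sum += overlap / len(true_set)
        # Assumption: an empty detected cluster contributes a PPV of 0.
        ppv_sum += overlap / len(detected_set) if detected_set else 0.0
    m = len(detected_clusters)
    return sens_sum / m, ppv_sum / m
```

For instance, a detected cluster that covers half of the true cluster and nothing else has sensitivity 0.5 and PPV 1, which illustrates why the two measures have to be read together.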

3. Cluster detection analyses of tuberculosis data in Nainital district of Uttarakhand, India using Weibull model

Tuberculosis, a disease caused by the bacterium Mycobacterium tuberculosis, has affected mankind for over 5000 years, and the disease continues to be a major cause of morbidity and mortality. According to the WHO Report 2004 on Global TB Control, India accounts for 20% of the worldwide burden of TB patients and leads the 22 high-burden countries of the world. The estimated number of TB cases in India is 422.6 million (44% of the total population), with 1.8 million people developing TB every year and nearly 0.5 million dying annually due to TB. More than 1000 people a day, or one every minute, die of TB in India [13]. To contain this scourge, the National Tuberculosis Control Program (NTCP) was adopted in India in 1962. However, the desired results were not forthcoming. On the recommendation of an expert committee in 1992, a revised strategy known as the Revised NTCP (RNTCP) was adopted to control the spread of TB in India. The goal of RNTCP is to cure at least 85% of new sputum-positive patients after achieving the target of detecting 70% of the newly infected sputum-smear-positive cases.

3.1. Tuberculosis data of Nainital district of Uttarakhand


We have collected secondary data on tuberculosis patients from Nainital district of Uttarakhand, India. Nainital is one of the thirteen districts of the newly carved state of Uttarakhand, India. The district is located at 29.38° North and 79.45° East and encompasses a geographical area of 3968 km² from the low-lying foothill belt in the south to the lesser Himalayan ranges in the north. The district has a total population of 762,909 persons as per the Census of India, 2001, distributed over 1076 villages and as many as seven urban centers. Geographically, the district is divided into two zones, hills and bhabar. Administratively, the district is divided into eight development blocks. The location of the study area is shown in Figure 1. The secondary data were obtained from TB registers available at the offices of the District Tuberculosis Centers, Tuberculosis Units, and Microscopic Centers. They consist of two years of data (2004–2005) collected from the District Tuberculosis Center Haldwani, Tuberculosis Units (Hospital Ramnagar, Combined Hospital Padampuri and DTC Haldwani) and Microscopic Centres (S.S.J Base Hospital Haldwani, Additional PHC Lalkuan, Jeewandan Medical Hospital Motinagar, B.D Pandey Hospital Nainital, C.H.C. Kotabag, P.H.C. Bailparao, C.H.C. Betalghat, Combined Hospital Ramnagar, Bhawali Sanitorium (Bhawali), C.H.C. Bhimtal, Combined Hospital Padampuri and P.H.C Okhalkanda) working under RNTCP.


Figure 1. Study area, i.e., Nainital district of Uttarakhand, India.

3.2. Weibull scan statistic for tuberculosis data

In this section, we examine the patterns of survival times in Nainital district of Uttarakhand, India using the spatial scan statistic with the proposed Weibull model. TB registers were used to obtain data on the survival times of TB patients during the years 2004–2005. Survival times were georeferenced to the addresses of individuals recorded in the registers. We observed a total of 1109 patients, of whom 774 were retained and 335 were discarded because of incomplete information. Of the 774 retained cases, 66 have complete time-to-death follow-up, while 708 were right censored. We used the Weibull spatial scan statistic to detect both longer and shorter survival time clusters. The most likely cluster with significantly longer survival time was obtained at Motahaldu block of Nainital district, with an LLR of 19.5547 (p = 0.001); individuals living within this block have longer survival time, which means that deaths due to TB are lowest in this block. A statistically significant secondary cluster with longer survival time was obtained at Kotabag and Ramnagar blocks with an LLR of 15.5894, as presented in Table I. The most likely cluster for shorter survival time was obtained at Bhimtal and Ramgarh blocks of Nainital district with an LLR of 30.0374, and a statistically significant secondary cluster was obtained at Dhari and Okhalkanda blocks with a p-value of 0.001 and an LLR of 15.7622, as presented in Table I. The most likely clusters and secondary clusters for shorter and longer survival times are shown in Figure 2 and Figure 3, respectively.

3.3. Power, positive predictive value, and sensitivity analysis for the proposed model


Let Z denote the true cluster and $Z^c$ denote the remaining blocks of the Nainital district. For the true cluster at Motahaldu block, which has a lower observed number of cases than the rest of the district, we randomly simulate the survival times with scale parameter $\theta_z$ and shape parameter $p_z$ inside the cluster and with $\theta_{z^c}$ and $p_{z^c}$ for individuals outside the cluster. For all simulations, we set $\theta_{z^c} = 9$ and $p_{z^c} = 9$ outside the cluster, for different parameter values of $\theta_z$ and $p_z$ inside the cluster. We simulated 1000 random data sets for each cluster scenario, and power is estimated as the proportion of the 1000 random data sets for which the null hypothesis was rejected. Even when the null hypothesis is rejected, the detected cluster is usually not exactly the same as the true cluster. Thus the spatial accuracy of the detected cluster was assessed using sensitivity and PPV. For the random data set generated


Table I. Cluster detection analysis results for tuberculosis data in Nainital district of Uttarakhand, India, using Weibull model.

Cluster                                     Observed cases   Expected cases   Obs/Exp   LLR       p-value
(Longer survival)
Most likely cluster: Motahaldu              5                51.04            0.10      19.5547   0.001
Secondary clusters: Kotabag and Ramnagar    16               65.53            0.24      15.5894   0.001
(Shorter survival)
Most likely cluster: Bhimtal and Ramgarh    28               10.47            2.63      30.0374   0.001
Secondary clusters: Dhari and Okhalkanda    15               5.5622           2.69      15.7622   0.001

Clusters with longer and shorter survival times with observed and expected number of cases within the cluster. LLR is the log likelihood ratio.

Figure 2. Significant high rate zones for shorter survival time in Nainital, Uttarakhand, India.


Figure 3. Significant high rate zones for longer survival time in Nainital, Uttarakhand, India.


Table II. Estimated power, sensitivity, and positive predictive value when the Weibull scan statistic is used to detect different types of clusters.

In cluster                        Outside the cluster
Scale parameter  Shape parameter  Scale parameter  Shape parameter  Power  PPV   Sensitivity
3                1                3                9                0.99   1.00  1.00
3                3                3                9                0.74   0.97  0.96
3                5                3                9                0.32   0.88  0.56
5                1                5                9                0.99   0.93  0.83
5                3                5                9                0.98   0.92  0.77
5                5                5                9                0.72   0.75  0.62
7                1                7                9                0.99   0.99  0.99
7                3                7                9                0.99   0.97  0.89
7                5                7                9                0.77   0.79  0.71
3                5                9                5                1.00   1.00  1.00
5                5                9                5                1.00   1.00  1.00
7                5                9                5                0.67   0.97  0.90
3                3                9                3                1.00   1.00  1.00
5                3                9                3                0.95   0.95  0.93
7                3                9                3                0.74   0.74  0.63
3                2                9                2                0.96   0.99  0.99
5                2                9                2                0.43   0.72  0.58
7                2                9                2                0.21   0.66  0.47

Outside the cluster, the value of the shape and scale parameters is fixed at 9; the table gives the different variations in the shape and scale parameters inside the cluster. PPV, positive predictive value.

from the Weibull distribution, if the scale is kept constant inside and outside the cluster, then at each value of the scale parameter, the greater the difference between the shape parameters, the greater the power of the test. If the shape is kept constant inside and outside the cluster, then the power is high when the difference between the scale parameters is large. The results of the power, PPV, and sensitivity analysis are presented in Table II.

We have also obtained the power of the test for randomly simulated data sets from different distributions. The power is approximately the same (i.e., high) for data simulated from the Weibull and gamma distributions with different scale and shape parameters inside and outside the cluster. Sensitivity and PPV are also high, indicating good precision of the detected clusters. For data simulated from the exponential distribution with varying mean, power decreases as the difference between the inside and outside cluster means decreases, and PPV and sensitivity show the same effect. The same is the case with the log normal distribution. As presented in Table III, the Weibull test statistic has low power for the exponential distribution, but it works well for other distributions such as log normal and gamma.

A comparison was made between the Weibull and exponential based statistics, and the power analysis shows that the two models perform almost similarly for random data sets generated from the different distributions. However, for random data sets generated from the log normal distribution, the Weibull scan statistic has more power than the exponential scan statistic. The results of the power analysis with PPV and sensitivity measures are shown in Table III and Table IV for the Weibull and exponential models, respectively. It may be noted that the lower power for exponential survival times has more to do with the greater variability in survival times produced by that distribution and less to do with misspecification of the distribution.

4. Discussion


We have proposed a spatial scan statistic for survival time data based on the Weibull model to determine whether there are geographical clusters of people with shorter than expected survival time. The Weibull distribution is more general than the exponential distribution, which has so far been used when spatial scan statistics were applied to survival data. The spatial scan statistic partitions the study region into areas


Table III. Power, sensitivity, and positive predictive value analysis for Weibull statistic for different distributions.

              In cluster              Outside the cluster
Distribution  Scale  Shape  Mean      Scale  Shape  Mean      Power  PPV   Sensitivity
Weibull       3      5      2.76      9      9      8.52      1.00   1.00  1.00
Weibull       5      5      4.57      9      9      8.52      1.00   1.00  1.00
Weibull       7      5      6.45      9      9      8.52      0.97   0.97  0.96
Gamma         3      5      15.09     9      9      79.1      1.00   1.00  1.00
Gamma         5      5      25.07     9      9      79.1      1.00   1.00  1.00
Gamma         7      5      35.23     9      9      79.1      0.99   1.00  0.99
Log normal    3      0.5    1.01      9      9      99.23     0.99   0.97  0.95
Log normal    5      0.5    3.50      9      9      99.23     0.97   0.88  0.86
Log normal    7      0.5    6.47      9      9      99.23     0.94   0.85  0.82
Exponential   3      -      3.11      9      -      8.98      0.75   0.85  0.78
Exponential   5      -      4.89      9      -      8.98      0.49   0.74  0.60
Exponential   7      -      7.01      9      -      8.98      0.38   0.62  0.51

Scale and Shape denote the scale and shape parameters. PPV, positive predictive value.

Table IV. Power, sensitivity, and positive predictive value analysis for exponential model for different distributions.

              In cluster              Outside the cluster
Distribution  Scale  Shape  Mean      Scale  Shape  Mean      Power  PPV   Sensitivity
Weibull       3      5      2.76      9      9      8.52      1.00   1.00  1.00
Weibull       5      5      4.57      9      9      8.52      1.00   1.00  1.00
Weibull       7      5      6.45      9      9      8.52      0.97   0.99  0.98
Gamma         3      5      15.09     9      9      79.1      1.00   1.00  1.00
Gamma         5      5      25.07     9      9      79.1      1.00   1.00  1.00
Gamma         7      5      35.23     9      9      79.1      0.99   1.00  0.99
Log normal    3      0.5    1.01      9      9      99.23     0.70   0.79  0.49
Log normal    5      0.5    3.50      9      9      99.23     0.67   0.77  0.46
Log normal    7      0.5    6.47      9      9      99.23     0.64   0.54  0.43
Exponential   3      -      3.11      9      -      8.98      0.79   0.85  0.81
Exponential   5      -      4.89      9      -      8.98      0.51   0.69  0.61
Exponential   7      -      7.01      9      -      8.98      0.41   0.65  0.51

Scale and Shape denote the scale and shape parameters. PPV, positive predictive value.


of potential clusters versus the rest of the study region and conducts a likelihood ratio test. The proposed spatial scan statistic has been applied to the real data set of TB patients in Nainital district of Uttarakhand, India. The findings suggest that areas where the observed number of cases is less than expected have longer survival times, indicating good treatment and health facilities at Motahaldu, Ramnagar and Kotabag blocks of the district. Areas where the observed number of cases is more than expected have shorter survival times, indicating a lack of good treatment and health facilities. This is plausible, as the most likely cluster for shorter survival was obtained at Ramgarh and Bhimtal blocks and the secondary cluster at Okhalkanda and Dhari blocks, which are located in the interior of Nainital district, where medical facilities are not as good as in the urban areas, whose clusters have longer survival time than the rest of the blocks. Also, the people in interior


blocks of the district lack awareness about the symptoms, causes and treatment of TB, which results in a comparatively larger number of deaths than in the rest of the district and thus corresponds to the cluster with shorter survival time.

In the present study of survival times for TB patients, we collected the patient records from the TB registers, in which the information regarding treatment was specified. The survival time of a patient is taken to be non-censored if the patient died in the course of treatment, i.e., if we have complete time-to-death information from the starting date of treatment. In the cases where treatment was completed, the patient defaulted (left the treatment midway), was transferred out (shifted treatment place), or was cured, we do not have complete time-to-death information and the data are taken to be right censored. We have type I censored data, as we collected the survival times for the duration of the DOTS program and recorded whether the patient recovered or died in the course of treatment after TB was diagnosed.

The simulation study suggests that the Weibull based spatial scan statistic has good power for different types of divergence from the null hypothesis and detects clusters fairly precisely in most of the alternative cases. The Weibull model does not lead to biased p-values for the most likely clusters even when the true survival times do not follow a Weibull distribution. This is because the permutation of the observed coordinates and survival times ensures that the correct $\alpha$ level is preserved, no matter how the survival times are distributed.

The proposed method has been presented for purely spatial analysis in this paper. Scan statistics have also been developed for the space–time setting to detect clusters that exist in both space and time, in either a retrospective [14] or prospective [15] situation. The proposed method can be extended to such settings. The values of power, sensitivity, and PPV for different cluster scenarios with different scale and shape parameters suggest that the method has good power for almost all the cluster scenarios. For survival data simulated under different distributions, the proposed method also has good power. The calculations for the scan statistic were performed using the R software [16]. The computing time for the TB analysis using the proposed method is approximately 20–25 min with R software.
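The permutation argument made in this discussion can be illustrated with a generic Monte Carlo test. The sketch below is an assumption-laden illustration rather than the authors' procedure: `statistic` stands for any hypothetical function that returns the maximum LLR over all zones for survival times and censoring indicators listed in the fixed order of the locations, and the number of replications and the seed are arbitrary choices.

```python
import random

def monte_carlo_p_value(times, events, statistic, n_sim=999, seed=0):
    """Permutation-based Monte Carlo p-value for a scan statistic.

    times, events : survival times and censoring indicators, in location order
    statistic     : function (times, events) -> test statistic (hypothetical)
    """
    rng = random.Random(seed)
    observed = statistic(times, events)
    # Keep each time with its censoring indicator; only the locations change.
    pairs = list(zip(times, events))
    n_extreme = 0
    for _ in range(n_sim):
        rng.shuffle(pairs)
        sim_times = [t for t, d in pairs]
        sim_events = [d for t, d in pairs]
        if statistic(sim_times, sim_events) >= observed:
            n_extreme += 1
    # Rank of the observed statistic among the observed and simulated values.
    return (n_extreme + 1) / (n_sim + 1)
```

Because the observed data set is counted as one of the replicates, the smallest p-value attainable with `n_sim` replications is 1 / (`n_sim` + 1).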

Acknowledgements The authors are grateful to the two referees for their constructive comments, which led to considerable improvement in presentation of this work. Moreover, the TB data of the Nainital district of Uttarakhand, India used for this study was collected under Indian Council of Medical Research (ICMR) Project File No. : 5/8/5/6/2006-ECD-I. The authors are thankful to ICMR, New Delhi, India for supporting this project and allowing us to use the TB data for this work.

References


1. Kulldorff M, Nagarwalla N. Spatial disease clusters: detection and inference. Statistics in Medicine 1995; 14:799–810.
2. Glaz J, Pozdnyakov V, Wallenstein S. Scan Statistics: Methods and Application. Springer: Birkhauser, Boston, 2009. 129–142.
3. Kulldorff M. A spatial scan statistic. Communication in Statistics: Theory and Methods 1997; 26:1481–1496.
4. Jung I, Kulldorff M, Klassen A. A spatial scan statistic for ordinal data. Statistics in Medicine 2007; 26(7):1594–1607.
5. Huang L, Kulldorff M, Gregorio D. A spatial scan statistic for survival data. Biometrics 2007; 63:109–118.
6. Kulldorff M, Huang L, Konty K. A scan statistic for continuous data based on normal probability model. International Journal of Health Geographics 2009; 8:58.
7. Kulldorff M, Mostashari F, Duczmal L, Yih WK, Kleinman K, Platt R. Multivariate scan statistics for disease surveillance. Statistics in Medicine 2007; 26(8):1824–1833.
8. Cook AJ, Gold DR, Li Y. Spatial cluster detection for censored outcome data. Biometrics 2007; 63:540–549.
9. Cox DR. Regression models and life-tables. Journal of the Royal Statistical Society B 1972; 34(2):187–220.
10. Kulldorff M, Huang L, Pickle L, Duczmal L. An elliptic spatial scan statistic. Statistics in Medicine 2006; 25:3929–3943.
11. Cohen AC. Maximum likelihood estimation in Weibull distribution based on complete and on censored samples. Technometrics 1965; 7:579–588.
12. Dwass M. Modified randomization test for non parametric hypothesis. Annals of Mathematical Statistics 1957; 28:181–187.
13. Directorate General of Health Services: TB India, Central TB Division, New Delhi, 2005. RNTC Status Report. http://www.tbcindia.org
14. Kulldorff M. Prospective time-periodic geographical disease surveillance using a scan statistic. Journal of the Royal Statistical Society A 2001; 164:61–72.
15. Kulldorff M, Athas W, Feuer E, Miller B, Key C. Evaluating cluster alarms: a space time scan statistics and brain cancer in Los Alamos. American Journal of Public Health 1998; 88:1377–1380.
16. R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria, 2012. ISBN 3-900051-07-0, http://www.R-project.org/.
