American Journal of Epidemiology Copyright © 1992 by The Johns Hopkins University School of Hygiene and Public Health All rights reserved

Vol. 136, No. 6 Printed in U.S.A.

The Analysis of Regional Patterns in Health Data II. The Power to Detect Environmental Effects

S. D. Walter

Three measures of spatial clustering (Moran's I, Geary's c, and a rank adjacency statistic, D) were evaluated for their power to detect regional patterns in health data. The patterns represented various environmental effects: a latitude gradient; residence near a contaminated water supply; disease "hot spots"; relation to socioeconomic status and urbanization; and general spatial autocorrelation. While the methods had high power to detect certain patterns, they were also affected by factors such as the shape of the map, its regional structure, and the spatial distribution of explanatory variables. The power was sometimes low, even for strong geographic trends, particularly for D. Moran's I had the highest power most often. We conclude that use of these methods requires careful specification of the anticipated geographic pattern and awareness of idiosyncratic effects in the study of particular maps. Am J Epidemiol 1992;136:742-59.

environment; geography; vital statistics

The motivation for carrying out a spatial analysis of regional health data is often to generate or test hypotheses concerning disease etiology. Such analyses are often based on subjective assessments of disease maps, or on statistical methods that fail to take the spatial aspects of the data into account, thus introducing possible bias (1). It therefore seems desirable to use a more formal approach to the detection of spatial patterns. In earlier work (1), we investigated how three measures of spatial aggregation (Moran's I, Geary's c, and the rank adjacency statistic D; refs. 2-4) were affected by case sample sizes and by characteristics of the populations under study. Appropriate critical values for the indices were established, taking these regional factors into account. Although these indices have been used in previous analyses of spatial pattern (4-9), little is known about their power. Some simulation studies have found that I and c tend to have similar power levels, with I having the advantage (7). It also appears that nonparametric join count methods are generally less powerful, presumably because they lose information by reducing the data to a binary classification of regions as being at high or low risk. However, this work has mostly used simple data lattices; two exceptions are empirical studies of I with county data from Eire (7) and of D in certain British counties (10). No work has been done to assess the potential of these methods to detect the patterns in regional health data that might be expected under the influence of environmental effects.
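For readers who want the three indices in concrete form, the following is a minimal sketch rather than the computation used in this paper, which defers the definitions to refs. 2-4 and the companion paper (1). It assumes a symmetric 0/1 contiguity matrix, the standard textbook forms of Moran's I and Geary's c, and the commonly quoted form of the rank adjacency statistic D (mean absolute rank difference over adjacent pairs, with no tie handling). The function names and the five-region chain map are hypothetical.

```python
def chain_weights(n):
    """0/1 contiguity matrix for n regions arranged in a line (toy map)."""
    w = [[0] * n for _ in range(n)]
    for i in range(n - 1):
        w[i][i + 1] = w[i + 1][i] = 1
    return w


def moran_i(x, w):
    """Moran's I: large positive values suggest similar neighbouring rates."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    s0 = sum(map(sum, w))
    cross = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in dev)


def geary_c(x, w):
    """Geary's c: values below 1 suggest similar neighbouring rates."""
    n = len(x)
    mean = sum(x) / n
    s0 = sum(map(sum, w))
    sq = sum(w[i][j] * (x[i] - x[j]) ** 2 for i in range(n) for j in range(n))
    return ((n - 1) / (2 * s0)) * sq / sum((v - mean) ** 2 for v in x)


def rank_adjacency_d(x, w):
    """Assumed form of D: mean absolute rank difference between neighbours.
    Small values suggest clustering. Ties are not handled in this sketch."""
    n = len(x)
    ranks = [0] * n
    for r, i in enumerate(sorted(range(n), key=lambda k: x[k]), start=1):
        ranks[i] = r
    s0 = sum(map(sum, w))
    total = sum(w[i][j] * abs(ranks[i] - ranks[j])
                for i in range(n) for j in range(n))
    return total / s0


w = chain_weights(5)
rates = [1.0, 2.0, 3.0, 4.0, 5.0]   # a smooth gradient along the chain
i_val = moran_i(rates, w)
c_val = geary_c(rates, w)
d_val = rank_adjacency_d(rates, w)
```

On this smooth gradient I is positive and c is below 1, while D takes its smallest possible value of 1 because every pair of neighbours differs by exactly one rank.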

Received for publication June 13, 1991, and in final form, May 22, 1992. Abbreviations: CV, coefficient of variation; SD, standard deviation. From the Department of Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, Ontario L8N 3Z5, Canada. (Reprint requests to Dr. Walter at this address.) This project was supported in part by research grants from the Ontario Ministry of Health and from the Natural Sciences and Engineering Research Council of Canada. The author is the holder of a National Health Scientist award from the National Health Research Development Program of Canada. The research assistance of Susan Birnie and Ken Stam is gratefully acknowledged.


Power to Detect Environmental Health Effects

In this paper, we will investigate the power of I, c, and D to detect typical geographic patterns associated with various types of environmental effect. Each type of pattern was characterized by a parameter that was allowed to vary, representing different strengths of spatial aggregation. Once the power had been estimated for each parameter value, it was compared empirically with Ontario cancer incidence rates, to see if there might be adequate power to detect spatial aggregation in practice.

METHODS

Population under study

This work was based on cancer incidence in the 49 counties of the province of Ontario, Canada; an outline of the county boundaries is given in figure 1. The total population size in the 1981 census was approximately 8.6 million. The county-specific populations and age distributions were also derived from the census. A typical map showing the regional variation in cancer incidence is shown in figure 1 of the companion paper (1). The provincial incidence data were for the period 1975-1986, during which there were approximately 350,000 cases recorded in the Ontario Cancer Registry (11, 12).

Regional patterns of disease rates

Six types of spatial pattern were considered. The first three were created on a geographical basis; these were a gradient of cancer risk with latitude, a risk excess associated with residence near a potentially contaminated drinking water source, and a risk excess in one or more "hot spot" areas. There were also two patterns generated by supposing that cancer incidence was related to explanatory variables, namely, socioeconomic status or the level of urbanization. The final pattern used a general autocorrelation, which induced neighboring regions to have positively correlated rate values, but without imposing any particular geographic pattern a priori. More specifically, the patterns were as follows.


Latitude pattern. This pattern represented a general north-south trend in incidence, such as has been suggested for melanoma (13). To generate the pattern, the expected cancer incidence rate for each county was assumed to be linearly related through a simple regression equation to the latitude of its largest population center (see Appendix for details). A zero value of the regression coefficient, b, corresponds to the null hypothesis of no spatial pattern, while increasingly large values of b correspond to stronger latitude gradients in incidence.

Lakefront pattern. Contamination of drinking water, and its possible association with cancer, has been of concern for some time (14-17). In particular, toxic discharges into the lower Great Lakes could affect very large populations in Ontario and several northeastern US states. To investigate whether such an effect would be detectable in a spatial analysis, a pattern was generated whereby the 18 counties adjacent to Lakes Erie and Ontario were assumed to have a cancer incidence rate that exceeded the rate for the 31 other counties by a certain amount. Here, the difference between lakefront "exposed" and "unexposed" counties is the parameter b that represents the strength of the spatial effect: as b increases, the rate excess in lakefront counties becomes more pronounced and should be more easily detected. A related pattern was to define the 28 counties adjacent to any of the Great Lakes to be "exposed," relative to the 21 counties without a lake boundary.

Hot spot patterns. Many investigations of spatial pattern focus on apparent excesses of disease around particular entities such as nuclear power facilities (18, 19) or in a particular area. To reflect this possibility, simulations were carried out in which a hot spot focus of disease excess was assumed in several adjacent counties. Two typical hot spot simulations will be described here. First, the hot spot was defined through Hamilton and three neighboring counties that were subject to a disease excess, b. All other counties were assumed to have constant background risk. In the second simulation, Hamilton was again defined as the hot spot focus with excess risk b, but it was assumed that disease excesses declined geometrically as distance from Hamilton increased. Specifically, counties immediately adjacent to Hamilton were assumed to have an excess risk of ½b, counties two boundaries away from Hamilton to have an excess risk of ¼b, and counties more than two boundaries away to have the baseline risk.

Other hot spot patterns were considered, including those with more than one hot spot focus, different relations of the disease excess to the distance from the hot spot, and different county locations for the hot spots. However, the results on power were all qualitatively similar, so they will not be presented in detail.

FIGURE 1. Map of Ontario, Canada, showing the boundaries of its 49 counties. (The map carries a north arrow and a 100 km scale bar.)

Socioeconomic pattern. Socioeconomic status has been identified as a correlate of cancer risk (20, 21), so simulations were carried out with the Ontario data using the Canadian census definition of socioeconomic status. County-specific socioeconomic status values were used in a linear regression to relate them to the expected cancer incidence, in the same way as for the latitude model (see Appendix).

Urbanization pattern. Urbanization, or population density, has also been noted as being associated with cancer (22-25). We used the census definition of urbanization in a linear regression, as was done for the analyses of latitude and socioeconomic status.

General autocorrelation pattern. In the last set of analyses, a pattern was induced by assuming a general spatial autocorrelation to apply among the county incidence rates. In this model, the rates for adjacent pairs of counties have a certain positive correlation (p), whereas nonadjacent counties are not directly correlated. Of course, two counties separated by an intervening county will also tend to have a positive correlation, induced by the common intervening value. More distant county pairs will have progressively weaker correlations. The overall pattern is thus one of groups of several neighboring counties having elevated or reduced rates. The strength of the effect is determined by the magnitude of p. Technical details of this pattern are given in the Appendix.

Simulation method

Each of the patterns described above defines the expected values of the disease incidence rate in each county. The actual rates were simulated by Monte Carlo sampling for the numbers of cancer cases. The simulations took the size of the county populations and their age structures into account, as was described in the companion paper (1). Random variation in the number of cases was incorporated using normal distributions, as before. Each of the six types of spatial pattern was simulated with the parameter b varying from 0 to 1 in intervals of 0.1, representing a range of patterns from randomness to strong spatial clustering. Within each simulation, the number of cancer cases was generated for each 5-year age group in each of the 49 counties; expected frequencies were based on the provincial figures for all cancers combined, for males and females separately. Results were very similar for both sexes, so only male results are reported here. There were 200 simulation samples for each value of b within each pattern. The power was estimated from the percentage of samples in which each index exceeded its 5 percent one-sided critical value, established earlier. With 200 samples, the standard error in each estimated power value is 3.5 percent if the actual power is about 50 percent (the worst case). Comparison of the relative performance of the three indices is enhanced by considering all the power estimates over the range of parameter settings, as the strength of the spatial pattern is modified.

RESULTS

Latitude pattern

Figure 2 shows the power curve for I, c, and D with the male data under the latitude gradient pattern. At b = 0, all three indices have 5 percent power, corresponding to type I errors when the null hypothesis of no spatial pattern is true. High power values are attained very rapidly, and for b > 0.2, the power is essentially 100 percent. There are no substantial differences in power between the three indices.

Lakefront pattern

Figure 3 shows corresponding results for the lakefront pattern. Here I and c both attain high power for b > 0.4, but I has superior power for lower values of b. Interestingly, the power of D does not converge to 100 percent as b increases. In fact, it appears to reach a limiting value of about 80 percent above b = 0.2. The inferior performance of D arises because it is based on the ranked data, rather than on the cancer incidence rates themselves. When the spatial pattern has attained a certain prominence with a suitably large value of b, no further information can be incorporated in the nonparametric calculation of D by increasing b. Hence, there is a threshold value of b (approximately 0.2 in figure 3) that is the point by which counties have already been clearly separated into the two groups of high and low risk. Simulations for b above the threshold differ only by random fluctuation of county ranks within the high- and low-risk groups, but not between them.

When the lakefront model was extended to allow for high risk in counties adjacent to any of the Great Lakes, all three indices showed a total lack of power (figure 4). In fact, the power curves for I and c drop below 5 percent when b is large. This anomalous finding is a result of the configuration of the Ontario counties and the Great Lakes. Inclusion of all Great Lakes counties as high risk creates a situation where an outer group of high-risk counties surrounds an inner group of low-risk areas (see figure 5). In fact, all low-risk counties have a boundary with at least one high-risk county. Because of this idiosyncratic pattern of contiguity, adjacent areas do not have positively correlated rates on average; hence, power to detect the pattern is absent. The tendency of power to drop below 5 percent for I and c may be because this arrangement of high- and low-risk counties actually induces an overall negative correlation between adjacent data values.

FIGURE 2. The power of three methods of spatial analysis (Moran's I, Geary's c, and the rank adjacency statistic D; refs. 2-4) to detect a latitude pattern. (Axes: power (%) against regression coefficient b.)

Hot spot patterns

Results for the first hot spot pattern are shown in figure 6. Again, I and c reach approximately 100 percent power for b > 0.2 (with an advantage for I at smaller values of b), while D has a maximum power of only about 20 percent. Again this reflects the limited information available from the ranked data, even for pronounced hot spots.

FIGURE 3. The power of three methods of spatial analysis (Moran's I, Geary's c, and the rank adjacency statistic D; refs. 2-4) to detect a lakefront pattern: expected excess of cancer cases associated with the lower Great Lakes. (Axes: power (%) against regression coefficient b.)

Figure 7 shows the results for the second hot spot model with a dispersed effect over counties up to two boundaries away from the focus. Here there are more counties involved in the hot spot, and more possible risk levels. The result is that D can achieve high power for sufficiently large b, and actually has a power advantage over c for the intermediate value b = 0.1.
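The power figures in these results come from repeated simulation, and the general procedure can be sketched in a few lines. The code below is only an illustration under simplifying assumptions: a 5 x 5 rook-contiguity lattice stands in for the Ontario county map, the hot spot is a block of four adjacent cells, rates are a baseline of 1 plus the excess b plus normal noise with an arbitrary standard deviation, and only Moran's I is examined. The paper's own simulations also allowed for county populations and age structure. All function names are hypothetical.

```python
import random


def lattice_weights(rows, cols):
    """Rook-contiguity 0/1 matrix for a rows x cols grid (a stand-in map)."""
    n = rows * cols
    w = [[0] * n for _ in range(n)]
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:
                w[i][i + 1] = w[i + 1][i] = 1
            if r + 1 < rows:
                w[i][i + cols] = w[i + cols][i] = 1
    return w


def moran_i(x, w):
    """Moran's I for values x and a symmetric 0/1 contiguity matrix w."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    s0 = sum(map(sum, w))
    cross = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in dev)


def simulate_rates(excess, b, sd, rng):
    """Baseline rate 1, plus b times the regional excess, plus normal noise."""
    return [1.0 + b * e + rng.gauss(0.0, sd) for e in excess]


def estimate_power(excess, b, w, sd=0.1, n_null=1000, n_sim=200, seed=1):
    """Share of simulated maps whose I exceeds the one-sided 5 percent
    critical value, with the binomial standard error of that share."""
    rng = random.Random(seed)
    flat = [0.0] * len(excess)
    null = sorted(moran_i(simulate_rates(flat, 0.0, sd, rng), w)
                  for _ in range(n_null))
    critical = null[int(0.95 * n_null)]
    hits = sum(moran_i(simulate_rates(excess, b, sd, rng), w) > critical
               for _ in range(n_sim))
    power = hits / n_sim
    return power, (power * (1.0 - power) / n_sim) ** 0.5


w = lattice_weights(5, 5)
hot = [1.0 if i in (6, 7, 11, 12) else 0.0 for i in range(25)]  # 2 x 2 block
power_null, se_null = estimate_power(hot, 0.0, w)
power_strong, se_strong = estimate_power(hot, 1.0, w)
worst_case_se = (0.5 * 0.5 / 200) ** 0.5   # about 0.035 with 200 samples
```

With b = 0 the estimated power should sit near the nominal 5 percent, and with a pronounced excess it should approach 100 percent on this toy lattice. The last line reproduces the worst-case standard error of about 3.5 percent quoted for 200 samples.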

Socioeconomic status pattern

We now turn to the first of the patterns modelled directly on an explanatory variable, that for socioeconomic status; the power results are shown in figure 8. The results indicate an advantage of I over c, with D having intermediate power. Examination of the pattern of socioeconomic status itself (see figure 9, male data) shows some spatial structure, with areas of relatively high socioeconomic status in the urban centers around Lake Ontario and in the northwest. The I and D statistics were both significant when applied to the county-specific socioeconomic status data. Here we see that the spatial structure in the explanatory variable (socioeconomic status) is sufficient to induce a spatial structure in the cancer incidence rates.

Urbanization pattern

Figure 10 shows the results when cancer incidence is assumed to be linearly related to urbanization. I approaches 100 percent power more slowly in this case, while c and D lack substantial power for any value of b. In fact, the latter indices show declining power with large values of b.

To explain the results for c and D, it is necessary to consider the spatial pattern of urbanization itself (figure 11). Visual inspection of the map reveals no clear-cut tendency for urban areas to aggregate. The spatial indices support this, with none being significantly different from the random spatial pattern when applied to the urbanization values. This surprising result may be because the census defines the urbanization level of a county as the percentage of its population residing in an area with a population concentration of 1,000 or more and a population density of at least 400/km². Thus, counties that are large in area and have low overall population densities may be defined as relatively urban if most of their inhabitants live in cities. Despite creating some anomalies, this type of definition is appropriate if one is concerned with influences on cancer arising from the local environment rather than from the environment in distant parts of the county of residence. However, a consequence is that the spatial organization of urban areas is lost at the county level of analysis: other levels of scale might reveal a different picture (26).

In view of the disappointing power values obtained with continuous representation, urbanization was reexamined with a binary (median split) definition of counties as urban or rural. The power was even lower than before, and I approached only 40 percent power for large b values. However, a subgroup analysis of the counties in southern Ontario showed a rapid rise to 100 percent power for all three indices. These divergent results indicate the potentially strong effects of the map shape and structure and of how the explanatory variable is represented. In particular, the relatively poor power for continuous urbanization in the whole province illustrates that an effect of this variable might not be easily identified through a spatial analysis. This is in some contrast to the earlier findings for socioeconomic status as the explanatory variable.

FIGURE 4. The power of three methods of spatial analysis (Moran's I, Geary's c, and the rank adjacency statistic D; refs. 2-4) to detect a lakefront pattern: expected excess of cancer cases associated with all Great Lakes. (Axes: power (%) against regression coefficient b.)

FIGURE 5. Map of Ontario, Canada, showing the pattern of high- and low-risk counties under the second lakefront model. (Legend: regions bordering Great Lakes; regions not bordering Great Lakes.)

General autocorrelation pattern

The final set of results for the general autocorrelation model (figure 12) shows that I has a small power advantage over c and D, which have similar power for all values of p. All three statistics achieve approximately 100 percent power for p > 0.7.

FIGURE 6. The power of three methods of spatial analysis (Moran's I, Geary's c, and the rank adjacency statistic D; refs. 2-4) to detect a hot spot pattern: expected excess of cancer cases associated with four contiguous counties. (Horizontal axis: regression coefficient b.)

Comparison of power results to empirical variation in cancer incidence

The results thus far allow immediate comparisons of the relative power of I, c, and D to respond to various spatial patterns. However, it is also instructive to relate them to the empirical variation in the data. Unless there exists a sufficient level of regional variation in the disease rate, the spatial analysis will have limited power. We will therefore develop an interpretation of the numerical values of b relative to the empirical variation in the data.

As shown in the Appendix, in most situations one may interpret b as twice the proportional change in incidence expected under the assumed spatial pattern in the highest risk area relative to the mean. To use this fact in relation to the empirical variation in the data, we may recall that for normally distributed variables, about 95 percent of the data will lie between limits two standard deviations (SD) above and below the mean. Hence, the maximum and mean risks are approximately related through

$$R_{\max} = \bar{R} + 2\,\mathrm{SD}_R.$$

To summarize the variation in the set of data values, the coefficient of variation (CV) is defined as $\mathrm{SD}_R/\bar{R}$. Combining these results, we have

$$b = 4\,\mathrm{CV}. \tag{1}$$

Equation 1 may be used to compare the coefficient of variation in the actual data with the power results shown earlier. For instance, suppose one wished to test the socioeconomic status hypothesis. Figure 8 shows that b must be larger than about 0.3 in order to have a power of at least 80 percent. When equation 1 is used, this corresponds to requiring that the coefficient of variation be 0.075 or greater: CV = 0.3/4 = 0.075.

A more accurate calculation is to use the expected value of the maximum of n variables with standard normal distributions, the extreme order statistic (27). For n = 49, as in Ontario, this value is 2.24. Thus, equation 1 should be replaced by b = 4.48 CV for greater accuracy.

FIGURE 7. The power of three methods of spatial analysis (Moran's I, Geary's c, and the rank adjacency statistic D; refs. 2-4) to detect a hot spot pattern: expected excess of cancer cases associated with one county, with diminishing excess in neighboring counties. (Axes: power (%) against regression coefficient b.)

Table 1 shows the distribution of values of the coefficient of variation for 39 sex-specific cancer sites in Ontario. Most values lie between 0.1 and 0.4, suggesting reasonable power against some of the simulated patterns. It should be noted, however, that the simulations were carried out using incidence rates for all cancers combined; the additional variability associated with smaller case frequencies for particular cancer sites may cause the power to be lower.

Also shown in table 1 are the distributions of I and I/Imax, where Imax is the maximum value of I implied by the contiguity matrix of the Ontario county map (7, 8). Almost all values of I greater than 0.1 were significant at the 5 percent level, whereas figure 12 suggests that 50 percent power is achieved only for b > 0.3. The fact that more cancer sites show significant spatial autocorrelation than was expected from the simulations indicates that the empirical data contain spatial structures that are richer than the general autocorrelation model. For instance, if cancer incidence is associated with environmental or other explanatory variables that are spatially organized (e.g., socioeconomic status), the regional rates may assume a pattern that deviates from the general autocorrelation model in a way that yields greater power.

FIGURE 8. The power of three methods of spatial analysis (Moran's I, Geary's c, and the rank adjacency statistic D; refs. 2-4) to detect an association of cancer incidence with socioeconomic status. (Axes: power (%) against regression coefficient b.)
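The conversion between b and the coefficient of variation used in this section can be checked numerically. The sketch below is an illustration, not the author's calculation: it evaluates the expected maximum of n independent standard normal variables by trapezoidal integration of the density of the maximum, and then applies the two multipliers given in the text (4, and 2 x 2.24 = 4.48). The function names, integration limits, and step count are assumptions chosen for the example.

```python
import math


def expected_max_std_normal(n, lo=-8.0, hi=8.0, steps=16000):
    """E[max of n independent N(0,1)] by the trapezoidal rule applied to
    x * n * phi(x) * Phi(x)**(n - 1), the density of the maximum times x."""
    def integrand(x):
        phi = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
        cdf = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
        return x * n * phi * cdf ** (n - 1)

    h = (hi - lo) / steps
    total = 0.5 * (integrand(lo) + integrand(hi))
    total += sum(integrand(lo + k * h) for k in range(1, steps))
    return total * h


def cv_required(b, multiplier):
    """Coefficient of variation implied by b when b = multiplier * CV."""
    return b / multiplier


e_max_49 = expected_max_std_normal(49)      # compare with 2.24 in the text
cv_simple = cv_required(0.3, 4.0)           # equation 1: 0.3 / 4 = 0.075
cv_refined = cv_required(0.3, 2.0 * e_max_49)
```

Under the refined multiplier, the coefficient of variation needed for b = 0.3 comes out slightly below the 0.075 obtained from equation 1, at roughly 0.067.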

DISCUSSION

Most of the results suggest that I has slightly higher power than c. This is consistent with earlier work with simple lattice format data (7). Given the conceptual similarity of I to the ordinary correlation coefficient, it seems to be the index of choice.

The power results also reveal a previously unrecognized characteristic of D, namely, that its power may be severely limited with certain types of spatial patterns. In particular, patterns involving only a few regions at high risk (e.g., the first hot spot model), or in which a few different risk levels are being compared (e.g., the lower Great Lakes analysis), may have low power for detection. The generally inferior performance of D reflects the limitations of nonparametric data; low power has also been found for join count methods, in which the data are reduced still further to a simple binary classification (7).

FIGURE 9. The distribution of socioeconomic status in Ontario, Canada, by county. (Legend: top, 2nd, 3rd, 4th, and bottom quintiles.)

All three statistics had low power for some patterns, for instance, in the "all Great Lakes" lakefront analysis. This illustrates the necessity to consider carefully the contiguity structure of the map and the appearance of a postulated spatial pattern in the map. Despite the obvious pattern implied by the hypothesis, in cases in which there is a binary high/low risk hypothesis, such as the Great Lakes analysis, it appears preferable to abandon the spatial autocorrelation approach in favor of simpler comparisons of high- and low-risk regions, but without direct regard to their spatial relation.
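The effect of contiguity structure described for the "all Great Lakes" analysis can be seen in miniature on a toy map. The sketch below is a hypothetical illustration and not the Ontario county map: one central low-risk region touches every region of a surrounding ring, and Moran's I is computed first with the whole ring at high risk and then with three adjacent ring regions at high risk. The rate values and function names are assumptions.

```python
def moran_i(x, w):
    """Moran's I for values x and a symmetric 0/1 contiguity matrix w."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    s0 = sum(map(sum, w))
    cross = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in dev)


def wheel_weights(rim):
    """Region 0 is central and touches every rim region; each rim region
    also touches its two neighbours on the rim."""
    n = rim + 1
    w = [[0] * n for _ in range(n)]
    for k in range(1, n):
        nxt = 1 + (k % rim)
        w[0][k] = w[k][0] = 1
        w[k][nxt] = w[nxt][k] = 1
    return w


w = wheel_weights(6)

# High-risk ring around a low-risk centre: the low-risk region borders
# only high-risk regions, as in the second lakefront model.
ring_rates = [1.0] + [1.5] * 6
i_ring = moran_i(ring_rates, w)

# Same map, but the excess is confined to three adjacent rim regions.
block_rates = [1.0, 1.5, 1.5, 1.5, 1.0, 1.0, 1.0]
i_block = moran_i(block_rates, w)
```

On this map the ring pattern gives a clearly negative I (-5/12) even though the pattern itself is strong, whereas the contiguous block gives a positive I, which is the kind of behaviour the one-sided tests for positive autocorrelation are designed to detect.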

FIGURE 10. The power of three methods of spatial analysis (Moran's I, Geary's c, and the rank adjacency statistic D; refs. 2-4) to detect an association of cancer incidence with the degree of urbanization. (Horizontal axis: regression coefficient b.)

Some of the other patterns involving explanatory variables were also lacking in power. For instance, c and D had low power to detect even a strong relation of cancer incidence to urbanization, because of the arrangement of the urban areas as defined by the census. This indicates that one cannot simply assume a spatial relation to apply; careful inspection of the pattern of the explanatory variable itself is required. When the urbanization analysis was restricted to only part of the province, very high power was easily obtained. The contrasting results according to whether all or part of the province is included in the analysis indicate that the shape of a map and the regional contiguity pattern within its parts can have a substantial impact on the analysis.
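One simple way to inspect the spatial pattern of an explanatory variable before relying on it is to test the variable itself for autocorrelation. The sketch below uses a random permutation test of Moran's I, which is a swapped-in technique for illustration; the paper's own critical values were derived in the companion paper (1). The 12-region chain map, the two covariates, and the function names are all hypothetical.

```python
import random


def chain_weights(n):
    """0/1 contiguity matrix for n regions arranged in a line (toy map)."""
    w = [[0] * n for _ in range(n)]
    for i in range(n - 1):
        w[i][i + 1] = w[i + 1][i] = 1
    return w


def moran_i(x, w):
    """Moran's I for values x and a symmetric 0/1 contiguity matrix w."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    s0 = sum(map(sum, w))
    cross = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in dev)


def permutation_p_value(x, w, n_perm=999, seed=0):
    """One-sided p value: share of random reallocations of the values to
    regions giving an I at least as large as the observed one."""
    rng = random.Random(seed)
    observed = moran_i(x, w)
    values = list(x)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(values)
        if moran_i(values, w) >= observed:
            count += 1
    return observed, (count + 1) / (n_perm + 1)


w = chain_weights(12)
trend = [float(i) for i in range(12)]   # covariate with a clear gradient
scattered = [0.0, 1.0] * 6              # high and low values interleaved
i_trend, p_trend = permutation_p_value(trend, w)
i_scattered, p_scattered = permutation_p_value(scattered, w)
```

A covariate like `trend` shows significant positive autocorrelation and could induce a detectable pattern in the rates, while one like `scattered` does not, however strongly it is related to incidence.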

Certain "shape" effects of this kind have been recognized before. The edge effect in particular is well known, whereby regions at the map perimeter influence the calculation of spatial indices differently from interior regions (28, 29). Another example is an empirical discussion of the effect of the county sizes and contiguity in North Carolina in an analysis of sudden infant death syndrome (30). Because most jurisdictions have irregular regions and contiguity structure, idiosyncratic effects may often be expected. As yet, there are no guidelines to indicate the types of pattern for which power might be most affected in a particular map.

The calculation of the spatial statistics themselves is relatively straightforward. However, if the indices do not show significant spatial structure in the data, one cannot be sure whether this is due to lack of power. Resolving the problem will require simulation, along the lines of this paper. The hypothesized spatial pattern must be specified and incorporated into the simulated sampling, with due allowance for regional population effects. If particular types of cancer or other disease are under study, the expected case frequencies should reflect this. Note that in the simulations performed in this paper, rates for all cancers combined were used: the power to detect patterns in particular cancers would clearly be lower. A thorough analysis will require a separate simulation for each disease.

FIGURE 11. The distribution of urbanization in Ontario, Canada, by county. (Legend: top, 2nd, 3rd, 4th, and bottom quintiles.)

FIGURE 12. The power of three methods of spatial analysis (Moran's I, Geary's c, and the rank adjacency statistic D; refs. 2-4) to detect a general spatial autocorrelation pattern. (Axes: power (%) against value of autocorrelation.)

TABLE 1. Distribution of values of the coefficient of variation (CV), Moran's statistic (I), and the ratio I/Imax for 39 sex-specific analyses of cancer sites in Ontario, Canada, 1975-1986

Value categories: labels not legible apart from 0.60
No. of sites with CV value: 0, 1, 11, 14, 8, 3, 1, 1; total 39
No. of sites with I value: 11, 12, 7, 3, 6, 0, 0, 0; total 39
No. of sites with I/Imax value: 11, 4, 7, 4, 3, 6, 3, 1; total 39

Even a finding of a significant spatial structure may not be simple to interpret. The statistical indices only indicate the overall strength of spatial autocorrelation, and do not immediately identify specific regions where the spatial clustering is most pronounced. This highlights the fact that the indices are based on an integrated analysis of all the data, and that they are not necessarily appropriate or optimal for the study of rate aberrations in particular locations. Indeed, a disease excess that is limited to only one region will typically have limited effect on the spatial analysis.

In keeping with the overall nature of the spatial indices, one should usually interpret significant results in the spatial analysis by examining the regional pattern as a whole. Usually one has the task of identifying effects that might explain the pattern in the entire set of regions. Explanatory variables may then be introduced into regression analyses, in order to reflect the spatial and nonspatial patterns in the data; spatial analysis of the regression residuals may also be considered, to identify further the effects that act on neighboring regions and that have not thus far been measured at the regional level.
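The suggestion of applying spatial analysis to regression residuals can be sketched as follows. This is a minimal illustration under stated assumptions, not a recommended analysis: a simple one-variable least squares fit, an eight-region chain as the map, and made-up rates consisting of a measured covariate effect plus an unmeasured effect shared by blocks of neighbouring regions. All names and numbers are hypothetical.

```python
def chain_weights(n):
    """0/1 contiguity matrix for n regions arranged in a line (toy map)."""
    w = [[0] * n for _ in range(n)]
    for i in range(n - 1):
        w[i][i + 1] = w[i + 1][i] = 1
    return w


def moran_i(x, w):
    """Moran's I for values x and a symmetric 0/1 contiguity matrix w."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    s0 = sum(map(sum, w))
    cross = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in dev)


def ols_residuals(y, x):
    """Residuals from a least squares fit of y on a single covariate x."""
    n = len(y)
    mx = sum(x) / n
    my = sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


w = chain_weights(8)
covariate = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]    # measured variable
unmeasured = [0.3, 0.3, 0.3, 0.3, -0.3, -0.3, -0.3, -0.3]  # regional effect
rates = [1.0 + 0.5 * c + u for c, u in zip(covariate, unmeasured)]

residuals = ols_residuals(rates, covariate)
i_residuals = moran_i(residuals, w)
```

Here the fitted regression accounts for the covariate, and the positive I of the residuals (5/7 on this toy map) points to a remaining effect acting on neighbouring regions.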


One's success in this endeavor will depend in part on the spatial structure of the explanatory variables themselves, as was seen in the contrasting power results from the simulations involving the lakefront pattern, socioeconomic status, and urbanization.

In some cases where the spatial analysis is significant, one may observe a "hot spot" group of regions at high risk, and these may then become the target of further investigation. Some examples of this type of response to mapped data have been published (31) including, for instance, the focused studies on exposure to asbestos and lung cancer in the shipbuilding industry, on the elevated risk for lung cancer among persons living near a Pennsylvania zinc smelter, and on the association of nasal adenocarcinoma with the furniture industry in North Carolina.

In summary, the methodological difficulties identified in this paper and its companion paper indicate that it is important to recognize the spatial structure in the data; failure to do so will generally lead to a biased or inefficient analysis (1). We also have experimental evidence that subjective visual impression of spatial patterns in maps may not be completely reliable (Walter SD, unpublished manuscript); observers may be influenced by extraneous factors, such as the type of shading used in the map, as much as by the data themselves. Spatial analysis potentially yields useful results not apparent in the other approaches, but statistical characterization of spatial clustering requires careful attention to the type of pattern being investigated, in order to assure adequate power. One should be very familiar with the map under study, and alert to the possibility of edge effects and idiosyncrasies in its contiguity structure. The spatial pattern of potential explanatory variables should also be considered.

Much of the methodology for spatial analysis has been developed only recently, and the number of its applications to environmental epidemiology has been limited. More experience is required with this approach, so that positive and negative results can be interpreted with greater certainty.


REFERENCES

1. Walter SD. The analysis of regional patterns in health data. I. Distributional considerations. Am J Epidemiol 1992;136:730-41.
2. Moran PAP. The interpretation of statistical maps. J R Stat Soc [B] 1948;10:243-51.
3. Geary R. The contiguity ratio and statistical mapping. The Incorporated Statistician 1954;5:115-45.
4. International Agency for Research on Cancer. Atlas of cancer in Scotland: 1975-1980. (IARC scientific publication no. 72). Lyon, France: International Agency for Research on Cancer, 1985.
5. Cislaghi C, DeCarli A, La Vecchia C, et al. Data, statistics and maps on cancer mortality, Italia, 1975/1977. (In English and Italian). Bologna, Italy: Pitagora Editrice, 1986.
6. Glick B. The spatial autocorrelation of cancer mortality. Soc Sci Med 1979;13D:123-30.
7. Cliff AD, Ord JK. Spatial processes. London, England: Pion Limited, 1981.
8. Odland J. Spatial autocorrelation. (Scientific geography series, Vol 9). Beverly Hills, CA: Sage Publications, 1988:41-3.
9. Griffith DA. Spatial autocorrelation: a primer. Washington, DC: Association of American Geographers, 1987.
10. Leukaemia Research Fund. Leukaemia and lymphoma. An atlas of distribution within areas of England and Wales 1984-1988. London, England: Leukaemia Research Fund, 1990.
11. Clarke EA, Marrett LD, Kreiger N. Twenty years of cancer incidence, 1964-83: The Ontario Cancer Registry. Toronto, Canada: Ontario Cancer Treatment and Research Foundation, 1987.
12. McLaughlin J, King W. Cancer incidence projections for Ontario to the year 2000. In: Cancer Incidence in Ontario 1989. Toronto, Canada: Ontario Cancer Treatment and Research Foundation, 1989.
13. Elwood JM, Lee JAH, Walter SD, et al. Relationship of melanoma and other skin cancer mortality to latitude and ultra-violet radiation in the United States and Canada. Int J Epidemiol 1974;3:325-32.
14. Crump KS, Guess HA. Drinking water and cancer: review of recent epidemiological findings and assessment of risks. Ann Rev Public Health 1982;3:339-57.
15. Clark RM, Goodrich JA. Drinking water and cancer mortality. Sci Total Environ 1986;53:153-72.
16. Wigle DT, Mao Y, Semenciw R, et al. Contaminants in drinking water and cancer risks in Canadian cities. Can J Public Health 1986;77:335-42.
17. Meigs JW, Walter SD, Heston JF. Asbestos cement pipe and cancer in Connecticut 1955-74. J Environ Health 1980;42:187-91.
18. Gardner MJ. Review of reported increase of childhood cancer rates in the vicinity of nuclear installations in the UK. J R Stat Soc [A] 1989;152:307-25.
19. Beral V. Childhood leukemia near nuclear plants in the United Kingdom: the evolution of a systematic approach to studying rare disease in small geographic areas. Am J Epidemiol 1990;132(suppl):S63-8.
20. Jenkins CD. Social environment and cancer mortality in men. N Engl J Med 1983;308:395-8.


21. Leon DA. Longitudinal study. Social distribution of cancer. A report on the relationship between sociodemographic factors and the incidence of cancer, based on data collected in the OPCS Longitudinal Study. London, England: Her Majesty's Stationery Office, 1988. (Office of Population Censuses and Surveys, Series LS, no. 3).
22. Nasca PC, Burnett WS, Greenwald P, et al. Population density as an indicator of urban-rural differences in cancer incidence, upstate New York, 1968-1972. Am J Epidemiol 1980;112:362-75.
23. Mahoney MC, LaBrie DS, Nasca PC, et al. Population density and cancer mortality differentials in New York State 1978-1982. Int J Epidemiol 1990;19:483-90.
24. Bako G, Dewar R, Hanson J, et al. Population density as an indicator of urban-rural differences in cancer incidence, Alberta, Canada 1969-1973. Can J Public Health 1984;75:152-6.
25. Haynes R. Cancer mortality and urbanisation in China. Int J Epidemiol 1986;15:268-71.

26. Cleek RK. Cancers and the environment: the effect of scale. Soc Sci Med 1979;13D:241-7.
27. Pearson ES, Hartley HO, eds. Biometrika tables for statisticians. Vol 2. Cambridge, England: University Press, 1976.
28. Griffith DA. The boundary value problem in spatial statistical analysis. Journal of Regional Science 1983;23:377-87.
29. Griffith DA, Amrhein CG. An evaluation of correction techniques for boundary effects in spatial statistical analysis: traditional methods. Geogr Analysis 1985;15:352-60.
30. Grimson RC, Wang KC, Johnson WC. Searching for hierarchical clusters of disease: spatial patterns of sudden infant death syndrome. Soc Sci Med 1981;15D:287-93.
31. Anderson L. Research contributions made possible by the NCI cancer atlases published in the 1970s. (Backgrounder). Bethesda, MD: Office of Cancer Communications, National Cancer Institute, July 1987.

APPENDIX Specification of Spatial Pattern Models

Each of the six spatial patterns required simulation sampling to generate the expected value of the rate $x_i$ for county $i$, with $i = 1, 2, \ldots, n$. For the latitude, socioeconomic status, and urbanization patterns, there was an explanatory variable $z$, which was assumed to be linearly related to $x$. For the latitude pattern, $z$ was defined as the latitude of the largest population center in each county. For the socioeconomic status and urbanization patterns, census definitions were adopted for $z$ at the county level. In order to have comparability of the results from different patterns, $z$ was transformed to a scaled variable $u$ defined as

$$u = (z - z_{\min})/(z_{\max} - z_{\min}),$$

where $z_{\min}$ and $z_{\max}$ are the minimum and maximum values of $z$. Thus, $u$ ranges from 0 to 1. Conceptually, $x_i$ was related to $u$ through the linear regression

$$E(x_i) = R(1 + b[u_i - \bar{u}]), \tag{A1}$$

where $R$ is the relevant provincial cancer rate, $\bar{u}$ is the mean value of $u$, and $b$ is the regression coefficient. The simulation sampling took regional population sizes and age distributions into account by using model A1 to generate expected numbers of cancer cases within each age group, by county. From model A1 we may see that the expected incidence rate at $\bar{u}$ is $R$, the provincial mean rate; in contrast, the maximum expected incidence $R_{\max}$ is $R[1 + b(1 - \bar{u})]$. Thus, we have $b(1 - \bar{u}) = (R_{\max} - R)/R$. For many continuous $z$ variables, $\bar{u} = 0.5$; hence, $b = 2(R_{\max} - R)/R$, or twice the proportional change in incidence from the mean to the highest risk. For binary $z$ variables, $\bar{u}$ might deviate substantially from 0.5, for instance, if a small number of regions are defined as being at higher risk than the others, as in the first hot spot model. If there are $k$ high-risk and $n - k$ low-risk regions, $\bar{u} = k/n$, implying that $b = n(R_{\max} - R)/[R(n - k)]$.

For the general autocorrelation pattern, the following model was assumed for each observation $x_i$:

$$x_i = \rho \sum_j w_{ij} x_j + \varepsilon_i, \tag{A2}$$


where $\rho$ is the assumed value of the autocorrelation coefficient between $x_i$ and $x_j$, $w_{ij}$ is the weight associating county pair $(i, j)$, and $\varepsilon_i$ is the random error. Solving equation A2 for the set of $x_i$ values gives

$$\mathbf{x} = (I - \rho W)^{-1} \boldsymbol{\varepsilon}, \tag{A3}$$

where $I$ and $W$ are $n \times n$ matrices, and $\mathbf{x}$ and $\boldsymbol{\varepsilon}$ are $n \times 1$ vectors. The simulations under the general autocorrelation model were executed by generating appropriately distributed random errors $\boldsymbol{\varepsilon}$ and applying them in model A3.
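The two models in this appendix can be illustrated with a minimal Python sketch. It is not the code used for the published simulations. The function names, the five-county examples, the row-standardized chain of neighbors used for the weight matrix, and the choice of independent normal errors are all assumptions made for illustration; the appendix itself says only that the errors were "appropriately distributed". The sketch also omits the age-specific case counts described above and works with rates directly.

```python
import numpy as np


def scale_to_unit(z):
    """Rescale z to u = (z - z_min) / (z_max - z_min), so that 0 <= u <= 1."""
    z = np.asarray(z, dtype=float)
    return (z - z.min()) / (z.max() - z.min())


def slope_from_max_rate(u, R, R_max):
    """Solve R_max = R[1 + b(1 - u_bar)] for the regression coefficient b."""
    u = np.asarray(u, dtype=float)
    return (R_max - R) / (R * (1.0 - u.mean()))


def expected_rates(u, R, b):
    """Model A1: E(x_i) = R(1 + b[u_i - u_bar])."""
    u = np.asarray(u, dtype=float)
    return R * (1.0 + b * (u - u.mean()))


def simulate_autocorrelated(W, rho, rng, sigma=1.0):
    """Model A3: x = (I - rho W)^{-1} eps.

    Assumption: errors are independent N(0, sigma^2); the appendix does not
    state the error distribution.
    """
    n = W.shape[0]
    eps = rng.normal(0.0, sigma, size=n)
    # Solve (I - rho W) x = eps instead of forming the inverse explicitly.
    x = np.linalg.solve(np.eye(n) - rho * W, eps)
    return x, eps


# Hypothetical hot spot map: k = 1 high-risk county out of n = 5,
# provincial rate R = 100 and maximum expected rate R_max = 120.
u_hot = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
b_hot = slope_from_max_rate(u_hot, R=100.0, R_max=120.0)
rates_hot = expected_rates(u_hot, R=100.0, b=b_hot)

# Hypothetical map of 5 counties arranged in a chain, each adjacent to its
# immediate neighbors; weights are row-standardized (an assumption).
A = np.zeros((5, 5))
for i in range(4):
    A[i, i + 1] = A[i + 1, i] = 1.0
W = A / A.sum(axis=1, keepdims=True)
x, eps = simulate_autocorrelated(W, rho=0.5, rng=np.random.default_rng(0))
```

In the hot spot example, $\bar{u} = k/n = 0.2$, so $b = 5(120 - 100)/(100 \times 4) = 0.25$, giving an expected rate of 120 in the high-risk county and 95 in each of the others, which average to the provincial rate of 100. The simulated vector `x` satisfies equation A2 with the generated errors, which is a convenient check on any implementation of model A3.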
