Report LBL-36630 Rev.3
Presented at National Cancer Institute GIS Advisory Meeting
November 25, 1996
Deane W. Merrill and Steve Selvin
Lawrence Berkeley National Laboratory
University of California, Berkeley School of Public Health
dwmerrill@lbl.gov
http://merrill.wwh.net/pdocs/nci9611/nci9611.html
Objective: Small-area geographic analysis of health outcome data is becoming feasible and desirable due to the increasing availability of geographic information systems (GIS) on small computers. At the same time, surveillance data with subcounty geographic detail (e.g. SEER) are becoming available. Conventional mapping techniques are unsuitable, however, because disease rates cannot be reliably calculated in small-area studies or in sparsely populated areas, and aggregation to larger subareas causes geographic detail to be lost.
Method(s): The technique of Density Equalizing Map Projections (DEMP) avoids the calculation of rates and preserves the geographic detail of the original data. If one plots cases on a map where population density has been equalized, geographic variations of risk are readily observed, and can be evaluated statistically. A new, more efficient DEMP algorithm is available at LBNL, along with tract-level map files and population data from the 1970, 1980, and 1990 U.S. Censuses. Areas of high and low relative risk can be graphically portrayed in a contour map, permitting the analyst to develop hypotheses for further investigation.
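To illustrate why no rates are needed: on a perfectly density-equalized map, area is proportional to the population at risk, so under uniform risk the expected number of cases in any region depends only on that region's share of the map area. The following is a minimal sketch of that arithmetic. The function name and the example numbers are hypothetical and are not taken from the LBNL program.

```python
def expected_cases(total_cases, region_area, map_area):
    """Expected cases in a region of a density-equalized map under uniform risk.

    Assumes perfect equalization, i.e. map area is proportional to
    person-years at risk everywhere on the map.
    """
    if map_area <= 0 or not 0 <= region_area <= map_area:
        raise ValueError("region_area must lie between 0 and map_area")
    return total_cases * region_area / map_area


# Hypothetical example: a region covering 5 of 100 area units of the
# equalized map, with 400 cases observed in the whole study area.
print(expected_cases(400, region_area=5, map_area=100))
```

An observed count in the region can then be compared directly with this expected count, without computing a rate for any individual tract.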
Results: The DEMP technique was applied to 401 childhood cancer cases, observed between 1980 and 1988 in four California counties. The geographic variation of estimated risk was displayed in contour maps. The statistical significance of the observed patterns was assessed by similarly analyzing samples of artificial cases, generated under the null hypothesis of uniform risk. The results are compared with those from the original investigation (Refs. 1 and 2), which was prompted by a reported cluster in McFarland CA.
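The simulation-based significance assessment described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the abstract does not name the test statistic, so a tract-level Pearson chi-square is used here as a stand-in, and all function names and the example data are hypothetical. Artificial cases are allocated to tracts with probability proportional to person-years, which is what uniform risk implies.

```python
import random


def simulate_null_counts(person_years, n_cases, rng):
    """Allocate n_cases among tracts with probability proportional to person-years."""
    tracts = list(range(len(person_years)))
    counts = [0] * len(person_years)
    for t in rng.choices(tracts, weights=person_years, k=n_cases):
        counts[t] += 1
    return counts


def dispersion(counts, person_years):
    """Pearson chi-square of observed vs. expected tract counts (stand-in statistic).

    Every tract must have positive person-years.
    """
    n = sum(counts)
    total_py = sum(person_years)
    stat = 0.0
    for c, py in zip(counts, person_years):
        expected = n * py / total_py
        stat += (c - expected) ** 2 / expected
    return stat


def monte_carlo_p_value(observed_counts, person_years, n_sim=999, seed=0):
    """Fraction of null simulations at least as dispersed as the observed data."""
    rng = random.Random(seed)
    n_cases = sum(observed_counts)
    observed_stat = dispersion(observed_counts, person_years)
    exceed = 0
    for _ in range(n_sim):
        sim = simulate_null_counts(person_years, n_cases, rng)
        if dispersion(sim, person_years) >= observed_stat:
            exceed += 1
    # Count the observed data set as one of the samples.
    return (exceed + 1) / (n_sim + 1)


# Hypothetical four-tract example (person-years and case counts are made up).
py = [1000.0, 2500.0, 4000.0, 2500.0]
cases = [3, 9, 10, 8]
print(monte_carlo_p_value(cases, py))
```

Because the real and artificial cases are treated identically, any imperfection in the map or the statistic affects both in the same way, which is the rationale the abstract gives for comparing against simulated samples.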
Recent Results: Preliminary results were published in Ref. 3. Since that time, the analysis has been refined as follows:
Work in Progress: The work in progress will be completed and presented at the 1997 CDC/ATSDR Symposium on Statistical Methods, Atlanta GA, January 28-29, 1997.
Conclusions: Childhood cancer rates in the four-county area display measurable geographic variation that is portrayed in the contour maps. Some consistency is observed between independent subsamples in the five stratified analyses (not shown here). Overall, the geographic variability of rates is somewhat greater than expected from chance alone; however, no single region has rates sufficiently high or low to be identified as statistically significant.
The DEMP technique is a useful analysis and display tool, given the increasing power of small computers. Systematic hypothesis-free analysis of routinely collected surveillance data can be automated. However, preparation of the necessary map files and population data files is tedious. For the DEMP technique to become widely used, the program needs to be implemented as a public Web application, linked to comprehensive small-area population data and map files. On the Web, the analyst could then generate and download a density-equalized map, and use that map to analyze the unit record health outcome data on his or her own computer.
Figure Captions:
Figure 1. Map of the 259 census tracts used in the analysis, simplified for computational analysis. This map differs only slightly from the map in Ref. 3: a few 1980 Census tracts have been aggregated for comparability with 1990 Census tracts. The heavier lines are county boundaries.
Figure 2. Density equalized map, based on person-years in the entire study (3.3 Mpy for all races, 1980-88, ages 0-14, both sexes). The legend indicates the area that corresponds to 0.1 Mpy, anywhere on the map.
Figure 3. Same as Figure 2, for white non-Hispanics (1.6 Mpy).
Figure 4. Same as Figure 2, for Hispanics (1.3 Mpy).
Figure 5. Same as Figure 2, for nonwhite non-Hispanics (0.4 Mpy). The legend indicates the area that corresponds to 0.05 Mpy, anywhere on the map.
Figure 6. Scatter plot of y = adjusted area versus x = target area, for each of the 259 tracts in the density equalized map. The departure of points from the 45 degree line indicates the degree to which the map is imperfectly equalized. Perfect equalization could be achieved by including more points in the map and performing more iterations, with considerably more computing effort.
Figure 7. Locations on the original map of 8020 artificial cases, randomly generated under the assumption that risk is everywhere uniform.
Figure 8. Locations on the density equalized map of the 8020 artificial cases in Figure 7. The legend indicates the area within which 500 cases are expected, anywhere on the map. Slight non-uniformities are observed, which are the result of imperfect density equalization. The non-uniformities do not affect the validity of the analysis, because the distributions of the real cases are compared with distributions of artificial cases plotted on the same imperfect map.
Figure 9. (a) The upper left map is the distribution of the 401 real cases in the full data set, each plotted at a random location within its own census tract on the density equalized map. The legend indicates the area within which 20 cases are expected, anywhere on the map.
(b) The upper right map is the same as (a), with each case plotted at a different random location in its own tract.
(c) The lower left map is the distribution of 401 artificial cases, randomly distributed among tracts under the null hypothesis of uniform risk. As in (a), each case is plotted at a random location within its own tract.
(d) The lower right map is the same as (c), with each case plotted at a different random location in its own tract. For comparability with (a) and (b), the same distribution of cases among tracts is used in (c) and (d).
Any apparent clusters in (c) and (d) are, by design, not statistically significant. (c) and (d) are presented for comparison with (a) and (b), to illustrate the degree to which random data can appear to be non-random. (a) and (b) may be slightly less random than (c) and (d), but quantitative analysis is required to determine whether the effect is statistically significant.
Figure 10. Same as Figure 9, for white non-Hispanics. Separate plots (not shown) were produced for Hispanics and nonwhite non-Hispanics; for 1980-84 and 1985-88; for ages 0-4 and ages 5-14; for males and females; and for leukemia, brain cancer, and all other cancers.
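The area bookkeeping behind the map legends and the scatter plot of adjusted versus target area can be sketched as below. This is an illustration only. It assumes that "target area" means the area a tract would occupy on a perfectly equalized map (proportional to its person-years) and that "adjusted area" is the area the tract actually occupies after the projection. The function names are hypothetical.

```python
def target_areas(person_years, total_map_area):
    """Area each tract should occupy if the map were perfectly equalized."""
    total_py = sum(person_years)
    return [total_map_area * py / total_py for py in person_years]


def relative_errors(adjusted_areas, targets):
    """Signed relative departure of each tract from the 45-degree line."""
    return [(a - t) / t for a, t in zip(adjusted_areas, targets)]


def legend_area(legend_py, total_py, total_map_area):
    """Map area that corresponds to legend_py person-years."""
    return total_map_area * legend_py / total_py


# Legend for the all-races map: 0.1 Mpy out of 3.3 Mpy in total,
# expressed as a fraction of the whole map area.
print(round(legend_area(0.1, 3.3, 1.0), 4))  # 0.0303

# Hypothetical two-tract check of equalization quality.
targets = target_areas([1.0, 3.0], total_map_area=8.0)
print(relative_errors([2.2, 5.8], targets))
```

A relative error of zero for every tract corresponds to all points lying on the 45-degree line.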
References:
2. Reynolds P, Satariano E and Smith D. The Four County Study of Childhood Cancer Incidence, Interim Report II. Environmental Epidemiology and Toxicology Program, California Department of Health Services, October 1991.
3. Merrill DW, Selvin S, Close ER and Holmes HH, Statistics in Medicine, Vol. 15, 1837-1848 (1996).