Cases are generally distributed in regular patterns within a given polygon, e.g. in houses located along streets. After density equalization, all the polygons will have the correct areas relative to each other, but the cases in a given polygon will still be distributed non-randomly within that polygon. The clustering is real, but not necessarily indicative of increased risk. This bias is not an issue in analyzing data sets in which only the census tract or county of residence is known. But if you have exact case locations and your statistical analysis is a test for spatial randomness, you should do one of the following:
Frequently the numerator case data and denominator population data are obtained from different agencies, or have different definitions or different time coverage. These problems are not specific to the DEMP technique; they can bias any population-based analysis of the same data. However, because these biases are related to geography, they may be hidden in analyses that lack the geographic sensitivity of the DEMP technique. Some important examples are given below.
If a consistent bias is observed for all numerators, for example all types of cancer, there is a good chance that the denominator data are biased or mismatched to your case data.
If you cannot obtain better denominator data, you may have to discard Census population data entirely and density-equalize the map on (for example) total cancer cases. In such an analysis, clusters of cases on the density-equalized map would be interpreted as increased risk of one particular type of cancer, relative to all cancers. Some potential sources of bias will have been eliminated, but at the cost of statistical power and the ability to calculate rates.
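As a rough illustration of what "relative to all cancers" means numerically, the sketch below computes, for each area, the number of cases of one cancer type that would be expected if that type made up the same share of all cancers everywhere. The function name, area labels, and counts are hypothetical and chosen only for illustration; this is not part of the DEMP software.

```python
def expected_cases_from_all_cancers(specific_by_area, all_by_area):
    """Expected cases of one cancer type per area, assuming its share of
    all cancers is the same in every area (hypothetical helper)."""
    total_specific = sum(specific_by_area.values())
    total_all = sum(all_by_area.values())
    overall_share = total_specific / total_all
    return {area: all_by_area[area] * overall_share for area in all_by_area}


# Hypothetical counts, for illustration only.
all_cases = {"A": 200, "B": 100, "C": 100}   # all cancers (the denominator)
leukemia = {"A": 10, "B": 5, "C": 15}        # one cancer type (the numerator)

expected = expected_cases_from_all_cancers(leukemia, all_cases)

# Observed / expected ratio per area: a value above 1 means the type is
# over-represented relative to all cancers, not relative to population.
ratios = {area: leukemia[area] / expected[area] for area in all_cases}

for area in sorted(ratios):
    print(area, round(expected[area], 2), round(ratios[area], 2))
```

Note that the ratios describe proportions of cases, not rates: an area can show an excess simply because other cancers are under-reported there.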
Probably the most serious bias of all, and the most difficult to quantify, is "Texas sharpshooting," which applies to statistical analysis in general. The name derives from an apocryphal story about a Texas cowboy who demonstrated his sharpshooting ability by firing at paper targets that had been carefully prepared in advance with well-placed bullet holes.
The analyst must recognize that the significance of the final result is diminished if multiple tests were applied to the data, even if those intermediate tests were not used for the final analysis. This is because every analyst tends to keep only the most interesting results. Selecting for publication only the most significant findings constitutes an important source of bias. This is compounded by "publication bias," i.e. editors' reluctance to publish papers with boring negative results.
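The loss of significance from multiple testing can be made concrete under a simplifying assumption that is not stated in the text: that the tests are statistically independent. In that case the chance of at least one false positive among k tests at level alpha is 1 - (1 - alpha)^k. The function names below are illustrative.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one false positive among n_tests
    independent tests, each performed at significance level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests


def per_test_threshold(alpha, n_tests):
    """Per-test level that keeps the overall false-positive probability
    at alpha (the Sidak correction, valid for independent tests)."""
    return 1.0 - (1.0 - alpha) ** (1.0 / n_tests)


for k in (1, 5, 20):
    print(k, round(familywise_error_rate(0.05, k), 3),
          round(per_test_threshold(0.05, k), 5))
```

With 20 independent tests at the 0.05 level, the chance that at least one appears "significant" by chance alone is well over one half. Real analyses rarely involve fully independent tests, so this should be read as an order-of-magnitude guide rather than an exact correction.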
Also, because random fluctuations will always occur, choosing to study only those areas or diseases where elevated rates are already known to occur will cause the true significance of the results to be overestimated. To avoid such biases of interpretation, the conscientious scientist should systematically analyze comprehensive data sets with methods that are selected in advance, and should not modify the methods or limit the analysis on the basis of intermediate results.
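The effect of selecting areas after seeing the data can be shown with a small simulation. In the sketch below every area has exactly the same true risk, yet the area with the highest observed count shows, on average, a rate well above that risk. The number of areas, the population size, and the risk are arbitrary values assumed for illustration.

```python
import random


def mean_rate_of_highest_area(n_areas, population, true_risk,
                              n_trials, seed=0):
    """Average observed rate in the highest-count area when all areas
    share the same true risk (pure chance variation, no real cluster)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        # Simulate one binomial case count per area.
        counts = [
            sum(rng.random() < true_risk for _ in range(population))
            for _ in range(n_areas)
        ]
        # Keep only the area that looks most "elevated".
        total += max(counts) / population
    return total / n_trials


TRUE_RISK = 0.05
selected_rate = mean_rate_of_highest_area(
    n_areas=30, population=200, true_risk=TRUE_RISK, n_trials=200
)
print("true risk:", TRUE_RISK)
print("mean rate in the selected area:", round(selected_rate, 3))
```

A significance test applied only to the selected area, as if it had been chosen in advance, would treat this chance excess as evidence of elevated risk.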
We contend that systematic and comprehensive application of the DEMP methodology, without a priori consideration of risk hypotheses, constitutes a relatively unbiased method for detecting and evaluating spatial anomalies in routinely collected surveillance data.