ANALYZING GEOGRAPHIC CLUSTERED RESPONSE

Deane W. Merrill, Steve Selvin and Michael S. Mohr, Lawrence Berkeley Laboratory
Deane W. Merrill, Building 50B, Room 3238, Lawrence Berkeley Laboratory, Berkeley CA 94720

KEY WORDS: cartogram, density equalizing map
projection, disease clusters

ABSTRACT

   In the study of geographic disease clusters, an
alternative to traditional methods based on rates is to
analyze case locations on a transformed map in which
population density is everywhere equal.  Although
the analyst's task is thereby simplified, the specifica-
tion of the density equalizing map projection (DEMP)
itself is not simple and continues to be the subject of
considerable research.  Here a new DEMP algorithm
is described, which avoids some of the difficulties of
earlier approaches.  The new algorithm (a) avoids
illegal overlapping of transformed polygons; (b) finds
the unique solution that minimizes map distortion; (c)
provides constant magnification over each map
polygon; (d) defines a continuous transformation over
the entire map domain; (e) defines an inverse trans-
formation; (f) can accept optional constraints such as
fixed boundaries; and (g) can use commercially
supported minimization software.  Work is continuing
to improve computing efficiency and improve the
algorithm.

INTRODUCTION

   In analyzing the geographic distribution of disease
one needs to adjust for variations in population
density.  Calculation of rates is unsuitable if the
number of cases in individual geographic subareas is
small, and aggregation of subareas leads to loss of
geographic detail.  An alternate approach is to adjust
geographic boundaries so that population density is
everywhere equal; then on the density-equalized map
projection (DEMP), cases will be randomly distribut-
ed if disease risk is constant.

   On a DEMP, the statistical analysis of disease
cases is greatly simplified and has been discussed
elsewhere.1-4  The purpose of this paper is to describe
a new algorithm for producing the density-equalized
map itself.  Methods for producing such maps,
sometimes known as cartograms, have been described
by various authors.1,5-8  Relative to earlier approaches,
the new algorithm has significant advantages, which
are enumerated in the abstract of this paper.
PRE-DEMP MAP PROCESSING

Required inputs

   Required inputs for the DEMP transformation
include a map file with geographic regions described
as polygons, whose vertices are specified by x and y
coordinates.  Two formats are common: (a) a list of
polygons, each of which is represented by a list of
points; or (b) a list of line segments, each of which
is identified by its starting point, its end point, the
polygon lying to its left, and the polygon lying to its
right.  The latter format is known as DIME (Dual
Independent Map Encoding) and is used, for exam-
ple, by the U.S. Census Bureau in its 1990 Census
TIGER map files.  Also required for the DEMP is a
specification of the variable whose density is to be
equalized, for example the population of each geo-
graphic region.

Transformation of map coordinates

   Customarily map coordinates are given in latitude
and longitude.  Prior to the DEMP transformation,
these must be transformed to Cartesian x and y
coordinates by an appropriate projection.  For small
regions far from the poles (noting that one degree of
latitude is approximately 111.195 km) we use the
equirectangular projection

                                 x = 111.195 c0 (long - a0) cos ( b0  / 180 )

                                           y = 111.195 c0 (lat - b0)

where (a0,b0) are the (long,lat) coordinates of the
approximate center of the map, and c0 is a scale
factor approximately equal to the number of map
units in one kilometer.

Point removal

   Next, in order to shorten computation time, unnec-
essary geographic detail is removed from the map by
a point removal algorithm.  The algorithm preserves
nodes, i.e. points that cannot be removed without
altering the map topology.  For every point in a
chain connecting two nodes, we calculate the area of
the triangle formed by that point and its two neigh-
bors in the chain.  The point corresponding to
smallest-area triangle in the map is removed, and its
two neighbor points connected by a straight line. 
Triangle areas are recalculated as necessary and the
process repeated for the next smaller triangle in the
map.  The process is repeated until no triangles
remain having area smaller than a specified value.

Triangulation of polygons

   Next, each polygon in the map is subdivided into
triangles.  The Delaunay triangulation is used because
it is unique for a given polygon, and produces trian-
gles that are as equirectangular as possible.9  To
triangulate a polygon, we arbitrarily choose three
adjacent points on the polygon boundary and draw
the circle which passes through all three points.  This
process continues until we find a circle which con-
tains no other points of the polygon boundary.  In
that case the three points form a Delaunay triangle; it
is removed from the polygon, leaving a polygon with
one less vertex than before.  Triangle removal contin-
ues until the original polygon is completely subdivid-
ed into triangles; in turn, each polygon in the map is
similarly subdivided.

DEMP ALGORITHM

Piecewise linear transformations

   Next the density equalizing map projection is per-
formed.  For simplicity, we require that the transfor-
mation within each Delaunay triangle be constant and
linear; under such a transformation, each triangle
maps into another triangle.  All points (x,y) within a
given triangle i are mapped into points (u,v) by the
linear transformation
   The DEMP is defined by specifying the constants
ai,bi,...fi for every triangle in the map.  More eco-
nomically, one can specify the original (x,y) and
transformed (u,v) coordinates of every triangle vertex
in the map, thus automatically guaranteeing that adja-
cency of triangles is preserved in the DEMP.  


   If the vertices of triangle i are (xj,yj), (xk,yk), (xl,yl)
and (uj,vj), (uk,vk), (ul,vl) before and after the DEMP
respectively, then ai,bi,...fi are equal to
   As a consequence of the linearity of the DEMP
transformation, area magnification within any triangle
is constant and equal to the Jacobian
   Constant area magnification avoids spurious varia-
tion of population density in the transformed map;
such variation, if present, could be erroneously
interpreted as evidence for disease clustering.

   The assumed transformation is continuous over the
domain of the original map, although its derivatives
are discontinuous at triangle boundaries.  Except for
regions with zero population, the inverse transforma-
tion is defined and is also continuous.

Area constraints

   For population density to be equalized over the
entire transformed map, we require that (a) trans-
formed polygon areas be proportional to population;
and (b) areas of triangles within each polygon be in
the same proportion before and after the DEMP. 
Furthermore, we specify that (c) total map area be
unchanged by the DEMP.  Conditions (a), (b) and (c)
together determine the correct area Bi0 of every trian-
gle i in the transformed map.  For a given DEMP the
actual area Bi(u,v) of triangle i is a quadratic function
of the transformed coordinates (u,v):

Bi(u,v)   (ujvk - ukvj + ukvl - ulvk + ulvj - ujvl)

where j,k,l (in counterclockwise order) are the
vertices of triangle i.  For population density to be
equalized, the DEMP must satisfy a quadratic relation

Hi(u,v)  Bi(u,v) - Bi0 = 0

for every triangle i in the map.

   The imposition of separate area constraints on each
triangle i and not only on the polygons is a critical
feature of the DEMP algorithm.  This feature pre-
vents the occurrence of negative-area triangles, which
could result in self-intersecting polygon boundaries.

Map distortion

   The area constraint functions Hi(u,v)=0 are neces-
sary but not sufficient to uniquely define the DEMP. 
Tobler recommends that a unique optimum solution
be defined by making the DEMP as nearly conformal
as possible.5  Conformal transformations are those
that locally preserve the shape (but not necessarily the
size, location, or orientation) of each infinitesimal
portion of the map.  Conformal linear transforma-
tions include all combinations of translations, rota-
tions and magnifications, corresponding to the
following transformation matrices:
but not reflections, compressions or shear transforma-
tions, corresponding to the following transformation
matrices:


   Noting that conformal transformations obey the
Cauchy-Riemann conditions
(i.e., ai = di and bi = -ci for triangle i), we define
the non-conformal distortion of triangle i as

Gi(u,v) = ( ai - di )2 + ( bi + ci )2

where ai, bi, ci and di are previously defined functions
of (x,y) and (u,v).  The resulting expression is
quadratic in uj, vj, uk, vk, ul and vl, the transformed
coordinates of triangle i.  The overall distortion
G(u,v) is calculated as the sum of Gi(u,v) over all
triangles i, weighted by triangle area.  (For weighting
purposes, the average of the original and transformed
areas is used.)

Summary of DEMP algorithm

   In summary, the DEMP algorithm consists of
finding the values of transformed coordinates (u,v)
which minimize overall map distortion G(u,v),
subject to the triangle area constraints Hi(u,v)=0. 
Equivalently, one can impose the single constraint
H(u,v)=0, where H(u,v) is defined as the sum over
all triangles i of hi [Hi(u,v)]2.  The constants hi must
be positive and nonzero; good convergence is ob-
tained for the choice

hi = 1 / [Bi0 + Bmin]

where Bi0 is the transformed area of triangle i, and
Bmin is the transformed area equivalent to a population
of one person.

   In principle one should impose three additional
constraints to prevent arbitrary x- or y- translation or
rotation of the entire map; in practice such constraints
are not needed.  Optionally, additional constraints can
be imposed during the DEMP optimization, provided
a feasible solution of the area constraints exists.  For
example, one could require u=x and v=y for all
points on the external map boundary.

POST-DEMP MAP PROCESSING

   In analyzing the geographic distribution of disease,
the primary purpose of the DEMP is to plot cases on
a transformed map where population density has been
equalized.  In addition one may wish to transform the
locations of various geographic features.  For exam-
ple, one would normally wish to plot on the trans-
formed map the location of a supposed environmental
hazard such as a microwave tower or a toxic waste
dump.  Contours of equal distance from a point
source, although no longer circular, can still serve to
identify areas of supposedly equal risk.  A trans-
formed latitude-longitude grid can help the analyst
locate additional features on the transformed plotted
map.

   Case locations or geographic features are trans-
formed as follows:

  The (long,lat) location of a disease case or geo-
   graphic feature k is projected into (xk,yk), in the
   same coordinate system as the pre-DEMP poly-
   gon map.

  The DEMP solution provides values of (u,v), the
   vertices of all triangles in the DEMP transform-
   ed map.  From these one can calculate ai,bi,...fi
   for each triangle i. 

  A point-in-triangle routine, along with the pre-
   DEMP triangle coordinates (x,y), is used to
   determine in which polygon i each point k is
   located.

  The appropriate ai,bi,...fi are used to transform
   the point (xk,yk) into density-equalized coordi-
   nates (uk,vk).

   In a density equalized map, analysis of disease
distributions is relatively straightforward.  A number
of quantitative techniques have been developed,
which have certain advantages over traditional rate
comparisons or relative risk calculations.1-4

PRELIMINARY RESULTS

   The new DEMP algorithm is able to utilize com-
mercially supported, highly sophisticated minimiza-
tion software.  In our preliminary tests of the DEMP
algorithm we have used the NAG (Numerical Algo-
rithms Group) Fortran library routine E04VDF,
minimizing the quadratic objective function G(u,v)
subject to the fourth order constraint condition H(u,v)
= 0.

   To illustrate the results of the preliminary DEMP
implementation, we have chosen the state of Ver-
mont, which has 14 counties.  A point removal
criterion of 500 km2 was applied.  The filtered and
triangulated Vermont map, which is shown in Figure
1(a), has 30 points and 43 triangles.  In Figure 1(b)
the same map has been transformed by a DEMP; in
this example the variable equalized by the DEMP was
the 1980 total population.  Note that in the trans-
formed map (b) the areas of triangles within a given
county are proportional to their areas in the original
map (a).

   For the example shown, the same final map was
consistently obtained regardless of the initial values
of (u,v).  Therefore we suspect that the solution
found is a local minimum (perhaps even the global
minimum) of the distortion function G(u,v) subject to
the area constraint H(u,v) = 0.  For other examples
(not shown) two or more distinct local minima were
found.  In such cases the solution with the minimum
value of G(u,v) is preferred.

   Trial and error is necessary to determine optimum
weighting coefficients for the two functions G(u,v)
and H(u,v).  Large G relative to H causes conver-
gence to be slow; large H relative to G produces
solutions with correct areas but with non-minimum
distortion.

   For the example shown (with 30 points and 43
triangles), convergence was achieved in 90 iterations,
in about 2.8 CP (central processor) minutes on a Sun
SPARCstation 1 computer.  For more complicated
maps, computing time appears to increase approxi-
mately as the third or fourth power of map complexi-
ty.  A map with 268 points and 498 triangles re-
quired about 250 iterations, and about 2.5 CP days
on the same computer.  Work is continuing to im-
prove computing efficiency. 

CONCLUSIONS

   Density equalizing map projections (DEMPs)
correct for the confounding effect of varying popula-
tion density, thereby providing a useful tool for
analyzing the geographic distribution of disease.  The
statistical analysis of a density equalized map is
straightforward, since under the null hypothesis of
equal risk, the distribution of disease cases on such a
map is expected to be uniform.

   Although a DEMP simplifies the task of the
analyst, the specification of the DEMP transformation
itself is not simple and continues to be the subject of
considerable research.  The algorithm described here
represents a radical new approach to DEMP calcula-
tions, and has important advantages over previous
techniques.

NOTE

   This paper is a summary of a 44-page report by
Deane W. Merrill, Steve Selvin and Michael S.
Mohr, "Analyzing Geographic Clustered Response,"
Lawrence Berkeley Laboratory Report LBL-30954,
August 1991.  Copies may be requested from Deane
W. Merrill, Information and Computing Sciences
Division, Building 50B, Room 3238, Lawrence
Berkeley Laboratory, 1 Cyclotron Road, Berkeley CA
94720.  Tel: 510-486-5063.  Fax: 510-486-4004. 
Bitnet: merrill@lbl.  Internet: dwmerrill@lbl.gov.

   This work was supported by the Director, Office
of Epidemiology and Health Surveillance; Office of
Health; Office of Environment, Safety and Health;
U.S. Department of Energy under Contract No. DE-
AC03-76SF00098.













REFERENCES

1. Schulman J, Selvin S and Merrill DW.  Density
Equalized Map Projections: a method for analyzing
clustering around a fixed point.  Statistics in Medi-
cine 7:491-505 (1988).

2. Schulman J, Selvin S, Shaw G and Merrill D.  
Detection of excess disease near an exposure point: a
case study.  Arch Env Hlth 45:168-174 (1990).

3. Selvin S, Merrill D, Schulman J, Sacks S, Bedell
L and Wong L.  Transformations of maps to investi-
gate clusters of disease.  Soc Sci Med 26:2 215-221
(1988).

4. Shaw GM, Selvin S, Swan SH, Merrill D and
Schulman J.  An examination of three disease cluster-
ing methodologies.  Int Jour Epi 17:4 913-919
(1988).

5. Tobler W.  Cartogram Programs.  Cartographic
Laboratory Report, Ann Arbor, Michigan (1974).

6. Dougenik JA, Chrisman NR and Niemeyer DR. 
An algorithm to construct continuous area carto-
grams.  Professional Geographer 37(1): 75-81 (1985).

7. Rase, W.  Bundesforschungsanstalt fr Landes-
kunde und Raumordnung, Bonn-Bad Godesberg,
Germany.  Private communication (1989).

8. Wesseling, C.  Faculty of Geographical Sciences,
Utrecht University, Netherlands. Private communica-
tion (1991).

9. Boots, B.N.  Voronoi (Thiessen) Polygons.  In
series: Concepts and Techniques in Modern Geogra-
phy (CATMOG), Geo Books, Norwich UK (1987).