Indian Journal of Cancer, Vol. 48, No. 1, January-March, 2011, pp. 105-109
Review Article
Integrating the geographic information system into cancer research
AT Najafabadi1, M Pourhassan2
1 Department of Computer Science, Systems and Production, University of Tor Vergata, Rome, Italy
Correspondence Address:
Code Number: cn11017
Abstract

Cancer control researchers seek to reduce the burden of cancer by studying interventions, their impact on defined populations, and the means by which they can be better used. The first step in cancer control is identifying where the cancer burden is elevated, which suggests locations where interventions are needed. Geographic information systems (GIS) and other spatial analytic methods provide such a solution and thus can play a major role in cancer control. The purpose of this article is to examine the impact of GIS on the direction of cancer research. It considers the application of GIS techniques to research in cancer etiology.

Keywords: Geographic Information Systems, Cancer Research, spatial analysis, data integration and management

Introduction

In the last 30 years, geographic information systems (GIS) have had an ever-increasing impact on the course of research and planning in many diverse fields, including geography, geology, environmental studies, business, and criminal justice. Relatively recently, healthcare research, including cancer research, has entered this domain. Epidemiology, the study of disease patterns in human populations according to person, place, and time, has been the traditional means of approaching cancer etiology. [1] Combining its tools with those of GIS has enabled researchers to look at the distribution of cancer in new ways and uncover relationships not previously seen with traditional epidemiological methods alone. Through its data integration function, GIS has enabled existing data, collected for other purposes, to be applied to cancer research. GIS techniques can enhance the visualization of spatial patterns of cancer, examine the contribution of various risk factors for cancer in new ways, and allow hypotheses on cancer etiology to be tested in a spatial framework.
[2]

Geographic Information Systems

A geographic information system (GIS) is a set of hardware and software for inputting, storing, managing, displaying, and analyzing geographic or spatial data, or any information that can be linked to a geographic location, such as events, people, or environmental characteristics. [3] Some of the most common sources of geographic data for a GIS are printed maps, aerial and satellite images, and global positioning systems, which allow the determination of a geographic location (e.g., x and y coordinates on a map) from a street address. The more widely available sources of non-geographic data for a GIS include satellite remote sensing information. [3]

The capacity of the GIS to integrate data on the three epidemiological components of person, place, and time makes it particularly suitable as a tool for cancer epidemiological research. With respect to person, it is well established that many cancers are related to demographic factors such as race or sex. Using GIS, the location of cancer cases can be overlaid on maps of population data to visualize the relationships between demographic factors and patterns of cancer. With respect to place, epidemiologists have traditionally examined geographic variation in cancer incidence using maps. Continuing interest in this application is demonstrated by the existence of cancer mortality and morbidity atlases in many countries (Atlas of Cancer Mortality in Central Europe, 1996; Atlas of Cancer Mortality in the European Economic Community, 1992; Atlas of cancer mortality in the European Union and the European Economic Area 1993 - 1997, 2008; Check, E., 2007; Uhlen, M. et al., 2005; Lai, 1997; Pickle, Mungiole, Jones, and White, 1999; Semenciw et al., 2000; New cancer mortality atlas, 2000; Shelton, R. M., 2001).
[4],[5],[6],[7],[8],[9],[10],[11],[12],[13]

The ability of the GIS to handle spatial data on a much smaller scale (by pinpointing the exact location of cancer cases), coupled with its ability to handle multiple levels of scale (block group, census tract, city, county, state, etc.), enhances the possibility of uncovering spatial patterns that would be missed by traditional epidemiological methods. In addition, known environmental risk factors for cancer, which may vary with geographic location, can be investigated with GIS. With respect to the third factor (time), information on the date of diagnosis, death, or recurrence of cancer cases can be entered into a GIS so that temporal and spatiotemporal relationships may be examined. The visualization and analytic capabilities of GIS enable the user to examine and model the inter-relationships between factors on all three epidemiological dimensions of cancer.

Geographic Information Systems Functions Applicable to Cancer Research

GIS-specific functions can be grouped into four broad categories: [3]
Data integration and management

A key function of GIS is the integration of data from many existing sources. This often eliminates the need to collect primary data for new studies. A GIS can also create new data, for example by calculating the degree of environmental exposure to carcinogens. This is exemplified in a case-control study by Lewis-Michl et al. on the relationship between toxic chemical pollutant exposure and breast cancer on Long Island, New York. [14] The authors used the location history of breast cancer cases, the locations of manufacturing facilities, and vehicle density estimates for selected highways over a twenty-year period to compute a weighted-average yearly exposure for each case or control, based on the distance of the residence from these sources of toxic chemical pollutants.

Smoothing is a mathematical operation often used by GIS to enhance geographic patterns in the phenomenon under study. One application is to smooth out geographic fluctuations caused by unstable rates in areas with small underlying populations. A study by Osnes and Aalen (1999) applied a form of Bayesian smoothing to survival rates for breast cancer and malignant melanoma in Norway to look at small-scale survival differences between municipalities. [15]

Another useful function of GIS is to calculate the distances to be used in statistical analyses based on spatial contiguity. A study by Athas and Amir-Fazli (2000) used the GIS to calculate a patient's travel distance to a major population center, to examine the geographic differences in the breast cancer stage at diagnosis. In another study, the authors used a GIS to measure the travel distance to radiation treatment facilities, to examine the relationship between travel distance and receiving radiotherapy after breast-conserving surgery. [16] Ward et al.
used remote sensing data in the GIS to reconstruct historical crop patterns and determine zones of probable exposure to agricultural pesticides. They then measured the proximity of the residences of non-Hodgkin's lymphoma patients to these zones, to determine their degree of exposure. [17]

Another database function of GIS is to establish 'topology,' that is, to determine neighbors or establish neighborhoods. A 'neighbor' can be defined in numerous ways: areas or entities related by sharing a common geographic border, trade routes, or common acquaintances. [18]

Visualization

The second function of GIS is visualization, consisting primarily of mapping. Using a process called geocoding, dot density maps of cancer cases by exact location can be automatically generated. Using the geocoded data, the total number of cases for a geographic area (e.g., state, county, town, census tract) can be counted and divided by the underlying population of that area to determine the prevalence or incidence rates. Choropleth maps can then be generated for different areal configurations. [19] An example of a configuration of a geographic region is given in [Figure - 1], which shows census block groups for the USA.

This ability to summarize the data in different ways is a key advantage of GIS. Investigators can define geographic areas (zones) to suit the purposes of their particular study, rather than accepting predefined geographic areas that have been established for other purposes. White and Aldrich (1999) provide an example of purposeful aggregation in a study on pediatric cancer. [20] The authors defined zones based on a one-mile buffer around hazardous waste sites, because of their interest in proximity to environmental toxins as a risk factor for pediatric cancer. By defining zones according to different types of environmental exposure, different hypotheses about environmental risk factors could be explored.
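The buffer-based zoning described above can be illustrated with a minimal sketch. It assumes geocoded cases and sites are available as (latitude, longitude) pairs and approximates a circular buffer with great-circle distance. The function names, the coordinates, and the Earth-radius constant are illustrative assumptions, not taken from the White and Aldrich study or from any particular GIS package.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # approximate mean Earth radius (assumption)


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def in_any_buffer(case, sites, radius_miles=1.0):
    """True if the case lies within radius_miles of at least one site."""
    return any(
        haversine_miles(case[0], case[1], site[0], site[1]) <= radius_miles
        for site in sites
    )


# Hypothetical coordinates, for illustration only.
sites = [(40.0, -75.0)]
cases = [(40.005, -75.0), (40.1, -75.0), (40.0, -75.01)]

flags = [in_any_buffer(case, sites) for case in cases]
exposed_count = sum(flags)
```

Counting the flagged cases and dividing by the population living inside the buffer would give a rate for the exposure zone, which can then be compared with the rate outside it.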
Varying the aggregation scheme or the intervals by which the mapped attribute is classified on a choropleth map can enhance or hide geographic patterns in the data and generate hypotheses. Larger geographic areas or classification intervals result in larger sample sizes and more stable estimates for each area, but can hide patterns in the data due to greater heterogeneity within each area or classification interval. Smaller areas or classification intervals result in more homogeneity and can enhance meaningful patterns, but may result in unstable estimates.

Smoothing techniques can be used to eliminate some of the irregularities seen in 2D mapping, and can be particularly useful in mapping cancer incidence rates. Selvin, Merrill, Erdmann, White, and Ragland (1998) used kernel smoothing to create a 'density equalized map' to depict late-stage breast cancer incidence on a continuous three-dimensional surface, with no regional boundaries. [21] This adjusts for the effect of small population denominators in sparsely populated regions, the disproportionate visual impact of large geographic areas on a two-dimensional choropleth map, and the distorted visual impression given by many white areas indicating zero rates. [22]

The ability of the GIS to utilize many types of new technologies for recording and accurately quantifying data on environmental exposure, together with its capability to map these data, has led to greater emphasis on environmental factors in cancer research. Point and polygon overlay and buffering are two GIS techniques especially applicable to visualizing the relationship between environmental exposure and cancer.
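To make the idea of smoothing unstable small-area rates concrete, the following is a simplified sketch of a Gaussian kernel-weighted rate. It is not the density equalized method of Selvin et al.; it only shows how borrowing information from nearby areas stabilizes a rate based on a small denominator. The area records, the planar coordinates, and the bandwidth are hypothetical.

```python
import math


def kernel_smoothed_rates(areas, bandwidth):
    """Return one smoothed rate per area.

    Each area is a dict with centroid coordinates x, y (arbitrary planar
    units), a case count, and a population. The smoothed rate is the ratio
    of kernel-weighted cases to kernel-weighted population.
    """
    smoothed = []
    for target in areas:
        weighted_cases = 0.0
        weighted_pop = 0.0
        for area in areas:
            d2 = (area["x"] - target["x"]) ** 2 + (area["y"] - target["y"]) ** 2
            weight = math.exp(-d2 / (2 * bandwidth ** 2))  # Gaussian kernel
            weighted_cases += weight * area["cases"]
            weighted_pop += weight * area["pop"]
        smoothed.append(weighted_cases / weighted_pop)
    return smoothed


# Hypothetical data: the middle area has a tiny population and a high raw rate.
areas = [
    {"x": 0.0, "y": 0.0, "cases": 50, "pop": 100_000},
    {"x": 1.0, "y": 0.0, "cases": 2, "pop": 1_000},
    {"x": 2.0, "y": 0.0, "cases": 45, "pop": 90_000},
]

raw_rates = [a["cases"] / a["pop"] for a in areas]
smooth_rates = kernel_smoothed_rates(areas, bandwidth=1.0)
```

With these made-up numbers the sparsely populated middle area is pulled toward the rate of its larger neighbors. The choice of bandwidth is the key design decision: a very small bandwidth reproduces the raw rates, while a very large one flattens every area toward the overall rate.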
The investigator can overlay the distribution of cases and / or controls (represented by points) with the distribution of environmental features (represented by polygons) to generate hypotheses about risk factors, which can then be studied further at the individual level with traditional epidemiological study designs such as cohort or case control (Turnbull, Iwano, Burnett, Howe and Clark, 1990). [3],[23],[24] An example of an overlay is given by the study of White and Aldrich (1999), in which the authors mapped pediatric cancer cases and overlaid buffer zones around the National Priorities List (NPL) sites. [20]

Spatial analysis

Spatial analysis builds on the results of visualization and examines whether visualized patterns or relationships occur by chance. Although many types of spatial analysis are possible with GIS, its application to cancer has primarily been in testing for clustering of cancer cases. Statistical evidence of clustering in a particular geographic location (point clustering) gives the impetus to look for possible risk factors in the area and to generate hypotheses that could explain the clustering. The ability of GIS to determine the exact location of cancer cases makes it suitable for testing for clustering. Beyond testing for clustering at pre-determined locations, methods such as the spatial scan (Pickle, L., et al., 2006) and spatial cluster analysis (Meliker, J. R., et al., 2009; Lorenzo-Luaces Alvarez, P., et al., 2009) have been developed to search an area and find the locations of clusters. [18],[25],[26] Hjalmars, Kulldorff, Gustafsson, and Nagarwalla (1996) used a GIS to search for evidence of the clustering of leukemia cases in Sweden, using a spatial scan statistic. [27] Known cancer risk factors that vary geographically in the underlying population can be adjusted for when verifying the presence of clustering.
[28],[29],[30]

In addition to hypothesis generation, tests for clustering have been applied to monitoring cancer incidence from cancer registry data as part of a cancer surveillance program. Person, et al. (2006) outlined a procedure, which they call the 'cluster evaluation permutation procedure,' for periodic monitoring of cancer clusters as a substitute for reactive testing of cluster alarms after they occur. They applied this to cancer surveillance in upstate New York. [31]

Mathematical modeling

The final function of GIS is mathematical modeling, which can be used to estimate the form of the relationship between various factors, or to predict or estimate unknown values. Spatial interpolation is an example of the latter, and is widely used in GIS applications in other fields. The main application of mathematical modeling in cancer research has been in estimating carcinogen exposure in geographic locations, to test causal hypotheses about carcinogen exposure and cancer. [15] An example of this is given by Kennedy (1988), who used spatial regression to examine local and global trends across the United States in lung cancer for males and females. [32]

Limitations of Geographic Information Systems

The geographic information system has several limitations. One problem inherent in using data from a GIS is the aggregation problem, which refers to the information loss that occurs when aggregate data are substituted for individual-level data. One aspect of this is the 'ecological fallacy,' which is the danger of making causal inferences about individuals based on findings from aggregate or group data. Another aspect is the modifiable areal unit problem, which refers to the statistical bias that results from different levels of aggregation (the 'scale effect') or from alternative groupings of data at the same level of aggregation (the 'zone effect').
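The scale and zone effects can be seen in a small numerical sketch. The six blocks below are hypothetical, arranged in a row with equal populations, and each zoning is simply a list of block-index groups. Nothing here comes from a real dataset.

```python
def zone_rates(cases, pop, zoning):
    """Aggregate block-level counts into zones and return one rate per zone."""
    return [
        sum(cases[i] for i in zone) / sum(pop[i] for i in zone)
        for zone in zoning
    ]


# Hypothetical blocks in a row; blocks 1 and 2 form a local hotspot.
cases = [1, 9, 9, 1, 1, 1]
pop = [1000] * 6

zoning_pairs = [[0, 1], [2, 3], [4, 5]]        # boundary splits the hotspot
zoning_shifted = [[0], [1, 2], [3, 4], [5]]    # boundary encloses the hotspot
zoning_halves = [[0, 1, 2], [3, 4, 5]]         # coarser level of aggregation

rates_pairs = zone_rates(cases, pop, zoning_pairs)
rates_shifted = zone_rates(cases, pop, zoning_shifted)
rates_halves = zone_rates(cases, pop, zoning_halves)
```

The same block-level data give a highest zone rate of about 5 per 1,000 when the boundary splits the hotspot, about 9 per 1,000 when a zone encloses it, and about 6.3 per 1,000 at the coarser scale, which is why conclusions drawn from a single zoning scheme should be treated with caution.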
Besides the statistical and inferential problems inherent in aggregation, there is the added problem of interpreting the groupings used, as spatial data in a GIS have often been derived for administrative or political purposes.

A further limitation to keep in mind when interpreting GIS-based studies is that a GIS is only as good as its input data. Inaccuracies in the original sources of geographic data, such as maps or aerial photographs, and errors introduced in the process of encoding must be considered. Many problems can occur in geocoding data from a street address, and these are magnified in rural areas. [20] In addition to spatial data quality, the quality of non-spatial data obtained from many sources must be verified. The Federal Geographic Data Committee (FGDC-STD-001-1998) has published a set of standards for data sharing and dissemination, which includes making information available on the accuracy and quality of data to be used in a GIS. These standards are implemented in the form of 'metadata,' the documentation that should accompany any GIS data available. [33]

Discussion

Despite the above limitations, GIS is a powerful tool for cancer research that has only begun to be utilized in this area. One area in which GIS offers the most potential is its application to mathematical modeling. The ability of a GIS to integrate data on complex spatial phenomena and to readily incorporate continually updated information makes it well suited to investigating and modeling the role of environmental factors in the etiology of various forms of cancer, and to refining models as new data become available. Another area where GIS stands to contribute most to cancer research is the study of sociodemographic factors.
Considering the strong links shown by previous research between many types of cancer and demographic factors, coupled with the availability of population demographic and socioeconomic data, the utility of applying GIS to cancer incidence data seems obvious. This will probably drive much GIS-related cancer research in the future, because of the increasing emphasis placed on demographic factors in treatment, prevention, and resource allocation. Demographic population data can be used to characterize geographic areas with increased cancer incidence, to assist in planning intervention programs and allocating resources.

References
Copyright 2011 - Indian Journal of Cancer
The following images related to this document are available: Photo images [cn11017f1.jpg]