Epidemiological Data Visualization in Google Earth
Department of Computer Science
Wanyu Wang


Project Description

Introduction

Disease clustering has become an increasingly important form of spatial analysis in public health, and it is the subject of a large body of research. Once an effective detection method has produced useful results, those results need to be visualized. This project provides a convenient and flexible visualization of disease outbreak data. Because of the geographic nature of the data, and because of the importance of its relationships with human geographical and social contexts, the data is presented as a layer in Google Earth. In particular, I visualize the results from the paper "Density-equalizing Euclidean minimum spanning trees for the detection of all disease cluster shapes."

Motivation & Real world application

Visualizing disease outbreak data directly shows how a disease is distributed around the world. It can also help in preventing and controlling the spread of the disease, and it supports researchers in their further work.

Data source and Background

In this project, I visualize the results from the paper "Density-equalizing Euclidean minimum spanning trees for the detection of all disease cluster shapes."

A disease cluster is a greater-than-expected number of epidemic cases that occurs within a group of people in a geographic area over a defined period of time. Detecting disease clusters is becoming an increasingly important form of spatial analysis in public health.

Here is a part of the input data:

-0.0857  0.1738  1  1999-02-16
-0.0742  0.1738  2  1999-02-19
-0.0629  0.1742  1  1999-02-28
 0.0723  0.1627  1  1999-03-02
-0.0861  0.1635  3  1999-03-11
-0.0747  0.1633  1  1999-03-16
-0.0624  0.1630  3  1999-03-27
-0.0770  0.1461  1  1999-03-28
-0.0765  0.1356  2  1999-04-03
-0.0648  0.1354  2  1999-04-08
-0.0655  0.1458  1  1999-04-19
-0.0538  0.1461  1  1999-04-22
-0.0822  0.1171  3  1999-04-29
-0.0820  0.1070  1  1999-04-30
-0.0705  0.1070  1  1999-05-01
-0.0705  0.1171  1  1999-05-04
-0.0590  0.1171  2  1999-05-10

 ...    ...  ...   ...

Note that the data I had to work with, while real, does not give a geographic location on Earth for the cases; all coordinates are relative. I therefore picked a location in the United States at random (in this case, in New York State) and placed the data on the map there. The cases on my map below therefore do not represent the locations of real flu outbreaks; they only demonstrate the program.

User Guide

In this project, I visualize geographic data in Google Earth, so the program can be used with any dataset in the format described here. The dataset has four columns and any number of rows. The first column is the latitude; the second column is the longitude; the third column is the cluster ID of the location; and the last column is a timestamp, which is used for animation in Google Earth. Run the program on your dataset to produce the corresponding output.kml file, then open this KML file in Google Earth. You will see an animated visualization of the data, which you can use in your further research.
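Reading the four-column format described above might look like the following minimal sketch. It is not the project's actual code: the names CaseRecord, parseRecord and readRecords are hypothetical, and the sketch assumes whitespace-separated columns with the timestamp kept as text (for example 1999-02-16).

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One row of the input: latitude, longitude, cluster ID, timestamp.
struct CaseRecord {
    double latitude;
    double longitude;
    int clusterId;
    std::string date;  // kept as text, e.g. "1999-02-16"
};

// Parses one whitespace-separated line. Returns false if any of the
// four fields is missing or cannot be read.
bool parseRecord(const std::string& line, CaseRecord& out) {
    std::istringstream in(line);
    return static_cast<bool>(in >> out.latitude >> out.longitude
                                >> out.clusterId >> out.date);
}

// Reads every well-formed row from the stream and skips the rest
// (blank lines, separators, malformed rows).
std::vector<CaseRecord> readRecords(std::istream& in) {
    std::vector<CaseRecord> records;
    std::string line;
    while (std::getline(in, line)) {
        CaseRecord record;
        if (parseRecord(line, record)) {
            records.push_back(record);
        }
    }
    return records;
}
```

Passing an open std::ifstream for the dataset to readRecords would return one CaseRecord per valid row. Skipping malformed rows silently is a design choice of this sketch; a stricter program might report them instead.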


Visualization design


Well-designed presentations of interesting data are a matter of substance, of statistics, and of design. So we need to ask a few questions before starting:
- Who is the intended audience?
- What task, goal, or question does the data serve?
- Data types? Data dimensions? Data format and style?
- What information does the visualization show?

The basic types of visual encoding include position, size, color, orientation, and shape, among others. I therefore pay particular attention to these aspects in the visualization that follows.


Tools and Programming Language


The tool I use is Google Earth, a virtual globe, map, and geographic information program. Google Earth features many layers as sources of information:
- Street view and roads
- 3D images of terrain and buildings
- Historical imagery
- Weather
- Global awareness
- Geographic web

I use C++ to generate the Keyhole Markup Language (KML) file. KML has the following features:
- It is an XML data format used to display information in a geographic context.
- Earth browsers such as Google Earth read and display KML files.
- It enables you to create powerful presentations that paint your own geographic data and imagery over the global palette provided by many popular Earth browsers.


Experiment and implementation

I ran many experiments with different colors and shapes for the icons. After comparing all the results, I chose the one shown in Figure 1. Many features of the base map affect the result, such as highway labels in white, highway names in blue, and the highways themselves in yellow. In Figure 1, the result looks clean and the clusters are clearly differentiated. Yellow is not a good choice for the icons because the highways are already drawn in yellow; all the other colors tried (red, green, light green, purple) can be used.
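As an illustration of how per-cluster icon colors could be written into the KML file, here is a minimal sketch. The one fact it relies on is that KML expresses colors as eight hexadecimal digits in the order alpha, blue, green, red (aabbggrr). Everything else is an assumption: the function names, the exact RGB values chosen for each color, and the mapping from cluster ID to color are hypothetical and are not taken from the project.

```cpp
#include <cstdio>
#include <sstream>
#include <string>

// Converts an RGB color to KML's aabbggrr hex notation
// (alpha first, then blue, green, red).
std::string kmlColor(unsigned r, unsigned g, unsigned b, unsigned a = 255) {
    char buf[9];
    std::snprintf(buf, sizeof buf, "%02x%02x%02x%02x",
                  a & 0xffu, b & 0xffu, g & 0xffu, r & 0xffu);
    return std::string(buf);
}

// Hypothetical palette using the colors reported as usable in the text.
// Yellow is deliberately left out. Cluster IDs are assumed to start at 1
// and wrap around when there are more clusters than colors.
std::string clusterColor(int clusterId) {
    static const std::string palette[] = {
        kmlColor(255, 0, 0),      // red
        kmlColor(0, 128, 0),      // green
        kmlColor(144, 238, 144),  // light green
        kmlColor(128, 0, 128),    // purple
    };
    const int count = 4;
    const int index = ((clusterId - 1) % count + count) % count;
    return palette[index];
}

// Builds a KML <Style> element that tints the icon for one cluster.
std::string clusterStyle(int clusterId) {
    std::ostringstream out;
    out << "<Style id=\"cluster" << clusterId << "\">"
        << "<IconStyle><color>" << clusterColor(clusterId)
        << "</color></IconStyle></Style>";
    return out.str();
}
```

A placemark can then refer to such a style through a styleUrl of the form #cluster1. Keeping the palette in one place makes it easy to repeat the kind of color experiments described above.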

 

At the same time, to understand how the disease spreads, we need to animate the cases according to the time at which they were detected and reported. An animated version of the result is also available. (You need to install Google Earth 5.0 to view it.)

             
                              Figure 1
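The time-based animation described above relies on the KML TimeStamp element, which Google Earth uses to drive its time slider. The following is a minimal sketch of how one case could be written as a time-stamped placemark. It is not the project's actual generator: the function names placemark and kmlDocument and the placemark naming scheme are hypothetical. Note that KML lists coordinates as longitude, latitude, altitude, which is the reverse of the input file's column order.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Builds one KML Placemark with a TimeStamp so that Google Earth can
// show or hide the case as the time slider moves.
std::string placemark(double latitude, double longitude, int clusterId,
                      const std::string& date) {
    std::ostringstream out;
    out << std::fixed << std::setprecision(4)
        << "<Placemark>"
        << "<name>Cluster " << clusterId << "</name>"
        << "<TimeStamp><when>" << date << "</when></TimeStamp>"
        // KML order is longitude,latitude,altitude.
        << "<Point><coordinates>" << longitude << "," << latitude
        << ",0</coordinates></Point>"
        << "</Placemark>\n";
    return out.str();
}

// Wraps already-built placemarks in a complete KML 2.2 document.
std::string kmlDocument(const std::string& placemarks) {
    return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
           "<kml xmlns=\"http://www.opengis.net/kml/2.2\">\n"
           "<Document>\n" + placemarks + "</Document>\n</kml>\n";
}
```

Writing the string returned by kmlDocument to output.kml and opening that file in Google Earth should show the time slider, since each placemark carries a date.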

Presentation

The presentation can be downloaded in PowerPoint format.


References:


Paper
http://www.pnas.org/content/104/22/9404.full.pdf+html


KML learning
http://code.google.com/apis/kml/documentation/
http://www.opengeospatial.org/standards/kml/
http://code.google.com/apis/kml/documentation/kml_tut.html
http://code.google.com/apis/kml/documentation/kmlreference.html
http://www.informit.com/articles/article.aspx?p=1276353


Google Earth
http://bbs.keyhole.com/ubb/ubbthreads.php/Cat/0
http://www.godeyes.cn/html/2008/12/19/handbook_3196.html

 
Basic concepts
http://clusteralliance.org/


Acknowledgement:


Many thanks to Prof. Alexandre Francois and Prof. Lenore Cowen for their help.
Thanks to Sharon, who provided the data for me to visualize.
