

EPA TRI Data Analysis : Part 1

Introduction and Making a Toxic Pie

The goal of this exercise is to demonstrate the capabilities of R and ggplot2 for use in exploratory data analysis.

For the analysis I have chosen a dataset from the EPA: the early version of the 2008 Toxics Release Inventory, "a publicly available EPA database that contains information on toxic chemical releases and waste management activities reported annually by certain industries as well as federal facilities."

Visit the official EPA site to find out more or download different datasets.

The dataset comes as a tab-delimited .csv file and can be downloaded from data.gov.

This dataset will be used for a series of exercises in analysis and visualization. The results of the work, along with the commands used to achieve those results, will be posted here.

Section 1 : Preparing the Data

The data downloads as a self-extracting .zip file. It decompresses automatically when executed on a Windows machine; on a Mac or another system you might have to do a little more work.

Also, to save some time and learn from my mistakes, open the data sheet in Excel or another spreadsheet program and add a new column at the start of the file to hold a unique id number for each row.
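If editing the file in a spreadsheet is inconvenient, a similar id column could probably be added in R once the data is loaded (Section 2). The following is only a minimal sketch under assumptions: `toy` is a made-up stand-in for the real data, and the column name `row_id` is a hypothetical choice, not something taken from the EPA file.

```r
# Stand-in for the loaded data; the real TRI file has many more columns.
toy <- data.frame(
  County   = c("WASHTENAW", "WAYNE", "WASHTENAW"),
  Chemical = c("LEAD", "TOLUENE", "ZINC COMPOUNDS"),
  stringsAsFactors = FALSE
)

# seq_len(nrow(toy)) yields 1, 2, ..., n; cbind() places the id column first.
toy_with_id <- cbind(row_id = seq_len(nrow(toy)), toy)

print(toy_with_id)
```

Doing this in R leaves the downloaded file untouched, which may make the exercise easier to repeat.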

Section 2 : Loading Data into R


Step 1: Load the ggplot2 library (assuming it is already installed on your system).

library(ggplot2)

Step 2: Read the text file into a data object in R. This command is written for a Windows version of R, where each backslash in the path has to be doubled. Mac, Linux or other users will use forward slashes in their paths and won't need the doubling.

chem = read.delim("C:\\Users\\dnfehren\\Desktop\\tri_2008_US_v08.txt")
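R also accepts forward slashes in Windows paths, and `file.path()` builds a path without any manual escaping. The sketch below shows the same kind of read on a tiny tab-delimited file written to a temporary folder; the file name `toy_tri.txt` and its contents are made up for illustration and are not part of the EPA data.

```r
# Write a tiny tab-delimited file to a temporary location (made-up data).
tmp <- file.path(tempdir(), "toy_tri.txt")
writeLines(c("County\tChemical", "WASHTENAW\tLEAD", "WAYNE\tTOLUENE"), tmp)

# read.delim() assumes a header row and tab separators by default.
toy <- read.delim(tmp, stringsAsFactors = FALSE)

# Check column names and types before going further.
str(toy)
```

Running `str()` right after loading is a cheap way to confirm that the columns came through with the names and types you expect.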

Section 3 : Getting a sub-set of the data


Step 1: Attach the data object to your R workspace; this will save some typing later.

attach(chem)

Step 2: Grab just the rows of the dataset that deal with Washtenaw County (or choose your own county), and assign them to a new data object.

local_chem <- subset(chem, County == 'WASHTENAW')
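It can be worth checking that the subset kept only the intended rows, and seeing how many rows each chemical has, since those counts become the slice sizes later. A minimal sketch on made-up data follows; `toy` and `local_toy` are hypothetical names, and only the `County` and `Chemical` columns used above are assumed.

```r
# Made-up stand-in for the full dataset.
toy <- data.frame(
  County   = c("WASHTENAW", "WAYNE", "WASHTENAW", "WASHTENAW"),
  Chemical = c("LEAD", "TOLUENE", "LEAD", "ZINC COMPOUNDS"),
  stringsAsFactors = FALSE
)

local_toy <- subset(toy, County == "WASHTENAW")

# Every remaining row should belong to the chosen county.
print(unique(local_toy$County))

# Number of rows per chemical in the subset.
chem_counts <- table(local_toy$Chemical)
print(chem_counts)
```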

Section 4 : Making the pie


Step 1: Create the initial ggplot object by telling ggplot where your data is coming from and basic aesthetic information about the factors and fill color.

pie <- ggplot(local_chem, aes(x=factor(1), fill = factor(Chemical)))

Step 2: A pie chart is really a bar chart mapped using polar coordinates, so the next layer of the graph adds a bar geometric element of a specific width and with a black border.

pie = pie + geom_bar(width=5, color="black")

Step 3: The last step is to layer the polar coordinate system on top of those bar geometries to get the appearance of a pie. The angle of the pie slice, theta, is taken from the y coordinate in what would have been a bar chart.

pie = pie + coord_polar(theta="y")
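To make the mapping concrete: with no y aesthetic given, the bar layer counts the rows in each fill group, and the polar coordinates turn each group's share of the total count into a share of the full circle. The arithmetic can be sketched in base R; the counts below are made up for illustration.

```r
# Made-up counts of rows per chemical (what the bar layer would stack).
chem_counts <- c(LEAD = 2, TOLUENE = 1, "ZINC COMPOUNDS" = 1)

# Share of the whole, then the corresponding angle in degrees.
share   <- chem_counts / sum(chem_counts)
degrees <- share * 360

print(round(degrees, 1))
```

A chemical that accounts for half of the rows therefore takes up half of the pie.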

Step 4: Display the pie.

pie
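For readers who want to try the steps without downloading the dataset, here is a self-contained sketch on made-up data. It assumes ggplot2 is installed; the data frame `toy`, the `width = 1` setting, and the output file name `toy_pie.pdf` are illustrative choices of mine rather than part of the exercise above.

```r
library(ggplot2)

# Made-up stand-in for the county subset.
toy <- data.frame(
  Chemical = c("LEAD", "TOLUENE", "LEAD", "ZINC COMPOUNDS")
)

pie <- ggplot(toy, aes(x = factor(1), fill = factor(Chemical))) +
  geom_bar(width = 1, color = "black") +  # one stacked bar, rows counted per chemical
  coord_polar(theta = "y")                # map the stacked counts to angle

# ggsave() writes the plot to disk; the file name is a hypothetical choice.
out_file <- file.path(tempdir(), "toy_pie.pdf")
ggsave(out_file, pie, width = 5, height = 5)
```

Saving to a file is handy when the script is run non-interactively; in an interactive session, typing `pie` displays the chart as in Step 4.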

Images


pie chart of chemicals dumped in Washtenaw County in 2008

Files

Pie Maker R Script: this can be loaded in R and used to reproduce the commands in this exercise.

Zip compressed tab-delimited text file of the data used in this part.