Description
In this lab you’ll apply your data wrangling skills to analyze US Fish and Wildlife Service (USFWS) Law Enforcement Management Information System (LEMIS) data on wildlife and wildlife product imports into the United States. This data was obtained via Freedom of Information Act (FOIA) requests by EcoHealth Alliance and contains over 5.5 million observations (rows) of 28 variables (columns), showing the flow of live wildlife and wildlife products into the USA at every major port city from 2000 to 2014.
An analysis of a portion of this large dataset was published by EcoHealth Alliance here.
What you need to do
- Download this cleaned up version of the LEMIS data
- Once the download is complete, unzip the file and read the data into your R environment. Don’t try to open the file in Excel; you’ll regret it. Also, do not try to View() the object after importing it into R; it’s simply too large.
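Since opening the full table is off the table, one option is to preview just a few rows first. The sketch below is only an illustration: it writes a tiny made-up tab-separated file to a temporary path (a stand-in for the real download, with invented columns `id` and `quantity`) and reads back the first 10 rows with the `n_max` argument of `read_tsv()`.

```r
library(tidyverse)

# Write a tiny tab-separated file to a temporary path
# (a stand-in for the real download; the contents are invented)
tmp <- tempfile(fileext = ".tsv")
write_tsv(tibble(id = 1:100, quantity = rep(c(1, 2), 50)), tmp)

# Read only the first 10 rows to check column names and types
preview <- read_tsv(tmp, n_max = 10, show_col_types = FALSE)

dim(preview)
glimpse(preview)
```

The same `n_max` argument should work on the real file, assuming it is tab-separated as its `.tsv` extension suggests.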
To help you get started, here’s a bit of code to read in the data and explore the variables. With the data in your R environment, you’ll then want to use dplyr and ggplot to complete each of the tasks below. Tip: the magrittr pipe (%>%) is your friend.
library(tidyverse)
# read the tab-separated file (delim is stated explicitly rather than guessed)
lemis <- read_delim("lemis_cleaned.tsv", delim = "\t")
# take a minute to explore this huge dataset
glimpse(lemis)
# with large datasets like this, it's useful to know how to see the 'levels' for any variable of interest
unique(lemis$description)
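As a minimal sketch of the pipe-based workflow, the example below uses a small made-up tibble (`toy`) instead of the real LEMIS table, so it runs instantly. The column names `description`, `generic_name`, and `quantity` come from this lab; the values and codes are invented placeholders and may not match what is in the real data. `count()` is shown as an alternative to `unique()` that also reports how many rows each level has.

```r
library(tidyverse)

# Toy stand-in for the LEMIS table; all values are invented for illustration
toy <- tibble(
  description  = c("LIV", "LIV", "SKI", "LIV", "SKI"),
  generic_name = c("MACAQUE", "BAT", "WILDEBEEST", "MACAQUE", "FOX"),
  quantity     = c(10, 3, 1, 5, 2)
)

# Levels of a variable, along with the number of rows for each level
toy %>% count(description, sort = TRUE)

# Filter, group, summarise, and sort in a single pipeline
totals <- toy %>%
  filter(description == "LIV") %>%
  group_by(generic_name) %>%
  summarise(total_quantity = sum(quantity, na.rm = TRUE)) %>%
  arrange(desc(total_quantity))

totals
```

The same filter, group, summarise, arrange pattern is a reasonable starting point for most of the questions below, once you have checked the real column names and codes with `glimpse(lemis)`.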
Using Tidyverse tools, do your best to come up with answers to the following questions related to the LEMIS dataset. Reach out to the TAs if you need help.
- Identify the most common (by ‘quantity’) live mammal taken from the wild for import into the US.
- Building on your analysis above, produce a plot showing live mammals (use ‘generic_name’) imported for the purposes of science/research (tip: use geom_col() in ggplot for this). Feel free to play around with different themes to make your plot more exciting.
- Identify the countries from which we import the most macaques (again, a simple plot will suffice).
- Using the same approach as above, create a plot showing the countries from which we import live bats.
- For what purposes do we import bats?
- How does the type of bat (use ‘specific_name’) imported differ between countries (hint: use facet_wrap in your ggplot code)?
- Identify the most expensive (by ‘value’) shipment of live mammals to enter the US.
- How does the answer above compare with the most expensive shipment of any kind (live or not)?
- You are alerted to a concerning new viral disease of humans that is believed to originate from fruit bats (though the exact type of fruit bat is not clear). Identify the US cities that would be most likely to be exposed to such a virus from the import of live fruit bats.
- A recent case of anthrax in NYC was traced back to a contaminated wildebeest hide that was stretched and used to make a traditional drum. Through which port(s) did this animal product most likely enter the country? (note: this actually happens).
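Several of the questions above ask for a plot, so here is a rough sketch of how geom_col() and facet_wrap fit together. It runs on a small made-up tibble, not the real data: `specific_name` and `quantity` are column names from this lab, while `country` is a hypothetical column name and every value is invented. It is not an answer to any of the questions, only the general shape of the code.

```r
library(tidyverse)

# Toy data; `country` is a hypothetical column name and all values are invented
toy_bats <- tibble(
  country       = c("A", "A", "B", "B", "B"),
  specific_name = c("FRUIT", "VAMPIRE", "FRUIT", "FRUIT", "VAMPIRE"),
  quantity      = c(4, 1, 2, 6, 3)
)

# Total quantity for each combination of country and bat type
bat_totals <- toy_bats %>%
  group_by(country, specific_name) %>%
  summarise(total_quantity = sum(quantity), .groups = "drop")

# One bar per bat type, one panel per country
p <- ggplot(bat_totals, aes(x = specific_name, y = total_quantity)) +
  geom_col() +
  facet_wrap(~country) +
  labs(x = "Bat type", y = "Total quantity") +
  theme_minimal()

p
```

Summarising before plotting keeps the data passed to ggplot small, which matters with a table of several million rows.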
On your own
If you’re working through this lab on your own, you should be able to complete at least one of the ten questions listed above. If you’re an in-person learner and were unable to attend this lab, you should turn in an answer, with your code, to at least one of these questions to the TAs before the start of class next week to get credit.
Solution
script
Download this script to see my answers to the questions above.