Corresponding lecture

Lecture 5 – Wrangling gene expression data

Description

In this lab, you’ll tackle data from the US Fish and Wildlife Service (USFWS) Law Enforcement Management Information System (LEMIS) data on wildlife and wildlife product imports into the United States. This data was obtained via Freedom of Information Act (FOIA) requests by EcoHealth Alliance and contains over 5.5 million observations of 28 variables, showing the flow of live wildlife and wildlife products into the USA from 2000-2014. Given that most infectious diseases have a zoonotic (animal to human) component, this dataset offers critical insight into the potential for the global spread of both human and animal diseases.

An analysis of a portion of this large dataset was published by EcoHealth Alliance here.

What you need to do

To start, you’ll need to download this cleaned up version of the LEMIS data

Once the download is complete, unzip the file, and set-up a new R project. Then read the data into your R environment. Don’t try to open the file in Excel….you’ll regret it. Also, don’t try to View() the object after importing into R.

To help you get started, here’s a bit of code to read in the data and explore the variables. With the data in your R environment, you’ll then want to use dplyr and ggplot to complete each of the tasks below. Tip: the magrittr pipe (%>%) is your friend.

library(tidyverse)
lemis <- read_delim("lemis_cleaned.tsv")

#take a minute to explore this huge dataset
glimpse(lemis)
  1. Identify the most common (by ‘quantity’) live mammal taken from the wild for import into the US.
  2. Building on your analysis above, produce a plot showing live mammals (use ‘generic_name’) imported for the purposes of science/research. (tip: use geom_col() in ggplot for this). Feel free to play around with different themes to make your plot more exciting.
  3. Identify the countries from which we import the most macaques (again, a simple plot will suffice).
  4. Using the same approach as above, create a plot showing the countries from which we import live bats.
  5. For what purposes do we import bats?
  6. How does the type of bat (use ‘specific_name’) imported differ between countries (hint: use facet_wrap in your ggplot code)?
  7. Identify the most expensive (by ‘value’) shipment of live mammals to enter the US.
  8. How does the answer above compare with the most expensive shipment of any kind (live or not)?
  9. You are alerted to a concerning new viral disease of humans that is believed to originate from Fruit bats (though the exact type of fruit bat is not clear). Identify the US cities that would would be most likely to be exposed to such a virus from the import of live fruit bats.
  10. A recent case of Anthrax in NYC was traced back to a contaminated Wildebeest hide that was stretched and used to make a traditional drum. Through which port(s) did this animal product most likely enter the country? (note: this actually happens).

Extended learning

  • Ask and answer a question that you care about!
  • Create an animated ggplot using the date information in LEMIS and the gganimate package