Image credit: Daniel Horowitz for NPR

Lecture slides on iCloud


We’ll begin this class by filtering and normalizing our data, all while using the ggplot2 graphing package to visualize the impact these changes have on our data. You’ll also be introduced to Hadley Wickham’s philosophy of ‘tidy data’ and expand your understanding of tools within the tidyverse.

Learning objectives

  • Briefly review ‘the essentials’ from the Step 1 script
  • Start and finish the Step 2 script
  • Understand the concept of a layered ‘grammar of graphics’ and how to use ggplot2
  • Discuss basics of ‘tidy’ vs messy data and the tidyr package
  • Filter data to remove lowly expressed genes
  • Normalize data (to allow between-sample comparisons)
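To make the last two objectives concrete, here is a minimal base-R sketch of filtering lowly expressed genes on a counts-per-million (CPM) scale. It is not the Step 2 script: the count matrix is made up, and the cutoffs (`CPM > 1` in at least `min_samples = 2` samples) are illustrative assumptions. CPM only corrects for library size; TMM normalization goes further and is not implemented here.

```r
# Toy count matrix: 5 genes x 4 samples (made-up numbers, for illustration only)
counts <- matrix(
  c(     0,       1,      0,       1,
       150,     300,    120,     280,
        10,      25,      8,      30,
         0,       0,      0,       1,
    999840, 1999674, 999872, 1999688),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("gene", 1:5), paste0("sample", 1:4))
)

# Library size = total counts per sample (column sums)
lib_sizes <- colSums(counts)

# Counts per million: divide each column by its library size, then scale
cpm <- sweep(counts, 2, lib_sizes, "/") * 1e6

# Keep genes with CPM > 1 in at least `min_samples` samples
# (both thresholds are assumptions chosen for this toy example)
min_samples <- 2
keep <- rowSums(cpm > 1) >= min_samples
counts_filtered <- counts[keep, , drop = FALSE]

print(round(cpm, 2))
print(rownames(counts_filtered))
```

Notice that gene2 has raw counts of 150 and 300 in the first two samples, yet the same CPM in both, because the second sample was sequenced twice as deeply. That is the kind of between-sample comparison that normalization is meant to allow.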


Step 2 script

Lecture video

“Warning - videos below are from 2020 lectures and will soon be updated for 2021.”

Part 1 - Starting Step 2 script

Part 2 - Walking through the Step 2 script, and relating our work to the ‘grammar of graphics’ and ‘tidy’ data


Original TMM normalization manuscript.

Tidy Data - Hadley Wickham (author of Tidyverse packages and Chief Scientist at RStudio) describes the philosophy of tidy data in this paper.

Grammar of graphics - Another paper by Hadley Wickham. This one explains the rationale behind ggplot2.

ggplot2 cheatsheet – a very helpful guide as we continue to use ggplot2 for all of our plotting needs.

catalog of R graphs - Take a look at some of the various ways to graph your data, along with the underlying R code, in this catalog.

R Graphics Cookbook - If you end up using R to make a lot of graphs, you will find this book to be an important reference. It’s available free to UPenn folks as an e-book.