Wrangling gene expression data – DIY.transcriptomics

Image credit: Daniel Horowitz for NPR

Overview

We’ll begin this class by filtering and normalizing our data, all while using the ggplot2 graphing package to visualize the impact these changes have on our data. You’ll also be introduced to Hadley Wickham’s philosophy of ‘tidy data’ and expand your understanding of tools within the tidyverse.

Learning objectives

Briefly review ‘the essentials’ from the Step 1 script
Start and finish the Step 2 script
Understand the concept of a layered ‘grammar of graphics’ and how to use ggplot2
Discuss basics of ‘tidy’ vs messy data and the tidyr package
Filter data to remove lowly expressed genes
Normalize data (to allow between-sample comparisons)
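
To make the last three objectives concrete, here is a minimal sketch of the underlying arithmetic. It is not the Step 2 script, which works in R; plain Python is used here only to show the logic. The gene names, counts, and thresholds (more than 1 count per million in at least 2 samples) are hypothetical values chosen for illustration. Counts per million (CPM) is a simple library-size scaling; TMM normalization additionally estimates a per-sample scaling factor, which this sketch does not attempt.

```python
# Toy raw count matrix in 'wide' form: one row per gene, one column per sample.
# All names and values are hypothetical.
samples = ["s1", "s2", "s3"]
counts = {
    "geneA": [500, 600, 550],
    "geneB": [0, 1, 0],
    "geneC": [499_500, 399_399, 449_450],
}


def counts_per_million(counts):
    """Scale each sample's counts by its library size, times one million."""
    n_samples = len(next(iter(counts.values())))
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {
        gene: [row[j] / lib_sizes[j] * 1e6 for j in range(n_samples)]
        for gene, row in counts.items()
    }


def filter_low_expression(cpm, min_cpm=1.0, min_samples=2):
    """Keep genes whose CPM exceeds min_cpm in at least min_samples samples."""
    return {
        gene: row
        for gene, row in cpm.items()
        if sum(value > min_cpm for value in row) >= min_samples
    }


def to_tidy(cpm, samples):
    """Reshape wide data to 'tidy' long form: one row per gene-sample pair."""
    return [
        (gene, sample, value)
        for gene, row in cpm.items()
        for sample, value in zip(samples, row)
    ]


cpm = counts_per_million(counts)
kept = filter_low_expression(cpm)
tidy = to_tidy(kept, samples)

for gene, sample, value in tidy:
    print(gene, sample, round(value, 1))
```

In this toy example geneB is detected in only one sample, so it is dropped by the filter. The long format produced by to_tidy, with one observation per row, is the shape that ggplot2 expects when plotting.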

Code

Step 2 script

Lecture videos

Part 1 - Starting Step 2 script

Part 2 - Walking through the Step 2 script, and relating our work to the ‘grammar of graphics’ and ‘tidy’ data

Reading

Original TMM normalization manuscript - Introduces the trimmed mean of M-values (TMM) method for normalizing RNA-seq data.

Tidy Data - Hadley Wickham (author of Tidyverse packages and Chief Scientist at RStudio) describes the philosophy of tidy data in this paper.

Grammar of graphics - Another paper by Hadley Wickham. This one explains the rationale behind ggplot2.

ggplot2 cheatsheet - A very helpful guide as we continue to use ggplot2 for all of our plotting needs.

Catalog of R graphs - Take a look at some of the various ways to graph your data, along with the underlying R code, in this catalog.

R Graphics Cookbook - If you end up using R to make a lot of graphs, you will find this book to be an important reference. It’s available free to UPenn folks as an Ebook.