Overview
Now that we’ve pseudoaligned our reads with Kallisto, it’s time to discuss units for measuring gene expression. We’ll discuss the differences between RPKM and TPM, and how these units depend on basic properties of your reference and data, such as transcript length and sequencing depth. We’ll also discuss normalization within and between samples. To conclude this class, we’ll fire up RStudio and take a look at our first script.
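To make the within-sample arithmetic concrete, here is a minimal Python sketch of how RPKM and TPM are computed from per-transcript counts and lengths. The function names, the three-transcript toy sample, and the assumption that `lengths` are already effective lengths are all illustrative. Kallisto reports TPM directly (its `abundance.tsv` includes `est_counts`, `eff_length`, and `tpm` columns), so this only shows where those numbers come from.

```python
from typing import List

def rpkm(counts: List[float], lengths: List[float]) -> List[float]:
    """Reads Per Kilobase of transcript per Million mapped reads."""
    per_million = sum(counts) / 1e6            # sequencing-depth scaling factor
    # Normalize by depth first, then by length (in kilobases)
    return [(c / per_million) / (l / 1e3) for c, l in zip(counts, lengths)]

def tpm(counts: List[float], lengths: List[float]) -> List[float]:
    """Transcripts Per Million."""
    rates = [c / l for c, l in zip(counts, lengths)]  # normalize by length first
    scale = sum(rates) / 1e6                          # then rescale so the sample sums to 1e6
    return [r / scale for r in rates]

# Hypothetical toy sample with three transcripts
counts = [100, 300, 600]        # estimated read counts
lengths = [1000, 2000, 3000]    # effective lengths in bases (assumed already computed)

r = rpkm(counts, lengths)
t = tpm(counts, lengths)
print([round(x) for x in r])  # [100000, 150000, 200000]
print([round(x) for x in t])  # [222222, 333333, 444444]
print(round(sum(t)))          # 1000000
```

The only difference between the two units is the order of operations. RPKM divides by sequencing depth first and length second, so the RPKM values in a sample do not sum to a fixed total. TPM divides by length first and then rescales, so every sample sums to one million and the values can be read as relative proportions within that sample. Neither unit corrects for differences in RNA composition between samples; that is the job of between-sample normalization.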
Learning objectives
- Review steps from last class (using Kallisto).
- Discuss Kallisto output, units of measurement for RNAseq, and ‘normalization’.
- Start an RStudio Project directory that we’ll use for the rest of the course.
- Open and discuss our first script, including how to install packages.
If you’re new to R
Please take time to work through this Learn R! module
Code
Lecture videos
Preamble
Part 1 - Measuring digital gene expression
Part 2 - Starting our R project and step 1 script
Reading
The RNA-seq abundance zoo - Blog post by Rob Patro (developer of the Sailfish and Salmon software) that describes units for RNAseq and includes a nice description of ‘effective length’ for transcripts.
What the FPKM? - Blog post by Harold Pimentel discussing within-sample normalization and the meaning of RNAseq expression units.
Between sample normalization in RNAseq - Another great blog post from Harold Pimentel, this one on between-sample normalization.