Image credit: 'Abacus' ca. 1946: textile, Paul Rand

Lecture slides on iCloud

Overview

Now that we’ve aligned our reads, it’s time to discuss units for measuring gene expression. We’ll discuss differences between RPKM and TPM, and how these units relate to basic properties of your reference file and data. We’ll also discuss normalization within and between samples. To conclude this class, we’ll fire up RStudio and take a look at our first script.
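To make the RPKM vs. TPM distinction concrete, here is a minimal Python sketch. The course itself uses R, and the function names and toy counts below are hypothetical. It assumes you already have an estimated count and a length per transcript, similar to the `est_counts` and `length` columns in kallisto's `abundance.tsv`. It approximates effective length as transcript length minus mean fragment length plus one. That is a simplification: kallisto and Salmon actually compute effective length from the full fragment-length distribution.

```python
"""Sketch: RPKM vs. TPM from estimated counts (toy numbers, not real data)."""


def effective_lengths(lengths, mean_frag_len):
    # Simplified effective length: l - mu + 1, floored at 1 (assumption; real
    # tools integrate over the whole fragment-length distribution).
    return [max(l - mean_frag_len + 1, 1) for l in lengths]


def rpkm(counts, lengths):
    # Reads per kilobase of transcript per million mapped reads:
    # scale by sequencing depth first, then by length.
    total = sum(counts)
    return [c * 1e9 / (l * total) for c, l in zip(counts, lengths)]


def tpm(counts, lengths):
    # Transcripts per million: normalize by length first, then rescale
    # so that every sample sums to exactly one million.
    rates = [c / l for c, l in zip(counts, lengths)]
    denom = sum(rates)
    return [r / denom * 1e6 for r in rates]


if __name__ == "__main__":
    # Two hypothetical samples over the same three transcripts.
    lengths = [1000, 2000, 4000]
    eff = effective_lengths(lengths, mean_frag_len=200)
    samples = {"A": [100, 200, 400], "B": [100, 200, 2000]}
    for name, counts in samples.items():
        print(name,
              "sum RPKM =", round(sum(rpkm(counts, eff)), 1),
              "sum TPM =", round(sum(tpm(counts, eff)), 1))
```

Running this shows that the TPM values in each sample sum to one million, while the RPKM totals differ between the two samples. This is the inconsistency discussed in the Theory in Biosciences paper listed under Reading. It also means TPM is simply each transcript's RPKM divided by the sample's total RPKM, times one million.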

Learning objectives

  • Review steps from last class (using Kallisto).
  • Discuss output from Kallisto, units of measurement for RNAseq, and ‘normalization’.
  • Start an RStudio Project directory that we’ll use for the rest of the course.
  • Open and discuss our first script, including installation of packages.

If you’re new to R

Please take time to work through this Learn R! module.

Code

Step 1 script


Lecture videos

Preamble

Part 1 - Measuring digital gene expression

Part 2 - Starting our R project and step 1 script


Reading

The RNA-seq abundance zoo - Blog post by Rob Patro (developer of the Sailfish and Salmon software) that describes units for RNAseq and includes a nice description of ‘effective length’ for transcripts.

What the FPKM? - Blog post by Harold Pimentel discussing within-sample normalization and the meaning of RNAseq expression units.

Measurement of mRNA abundance using RNA-seq data: RPKM measure is inconsistent among samples. Theory in Biosciences, Dec 2012

Between sample normalization in RNAseq - another great blog post from Harold Pimentel on between-sample normalization.
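Between-sample normalization works on counts rather than on TPM. As one illustration, here is a minimal Python sketch of the median-of-ratios size factors popularized by DESeq/DESeq2. It is our own simplified version, not code from the course scripts or from the blog posts above, and the function names are hypothetical. Each sample's size factor is the median, across genes, of that sample's count divided by the gene's geometric mean count across all samples. Genes with a zero count in any sample are skipped.

```python
"""Sketch: median-of-ratios size factors for between-sample normalization."""
import math
from statistics import median


def size_factors(counts):
    """counts: one list per sample, each holding per-gene counts in the same order."""
    n_genes = len(counts[0])
    # Only genes observed in every sample have a finite log geometric mean.
    keep = [g for g in range(n_genes) if all(s[g] > 0 for s in counts)]
    if not keep:
        raise ValueError("no gene has a nonzero count in every sample")
    # Log of each gene's geometric mean across samples.
    log_geo = {g: sum(math.log(s[g]) for s in counts) / len(counts) for g in keep}
    # Size factor = exp(median over genes of log(count / geometric mean)).
    return [math.exp(median(math.log(s[g]) - log_geo[g] for g in keep))
            for s in counts]


def normalize(counts):
    # Divide every count in a sample by that sample's size factor.
    sf = size_factors(counts)
    return [[c / f for c in sample] for sample, f in zip(counts, sf)]


if __name__ == "__main__":
    # Hypothetical data: sample 2 is sample 1 sequenced twice as deeply;
    # the last gene is zero in sample 1, so it is ignored when estimating factors.
    data = [[10, 20, 30, 0], [20, 40, 60, 5]]
    print(size_factors(data))
    print(normalize(data))
```

Because the median is taken over many genes, a handful of strongly changing genes has little effect on the size factors. A simple total-count scaling, by contrast, would be pulled toward those genes.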