Reading material

There are no required readings for the course, but I’ve provided links to e-books, primary literature, YouTube videos of lectures, and technical blog posts for each topic we discuss in class.

Ebooks on general R/bioconductor

Introduction to RNAseq data and technology

Read mapping with Kallisto

Papers, blog posts and videos on Kallisto

More general info about ultra-lightweight methods for transcript quantification

Understanding RNAseq count data

  • What the FPKM? - Blog post by Harold Pimentel discussing within-sample normalization and the meaning of RNAseq expression units.

  • Between sample normalization in RNAseq - Another great blog post from Harold Pimentel, this one on between-sample normalization.

  • Measurement of mRNA abundance using RNA-seq data: RPKM measure is inconsistent among samples. Theory in Biosciences, Dec 2012.

  • Revisiting Global Gene Expression Analysis. Cell, Oct 2012. A great example of the perils of normalizing to total read depth.

  • The original manuscript describing the Trimmed Mean of M-values (TMM) method for normalizing between samples.
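
To make the point about expression units concrete, here is a minimal sketch in plain Python (the same arithmetic carries over directly to R). The counts and gene lengths are made-up toy numbers, and the function names `rpkm` and `tpm` are only illustrative, not taken from any package. It shows why RPKM values are awkward to compare across samples: their per-sample totals differ, whereas TPM values always sum to one million.

```python
# Toy data (hypothetical): read counts for three genes in two samples.
gene_lengths_kb = [1.0, 2.0, 4.0]  # gene lengths in kilobases
counts = {
    "sample_A": [100, 200, 400],
    "sample_B": [100, 200, 1600],
}

def rpkm(sample_counts, lengths_kb):
    """Reads per kilobase per million mapped reads."""
    total_millions = sum(sample_counts) / 1e6
    return [c / l / total_millions for c, l in zip(sample_counts, lengths_kb)]

def tpm(sample_counts, lengths_kb):
    """Transcripts per million: length-normalize first, then scale to 1e6."""
    rates = [c / l for c, l in zip(sample_counts, lengths_kb)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]

for name, sample_counts in counts.items():
    rpkm_total = sum(rpkm(sample_counts, gene_lengths_kb))
    tpm_total = sum(tpm(sample_counts, gene_lengths_kb))
    print(name, "RPKM total:", round(rpkm_total), "TPM total:", round(tpm_total))
```

Note that this only covers within-sample units. Between-sample methods such as TMM address a different problem (composition differences between libraries) and are not sketched here.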

Starting your analysis script

Exploring, graphing and wrangling your data in R

  • Take a look at some of the various ways to graph your data and the underlying R code in this catalog of R graphs

  • If you end up using R to make a lot of graphs, you will find the R Graphics Cookbook to be an important reference. It’s available free to UPenn folks as an Ebook.

  • Color palettes are an often underappreciated aspect of making beautiful and informative plots in R. You can access a suite of color palettes using the RColorBrewer package. These palettes can be viewed in this cheatsheet. Unfortunately, these standard palettes often don’t cut it, and you’ll need custom palettes. For this, I love using Sip to pick, organize and access color palettes.

  • I mentioned various unsupervised methods for dimensionality reduction of your data (PCA, MDS, t-SNE). In particular, t-SNE has become popular for representing single-cell RNAseq data, but it is also one of the more complex visualization methods to understand. Although we didn’t discuss this in class, I wanted to include a link to a great blog post describing t-SNE, as well as the original t-SNE paper. Please familiarize yourself with these if you plan on using this visualization method.

  • Hadley Wickham (author of the dplyr, reshape2 and ggplot2 packages) has a nice pre-print paper on the key aspects of making ‘Tidy Data’.

  • This dplyr cheatsheet is a useful resource to have on hand.

  • Although I love working with graphs in R, sometimes I just can’t figure out how to produce the final graphic exactly the way I want it. So, I also really like the program DataGraph. It’s an incredibly powerful graphing program, and very inexpensive!
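
The ‘Tidy Data’ idea mentioned above boils down to giving each variable its own column and each observation its own row. As a rough illustration, the sketch below reshapes a small "wide" expression table (one column per sample) into the "long" layout that plotting and summarizing tools generally expect. It is written in plain Python purely to show the idea; the gene names, sample names and the helper `to_long` are all hypothetical, and in R you would reach for the packages listed above instead.

```python
# Hypothetical "wide" table: one row per gene, one column per sample.
wide = [
    {"gene": "geneA", "ctrl_1": 10, "ctrl_2": 12, "treated_1": 30},
    {"gene": "geneB", "ctrl_1": 5, "ctrl_2": 7, "treated_1": 2},
]

def to_long(rows, id_col="gene"):
    """Return one record per (gene, sample) pair: the 'tidy' layout."""
    long_rows = []
    for row in rows:
        for col, value in row.items():
            if col == id_col:
                continue  # the identifier is repeated on every output row
            long_rows.append(
                {id_col: row[id_col], "sample": col, "expression": value}
            )
    return long_rows

tidy = to_long(wide)
for record in tidy:
    print(record)
```

In the long layout, "sample" and "expression" are ordinary columns, so grouping by sample or mapping sample to a plot aesthetic no longer requires knowing the column names in advance.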

Differential Gene Expression

Functional Enrichment Analysis

Producing dynamic reports with Rmarkdown

Deploying your data to the web