Welcome to DIY Transcriptomics

A full course covering best practices for RNAseq data analysis, with a primary focus on empowering students to work independently with lightweight, open-source software and the R/Bioconductor environment.

Pre-course preparation

Review prior to course start

Although there are no formal prerequisites for the course and no official textbook, here you'll find some introductory reading material and tips on how best to prepare in advance so you get the most from the course material.

Intro to RNAseq technology and data

Lecture 1 • 2019.01.23 • Hill 130 • 3-5pm

After a brief overview of the course, we'll spend the opening lecture talking about Illumina's 'Sequencing by Synthesis' technology, and walk through the steps involved in planning for a transcriptomics experiment.

Setting up your software environment

Lecture 2 • 2019.01.30 • Hill 131 • 3-5pm

In the first half of this lecture we'll discuss the open-source, cross-platform R/Bioconductor software and environment that will be used throughout the course, and each student will set up their own laptop as a powerful, stand-alone bioinformatics workstation.

Ultra-fast read mapping with Kallisto

Lecture 3 • 2019.02.06 • Hill 130 • 3-5pm

In this class we'll finally get down to the business of using Kallisto for fast, memory-efficient mapping of your raw reads. You'll carry out this mapping in class, right on your laptop, while we discuss what's happening under the hood. Along the way, we'll touch on a range of topics, from reference files to command-line basics and using shell scripts for automation and reproducibility.
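To make "under the hood" a little more concrete, here is a toy sketch of the k-mer idea behind pseudoalignment. It is written in Python purely for illustration. Kallisto itself is a compiled command-line tool that does much more, such as indexing the transcriptome as a de Bruijn graph and handling reverse complements. The transcript names and sequences are hypothetical, and k is set to 5 only so the example fits on screen. Each read is assigned the set of transcripts that contain every one of its k-mers. In this sketch, k-mers missing from the index are simply skipped.

```python
from collections import defaultdict

# Hypothetical toy transcriptome; names and sequences are made up for illustration.
TRANSCRIPTS = {
    "tx1": "ACGTACGTGACCTGA",
    "tx2": "ACGTACGTTTCCTGA",
    "tx3": "GGGTACGTGACCTGA",
}
K = 5  # toy k-mer length chosen for readability; kallisto's default is 31


def kmers(seq, k):
    """Yield every overlapping k-mer in seq (forward strand only in this toy)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(transcripts, k):
    """Map each k-mer to the set of transcripts that contain it."""
    index = defaultdict(set)
    for name, seq in transcripts.items():
        for km in kmers(seq, k):
            index[km].add(name)
    return index


def pseudoalign(read, index, k):
    """Return the transcripts compatible with every indexed k-mer in the read.

    In this sketch, k-mers absent from the index are skipped; an empty set
    means no transcript is compatible with the read.
    """
    compatible = None
    for km in kmers(read, k):
        hits = index.get(km)
        if hits is None:
            continue
        compatible = set(hits) if compatible is None else compatible & hits
    return compatible or set()


index = build_index(TRANSCRIPTS, K)
print(sorted(pseudoalign("TACGTGACC", index, K)))  # ['tx1', 'tx3']
```

The read above comes from a region shared by tx1 and tx3, so it is reported as compatible with both rather than forced onto one transcript. Kallisto resolves that kind of ambiguity later, in its EM-based quantification step.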

Understanding RNAseq count data

Lecture 4 • 2019.02.13 • Hill 131 • 3-5pm

Now that we've mapped our reads, it's time to talk about units for measuring gene expression. We'll cover the differences between RPKM and TPM, and how these units relate to basic properties of your reference file and data. We'll also discuss normalization within and between samples. To conclude this class, we'll fire up RStudio and take a look at our first script.
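As a worked illustration of the two units, the sketch below computes RPKM and TPM from the same read counts. It assumes raw transcript lengths in base pairs. Tools such as Kallisto compute TPM from an effective length that accounts for fragment length, so their numbers will differ. The counts and lengths are made up. The key contrast is that RPKM scales by total reads and length, so its column sums vary from sample to sample. TPM normalizes by length first and then rescales so that every sample sums to one million.

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase of transcript per million mapped reads."""
    total = sum(counts)
    # count / (length in kb) / (total reads in millions) == count * 1e9 / (length * total)
    return [c * 1e9 / (length * total) for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: normalize by length first, then rescale to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    total_rate = sum(rates)
    return [r / total_rate * 1e6 for r in rates]


# Hypothetical counts and transcript lengths for one sample.
counts = [100, 200, 300]
lengths_bp = [1000, 2000, 500]

print(rpkm(counts, lengths_bp))  # these RPKM values sum to about 1.33 million
print(tpm(counts, lengths_bp))   # TPM values always sum to one million
```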

Starting your R workflow

Lecture 5 • 2019.02.20 • Hill 131 • 3-5pm

We'll begin this class by reviewing how to access R packages and help documentation, as well as the basic structure of a script. We'll then access annotation data and read our Kallisto results into R. We'll conclude by discussing study design and using Sleuth for differential transcript expression analysis.

Wrangling gene expression data

Lecture 6 • 2019.02.27 • Hill 130 • 3-5pm

After filtering and normalizing our data in R, we'll apply the 'grammar of graphics' and the principles of 'tidy data' to plot, change, sort, filter, arrange and summarize expression data quickly and easily.
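As a rough illustration of "tidy data," the sketch below reshapes a wide gene-by-sample table into long form, with one record per gene-sample observation. It then summarizes that long table per group. These are the same two steps the tidyverse handles in R. The sketch is plain Python for illustration only. The gene names, sample names, values, and the convention that a sample's group is the text before the underscore are all hypothetical.

```python
from statistics import mean

# Hypothetical wide table: one entry per gene, one column per sample.
wide = {
    "geneA": {"ctrl_1": 10.0, "ctrl_2": 12.0, "trt_1": 30.0, "trt_2": 28.0},
    "geneB": {"ctrl_1": 5.0, "ctrl_2": 7.0, "trt_1": 6.0, "trt_2": 4.0},
}


def to_long(wide_table):
    """Reshape to tidy long form: one record per gene-sample observation.

    The 'group' column is derived from the sample name prefix (a hypothetical
    naming convention used only for this example).
    """
    return [
        {"gene": gene, "sample": sample, "group": sample.split("_")[0], "expression": value}
        for gene, samples in wide_table.items()
        for sample, value in samples.items()
    ]


def summarize(long_rows):
    """Mean expression per (gene, group), analogous to grouping then summarizing."""
    buckets = {}
    for row in long_rows:
        buckets.setdefault((row["gene"], row["group"]), []).append(row["expression"])
    return {key: mean(values) for key, values in buckets.items()}


long_rows = to_long(wide)
print(len(long_rows))        # 8 observations: 2 genes x 4 samples
print(summarize(long_rows))
```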

Multivariate analysis

Lecture 7 • 2019.03.06 • Hill 130 • 3-5pm

In this class we’ll use multivariate statistical methods, including Principal Component Analysis (PCA), to explore how experimental covariates contribute to variance in our data. We’ll discuss common missteps and sources of variance in transcriptional data sets. You’ll also continue to build your plotting skills by using ggplot2 to graph the results of PCA.
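For intuition about what PCA computes, here is a minimal NumPy sketch. It centers each gene, takes a singular value decomposition, and returns the sample scores plus the fraction of variance on each component. R's prcomp() likewise works from an SVD of the centered (and optionally scaled) data. This is not the course's code, however. The small log-expression matrix is made up: two hypothetical groups of two samples that differ mainly in the first two genes.

```python
import numpy as np


def pca(expr):
    """PCA of a samples x genes matrix via SVD of the column-centered data.

    Returns (scores, fraction_of_variance); scores has one row per sample
    and one column per principal component.
    """
    X = np.asarray(expr, dtype=float)
    Xc = X - X.mean(axis=0)                # center each gene (column)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U * S                         # sample coordinates on each PC
    var = S ** 2                           # variance per PC, up to a constant factor
    return scores, var / var.sum()


# Hypothetical log-expression values: rows are samples, columns are genes.
expr = [
    [1.0, 2.0, 3.0],  # group 1, replicate 1
    [1.1, 2.1, 2.9],  # group 1, replicate 2
    [5.0, 0.5, 3.0],  # group 2, replicate 1
    [5.2, 0.4, 3.1],  # group 2, replicate 2
]

scores, frac = pca(expr)
print(scores[:, 0])  # PC1 separates the two groups (the sign itself is arbitrary)
print(frac)
```

The sign of each component is arbitrary in any SVD-based PCA, so only the relative positions of samples along a PC are meaningful.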

Accessing public data

Lecture 8 • 2019.03.13 • Hill 131 • 3-5pm

In this class we’ll discuss how you can use R/Bioconductor to tap into vast amounts of RNAseq data available through the Sequence Read Archive (SRA) and Gene Expression Omnibus (GEO).

Hackdash I

Lecture 9 • 2019.03.18 • Hill 130 • 3-5pm

You may have heard of a hackathon: an event that typically lasts several days and brings people together for a collaborative challenge in computer programming. We don't quite have time for a full hackathon, so we'll do a mad dash to the finish line instead. The class will be broken up into small groups. No grades will be given, but fun will be had by all!

Differential gene expression

Lecture 10 • 2019.03.20 • Hill 131 • 3-5pm

The ultimate goal of most transcriptional profiling experiments is to identify differentially expressed genes or transcripts. In this class, we'll dig into differential expression using the popular and venerable Limma package in R. We'll also continue to explore options for producing compelling plots from your differential expression results.

Module identification

Lecture 11 • 2019.04.03 • Hill 130 • 3-5pm

Coordinately expressed genes, or modules, often share a more coherent functional signature than individual genes. We'll discuss strategies for clustering expression data to identify these modules, setting the stage for the downstream functional enrichment analysis we'll cover in the next class.
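One simple way to see how modules fall out of clustering is to treat 1 - r (Pearson correlation) as the distance between two genes. Any pair whose correlation is at least a chosen threshold is then joined. The connected groups that result are exactly the clusters you would get by cutting a single-linkage hierarchical tree at height 1 - threshold. This is a standard-library Python sketch. The 0.9 threshold and the toy expression profiles are assumptions. In practice, other linkage methods (for example, average or complete) are often preferred, because single linkage tends to chain loosely related genes together.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either profile has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den if den else 0.0


def modules(expr, min_r=0.9):
    """Single-linkage modules on distance 1 - r, cut at height 1 - min_r.

    Uses union-find: any gene pair with r >= min_r is merged into one module.
    """
    genes = list(expr)
    parent = {g: g for g in genes}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            if pearson(expr[a], expr[b]) >= min_r:
                parent[find(a)] = find(b)

    groups = {}
    for g in genes:
        groups.setdefault(find(g), []).append(g)
    return sorted(groups.values())


# Hypothetical expression profiles across four samples.
expr = {
    "geneA": [1, 2, 3, 4],
    "geneB": [2, 4, 6, 8],  # perfectly correlated with geneA
    "geneC": [4, 3, 2, 1],  # anti-correlated with geneA
    "geneD": [8, 6, 4, 2],  # perfectly correlated with geneC
}
print(modules(expr))  # [['geneA', 'geneB'], ['geneC', 'geneD']]
```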

Functional enrichment analysis

Lecture 12 • 2019.04.10 • Hill 130 • 3-5pm

Now that you've identified differentially expressed genes, what do they mean, and how do you begin to elucidate the biological pathways governed by these genes? To address this question, in this class you'll learn how to carry out functional enrichment analyses using Gene Ontology and Gene Set Enrichment methods. You'll also explore different options for presenting your functional enrichment results.
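Many Gene Ontology over-representation analyses rest on a one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test. Suppose you have a background of N genes, K of which carry a given GO term. How surprising is it to see k or more annotated genes among your n differentially expressed genes? The sketch below computes that p-value directly. All numbers are hypothetical, and math.comb requires Python 3.8 or later. Gene Set Enrichment Analysis (GSEA) works differently: it uses a ranked list of all genes rather than a cutoff. Any real analysis that tests many terms also needs a multiple-testing correction.

```python
from math import comb


def enrichment_p(k, n, K, N):
    """One-sided hypergeometric p-value, P(X >= k).

    N: genes in the background universe
    K: background genes annotated to the term
    n: genes in the differentially expressed list
    k: differentially expressed genes annotated to the term
    """
    total = comb(N, n)
    # Sum the probability of observing i annotated genes for every i >= k.
    upper = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return upper / total


# Hypothetical example: 10 background genes, 3 carry the term,
# 2 genes are differentially expressed and both carry the term.
print(enrichment_p(k=2, n=2, K=3, N=10))  # 3/45, about 0.067
```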

Hackdash II

Lecture 13 • 2019.04.17 • Hill 130 • 3-5pm

In this second and final Hackdash, expect your most challenging problem yet: one that incorporates both differential gene expression analysis and downstream functional enrichment analysis.

Making your analysis transparent and reproducible

Lecture 14 • 2019.04.24 • Hill 130 • 3-5pm

To make your analysis pipeline transparent and reproducible, in this class you'll use R Markdown and knitr to combine all of your code and outputs in a dynamic document that can be placed in your lab notebook or published as a supplementary file with your manuscript.

Final exam

Lecture 15 • 2019.05.01 • Hill 130 • 3-5pm

During this in-class exam, I'll test your knowledge with a fun but challenging Jeopardy-style Q&A (and we'll eat pizza!).