Table of Contents

Introduction to RNAseq data and technology

Aug 29th — After some introductions and a brief overview of the course, we’ll spend the opening lecture talking about Illumina’s ‘Sequencing by Synthesis’ technology, and walk through the steps involved in planning and budgeting for a transcriptomics experiment. After this lecture, jargon like flow cell, single-end, paired-end, indexing, cluster density, library, mRNAseq, total transcriptome, ribo-depletion, scRNAseq, and fastq will practically roll off your tongue. Lecture slides available here.

Setting up your software environment

Sept 5th — This class will meet in Hill 301. In the first half of this lecture we’ll discuss the open-source, cross-platform software that will be used throughout the course, and each student will set up their own laptop to be a powerful, stand-alone bioinformatics workstation. In preparation for this class, you’ll need to download and install the R Programming Language for your operating system, the graphical user interface for R, called RStudio, and the powerful text editor Sublime. Don’t know what any of these things are? Not to worry, we’ll delve into these tools the next time we meet. Lecture slides available here.

Read mapping with Kallisto

Sept 12th and 19th — This class will meet in Hill 301. I know what you’re thinking: “please stop talking already!” Wish granted. In this class we’ll finally get down to the business of installing and using Kallisto, a new program for memory-efficient mapping of your raw reads to a reference transcriptome. You’ll carry out this mapping in class, right on your laptop. While reads are being mapped, we’ll discuss what’s happening ‘under the hood’ with Kallisto and how this compares to more traditional alignment methods. Lecture slides available here.

Measuring gene expression

Sept 26th — Now that we’ve aligned our reads, it’s a good time to discuss the units we need to use to measure gene expression. We’ll talk about the differences between RPKM and TPM, and how these units relate to basic properties of your experiment. We’ll also discuss how these units have to be handled between samples (a.k.a. normalization). To conclude this class, we’ll read our Kallisto data into the R environment. Lecture slides available here.
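
To make the difference between the two units concrete, here is a minimal sketch written in Python purely for illustration (the course itself works in R). The function names and the toy counts and lengths are hypothetical, and annotated transcript length is used for simplicity, whereas tools such as Kallisto work with an effective length. The point to notice is the order of operations: RPKM scales by sequencing depth, while TPM normalizes by length first and then rescales so that every sample sums to one million.

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase of transcript per million mapped reads."""
    total_reads = sum(counts)
    return [
        c / (length / 1_000) / (total_reads / 1_000_000)
        for c, length in zip(counts, lengths_bp)
    ]


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then scale to 1e6."""
    rates = [c / (length / 1_000) for c, length in zip(counts, lengths_bp)]
    scale = sum(rates) / 1_000_000
    return [r / scale for r in rates]


# Hypothetical toy data: three transcripts measured in one sample.
counts = [500, 500, 1000]          # reads assigned to each transcript
lengths_bp = [1000, 2000, 4000]    # transcript lengths in base pairs

rpkm_values = rpkm(counts, lengths_bp)
tpm_values = tpm(counts, lengths_bp)

print("RPKM:", rpkm_values, "sum =", sum(rpkm_values))
print("TPM: ", tpm_values, "sum =", sum(tpm_values))
```

Because TPM values always sum to the same total within a sample, they are proportions of the transcript pool; RPKM totals differ from sample to sample, which is one reason the two units behave differently when samples are compared.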

Exploratory analysis of expression data

Oct 3rd — This class will meet in Hill 301. Although we’re all here to find key genes and transcripts involved in our favorite biological process, it’s critical to first take a gene-agnostic approach to explore the structure of our entire dataset. To do this we’ll use Principal Component Analysis (PCA) to reduce the dimensionality of our data and try to identify the variables (sex, age, treatment, etc.) that have the strongest influence over the transcriptional landscape in your study. We’ll discuss common missteps and sources of variance in transcriptional data sets. You’ll also be introduced to the popular graphing package ggplot2, to graph the results of your PCA. Lecture slides available here.
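
As a rough illustration of what PCA does to an expression table, the sketch below finds only the first principal component, using power iteration on the gene-by-gene covariance matrix. It is written in plain Python for illustration (the course itself works in R), the function name and the tiny four-sample dataset are hypothetical, and a real analysis would use a full PCA routine that returns every component.

```python
import math


def first_principal_component(samples, n_iter=200):
    """Return (loadings, scores, explained_fraction) for PC1.

    samples: list of rows, one row per sample, one column per gene.
    """
    n_samples = len(samples)
    n_genes = len(samples[0])

    # Center each gene (column) on its mean across samples.
    means = [sum(row[j] for row in samples) / n_samples for j in range(n_genes)]
    centered = [[row[j] - means[j] for j in range(n_genes)] for row in samples]

    # Gene-by-gene sample covariance matrix.
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n_samples - 1) for j in range(n_genes)]
        for i in range(n_genes)
    ]

    def cov_times(vec):
        return [sum(cov[i][j] * vec[j] for j in range(n_genes)) for i in range(n_genes)]

    # Power iteration: multiply by the covariance matrix and renormalize
    # until the vector settles on the direction of greatest variance.
    vec = [float(j + 1) for j in range(n_genes)]
    for _ in range(n_iter):
        new = cov_times(vec)
        norm = math.sqrt(sum(x * x for x in new))
        vec = [x / norm for x in new]

    # Project each centered sample onto PC1 to get its score.
    scores = [sum(r[j] * vec[j] for j in range(n_genes)) for r in centered]
    eigenvalue = sum(v * c for v, c in zip(vec, cov_times(vec)))
    total_variance = sum(cov[i][i] for i in range(n_genes))
    return vec, scores, eigenvalue / total_variance


# Hypothetical toy data: two control and two treated samples, three genes.
samples = [
    [1.0, 5.0, 3.0],  # control 1
    [1.2, 5.1, 2.9],  # control 2
    [4.0, 2.0, 3.1],  # treated 1
    [4.1, 1.9, 3.0],  # treated 2
]

loadings, scores, explained = first_principal_component(samples)
print("PC1 scores:", scores)
print("Fraction of variance explained by PC1:", explained)
```

In this toy example the control and treated samples land on opposite sides of zero along PC1, which is the kind of pattern you look for when asking which experimental variable dominates a dataset. The sign of a principal component is arbitrary, so only the separation matters, not which group is positive.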

Managing and tidying data in R

Oct 10th — While Excel might be great for small spreadsheets, it fails miserably at managing large datasets. In this workshop we’ll use Hadley Wickham’s dplyr package to take control over our dataframes, allowing us to change, sort, filter and arrange large data sets quickly and easily using simple commands in R.

Hackdash #1

Oct 17th — You may have heard of a Hackathon before - an event that typically lasts several days and brings together people for a collaborative challenge in computer programming. Well, we don’t quite have time for a full Hackathon, so we’ll do a mad dash to the finish line instead. The class will be broken up into small groups. You’ll be emailed a problem at the start of class and each group will have two hours to come up with a solution using the tools and skills you’ve acquired in the course thus far. This is just in the spirit of learning and having fun. No grades will be given, but the first team to post the correct answer on Slack will win a prize. Good luck!

Identifying differentially expressed transcripts

Oct 24th — The ultimate goal of most transcriptional profiling experiments is to identify differentially expressed genes or transcripts. We’ll use the program Sleuth, which, like Kallisto, is a product of Lior Pachter’s lab. Because Sleuth is relatively new to the game, we’ll also use the popular and venerable Limma package in R. This gives us a chance to compare and contrast these two methods for identifying differentially expressed transcripts (DETs). We’ll also have a chance to talk about special cases when your analyses should include a paired design or correct for batch effects.

Visualizing and dissecting DETs

TBD — How do you move from spreadsheet to informative data visualization? Lists of differentially expressed transcripts often include different patterns or modules of genes that are coordinately regulated across treatments or conditions, and these patterns can provide powerful insight into biology. In this class you’ll use hierarchical clustering and heatmap visualization to interrogate DETs to reveal modules of co-regulated transcripts.

Hackdash #2

TBD — Rules and teams will be the same as in the first Hackdash, but the problem will be different.

Understanding and leveraging Gene Ontology

Nov 14th — Now that you’ve identified differentially expressed genes, what do they mean and how do you begin to elucidate the biological pathways governed by these genes? Toward this end, you will learn how to carry out functional enrichment analyses using Gene Ontology, a structured vocabulary that describes the biological processes and molecular functions carried out by a gene. You’ll also see some different options for how to represent your analyses in papers. Lecture slides available here.
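
One common way to score functional enrichment is a one-sided hypergeometric test: given how many genes in the whole experiment carry a particular term, how surprising is the number that show up in your list? The sketch below shows that calculation in plain Python for illustration (the course itself works in R). The function name and the gene counts are hypothetical, and this is only one of several statistics that enrichment tools use.

```python
from math import comb


def enrichment_p_value(n_universe, n_in_term, n_selected, n_overlap):
    """Probability of seeing at least n_overlap term genes by chance.

    n_universe: all genes measured in the experiment
    n_in_term:  genes in the universe annotated with the term
    n_selected: genes in your differentially expressed list
    n_overlap:  genes in your list that carry the term
    """
    max_overlap = min(n_in_term, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_in_term, k) * comb(n_universe - n_in_term, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical toy numbers: 20 genes measured, 5 annotated with the term,
# 4 differentially expressed, and 3 of those 4 carry the term.
p_value = enrichment_p_value(n_universe=20, n_in_term=5, n_selected=4, n_overlap=3)
print("Enrichment p-value:", p_value)
```

When many terms are tested at once, as is usual with Gene Ontology, the resulting p-values also need a multiple-testing correction before they are interpreted.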

Gene Set Enrichment Analysis (GSEA)

Nov 21st — We’ll spend the next two classes learning the principles and practice of using GSEA in conjunction with large collections of gene signatures available through the Broad Institute’s Molecular Signatures Database to discover pathways involved in your biological system. In the first class we’ll use the GSEA program from the Broad Institute to carry out our analysis. In the second class we’ll run GSEA through R.
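
The core idea behind GSEA is a running sum: walk down a list of genes ranked by differential expression, step up whenever a gene belongs to the signature, step down otherwise, and record the largest deviation from zero. The sketch below is a deliberately simplified, unweighted version of that statistic, written in plain Python for illustration (the course itself works in R). The function name and gene names are hypothetical, and the published method additionally weights each step by the gene’s ranking metric and assesses significance by permutation, neither of which is shown here.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score for one gene set.

    ranked_genes: gene names ordered from most up- to most down-regulated
    gene_set:     gene names that make up the signature
    """
    members = set(gene_set)
    n_hits = sum(1 for gene in ranked_genes if gene in members)
    n_misses = len(ranked_genes) - n_hits
    if n_hits == 0 or n_misses == 0:
        raise ValueError("gene set must overlap the ranked list only partially")

    # Step sizes are chosen so the running sum starts and ends at zero.
    hit_step = 1 / n_hits
    miss_step = 1 / n_misses

    running = 0.0
    best = 0.0
    for gene in ranked_genes:
        running += hit_step if gene in members else -miss_step
        if abs(running) > abs(best):
            best = running  # keep the largest deviation, with its sign
    return best


# Hypothetical ranked list of eight genes and a three-gene signature.
ranked_genes = ["geneA", "geneB", "geneC", "geneD", "geneE", "geneF", "geneG", "geneH"]
signature = ["geneA", "geneB", "geneD"]

score = enrichment_score(ranked_genes, signature)
print("Enrichment score:", score)
```

A positive score means the signature is concentrated near the top of the ranking, and a negative score means it is concentrated near the bottom.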

Hackdash #3

Nov 28th — In this third and final Hackdash, expect your most challenging problem!

Making your analysis transparent and reproducible

Dec 5th and 12th — At this point, you are in a situation many bioinformatics folks find themselves in: you have a folder that contains some raw data, an analysis script, and outputs that include graphs, tables and other images. Unfortunately, this trail of digital breadcrumbs is hard for anyone to follow (even your future self!). In order to make your analysis pipeline transparent and reproducible, in the next two classes you’ll use R Markdown and Knitr to wrap all these elements together in a dynamic document that can be placed in your lab notebook or published as a supplementary file in your manuscript. Lecture slides available here.

Final exam

Dec 19th — During this in-class exam, I’ll test your knowledge using a fun but challenging Jeopardy-style Q&A.