Course overview
precourse lectures
A brief (but important!) overview that describes how the course is structured and how you can get the most from the material on this website.
A semester-long course covering best practices for the analysis of high-throughput sequencing data from gene expression (RNA-seq) studies, with a primary focus on empowering students to be independent in the use of lightweight and open-source software using the R programming language and the Bioconductor suite of packages. This course follows a hybrid format in which online lectures are paired with in-person labs where students participate in hands-on, live coding exercises using real ‘omic datasets. The course is focused on datasets and topics central to infectious disease research, immunology, and One-Health, but the concepts and approaches covered are applicable to any genomic study.
precourse lectures
Learn all about Illumina's 'Sequencing by Synthesis' technology, and the steps involved in planning for a transcriptomics experiment.
Lecture 1 • watch by January 24, 2024
In the first half of this lecture we'll discuss the open-source, cross-platform R/Bioconductor software that we will use throughout the course. Then each student will set up their own laptop to be a powerful, stand-alone bioinformatics workstation.
Lecture 2 • watch by January 31, 2024
In this class we'll finally get down to the business of using Kallisto for memory-efficient mapping of your raw reads. You'll carry out this mapping in class, right on your laptop, while we discuss what's happening under the hood. During this process, we'll touch on a range of topics, from reference files to command-line basics to shell scripts for automation and reproducibility.
Lecture 3 • watch by February 7, 2024
Now that we've aligned our reads, it's time to discuss units for measuring gene expression. We'll discuss differences between RPKM and TPM, and how these units relate to basic properties of your reference file and data. We'll also discuss normalization within and between samples. To conclude this class, we'll fire up RStudio and take a look at our first script.
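To make the RPKM versus TPM distinction concrete, here is a minimal sketch of the arithmetic behind each unit. The course itself works in R; Python is used here only for illustration. The gene names, counts, and lengths are made up, and the function names are hypothetical rather than taken from any package.

```python
# Toy data for a single sample (all values are invented for illustration).
toy_counts = {"geneA": 100, "geneB": 200, "geneC": 700}      # reads assigned to each gene
toy_lengths_kb = {"geneA": 1.0, "geneB": 4.0, "geneC": 7.0}  # gene length in kilobases


def rpkm(counts, lengths_kb):
    """Reads per kilobase per million: scale by sequencing depth, then by length."""
    per_million = sum(counts.values()) / 1e6
    return {g: counts[g] / per_million / lengths_kb[g] for g in counts}


def tpm(counts, lengths_kb):
    """Transcripts per million: scale by length first, then rescale to sum to 1e6."""
    rate = {g: counts[g] / lengths_kb[g] for g in counts}
    total_rate = sum(rate.values())
    return {g: rate[g] / total_rate * 1e6 for g in counts}


rpkm_values = rpkm(toy_counts, toy_lengths_kb)
tpm_values = tpm(toy_counts, toy_lengths_kb)

print("RPKM:", rpkm_values, "sum =", sum(rpkm_values.values()))
print("TPM: ", tpm_values, "sum =", sum(tpm_values.values()))
```

The two units use the same ingredients in a different order. Because TPM rescales last, TPM values in every sample sum to one million, whereas the sum of RPKM values depends on the sample.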
Lecture 4 • watch by February 14, 2024
We'll begin this class by reviewing how to access R packages and help documentation, as well as understanding the basic structure of an R script and RStudio project. We'll then access annotation data before reading our Kallisto results into R.
Lecture 5 • watch by February 21, 2024
We'll begin this class by filtering and normalizing our data, all while using the ggplot2 graphing package to visualize the impact these changes have on our data. You'll also be introduced to Hadley Wickham's philosophy of 'tidy data' by using the dplyr package, expanding your understanding of tools within the Tidyverse.
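As a rough illustration of what filtering can look like, the sketch below converts raw counts to counts per million (CPM) and keeps only genes that exceed a CPM cutoff in a minimum number of samples. This is one common filtering rule and not necessarily the one used in class. The course works in R; Python is used here only for illustration. The counts, library sizes, thresholds, and function names are all hypothetical.

```python
# Hypothetical raw counts: one row per gene, one value per sample.
raw_counts = {
    "geneA": [500, 620, 480],
    "geneB": [0, 1, 0],
    "geneC": [30, 0, 45],
}
# Assumed total reads per sample (a real table would have thousands of genes).
library_sizes = [2_000_000, 2_500_000, 1_800_000]


def counts_per_million(counts, sizes):
    """Divide each count by its sample's library size and scale to one million."""
    return {
        gene: [c / size * 1e6 for c, size in zip(row, sizes)]
        for gene, row in counts.items()
    }


def filter_low_expression(cpm, min_cpm=1.0, min_samples=2):
    """Keep genes whose CPM exceeds min_cpm in at least min_samples samples."""
    return {
        gene: row
        for gene, row in cpm.items()
        if sum(value > min_cpm for value in row) >= min_samples
    }


cpm_table = counts_per_million(raw_counts, library_sizes)
kept = filter_low_expression(cpm_table)
print("genes kept:", sorted(kept))
```

Here geneB is dropped because it never exceeds 1 CPM, while geneC survives because it passes the cutoff in two of three samples.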
Lecture 6 • watch by February 28, 2024
In this class you'll learn about a variety of approaches to exploring your data. You'll use multivariate statistical approaches such as Principal Component Analysis (PCA) to understand sources of variance in your data, while continuing to build your plotting skills by using ggplot2 to graph the results of PCA. You'll also learn how to use the dplyr package to take control of your gene expression dataframes, allowing you to change, sort, filter, arrange and summarize large data sets quickly and easily using simple commands in R. We’ll discuss common missteps and how to identify sources of bias in transcriptional data sets.
Lecture 7 • watch by March 13, 2024
In this class we’ll discuss how you can use R/Bioconductor to tap into vast amounts of RNA-seq data available through the Sequence Read Archive (SRA) and Gene Expression Omnibus (GEO).
Lecture 8 • watch by March 20, 2024
The ultimate goal of most transcriptional profiling experiments is to identify differentially expressed genes or transcripts. In this class, we'll dig into differential expression using the popular and venerable limma package in R, while continuing to explore options for producing compelling plots from your differential expression results. Finally, we'll discuss a workflow for going beyond differential gene expression (DGE) analysis to look at differential transcript (isoform) usage (DTU).
Lecture 9 • watch by March 27, 2024
Coordinately expressed genes, or modules, often contain a more coherent functional signature. We'll discuss strategies for clustering expression data to identify these modules, setting the stage for downstream functional enrichment analysis to be covered in the next class.
Lecture 10 • watch by April 3, 2024
Now that you've identified differentially expressed genes, what do they mean and how do you begin to elucidate the biological pathways governed by these genes? To address this question, in this class you'll learn how to carry out functional enrichment analyses using Gene Ontology and Gene Set Enrichment methods. You'll also explore different options for how to present your functional enrichment results.
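One widely used way to ask whether a gene set is enriched in a list of differentially expressed genes is an over-representation test based on the hypergeometric distribution. The sketch below shows that calculation only. It is an assumption that this matches the flavor of Gene Ontology analysis used in class, and it is not Gene Set Enrichment Analysis, which works on ranked lists. The course works in R; Python is used here only for illustration. The function name and all numbers are hypothetical.

```python
from math import comb


def overrepresentation_p(n_universe, n_in_set, n_selected, n_overlap):
    """Upper-tail hypergeometric p-value.

    Probability of seeing at least n_overlap gene-set members when n_selected
    genes are drawn without replacement from a universe of n_universe genes,
    n_in_set of which belong to the gene set.
    """
    max_overlap = min(n_in_set, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_in_set, k) * comb(n_universe - n_in_set, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Made-up scenario: 20,000 genes measured, 100 annotated to a pathway,
# 200 differentially expressed, 10 of which fall in the pathway.
p_value = overrepresentation_p(20_000, 100, 200, 10)
print(f"p = {p_value:.3g}")
```

By chance alone we would expect about one pathway gene among the 200 selected (200 × 100 / 20,000), so an overlap of 10 yields a very small p-value. In a real analysis many gene sets are tested at once, so the p-values would also need correction for multiple testing.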
Lecture 11 • watch by April 10, 2024
To make your analysis pipeline transparent, in this class you'll use R Markdown and knitr to wrap up all your code and outputs together in a dynamic document that can be placed in your lab notebook or published as a supplementary file with your manuscript.
Lecture 12 • watch by April 17, 2024
Reproducing an analysis requires more than just code. You need the original raw data, access to the appropriate programming languages, and application-specific packages (and often specific versions of these packages). This poses a major impediment to reproducibility, even for researchers with a background in bioinformatics. To address this challenge, you'll learn how to 'containerize' your data, scripts and software, making it easy to share and rerun an entire analysis with the push of a button.
Lecture 13 • watch by April 24, 2024
Now that you're comfortable with bulk RNA-seq data analysis, we'll shift our focus to the rapidly developing landscape of single-cell RNA-seq (scRNA-seq). In this lecture, you'll learn about the underlying technology and we'll demonstrate how to process raw single-cell data directly on your laptop (!) for importing into R/Bioconductor.
Lecture 14 • watch by May 1, 2024
In this lecture, you'll learn to use Seurat to analyze scRNA-seq data, including carrying out dimensional reduction and display using UMAP, identifying cell clusters and cluster-specific marker genes, and integrating data from multiple samples.
Lecture 15 • watch by May 8, 2024
In this final lecture of the course, you'll learn how to handle multi-omic single-cell data.