Single cell RNA-seq – principles and processing

Overview

Now that you’re comfortable with bulk RNA-seq data analysis, we’ll shift our focus to the rapidly developing landscape of single cell RNA-seq (scRNA-seq). In this lecture, you’ll learn about the underlying technology and we’ll demonstrate how to process raw single cell data directly on your laptop (!) for importing into R/bioconductor.

Learning objectives

Understand droplet-based scRNA-seq technology
Be able to compare and contrast single cell and bulk RNA-seq methods
Understand cost and experimental design considerations for scRNA-seq experiments.
Familiarity with multiplexed single cell assays (CITE-seq, ‘multiome’, TEA-seq)
Be able to define common terms and concepts in single cell genomics
Use Kallisto-BUStools to preprocess raw scRNA-seq data (via kb-python)

What you need to do

Download raw files. You will need about 5Gb of storage space on your harddrive to accomodate this download. please do not uncompress these files (leave them as .gz files). This is data from 1000 peripheral blood mononuclear cells (PBMCs) and is one of the sample datasets provided by 10X Genomics here. I merged the separate lane files to make this simpler to work with for the course.

Human transcriptome reference index file - this is the index you created using Kallisto way back in lecture 2. If you don’t have this, remember it’s easy to create using kallisto index.

t2g.txt - this is a human transcript-to-gene mapping file that we will use with Kallisto-Bustools to preprocess our data. This file is easy to generate with kb ref, but downloading it now will save you some time.

kb-python - You will need to have this software installed in a Conda environment on your laptop. We did this way back in lecture 1. If you are unable to install or use kb-python, just follow along with the lecture so you understand the concepts.

Lecture videos

Part 1 – Intro to single cell RNA-seq (scRNA-seq)

Part 2 – Practical considerations for single cell experiments

Part 3 – Pre-processing scRNA-seq data using Kallisto-bustools

Reading

Modular and efficient pre-processing of single-cell RNA-seq - describes the full Kallisto-Bustools workflow for memory efficient processing of scRNA-seq data.

The barcode, UMI, set format and BUStools, Bioinformatics - Describes the BUS format as an efficient and platform-independent way to store information from scRNA-seq data.

A curated database reveals trends in single-cell transcriptomics - describes the growing collection of scRNA-seq experiments found here, which I used to produce two of the plots in the slides for this lecture.

Acknowledgements

I am deeply grateful to Dr. Eoin Whelan for hours of conversation that helped structure this lecture and code.