Analysis of scRNA-seq data using R

There are no lecture slides for this class. We’ll spend the entire time working through the scRNA-seq script.

Overview

Now that your data have been preprocessed with Kallisto-Bustools, you’re ready to import them into R and use a variety of packages to filter, plot, and analyze them.

Learning objectives

Be able to import preprocessed data into R and create a Seurat object
Carry out filtering using DropletUtils
Generate a Quality Control report of your scRNA-seq data directly within R
Use standard QC metrics and plots to filter your data
Generate clusters and visualize via UMAP dimensional reduction
Find cluster-specific marker genes with Seurat
Annotate unknown clusters using public databases, CellAssign, and SingleR
Integrate multiple samples and use sample details to analyze integrated data

Code and files

pre-processed data for 1000 PBMCs - You only need to download this if you were unable to use kb-python in the last lecture to process raw scRNA-seq data. This ensures that everyone can follow along with this lecture, regardless of whether you were able to install or use kb-python.

DIY_scRNAseq_basic.R - this is the R script that we’ll use for this lecture.

functions.R - this file contains the custom R function we’ll use to generate a QC report from our scRNA-seq data (see Reading material below for source).

Seurat objects - this folder contains two Seurat objects from an unpublished mouse experiment (courtesy of Chris Hunter’s lab). One sample is from a naive control mouse, while the second is from a mouse infected with the protozoan parasite Toxoplasma gondii (14 days post-infection). We’ll use these data in the second half of the lecture to practice integration and differential gene testing between conditions.

Lecture videos

Note: the lecture videos below are no longer up to date with the most current script. Updated videos are currently in production. Check back soon!

Part 1 – Importing scRNA-seq data into R and carrying out basic QA analysis.

Part 2 – Dimensional reduction with UMAP, and cluster identification

Part 3 – Integration of multiple samples and working with sample metadata

Reading

EmptyDrops: distinguishing cells from empty droplets in droplet-based single-cell RNA sequencing data - This is the paper describing the DropletUtils package that we use in this lecture to identify empty drops.

Sarah Ennis’ GitHub repo for preprocessing scRNA-seq data - This is the source of the custom script we use to generate the CellRanger-esque html QC report.

Probabilistic cell-type assignment of single-cell RNA-seq for tumor microenvironment profiling - Describes the CellAssign algorithm and R package that we use to identify clusters.

Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage - Describes the SingleR and celldex packages that allow us to leverage bulk RNA-seq data in public repositories to curate clusters in our scRNA-seq data.

Comprehensive Integration of Single-Cell Data - This 2019 paper describes the underlying statistical approach for data integration in Seurat.

Acknowledgements

I owe a big THANK YOU to Lindsey Shallberg and Dr. Chris Hunter for their willingness to share scRNA-seq data from their experiment in Toxoplasma-infected mice, which is used at the end of this lecture to illustrate data integration and downstream analysis.