No lecture slides for this class. We’ll spend the entire time working through the scRNA-seq script.
Now that your data have been preprocessed with Kallisto-Bustools, you’re ready to import them into R and use a variety of packages to filter, plot, and analyze them.
- Be able to import preprocessed data into R and create a Seurat object
- Filter out empty droplets using DropletUtils
- Generate a Quality Control report of your scRNA-seq data directly within R
- Use standard QC metrics and plots to filter your data
- Generate clusters and visualize via UMAP dimensional reduction
- Find cluster-specific marker genes with Seurat
- Annotate unknown clusters using public databases, CellAssign, and SingleR
- Integrate multiple samples and use sample details to analyze integrated data
Code and files
pre-processed data for 1000 PBMCs - You only need to download this if you were unable to use kb-python in the last lecture to process the raw scRNA-seq data. This ensures that everyone can follow along with this lecture, regardless of whether you were able to install or use kb-python.
DIY_scRNAseq.R - this is the R script that we’ll use for this lecture.
functions.R - this is the custom R function we’ll use for generating a QC report with our scRNA-seq data (see Reading material below for source).
Seurat objects - this folder contains two Seurat objects from an unpublished mouse experiment (courtesy of Chris Hunter’s lab). One sample is from a naive control mouse, while the second is from a mouse infected with the protozoan parasite Toxoplasma gondii (14 days post-infection). We’ll use these data in the second half of the lecture to practice integration and differential gene expression testing between conditions.
Part 1 – Importing scRNA-seq data into R and carrying out basic QA analysis.
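A minimal sketch of what this part covers, assuming DropletUtils and Seurat are installed. To keep it runnable without the course files, it simulates an unfiltered count matrix in place of the Kallisto-Bustools output; the gene names, droplet numbers, and QC thresholds are illustrative assumptions, not the values used in DIY_scRNAseq.R.

```r
library(DropletUtils)  # emptyDrops()
library(Seurat)        # CreateSeuratObject(), PercentageFeatureSet()
library(Matrix)

set.seed(100)

# Simulated stand-in for an unfiltered gene x barcode matrix (assumption:
# real data would come from the Kallisto-Bustools output instead).
n_genes <- 500
gene_names <- c(paste0("MT-", 1:10), paste0("GENE", 1:490))
ambient_profile <- rexp(n_genes)
ambient_profile <- ambient_profile / sum(ambient_profile)
cell_profile <- rexp(n_genes)
cell_profile <- cell_profile / sum(cell_profile)

empty_totals <- rpois(3000, lambda = 20)                       # empty droplets
cell_totals <- round(rlnorm(200, meanlog = log(2000), sdlog = 0.4))  # real cells

# Expected count per gene and barcode, then Poisson sampling around it
expected <- cbind(outer(ambient_profile, empty_totals),
                  outer(cell_profile, cell_totals))
raw_counts <- Matrix(
  matrix(as.numeric(rpois(length(expected), expected)), nrow = n_genes),
  sparse = TRUE
)
dimnames(raw_counts) <- list(gene_names,
                             paste0("barcode", seq_len(ncol(raw_counts))))

# Test each barcode against the ambient RNA profile; barcodes with
# total counts <= lower are assumed to be empty and get NA statistics.
droplet_stats <- emptyDrops(raw_counts, lower = 100)
is_cell <- !is.na(droplet_stats$FDR) & droplet_stats$FDR <= 0.01
cell_counts <- raw_counts[, is_cell]

# Build the Seurat object and add a standard QC metric
sim <- CreateSeuratObject(counts = cell_counts, project = "simulated")
sim[["percent.mt"]] <- PercentageFeatureSet(sim, pattern = "^MT-")

# Inspect the QC distributions, then filter on illustrative thresholds
VlnPlot(sim, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3)
sim <- subset(sim, subset = nFeature_RNA > 200 & nCount_RNA < 5000 & percent.mt < 20)
```

With real data, the thresholds in the final `subset()` call should be chosen by looking at the violin plots rather than copied from this sketch.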
Part 2 – Dimensional reduction with UMAP, and cluster identification
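As a rough illustration of this part, the following sketch runs the standard Seurat steps from normalization through clustering, UMAP, and marker detection. It assumes Seurat is installed and uses a small simulated dataset with three populations so that it runs on its own; the number of variable features, the PCA dimensions, and the clustering resolution are illustrative assumptions rather than the settings from the lecture script.

```r
library(Seurat)
library(Matrix)

set.seed(42)

# Simulate 300 cells from three populations, each with 50 upregulated genes
n_genes <- 500
n_cells <- 300
gene_names <- paste0("GENE", seq_len(n_genes))
population <- rep(1:3, each = 100)

base_rate <- rgamma(n_genes, shape = 2, rate = 2)
mu <- matrix(rep(base_rate, n_cells), nrow = n_genes)   # genes x cells
for (p in 1:3) {
  marker_rows <- ((p - 1) * 50 + 1):(p * 50)
  mu[marker_rows, population == p] <- mu[marker_rows, population == p] * 5
}
counts <- Matrix(
  matrix(as.numeric(rpois(n_genes * n_cells, mu)), nrow = n_genes),
  sparse = TRUE
)
dimnames(counts) <- list(gene_names, paste0("cell", seq_len(n_cells)))

sim <- CreateSeuratObject(counts = counts, project = "simulated")

# Normalize, pick variable genes, scale, and reduce with PCA
sim <- NormalizeData(sim, verbose = FALSE)
sim <- FindVariableFeatures(sim, selection.method = "vst",
                            nfeatures = 200, verbose = FALSE)
sim <- ScaleData(sim, verbose = FALSE)
sim <- RunPCA(sim, npcs = 10, verbose = FALSE)

# Graph-based clustering and UMAP on the first 10 principal components
sim <- FindNeighbors(sim, dims = 1:10, verbose = FALSE)
sim <- FindClusters(sim, resolution = 0.5, verbose = FALSE)
sim <- RunUMAP(sim, dims = 1:10, verbose = FALSE)
DimPlot(sim, reduction = "umap", label = TRUE)

# Genes enriched in each cluster relative to all other cells
markers <- FindAllMarkers(sim, only.pos = TRUE, min.pct = 0.25,
                          logfc.threshold = 0.25, verbose = FALSE)
head(markers)
```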
Part 3 – Integration of multiple samples and working with sample metadata
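One way to sketch this part is with Seurat's anchor-based integration functions. The helper below, `integrate_samples()`, is a hypothetical wrapper written for illustration and is not part of Seurat or of the lecture script; it assumes a named list of Seurat objects, one per sample. Recent Seurat releases also offer a layer-based workflow (`IntegrateLayers()`), which is not shown here.

```r
library(Seurat)

# Hypothetical helper: integrate a named list of Seurat objects.
integrate_samples <- function(seurat_list, n_features = 2000, dims = 1:30) {
  # Record which sample each cell came from, so it can be used later
  # for splitting plots or testing between conditions
  for (sample_name in names(seurat_list)) {
    seurat_list[[sample_name]]$sample <- sample_name
  }

  # Normalize and find variable features in each sample independently
  seurat_list <- lapply(seurat_list, function(obj) {
    obj <- NormalizeData(obj, verbose = FALSE)
    FindVariableFeatures(obj, selection.method = "vst",
                         nfeatures = n_features, verbose = FALSE)
  })

  # Find anchors (matched cell pairs across samples) and integrate
  features <- SelectIntegrationFeatures(object.list = seurat_list,
                                        nfeatures = n_features)
  anchors <- FindIntegrationAnchors(object.list = seurat_list,
                                    anchor.features = features, dims = dims)
  integrated <- IntegrateData(anchorset = anchors, dims = dims)

  # Downstream clustering and UMAP should use the integrated assay
  DefaultAssay(integrated) <- "integrated"
  integrated
}

# Usage sketch (object names are placeholders, not the course file names):
# integrated <- integrate_samples(list(naive = naive_obj, infected = infected_obj))
# integrated <- ScaleData(integrated)
# integrated <- RunPCA(integrated, npcs = 30)
# integrated <- RunUMAP(integrated, dims = 1:30)
# DimPlot(integrated, reduction = "umap", split.by = "sample")
```

Storing the sample name in the metadata before integration is what later allows plots to be split by condition and genes to be tested between naive and infected cells.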
EmptyDrops: distinguishing cells from empty droplets in droplet-based single-cell RNA sequencing data - This is the paper describing the DropletUtils package that we use in this lecture to identify empty drops.
Sarah Ennis’ Github repo for preprocessing scRNA-seq data - This is the source of the custom script we use to generate the CellRanger-esque html QC report.
Probabilistic cell-type assignment of single-cell RNA-seq for tumor microenvironment profiling - Describes the CellAssign algorithm and R package that we use to assign cell types to clusters.
Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage - Describes the SingleR and celldex packages that allow us to leverage bulk RNA-seq data in public repositories to annotate clusters in our scRNA-seq data.
Comprehensive Integration of Single-Cell Data - This 2019 paper describes the underlying statistical approach for data integration in Seurat.
Letitia Parcalabescu (and Ms. Coffee Bean) explains UMAP in 10 minutes. Great video!
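The reference-based annotation idea from the SingleR paper listed above can be sketched as follows, assuming the SingleR package is installed. To stay self-contained, the sketch builds a toy "bulk" reference with two made-up cell types instead of downloading a real one; in practice the toy `ref` and `ref_labels` would be replaced by a celldex reference (for example `celldex::HumanPrimaryCellAtlasData()` and its `label.main` column), and which reference suits your data is a choice this sketch does not make for you.

```r
library(SingleR)

set.seed(1)
n_genes <- 300
genes <- paste0("GENE", seq_len(n_genes))

# Two hypothetical cell types, each with 30 strongly expressed marker genes
make_profile <- function(marker_idx) {
  profile <- rep(1, n_genes)
  profile[marker_idx] <- 20
  profile
}
profiles <- list(Tcell = make_profile(1:30), Bcell = make_profile(31:60))

# Simulate n samples from one profile and return log-normalized expression
simulate_logcounts <- function(profile, n, depth = 1) {
  counts <- matrix(rpois(n_genes * n, rep(profile * depth, n)), nrow = n_genes)
  rownames(counts) <- genes
  log2(t(t(counts) / colSums(counts)) * 1e4 + 1)
}

# Toy reference: three deeply sequenced "bulk" samples per cell type
ref <- cbind(simulate_logcounts(profiles$Tcell, 3, depth = 50),
             simulate_logcounts(profiles$Bcell, 3, depth = 50))
colnames(ref) <- paste0("ref", 1:6)
ref_labels <- rep(c("Tcell", "Bcell"), each = 3)

# Toy query: 20 single cells of each type at much lower depth
query <- cbind(simulate_logcounts(profiles$Tcell, 20),
               simulate_logcounts(profiles$Bcell, 20))
colnames(query) <- paste0("cell", 1:40)
truth <- rep(c("Tcell", "Bcell"), each = 20)

# Assign each query cell the label of the best-correlated reference type
pred <- SingleR(test = query, ref = ref, labels = ref_labels)
print(table(predicted = pred$labels, truth = truth))
```

For a Seurat object, the query matrix would be the log-normalized expression data, and the predicted labels can be added back to the object's metadata to color the UMAP.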
I owe a big THANK YOU to Lindsey Shallberg and Dr. Chris Hunter for their willingness to share scRNA-seq data from their experiment in Toxoplasma-infected mice, which is used at the end of this lecture to illustrate data integration and downstream analysis.