Corresponding lectures
Lecture 12 - Single cell RNA-seq – principles and processing
Lecture 13 - Analysis of scRNA-seq data using R
Description
In the past two lectures, you learned to do ~90% of the common tasks associated with analyzing single cell RNA-seq (scRNA-seq) data, including QC analysis, producing UMAP dimensional reductions, labeling cell clusters, and identifying cluster-defining genes. You then applied this to spleen samples from naive and Toxoplasma gondii-infected mice. In this lab, we’ll review these steps and extend what we’ve learned to intestinal samples from a large unpublished study from the ‘MIST’ program, an NIH/NIAID-supported group of infectious disease and mucosal immunology researchers.
What you’ll need to do
To get started, download this Seurat object. This is downsampled scRNA-seq data from the distal-most portion of the small intestine (ileum) of mice.
Of course, you’ll also need to download the course DIY_scRNAseq_basic.R script.
Part 1
Read the unintegrated Seurat object into a clean R environment. Filter the Seurat object, then run through the standard workflow from normalization to creating a UMAP plot. Finally, to complete this part of the lab, integrate the data and produce a new UMAP plot.
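The filtering and standard workflow described above can be sketched roughly as follows. This is a minimal sketch, not the course script: the QC thresholds, the `percent.mt` column name, the function names, and the file name in the usage comment are all assumptions, so check the object's metadata before relying on any of them.

```r
# Sketch only: thresholds, column names, and the file name in the usage
# comments are assumptions; adjust them to what the metadata actually shows.

# Pure helper: given a metadata data frame, return the barcodes that pass QC.
cells_passing_qc <- function(meta, min_features = 200, max_features = 5000,
                             max_percent_mt = 10) {
  ok <- meta$nFeature_RNA > min_features &
    meta$nFeature_RNA < max_features &
    meta$percent.mt < max_percent_mt
  rownames(meta)[which(ok)]
}

# Filter cells, then run normalization through clustering and UMAP.
run_standard_workflow <- function(obj, dims = 1:20, resolution = 0.5) {
  stopifnot(requireNamespace("Seurat", quietly = TRUE))
  obj <- subset(obj, cells = cells_passing_qc(obj@meta.data))
  obj <- Seurat::NormalizeData(obj)
  obj <- Seurat::FindVariableFeatures(obj, selection.method = "vst",
                                      nfeatures = 2000)
  obj <- Seurat::ScaleData(obj)
  obj <- Seurat::RunPCA(obj)
  obj <- Seurat::FindNeighbors(obj, dims = dims)
  obj <- Seurat::FindClusters(obj, resolution = resolution)
  obj <- Seurat::RunUMAP(obj, dims = dims)
  obj
}

# Usage (not run):
# ileum <- readRDS("ileum_unintegrated.rds")  # hypothetical file name
# head(ileum@meta.data)                       # look at the existing QC columns
# ileum <- run_standard_workflow(ileum)
# Seurat::DimPlot(ileum, reduction = "umap")
```

Keeping the QC decision in a small helper that only touches the metadata makes it easy to try different cutoffs and count how many cells survive before committing to the slower steps.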
Tips
It’s easy to waste a ton of time in this lab if you go down a rabbit hole of running code that doesn’t need to be run. Here are a few tips to help:
- Do NOT worry about calculating the % mitochondrial reads; this has already been done for you. Instead, explore the metadata in the Seurat object you’ve been given.
- Do NOT try to generate an html QC report.
- Try to get to integration pretty quickly, because the actual integration itself will take approximately 15-30 minutes…or longer, depending on your computer.
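For the integration step, one possible approach is Seurat's anchor-based workflow, sketched below. This is an assumption about the method, not necessarily what the course script does, and newer Seurat versions offer a layer-based alternative. The function name `integrate_samples` and the `orig.ident` split column are also assumptions; use whichever metadata column identifies your samples.

```r
# Sketch only: anchor-based integration is one option among several, and
# 'orig.ident' as the sample column is an assumption about this object.
integrate_samples <- function(obj, split_by = "orig.ident", dims = 1:20) {
  stopifnot(requireNamespace("Seurat", quietly = TRUE))

  # Split into one object per sample and preprocess each independently.
  obj_list <- Seurat::SplitObject(obj, split.by = split_by)
  obj_list <- lapply(obj_list, function(x) {
    x <- Seurat::NormalizeData(x)
    Seurat::FindVariableFeatures(x, selection.method = "vst", nfeatures = 2000)
  })

  # Find anchors across samples and integrate (this is the slow part).
  features <- Seurat::SelectIntegrationFeatures(object.list = obj_list)
  anchors <- Seurat::FindIntegrationAnchors(object.list = obj_list,
                                            anchor.features = features)
  integrated <- Seurat::IntegrateData(anchorset = anchors)

  # Re-run the downstream steps on the integrated assay.
  integrated <- Seurat::ScaleData(integrated)
  integrated <- Seurat::RunPCA(integrated)
  integrated <- Seurat::FindNeighbors(integrated, dims = dims)
  integrated <- Seurat::FindClusters(integrated)
  integrated <- Seurat::RunUMAP(integrated, dims = dims)
  integrated
}

# Usage (not run):
# ileum_int <- integrate_samples(ileum)
# Seurat::DimPlot(ileum_int, reduction = "umap", group.by = "orig.ident")
```

Coloring the new UMAP by sample is a quick way to judge whether cells from different samples now mix within clusters.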
Part 2
Now that you’ve gone through the basic processing and integration, you’re ready to label clusters. Use the SingleR and celldex packages to apply labels to your cell clusters. What is the most abundant cell type present in this dataset, and what are the genes that best define this cluster?
Tips
Remember that cluster labeling is all about choosing the right reference dataset for label transfer:
- Do NOT worry about using Azimuth (unless you prefer it over SingleR/celldex!)
- You do NOT need to use every reference dataset in celldex…think about which one makes the most sense to use given the type of data you’re working with.
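A rough sketch of label transfer with SingleR, followed by finding the most abundant label and its defining genes, might look like this. The function names, the `singler_label` column, and the `"RNA"` assay name are assumptions, and the choice of celldex reference is deliberately left to you.

```r
# Sketch only: 'singler_label' and the "RNA" assay name are assumptions.

# Transfer labels from a celldex reference onto each cell.
label_with_singler <- function(obj, ref, assay = "RNA") {
  stopifnot(requireNamespace("Seurat", quietly = TRUE),
            requireNamespace("SingleR", quietly = TRUE))
  sce <- Seurat::as.SingleCellExperiment(obj, assay = assay)
  pred <- SingleR::SingleR(test = sce, ref = ref, labels = ref$label.main)
  labels <- setNames(pred$labels, rownames(pred))
  Seurat::AddMetaData(obj, metadata = labels, col.name = "singler_label")
}

# Pure helper: the label carried by the largest number of cells.
most_abundant_label <- function(labels) {
  counts <- sort(table(labels), decreasing = TRUE)
  names(counts)[1]
}

# Genes enriched in cells carrying one label versus all other cells.
markers_for_label <- function(obj, label, label_col = "singler_label") {
  stopifnot(requireNamespace("Seurat", quietly = TRUE))
  Seurat::FindMarkers(obj, ident.1 = label, group.by = label_col,
                      only.pos = TRUE)
}

# Usage (not run):
# ref <- celldex::ImmGenData()   # or another mouse reference from celldex
# ileum_int <- label_with_singler(ileum_int, ref)
# top <- most_abundant_label(ileum_int$singler_label)
# head(markers_for_label(ileum_int, top))
```

Depending on your Seurat version and how the object was integrated, you may need to prepare the RNA assay (for example, by joining layers) before converting it for SingleR.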
On your own
If you’re working through this lab on your own, you should try to complete both parts above. If you’re an in-person learner and were unable to attend this lab, you should pick one cell type and compare the gene expression in that cell type between an infection of your choice and the same cells from naive mice. You can turn in your script and a list (or figure) showing the DEGs from your analysis.
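One way to set up that comparison is to subset to a single cell type and then test infected against naive cells, as in the sketch below. The function name and the metadata column names and levels (`singler_label`, `infection`, `naive`) are hypothetical; substitute whatever your object actually contains.

```r
# Sketch only: the metadata column names and the "naive" level are assumptions.
compare_to_naive <- function(obj, cell_type, infection,
                             label_col = "singler_label",
                             condition_col = "infection",
                             naive_level = "naive") {
  stopifnot(requireNamespace("Seurat", quietly = TRUE))
  md <- obj@meta.data

  # Keep only cells of the chosen type.
  keep <- rownames(md)[which(md[[label_col]] == cell_type)]
  sub <- subset(obj, cells = keep)

  # Differential expression: chosen infection versus naive.
  Seurat::FindMarkers(sub, ident.1 = infection, ident.2 = naive_level,
                      group.by = condition_col)
}

# Usage (not run; the cell type and infection names are placeholders):
# degs <- compare_to_naive(ileum_int, cell_type = "T cells",
#                          infection = "your_infection")
# head(degs[order(degs$p_val_adj), ])
```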
Bonus
Ready to take your analysis a step further? Try using all cells in the dataset, and compare each infection to the naive samples to get DEGs. Then, make a single heatmap that shows the top 30 DEGs from all comparisons.
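For the heatmap, one possible interpretation is to take the top genes from each comparison and plot their union. The helper below is a sketch under that assumption; `collect_top_genes` is a hypothetical name, and it expects a named list of data frames shaped like `FindMarkers` output (gene names as row names, with `avg_log2FC` and `p_val_adj` columns).

```r
# Sketch only: ranking by absolute log2 fold change among significant genes
# is one reasonable choice, not the only one.
collect_top_genes <- function(deg_list, n = 30, alpha = 0.05) {
  per_comparison <- lapply(deg_list, function(deg) {
    sig <- deg[which(deg$p_val_adj < alpha), , drop = FALSE]
    sig <- sig[order(abs(sig$avg_log2FC), decreasing = TRUE), , drop = FALSE]
    head(rownames(sig), n)
  })
  # Union across comparisons, keeping first-seen order.
  unique(unlist(per_comparison, use.names = FALSE))
}

# Usage (not run; 'deg_list' would hold one FindMarkers result per infection,
# and "infection" is an assumed metadata column):
# genes <- collect_top_genes(deg_list, n = 30)
# Seurat::DoHeatmap(ileum_int, features = genes, group.by = "infection")
```

Note that the plotted genes need to be present in the scaled data of the active assay, so you may need to re-run scaling with these features included.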
Solution
script
Download this script to see my answers to the questions above.