Corresponding lectures
Lecture 10 - Analysis of scRNA-seq data using R
Description
In the past two lectures, you learned to do ~90% of the common tasks associated with analyzing single-cell RNA-seq (scRNA-seq) data, including QC analysis, producing UMAP dimensional reductions, labeling cell clusters, and identifying cluster-defining genes. You then applied this to spleen samples from naive and Toxoplasma gondii-infected mice. In this lab, we’ll review these steps and extend what we’ve learned to intestinal samples from a recently released study from the ‘MIST’ program, an NIH/NIAID-supported group of infectious disease and mucosal immunology researchers.
What you’ll need to do
To get started, download this Seurat object. This is downsampled scRNA-seq data from the distal-most portion of the small intestine (ileum) of mice.
Of course, you’ll also need to download the course DIY_scRNAseq_basic.R script.
Part 1
- Read the unintegrated Seurat object into a clean R environment. Filter the Seurat object, then run through the standard workflow from normalization and scaling to creating a UMAP plot.
- Once you have plotted the UMAP, go back and change the number of variable genes to only 100 and rerun the PCA, clustering, and UMAP. What is different? What does this mean for how you should interpret clusters and UMAPs in your own data and in published data?
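The two steps above can be sketched as one small helper that you call twice, once with the default number of variable genes and once with 100. This is a minimal sketch, not the course script: the function name `run_standard_workflow`, the file name, and the filtering thresholds are hypothetical, and the metadata column names (`nFeature_RNA`, `percent.mt`) are assumptions that you should check against the metadata in your object.

```r
library(Seurat)

# Hypothetical helper: standard workflow from normalization to UMAP.
# n_var_genes controls how many variable genes feed the PCA.
run_standard_workflow <- function(obj, n_var_genes = 2000, dims = 1:30,
                                  resolution = 0.5) {
  obj <- NormalizeData(obj, verbose = FALSE)
  obj <- FindVariableFeatures(obj, nfeatures = n_var_genes, verbose = FALSE)
  obj <- ScaleData(obj, verbose = FALSE)
  obj <- RunPCA(obj, npcs = max(dims), verbose = FALSE)
  obj <- FindNeighbors(obj, dims = dims, verbose = FALSE)
  obj <- FindClusters(obj, resolution = resolution, verbose = FALSE)
  obj <- RunUMAP(obj, dims = dims, verbose = FALSE)
  obj
}

# Usage (file name, thresholds, and column names are placeholders):
# ileum <- readRDS("ileum_seurat.rds")
# colnames(ileum@meta.data)   # see which QC columns already exist
# ileum <- subset(ileum, subset = nFeature_RNA > 200 & percent.mt < 15)
# ileum_2000 <- run_standard_workflow(ileum)
# ileum_100  <- run_standard_workflow(ileum, n_var_genes = 100)
# DimPlot(ileum_2000, reduction = "umap")
# DimPlot(ileum_100, reduction = "umap")
```

Keeping both results as separate objects makes it easy to put the two UMAP plots side by side when answering the questions above.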
Tips
It’s easy to waste a ton of time in this lab if you go down a rabbit hole of running code that doesn’t need to be run. Here are a few tips to help:
- Do NOT worry about calculating the % mitochondrial reads, this has already been done for you. Instead, explore the metadata in the Seurat object you’ve been given.
- Do NOT try to generate an HTML QC report.
- Try to get to integration pretty quickly, because the actual integration itself will take approximately 15-30 minutes…or longer, depending on your computer.
Part 2
Integrate the data and produce a new UMAP plot. The experimental design of this project required that each Mouse was a separate sequencing batch.
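One way to integrate by batch is sketched below, assuming Seurat v5 with an `Assay5` RNA assay. The course script may use a different integration approach, so treat this as an illustration only. The helper name `integrate_by_batch` is hypothetical, and `"Mouse"` is an assumed metadata column name; confirm the actual batch column in your object before running.

```r
library(Seurat)

# Hypothetical helper: split by batch, integrate with CCA, then re-cluster
# and re-embed on the integrated reduction. Assumes Seurat v5 (Assay5).
integrate_by_batch <- function(obj, batch_col = "Mouse", dims = 1:30) {
  stopifnot(batch_col %in% colnames(obj@meta.data))

  # One layer per batch, so each batch is normalized and compared separately
  obj[["RNA"]] <- split(obj[["RNA"]], f = obj@meta.data[[batch_col]])
  obj <- NormalizeData(obj, verbose = FALSE)
  obj <- FindVariableFeatures(obj, verbose = FALSE)
  obj <- ScaleData(obj, verbose = FALSE)
  obj <- RunPCA(obj, npcs = max(dims), verbose = FALSE)

  # The slow step: expect this to take a while on a laptop
  obj <- IntegrateLayers(object = obj, method = CCAIntegration,
                         orig.reduction = "pca",
                         new.reduction = "integrated.cca",
                         verbose = FALSE)

  # Rejoin layers so downstream steps see a single RNA assay again
  obj[["RNA"]] <- JoinLayers(obj[["RNA"]])

  obj <- FindNeighbors(obj, reduction = "integrated.cca", dims = dims,
                       verbose = FALSE)
  obj <- FindClusters(obj, verbose = FALSE)
  obj <- RunUMAP(obj, reduction = "integrated.cca", dims = dims,
                 verbose = FALSE)
  obj
}

# Usage (object name is a placeholder):
# ileum_int <- integrate_by_batch(ileum, batch_col = "Mouse")
# DimPlot(ileum_int, reduction = "umap", group.by = "Mouse")
```

Coloring the new UMAP by the batch column is a quick check of whether cells from different mice now mix within clusters.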
Part 3
Now that you’ve gone through the basic processing and integration, you’re ready to label clusters. Use the SingleR and celldex packages to apply labels to your cell clusters. What is the most abundant cell type present in this dataset, and what are the genes that best define this cluster?
Tips
Remember that cluster labeling is all about choosing the right reference dataset for label transfer:
- Do NOT worry about using Azimuth (unless you prefer it over SingleR/celldex!)
- You do NOT need to use every reference dataset in celldex…think about which one makes the most sense to use given the type of data you’re working with.
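Label transfer with SingleR might look like the sketch below. It assumes a recent Seurat/SeuratObject (version 5, where `GetAssayData()` takes a `layer` argument) and a joined, log-normalized RNA assay. The helper name `label_cells_with_singler` and the metadata column `singleR_label` are hypothetical, and the choice of reference is left to you.

```r
library(Seurat)
library(SingleR)

# Hypothetical helper: annotate each cell against a celldex-style reference
# and store the predicted main label in the Seurat metadata.
label_cells_with_singler <- function(obj, ref, label_col = "singleR_label") {
  # SingleR expects log-normalized expression (genes x cells)
  expr <- GetAssayData(obj, assay = "RNA", layer = "data")
  pred <- SingleR(test = expr, ref = ref, labels = ref$label.main)

  # Name the labels by cell barcode so they are matched to the right cells
  labels <- setNames(pred$labels, rownames(pred))
  AddMetaData(obj, metadata = labels, col.name = label_col)
}

# Usage (object name is a placeholder; pick the reference you think fits):
# ref <- celldex::MouseRNAseqData()   # or another mouse reference in celldex
# ileum_int <- label_cells_with_singler(ileum_int, ref)
# sort(table(ileum_int$singleR_label), decreasing = TRUE)
#
# Genes that best define the most abundant label:
# Idents(ileum_int) <- "singleR_label"
# top_label <- names(which.max(table(ileum_int$singleR_label)))
# head(FindMarkers(ileum_int, ident.1 = top_label, only.pos = TRUE))
```

Passing the reference in as an argument makes it cheap to rerun the labeling with a different celldex dataset and compare the results.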
On your own
If you’re working through this lab on your own, you should try to complete all three parts above. If you’re an in-person learner and were unable to attend this lab, you should pick one cell type and compare the gene expression in that cell type between an infection of your choice and the same cells from naive mice. You can turn in your script and a list (or figure) showing the DEGs from your analysis.
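A comparison within one cell type can be sketched with `FindMarkers()`, as below. The helper name `compare_condition_in_cell_type` and every column name and value in the usage example are hypothetical, since the metadata columns that hold cell type and infection status depend on the object you were given.

```r
library(Seurat)

# Hypothetical helper: DEGs between two conditions within a single cell type.
compare_condition_in_cell_type <- function(obj, cell_type_col, cell_type,
                                           condition_col, test_group,
                                           ref_group) {
  # Use the cell type labels as identities so subset.ident can select them
  Idents(obj) <- cell_type_col
  FindMarkers(obj,
              ident.1 = test_group,     # e.g. the infected group
              ident.2 = ref_group,      # e.g. the naive group
              group.by = condition_col,
              subset.ident = cell_type)
}

# Usage (all names below are placeholders):
# degs <- compare_condition_in_cell_type(
#   ileum_int,
#   cell_type_col = "singleR_label", cell_type = "B cells",
#   condition_col = "infection", test_group = "infected", ref_group = "naive"
# )
# head(degs[order(degs$p_val_adj), ])
```

Positive `avg_log2FC` values in the result indicate higher expression in `test_group` than in `ref_group`.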
Bonus
Ready to take your analysis a step further? Try using all cells in the dataset, and compare each infection to the naive to get DEGs. Then, make a single heatmap that shows the top 30 DEGs from all comparisons.
Solution
script
A solution script will be posted approximately one week after the in-person lab takes place.