In this lab you’ll work with a large unpublished scRNA-seq dataset from PBMCs, courtesy of John Wherry’s lab, Allie Greenplate, and the UPenn Immune Health group. This PBMC dataset expands on a study that they recently published in Science, and includes a subset of the same patients, selected to represent a range of disease severity. Your goal is to import these data into R and carry out an analysis of your choosing.
What you need to do
I’ve already pre-processed the data for you using Kallisto-Bustools. Integrating data from all patients and healthy controls took over 5 hours on my laptop! You will be given a single integrated Seurat object and a study design file that describes the metadata associated with each patient. Note that severity scores are based on the NIH scale shown below.
Your goal is to apply the clustering, annotation, and analysis approaches you learned in Lectures 13 and 14 to this dataset. Although each student team is free to approach this final project however they wish, I suggest choosing one or more cell clusters and comparing gene expression in these cells between COVID patients and healthy controls, and/or between COVID patients that differ in disease severity. Don’t forget that all the useful code you’ve learned for handling bulk RNA-seq (e.g., heatmap generation, functional enrichment analysis) can be used with any gene lists you identify in your analysis.
Each team will have the 2 hours of this lab to get started, as well as the full week leading up to our final lab (Dec 8th), to work together on one or more figures that summarize its findings. Each team will then present its results (10-12 min per team) to the class as well as to the researchers who generated the data!
ImmuneHealth.seurat - This Seurat object contains integrated data for PBMCs from 12 COVID patients and 4 healthy controls, for a total of ~65,000 cells.
ImmuneHealth_COVID_studyDesign.txt - This file contains the metadata for each patient included in the integrated Seurat object above.
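As a starting point, here is a minimal sketch of how you might read both files into R. It makes two assumptions that the description above does not confirm: that the Seurat object was written with saveRDS(), and that the study design file is tab-delimited with a header row. The helper name load_lab_data is hypothetical, and the default paths assume both files sit in your working directory.

```r
# Sketch only: the file format and layout are assumptions, so adjust as needed.
# load_lab_data() is a hypothetical helper, not part of Seurat or the course code.
load_lab_data <- function(seurat_path = "ImmuneHealth.seurat",
                          design_path = "ImmuneHealth_COVID_studyDesign.txt") {
  # Assumes the object was written with saveRDS(). If readRDS() fails,
  # the file may have been written with save(); try load(seurat_path) instead.
  obj <- readRDS(seurat_path)

  # Assumes a tab-delimited text file with one header row.
  design <- read.delim(design_path, header = TRUE, stringsAsFactors = FALSE)

  # Return both pieces together so they stay paired during the analysis.
  list(object = obj, design = design)
}
```

With the Seurat package attached via library(Seurat), you could call lab <- load_lab_data(), then inspect lab$object and head(lab$design) to check the column names before choosing which groups to compare.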