The enrichment lab – DIY.transcriptomics

Image credit: Midjourney v5.2 with prompt 'abstract pathway made from mosaic pieces'

Corresponding Lectures

Lecture 9 - Module identification
Lecture 10 - Functional enrichment analysis

Description

Throughout this course, we’ve been comparing normal skin with lesional skin from patients with leishmaniasis. A legitimate criticism of any findings we’ve made with these data so far (DEGs, enriched pathways, etc.) is that they are not unique to leishmaniasis, and that we would see exactly the same results if we looked at any inflammatory skin condition. Is that true? In this lab, we’ll get to the heart of this question by using functional enrichment to look at pathways associated with two other inflammatory skin conditions, atopic dermatitis (AD) and psoriasis. Unlike leishmaniasis, a neglected tropical disease not found in the US, AD and psoriasis are common in the US, affecting approximately 15 million and 7 million Americans (about 7% and 3.5% of the population), respectively.

For this lab, we’ll use data from this really nice 2019 Journal of Investigative Dermatology paper.

Plan your approach

To help you succeed in this lab, consider using the following approach:

use the provided sample IDs (Part 1) to get data from ARCHS4 and make a DGEList
filter and normalize the data (don’t worry about plotting after filtering and normalizing)
use the provided study design file (Part 2) to capture the variables of interest and carry out a PCA
set up a design matrix and a contrast matrix for the comparisons of interest
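The filtering, normalization, and design/contrast steps in the list above can be sketched with edgeR and limma. This is a minimal sketch on simulated counts, not the lab solution: the gene and sample names, the three group labels, the "CPM > 1 in at least 4 samples" threshold, and the object names are all hypothetical stand-ins. In the lab, the counts come from ARCHS4 and the groups from `condition_skin`, whose actual levels will differ.

```r
library(edgeR)  # DGEList(), cpm(), calcNormFactors()
library(limma)  # makeContrasts()

set.seed(42)
# HYPOTHETICAL simulated counts standing in for the ARCHS4 data: 1000 genes x 12 samples
counts <- matrix(rnbinom(1000 * 12, mu = 50, size = 2), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000), paste0("sample", 1:12)))
# HYPOTHETICAL group labels; in the lab these would come from condition_skin
group <- factor(rep(c("healthy", "psoriasis", "AD"), each = 4))

myDGEList <- DGEList(counts = counts)

# keep genes with CPM > 1 in at least 4 samples (the size of the smallest group here)
cpm.matrix <- cpm(myDGEList)
keepers <- rowSums(cpm.matrix > 1) >= 4
myDGEList.filtered <- myDGEList[keepers, , keep.lib.sizes = FALSE]

# TMM normalization, then log2 CPM for downstream PCA and CAMERA
myDGEList.filtered.norm <- calcNormFactors(myDGEList.filtered, method = "TMM")
log2.cpm.filtered.norm <- cpm(myDGEList.filtered.norm, log = TRUE)

# design matrix without an intercept: one column per group
design <- model.matrix(~0 + group)
colnames(design) <- levels(group)

# one column per comparison of interest
contrast.matrix <- makeContrasts(psoriasis_vs_healthy = psoriasis - healthy,
                                 AD_vs_healthy = AD - healthy,
                                 AD_vs_psoriasis = AD - psoriasis,
                                 levels = design)
print(contrast.matrix)
```

Using `~0 + group` gives one design column per group, so each contrast reads directly as a difference between group means.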

Part 1

You’ll begin this lab by using the skills you learned in Lecture 7: Accessing Public Data and the Step4_publicData.R script to retrieve the kallisto count data for the study above. To help you get started, I’ve made a list (skin_ids.tsv) of all the sample IDs you’ll want to pull from ARCHS4. You can read these IDs into R and start querying ARCHS4 as follows:

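library(readr) # read_tsv() comes from the readr package (part of the tidyverse)
# read the sample IDs provided for this lab and keep them as a character vector
mySamples <- read_tsv("skin_ids.tsv")
mySamples <- mySamples$ids
# 'all.samples.human' holds the sample accessions read from the ARCHS4 human file in Step4_publicData.R
my.sample.locations <- which(all.samples.human %in% mySamples)
# now use 'my.sample.locations' to pull all the data from ARCHS4 using the Step4_publicData.R script
# remember, this is HUMAN data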
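One detail worth checking in the Part 1 code: `which(all.samples.human %in% mySamples)` returns positions in the order the samples appear in the ARCHS4 file, not in the order of skin_ids.tsv. The toy example below uses made-up accessions (hypothetical, not real GEO IDs) to show the difference from `match()` and how to spot IDs that are missing from ARCHS4. Whichever order you use, check that the columns of your DGEList line up with the rows of the study design before moving on to Part 2.

```r
# HYPOTHETICAL accessions standing in for 'all.samples.human' (ARCHS4 file order)
all.samples.human <- c("GSM0000001", "GSM0000002", "GSM0000003", "GSM0000004", "GSM0000005")
# HYPOTHETICAL IDs standing in for skin_ids.tsv, listed in a different order
mySamples <- c("GSM0000004", "GSM0000002")

# which(%in%) returns positions in ARCHS4 file order
my.sample.locations <- which(all.samples.human %in% mySamples)
print(my.sample.locations)                     # 2 4
print(all.samples.human[my.sample.locations])  # "GSM0000002" "GSM0000004"

# match() returns positions in the order of mySamples (NA for IDs absent from ARCHS4)
my.sample.locations.ordered <- match(mySamples, all.samples.human)
print(my.sample.locations.ordered)             # 4 2

# any requested IDs that ARCHS4 does not contain
missing.ids <- setdiff(mySamples, all.samples.human)
print(missing.ids)                             # character(0)
```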

Part 2

Create a PCA plot to explore the entire dataset you just retrieved in Part 1 above. To make this PCA meaningful, you’ll need some sample metadata. Since it’s always a bit of a headache to get sample metadata from ARCHS4, I’ve taken the liberty of doing it for you. Download this study design file that I created. I suggest you combine ‘patient_condition’ and ‘skin_type’ into a single variable as follows, and use this variable to color your PCA. How do you interpret this PCA?

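# 'studyDesign' is the study design file read into R (e.g., with read_tsv()); its rows must be in the same order as your DGEList columns
condition <- factor(studyDesign$patient_condition)
skin_type <- factor(studyDesign$skin_type)
# one combined label per sample, in the form <patient_condition>_<skin_type>
condition_skin <- factor(paste(condition, skin_type, sep="_"))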
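A PCA for Part 2 can be sketched with base R's `prcomp()`. This sketch runs on a simulated log2 CPM matrix with hypothetical `condition_skin` labels ("healthy_normal" and "psoriasis_lesional" are made up, not the real levels from the study design file). In the lab, you'd use your filtered, normalized log2 CPM matrix and the `condition_skin` factor built above. ggplot2 works just as well for the plot; base graphics simply keep the example free of extra packages.

```r
set.seed(1)
# HYPOTHETICAL simulated log2 CPM matrix: 500 genes x 8 samples
log2.cpm <- matrix(rnorm(500 * 8, mean = 5), nrow = 500,
                   dimnames = list(paste0("gene", 1:500), paste0("sample", 1:8)))
# HYPOTHETICAL labels in the style of condition_skin
condition_skin <- factor(rep(c("healthy_normal", "psoriasis_lesional"), each = 4))
# shift 50 genes in the second group so the PCA has some structure to find
is.pso <- condition_skin == "psoriasis_lesional"
log2.cpm[1:50, is.pso] <- log2.cpm[1:50, is.pso] + 3

# prcomp() expects samples as rows, so transpose the genes x samples matrix
pca.res <- prcomp(t(log2.cpm), scale. = FALSE, retx = TRUE)

# percent of total variance explained by each principal component
pc.var <- pca.res$sdev^2
pc.per <- round(pc.var / sum(pc.var) * 100, 1)

pca.res.df <- data.frame(PC1 = pca.res$x[, 1], PC2 = pca.res$x[, 2],
                         condition_skin = condition_skin)

# scatter plot of PC1 vs PC2, colored by the combined variable
plot(pca.res.df$PC1, pca.res.df$PC2,
     col = as.integer(pca.res.df$condition_skin), pch = 19,
     xlab = paste0("PC1 (", pc.per[1], "%)"),
     ylab = paste0("PC2 (", pc.per[2], "%)"))
legend("topright", legend = levels(condition_skin),
       col = seq_along(levels(condition_skin)), pch = 19)
```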

Part 3

After exploring the data above, use CAMERA from the limma package to carry out functional enrichment analysis using a local copy of the C2 ‘Canonical Pathways’ (C2CP) collection of signatures from MSigDB. I suggest you download the .gmt file for this collection directly from MSigDB. For each of the comparisons below, which pathways seem to uniquely define these two related but distinct diseases?

psoriasis vs healthy control
atopic dermatitis vs healthy control
atopic dermatitis vs psoriasis
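CAMERA needs two ingredients beyond the expression data: the gene sets as a named list, and those sets converted to row indices with limma's `ids2indices()`. The sketch below assumes gene symbols as row names (so the gene-symbol version of the MSigDB .gmt file would be the matching one), uses a small hypothetical .gmt reader (`read_gmt_lines()`, not a limma function), and runs `camera()` on simulated data with two made-up gene sets. With the real file, you'd pass `readLines()` of the downloaded .gmt to the reader and use your own design and contrast matrices, running `camera()` once per contrast.

```r
library(limma)  # ids2indices(), makeContrasts(), camera()

# HYPOTHETICAL minimal .gmt reader: each line is <set name> \t <description> \t <gene> \t <gene> ...
read_gmt_lines <- function(lines) {
  fields <- strsplit(lines, "\t", fixed = TRUE)
  sets <- lapply(fields, function(f) f[-(1:2)])   # drop name and description columns
  names(sets) <- vapply(fields, `[`, character(1), 1)
  sets
}

set.seed(7)
genes <- paste0("GENE", 1:1000)
group <- factor(rep(c("healthy", "psoriasis"), each = 4))
# HYPOTHETICAL simulated log2 expression; the first 30 genes are higher in psoriasis
log2.cpm <- matrix(rnorm(1000 * 8, mean = 5), nrow = 1000,
                   dimnames = list(genes, paste0("s", 1:8)))
log2.cpm[1:30, group == "psoriasis"] <- log2.cpm[1:30, group == "psoriasis"] + 2

# two made-up gene sets in .gmt format (stand-ins for readLines() of the MSigDB file)
gmt.lines <- c(
  paste(c("HYPOTHETICAL_UP_SET", "made-up", genes[1:30]), collapse = "\t"),
  paste(c("HYPOTHETICAL_NULL_SET", "made-up", genes[501:530]), collapse = "\t"))
gene.sets <- read_gmt_lines(gmt.lines)

# convert gene sets to row indices of the expression matrix
idx <- ids2indices(gene.sets, identifiers = rownames(log2.cpm))

design <- model.matrix(~0 + group)
colnames(design) <- levels(group)
contrast.matrix <- makeContrasts(psoriasis_vs_healthy = psoriasis - healthy,
                                 levels = design)

# competitive gene set test for one contrast; results are sorted by p-value
camera.res <- camera(log2.cpm, index = idx, design = design,
                     contrast = contrast.matrix[, "psoriasis_vs_healthy"])
print(camera.res)
```

Running the same call with each column of your contrast matrix and comparing the significant pathways across comparisons (for example, with `setdiff()` on their names) is one way to approach which pathways are unique to each disease.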

On your own

If you’re working through this lab on your own, you should complete all three parts above. If you’re an in-person learner and were unable to attend this lab, you should turn in an answer with your code for all three parts to the TAs before the start of class next week to get credit.

Solution


Download this script to see my answers to the questions above.