Image credit: Picture-alliance/AP Photo/M. Schreiber

Corresponding lecture

Lecture 7 – Differential gene expression

Description

You will be given raw counts and a study design file for an extensive RNA-seq exploration of the host response to SARS-CoV-2 and related viruses across different cell lines, tissues and time points. The data include human primary samples as well as in vivo studies in ferrets. This is an incredible dataset from Benjamin tenOever’s lab at Mt Sinai! You should check out their recent Cell paper: Imbalanced Host Response to SARS-CoV-2 Drives Development of COVID-19. This lab certainly falls into the category of ‘easier said than done’.

What you’ll need to get started

Task 1 - pick a question and get to work

Your challenge is to look at the study design file and decide on a question you would like to ask using this highly multivariate dataset, and then carry out your analysis using code from the course.

You have a full week to complete this lab. Use this time to work through the dataset either alone, or together with your classmates, but each student must turn in a final figure to get credit for the lab. Please include a description of your question and your interpretation of the results.

Tips

  • You can ignore the step 1 script for this challenge, since we didn’t align the data but instead got raw counts directly from the authors’ entry in the Gene Expression Omnibus (GEO) repository.
  • Since the data is already in the form of a count table (genes as rows and samples as columns) you don’t need to worry about annotations either. Go straight to creating a DGEList object.
  • dplyr is going to be critical in this challenge, as you will need to wrangle the study design file and the raw count tables to get what you need to address your question(s).
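
As a rough illustration of the tips above, here is a minimal sketch of subsetting a study design with dplyr, pulling the matching columns from a count table, and building a DGEList. The toy data and all column names (sample, cell_line, treatment) are hypothetical stand-ins; the real study design file will have its own column names and values, so adapt accordingly. It assumes the dplyr and edgeR packages are installed.

```r
library(dplyr)
library(edgeR)

# Hypothetical stand-in for the study design file (one row per sample)
study_design <- data.frame(
  sample    = c("S1", "S2", "S3", "S4", "S5", "S6"),
  cell_line = c("A549", "A549", "A549", "A549", "NHBE", "NHBE"),
  treatment = c("mock", "mock", "SARS-CoV-2", "SARS-CoV-2", "mock", "SARS-CoV-2"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the raw count table (genes as rows, samples as columns)
set.seed(1)
raw_counts <- matrix(
  rpois(30, lambda = 50),
  nrow = 5,
  dimnames = list(paste0("gene", 1:5), study_design$sample)
)

# Keep only the samples relevant to the question, e.g. mock vs. infected A549 cells
targets <- study_design %>%
  filter(cell_line == "A549")

# Subset the count table so its columns match the filtered study design, in the same order
counts_subset <- raw_counts[, targets$sample]

# Build the DGEList directly from the counts; no annotation step is needed
group <- factor(targets$treatment, levels = c("mock", "SARS-CoV-2"))
dge <- DGEList(counts = counts_subset, group = group)

dim(dge$counts)
```

Indexing the count table by targets$sample keeps the columns in the same order as the rows of the filtered study design, which matters for every downstream step that relies on the group labels.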

On your own

If you were not able to attend this lab, please complete Task 1 and turn it in to the TAs via Discord.

Discussion points

  • Thinking about what you want to ask with a large dataset, learning to prioritize questions when there are many possible things you could ask, and then constructing your analysis around one or a few key questions.
  • Flying solo on a data analysis project, then coming together with collaborators at different points in the project to see how colleagues differ in their approach and perspective. Incorporating these different perspectives into a final product is a key part of the research process when large datasets are involved.