Image credit: “Passing Time” by Lucia Martinez

The worm challenge

A colleague has asked for your help mining data from a very large RNA-seq study of the parasitic worm Schistosoma mansoni. In this experiment, male (M), female (F), juvenile (J), and mixed-sex (X) worms were recovered from infected mice at various timepoints (control, 3hr, 12hr, and 24hr) following in vivo treatment with a low dose of the frontline anti-parasitic drug praziquantel. Experiments were carried out with three different strains of worms: NMRI, LE, and LEPZQ. In total, 144 samples were sequenced. To begin this challenge, download the processed data and the study design, and read both files into a clean R project.
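As a starting point, reading the two files might look something like the sketch below. The file names, column names, and tab-delimited layout here are assumptions made purely for illustration (the toy files are written to a temporary directory so the example runs on its own); swap in the paths and format of the files you actually downloaded.

```r
library(readr)
library(tibble)

# Toy stand-ins for the two downloads. File names and columns are hypothetical.
tmp <- tempdir()
data_path   <- file.path(tmp, "toy_log2cpm.txt")
design_path <- file.path(tmp, "toy_studydesign.txt")

write_tsv(tibble(geneID = c("geneA", "geneB"),
                 S1 = c(1.2, 5.0),
                 S2 = c(1.4, 4.8)), data_path)
write_tsv(tibble(sample = c("S1", "S2"),
                 sex = c("F", "M")), design_path)

# Read the expression table and the study design into tibbles
log2_cpm     <- read_tsv(data_path, show_col_types = FALSE)
study_design <- read_tsv(design_path, show_col_types = FALSE)

# Sanity check: every sample in the design has a matching column in the data
stopifnot(all(study_design$sample %in% colnames(log2_cpm)))
```

A quick check like the last line is worth keeping for the real files, since a mismatch between the study design and the data columns will quietly break everything downstream.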

To succeed at the worm challenge, you will need to use the tools you’ve learned in class for exploring multivariate data and wrangling dataframes. Note: you do not need to filter, normalize, or preprocess the data in any way. The data are already expressed as log2 counts per million (log2 CPM).

The first team to submit the most complete answer (via the #hackdash channel on Slack) to the following two questions will win the challenge.

Question #1

Which biological variables explain the majority of the variance in this dataset? Please include PCA plot(s) to support your answer.
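For orientation only, here is a minimal sketch of a PCA workflow on simulated data. The toy matrix, the sample names, and the built-in "sex" effect are all made up for illustration and say nothing about what the real dataset will show. The sketch uses base R's prcomp() and ggplot2; note that prcomp() expects samples as rows, so a genes-by-samples matrix is transposed first.

```r
library(ggplot2)

set.seed(42)

# Hypothetical toy design: 12 samples described by two variables
toy_design <- data.frame(
  sample = paste0("S", 1:12),
  sex    = rep(c("M", "F"), each = 6),
  strain = rep(c("NMRI", "LE", "LEPZQ"), times = 4)
)

# Simulated log2 CPM matrix: 200 genes (rows) x 12 samples (columns)
toy_expr <- matrix(rnorm(200 * 12, mean = 5, sd = 1), nrow = 200,
                   dimnames = list(paste0("gene", 1:200), toy_design$sample))

# Build in an artificial sex effect on the first 50 genes
is_female <- toy_design$sex == "F"
toy_expr[1:50, is_female] <- toy_expr[1:50, is_female] + 3

# PCA on samples: transpose so that each row is a sample
pca_res <- prcomp(t(toy_expr), scale. = FALSE, retx = TRUE)

# Percentage of variance explained by each principal component
pc_var <- pca_res$sdev^2
pc_pct <- round(100 * pc_var / sum(pc_var), 1)

# Combine PC scores with the design (rows are in the same sample order)
pca_df <- cbind(toy_design, pca_res$x[, 1:2])

pca_plot <- ggplot(pca_df, aes(x = PC1, y = PC2, color = sex, shape = strain)) +
  geom_point(size = 3) +
  labs(x = paste0("PC1 (", pc_pct[1], "%)"),
       y = paste0("PC2 (", pc_pct[2], "%)")) +
  theme_bw()
```

Mapping different study variables to color and shape, and looking beyond PC1 and PC2, is one way to see which variables line up with which components.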

Question #2

Using dplyr ‘verbs’, identify the top 10 parasite genes induced by praziquantel treatment in female LE strain worms at the 24hr timepoint compared to control worms. The top 10 genes should be selected and ordered by the average log2 fold change between the two groups of replicates. Please include a table to support your answer.
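The general pattern can be sketched on a toy table. The column names and values below are invented for illustration and are not the real sample names. Because the values are already on the log2 scale, the log2 fold change is simply the difference between the two group averages.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data: 5 genes, 3 control and 3 treated replicates (log2 CPM)
toy_log2cpm <- tibble(
  geneID      = paste0("gene", 1:5),
  F_LE_ctrl_1 = c(1.0, 4.0, 2.0, 6.0, 3.0),
  F_LE_ctrl_2 = c(1.2, 4.1, 2.2, 5.8, 3.1),
  F_LE_ctrl_3 = c(0.8, 3.9, 1.8, 6.2, 2.9),
  F_LE_24hr_1 = c(4.0, 4.2, 1.0, 6.5, 5.0),
  F_LE_24hr_2 = c(4.2, 4.0, 1.1, 6.4, 5.2),
  F_LE_24hr_3 = c(3.8, 3.8, 0.9, 6.6, 4.8)
)

top_genes <- toy_log2cpm %>%
  # Average the replicates within each group, then subtract (log2 scale)
  mutate(
    ctrl_avg = (F_LE_ctrl_1 + F_LE_ctrl_2 + F_LE_ctrl_3) / 3,
    trt_avg  = (F_LE_24hr_1 + F_LE_24hr_2 + F_LE_24hr_3) / 3,
    log_fc   = trt_avg - ctrl_avg
  ) %>%
  # Keep only the columns needed for the summary table
  select(geneID, ctrl_avg, trt_avg, log_fc) %>%
  # Most strongly induced genes first
  arrange(desc(log_fc)) %>%
  # Top 3 for this toy table; the challenge asks for the top 10
  slice_head(n = 3)

print(top_genes)
```

Sorting in descending order keeps only induced genes at the top of the list; genes repressed by treatment have a negative log fold change and fall to the bottom.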

Bonus

Does anything interesting stand out to you about the top genes on your list? (This requires a literature search.)