Image credit: Adelaide Tyrol

Lecture slides on iCloud


Lists of differentially expressed genes (DEGs) often contain distinct patterns, or modules, of genes that are coordinately regulated across treatments or conditions, and these patterns can provide powerful insight into biology. In this class you’ll use correlation-based clustering methods and heatmap visualization to interrogate your DEGs and reveal modules of co-regulated genes.
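To make the idea of correlation-based clustering concrete, here is a minimal sketch in base R. It is not the Step 6 script. The expression values are invented toy data, and the choices of Pearson correlation, complete linkage and k = 2 modules are assumptions made for illustration.

```r
# Toy expression matrix: 6 genes (rows) x 6 samples (columns).
# All values are made up; three genes go up with treatment, three go down.
set.seed(42)
pattern_up   <- c(1, 1, 1, 5, 5, 5)
pattern_down <- c(5, 5, 5, 1, 1, 1)
expr <- rbind(
  geneA = pattern_up   + rnorm(6, sd = 0.2),
  geneB = pattern_up   + rnorm(6, sd = 0.2),
  geneC = pattern_up   + rnorm(6, sd = 0.2),
  geneD = pattern_down + rnorm(6, sd = 0.2),
  geneE = pattern_down + rnorm(6, sd = 0.2),
  geneF = pattern_down + rnorm(6, sd = 0.2)
)
colnames(expr) <- c("ctl_1", "ctl_2", "ctl_3", "trt_1", "trt_2", "trt_3")

# cor() correlates columns, so transpose to get gene-by-gene correlations
gene_cor <- cor(t(expr), method = "pearson")

# Turn correlation into a distance: highly correlated genes are close together
gene_dist <- as.dist(1 - gene_cor)

# Hierarchical clustering of genes, then cut the tree into k modules
gene_tree <- hclust(gene_dist, method = "complete")
modules <- cutree(gene_tree, k = 2)
print(modules)

# Heatmap with rows scaled to Z-scores and a color bar marking each module
heatmap(expr,
        Rowv = as.dendrogram(gene_tree),
        Colv = NA,
        scale = "row",
        col = colorRampPalette(c("blue", "white", "red"))(50),
        RowSideColors = c("tomato", "steelblue")[modules])
```

Scaling by row matters here: it puts every gene on the same scale, so the heatmap shows the pattern across samples rather than absolute expression level.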

Learning objectives

  • Be able to interpret and construct heatmaps
  • Understand color choice in R
  • Understand how clustering methods are used to identify coordinately expressed genes (a.k.a. modules)
  • Learn to use the command-line Clust program


Step 6 script

Lecture videos

Part 1 - Intro to clustering and starting the Step 6 script

Part 2 - Making and interpreting heatmaps

Part 3 - Alternative clustering methods


If you want to try Clust on your own, you’ll need to install the program first (see the GitHub page for instructions). You can test it out using the Schisto hackdash dataset to reproduce what I showed you in class. There are two files you’ll need for this: 1) this text file of DEGs in female LE-strain worms; and 2) a reps file that maps the columns (samples) to groups (conditions). You’ll need to read these two files into your R environment before running Clust. You can view an example of the output here.
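If you later want to build a reps file for your own data, the sketch below shows one way to do it from R. The file names and sample names are hypothetical. The layout (data file name, condition name, then the replicate column names on one line per condition) and the `-r` and `-o` options are my recollection of the Clust documentation, so check the GitHub page before relying on them.

```r
# Hypothetical sample-to-group mapping; replace with your own column names
samples <- data.frame(
  sample = c("ctl_1", "ctl_2", "trt_1", "trt_2"),
  group  = c("control", "control", "treated", "treated"),
  stringsAsFactors = FALSE
)
data_file <- "degs.txt"  # hypothetical name of the expression table

# Collect the replicate columns that belong to each condition
reps_by_group <- split(samples$sample, samples$group)

# Assumed layout: data file, condition name, replicate columns (tab-delimited)
reps_lines <- vapply(
  names(reps_by_group),
  function(g) paste(c(data_file, g, reps_by_group[[g]]), collapse = "\t"),
  character(1)
)

# Write to a temporary folder so nothing in your project is overwritten
reps_path <- file.path(tempdir(), "reps.txt")
writeLines(reps_lines, reps_path)

# Print the command you would then run in a terminal (it is not executed here)
cat("clust", data_file, "-r", reps_path, "-o clust_results\n")
```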


Clust: automatic extraction of optimal co-expressed gene clusters from gene expression data. Genome Biology, 2018.

Colors in R - Color palettes are an often underappreciated aspect of making beautiful and informative plots in R. You can access a suite of color palettes using the RColorBrewer package. These palettes can be viewed in this cheatsheet. Unfortunately, these standard palettes often don’t cut it, and you’ll need custom palettes. For this, I love using Sip to pick, organize and access color palettes.
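As a rough illustration of both routes, the sketch below builds a custom palette from hex codes and, if RColorBrewer is installed, a ramp from one of its palettes. The three hex codes are arbitrary example values that you would swap for colors you picked yourself.

```r
# Custom diverging palette: interpolate 25 shades between three anchor colors.
# The hex codes are placeholders; paste in whatever colors you picked.
my_palette <- colorRampPalette(c("#2166AC", "#F7F7F7", "#B2182B"))(25)
print(head(my_palette))

# RColorBrewer route (skipped if the package is not installed).
# "RdBu" is a diverging palette with at most 11 colors; rev() puts blue first.
if (requireNamespace("RColorBrewer", quietly = TRUE)) {
  brewer_ramp <- colorRampPalette(rev(RColorBrewer::brewer.pal(11, "RdBu")))(25)
  print(head(brewer_ramp))
}

# Quick preview of the custom palette as a strip of colored tiles
image(matrix(seq_along(my_palette), ncol = 1), col = my_palette, axes = FALSE)
```

Either vector can be passed to the `col` argument of a heatmap function. A diverging palette like this suits row-scaled data, where the midpoint color marks average expression.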