Lecture slides

Homework: Introduction to the Tidyverse (~4hrs) - due before the start of class today!


In this class we’ll use multivariate statistical methods, including Principal Component Analysis (PCA), to explore how experimental covariates (e.g., sex, age, treatment) contribute to the overall structure of our data. We’ll discuss common missteps and sources of variance in transcriptional data sets. You’ll also continue to build your plotting skills by using ggplot2 to graph PCA results.
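As a rough illustration of the idea, the sketch below runs PCA with base R's prcomp() on a small simulated expression matrix. The matrix, the gene and sample names, and the 'treatment' covariate are all made up for this example; none of it comes from the course data or the Step 3 script.

```r
# Simulated data only: 200 hypothetical genes x 6 hypothetical samples.
set.seed(42)
n_genes <- 200
treatment <- factor(rep(c("control", "treated"), each = 3))
expr <- matrix(rnorm(n_genes * 6, mean = 8, sd = 0.5),
               nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)),
                               paste0("sample", 1:6)))

# Assumed effect: the first 50 genes are shifted upward in treated samples.
treated_cols <- treatment == "treated"
expr[1:50, treated_cols] <- expr[1:50, treated_cols] + 2

# prcomp() expects observations (here, samples) in rows, so transpose.
pca_res <- prcomp(t(expr), scale. = FALSE, retx = TRUE)

# Percent of total variance captured by each principal component.
pc_var <- pca_res$sdev^2
pc_per <- round(pc_var / sum(pc_var) * 100, 1)

# Sample coordinates on PC1 and PC2, alongside the covariate of interest.
pca_df <- data.frame(sample = rownames(pca_res$x),
                     treatment = treatment,
                     PC1 = pca_res$x[, "PC1"],
                     PC2 = pca_res$x[, "PC2"])

print(pc_per)
print(pca_df)
```

If the covariate drives much of the variance, the two groups should fall on opposite sides of PC1. A data frame like pca_df can be passed to ggplot2 to draw the usual PC1 versus PC2 scatter plot colored by covariate.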


  • Start/finish step 3 script
  • Discuss basics of multivariate statistical analysis
  • Carry out hierarchical clustering of samples
  • Discuss and perform principal component analyses (PCA)
  • Produce a ‘small multiples’ plot
  • Use standard dplyr ‘verbs’ to quickly query our data
  • Produce interactive graphics using the plotly package
  • Produce interactive tables with the DT package
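
For the hierarchical clustering goal above, a minimal base R sketch might look like the following. It uses simulated data with hypothetical sample names, and the choice of Euclidean distance with complete linkage is an assumption made for illustration, not necessarily what the Step 3 script uses.

```r
# Simulated data only: 100 hypothetical genes x 6 hypothetical samples.
set.seed(1)
groups <- rep(c("control", "treated"), each = 3)
expr <- matrix(rnorm(100 * 6, mean = 8, sd = 0.5),
               nrow = 100,
               dimnames = list(NULL, paste0(groups, "_", 1:3)))

# Assumed effect: the first 30 genes are shifted upward in treated samples.
expr[1:30, groups == "treated"] <- expr[1:30, groups == "treated"] + 2

# dist() measures distances between rows, so transpose to compare samples.
sample_dist <- dist(t(expr), method = "euclidean")

# Agglomerative clustering; "complete" linkage is one of several options.
hc <- hclust(sample_dist, method = "complete")

# Cut the tree into two clusters and compare them with the known groups.
clusters <- cutree(hc, k = 2)
print(table(groups, clusters))

# plot(hc) draws the dendrogram when run interactively.
```

Trying other distance metrics or linkage methods on the same matrix is a quick way to see how sensitive the tree is to those choices.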


Step 3 script


Blog post describing t-SNE - I mentioned various unsupervised methods for dimensionality reduction of your data (PCA, MDS, t-SNE). In particular, t-SNE has become popular for representing single-cell RNAseq data and flow cytometry data, but it is also one of the more complex visualization methods to understand.

Original t-SNE paper.

UMAP - A newer algorithm, called uniform manifold approximation and projection (UMAP), has recently been published and is gaining popularity in single-cell RNAseq and flow cytometry analysis. UMAP is reported to preserve as much of the local data structure as t-SNE and more of the global structure, with a shorter run time.