What is the goal of this course?

As access to high-throughput sequencing technology increases, the bottleneck in biomedical research has shifted from generating data to analyzing and integrating diverse data types. Addressing these needs requires that students and postdocs equip themselves with a toolkit for data mining and interrogation. This course focuses specifically on studying global gene expression (transcriptomics) through the use of the R programming environment and the Bioconductor suite of software packages, a versatile and robust collection of tools for bioinformatics, statistics, and plotting. During this semester-long course, students participate in a mix of virtual lectures and guided code review, all while working with real infectious disease datasets directly on their laptops. Students will learn to analyze RNA-seq data using a lightweight and reusable set of modular scripts that leverage open-source software. In addition, students will learn best practices in data science for working in R/Bioconductor, including creating interactive data visualizations, making their analyses transparent and reproducible, and identifying experimental bias in large datasets.

Who teaches the class and maintains the website?

Dan Beiting designed and teaches the course. He is an Associate Professor of Pathobiology at PennVet. Megha Lal (postdoc), Qianxuan She (grad student), Leena Babiker (grad student), and Nikhil Joshi (bioinformatician) are all former students who now serve as teaching assistants for the course in 2024.

Meet your instructors!

Who supports and sponsors this course?

This course is made possible in part by generous support for TA stipends from the UPenn Institute for Immunology (IFI) and the Biomedical Graduate Studies (BGS) program. In addition, we thank the folks at RStudio for allowing access to RStudio Server Pro for this course, and for their continued free access to RStudio Desktop, which is a critical resource for academic research in R. We also thank DataCamp for generously providing free and unrestricted access to their online learning content to all students enrolled in the course.

What is the format of the course?

This class is being run as a ‘hybrid’ class. Lectures are entirely virtual and all lecture videos will be posted to this website. In-class time will be devoted to working through structured labs that focus on building better data science skills using datasets from infectious disease.

What can I expect to learn?

  • Learn to analyze bulk RNA-seq and single-cell RNA-seq (scRNA-seq) data
  • Develop a lightweight and reusable RNA-seq pipeline
  • Learn best practices for working in R/Bioconductor (extensible to other data types)
  • Learn the basics of being a good ‘data scientist’
  • Learn how to report your analysis and results in a transparent and reproducible way
  • Learn how to use emerging AI tools to enhance your data analysis skills

Who can take the course?

All lectures are freely available, and lab materials will be posted to the website after each lab. In-person attendance for labs is available for graduate students in the Biomedical Graduate Studies (BGS) group at the University of Pennsylvania. Space permitting, the course is open to graduate students outside of BGS. If you are not a graduate student at UPenn, you can still access the lectures, course slides, code, videos, and reading material on the site. This course is ideal for students and postdocs who have little or no experience in bioinformatics, and we encourage students to bring their own RNA-seq data to the course.

Can I just follow along online?

Yes! All lectures, reading material and code are freely available and are organized on the website by lecture, so you can proceed at your own pace.

What are the advantages of taking the course in-person?

Some elements of the course are only available to people who have officially registered and participate in person. These include access to DataCamp for homework and extended learning, participation in our in-person labs, priority support from the course instructor and TAs through our Discord community page, access to GitHub Classroom and Copilot, and last but not least, course credit!

How will I be graded in this course?

All students who officially register for the course through UPenn will receive a letter grade. At this time we are unable to provide grades or proof-of-completion for virtual learners.

Can I cite the course in my publications?

Yes! Please cite our recent open-access publication that describes the course philosophy and teaching strategies.