As access to high-throughput sequencing technology increases, the bottleneck in biomedical research has shifted from generating data to analyzing and integrating diverse data types. Addressing these needs requires that students and postdocs equip themselves with a toolkit for data mining and interrogation. This course focuses specifically on studying global gene expression (transcriptomics) through the use of the R programming environment and the Bioconductor suite of software packages, a versatile and robust collection of tools for bioinformatics, statistics, and plotting. During this semester-long course, students participate in a mix of lectures and guided code review, all while working with real datasets directly on their laptops. Students will learn to analyze RNAseq data using a lightweight and reusable set of modular scripts that leverage open-source software. In addition, students will learn best practices in data science for working in R/Bioconductor, including creating interactive data visualizations, making their analyses transparent and reproducible, and identifying experimental bias in large datasets.
The course is taught by Dan Beiting, Assistant Professor of Pathobiology at PennVet. Camila Amorim and Alex Berry, both postdocs, serve as teaching assistants.
Support and Sponsors
This course is made possible in part by generous support for TA stipends from the UPenn Institute for Immunology (IFI). In addition, we thank RStudio for allowing access to RStudio Server Pro for this course, and for continuing to provide free access to RStudio Desktop, which is a critical resource for academic research in R. We also thank DataCamp for generously providing free and unrestricted access to their online learning content to all students enrolled in the course. Finally, we thank the folks at Code Ocean, who provide all students with convenient access to Dockerized resources for transparency and reproducibility.
This class is being run as an online virtual course that will ‘meet’ twice per week. These virtual classes require active participation (usually in the form of coding and data analysis), so be prepared to work along with the videos. In addition to recorded lectures, the course includes three live ‘hackdash’ data analysis challenges, which will be hosted via Zoom.
- Learn to analyze your own RNAseq data
- Develop a lightweight and reusable RNAseq pipeline
- Learn best practices for working in R/Bioconductor (extensible to other data types)
- Learn the basics of ‘data science’
- Learn how to report your analysis and results in a transparent and reproducible way
Who can take the course?
This is a graduate-level course offered through the Cell and Molecular Biology (CAMB) graduate group at the University of Pennsylvania. Space permitting, the course is open to students outside of CAMB. If you are not a graduate student at UPenn, you can still access the course slides, code, videos and reading material on the site. This course is ideal for biomedical graduate students and postdocs who have little or no experience in bioinformatics, and we encourage students to bring their own RNAseq data to the course.
Can I just follow along online?
For the most part. All lectures, reading material and code are freely available and are organized on the website by lecture, so you can proceed at your own pace. However, some elements of the course are only available to people who have officially registered. These include access to DataCamp for homework and extended learning, participation in our in-class hackathons (a.k.a. hackdashes), access to our course Code Ocean group, access to our class Slack page for 1:1 help from the instructor and our TAs throughout the course (including with your own data), and last but not least, course credit.
How will I be graded in this course?
All students taking the course for credit will be assigned a pass/fail grade, based on performance in the three in-class hackathons (called hackdashes) and completion of homework assignments.