Overview
Reproducing an analysis requires more than just code. You also need the original raw data, the appropriate programming languages, and application-specific packages (often specific versions of those packages). This is a major impediment to reproducibility, even for researchers with a background in bioinformatics. To address this challenge, you’ll learn how to ‘containerize’ your data, scripts and software, making it easy to share and rerun an entire analysis with the push of a button.
Learning objectives
- Learn how to make your research analyses reproducible
- Create a reproducible package environment with renv
- Share your project via GitHub and git
- Understand how to streamline code using custom R functions
- Share your work as an R package
- Discuss the basics of Docker and containerized software
What you need to do
- Sign up for a free GitHub account (it doesn’t matter which email you use)
- Download this gitignore file, which is useful for updating the .gitignore file in your own project repo
- Download this script, which walks through how to turn any analysis project into an R package. You may also want this text file as a simple starting point for data documentation, and this function file as an example.
Lecture videos
Part 1 - Reproducibility via the renv package
Part 2 - Connecting your project to GitHub
Part 3 - Keeping your code clean via custom functions
Part 4 - How to turn your analysis project into a standalone R package
Reading
Happy Git and GitHub with RStudio - Jenny Bryan and team walk through every step of how to install git, connect to GitHub and access version control from within RStudio.
There’s a lot of reading material on how to get started making functions and packages. Beyond the extensive and very well-written book on building R packages and the excellent documentation for the usethis package, you may also want to check out some great lab posts on making R packages (here, here, here, and here).
Google Colaboratory - Write, edit and share Python code directly in your browser