Corresponding lecture

Lecture 12 – Making your analysis portable and reproducible

Homework

Homework #4: Case Study: Exploratory Data Analysis in R (~4hrs) - is due today!

Description

At this point in the course, you have a working directory full of scripts, tabular data, plots, and many other breadcrumbs from your project. So, it’s well past time for us to start taking steps to make our analysis reproducible. To do this, we’ll spend this lab practicing what you learned in this week’s lectures. You’ll produce an renv lock file, upload your entire project to GitHub, learn to edit the repo remotely using RStudio, and then archive the repo using Zenodo to get a Digital Object Identifier (DOI; a permanent link) for your code that you can include in a manuscript.

What you need to do

Initialize a reproducible environment in R using renv

At any point in the process of working on an analysis project in R, you can initialize a reproducible environment within your working directory using the wonderful renv package. Simply run the following commands in RStudio:

library(renv)
# initialize a reproducible environment and capture a snapshot of your package environment
renv::init() 
#OPTIONAL: at any point during your work, you can update the snapshot with:
renv::snapshot() 

Now that you have initialized using renv, take a look at your library tab in RStudio. Notice anything different? You should now see that you have both a ‘Project Library’ as well as your usual ‘System Library’. The Project Library contains

OK, now let’s test renv by deleting a few packages from your Project Library and updating others (you can always find packages with updates available by clicking on the ‘Update’ icon in the Library pane on RStudio). Once you’ve done this, try restoring our project environment using the information stored in the renv.lock file. This is as easy as:

renv::restore() 

And, voila! You should now see that the packages you deleted or updated in the step above have now been reinstalled or returned to their previous version.

I recommend using renv early during a project and using renv::snapshot() periodically to update the lockfile.
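If you are curious what the lockfile actually records, the sketch below reads renv.lock (a JSON file) and lists the package versions stored in it. It assumes the jsonlite package is installed, and lockfile_versions() is a hypothetical helper written for this lab, not part of renv.

```r
# Sketch: list the package versions recorded in an renv lockfile.
# Assumes jsonlite is installed; lockfile_versions() is a hypothetical helper.
lockfile_versions <- function(path = "renv.lock") {
  lock <- jsonlite::fromJSON(path, simplifyVector = FALSE)
  # Each entry under "Packages" records (at least) a package name and a version
  vapply(lock$Packages, function(pkg) pkg$Version, character(1))
}

# Only runs when a lockfile exists in the current working directory
if (file.exists("renv.lock") && requireNamespace("renv", quietly = TRUE)) {
  print(lockfile_versions())
  # Reports whether the project library and the lockfile are in sync
  renv::status()
}
```

Running this before and after renv::restore() is a quick way to confirm that the versions in your Project Library match what the lockfile says.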

Install git on your laptop

You might already have git installed (you can check by opening a terminal window and typing ‘git’). If not, download using one of the links below:
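If you would rather check from the R console than from a terminal, something like the following should work. It uses only base R; git_available() is a hypothetical helper name.

```r
# Sketch: check whether git is on the PATH, using base R only.
# git_available() is a hypothetical helper, not part of any package.
git_available <- function() {
  # Sys.which() returns an empty string when the program is not found
  nzchar(Sys.which("git")[[1]])
}

if (git_available()) {
  # Print the installed git version
  print(system2("git", "--version", stdout = TRUE))
} else {
  message("git was not found on the PATH; install it before continuing.")
}
```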

Configure git and authenticate with github

Once you have git installed, you’ll need to configure it so that you can link git to github. We’ll do this using the usethis R package, which contains lots of handy functions for project management.

Before we configure git, head to github.com and sign up for a free account. Take note of your username and the email you used to set up the account. We’ll use both below.

library(usethis)
usethis::use_git_config(user.name = "YourName", user.email = "your@mail.com") #add your github username and email 

# create a personal access token for authentication:
usethis::create_github_token() 

# set personal access token:
credentials::set_github_pat("YourToken") #add your token from github

# if the command above gave you any errors, you can also store your PAT manually in '.Renviron':
usethis::edit_r_environ() #opens your .Renviron file directly in RStudio
# store your personal access token with: GITHUB_PAT=xxxyyyzzz
# and make sure '.Renviron' ends with a newline
# save and close the .Renviron file and restart R
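After restarting R, it is worth confirming that the setup took. The sketch below checks whether a token is visible to R without printing it, then asks usethis for a summary of your git and GitHub configuration. pat_is_set() is a hypothetical helper, and the check assumes you stored the token in the GITHUB_PAT variable of .Renviron as described above; a token stored only in the git credential store will not show up in this check.

```r
# Sketch: confirm that a GitHub token is visible to R without printing it.
# pat_is_set() is a hypothetical helper; "GITHUB_PAT" matches the .Renviron entry above.
pat_is_set <- function(var = "GITHUB_PAT") {
  nzchar(Sys.getenv(var))
}

if (!pat_is_set()) {
  message("No GITHUB_PAT found in the environment; did you restart R after editing .Renviron?")
}

# Prints a summary of your git user, default branch, and GitHub credentials
if (interactive() && requireNamespace("usethis", quietly = TRUE)) {
  usethis::git_sitrep()
}
```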

Set up a GitHub repository (‘repo’) and connect to it using RStudio

  • Go to GitHub and create a new private repo with a readme. Click on the green ‘code’ button on your repo and copy the https URL
  • Start a new R project in RStudio, choosing the ‘version control’ option and ‘Git’. Paste in the URL that you copied in the step above
  • If you correctly authenticated above using your token, you should be able to connect to the repo using RStudio and will see the contents of the repo appear in your RStudio file browser.
  • Copy relevant files over from your active R project folder to the newly created version control R project
  • When ready to publish, make your GitHub repo public (Settings -> Manage Access -> Manage -> Change Repository Visibility)

Archive your repo with Zenodo

  • Sign in to Zenodo using the ‘login with github’ option
  • Once logged in, open the menu under your email login in the upper right of the Zenodo page and select ‘github’
  • Archiving is now super easy: just toggle the ‘on’ button next to any of your repos listed under the ‘enabled repos’ section on Zenodo
  • Create a ‘release’ of your GitHub repo (see here for an explanation of how to do this)
  • You should now see a DOI for that repo on Zenodo. Copy/paste this DOI for use in any publication!

Putting it all together

Working through the steps outlined above may seem like a lot, but keep in mind that most of this only needs to be done once (e.g., installing Git) or infrequently (e.g., authenticating with GitHub). Here’s how you would incorporate these best practices for reproducibility into your work in R:

  1. Start your work in RStudio and use renv early and often to capture snapshots of your working environment
  2. At any time during an analysis project, set up a repo on GitHub for your project. You can choose to make this a private or public repo…it’s up to you.
  3. Connect to this repo using RStudio (may require authentication).
  4. Once connected, update your .gitignore file using RStudio. You can use mine to get started.
  5. Copy all your project files to this new version-controlled project directory. Stage, commit, and push these files up to your GitHub repo.
  6. When you’re ready to publish, make sure your repo is set to public.
  7. Log in to Zenodo and archive your project to get a DOI (instructions on Zenodo).
  8. Include the DOI in your paper!
  8. Include the DOI in your paper!
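For step 4, you can also update .gitignore from the R console. The sketch below appends entries only if they are not already present, so it is safe to run more than once. add_gitignore_entries() is a hypothetical helper, and the entries shown are just a common starting point for R projects (an assumption, not the instructor’s file). Ignoring .Renviron matters if a copy of it ever ends up in your project folder, since it may hold your personal access token.

```r
# Sketch: append entries to a .gitignore file without creating duplicates.
# add_gitignore_entries() is a hypothetical helper; the entries are assumed defaults.
add_gitignore_entries <- function(entries, path = ".gitignore") {
  existing <- if (file.exists(path)) readLines(path, warn = FALSE) else character(0)
  # Keep only the entries that are not already listed
  to_add <- setdiff(entries, existing)
  if (length(to_add) > 0) {
    writeLines(c(existing, to_add), path)
  }
  invisible(to_add)
}

# Demonstrated on a temporary file so nothing in your project is modified
demo_path <- file.path(tempdir(), ".gitignore")
add_gitignore_entries(c(".Rproj.user", ".Rhistory", ".RData", ".Renviron"), demo_path)
print(readLines(demo_path))
```

To apply this to a real project, call the function from the project root with the default path. The usethis package offers usethis::use_git_ignore() for the same purpose if you prefer not to write your own helper.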