Bioconda & R

This is a suggested procedure for using R on the cluster. It’s by no means the only way to do this, but it’s tried and tested, and it lets conda handle R package dependencies, which in our experience is more reliable than managing packages from within R itself.

Important

Ensure you’ve installed conda using the instructions given on the Bioconda page. Do not install it via any other method.

First, create a dedicated conda environment which will contain R and any packages subsequently added to it. In this example, we’ll install a specific version (4.5), so we’ll call the environment r4.5:

$ conda create --name r4.5 r-base=4.5
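To confirm that the new environment contains the expected R version without activating it, you can use `conda run`. This is a minimal sketch; the function name `show_r_version` is hypothetical:

```shell
# Hypothetical helper: print the R version installed in a named conda environment.
# Usage: show_r_version r4.5
show_r_version() {
  # "conda run --name ENV CMD" runs CMD inside ENV without activating it
  conda run --name "$1" R --version
}
```

Running `show_r_version r4.5` should print R’s version banner, reporting the 4.5.x release that conda installed.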

Let’s assume we now want to add an R package, for example ggplot2, which we can find via conda search. The convention for packaging R packages in conda is to prefix the name of the R package with r-, so in this case r-ggplot2:

$ conda search r-ggplot2

This returns a lot of version entries for r-ggplot2 and confirms that a conda version of this R package is available (output shown here is truncated):

Loading channels: done
# Name                       Version           Build  Channel
r-ggplot2                      2.0.0        r3.2.2_0  bioconda
r-ggplot2                      2.1.0        r3.2.2_0  bioconda
r-ggplot2                      2.1.0        r3.3.1_0  bioconda
r-ggplot2                      2.2.0        r3.3.1_0  bioconda
r-ggplot2                      2.2.0        r3.3.2_0  conda-forge
r-ggplot2                      2.2.0        r3.4.1_0  conda-forge
r-ggplot2                      2.2.1        r3.3.1_0  bioconda
r-ggplot2                      2.2.1        r3.3.2_0  conda-forge
[...]
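The naming convention can be captured in a small shell function. This is a sketch that assumes the usual conda-forge practice of lowercasing the R package name after the `r-` prefix (so `Rcpp` becomes `r-rcpp`); the function name `r_conda_name` is hypothetical, and the result should still be confirmed with `conda search`:

```shell
# Hypothetical helper: map an R package name to its conda package name.
# Assumes the conda-forge convention: "r-" followed by the lowercased R name.
r_conda_name() {
  printf 'r-%s\n' "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
}
```

For example, `conda search "$(r_conda_name Rcpp)"` searches for `r-rcpp`. Bioconductor packages are an exception: Bioconda packages them with a `bioconductor-` prefix instead (e.g. `bioconductor-deseq2`).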

Let’s install the package into our dedicated R environment. First, we need to activate it:

$ conda activate r4.5

This should be reflected by a change in your command prompt:

(r4.5) username@gruffalo:~$
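In scripts, where there is no prompt to look at, you may want to check that the right environment is active before doing anything else. When an environment is activated by name, conda sets the `CONDA_DEFAULT_ENV` variable to that name; a minimal guard built on that assumption, with the hypothetical name `require_conda_env`, might look like this:

```shell
# Hypothetical guard: succeed only if the named conda environment is active.
# Relies on CONDA_DEFAULT_ENV, which "conda activate NAME" sets to NAME.
require_conda_env() {
  if [ "${CONDA_DEFAULT_ENV:-}" != "$1" ]; then
    echo "Expected conda environment '$1' to be active (found '${CONDA_DEFAULT_ENV:-none}')" >&2
    return 1
  fi
}
```

A script could then start with `require_conda_env r4.5 || exit 1`.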

Anything we install with conda will now be installed into this environment:

$ conda install r-ggplot2

Conda will handle all the dependency management and also install other packages required by ggplot2.
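To check that the install worked, you can try loading the package in R. The sketch below uses `conda run`, so it also works from a shell where the environment isn’t active; `check_r_package` is a hypothetical name:

```shell
# Hypothetical helper: check that an R package loads inside a conda environment.
# Usage: check_r_package r4.5 ggplot2
check_r_package() {
  # If the package is missing, library() raises an error and Rscript exits non-zero
  conda run --name "$1" Rscript -e "library($2)"
}
```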

Tip

Additional R packages can be installed into the conda R environment as described above.
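Several packages can also be requested in a single command, so that conda resolves their dependencies together, and `--dry-run` lets you preview the changes first. This is a sketch using a hypothetical helper named `add_r_packages`; `--name` targets the environment directly, so it doesn’t need to be active:

```shell
# Hypothetical helper: preview, then install, conda R packages into an environment.
# Usage: add_r_packages r4.5 r-dplyr r-readr
add_r_packages() {
  add_env="$1"
  shift
  # Show what would be installed or changed, without modifying anything
  conda install --name "$add_env" --dry-run "$@" || return 1
  # Install for real; conda still asks for confirmation before proceeding
  conda install --name "$add_env" "$@"
}
```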

Note

The advantages of this approach over the “traditional” method of installing packages from within R itself are dependency management (handled seamlessly by conda) and an easily obtained, complete record of which R packages and versions are installed. You can export this record (conda env export --name r4.5 > r4.5.yml), which is useful if you need to migrate the environment to another system and/or share it with another user.
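The export step can be wrapped in a small function; `export_conda_env` is a hypothetical name, and this is a sketch rather than a required workflow. Passing `--from-history` records only the packages you explicitly asked for, which tends to be more portable across operating systems than a full export with exact builds:

```shell
# Hypothetical helper: write ENV's specification to ENV.yml in the current directory.
# Usage: export_conda_env r4.5                   # exact versions and builds
#        export_conda_env r4.5 --from-history    # only explicitly requested packages
export_conda_env() {
  exp_env="$1"
  shift
  conda env export --name "$exp_env" "$@" > "$exp_env.yml"
}
```

On the other system, `conda env create --file r4.5.yml` recreates the environment under the name stored in the file.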

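Once we have our packages installed, we can use them in our scripts. The following is an example Slurm batch script that runs an R script inside the environment; save it to a file and submit it with sbatch:

#!/bin/bash
#SBATCH --job-name="Example R Script"

# Make "conda activate" available: non-interactive batch shells don't read
# the ~/.bashrc setup that normally defines it
eval "$(conda shell.bash hook)"

# Activate our conda R environment
conda activate r4.5

# Run an R script that uses ggplot2, etc.
Rscript myPlottingScript.r

# If running any other steps after this, don’t forget to deactivate the environment
conda deactivate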

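You can also submit R jobs directly to Slurm without writing a batch script, using sbatch’s --wrap option. Because sbatch passes your current environment to the job by default, activate the conda environment first: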

$ conda activate r4.5
$ sbatch --wrap "Rscript myPlottingScript.r"
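If you’d rather not activate the environment in your shell before submitting, one alternative is to let `conda run` start Rscript inside the environment within the job. A sketch with the hypothetical helper name `submit_r_script`; `--no-capture-output` makes conda stream R’s output to the Slurm log as it runs instead of buffering it:

```shell
# Hypothetical helper: submit an R script to Slurm, run inside a conda environment.
# Usage: submit_r_script r4.5 myPlottingScript.r
submit_r_script() {
  # --wrap turns the quoted command into a one-line batch script;
  # conda run executes Rscript in the environment without activating it
  sbatch --wrap "conda run --no-capture-output --name $1 Rscript $2"
}
```

As written, the function doesn’t quote the environment name or script path inside the wrapped command, so it assumes neither contains spaces.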

Tip

We highly recommend running all your R code from scripts rather than interactively through tools like RStudio, where it is easy to lose track of exactly which steps produced a result, so reproducibility requirements are generally not satisfied. Once your code is in a script, it can readily be tweaked and re-executed, and it’s also ready to be committed to a repository, e.g. on GitHub, as part of your publication requirements.