Bioconda

Warning

Anaconda Inc, the company behind some aspects of the conda ecosystem no longer licence their defaults and r channels for free use by organisations with >200 members of staff. Crop Diversity HPC blocks access to these channels and you should remove them from any existing installs using conda config --remove channels [channel name].

Bioconda (https://bioconda.github.io) is a community managed set of packages providing bioinformatics software, with a repository of over 10,000 packages (as of August 2024) ready to install. It is part of the conda system which includes standard Linux/Unix tools as well.

Using conda allows you to pick and choose the software (and versions) that you want without any danger of clashing with anyone else’s requirements, and as most packages require nothing more than running conda install <packagename>, the process is incredibly easy.

Follow the instructions below to get up and running with Bioconda in just a few minutes.

Tip

Conda can be quite memory hungry, so it’s best to start an interactive job asking for additional memory before running these commands, eg: srsh --mem=4G.

Installing Bioconda

Warning

In order to maintain compatibility with the system’s backup and data policies, you should not attempt to install Bioconda using any other method than the one listed here.

To install Bioconda, simply run: install-bioconda while logged into gruffalo. This will automatically download the necessary files for you, install it to an appropriate area, and then setup the correct channels for finding software for you. By default, it’ll install channel information for Bioconda and conda-forge.

Important

You must log out and in again (or open a new shell window) before the changes made by the install script will take effect.

You should test that installation was successful:

$ conda --version
conda 24.3.0

Finding packages

There are several options for finding software packages that you can install:

For example:

$ conda search samtools
Fetching package metadata .................
samtools                     1.3                           0  bioconda
                             1.3                           1  bioconda
                             1.3                           2  bioconda
                             1.3.1                         0  bioconda
                             1.3.1                         1  bioconda
                             1.3.1                         2  bioconda
                             1.3.1                         3  bioconda
                             1.3.1                         4  bioconda
                             1.3.1                         5  bioconda
                             1.4                           0  bioconda
                             1.4.1                         0  bioconda
                             1.5                           0  bioconda
                             1.5                           1  bioconda
                             1.5                           2  bioconda
                             1.6                           0  bioconda

Once you know the name of the package (and optionally its version), you can query for more information using conda search <packagename> --info or conda search <packagename>=<version> --info:

$ conda search samtools=1.4 --info
Fetching package metadata .................

samtools 1.4 0
--------------
file name   : samtools-1.4-0.tar.bz2
name        : samtools
version     : 1.4
build string: 0
build number: 0
channel     : bioconda
size        : 981 KB
arch        : x86_64
has_prefix  : True
license     : MIT
md5         : ba63ece45b20644cbbb753e9ca0394c0
noarch      : None
platform    : linux
requires    : ()
subdir      : linux-64
url         : https://conda.anaconda.org/bioconda/linux-64/samtools-1.4-0.tar.bz2
dependencies:
    curl
    libgcc
    xz
    zlib

Importantly, this will tell you what dependencies the package may have, although conda will always resolve these for you automatically.

Note

If you don’t specify a version (by using <packagename>=<version>) then Bioconda will assume you’re interested in the latest version, which is probably what you want most of the time anyway.

Installing packages

To install, use conda install <packagename>. For example:

$ conda install samtools
Fetching package metadata .................
Solving package specifications: .

The following NEW packages will be INSTALLED:

    bzip2:     1.0.6-1           conda-forge
    curl:      7.54.1-0          conda-forge
    krb5:      1.14.2-0          conda-forge
    libgcc:    7.2.0-h69d50b8_2
    libssh2:   1.8.0-1           conda-forge
    samtools:  1.6-0             bioconda

Proceed ([y]/n)? y

bzip2-1.0.6-1. 100% |#####################################| Time: 0:00:00 476.24 kB/s
krb5-1.14.2-0. 100% |#####################################| Time: 0:00:01   3.07 MB/s
libssh2-1.8.0- 100% |#####################################| Time: 0:00:00  26.50 MB/s
libgcc-7.2.0-h 100% |#####################################| Time: 0:00:00  19.51 MB/s
curl-7.54.1-0. 100% |#####################################| Time: 0:00:00   3.23 MB/s
samtools-1.6-0 100% |#####################################| Time: 0:00:01 999.31 kB/s

To update an existing package at a later date (eg to its newest version), you can use:

$ conda update samtools

Note

At some point you will run into a conflict and/or find this process gets slower and slower. One recommendation is to use “environments” (see below). Another suggestion is conda install mamba and then use mamba install <packagename>. Continuing the snake naming theme, Mamba is an alternative to the Conda installation tool which uses a different and faster approach to solving the dependency tree - hopefully that technique will be adopted into the standard conda tool at some point.

Listing packages

To retrieve a list of installed packages, use:

$ conda list
libssh2                   1.8.0                         1    conda-forge
readline                  6.2                           2
requests                  2.18.4           py36he2e5f8d_1
samtools                  1.6                           0    bioconda
setuptools                36.5.0           py36he42e2e1_0

This returns entries not only for Bioconda, but also for packages from repositories that Bioconda relies upon, such as conda and conda-forge. You can filter the list using:

$ conda list | grep bioconda

Removing packages

Removing packages is as simple as:

$ conda remove samtools
Fetching package metadata .................
Solving package specifications: .

The following packages will be REMOVED:

    samtools: 1.6-0 bioconda

Proceed ([y]/n)? y

Note

Removing a package doesn’t remove its dependencies, so over time you may find your Bioconda install growing quite large, so run conda clean to tidy things up.

Environments

While conda is good at resolving package dependencies, it’s likely you’ll (eventually) find a package you can’t install because its dependencies clash with those of an already-installed package (which often happens when packages rely on one of the major versions of Python (2 or 3)). Another problematic situation arises if you want to have multiple versions of the same package installed.

Both of these issues can be resolved using environments, which are best thought of as a standalone, isolated working copies of conda.

To use a separate environment, you first need to create it:

$ conda create -n samtools-old

This environment is isolated from your main conda installation, so you need to activate it before use (note how the command prompt changes when this happens):

$ conda activate samtools-old
(samtools-old) $

You can then proceed to install packages into your new environment:

(samtools-old) $ conda install samtools=1.4

Tip

You can merge creating a new environment and installing packages into it using just a single command: conda create -n samtools-old samtools=1.4.

You can continue to install more packages into this environment if need be, and run scripts and analyses as normal. Once finished with an environment, return to a normal prompt (and your default conda environment) using:

(samtools-old) $ conda deactivate

Here’s how to get a list of all available environments:

$ conda env list
# conda environments:
#
samtools-old             /$APPS/conda/envs/samtools-old
root                  *  /$APPS/conda

Conda refers to your base environment as root and marks the active one with a *.

If you want to get rid of an environment, make sure it’s not active, then run:

$ conda remove --all -n samtools-old

Note

One school of thought suggests installing every package into its own unique environment. While this certainly avoids any dependency clash problems, it can make things a little awkward if you have pipelines or scripts relying on multiple packages as you’re then constantly running conda activate and conda deactivate. Some people go further and suggest your base environment should include only the alternative installation tool mamba, and everything else should go in environments. Ultimately though, it’s up to you how you set up and manage Bioconda.

Removing Bioconda

Conda and Bioconda are installed in $APPS/conda. Simply delete this folder to remove Bioconda and any additional packages you’ve installed or environments you’ve created.

Bioconda and Slurm

When using the conda activate command in an sbatch job script you may encounter an error mesage:

CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'.

And subsequently when trying to use commands from the package you may get a command not found error.

You can work around this by using source activate <environment> instead.

Note

If conda cannot be found on your $PATH then you’ll need to provide its full path with the command, for example: source /full/path/to/conda/bin/activate <environment>.