Data Storage & Management Policy
The cluster has approximately 1.5 PB of storage available
Most user-accessible folders are found under
You may need to contact an administrator when starting a new project
All important project data should be kept in
Intermediate working data should be kept on a scratch drive
Please restrict data in /home to small and/or miscellaneous files; total usage here should be under 10 GB
You can check your current data usage using the Monitoring & Tracking page.
There are two main locations for storing data on the system:
BeeGFS shared network storage
Node-specific local scratch storage
Where you store data affects performance, available capacity, and backup policies, so it’s important to understand the differences between the storage locations and the folders they contain. These are described in more detail below.
Many of the locations listed here are automatically added (as symlinks) to your home folder.
The cluster’s primary storage is a high-performance parallel file system running BeeGFS. This system, distributed across five servers and expansion units, has 1.5 PB of capacity offered as a single global namespace - /mnt/shared - that is visible from all nodes of the cluster.
It holds the following data:
Backed up: yes
All important Institute-related project data should be stored in
The Projects folder holds subfolders for the Supported Organisations (eg /mnt/shared/projects/jhi) and there may be further (local) guidelines on how you should structure your data below this point.
Joint projects shared between multiple institutes are located in /mnt/shared/projects/joint. Please Contact Us if working on a joint project and access by multiple users is required.
JHI/NIAB users should Contact Us when starting any new project, or to request help with moving existing data into the correct folder structure.
User home folders
Backed up: yes
This is where your home folder is located (your Linux equivalent of My Documents on Windows). Although backed up, it’s not suitable for storing large data sets and should be restricted to small and/or miscellaneous files only – perhaps common scripts you find handy across multiple projects, or random files that don’t really “fit” anywhere else.
We’d appreciate it if your total usage within $HOME can be kept to less than 10 GB.
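For a quick local check of how much space your home folder is using, the standard du tool works anywhere (the exact output format may vary between systems):

```shell
# Summarise total usage of your home folder, human-readable
du -sh "$HOME"

# Break down usage by top-level item to find the biggest offenders,
# sorted largest-first (sort -h understands human-readable sizes)
du -sh "$HOME"/* 2>/dev/null | sort -rh | head
```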
Backed up: no
This is a special area that must be used for all downloaded (ie external) software applications – either in binary or compiled-from-source form. You can also store Singularity containers here. If you install Bioconda, it uses $APPS/conda for its data.
If something was a pain to install or compile, keep some notes about it in /home where they’ll be safely backed up in case you ever need to repeat the process.
Each node also has space for temporary working data. Because it’s directly attached to the node where your job is running, it can be significantly faster for most file-based operations. The only downside is that you may have to copy your data here first, which might take longer than just running the job from shared scratch; often, though, you can leave your input files on shared scratch and only produce new output on local scratch. Either way, you’ll need to remember to copy any results back to shared storage at the end of a job’s run.
Path: dynamically generated
Backed up: no
The path for this location is only generated (and accessible via the $TMPDIR environment variable) once a Slurm job has started, and is unique to that job.
Bear in mind that these scratch drives are unique per node, which means any data stored there can only be seen by that node. The contents are automatically erased when the job ends, so you must copy any files you need to keep back to somewhere on shared storage as the final step in your job script.
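As a sketch, a job script following this pattern might look like the following. The partition name, project paths, and the my_analysis program are placeholders for illustration only; $TMPDIR is created by Slurm once the job starts, as described above:

```shell
#!/bin/bash
#SBATCH --job-name=scratch-example
#SBATCH --partition=short        # placeholder partition name

# Inputs can stay on shared storage; only new output goes to local scratch
INPUT=/mnt/shared/projects/example/input.dat    # hypothetical path
OUTPUT="$TMPDIR"/results.out

# Run the analysis, reading from shared storage, writing to local scratch
my_analysis --in "$INPUT" --out "$OUTPUT"       # hypothetical program

# Copy results back to shared storage as the final step -
# $TMPDIR is erased automatically when the job ends
cp "$OUTPUT" /mnt/shared/projects/example/
```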
It’s also important to be aware that the local scratch drives may differ between nodes, as different nodes may have different capacities. Check the System Overview page for more details.