Data Storage & Management Policy¶
The cluster has approximately 1.5 PB of storage available.

- Most user-accessible folders are found under /mnt/shared
- You must contact an administrator when starting a new project
- All important project data should be kept in /mnt/shared/projects
- Intermediate working data should be kept on a scratch drive
- Please restrict data in /home to small and/or miscellaneous files; total usage here should be under 10 GB
- You can check your current data usage using the Monitoring & Tracking page
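For a quick check from the command line, the standard `du` tool can summarise your usage (a sketch only; the Monitoring & Tracking page remains the authoritative view):

```shell
# Total size of your home folder, in human-readable units
du -sh "$HOME"

# Largest top-level items first, to see where the space is going
du -sh "$HOME"/* 2>/dev/null | sort -rh | head
```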
There are two main locations for storing data on the system:

- BeeGFS shared network storage
- Node-specific local scratch storage
Where you decide to store data will have an effect on performance, available capacity, and backup policies, so it’s important that you understand the differences between the storage locations and the folders they contain. These are described in more detail below.
BeeGFS shared network storage¶

The cluster’s primary storage is a high-performance parallel file system running BeeGFS. This system, distributed across five servers and expansion units, has 1.5 PB of capacity offered as a single global namespace - /mnt/shared - that is visible from all nodes of the cluster.

It holds the following data:
Project data¶

Backed up: yes

All important Institute-related project data should be stored in /mnt/shared/projects. This location holds subfolders for the Supported Organisations (e.g. /mnt/shared/projects/jhi) and there may be further (local) guidelines on how you should structure your data below this point.

Please Contact Us when starting a new project, or to request help with moving existing data into the correct folder structure.
User home folders¶
Backed up: yes
This is where your home folder is located (your Linux equivalent of My Documents on Windows). Although backed up, it’s not suitable for storing large data sets and should be restricted to small and/or miscellaneous files only – perhaps common scripts you find handy across multiple projects or random files that don’t really “fit” anywhere else.

We’d appreciate it if your total usage within $HOME could be kept to less than 10 GB.
Apps¶

Backed up: no

This is a special area that must be used for all downloaded (i.e. external) software applications – either in binary or compiled-from-source form. You can also store Singularity containers here. If you install Bioconda, it uses $APPS/conda for its data.

If something was a pain to install or compile, keep some notes about it in /home where they’ll be safely backed up in case you ever need to repeat the process.
Node-specific local scratch storage¶

Backed up: no

The path for this location is only generated (and accessible via the $TMPDIR environment variable) once a Slurm job has started, and is unique to that job.
Each node also has space for temporary working data, and because it’s directly attached to the node where your job is running it can be significantly faster for most file-based operations. The only downside is that you have to copy your data here first, and that might take longer than just running the job from shared scratch. Similarly, you need to remember to copy any results back to shared storage at the end of a job’s run.
Bear in mind that these scratch drives are unique per node, which means any data stored there can only be seen by that node. The contents are automatically erased when the job ends, so you must copy any files you need to keep back to somewhere on shared storage as the final step in your job script.
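The copy in, compute, copy out pattern described above might look like this in a job script. This is a sketch only: the input path and the analysis command are placeholders, and the /mnt/shared paths follow the layout described on this page.

```shell
#!/bin/bash
#SBATCH --job-name=local-scratch-demo

# $TMPDIR is created by Slurm when the job starts, and is unique to this job

# 1) Copy input data from shared storage onto the node-local scratch drive
cp /mnt/shared/scratch/"$USER"/input.dat "$TMPDIR"/   # placeholder input path

# 2) Run the analysis against the fast local copy
cd "$TMPDIR"
my_analysis input.dat > results.out                   # placeholder command

# 3) Copy results back to shared storage as the FINAL step -
#    $TMPDIR and its contents are erased automatically when the job ends
cp results.out /mnt/shared/scratch/"$USER"/
```

Whether the extra copies are worth it depends on how I/O-heavy the job is; for workloads that read the same files repeatedly, the faster local drive usually wins.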
It’s also important to be aware that the capacity of the local scratch drives varies from node to node. Check the System Overview page for more details.
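Inside a running job you can check how much local scratch space is actually free before copying large inputs across ($TMPDIR only exists once the job has started):

```shell
# Free space on the drive holding this job's local scratch folder
df -h "$TMPDIR"
```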