Data Storage & Management Policy
Storage at-a-glance
The cluster has approximately 5.2 PB of storage available
Most user-accessible folders are found under
/mnt/shared
You may need to contact an administrator when starting a new project
All important project data should be kept in
/mnt/shared/projects
Intermediate working data should be kept on a scratch drive
Please restrict data in
/home
to small and/or miscellaneous files; total usage here should be under 15 GB
You can check your current data usage using the Monitoring & Tracking page.
Data management
There are three main locations for storing data on the system:
NFS shared network storage
BeeGFS shared network storage
Node-specific local scratch storage
Where you decide to store data will have an effect on performance, available capacity, and backup policies, so it’s important that you understand the differences between the storage locations and the folders they contain. These are described in more detail below.
Note
Many of the locations listed here are automatically added (as symlinks) to your home folder.
NFS storage
This is a small array of fast, NFS-based storage that offers 35 TB of capacity. Visible from all nodes of the cluster, it hosts the following areas:
User home folders
Path:
/home/<username>
Shortcut:
$HOME
Backed up: yes
This is where your home folder is located (your Linux equivalent of My Documents on Windows). Athough backed up, it’s not suitable for storing large data sets and should be restricted to small and/or miscellaneous files only – perhaps common scripts you find handy across multiple projects or random files that don’t really “fit” anywhere else.
We’d appreciate it if your total usage within $HOME
can be kept to less than 15 GB.
Warning
If you store more than 250,000 files or folders in $HOME
your ability to run jobs will be severely restricted until you reduce your file count.
User applications
Path:
/mnt/apps/<username>
Shortcut:
$APPS
Backed up: no
This is a special area that must be used for all downloaded (ie external) software applications – either in binary or compiled-from-source form. You can also store Apptainer (Singularity) containers here. If you install Bioconda, it uses $APPS/conda
for its data.
Tip
If something was a pain to install or compile, keep some notes about it in /home
where they’ll be safely backed up in case you ever need to repeat the process.
BeeGFS storage
The cluster’s primary storage is a high-performance parallel file system running BeeGFS. This system, dsitributed across multiple servers and disk arrays, has 5.2 PB of capacity offered as a single global namespace - /mnt/shared
- that is visible from all nodes of the cluster.
It holds the following data:
Project data
Path:
/mnt/shared/projects
Shortcut:
$PROJECTS
Backed up: yes
All important Institute-related project data should be stored in /mnt/shared/projects
.
The Projects folder holds subfolders for the Supported Organisations (eg /mnt/shared/projects/jhi
) and there may be further (local) guidelines on how you should structure your data below this point.
Joint projects shared between multiple institutes are located in /mnt/shared/projects/joint
. Please Contact Us if working on a joint project and access by multiple users is required.
Warning
If you store more than 250,000 files or folders in $PROJECTS
your ability to run jobs will be severely restricted until you reduce your file count.
Local scratch
Each node also has space for temporary working data, and because it’s directly attached to the node where your job is running it can be significantly faster for most file-based operations. The only downside is that you may have to copy your data here first, and that might take longer than just running the job from shared scratch, although often you can leave your input files on shared scratch and only produce new output on local scratch. Either way, you’ll need to remember to copy any results back to shared storage at the end of a job’s run.
Path: dynamically generated
Shortcut:
$TMPDIR
Backed up: no
Note
The path for this location is only generated (and accessible via the $TMPDIR
environment variable) once a Slurm job has started, and is unique to that job.
Warning
Bear in mind that these scratch drives are unique per node, which means any data stored there can only be seen by that node. The contents are automatically erased when the job ends, so you must copy any files you need to keep back to somewhere on shared storage as the final step in your job script.
It’s also important to be aware of the differences between local scratch drives, as the different nodes may have different capacities. Check the System Overview page for more details.