Database Mirrors

We maintain local mirrors of several popular bioinformatics databases which are accessible over the high-performance storage network from any node in the cluster.

All databases can be found at: /mnt/shared/datasets/databases/.

Important

The databases are updated and/or synced from their master copies at 1am on the first Sunday of each month. You may wish to avoid using them during this time in case any active files are changed.

NCBI BLAST

Copies of many NCBI BLAST databases are available at: /mnt/shared/datasets/databases/ncbi/. You can tell the command line BLAST tools to search here by setting an environment variable:

$ export BLASTDB=/mnt/shared/datasets/databases/ncbi/

The following databases are currently available:

Source

Name

Description

NCBI

Cdd.*

Protein domain database (for RPS-BLAST etc), the Conserved Domain Database (CDD) is compiled from PFAM, SMART, etc by the NCBI.

NCBI

cdd_delta.*

Protein domain database based on the Conserved Domain Database (CDD), compiled specifically for the DELTA-BLAST tool.

NCBI

Cog.*

Protein domain database (for RPS-BLAST etc) using sequences classified in the COGs resource, which focuses primarily on prokaryotes.

NCBI

Kog.*

Protein domain database (for RPS-BLAST etc) using sequences classified in the KOGs resource, the eukaryotic counterpart to COGs, see http://www.ncbi.nlm.nih.gov/COG/new/

NCBI

nr.*

A collection of protein sequences with entries from GenPept, Swissprot, PDB, PRF, PIR and NCBI Reference Sequence (RefSeq) project.

NCBI

nt.*

The nucleotide sequence database contains entries from traditional divisions of GenBank, EMBL and DDBJ. Sequences from bulk divisions, like gss, sts, pat, est and htg, as well as environmental sequences and whole genome shotgun assemblies are excluded.

NCBI

pdbaa.*

An alias database file marking a subset of nr database with entries from PDB protein structures. Its function requires the nr.

NCBI

pdbnt.*

An alias database containing nucleotide sequences from PDB structures. Its function requires the nt database.

NCBI

Pfam.*

Protein domain database (for RPS-BLAST etc) using the Pfam-A seed alignment database, see http://pfam.sanger.ac.uk/

NCBI

Prk.*

Protein domain database (for RPS-BLAST etc) using sequences classified as stable clusters in the Protein Clusters database

NCBI

Smart.*

Protein domain database (for RPS-BLAST etc) using the Smart domain alignment database, see http://smart.embl-heidelberg.de/

NCBI

swissprot.*

An alias database file marking a subset of nr database with entries from the swiss-prot sequence database (last major update). Its function requires the nr database.

NCBI

Tigr.*

Protein domain database (for RPS-BLAST etc) using models from the TIGRFAM database of protein families, see http://www.jcvi.org/cms/research/projects/tigrfams/overview/

NCBI

taxdb.*

A non-sequence database file containing taxonomic information for sequences in the pre-formatted databases providing common and scientific names for each entry.

Pfam

Copies of several popular Pfam databases are available at: /mnt/shared/datasets/databases/pfam-31 and /pfam-35 (mirrored from http://ftp.ebi.ac.uk/pub/databases/Pfam/releases).

Uniprot

A full copy of Uniprot is available at: /mnt/shared/datasets/databases/uniprot (mirrored from ftp://ftp.ebi.ac.uk/pub/databases/uniprot/current_release/).