File systems

[Image: front view of two racks containing the servers and hard discs of vault]

This page covers the following topics:

  • File systems
    • Overview
    • Home directory
    • Vault
    • Work (woody, saturn, titan, janus, atuin)
    • Parallel file systems 
  • Snapshots
  • Advanced Topics
  • Further information on HPC storage

File systems

Overview

A number of file systems are available at RRZE. They differ in available storage size, in backup policy, and in their intended use. Please consider these properties when looking for a place to store your files. More details on the respective systems are listed below.

There is one simple rule to keep in mind: everything that starts with /home/ is available throughout RRZE, which naturally includes all HPC systems. For example, /home/woody is accessible from all clusters, even though it was originally bought together with the Woody cluster and was mainly intended for it.

File system overview

Mount point | Access via | Purpose | Size | Backup | Data lifetime | Quota | Remarks
/home/hpc | $HOME | Storage of source code, input files, and important results only | 40 TB | Yes | Account lifetime | Yes (restrictive; capacity & inodes) | -
/home/vault | $HPCVAULT | Mid- to long-term high-quality storage, especially for large files | 5 PB | Yes | Account lifetime | Yes | -
diverse: /home/{woody, saturn, titan, janus, atuin} | $WORK | General-purpose work directory and storage for small files | n/a | No | Account lifetime | Yes | Which file server $WORK points to depends on your account.
/home/woody | $WORK, if you are not eligible for any of the other work file systems | General-purpose work directory and storage for small files (formerly the cluster-local storage of the Woody cluster) | 130 TB | No | Account lifetime | Yes | Backup, if any, is limited: it does not run daily and data is only kept in backup for a rather short time.
/home/saturn, /home/titan, /home/janus | $WORK, if you are eligible | General-purpose work directory and storage for small to large files | 2x 460 TB + 400 TB | No | Account lifetime | Yes (group quota) | No backup; shareholder-only file systems, i.e. only groups who paid for the file server have access.
/home/atuin | $WORK, if you are eligible | General-purpose work directory and storage for small to large files | 1.1 PB | No | Account lifetime | Yes (typically group quota) | No backup; reserved for NHR projects.
/lxfs | $FASTTMP (Meggie cluster) | High-performance parallel I/O; short-term storage; no large ASCII files! Currently not recommended for use because of reconfiguration. | 850 TB | No | High watermark deletion | No, but the number of files/directories is limited | Only available on the Meggie cluster.
/lustre | $FASTTMP (Fritz/Alex clusters) | High-performance parallel I/O; short-term storage; no large ASCII files! | 2.8 PB | No | High watermark deletion | No, but the number of files/directories is limited | Only available on the Fritz and Alex clusters.
diverse: often somewhere in /scratch, /scratchssd, /tmp or /dev/shm | $TMPDIR | Job-specific storage (either in main memory [RAM disk] or, if available, on a local HDD/SSD) | from a few GB to several hundred GB | No | Job lifetime | No, but space is of course very limited | Always node-local only; see the cluster-specific documentation for details, especially concerning size.

Home directory $HOME

The home directories of the HPC users are housed in the HPC storage system. These directories are available under the path /home/hpc/GROUPNAME/USERNAME on all RRZE HPC systems. The home directory is the directory in which you are placed right after login, and where most programs try to save settings and similar data. When this directory is unavailable, most programs stop working or show really strange behavior, which is why we have tried to make the system highly redundant.

The home directory is protected by fine-grained snapshots and, additionally, by regular backups. It should therefore be used for "important" data, e.g. your job scripts, the source code of the program you are working on, or unrecoverable input files. The quotas there are comparatively small, so the home directory will most probably be too small for the inputs and outputs of your jobs.

Each user gets a standard quota of 50 gigabytes for their home directory. Quota extensions are not possible.
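
Since the quota covers both capacity and the number of files (inodes), it can help to check your current usage from time to time. A minimal sketch using standard Linux tools (RRZE may additionally provide its own quota-reporting tools; the commands below only look at what is on disk):

# Show how much space your home directory currently occupies
du -sh "$HOME"

# Count files and directories, which is what the inode quota refers to
find "$HOME" | wc -l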

Vault $HPCVAULT

[Image: view inside the tape robot, with shelves of tapes on the left and the gripper on the right]

Additional high-quality storage is provided in a second part of the HPC storage system called "vault". Each HPC user has a directory there that is available under the path /home/vault/GROUPNAME/USERNAME on all RRZE HPC systems.

This file system is also protected by regular snapshots and backups, although the snapshots are not as fine-grained as on $HOME. It is suitable for mid- to long-term storage of files.

The default quota for each user is 500 gigabytes.
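
Because $HPCVAULT is intended for mid- to long-term storage, it is usually better to store a few packed archives there than many loose files. A minimal sketch, assuming a finished results directory below $WORK (the directory and archive names are placeholders):

# Pack a finished results directory from $WORK into a single compressed archive on vault
tar czf "$HPCVAULT/project_results_2024.tar.gz" -C "$WORK" project_results

# Verify that the archive is readable before deleting anything from $WORK
tar tzf "$HPCVAULT/project_results_2024.tar.gz" > /dev/null && echo "archive OK"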

General work directory $WORK

The recommended work directory is $WORK. Depending on your account and group, it may point to one of several file servers and file systems, which are described in the following subsections.

However, bear in mind that there are neither backups nor snapshots on $WORK. Hence, important data should be archived in other locations.
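
To see which of these file systems your account's $WORK actually points to, and to keep a copy of important results in a location with snapshots and backup, something like the following can be used (a sketch with placeholder directory names):

# Show where $WORK points for your account
echo "$WORK"

# Copy important results to vault, which has snapshots and backup
rsync -a "$WORK/important_results/" "$HPCVAULT/important_results/"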

/home/woody

Despite its name, this file system is available from all HPC systems under the path /home/woody/GROUPNAME/USERNAME. It is intended as a general-purpose work directory and should be used for input/output files and as a storage location for small files.

The standard quota for each user is 500 Gigabytes.

/home/saturn, /home/titan, /home/janus

These shareholder-only file systems are only available to eligible users. They are intended as general-purpose work directories for both small and large files.

The quota on these file systems is defined for the whole group, not for the individual user, and depends on the share the group has paid for. If your group is interested in contributing, please contact HPC Services.

Shareholders can look up information on their group quota in the text files /home/{saturn,titan,janus}/quota/GROUPNAME.txt.
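
For example, a member of a saturn shareholder group could display the group's quota report like this (a sketch; it assumes that GROUPNAME in the path matches your primary Unix group as printed by id -gn):

# Print the quota report for your primary group on saturn
cat "/home/saturn/quota/$(id -gn).txt"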

/home/atuin

Atuin is the general work directory for NHR projects.

The quota on this file system is typically defined for the whole group, not for the individual user, and depends on the granted NHR proposal.

Project groups can look up information on their group quota in the text files /home/atuin/quota/GROUPNAME.txt.

Parallel file systems $FASTTMP

The Meggie and Fritz clusters each have a local parallel file system for high-performance short-term storage. Please note that these are entirely separate systems, i.e. you cannot see the files on Fritz's $FASTTMP in the $FASTTMP on Meggie. They are not available on systems outside the respective clusters.

The parallel file systems use a high watermark deletion algorithm: when the filling of the file system exceeds a certain limit (e.g. 70%), files are deleted, starting with the oldest and largest ones, until the filling drops below 60%. Be aware that the normal tar -x command preserves the modification time of the original file instead of the time when the archive is unpacked, so unpacked files can be among the first candidates for deletion. Use tar -mx or touch in combination with find to work around this, as sketched below. Also be aware that the exact time of deletion is unpredictable.
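
A minimal sketch of the two workarounds mentioned above (the archive and directory names are placeholders):

# Option 1: unpack without preserving the archived modification times
tar -mxf results.tar

# Option 2: unpack normally, then set the modification time of all extracted files to now
tar -xf results.tar
find ./results -exec touch {} +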

Note that parallel file systems are generally not made for handling large numbers of small files or ASCII files. This is by design: parallel file systems achieve their speed by writing binary streams to multiple servers at the same time. However, they do that in blocks, in our case of 1 MB. That means that for a file smaller than 1 MB, only one server will ever be used, so the parallel file system can never be faster than a traditional NFS server; on the contrary, due to the larger overhead it will generally be slower. Parallel file systems only show their strengths with files that are at least a few megabytes in size, and excel when very large files are written by many nodes simultaneously (e.g. checkpointing).
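
As a rough check before copying data to $FASTTMP, you can count how many files in a directory fall below the 1 MB block size (a sketch; the directory is a placeholder, and the size is given in bytes because find rounds when the M suffix is used):

# Count files smaller than 1 MB (1048576 bytes) under ./my_dataset
find ./my_dataset -type f -size -1048576c | wc -l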

Snapshots ($HOME and $HPCVAULT)

Snapshots work mostly as the name suggests. At certain intervals, the file system takes a "snapshot", an exact read-only copy of the contents of the whole file system at one moment in time. In a way, a snapshot is similar to a backup, but with one great restriction: as the "backup" is stored on the exact same file system, it offers no protection against disasters; if for some reason the file system fails, all snapshots will be gone as well. Snapshots do, however, provide great protection against user errors, which have always been the number one cause of data loss on the RRZE HPC systems. Users can restore important files that have been deleted or overwritten from an earlier snapshot.

Snapshots are stored in a hidden directory .snapshots. Please note that this directory is more hidden than usual: it will not even show up with ls -a; it only appears when it is requested explicitly.

This is best explained by an example: let’s assume you have a file important.txt in your home directory /home/hpc/exam/example1 that you have been working on for months. You accidentally delete that file. Thanks to snapshots, you should be able to recover most of the file, and “only” lose the last few hours of work. If you do a ls -l /home/hpc/exam/example1/.snapshots/, you should see something like this:

ls -l /home/hpc/exam/example1/.snapshots/
drwx------ 49 example1 exam 32768  8. Feb 10:54 @GMT-2019.02.10-03.00.00
drwx------ 49 example1 exam 32768 16. Feb 18:06 @GMT-2019.02.17-03.00.00
drwx------ 49 example1 exam 32768 24. Feb 00:15 @GMT-2019.02.24-03.00.00
drwx------ 49 example1 exam 32768 28. Feb 23:06 @GMT-2019.03.01-03.00.00
drwx------ 49 example1 exam 32768  1. Mär 21:34 @GMT-2019.03.03-03.00.00
drwx------ 49 example1 exam 32768  1. Mär 21:34 @GMT-2019.03.02-03.00.00
drwx------ 49 example1 exam 32768  3. Mär 23:54 @GMT-2019.03.04-03.00.00
drwx------ 49 example1 exam 32768  4. Mär 17:01 @GMT-2019.03.05-03.00.00

Each of these directories contains an exact read-only copy of your home directory at the time given in the name. To restore the file to the state it had at 03:00 UTC on the 5th of March, simply copy it back from the corresponding snapshot: cp '/home/hpc/exam/example1/.snapshots/@GMT-2019.03.05-03.00.00/important.txt' '/home/hpc/exam/example1/important.txt'

Snapshots are enabled on both the home and the vault section, but they are taken much more often on the home directories than on vault. Please note that the exact snapshot intervals and the number of snapshots retained may change at any time; you should not rely on the existence of a specific snapshot. Also note that all times given are in GMT/UTC. That means that, depending on whether daylight saving time is active, 03:00 UTC corresponds to either 05:00 or 04:00 German time (see the small conversion example after the tables below). At the time of this writing, snapshots were configured as follows:

Snapshot settings on the home section (/home/hpc)

Interval | Copies retained | Covered timespan
30 minutes (every half and full hour) | 6 | 3 hours
2 hours (every odd-numbered hour: 01:00, 03:00, 05:00, ...) | 12 | 1 day
1 day (at 03:00) | 7 | 1 week
1 week (Sundays at 03:00) | 4 | 4 weeks

Snapshot settings on the vault section (/home/vault)

Interval | Copies retained | Covered timespan
1 day (at 03:00) | 7 | 1 week
1 week (Sundays at 03:00) | 4 | 4 weeks
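
The snapshot directory names encode the time in UTC, e.g. @GMT-2019.03.05-03.00.00. As mentioned above, GNU date can convert such a timestamp to German local time (a sketch; the timestamp is taken from the example listing):

# Convert the UTC timestamp from a snapshot name to German local time
TZ=Europe/Berlin date -d '2019-03-05 03:00 UTC'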

Advanced Topics

Limitations on the number of files

Please note that having a large number of small files is quite bad for file system performance. This is true for almost any file system and certainly for all RRZE file servers, but it hits the HPC storage system ($HOME, $HPCVAULT) particularly hard due to the underlying parallel file system and the snapshots. We have therefore set a limit on the number of files a user is allowed to have. That limit is set rather high for the home section, so that you are unlikely to hit it unless you try to, because small files are part of the intended usage there. It is, however, set rather tight on the vault section, especially compared to the large amount of space available there. If you run into the file limit, you can always pack small files that you do not use regularly into an archive (tar, zip, etc.), as sketched below.

The same limitations apply for the parallel file systems ($FASTTMP).
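
A minimal sketch of packing a directory of rarely used small files into a single archive to reduce the file count (directory and archive names are placeholders; delete the originals only after verifying the archive):

# Pack rarely used small files into one archive to save inodes
tar czf old_logs.tar.gz old_logs/

# Check that the archive is readable, then remove the original files
tar tzf old_logs.tar.gz > /dev/null && rm -r old_logs/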

Access Control Lists (ACLs)

Besides the normal Unix permissions that you set with chmod (where you can set permissions for the owning user, the owning group, and everyone else), the system also supports more advanced ACLs.

However, they are not managed in the traditional (and non-standardized) way with setfacl / getfacl that users of Linux or Solaris might be familiar with, but in the standardized way that NFS version 4 uses. You can set these ACLs from an NFS client (e.g. a cluster frontend) under Linux using the nfs4_setfacl and nfs4_getfacl commands. The ACLs are also practically compatible with what Windows does, meaning that you can edit them from a Windows client through the usual Explorer interface.
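
As an illustration of the NFSv4 tools mentioned above, the following shows how an ACL could be inspected and extended (a sketch only; the directory, user name, and permission letters are placeholders, and the exact principal format depends on the ID mapping of the NFS client, so check the nfs4_setfacl man page and the output of nfs4_getfacl on the system before applying changes):

# Show the current NFSv4 ACL of a directory
nfs4_getfacl "$WORK/shared_data"

# Add an Allow (A) entry granting a hypothetical user read and traverse access (r, x)
nfs4_setfacl -a "A::someuser@DOMAIN:rx" "$WORK/shared_data"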

Further Information on HPC storage

The system serves two functions: it houses the normal home directories of all HPC users, and it provides tape-backed mid- to long-term storage for user data. It is based on Lenovo hardware and IBM software (Spectrum Scale/GPFS) and went into operation in September 2020.

Technical data

  • 5 file servers, Lenovo ThinkSystem SR650, 128 GB RAM, 100 Gbit Ethernet
  • 1 archive frontend, Lenovo ThinkSystem SR650, 128 GB RAM, 100 Gbit Ethernet
  • 1 TSM server, Lenovo ThinkSystem SR650, 512 GB RAM, 100 Gbit Ethernet
  • IBM TS4500 tape library with currently
    • 8 LTO8 tape drives and two expansion frames
    • 3,370 LTO8 tape slots
    • >700 LTO7M tapes
  • 4 Lenovo DE6000H storage arrays
    • plus 8 Lenovo DE600S expansion units (2 per DE6000H)
    • redundant controllers
    • Usable data capacity: 5 PB for vault, 1 PB for FauDataCloud, and 40 TB for homes