
Filesystems#

We provide several filesystems that differ in available storage size, backup, and intended use. Store your data according to these requirements; see the table below for details.

Available filesystems#

| Mount point | Access | Purpose | Technology | Backup | Snapshots | Data lifetime | Quota |
| --- | --- | --- | --- | --- | --- | --- | --- |
| /home/hpc | $HOME | Source, input, important results | NFS | YES | YES | Account lifetime | 50 GB |
| /home/vault | $HPCVAULT | Mid-/long-term storage | NFS | YES | YES | Account lifetime | 500 GB |
| /home/{woody,saturn,titan,janus,atuin} | $WORK | General-purpose, log files | NFS | NO | NO | Account lifetime | Tier3: 500 GB, NHR: project quota |
| /lustre | $FASTTMP (on Fritz) | High-performance parallel I/O | Lustre via InfiniBand | NO | NO | High watermark | Only inodes |
| /??? | $TMPDIR | Node-local job-specific directory | SSD/RAM disk | NO | NO | Job runtime | NO |

$HOME, $HPCVAULT, and $WORK are mounted on all our systems, i.e. on the front end nodes as well as on the cluster nodes.

The environment variables $HOME, $HPCVAULT, and $WORK are automatically set upon login.
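
To check where these variables point on the system you are currently logged in to, you can simply print them (the exact paths depend on your group, user name, and cluster):

    $ echo "$HOME" "$HPCVAULT" "$WORK"
    /home/hpc/GROUPNAME/USERNAME /home/vault/GROUPNAME/USERNAME /home/woody/GROUPNAME/USERNAME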

Home directory $HOME#

Home directories are available under /home/hpc/GROUPNAME/USERNAME.

You are placed into your $HOME directory after login. Applications typically store their settings and other data there by default.

We perform regular snapshots and backups of your $HOME. Therefore, use it for important data, e.g. your job scripts, the source code of the program you're working on, or unrecoverable input files.

Because of these regular snapshots and backups, $HOME has a small quota. Quota extensions are not possible.

Vault $HPCVAULT#

$HPCVAULT is located under /home/vault/GROUPNAME/USERNAME.

We perform regular snapshots and backups of your $HPCVAULT as well, but less frequently than for $HOME. $HPCVAULT is suitable for mid- and long-term storage of files.

Work directory $WORK#

Use $WORK as your general work directory.

$WORK has no snapshots and no backup

Store important data in $HOME or $HPCVAULT.

The actual location your $WORK points to depends on several factors. Possible locations are:

  • /home/woody:
    • Despite the name woody, it is mounted on all our systems.
  • /home/saturn, /home/titan, /home/janus:
    • Available for shareholder groups.
    • Quota is set per group.
    • Current quota can be found under /home/{saturn,titan,janus}/quota/GROUPNAME.txt.
  • /home/atuin
    • Used for NHR projects.
    • Quota is set per group according to the NHR project proposal.
    • Current quota can be found under /home/atuin/quota/GROUPNAME.txt (see the example below).
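
The quota files are plain text and can simply be displayed, e.g. (GROUPNAME is a placeholder for your Unix group; the same pattern works for /home/{saturn,titan,janus}/quota/GROUPNAME.txt):

    $ cat /home/atuin/quota/GROUPNAME.txt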

Parallel filesystem $FASTTMP#

The parallel filesystem $FASTTMP is available on Fritz frontends and nodes. The Lustre-based $FASTTMP is mounted under /lustre/$GROUP/$USER/.

$FASTTMP has a capacity of 3.5 PB. We do not limit the amount of data you can store on $FASTTMP; however, we do limit the number of files you can create.

$FASTTMP has no snapshots and no backup

Store important data under $HOME or $HPCVAULT.

High watermark deletion

$FASTTMP is for high-performance short-term storage only. When the fill level of the filesystem exceeds a certain limit (e.g. 80%), a high-watermark deletion is run, starting with the oldest and largest files.

Note that tar -x preserves the original modification time of the files in the archive instead of setting it to the extraction time, so freshly unpacked files may be deleted first. Use tar -mx, or touch in combination with find, to update the modification time.
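
For example (a minimal sketch; the archive name and target directory are placeholders, and $FASTTMP is assumed to point to your directory on the Lustre filesystem):

    # Set the modification time to the extraction time while unpacking:
    tar -mxf archive.tar -C "$FASTTMP/mydata"

    # Or update already unpacked files afterwards:
    find "$FASTTMP/mydata" -type f -exec touch {} +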

Intended I/O usage

$FASTTMP supports parallel I/O using the MPI-I/O functions and can be accessed with an aggregate bandwidth of > 20 GB/s (inside Fritz only). Use $FASTTMP only for large files. Ideally, the files are written by many nodes simultaneously, e.g., for checkpointing with MPI-IO.

$FASTTMP is not made for handling large numbers of small files

Parallel filesystems achieve their speed by writing to multiple servers at the same time. Files are distributed over the servers in the granularity of blocks; on $FASTTMP a block has a size of 1 MB. Files smaller than 1 MB therefore reside on only one server. In addition, the overhead of the parallel filesystem makes access to small files slower than on traditional NFS servers. For these reasons, we have set a limit on the number of files you can store there.
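
As an illustration, Lustre's standard lfs tool shows (and can change) how a file or directory is distributed over the servers; the file and directory names below are placeholders:

    $ lfs getstripe /lustre/$GROUP/$USER/checkpoint.h5   # show stripe count and stripe size
    $ lfs setstripe -c 8 /lustre/$GROUP/$USER/ckpt_dir   # new files in ckpt_dir are striped over 8 servers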

Node-local job-specific directory $TMPDIR#

$TMPDIR points to a node-local job-specific temporary location in the context of a SLURM job.

When a job starts, a job-specific temporary directory is created on each node and $TMPDIR is set to this location. $TMPDIR is the same path on all nodes of a job, but it always points to a node-local resource. At the end of the job, $TMPDIR is automatically removed; you do not have to clean it up.

You can use $TMPDIR as fast scratch space or as a cache, e.g., to store your training data during a job to reduce the load on $WORK, increase the I/O bandwidth, and reduce the I/O latency. See Staging for how to stage data onto $TMPDIR and optionally share it with your other jobs, in case they happen to run at the same time on the same node.
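
A minimal sketch of this pattern in a Slurm job script; the dataset archive, application, and result paths are placeholders:

    #!/bin/bash
    #SBATCH --nodes=1
    #SBATCH --time=01:00:00

    # Stage the (hypothetical) training data from $WORK to the node-local $TMPDIR.
    tar -xf "$WORK/dataset.tar" -C "$TMPDIR"

    # Run the (hypothetical) application against the node-local copy.
    ./train --data "$TMPDIR/dataset"

    # $TMPDIR is deleted when the job ends, so copy results you want to keep
    # back to $WORK (or $HOME/$HPCVAULT) before the job finishes.
    cp -r results "$WORK/results-$SLURM_JOB_ID"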

For all clusters except Fritz, $TMPDIR points to a node-local SSD. On Fritz, $TMPDIR is located in a node-local RAM disk, which means that any data you store there reduces the RAM available to your application on that node.

Overview of where $TMPDIR is located on each cluster:

Quotas#

Nearly all file systems impose quotas on the data volume and/or the number of files or directories. These quotas may be set per user or per group.

The soft quota can be exceeded temporarily, whereas the hard quota is the absolute upper limit. You will be notified automatically if you exceed your quota on any file system.

You can see your usage and corresponding limits:

  • with shownicerquota.pl (only available on our systems), which presents the filesystems in a user-friendly way:

    $ shownicerquota.pl
    Path              Used     SoftQ    HardQ    Gracetime  Filec    FileQ    FiHaQ    FileGrace
    /home/hpc           83.3G   104.9G   209.7G        N/A     240K     500K   1,000K        N/A
    /home/vault        447.4G  1048.6G  2097.2G        N/A   1,232      200K     400K        N/A
    /home/woody         21.3G   500.0G   750.0G        N/A     175K   5,000K   7,500K        N/A
    /home/titan        364.8G   500.0G  1000.0G        N/A   3,432K                          N/A
    /lustre             45.7G     0.0K     0.0K        N/A     321       80K     250K        N/A
    
  • with quota -s, a general Unix command; its report may be incomplete and is less readable:

    $ quota -s
    Disk quotas for user USER (uid 12345):
         Filesystem   space   quota   limit   grace   files   quota   limit   grace
    10.28.20.202:/hpcdatacloud/hpchome/shared
                     81355M    100G    200G            241k    500k   1000k        
    wnfs1.rrze.uni-erlangen.de:/srv/home
                     20843M    477G    716G            176k   5000k   7500k        
    titan.rrze.uni-erlangen.de:/srv/viphome
                       348G    477G    954G           3433k       0       0  
    

Quota on /home/atuin#

In case $WORK is located on /home/atuin, it will not be included in the output of shownicerquota.pl or quota -s. Instead, use df -h $WORK to obtain the total (Size) and free space (Avail) for your group or NHR project:

# $WORK points to /home/atuin/...
$ df -h $WORK
Filesystem                                                     Size  Used Avail Use% Mounted on
atuin.rrze.uni-erlangen.de:/zfspool/nhrprojects/<NHR-PROJECT>   10T  1.9T  8.2T  19% /home/atuin/<NHR-PROJECT>

There are no user quotas on /home/atuin.

Snapshots for $HOME and $HPCVAULT#

Snapshots are stored on the same filesystem and do not provide protection against a loss of the filesystem like backups do.

Snapshots record the state of $HOME and $HPCVAULT at regular intervals. If you accidentally delete or overwrite files, you can easily recover them, provided they are included in a previous snapshot.

Snapshots are stored in a hidden directory .snapshots inside each directory of $HOME and $HPCVAULT. Under .snapshots you will find further subdirectories named after the time the corresponding snapshot was taken. The .snapshots directories are not included in any directory listing, not even with ls -a, so you always have to specify them explicitly.

Example usage

Prerequisites:

  1. Assume the user example1 of group exam has an important file important.txt stored in their $HOME, i.e. /home/hpc/exam/example1.
  2. You accidentally delete important.txt.

Snapshots allow you to recover the state of important.txt from the latest snapshot. Note that you still lose the changes made between the last snapshot and now.

Recovery:

  1. List available snapshots:

    ls -l /home/hpc/exam/example1/.snapshots/
    

    A possible output might look like:

     drwx------ 49 example1 exam 32768  8. Feb 10:54 @GMT-2019.02.10-03.00.00
     drwx------ 49 example1 exam 32768 16. Feb 18:06 @GMT-2019.02.17-03.00.00
     drwx------ 49 example1 exam 32768 24. Feb 00:15 @GMT-2019.02.24-03.00.00
     drwx------ 49 example1 exam 32768 28. Feb 23:06 @GMT-2019.03.01-03.00.00
     drwx------ 49 example1 exam 32768  1. Mär 21:34 @GMT-2019.03.03-03.00.00
     drwx------ 49 example1 exam 32768  1. Mär 21:34 @GMT-2019.03.02-03.00.00
     drwx------ 49 example1 exam 32768  3. Mär 23:54 @GMT-2019.03.04-03.00.00
     drwx------ 49 example1 exam 32768  4. Mär 17:01 @GMT-2019.03.05-03.00.00
    

    Each of the listed directories contains a read-only copy of /home/hpc/exam/example1. The directory name states the time the snapshot was taken.

  2. To restore the file to the state it was in on 2019-03-05 at 03:00 UTC, copy it from the @GMT-2019.03.05-03.00.00 directory back to its original location:

    cp '/home/hpc/exam/example1/.snapshots/@GMT-2019.03.05-03.00.00/important.txt' '/home/hpc/exam/example1/important.txt'
    
  3. The file /home/hpc/exam/example1/important.txt now contains the state it was in on 2019-03-05 at 03:00 UTC.

Snapshot intervals

The intervals at which snapshots are taken on $HOME and $HPCVAULT differ between the two filesystems. The exact snapshot intervals and the number of snapshots retained may change at any time. Do not rely on the existence of a specific snapshot.

The snapshot times listed under the .snapshots directories are given in GMT/UTC. Depending on whether daylight saving time is active, 03:00 UTC corresponds to 05:00 or 04:00 German time.
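
For example, GNU date can convert a snapshot timestamp to German local time:

    $ TZ=Europe/Berlin date -d '2019-03-05 03:00 UTC'
    Tue Mar  5 04:00:00 CET 2019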

At the time of this writing, snapshots were configured as follows:

Snapshot settings for $HOME:

| Interval | Copies retained | Covered time span |
| --- | --- | --- |
| 30 minutes (every half and full hour) | 6 | 3 hours |
| 2 hours (every odd-numbered hour: 01:00, 03:00, 05:00, ...) | 12 | 1 day |
| 1 day (at 03:00) | 7 | 1 week |
| 1 week (Sundays at 03:00) | 4 | 4 weeks |

Snapshot settings for $HPCVAULT:

| Interval | Copies retained | Covered time span |
| --- | --- | --- |
| 1 day (at 03:00) | 7 | 1 week |
| 1 week (Sundays at 03:00) | 4 | 4 weeks |

Advanced Topics#

Limitations on the number of files#

Please note that a large number of small files is quite bad for filesystem performance. We have therefore set a limit on the number of files a user is allowed to store.

That limit is set rather high for $HOME, so you are unlikely to hit it unless you try to. For $HPCVAULT the file limit is rather tight. If you run into the file limit, you can always pack small files that you don't use regularly into an archive (tar, zip, etc.).
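
For example, a directory of rarely used small files can be packed into a single archive and the originals removed (a minimal sketch; the directory name is a placeholder):

    # Pack the directory into one compressed archive on $HPCVAULT ...
    tar -czf "$HPCVAULT/old-results.tar.gz" -C "$WORK" old-results
    # ... and remove the many small originals once the archive lists cleanly.
    tar -tzf "$HPCVAULT/old-results.tar.gz" > /dev/null && rm -r "$WORK/old-results"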

The same limitation applies to the parallel filesystem $FASTTMP.

Access Control Lists (ACLs)#

Besides the normal Unix permissions that you set with chmod (where you can set permissions for the owning user, the owning group, and everyone else), the system also supports more advanced ACLs.

However, they are not managed in the traditional (and non-standardized) way with setfacl/getfacl that Linux users might be familiar with, but in the standardized way of NFS version 4. You can set these ACLs from an NFS client (e.g. a cluster frontend) under Linux using the nfs4_setfacl and nfs4_getfacl commands. The ACLs are also largely compatible with Windows ACLs, meaning that you can edit them from a Windows client through the usual Explorer interface.
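
A minimal sketch of inspecting and extending such an ACL from a Linux client; the directory, user name, and NFSv4 domain are placeholders:

    # Show the current NFSv4 ACL of a directory.
    nfs4_getfacl "$HPCVAULT/shared-data"

    # Grant the (hypothetical) user "otheruser" read and execute access;
    # replace DOMAIN with your site's NFSv4 domain.
    nfs4_setfacl -a "A::otheruser@DOMAIN:RX" "$HPCVAULT/shared-data"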

Further Information on HPC storage#

The system serves two functions: it houses the normal home directories of all HPC users, and it provides tape-backed mid- to long-term storage for user data. It is based on Lenovo hardware and IBM software (Spectrum Scale/GPFS) and went into operation in September 2020.

Technical data#

  • 5 file servers, Lenovo ThinkSystem SR650, 128 GB RAM, 100 GbE
  • 1 archive frontend, Lenovo ThinkSystem SR650, 128 GB RAM, 100 GbE
  • 1 TSM server, Lenovo ThinkSystem SR650, 512 GB RAM, 100 GbE
  • IBM TS4500 tape library with currently
    • 8 LTO8 tape drives and two expansion frames
    • 3,370 LTO8 tape slots
    • >700 LTO7M tapes
  • 4 Lenovo DE6000H storage arrays
    • plus 8 Lenovo DE600S expansion units (2 per DE6000H)
    • redundant controllers
    • Usable data capacity: 5 PB for vault, 1 PB for FauDataCloud, and 40 TB for homes