
Dialog server

Because practically all HPC systems at RRZE use private IP addresses that can only be reached from within the FAU network, the dialog servers are the entry point for users who want to access the clusters from the outside. An alternative is VPN, but that is usually more hassle.

VPN is not available for NHR users.

Login to the dialog servers is via SSH.

Available servers

  • cshpc.rrze.fau.de: cshpc is a Linux system that permits login to all HPC accounts. A more verbose description of this system can be found below.
  • csnhr.nhr.fau.de: csnhr is a Linux system that permits login to all HPC accounts. A more verbose description of this system can be found below. This machine will replace cshpc soon. It can already be used, but you should expect things not to be fully ready yet.

cshpc

Various software packages, e.g. web browsers, mail clients, PDF readers, and gnuplot, are available on cshpc.

The standard file systems (/home/hpc, /home/vault, $WORK; essentially everything that starts with /home/...) are directly reachable from this system as well, so you can easily copy data around using scp.
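For example, a minimal sketch of copying data via cshpc (the username, file and directory names are placeholders):

scp ./results.tar.gz <username>@cshpc.rrze.fau.de:~/
scp -r <username>@cshpc.rrze.fau.de:~/simulation_output ./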

System specs
CPU: 16 x 2.60 GHz (2 x Xeon E5-2650 v2)
Memory: 128 GB
Operating system: Ubuntu LTS (20.04 as of November 2022)
Network connectivity: 2 x 10 GBit

 

csnhr

Various software packages, e.g. web browsers, mail clients, PDF readers, and gnuplot, are available on csnhr.

The standard file systems (/home/hpc, /home/vault, $WORK; essentially everything that starts with /home/...) are directly reachable from this system as well, so you can easily copy data around using scp. The machine also has a 100 GBit network connection to facilitate that.

As of November 2023, this machine is running Ubuntu LTS 22.04.
 

Nomachine NX on cshpc

cshpc can also be used as a server for NoMachine NX. NX enables the use of a graphical desktop environment and applications (e.g. firefox) even over relatively slow connections (e.g. hotel Wi-Fi abroad). In addition, it is possible to “park” sessions and resume them from elsewhere later, so in a way it is a sort of screen for X: your detached session keeps running on the server, and when you reattach it later, all the applications you had open are still open. To use this, you will need the NoMachine Enterprise Client, available for Windows, Linux and macOS. It can be downloaded for free from the NoMachine website.

The most important settings you will need to make when you create a new connection with the client are: protocol SSH, host cshpc.rrze.fau.de, port 22, “Use the system login”, authentication by password. Alternatively, you can open this configuration file in your NoMachine client.

If you have an HPC portal account (i.e., if you do not have a password), you can configure the NoMachine client to use your private SSH key for authentication: Go to “Edit connection -> Configuration” and select “Use key-based authentication with a key you provide.” Clicking on “Modify” lets you select a private key file.

While it is in principle possible to use Gnome or KDE4/Plasma desktops on cshpc, we do not recommend that. These desktops nowadays require hardware 3D acceleration to be bearable, and the remote session cannot offer that, so using them will feel like molasses. In addition, in our experience Plasma crashes randomly at every second click if no 3D acceleration is available. We therefore recommend more lightweight desktop environments like XFCE or Trinity (KDE3). For the latter, you’ll need to click on “create a new custom session” in the client and use the following settings:

  • Run the following command: starttde
  • Run the command in a virtual desktop

Because this system is shared by many users, it should be self-explanatory that you need to be considerate of others. Do not run anything that consumes gigabytes of memory, or long-running calculations, there.

Meggie parallel cluster (Tier3)

[Image: front view of the Meggie cluster at RRZE]

The FAU’s Meggie cluster (manufacturer: Megware) is a high-performance compute resource with high-speed interconnect. It is intended for distributed-memory (MPI) or hybrid parallel programs with medium to high communication requirements.

  • 728 compute nodes, each with two Intel Xeon E5-2630v4 “Broadwell” chips (10 cores per chip) running at 2.2 GHz with 25 MB Shared Cache per chip and 64 GB of RAM.
  • 2 frontend nodes with the same CPUs as the compute nodes but 128 GB of RAM.
  • Lustre-based parallel filesystem with a capacity of almost 1 PB and an aggregated parallel I/O bandwidth of > 9000 MB/s.
  • Intel OmniPath interconnect with up to 100 GBit/s bandwidth per link and direction.
  • Measured LINPACK performance of ~481 TFlop/s.

The name “meggie” is a play on the name of the manufacturer.

Meggie is designed for running parallel programs that use significantly more than one node. Jobs using less than one node are not supported by RRZE and may be killed without notice.

This page covers the following topics:

Access, User Environment, and File Systems

Access to the machine

Users can connect to meggie.rrze.fau.de and will be randomly routed to one of the two front ends. All systems in the cluster, including the front ends, have private IPv4 addresses in the 10.28.24.0/21 range and IPv6 addresses in the 2001:638:a000:3924::/64 range. They can normally only be accessed directly from within the FAU networks. There is one exception: if your internet connection supports IPv6, you can ssh directly to the front ends (but not to the compute nodes). Otherwise, if you need access from outside of FAU, you usually have to connect to the dialog server cshpc.rrze.fau.de first and then ssh to meggie from there.
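For example, a minimal sketch of hopping over the dialog server from outside FAU in one step (the username is a placeholder; recent OpenSSH versions support the -J/ProxyJump option):

ssh -J <username>@cshpc.rrze.fau.de <username>@meggie.rrze.fau.de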

While it is possible to ssh directly to a compute node, a user is only allowed to do this while they have a batch job running there. When all batch jobs of a user on a node have ended, all of their processes, including any open shells, will be killed automatically.

Software environment

The login and compute nodes run AlmaLinux 8 (which is basically Red Hat Enterprise Linux 8 without the support).

The login shell for all users on meggie is always bash and cannot be changed.

As on many other HPC systems, environment modules are used to facilitate access to software packages. Type “module avail” to get a list of available packages. Even more packages will become visible once one of the 000-all-spack-pkgs modules has been loaded. Most of the software is installed using Spack as an enhanced HPC package manager.
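A typical sequence for discovering and loading software might look like the following sketch (module names other than 000-all-spack-pkgs are illustrative and may differ on meggie):

$ module avail
$ module load 000-all-spack-pkgs
$ module avail                     # now also shows the Spack-provided packages
$ module load gcc openmpi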

General notes on how to use certain software on our systems (including in some cases sample job scripts) can be found on the Special applications, and tips & tricks pages. Specific notes on how some software provided via modules on the meggie cluster has been compiled can be found in the following accordion:

Intel oneAPI is installed in the “Free User” edition via Spack.

The modules intel (and the Spack internal intel-oneapi-compilers) provide the legacy Intel compilers icc, icpc, and ifort as well as the new LLVM-based ones (icx, icpx, dpcpp, ifx).

Recommended compiler flags are:  -O3 -xHost

The modules intelmpi (and the Spack internal intel-oneapi-mpi) provide Intel MPI. To use the legacy Intel compilers with Intel MPI you just have to use the appropriate wrappers with the Intel compiler names, i.e. mpiicc, mpiicpc, mpiifort. To use the new LLVM-based Intel compilers with Intel MPI you have to specify them explicitly, i.e. use mpiicc -cc=icx, mpiicpc -cxx=icpx, or mpiifort -fc=ifx. The execution of mpicc, mpicxx, and mpif90 results in using the GNU compilers.
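For illustration, typical compile lines with these wrappers (source and binary names are placeholders):

$ mpiicc -O3 -xHost -o app app.c              # legacy Intel C compiler
$ mpiifort -O3 -xHost -o app app.f90          # legacy Intel Fortran compiler
$ mpiicc -cc=icx -O3 -xHost -o app app.c      # new LLVM-based C compiler
$ mpiifort -fc=ifx -O3 -xHost -o app app.f90  # new LLVM-based Fortran compiler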

The modules mkl and tbb (and the Spack internal intel-oneapi-mkl and intel-oneapi-tbb) provide Intel MKL and TBB. Use Intel’s MKL link line advisor to figure out the appropriate command line for linking with MKL. The Intel MKL also includes drop-in wrappers for FFTW3.

Further Intel tools may be added in the future.

The Intel modules on meggie, Fritz, Alex and the Slurm-TinyGPU/TinyFAT behave differently than on the previous RRZE systems: (1) The intel64 module has been renamed to intel and no longer automatically loads intel-mpi and mkl. (2) intel-mpi/VERSION-intel and intel-mpi/VERSION-gcc have been unified into intel-mpi/VERSION. The selection of the compiler occurs by the wrapper name, e.g. mpicc = GCC, mpiicc = Intel; mpif90 = GFortran; mpiifort = Intel.

The GNU compilers are available in the version coming with the operating system (currently 8.5.0) as well as modules (currently versions 11.2 and 12.1).

Recommended compiler flags are: -O3 -march=native

Open MPI is the default MPI for the meggie cluster. Usage of srun instead of mpirun is recommended.

Open MPI is built using Spack:

  • with the compiler mentioned in the module name; the corresponding compiler will be loaded as a dependency when the Open MPI module is loaded
  • without support for thread-multiple
  • with fabrics=ucx
  • with support for Slurm as scheduler (and internal PMIx of Open MPI)

Do not rely on the Python installation from the operating system. Use our python modules instead. These installations will be updated in place from time to time. We can add further packages from the Miniconda distribution as needed.

You can modify the Python environment as follows:

Set the location where pip and conda install packages to $WORK; see Python and Jupyter for details. By default, packages will be installed in $HOME, which has limited capacity.
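A minimal sketch of such a setup (the directory layout is up to you; see the linked Python and Jupyter page for the officially recommended configuration):

$ export PYTHONUSERBASE=$WORK/python            # target for pip install --user
$ conda config --add pkgs_dirs $WORK/conda/pkgs
$ conda config --add envs_dirs $WORK/conda/envs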

Extend the base environment
$ pip install --user <packages>

Create a new one of your own
$ conda create -n <environment_name> <packages>

Clone and modify this environment

$ conda create --name myclone --clone base
$ conda install --name myclone new_package

See also https://docs.conda.io/projects/conda/en/latest/user-guide/getting-started.html.

When using gdb -p <pid> (or the equivalent attach <pid> command in gdb) to attach to a process running in a SLURM job, you might encounter errors or warnings about executable and library files that cannot be opened. Such issues will also prevent symbols from being resolved correctly, making debugging really difficult.

The reason this happens is that processes in a SLURM job get a slightly different view of the file system mounts (using a so-called namespace). When you want to attach GDB to a running process and use SSH to log into the node where the process is running, the gdb process will not be in the same namespace, causing GDB to have issues directly accessing the binary (and its libraries) you’re trying to debug.

The workaround is to use a slightly different method for attaching to the process:

  1. $ gdb <executable>
  2. (gdb) set sysroot /
  3. (gdb) attach <pid>

(Thanks to our colleagues at SURFsara for figuring this out!)

Arm DDT is a powerful parallel debugger. NHR@FAU holds a license for 32 processes.

NHR@FAU holds a “compute center license” of Amber; thus, Amber is generally available to everyone for non-profit use, i.e. for academic research.

Amber usually delivers the most economic performance using GPGPUs. Thus, the Alex GPGPU cluster might be a better choice.

We provide Gromacs versions without and with PLUMED. Gromacs (and PLUMED) are built using Spack.

Gromacs often delivers the most economic performance if GPGPUs are used. Thus the Alex GPGPU cluster might be a better choice.

If running on meggie, it is usually necessary to optimize the number of PME processes experimentally, e.g. with gmx tune_pme. Note that gmx tune_pme itself has to be run from a non-MPI gmx binary; detailed instructions for doing this on meggie are still in preparation.

Do not start gmx mdrun with the option -v. The verbose output will only create very large Slurm stdout files, and your jobs will suffer if the NFS servers are under high load. There is also little benefit in watching the stdout continuously when the job is simply expected to run for the specified number of steps.

Feel free to compile software yourself in the versions and with the options you need. This is perfectly fine, but support for self-installed software cannot be granted. We can only provide software centrally that is of importance for multiple groups. If you want to use Spack for compiling additional software, you can load our user-spack module to make use of the packages we have already built with Spack (if the concretization matches) instead of starting from scratch. Once user-spack is loaded, the command spack will be available (as an alias); you will inherit the presets we defined for certain packages (e.g. Open MPI to work with Slurm), but everything will be installed into your own directories ($WORK/USER-SPACK).
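A minimal sketch of that workflow (the package name is a placeholder):

$ module load user-spack
$ spack install <package>      # installs into $WORK/USER-SPACK
$ spack load <package>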

You can also bring your own environment in a container using Singularity/Apptainer.

File Systems

The following table summarizes the available file systems and their features. It is only an excerpt from the description of the HPC file system.

File system overview for the Meggie cluster
Mount point Access via Purpose Technology, size Backup Data lifetime Quota
/home/hpc $HOME Storage of source, input and important results NFS on central servers, small YES + Snapshots Account lifetime YES (restrictive)
/home/vault $HPCVAULT Medium- to long-term high-quality storage central servers YES + Snapshots Account lifetime YES
/home/{woody, saturn, titan, janus, atuin} $WORK Short- to medium-term storage of small files central NFS server NO Account lifetime YES
/lxfs $FASTTMP High performance parallel I/O; short-term storage Lustre-based parallel file system via OmniPath, almost 1 PB NO High watermark deletion NO

Please note the following differences to our older clusters:

  • The nodes do not have any local hard disc drives like on previous clusters.
  • /tmp lies in RAM, so it is absolutely NOT possible to store more than a few MB of data there

NFS file system $HOME

When connecting to one of the front end nodes, you’ll find yourself in your regular HPC $HOME directory (/home/hpc/...). There are relatively tight quotas there, so it will most probably be too small for the inputs/outputs of your jobs. It, however, does offer a lot of nice features, like fine-grained snapshots, so use it for “important” stuff, e.g. your job scripts, or the source code of the program you’re working on. See the HPC file system page for a more detailed description of the features.

Parallel file system $FASTTMP (out of service)

The cluster’s parallel file system is mounted on all nodes under /lxfs/$GROUP/$USER/ and available via the $FASTTMP environment variable. It supports parallel I/O using the MPI-I/O functions and can be accessed with an aggregate bandwidth of >9000 MBytes/sec (and even much larger if caching effects can be used).

The parallel file system is strictly intended to be a high-performance short-term storage, so a high watermark deletion algorithm is employed: When the filling of the file system exceeds a certain limit (e.g. 80%), files will be deleted starting with the oldest and largest files until a filling of less than 60% is reached.
Be aware that the normal tar -x command preserves the modification time of the original file instead of the time when the archive is unpacked. So unpacked files may become one of the first candidates for deletion. Use tar -mx or touch in combination with find to work around this. Be aware that the exact time of deletion is unpredictable.
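For example (the archive and directory names are placeholders):

tar -mxf archive.tar                     # -m: do not restore the stored modification times
find ./unpacked_dir -exec touch {} +     # alternatively, refresh the timestamps afterwards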

Note that parallel filesystems generally are not made for handling large amounts of small files. This is by design: Parallel filesystems achieve their amazing speed by writing to multiple different servers at the same time. However, they do that in blocks, in our case 1 MB. That means that for a file that is smaller than 1 MB, only one server will ever be used, so the parallel filesystem can never be faster than a traditional NFS server – on the contrary: due to larger overhead, it will generally be slower. They can only show their strengths with files that are at least a few megabytes in size, and excel if very large files are written by many nodes simultaneously (e.g. checkpointing).
For that reason, we have set a limit on the number of files you can store there.

Batch processing

As with all production clusters at RRZE, resources are controlled through a batch system. The front ends can be used for compiling and very short serial test runs, but everything else has to go through the batch system to the cluster.

Meggie uses SLURM as a batch system. Please see the batch system description for further details.

The granularity of batch allocations is complete nodes, i.e. nodes are never shared. The following queues are available on this cluster:

Queues/Partitions on the Meggie cluster
Partition min – max walltime min – max nodes availability Comments
devel 0 – 01:00:00 1 – 8 all users higher priority
work 0 – 24:00:00 1 – 64 all users “Workhorse”; default partition
big 0 – 24:00:00 1 – 256 special users Not active all the time as it causes quite some waste. Users can get access for benchmarking or after proving they can really make use of more than 64 nodes with their codes.
special 0 – infinity 1 – all special users only active during/after maintenance

There is no routing queue! If you want to take advantage of the features of a partition other than the work partition, you have to explicitly specify this in your job script via --partition=....
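For example, to run in the devel partition instead of the default work partition, add the following line to your job script (or pass --partition=devel on the sbatch command line):

#SBATCH --partition=devel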

Eligible jobs in the devel and work partitions will automatically take advantage of the nodes reserved for short running jobs.

Interactive jobs can be requested by using salloc instead of sbatch and specifying the respective options on the command line.

The following will give you an interactive shell on one node for one hour:

salloc -N 1 --time=01:00:00

The same works for multiple nodes; for example, the following will give you four nodes with an interactive shell on the first node for one hour:

salloc -N 4 --time=01:00:00

Settings from the calling shell (e.g. loaded module paths) will be inherited by the interactive job!

In this example, the executable will be run on one node, using 20 MPI processes, i.e. one per physical core.

#!/bin/bash -l
		#SBATCH --nodes=1 
		#SBATCH --ntasks-per-node=20
		#SBATCH --time=01:00:00
		#SBATCH --export=NONE
		
		unset SLURM_EXPORT_ENV
		module load XXX 
		
		srun ./mpi_application

In this example, the executable will be run using 20 OpenMP threads (i.e. one per physical core) for a total job walltime of 1 hour.

For more efficient computation, OpenMP threads should be pinned to the compute cores. This can be achieved by the following environment variables: OMP_PLACES=cores and OMP_PROC_BIND=true. For more information, see e.g. the HPC Wiki.

#!/bin/bash -l
		#SBATCH --nodes=1
		#SBATCH --ntasks-per-node=1
		#SBATCH --cpus-per-task=20
		#SBATCH --time=01:00:00
		#SBATCH --export=NONE
		
		unset SLURM_EXPORT_ENV 
		module load XXX 
		
		# set number of threads to requested cpus-per-task
		export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK 
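		# optionally pin OpenMP threads to physical cores, as recommended above
		export OMP_PLACES=cores
		export OMP_PROC_BIND=true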
		./openmp_application

In this example, the executable will be run using 2 MPI processes with 10 OpenMP threads each (i.e. one per physical core) for a total job walltime of 1 hour.

For more efficient computation, OpenMP threads should be pinned to the compute cores. This can be achieved by the following environment variables: OMP_PLACES=cores and OMP_PROC_BIND=true. For more information, see e.g. the HPC Wiki.

#!/bin/bash -l
		#SBATCH --nodes=1
		#SBATCH --ntasks-per-node=2
		#SBATCH --cpus-per-task=10
		#SBATCH --time=1:00:00
		#SBATCH --export=NONE
		
		unset SLURM_EXPORT_ENV 
		module load XXX 
		
		# set number of threads to requested cpus-per-task
		export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK 
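		# optionally pin OpenMP threads to physical cores, as recommended above
		export OMP_PLACES=cores
		export OMP_PROC_BIND=true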
		srun ./hybrid_application

In this example, the executable will be run on four nodes, using 20 MPI processes per node, i.e. one per physical core.

#!/bin/bash -l
		#SBATCH --nodes=4
		#SBATCH --ntasks-per-node=20
		#SBATCH --time=1:0:0
		#SBATCH --export=NONE
		
		unset SLURM_EXPORT_ENV 
		module load XXX 
		
		srun ./mpi_application
		

In this example, the executable will be run on four nodes with 2 MPI processes per node and 10 OpenMP threads per process (i.e. one per physical core) for a total job walltime of 1 hour.

For more efficient computation, OpenMP threads should be pinned to the compute cores. This can be achieved by the following environment variables: OMP_PLACES=cores and OMP_PROC_BIND=true. For more information, see e.g. the HPC Wiki.

#!/bin/bash -l
		#SBATCH --nodes=4
		#SBATCH --ntasks-per-node=2
		#SBATCH --cpus-per-task=10
		#SBATCH --time=01:00:00
		#SBATCH --export=NONE
		
		unset SLURM_EXPORT_ENV 
		module load XXX 
		
		# set number of threads to requested cpus-per-task
		export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK 
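		# optionally pin OpenMP threads to physical cores, as recommended above
		export OMP_PLACES=cores
		export OMP_PROC_BIND=true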
		srun ./hybrid_application
		

Further Information

Intel Xeon E5-2630v4 “Broadwell” Processor

Intel’s ark lists some technical details about the Xeon E5-2630v4 processor.

Clock speed Base: 2.2 GHz, Turbo (1 core): 3.1 GHz, Turbo (all cores): 2.4 GHz
Number of cores 10 per socket
L1 cache 32 KiB per core (private)
L2 cache 256 KiB per core (private)
L3 cache 2.5 MiB per core (shared by all cores)
Peak performance @ base frequency 35.2 Gflop/s per core (16 flops/cy)
Supported SIMD extension AVX2 with FMA
STREAM triad bandwidth per socket 53.5 Gbyte/s (standard stores; corrected for write-allocate transfers)
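As a plausibility check, the peak figure follows directly from the SIMD capability: with AVX2 and two FMA units per core, each core can execute 2 FMA instructions x 4 double-precision elements x 2 flops = 16 flops per cycle, and 16 flops/cy x 2.2 GHz = 35.2 Gflop/s.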

 

Intel Omni-Path Interconnect

Omni-Path is essentially Intel’s proprietary implementation of InfiniBand, created after they acquired the InfiniBand part of QLogic. It shares most of the features and shortcomings of QLogic-based InfiniBand networks.

Each node in Meggie has a 100 GBit Omni-Path card and is connected to a 100 GBit switch. However, the backbone of the network is not fully non-blocking: on each leaf switch, 32 of the 48 ports are used for compute nodes and 16 ports are used for the uplink, meaning there is 1:2 blocking on the backbone.
As a result, if the nodes of your job are not all connected to the same switch, you may notice significant performance fluctuations due to the oversubscribed network. The batch system tries to place jobs on the same leaf switch if possible, but for obvious reasons that is not always feasible, and for jobs using more than 32 nodes it is outright impossible.

Compared to the Mellanox IB cards in our other clusters, you will also notice that the Omni-Path stack is a horrible CPU hog. It can easily steal two whole CPUs, so if your job communicates a lot, it might be helpful not to use all cores of a node.

Emmy parallel cluster (Tier3)

[Image: the Emmy cluster at RRZE]

This cluster has been shut down in September 2022.

RRZE’s Emmy cluster (NEC) is a high-performance compute resource with high speed interconnect. It is intended for distributed-memory (MPI) or hybrid parallel programs with medium to high communication requirements.

  • 560 compute nodes, each with two Intel Xeon E5-2660 v2 “Ivy Bridge” chips (10 cores per chip + SMT) running at 2.2 GHz with 25 MB Shared Cache per chip and 64 GB of RAM
  • 2 front end nodes with the same CPUs as the nodes.
  • 16 Nvidia Tesla K20 GPGPUs spread over 10 compute nodes.
  • 4 Nvidia Tesla V100 (16 GB/PCIe) spread over 4 compute nodes.
  • parallel filesystem (LXFS) with a capacity of 400 TB and an aggregated parallel I/O bandwidth of > 7000 MB/s
  • fat-tree InfiniBand interconnect fabric with 40 GBit/s bandwidth per link and direction
  • overall peak performance of ca. 234 TFlop/s (191 TFlop/s LINPACK, using only the CPUs).

The Emmy cluster is named after the famous mathematician Emmy Noether, who was born in Erlangen.

[Image: server racks of the Emmy cluster at RRZE, with posters about Emmy Noether]

Emmy is a system that is designed for running parallel programs using significantly more than one node. Jobs with less than one node are not supported by RRZE.

This page covers the following topics:

Access, User Environment, and File Systems

Access to the machine

This cluster has been shut down in September 2022.

Users can connect to emmy.rrze.fau.de by SSH and will be randomly routed to one of the two front ends. All systems in the cluster, including the front ends, have private IP addresses in the 10.28.8.0/22 range. Thus they can only be accessed directly from within the FAU networks. If you need access from outside of FAU, you have to connect for example to the dialog server cshpc.rrze.fau.de first and then ssh to emmy from there. While it is possible to ssh directly to a compute node, a user is only allowed to do this while they have a batch job running there. When all batch jobs of a user on a node have ended, all of their processes, including any open shells, will be killed automatically.

The login and compute nodes run CentOS (which is basically Red Hat Enterprise Linux without the support). As on most other RRZE HPC systems, a modules environment is provided to facilitate access to software packages.
Type “module avail” to get a list of available packages.

The shell for all users on Emmy is always bash. This is different from our other clusters and the rest of RRZE, where the shell used to be tcsh unless you had requested it to be changed.

File Systems

The following table summarizes the available file systems and their features. It is only an excerpt from the description of the HPC file systems.

File system overview for the Emmy cluster
Mount point Access via Purpose Technology, size Backup Data lifetime Quota
/home/hpc $HOME Storage of source, input and important results NFS on central servers, small YES + Snapshots Account lifetime YES (restrictive)
/home/vault $HPCVAULT Medium- to long-term, high-quality storage central servers, HSM YES + Snapshots Account lifetime YES
/home/{woody, saturn, titan, janus, atuin} $WORK Short- to medium-term storage of small files central NFS server NO Account lifetime YES
/elxfs $FASTTMP High performance parallel I/O; short-term storage (only available for existing legacy users due to stability issues, 8/2021) LXFS (Lustre) parallel file system via InfiniBand, 400 TB NO High watermark deletion; the system has been out of warranty for many years! NO

Please note the following differences to our older clusters:

  • There is no cluster local NFS server like on previous clusters (e.g. /home/woody)
  • The nodes do not have any local hard disc drives like on previous clusters. Exception: The GPU nodes.
  • /tmp lies in RAM, so it is absolutely NOT possible to store more than a few MB of data here

NFS file system $HOME

When connecting to one of the front end nodes, you’ll find yourself in your regular RRZE $HOME directory (/home/hpc/...). There are relatively tight quotas there, so it will most probably be too small for the inputs/outputs of your jobs. It however does offer a lot of nice features, like fine grained snapshots, so use it for “important” stuff, e.g. your job scripts, or the source code of the program you’re working on. See the HPC file systems page for a more detailed description of the features.

Parallel file system $FASTTMP

The cluster’s parallel file system is mounted on all nodes under /elxfs/$GROUP/$USER/ and available via the $FASTTMP environment variable for existing legacy users only (i.e. people who had data on $FASTTMP already before 8/2021). It supports parallel I/O using the MPI-I/O functions and can be accessed with an aggregate bandwidth of >7000 MBytes/sec (and even much larger if caching effects can be used).

The parallel file system is strictly intended to be a high-performance short-term storage, so a high watermark deletion algorithm is employed: When the filling of the file system exceeds a certain limit (e.g. 80%), files will be deleted starting with the oldest and largest files until a filling of less than 60% is reached. Be aware that the normal tar -x command preserves the modification time of the original file instead of the time when the archive is unpacked. So unpacked files may become one of the first candidates for deletion. Use tar -mx or touch in combination with find to work around this. Be aware that the exact time of deletion is unpredictable.

Note that parallel filesystems generally are not made for handling large amounts of small files. This is by design: Parallel filesystems achieve their amazing speed by writing to multiple different servers at the same time. However, they do that in blocks, in our case 1 MB. That means that for a file that is smaller than 1 MB, only one server will ever be used, so the parallel filesystem can never be faster than a traditional NFS server – on the contrary: due to larger overhead, it will generally be slower. They can only show their strengths with files that are at least a few megabytes in size, and excel if very large files are written by many nodes simultaneously (e.g. checkpointing). For that reason, we have set a limit on the number of files you can store there.

Batch processing

This cluster has been shut down in September 2022.

As with all production clusters at RRZE, resources are controlled through a batch system. The front ends can be used for compiling and very short serial test runs, but everything else has to go through the batch system to the cluster.

Please see the batch system description for further details.

The following queues are available on this cluster:

Queues on the Emmy cluster
Queue min – max walltime min – max nodes availability Comments
route N/A N/A all users Default router queue; sorts jobs into execution queues
devel 0 – 01:00:00 1 – 8 all users Some nodes reserved for queue during working hours
work 01:00:01 – 24:00:00 1 – 64 all users “Workhorse”
big 01:00:01 – 24:00:00 1 – 560 special users Not active all the time as it causes quite some waste. Users can get access for benchmarking or after proving they can really make use of more than 64 nodes with their codes.
special 0 – infinity 1 – all special users Direct job submit with -q special

As full nodes have to be requested, you always need to specify -l nodes=<nnn>:ppn=40 on qsub.

All nodes have properties that you can use to request nodes of a certain type. This is mostly needed to request one of the GPU nodes. You request nodes with a certain property by appending :property to your request, e.g. -l nodes=<nnn>:ppn=40:v100.
The following properties are available:

Node properties on Emmy
Property Description
:k20m nodes with one or two NVidia Kepler cards. 10 nodes qualify
:k20m1x nodes with one NVidia Kepler card. 4 nodes qualify
:k20m2x nodes with two NVidia Kepler cards. 6 nodes qualify
:v100 nodes with one NVidia Tesla V100 (16GB) card. 4 nodes qualify
:anygpu nodes with any NVidia GPU. 4 nodes qualify

Properties can also be used to request a certain CPU clock frequency. This is not something you will usually want to do, but it can be used for certain kinds of benchmarking. Note that you cannot make the CPUs go any faster, only slower, as the default already is the turbo mode, which makes the CPU clock as fast as it can (up to 2.6 GHz) without exceeding its thermal or power budget. So please do not use any of the following options unless you know what you’re doing. The available options are: :noturbo to disable Turbo Mode, :f2.2 to request 2.2 GHz (this is equivalent to :noturbo), :f2.1 to request 2.1 GHz, and so on in 0.1 GHz steps down to :f1.2 to request 1.2 GHz.

To request access to the hardware performance counters (i.e. to use likwid-perfctr), you have to add the property :likwid. Otherwise you will get the error message Access to performance monitoring registers locked from likwid-perfctr. The property is not required (and should also not be used) for other parts of the LIKWID suite, e.g. it is not required for likwid-pin.

MPI

Intel MPI is recommended, but OpenMPI is available, too. For more details on running MPI parallel applications, please refer to the documentation on parallel computing.

Further Information

Intel Xeon E5-2660 v2 “Ivy Bridge” Processor

Intel’s ark lists some technical details about the Xeon E5-2660 v2 processor.

InfiniBand Interconnect Fabric

The InfiniBand network on Emmy is a quad data rate (QDR) network, i.e. the links run at 40 GBit/s in each direction. This is identical to the network on LiMa. The network is fully non-blocking, i.e. the backbone is capable of handling the maximum amount of traffic coming in through the client ports without any congestion. However, because InfiniBand uses static routing, i.e. once a route is established between two nodes it does not change even if the load on the backbone links changes, it is possible to generate traffic patterns that will cause congestion on individual links. This is, however, not likely to happen with normal user jobs.

Systems, Documentation & Instructions

The HPC group of RRZE – or now NHR@FAU – operates a number of clusters and related systems mainly for scientists at FAU.

Beginning in January 2021, the HPC group of RRZE became NHR@FAU. With the rebranding, we have opened our (new) systems nationwide for NHR users, i.e. scientists from any German university. See the NHR@FAU application rules for details on national access. FAU researchers with demands beyond the free basic Tier3 resources also have to apply for NHR resources, just like people from outside.

Support offerings, especially in the area of performance engineering and atomistic simulations, also started in early 2021 for national users. See also the Teaching & Training section.

Getting started

If you are new to HPC and want to know about the different clusters, how to log in, transfer files and run jobs, please refer to our Getting Started Guide.

HPC Clusters

The following table lists the available clusters and their key properties. To get further information about each cluster, click on the cluster name in the first column.

Cluster name #nodes Target applications Parallel filesystem Local storage Description
NHR&Tier3 parallel cluster (“Fritz”) 992 massively parallel Yes very limited Dual-socket nodes with Intel IceLake processors (72 cores and 256 GB per node) and HDR100 interconnect, plus some additional nodes with 1 or 2 TB of main memory. Access to this cluster is restricted.
NHR&Tier3 GPGPU cluster (“Alex”) 82 GPGPU applications limited Yes 304 Nvidia A100 and 352 Nvidia A40 GPGPUs. Access to this cluster is restricted.
Meggie (Tier3) 728 parallel no longer No The current main cluster for parallel jobs.
Emmy (Tier3) The system was retired in September 2022.
LiMa (Tier3) The system was retired in December 2018.
Woody (Tier3) 288 single-node throughput No Yes Cluster with fast (single- and dual-socket) CPUs for serial throughput workloads.
TinyEth (Tier3) The system was retired at the end of November 2021.
TinyGPU (Tier3) 35 nodes, 168 GPUs GPGPU No Yes (SSDs) The nodes in this cluster are equipped with different types and generations of NVIDIA GPUs. Access restrictions / throttling policies may apply.
TinyFat (Tier3) 47 large memory requirements No Yes (SSDs) This cluster is for applications that require large amounts of memory. Each node has 256 or 512 gigabytes of main memory.
If you’re unsure about which systems to use, feel free to contact the HPC group.

Other HPC systems at RRZE

Dialog server

This machine can be used as an access portal to reach the rest of the HPC systems from outside the university network. This is necessary because most of our HPC systems are in private IP address ranges that can only be reached directly from inside the network of the FAU.

HPC Testcluster

The HPC Testcluster consists of a variety of systems (numerous x86 variants, ARM, NEC Aurora/Tsubasa, …) intended for benchmarking and software testing. Contact the HPC group for details. Access to certain machines may be restricted owing to NDA agreements.

HPC software environment

The software environment available on RRZE’s HPC systems has been made as uniform as possible in order to allow users to change between clusters without having to start from scratch. Documentation for this environment can be found under HPC environment.

Access to external (national) HPC systems

LRZ systems

Users who require very large amounts of computing power can also apply to use the HPC systems of the “Leibniz-Rechenzentrum der Bayerischen Akademie der Wissenschaften” in Garching near Munich.

Federal systems at HLRS and JSC

Access to the national supercomputers at HLRS (High Performance Computing Center Stuttgart) or JSC (Jülich Supercomputing Centre) requires a scientific proposal similar to SuperMUC at LRZ. Depending on the size of the project, the proposals have to be submitted locally at HLRS / JSC or through large-scale calls of GCS (Gauss Center for Supercomputing).

Other NHR centers

NHR@FAU is not the only NHR center; there are 8 more. See https://www.nhr-verein.de/rechnernutzung. Each center has its own scientific and application focus – but all serve researchers from universities all over Germany.