Getting started with HPC#

This guide will give you a short overview of the most important aspects of running applications on the HPC systems. For more in-depth information, please refer to the linked documentation.

Monthly HPC beginner's introduction

If you have any questions, join our online HPC beginner's introduction. It is a hands-on introduction to our systems, offered on the second Wednesday of every month.

Getting an HPC account#

Depending on your status, there are different ways to get an HPC account. You can find out what applies to you under Getting an account.

HPC accounts are managed through the HPC portal. You can find more information about how to use the portal and upload an SSH key in the HPC portal documentation.

HPC Clusters#

All clusters run Linux and, by default, offer text-mode access only. Basic knowledge of file handling, scripting, editing, etc. under Linux is therefore required.

The HPC clusters at NHR@FAU are tailored to different use cases:

multi-node MPI-parallel jobs: Fritz
GPU jobs: TinyGPU, Alex
single-core or single node (throughput) jobs: Woody
single-core or single node with larger main memory requirement: TinyFat

Connecting to HPC systems#

To connect to the HPC systems at NHR@FAU, OpenSSH is used. OpenSSH is supported under Linux, Mac, and Windows either natively or via third-party clients.

Step-by-step guides on how to configure SSH are available for the following clients:

Command-line based (Linux, Mac, Windows PowerShell)
MobaXterm (graphical client for Windows)
VS Code

It is also possible to use JupyterHub to work interactively on the systems.
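
As a rough sketch of the command-line route, the following shell helpers create a key pair and open a connection. The user name, host name, and key file name are placeholders chosen for illustration, not actual NHR@FAU values; take the real ones from the HPC portal and the SSH guides listed above.

```shell
# Placeholders (assumptions for illustration): replace with your own values.
HPC_USER="USERNAME"
HPC_HOST="FRONTEND.example.org"
HPC_KEY="$HOME/.ssh/id_ed25519_hpc"

# Create an Ed25519 key pair. The matching "$HPC_KEY.pub" file is the
# public key that gets uploaded in the HPC portal.
hpc_keygen() {
    ssh-keygen -t ed25519 -f "$HPC_KEY"
}

# Open a shell on the front end. Extra arguments are run as a remote command.
hpc_connect() {
    ssh -i "$HPC_KEY" "$HPC_USER@$HPC_HOST" "$@"
}
```

Run `hpc_keygen` once on your local machine, upload the public key, and then call `hpc_connect` whenever you want to log in.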

Working with data#

Different file systems are accessible from the clusters. Because their properties differ, each is suited to different use cases. More detailed information is available in the general filesystem documentation.

mount point	access	purpose	technology	backup	snapshots	data lifetime	quota
`/home/hpc`	`$HOME`	Source, input, important results	NFS	YES	YES	Account lifetime	50 GB
`/home/vault`	`$HPCVAULT`	Mid-/long-term storage	NFS	YES	YES	Account lifetime	500 GB
`/home/{woody,saturn,titan,janus,atuin}`	`$WORK`	General-purpose, log files	NFS	NO	NO	Account lifetime	Tier3: 500 GB, NHR: project quota
`/lustre`	`$FASTTMP` on Fritz	High performance parallel I/O	Lustre via InfiniBand	NO	NO	High watermark	Only inodes
`/???`	`$TMPDIR`	Node-local job-specific directory	SSD/RAM disk	NO	NO	Job runtime	NO

$HOME, $HPCVAULT, and $WORK are mounted on all HPC systems, including frontend nodes and cluster nodes.

For all filesystems, your personal folder is located in your group directory, for example for $HOME at /home/hpc/GROUPNAME/USERNAME.

The environment variables $HOME, $HPCVAULT, and $WORK are automatically set upon login and can be used to access the folders directly.
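
The snippet below is a small sketch of how these variables can be used to set up a project folder. The project name `my_project` and the fallback to a temporary directory are assumptions made so that the example also runs outside the clusters.

```shell
# $WORK is set automatically on the clusters; fall back to a temporary
# location so the snippet also runs elsewhere (assumption for illustration).
WORK_ROOT="${WORK:-${TMPDIR:-/tmp}}"

# "my_project" is a hypothetical project name.
PROJECT_DIR="$WORK_ROOT/my_project"

# $WORK is meant for general-purpose data and log files; important results
# belong in $HOME or $HPCVAULT, which have backups.
mkdir -p "$PROJECT_DIR/input" "$PROJECT_DIR/logs"

echo "Project directory: $PROJECT_DIR"
```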

File system quota#

Nearly all file systems impose quotas on the data volume and/or the number of files or directories. These quotas may be set per user or per group. The soft quota can be exceeded temporarily, whereas the hard quota is the absolute upper limit. You will be notified automatically if you exceed your quota on any file system.

You can look up your used quota by typing quota -s or shownicerquota.pl on any cluster front end.

More information is available in the description of the specific file systems in the general filesystem documentation.
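
When `quota -s` shows that you are close to a limit, it helps to know which folders take up the space. The helper below is a sketch that assumes GNU coreutils (`du`, `sort -h`), which are standard on Linux; the function name is made up for this example.

```shell
# List the ten largest entries directly below a directory (default: $HOME).
# Hidden files and folders (names starting with a dot) are not included.
largest_entries() {
    dir="${1:-$HOME}"
    du -sh -- "$dir"/* 2>/dev/null | sort -rh | head -n 10
}
```

For example, `largest_entries "$WORK"` lists the biggest folders in your work directory.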

Data transfer#

Under Linux and Mac, scp and rsync are the preferred ways to copy data from and to a remote machine.

Under Windows, either the Linux subsystem, scp via Command/PowerShell, or additional tools like WinSCP can be used.

You can also mount the remote file systems locally on your machines.

For large files, we recommend using scp or rsync because they are usually much faster than other transfer protocols. rsync also provides more extensive functionality, e.g. skipping files that already exist or resuming file transfers.
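
A possible way to wrap rsync for uploads and downloads is sketched below. The user name and host name are placeholders, and the function names are invented for this example; only standard rsync options are used.

```shell
# Placeholders (assumptions for illustration): replace with your own values.
HPC_USER="USERNAME"
HPC_HOST="FRONTEND.example.org"

# Copy a local file or directory to the cluster.
# -a preserves permissions and timestamps, -v lists transferred files,
# -h prints human-readable sizes, --partial keeps interrupted files so
# that a rerun can continue them.
hpc_upload() {
    rsync -avh --partial --progress "$1" "$HPC_USER@$HPC_HOST:$2"
}

# Fetch results from the cluster, skipping files that already exist locally.
hpc_download() {
    rsync -avh --ignore-existing "$HPC_USER@$HPC_HOST:$1" "$2"
}
```

A call such as `hpc_upload ./input/ /path/on/cluster/input/` copies the contents of a local folder; the remote path shown here is a placeholder as well.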

Available Software#

The standard Linux packages are installed on the cluster front ends. The majority of software is provided via environment modules, e.g. compilers, libraries, open and commercial software. Available modules may differ between clusters.

Environment modules dynamically alter the user's environment to make a particular software package usable, e.g. by modifying search paths. This makes it easy to switch between different versions of the same software.

A module has to be loaded explicitly to become usable. All module commands affect the environment of the current shell only. They can also be used within Slurm scripts.

The most important module commands are the following:

Command	Explanation
`module avail`	lists available modules
`module list`	shows which modules are currently loaded
`module load <modulename>`	loads the module `<modulename>`
`module unload <modulename>`	removes the module `<modulename>`
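
A typical sequence of these commands could look like the sketch below. The module name `python` is only an example, and loading it without a version assumes that a default version is defined on the cluster. The check for the `module` command is there so that the snippet fails gracefully on machines without environment modules.

```shell
# Example module name (assumption); pick one from the `module avail` output.
MODULE_NAME="python"

if command -v module >/dev/null 2>&1; then
    module avail "$MODULE_NAME"    # list matching modules
    module load "$MODULE_NAME"     # load the default version
    module list                    # confirm what is currently loaded
    module unload "$MODULE_NAME"   # remove it again
    MODULES_FOUND=yes
else
    echo "module command not found; run this on a cluster front end" >&2
    MODULES_FOUND=no
fi
```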

Compiling applications#

For compiling your applications, you have to load the necessary modules first. Usually, the same modules also have to be loaded prior to running the application inside your Slurm job script.

Lists of available compilers and recommended flags are given in the compiler documentation.

On the clusters, Open MPI and Intel MPI are available via modules. More information is available in the MPI documentation.

Python#

We recommend using our Python modules, which provide an Anaconda installation with a wide range of preinstalled packages and the conda package manager.

We discourage the usage of the system-wide installed Python on our systems as it is rather old.

List available Python versions with module avail python and load a specific version with module load python/<version>. Additional packages can be installed via pip or conda in your own personal package directory. All necessary setup steps and more hints are summarized in the Python documentation.

Running Jobs#

The cluster front ends can be used for interactive work like editing input files or compiling your application. Do not run applications with large computational or memory requirements on the front ends, since this will impact other users. MPI parallel jobs are generally not allowed on the front ends.

Interactive jobs#

Interactive jobs are especially useful for test runs and debugging. When started, they open a shell on one of the assigned compute nodes and let you run interactive programs there.
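
With Slurm, interactive jobs are commonly requested with `salloc`. The sketch below assumes that this also applies here; the node count and time limit are illustrative values, and the function name is made up. Check the cluster documentation for the options each cluster actually requires.

```shell
# Request one node for 30 minutes (illustrative values).
# Additional Slurm options can be appended as arguments.
start_interactive_job() {
    salloc --nodes=1 --time=00:30:00 "$@"
}
```

Calling `start_interactive_job` on a cluster front end then waits until the requested resources are available.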

Batch jobs#

Compute nodes cannot be accessed directly. Compute resources have to be requested through the so-called batch system. This is done by creating a job script that contains all the commands you want to run as well as the requested resources (number of compute nodes, runtime, etc.). A job will run when the required resources become available. The output of the job is written into a file in your submit directory.

All clusters at NHR@FAU use Slurm as the batch system. The following documentation is available:

general usage of Slurm
general example scripts
cluster documentation for cluster-specific hints
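
A minimal job script might look like the following sketch. The file name, job name, and resource values are assumptions for illustration; suitable values depend on the cluster and on your application. The snippet writes the script to a file and submits it only if `sbatch` is available.

```shell
# Hypothetical location and name for the job script.
JOB_DIR="${WORK:-${TMPDIR:-/tmp}}"
JOB_SCRIPT="$JOB_DIR/example_job.sh"

# The quoted 'EOF' keeps the script text literal (no expansion here).
cat > "$JOB_SCRIPT" <<'EOF'
#!/bin/bash -l
#SBATCH --job-name=example
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=00:10:00
#SBATCH --output=%x-%j.out

# Load the same modules that were used for compiling, e.g.:
# module load <modulename>

echo "Running on $(hostname)"
EOF

if command -v sbatch >/dev/null 2>&1; then
    sbatch "$JOB_SCRIPT"
else
    echo "sbatch not found; submit $JOB_SCRIPT from a cluster front end" >&2
fi
```

The `--output` pattern uses `%x` for the job name and `%j` for the job ID, so each run gets its own output file in the submit directory.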

Job status#

Use the command sinfo on the respective cluster frontend node to get the current status of the cluster nodes (idle, mixed, allocated). This shows the current workload of the cluster, which also influences the queuing time of your job. Keep in mind, however, that there might be some resource or priority limitations that prevent your job from running even if nodes are available.

You can use squeue to check the status of your jobs. If your job is not starting straight away, the reason for the delay is listed in the column NODELIST(REASON). An explanation of these reasons is available here.
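
Both commands can be combined into a quick status check, as in the sketch below. Filtering `squeue` by user name with `-u` is one common way to show only your own jobs. The check for the Slurm commands is an addition so that the snippet also runs, without effect, on machines where Slurm is not installed.

```shell
# Current user name; falls back to `id` if $USER is not set.
ME="${USER:-$(id -un)}"

if command -v squeue >/dev/null 2>&1; then
    sinfo              # node states: idle, mixed, allocated
    squeue -u "$ME"    # your jobs; see the NODELIST(REASON) column
    SLURM_FOUND=yes
else
    echo "Slurm commands not found; run this on a cluster front end" >&2
    SLURM_FOUND=no
fi
```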

When you log into the cluster frontend node, it will also show the Message of the Day (MOTD), where changes in the configuration, maintenance times, and other disruptions in service will be announced.

Good practices#

Check the results of your job regularly to avoid wasting resources.
Look at the performance and resource usage of your job. If you do not have a password for your HPC account, log into the HPC portal and follow the links to the job monitoring from there.
Try to use the appropriate amount of parallelism. Run scaling experiments to figure out the "sweet spot" of your application.
Use the appropriate filesystem for your data.
If you need help, open a support ticket.