Software environment
We aim to provide an environment across the RRZE production cluster systems that is as homogeneous as possible. This page describes this environment.
This page covers the following topics:
- Available Software
- modules system
- Available shells
- Software development
- Parallel computing
- Libraries
Available software on the HPC systems
As the parallel computers of RRZE are operated diskless, all system software has to reside in a RAM disk, i.e. in main memory. Therefore, only a limited set of packages from the Linux distribution can be installed on the compute nodes, and the compute nodes only contain a subset of the packages installed on the login nodes. Most application software, as well as libraries, is therefore installed in /apps and made available via the modules system. Multiple versions of a single software package can be provided that way, too. We mainly provide libraries and compilers, but also some other frequently requested software packages.
As a general rule, software will be installed centrally, if
- there are multiple groups that benefit from the software, or
- the software is very easy to install.
The Intel compilers and related tools are the only commercial software we provide on all clusters.
Some other commercial software may be installed, but HPC@RRZE will NOT provide any licenses. If you want to use other commercial software, you will need to bring the license with you. This is also true for software sub-licensed from the RRZE software group. All calculations you do on the clusters will draw licenses out of your license pool. Please try to clarify any licensing questions before contacting us, as we really do not plan to become experts in software licensing.
Feel free to compile software yourself in the versions and with the options you need. This is perfectly fine, but support for self-installed software cannot be granted. We can only provide software centrally if it is of importance for multiple groups. If you want to use Spack for compiling additional software, on most of our clusters you can load our user-spack module to reuse the packages we have already built with Spack (if the concretization matches) instead of starting from scratch. Once user-spack is loaded, the command spack will be available (as an alias); you will inherit the presets we defined for certain packages (e.g. Open MPI to work with Slurm), but you will install everything into your own directories ($WORK/USER-SPACK).
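A minimal sketch of how this could look in practice (the package name fftw is just an example; spack install and spack load are standard Spack commands):

module load user-spack
spack install fftw     # builds into your own tree under $WORK/USER-SPACK
spack load fftw        # makes the self-built package available in the current shell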
You can also bring your own environment in a container using Singularity/Apptainer.
modules system
On all RRZE HPC systems, established tools for software development (compilers, editors, …), libraries, and selected applications are available. For many of these applications, it is necessary to set special environment variables, so that e.g. search paths are correct or license servers can be found.
To ease selection of and switching between different versions of software packages, all HPC systems at RRZE use the modules system (cf. modules.sourceforge.net). It allows you to conveniently load the settings necessary for a particular program or a particular version of a program and, if necessary, unload them again later.
Important module commands
Command | What it does |
---|---|
module avail | lists available modules |
module whatis | shows a verbose listing of all available modules with a short description of each |
module list | shows which modules are currently loaded |
module load <pkg> | loads the module pkg, i.e. makes all the settings that are necessary for using the package pkg (e.g. search paths) |
module load <pkg>/version | loads a specific version of the module pkg instead of the default version |
module unload <pkg> | removes the module pkg, i.e. undoes what the load command did |
module help <pkg> | shows a detailed description for module pkg |
module show <pkg> | shows which environment variables module pkg actually sets/modifies |
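A typical interactive session might look like this (the intel and intelmpi module names are taken from the list of important modules below):

module avail               # list everything that is installed
module load intel intelmpi # load the Intel compilers and Intel MPI
module list                # verify what is currently loaded
module unload intelmpi     # undo the settings made by the intelmpi module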
General hints for using modules
- Modules only affect the current shell.
- If individual modules are to be loaded all the time, you can put the command into your login scripts, e.g. into $HOME/.bash_profile. Keep in mind, however, that your home directory is shared among all HPC clusters and that the available modules are likely to differ between systems.
- The syntax of the module commands is independent of the shell used, so they can usually be used unmodified in any type of job script.
- Some modules cannot be loaded together. In some cases, such a conflict is detected automatically during the load command; an error message is printed and no modifications are made.
- Modules can depend on other modules, which are then loaded automatically when you load the module.
- A current list of all available modules can be retrieved with the command module avail.
Important modules
Module | Description |
---|---|
intel | Provides the legacy Intel compilers icc, icpc, and ifort as well as the new LLVM-based ones (icx, icpx, dpcpp, ifx). |
intelmpi | Provides Intel MPI. To use the legacy Intel compilers with Intel MPI, simply use the wrappers carrying the Intel compiler names, i.e. mpiicc, mpiicpc, mpiifort. To use the new LLVM-based Intel compilers, you have to specify them explicitly, i.e. mpiicc -cc=icx, mpiicpc -cxx=icpx, or mpiifort -fc=ifx. The execution of mpicc, mpicxx, and mpif90 results in using the GNU compilers. |
openmpi | Loads some version of Open MPI and the matching compiler. |
gcc | Some version of the GNU compiler collection. Note that all systems come with a default gcc that is delivered with the operating system and is always available without loading any module. However, that version is often rather dated, so on some clusters we provide a gcc module with a newer version. |
Some hints which can simplify the usage of modules in Makefiles:
- When using MPI modules, the environment variables MPICHROOTDIR and MPIHOME are set to the root directory of the respective MPICH version. Include files and libraries can therefore be accessed via $MPIHOME/include and $MPIHOME/lib.
- Analogously, the environment variables INTEL_C_HOME and INTEL_F_HOME are set to the respective root directory when using the Intel compiler modules. This can be helpful when Fortran and C++ objects have to be linked and the respective libraries have to be included manually.
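As a hypothetical sketch, such variables can be used directly on hand-written compile and link lines (file names are placeholders; usually the MPI compiler wrappers are the more convenient choice):

module load intel intelmpi
icc -I$MPIHOME/include -c comm.c               # compile against the headers of the loaded MPI
icc -o app main.o comm.o -L$MPIHOME/lib -lmpi  # library name may differ depending on the MPI flavor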
On all newly installed clusters, the intel64 module has been renamed to intel and no longer automatically loads intel-mpi and mkl.
Defining own module tree
Some users or groups require software or specific software versions that the NHR@FAU module system does not provide. For them we offer the user-spack module, which integrates itself into the available module tree. If you need other software but still want the easy management through modules, you can create your own module tree.
Create a folder in your home directory ($HOME/.modulefiles) for the module tree. Inside this folder, create a folder for each module you want ($HOME/.modulefiles/mymod). Inside this folder, create a file for the version (which can also be a name or similar). The module will be available as mymod/version in the end. In most cases the file is very simple and has a common structure.
Module file mymod/version
#%Module
# Descriptive text
module-whatis "This is my module"
# conflicting modules to load only one version at a time
conflict mymod
# Define base path to the installed software
set rootdir_name /base/path/of/own/software
# Add bin directory to $PATH
prepend-path PATH "$rootdir_name/bin"
# Add sbin directory to $PATH if needed
#prepend-path PATH "$rootdir_name/sbin"
# Add library directory to $LD_LIBRARY_PATH
prepend-path LD_LIBRARY_PATH "$rootdir_name/lib"
# Man pages of the software (if available)
#prepend-path MANPATH "$rootdir_name/man"
# Set own environment variables
setenv MYVAR "foobar"
# Remove variables from environment
# unsetenv NOTMYVAR
In order to append your module tree to the NHR@FAU modules, run module use -a $HOME/.modulefiles.
If you have multiple versions of a module, it is helpful to define a default version. You can do that by creating a file in the module directory ($HOME/.modulefiles/mymod/.version):
Default module file .version
#%Module1.0
set ModulesVersion "default"
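Putting the pieces together, a session using such a private module tree could look like this (module name mymod/version and the variable MYVAR are from the example above):

module use -a $HOME/.modulefiles  # append the private tree to the module search path
module avail mymod                # now lists mymod/version
module load mymod/version
echo $MYVAR                       # prints "foobar", as set in the module file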
The modules system provides many features not shown here. If you are interested, check out the official documentation.
shells
In general, two types of shells are available on the HPC systems at RRZE:
- csh, the C shell, usually in the form of the feature-enhanced tcsh instead of the classic csh
- bash
csh used to be the default login shell for all users, not because it is a good shell (it certainly isn't!), but simply for historical reasons. Since about 2014, the default shell for new users has been bash instead, which most people who have used a Linux system will be familiar with. The newer clusters (starting with Emmy) always enforce bash as the shell, even for old accounts. If you have one of those old accounts still using csh and want to change to bash for the older clusters too, you can contact the ServiceTheke or the HPC team to get your login shell changed.
Software Development
You will find a wide variety of software packages in different versions installed on the cluster frontends. The module concept is used to simplify the selection and switching between different software packages and versions. Please see the page on batch processing for a description of how to use modules in batch scripts.
Compilers
Intel
Intel compilers are the recommended choice for software development on all clusters. A current version of the Fortran, C, and C++ compilers (icc, icpc, and ifort for the legacy Intel compilers; icx, icpx, dpcpp, and ifx for the new LLVM-based compilers) can be selected by loading the intel module. For use in scripts and makefiles, the module sets the shell variables $INTEL_F_HOME and $INTEL_C_HOME to the base directories of the compiler packages.
As a starting point, try the option combination -O3 -xHost when building objects. All Intel compilers have a -help switch that gives an overview of all available compiler options. For in-depth information, please consult the local docs in $INTEL_[F,C]_HOME/doc/ and Intel's online documentation for their compiler suite (currently named "Intel Parallel Studio XE").
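For example, a first compilation attempt could look like this (file names are placeholders; the flags are the ones recommended above):

module load intel
icx -O3 -xHost -c kernel.c              # new LLVM-based C compiler
ifx -O3 -xHost -o app main.f90 kernel.o # new LLVM-based Fortran compiler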
Endianness
All x86-based processors use the little-endian storage format, which means that the LSB of multi-byte data has the lowest memory location. The same format is used in unformatted Fortran data files. To simplify the handling of big-endian files (e.g. data you have produced on IBM Power, Sun Ultra, or NEC SX systems), the Intel Fortran compiler can convert the endianness on the fly in read or write operations. This can be configured separately for different Fortran units. Just set the environment variable F_UFMTENDIAN at run time.
Examples:
F_UFMTENDIAN= | Effect |
---|---|
big | everything treated as BE |
little | everything treated as LE (default) |
big:10,20 | units 10 and 20 treated as BE, everything else as LE |
"big;little:8" (quotes needed in the shell because of the semicolon) | unit 8 treated as LE, everything else as BE |
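For instance, to read one big-endian input file on unit 10 while all other units stay in the native little-endian format (syntax as in the table above):

export F_UFMTENDIAN=big:10
./my_fortran_app   # unit 10 is converted on the fly, all other units remain LE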
GCC
The GNU compiler collection (GCC) is available directly without having to load any module. However, do not expect to find the latest GCC version here. Typically, several versions are separately installed on all systems and made available via environment modules, e.g. module load gcc/<version>.
Be aware that the Open MPI module depends on the compiler used, i.e. openmpi/XX-intel for use with the Intel compiler and openmpi/XX-gcc for the GCC compiler. The IntelMPI module intelmpi/XX is independent of the compiler used.
MPI Profiling with Intel Trace Collector/Analyzer
Intel Trace Collector/Analyzer are powerful tools that acquire/display information on the communication behavior of an MPI program. Performance problems related to MPI can be identified by looking at timelines and statistical data. Appropriate filters can reduce the amount of information displayed to a manageable level.
In order to use Trace Collector/Analyzer you have to load the itac module. This section describes only the most basic usage patterns. Complete documentation can be found on Intel's ITAC website or in the Trace Analyzer Help menu. Please note that tracing is currently only possible when using Intel MPI; therefore the corresponding intel64 and intelmpi modules have to be loaded.
Trace Collector (ITC)
ITC is a tool for producing tracefiles from a running MPI application. These traces contain information about all MPI calls and messages and, optionally, on functions in the user code.
It is possible to trace your application without rebuilding it by dynamically loading the ITC profiling library during execution. The library intercepts all MPI calls and generates a trace file. To start the trace, simply add the -trace option to your mpirun command, e.g.:
$ mpirun -trace -n 4 ./myApp
In some cases, your application has to be rebuilt to trace it, for example if it is statically linked with the MPI library or if you want to add user function information to the trace. To include the required libraries, you can use the -trace option during compilation and linking. Your application can then be run as usual, for example:
$ mpicc -trace myApp.c -o myApp
$ mpirun -n 4 ./myApp
You can also specify other profiling libraries, for a complete list please refer to the ITC User Guide.
After an MPI application that has been compiled or linked with ITC has terminated, a collection of trace files is written to the current directory. They follow the naming scheme <binary-name>.stf* and serve as input for the Trace Analyzer tool. Keep in mind that, depending on the amount of communication and the number of MPI processes used, these trace files can become quite large. To generate one single file instead of several smaller files, specify the option -genv VT_LOGFILE_FORMAT=SINGLESTF in your mpirun call.
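Combining the options mentioned above, a run that collects everything into a single trace file could look like this:

$ mpirun -trace -genv VT_LOGFILE_FORMAT=SINGLESTF -n 4 ./myApp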
Trace Analyzer (ITA)
The <binary-name>.stf file produced after running the instrumented MPI application should be used as an argument to the traceanalyzer command:
traceanalyzer <binary-name>.stf
The Trace Analyzer processes the trace files written by the application and lets you browse through the data. Click on "Charts-Event Timeline" to see the messages transferred between all MPI processes and the time each process spends in MPI and application code, respectively. Clicking and dragging lets you zoom into the timeline data (zoom out with the "o" key). "Charts-Message Profile" shows statistics about the communication requirements of each pair of MPI processes. The statistics displays change their content according to the data currently shown in the timeline window. Please consult the Help menu or the ITAC User Guide for more information. Additionally, the HPC group of RRZE will be happy to work with you on getting insight into the performance characteristics of your MPI applications.
Python
You have to distinguish between the Python installation from the Linux distribution in the default path and the one available through the python/[3.x]-anaconda modules. The system Python only provides limited functionality (especially on the compute nodes). Some software packages (e.g. AMBER) come with their own Python installation. For more information, refer to the documentation about Python and Jupyter.
Parallel Computing
The intended parallelization paradigm on all clusters is either message passing using the Message Passing Interface (MPI) or shared-memory programming with OpenMP.
IntelMPI
Intel MPI supports different compilers (GCC, Intel). It is available by loading the intelmpi module. To use the legacy Intel compilers with Intel MPI, you just have to use the appropriate wrappers with the Intel compiler names, i.e. mpiicc, mpiicpc, mpiifort. To use the new LLVM-based Intel compilers with Intel MPI, you have to specify them explicitly, i.e. use mpiicc -cc=icx, mpiicpc -cxx=icpx, or mpiifort -fc=ifx. The execution of mpicc, mpicxx, and mpif90 results in using the GNU compilers.
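For example (wrapper names and compiler options as described above; the source files are placeholders):

module load intel intelmpi
mpiicc -cc=icx -O3 -xHost -o c_app app.c      # Intel MPI with the LLVM-based C compiler
mpiifort -fc=ifx -O3 -xHost -o f_app app.f90  # Intel MPI with the LLVM-based Fortran compiler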
There are no special prerequisites for running MPI programs. On all clusters, we recommend starting MPI programs with srun instead of mpirun. Both IntelMPI and OpenMPI have integrated support for the Slurm batch system and are able to detect the allocated resources automatically.
If you require using Intel mpirun for your application, this usually works without problems for pure MPI applications. Slurm automatically informs mpirun about the number of processes (as defined via sbatch/salloc) and the node hostnames for IntelMPI. Do NOT add options like -n <number_of_processes> or any other option defining the number of processes or nodes to mpirun, as this might interfere with the automatic affinity settings of the processes.
By default, one process will be started on each allocated CPU in a blockwise fashion, i.e. the first node is filled completely, followed by the second node, etc. If you want to start fewer processes per node (e.g. because of large memory requirements), you can specify the --ntasks-per-node=XX option to sbatch to define the number of processes per node.
We do not support running MPI programs interactively on the frontends. To do interactive testing, please start an interactive batch job on some compute nodes. During working hours, a number of nodes is reserved for short (< 1 hour) tests.
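A minimal job script following these recommendations might look as follows (node count, task count, time limit, and the binary name are placeholders):

#!/bin/bash -l
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=32
#SBATCH --time=01:00:00
module load intel intelmpi
srun ./myApp   # srun picks up the allocated resources from Slurm automatically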
OpenMPI
Loading the openmpi/XX-intel or openmpi/XX-gcc module will automatically also load the respective compiler module. The standard MPI compiler wrappers mpicc, mpicxx, and mpifort are then available.
The usage of OpenMPI is very similar to IntelMPI. On all clusters, we recommend starting MPI programs with srun instead of mpirun; as noted above, both IntelMPI and OpenMPI have integrated support for the Slurm batch system and detect the allocated resources automatically. The same rules as for IntelMPI apply: do not pass the number of processes or nodes to mpirun, and use the --ntasks-per-node=XX option of sbatch if you want to start fewer processes per node.
MPI process binding
It is possible to use process binding to specify the placement of the processes on the architecture. This may increase the speed of your application, but also requires advanced knowledge about the system’s architecture. When no options are given, default values are used. This is the recommended setting for most users.
Both srun and mpirun will bind the MPI processes automatically in most cases. Two cases have to be distinguished:
- Full nodes: all available cores are used by a job step. Process binding is usually done correctly and automatically by srun and mpirun.
- Partially used nodes: some (automatically) allocated cores are not used by a job step. Process binding is then in some cases not done automatically by srun; use the option --cpu-bind=cores with srun.
Automatic binding behavior can differ between the type of MPI (IntelMPI vs. OpenMPI), the version of the MPI library, and the Slurm version. The resulting distribution of processes may also differ between srun and mpirun. We strongly recommend checking the process binding of your application regularly, especially after changing versions of any of the used libraries. Incorrect process binding can negatively impact the performance of your application. Use --cpu-bind=verbose for srun, export I_MPI_DEBUG=5 for IntelMPI-mpirun, or --report-bindings for OpenMPI-mpirun.
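In practice, the binding report can be requested like this (the binary name is a placeholder; the three calls are alternatives, as listed above):

srun --cpu-bind=verbose ./myApp   # Slurm prints the chosen binding
export I_MPI_DEBUG=5              # IntelMPI mpirun reports the pinning in its debug output
mpirun ./myApp
mpirun --report-bindings ./myApp  # OpenMPI mpirun prints the bindings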
More information about process binding can be found in the HPC Wiki.
OpenMP
The installed compilers support at least the relevant parts of recent OpenMP standards. The compiler recognizes OpenMP directives if you supply the corresponding command line option: -fopenmp for GCC and -qopenmp for the Intel compiler. This option is also required for the link step.
To run an OpenMP application, the number of threads has to be specified explicitly by setting the environment variable OMP_NUM_THREADS. This is not done automatically via the batch system, since Slurm is not OpenMP-aware. If the variable is not set, the default value will be used; in most cases the default is 1, which means that your code is executed serially. If you want to use, for example, 12 threads in the parallel regions of your program, set the environment variable with export OMP_NUM_THREADS=12.
For correct resource allocation in Slurm, use --cpus-per-task to define the number of OpenMP threads. If your application does not use OpenMP but another kind of shared-memory parallelization, please consult the application manual on how to specify the number of threads.
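A sketch of the relevant lines in a job script (12 threads as in the example above; deriving OMP_NUM_THREADS from Slurm's SLURM_CPUS_PER_TASK variable is one option, setting it explicitly is another):

#SBATCH --cpus-per-task=12
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK   # or simply: export OMP_NUM_THREADS=12
./my_openmp_app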
OpenMP Pinning
To reach optimum performance with OpenMP codes, the correct pinning of the OpenMP threads is essential. As nowadays practically all machines are ccNUMA, where incorrect or no pinning can have devastating effects, this is something that should not be ignored. Slurm will not pin OpenMP threads automatically.
A comfortable way to pin your OpenMP threads to processors is using likwid-pin, which is available within the likwid module on all clusters. You can start your program using the following syntax:
likwid-pin -c <cpulist> <executable>
There are various possibilities to specify the CPU list, depending on the hardware setup and the requirements of your application. A short summary is available by calling likwid-pin -h. More detailed documentation can be found on the Likwid GitHub page.
An alternative way of pinning is to use OpenMP-specific methods, e.g. by setting OMP_PLACES=cores and OMP_PROC_BIND=spread. More information about this is available in the HPC Wiki.
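For illustration, pinning 12 threads to the first 12 cores could look like this (the CPU list and the application name are examples):

module load likwid
export OMP_NUM_THREADS=12
likwid-pin -c 0-11 ./my_openmp_app       # pin the threads to cores 0-11
# alternatively, using the OpenMP environment variables:
export OMP_PLACES=cores OMP_PROC_BIND=spread
./my_openmp_app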
Libraries
Intel Math Kernel Library (MKL)
The Math Kernel Library provides threaded BLAS, LAPACK, and FFTW routines and some supplementary functions (e.g., random number generators). For distributed-memory parallelization, there is also ScaLAPACK and CDFT (cluster DFT), together with some sparse solver subroutines. It is highly recommended to use MKL for any kind of linear algebra if possible. To facilitate the choice of functions for a specific use case, you can refer for example to the Intel MKL LAPACK function finding advisor.
After loading the mkl module, several shell variables are available that help with compiling and linking programs that use MKL. The installation directory can be found in $MKLROOT; other useful environment variables can be listed with module show mkl/XX. For most applications, it should be sufficient to compile and link your program with -mkl. For more complex applications, you can find out which libraries are recommended by using the Intel MKL link line advisor.
Many MKL routines are threaded and can run in parallel by setting the OMP_NUM_THREADS shell variable to the desired number of threads. If you do not set OMP_NUM_THREADS, the default number of threads is one. Using OpenMP together with threaded MKL is possible, but the OMP_NUM_THREADS setting will apply to both your code and the MKL routines. If you do not want this, you can force MKL into serial mode by setting the MKL_SERIAL environment variable to YES.
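Putting this together, a simple build and run could look as follows (the -mkl flag and the environment variables are the ones described above; file names and thread count are placeholders):

module load intel mkl
icc -O3 -xHost -mkl -o app app.c   # link BLAS/LAPACK etc. from MKL
export OMP_NUM_THREADS=8           # threaded MKL routines use 8 threads
./app
export MKL_SERIAL=YES              # force MKL into serial mode if desired
./app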
For more in-depth information, please refer to Intel’s online documentation on MKL.