TinyFat cluster (Tier3)


The TinyFat cluster is one of a group of small special-purpose clusters. memoryhog and the TinyFat nodes are intended for running serial or moderately parallel (OpenMP) applications that require large amounts of memory in one machine; they are mostly used for pre-/post-processing work for jobs on other clusters. Since the nodes are connected only by Gigabit Ethernet, only single-node jobs are allowed on this cluster.

On 19.09.2022, the frontend node for TinyFat was renamed to tinyx.nhr.fau.de.

There are a number of different machines in TinyFat:

Hostnames | # nodes | CPUs and number of cores per machine | Main memory (in GB) | Additional comments and required sbatch parameters
memoryhog (formerly tf020) | 1 | 4x Intel Xeon X7560 (“Nehalem EX”) @2.27 GHz = 32 cores/64 threads | 512 | interactively accessible without batch job
tf040-tf042 (not always generally available) | 3 | 2x Intel Xeon E5-2680 v4 (“Broadwell”) @2.4 GHz = 28 cores/56 threads | 512 | 10 GBit Ethernet, 1 TB Intel 750 SSD; -p broadwell512
tf050-tf057 (not always generally available) | 8 | 2x Intel Xeon E5-2643 v4 (“Broadwell”) @3.4 GHz = 12 cores/24 threads | 256 | 10 GBit Ethernet, 1 TB Intel 750 SSD; -p broadwell256
tf060-tf095 (not generally available) | 36 | 2x AMD Rome 7502 @2.5 GHz = 64 cores/128 threads (SMT enabled) | 512 | 10 GBit Ethernet, 3.5 TB NVMe SSD

All Broadwell-based and AMD Rome-based nodes have been purchased by specific groups or for special projects. These users have priority access, and the nodes may be reserved exclusively for them.

Access, User Environment, and File Systems

Access to the machines

TinyFat shares a frontend node with TinyGPU. To access the systems, please connect to tinyx.nhr.fau.de via ssh.
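
A minimal login example might look like this (your_username is a placeholder for your HPC account name):

ssh your_username@tinyx.nhr.fau.de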

While it is possible to ssh directly to a compute node, a user is only allowed to do this when they have a batch job running there. When all batch jobs of a user on a node have ended, all of their shells will be killed automatically.

One exception to this is memoryhog, which can be used interactively without a batch job. Every HPC user can log in directly to memoryhog.rrze.fau.de to run their memory-intensive workloads. This of course means you need to be considerate of other users. Processes hogging up too many resources or running for too long will be killed without notice.
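
For example (your_username again being a placeholder):

ssh your_username@memoryhog.rrze.fau.de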

File Systems

The following table summarizes the available file systems and their features. Also, check the description of the HPC file systems.

File system overview for the TinyFat cluster
Mount point | Access via | Purpose | Technology, size | Backup | Data lifetime | Quota
/home/hpc | $HOME | Storage of source, input, important results | central servers | YES + Snapshots | Account lifetime | YES (restrictive)
/home/vault | $HPCVAULT | Mid- to long-term, high-quality storage | central servers | YES + Snapshots | Account lifetime | YES
/home/{woody, saturn, titan, janus, atuin} | $WORK | Storage for small files | NFS | limited | Account lifetime | YES
/scratchssd | $TMPDIR | Temporary job data directory | Node-local SSD, between 750 GB and 3.5 TB | NO | Job runtime | NO

Node-local storage $TMPDIR

Each node has at least 750 GB of local SSD capacity for temporary files available under $TMPDIR (also accessible via /scratchssd). All files in these directories will be deleted at the end of a job without any notification. Important data to be kept can be copied to a cluster-wide volume at the end of the job, even if the job is canceled by a time limit. Please see the section on batch processing for examples on how to use $TMPDIR.
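
A rough sketch of this pattern inside a job script (the project directory and file names are placeholders, not an official example):

cd $TMPDIR
# stage input onto the node-local SSD
cp $WORK/myproject/input.dat .
# run the application from the fast local storage
srun ./executable.exe input.dat
# copy results back to a cluster-wide volume before the job ends
cp results.dat $WORK/myproject/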

Please use the node-local SSDs only if your jobs really profit from them: like all consumer SSDs, they support only a limited number of writes, so every write effectively “uses them up”.

Batch Processing

All user jobs (except on memoryhog) must be submitted to the cluster by means of the batch system. The submitted jobs are routed into a number of queues (depending on the requested resources, e.g. runtime) and sorted according to a priority scheme.

The TinyFat nodes use Slurm as their batch system. There are, however, some differences between the node types: the Broadwell-based nodes are automatically allocated exclusively to a job, whereas the newer AMD Rome-based nodes might be shared among different jobs. Please see the batch system description for general information about Slurm. In the following, only the features specific to TinyFat are described.

Slurm

To specify to which cluster jobs should be submitted, command wrappers are available for most Slurm commands. This means that jobs can be submitted from the frontend via sbatch.tinyfat. Other examples are srun.tinyfat, salloc.tinyfat, sinfo.tinyfat and squeue.tinyfat. These commands are equivalent to using the option --clusters=tinyfat.
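
For illustration, the following two submissions are equivalent (job.sh stands for your job script):

sbatch.tinyfat job.sh
sbatch --clusters=tinyfat job.sh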

In contrast to other clusters, the Rome-based compute nodes are not allocated exclusively but are shared among several jobs. However, users will only have access to resources (cores, memory) allocated by their job. Exclusive access to the whole node can be requested by using the --exclusive option.
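
For example, a job script could request a complete node by adding the directive below (alternatively, the option can be given on the command line, e.g. sbatch.tinyfat --exclusive job.sh, where job.sh is a placeholder):

#SBATCH --exclusive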

We recommend always using srun instead of mpirun or mpiexec to start your parallel application, since it automatically uses the allocated resources (number of tasks, cores per task, …) and also binds the tasks to the allocated cores. If you have to use mpirun, make sure to check that the binding of your processes is correct (e.g. with --report-bindings for Open MPI and export I_MPI_DEBUG=4 for Intel MPI). OpenMP threads are not automatically pinned to specific cores; for the application to run efficiently, this has to be done manually. For more information, see e.g. the HPC Wiki.
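
Checking the binding when you do have to use mpirun might look like this (the executable name is a placeholder):

# Open MPI: report the core binding of each rank
mpirun --report-bindings ./executable.exe
# Intel MPI: print pinning information at startup
export I_MPI_DEBUG=4
mpirun ./executable.exe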

By default, 8000 MB of memory are allocated per physical core. If your application needs a higher memory-per-core ratio, more memory can be requested with the option --mem=<memory in MByte>.
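
For example, a single-task job that needs roughly 64 GB could add the following directives (the values are only illustrative):

#SBATCH --ntasks=1
#SBATCH --mem=64000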

By default, Slurm exports the environment of the submitting shell into the job. We recommend using the sbatch option --export=none to prevent this export. Additionally, unset SLURM_EXPORT_ENV has to be called before srun to ensure that it executes correctly. Both options are already included in the example scripts below.

Example Slurm Batch Scripts

Although SMT is enabled on these nodes, by default only one task per (physical) core is scheduled. If you want to use hyperthreads, you have to request them explicitly.

For the most common use cases, examples are provided below.

MPI without hyperthreading

In this example, the executable will be run using 2 MPI processes for a total job walltime of 6 hours. Each process is running on a physical core and hyperthreads are not used. The job can use up to 16000MB of main memory.

#!/bin/bash -l
#
# start 2 MPI processes
#SBATCH --ntasks=2
# allocate nodes for 6 hours
#SBATCH --time=06:00:00
# job name 
#SBATCH --job-name=Testjob
# do not export environment variables
#SBATCH --export=NONE

# do not export environment variables
unset SLURM_EXPORT_ENV

srun --mpi=pmi2 ./executable.exe

MPI with hyperthreading

In this example, the executable will be run using 2 MPI processes for a total job walltime of 6 hours. Only one physical core is allocated and each process is running on one of its hardware threads. The job can use up to 8000MB of main memory.

#!/bin/bash -l
#
# start 2 MPI processes
#SBATCH --ntasks=2
# specify to use hyperthreads
#SBATCH --hint=multithread
#equivalent to
##SBATCH --ntasks-per-core=2
# allocate nodes for 6 hours
#SBATCH --time=06:00:00
# job name 
#SBATCH --job-name=Testjob
# do not export environment variables
#SBATCH --export=NONE

# do not export environment variables
unset SLURM_EXPORT_ENV

srun --mpi=pmi2 ./executable.exe

Hybrid MPI/OpenMP without hyperthreading

In this example, the executable will be run using 2 MPI processes with 8 OpenMP threads each for a total job walltime of 6 hours. 16 cores are allocated in total and each OpenMP thread is running on a physical core. Hyperthreads are not used. The job can use up to 16*8000MB=128000MB of main memory.

OpenMP is not Slurm-aware, so you need to specify OMP_NUM_THREADS in your script. It should match the number of cores requested via --cpus-per-task.

For a more efficient computation, OpenMP threads should be pinned to the compute cores. This can be achieved by the following environment variables: OMP_PLACES=cores, OMP_PROC_BIND=true. For more information, see e.g. the HPC Wiki.

#!/bin/bash -l
#
# start 2 MPI processes
#SBATCH --ntasks=2
# requests 8 OpenMP threads per MPI task
#SBATCH --cpus-per-task=8
# do not use hyperthreads
#SBATCH --hint=nomultithread
# allocate nodes for 6 hours
#SBATCH --time=06:00:00
# job name 
#SBATCH --job-name=Testjob
# do not export environment variables
#SBATCH --export=NONE

# do not export environment variables
unset SLURM_EXPORT_ENV
# cpus-per-task has to be set again for srun
export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK
# set number of threads to requested cpus-per-task 
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun --mpi=pmi2 ./executable_hybrid.exe

Hybrid MPI/OpenMP with hyperthreading

In this example, the executable will be run using 2 MPI processes with 8 OpenMP threads each for a total job walltime of 6 hours. 8 cores are allocated in total and each OpenMP thread is running on one of its hardware threads.  The job can use up to 8*8000MB=64000MB of main memory.

OpenMP is not Slurm-aware, so you need to specify OMP_NUM_THREADS in your script. It should match the number of cores requested via --cpus-per-task.

For a more efficient computation, OpenMP threads should be pinned to the compute cores. This can be achieved by the following environment variables: OMP_PLACES=threads, OMP_PROC_BIND=true. For more information, see e.g. the HPC Wiki.

#!/bin/bash -l 
#
# start 2 MPI processes 
#SBATCH --ntasks=2
# requests 8 OpenMP threads per MPI task
#SBATCH --cpus-per-task=8
# allocate nodes for 6 hours 
#SBATCH --time=06:00:00 
# job name
#SBATCH --job-name=Testjob
# do not export environment variables 
#SBATCH --export=NONE 

# do not export environment variables 
unset SLURM_EXPORT_ENV 

# set number of threads to requested cpus-per-task
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
# cpus-per-task has to be set again for srun
export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK
srun --mpi=pmi2 ./executable_hybrid.exe

OpenMP Job without hyperthreading

In this example, the executable will be run using 6 OpenMP threads for a total job walltime of 6 hours. 6 cores are allocated in total and each OpenMP thread is running on a physical core.  The job can use up to 6*8000MB=48000MB of main memory.

OpenMP is not Slurm-aware, so you need to specify OMP_NUM_THREADS in your script. It should match the number of cores requested via --cpus-per-task.

For a more efficient computation, OpenMP threads should be pinned to the compute cores. This can be achieved by the following environment variables: OMP_PLACES=cores, OMP_PROC_BIND=true. For more information, see e.g. the HPC Wiki.

#!/bin/bash -l
#
# requests 6 OpenMP threads
#SBATCH --cpus-per-task=6
# do not use hyperthreads
#SBATCH --hint=nomultithread
# allocate nodes for 6 hours 
#SBATCH --time=06:00:00 
# job name
#SBATCH --job-name=Testjob
# do not export environment variables 
#SBATCH --export=NONE 

# do not export environment variables 
unset SLURM_EXPORT_ENV 

# set number of threads to requested cpus-per-task
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
# cpus-per-task has to be set again for srun
export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK
srun --mpi=pmi2 ./executable_hybrid.exe

OpenMP Job with hyperthreading

In this example, the executable will be run using 6 OpenMP threads for a total job walltime of 6 hours. 3 cores are allocated in total and each OpenMP thread is running on one of its hardware threads.  The job can use up to 3*8000MB=24000MB of main memory.

OpenMP is not Slurm-aware, so you need to specify OMP_NUM_THREADS in your script. It should match the number of cores requested via --cpus-per-task.

For a more efficient computation, OpenMP threads should be pinned to the compute cores. This can be achieved by the following environment variables: OMP_PLACES=threads, OMP_PROC_BIND=true. For more information, see e.g. the HPC Wiki.

#!/bin/bash -l 
#
# requests 6 OpenMP threads
#SBATCH --cpus-per-task=6
# allocate nodes for 6 hours 
#SBATCH --time=06:00:00 
# job name
#SBATCH --job-name=Testjob
# do not export environment variables 
#SBATCH --export=NONE 

# do not export environment variables 
unset SLURM_EXPORT_ENV 

# set number of threads to requested cpus-per-task
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
# cpus-per-task has to be set again for srun
export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK
srun --mpi=pmi2 ./executable_hybrid.exe

 

Interactive Slurm Shell

To generate an interactive Slurm shell on one of the compute nodes, the following command has to be issued on the frontend:
salloc --cpus-per-task=10 --time=00:30:00
This will give you an interactive shell for 30 minutes on one of the nodes, allocating 10 physical cores and 80000 MB of memory. There you can, for example, compile your code or do test runs of your binary. For MPI-parallel binaries, use srun instead of mpirun.

Please note that salloc automatically exports the environment of your shell on the login node to your interactive job. This can cause problems if you have loaded any modules, due to version differences between the frontend and the TinyFat compute nodes. To mitigate this, purge all loaded modules via module purge before issuing the salloc command.
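
A typical sequence on the frontend could therefore look like this (the requested resources are just the example values from above):

module purge
salloc --cpus-per-task=10 --time=00:30:00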

Software

Software compiled specifically for Intel processors might not run on tf060-tf095, since these nodes have AMD processors.
