Woody(-old) throughput cluster (Tier3)
All nodes of this cluster were either shut down or moved to the new Woody cluster in September 2022.
The RRZE’s “Woody” is the preferred cluster for serial/single-node throughput jobs.
The cluster has changed significantly over time; see the History section below for details. Its final hardware configuration was as follows:
- 40 compute nodes (w10xx nodes) with Xeon E3-1280 CPUs (“SandyBridge”, 4 cores, HT disabled, 3.5 GHz base frequency; only AVX, no AVX2), 8 GB RAM, 500 GB HDD – from 12/2011; these nodes were shut down in October 2020
- 70 compute nodes (w11xx nodes) with Xeon E3-1240 v3 CPUs (“Haswell”, 4 cores, HT disabled, 3.4 GHz base frequency), 8 GB RAM, 1 TB HDD – from 09/2013; these nodes were shut down in September 2022
- 64 compute nodes (w12xx/w13xx nodes) with Xeon E3-1240 v5 CPUs (“Skylake”, 4 cores, HT disabled, 3.5 GHz base frequency), 32 GB RAM, 1 TB HDD – from 04/2016 and 01/2017 => now part of Woody-NG
- 112 compute nodes (w14xx/w15xx nodes) with Xeon E3-1240 v6 CPUs (“Kaby Lake”, 4 cores, HT disabled, 3.7 GHz base frequency), 32 GB RAM, 960 GB SSD – from Q3/2019 => now part of Woody-NG
Access, User Environment, and File Systems
Access to the machine
Access to the system is granted through the frontend nodes via ssh. Please connect to
woody.rrze.fau.de
and you will be randomly routed to one of the frontends. All systems in the cluster, including the frontends, have private IP addresses in the 10.188.82.0/23
range. Thus they can only be accessed directly from within the FAU networks. If you need access from outside of FAU, you have to connect, for example, to the dialog server cshpc.rrze.fau.de
first and then ssh to Woody from there. While it is possible to ssh directly to a compute node, a user is only allowed to do this when they have a batch job running there. When all batch jobs of a user on a node have ended, all of their shells will be killed automatically.
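For example, a direct login from within the FAU network and a login from outside via cshpc could look like this (replace the user name with your HPC account; the -J option assumes a reasonably recent OpenSSH client):

```bash
# From within the FAU network
ssh yourusername@woody.rrze.fau.de

# From outside FAU: use the dialog server cshpc.rrze.fau.de as a jump host
ssh -J yourusername@cshpc.rrze.fau.de yourusername@woody.rrze.fau.de
```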
The login and compute nodes run a 64-bit Ubuntu LTS version. As on most other RRZE HPC systems, a modules environment is provided to facilitate access to software packages. Type “module avail” to get a list of available packages.
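For example, loading the Intel compiler environment might look like this (module names can differ; check “module avail” for what is actually installed):

```bash
module avail          # list all available software packages
module load intel64   # load the Intel compiler module (also pulls in intelmpi)
module list           # show the modules currently loaded
```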
On September 5th, 2022, the login node woody3 was updated to Ubuntu 20.04, while the remaining w11xx nodes of Woody-PBS still run Ubuntu 18.04. woody3 thus now matches the OS of TinyGPU/Slurm and TinyFAT.
File Systems
The following table summarizes the available file systems and their features. Also check the description of the HPC file systems.
Mount point | Access via | Purpose | Technology, size | Backup | Data lifetime | Quota |
---|---|---|---|---|---|---|
/home/hpc | $HOME | Storage of source, input and important results | central servers, 5 TB | YES + Snapshots | Account lifetime | YES (very restrictive) |
/home/vault | $HPCVAULT | Mid- to long-term, high-quality storage | central servers | YES + Snapshots | Account lifetime | YES |
/home/{woody, saturn, titan, atiun} | $WORK | Storage for small files | NFS, in total ~2 PB | NO | Account lifetime | YES (user and/or group) |
/tmp | $TMPDIR | Temporary job data directory | Node-local, between 400 and 900 GB | NO | Job runtime | NO (but only limited capacity depending on the node) |
Node-local storage $TMPDIR
Each node has at least 400 GB of local hard drive capacity for temporary files available under /tmp
(also accessible via /scratch/
). All files in these directories will be deleted at the end of a job without any notification.
If possible, compute jobs should use the local disk for scratch space as this reduces the load on the central servers. In batch scripts the shell variable $TMPDIR
points to a node-local, job-exclusive directory whose lifetime is limited to the duration of the batch job. This directory exists on each node of a parallel job separately (it is not shared between the nodes). It will be deleted automatically when the job ends. Important data to be kept can be copied to a cluster-wide volume at the end of the job, even if the job is cancelled by a time limit. Please see the section on batch processing for examples on how to use $TMPDIR
.
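As a quick illustration (a sketch only; the file and program names as well as the requested walltime are placeholders), a Torque job script could stage data through $TMPDIR and save the results before the job ends:

```bash
#!/bin/bash
#PBS -l nodes=1:ppn=4,walltime=06:00:00

# Work in the node-local, job-exclusive scratch directory
cd "$TMPDIR"

# Stage input from the submit directory ($PBS_O_WORKDIR is set by Torque)
cp "$PBS_O_WORKDIR/input.dat" .

# Run the application (placeholder name) and write output locally
"$PBS_O_WORKDIR/my_application" input.dat > output.dat

# Copy important results back to a cluster-wide volume before the job ends
cp output.dat "$PBS_O_WORKDIR/"
```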
Batch Processing
All user jobs except short serial test runs must be submitted to the cluster by means of the Torque Resource Manager. The submitted jobs are routed into a number of queues (depending on the needed resources, e.g. runtime) and sorted according to a priority scheme. It is normally not necessary to explicitly specify the queue when submitting a job to the cluster; the sorting into the proper queue happens automatically.
Please see the batch system description for further details.
The following queues are available on this cluster. There is no need to specify a queue manually!
Queue | min – max walltime | Comments |
---|---|---|
route | N/A | Default router queue; sorts jobs into execution queues |
devel | 0 – 01:00:00 | Some nodes are reserved for this queue during working hours |
work | 01:00:01 – 24:00:00 | “Workhorse” |
onenode | 01:00:01 – 48:00:00 | Only very few jobs from this queue are allowed to run at the same time. |
Regular jobs are always required to request all CPUs in a node (ppn=4). Using fewer than 4 CPUs per node is only supported in the SandyBridge segment.
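A minimal submission requesting a full node could therefore look like this (the job script name is a placeholder; choose the walltime so that the job also finishes on the slowest node type):

```bash
qsub -l nodes=1:ppn=4,walltime=24:00:00 jobscript.sh
```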
If you submit jobs, then by default you can get any type of node: SandyBridge, Haswell, Skylake, or Kabylake based w1xxx-nodes. They all have the same number of cores (4) and minimum memory (at least 8 GB) per node, but the speed of the CPUs can be different, which means that job runtimes will vary. You will have to calculate the walltime you request from the batch system so that your jobs can finish even on the slowest nodes.
It is also possible to request certain kinds of nodes from the batch system. Besides the obvious use case of benchmarking, this has two major applications: jobs that use less than a full node are currently only allowed on the SandyBridge nodes, so you need to request those explicitly; and some applications can benefit from AVX2, which is not available on the SandyBridge-based nodes. Moreover, the Skylake- and Kabylake-based nodes have more memory (32 GB). You request a node property by adding it to your -l nodes=... request string, e.g. qsub -l nodes=1:ppn=4:hw. In general, the following node properties are available:
Property | Matching nodes (#) | Comments |
---|---|---|
(none specified) | w1xxx (286) | Can run on any node, i.e. all w1xxx nodes. |
:hw | w11xx (70) | Can run on the Haswell nodes only. |
:sl32g | w12xx/w13xx (64) | Can run on the Skylake nodes (32 GB RAM) only. |
:kl32g | w14xx/w15xx (112) | Can run on the Kabylake nodes (32 GB RAM) only. |
:any32g | w1[2-5]xx (176) | Can run on any node with 32 GB RAM (Skylake or Kabylake). |
:hdd900 | w1[1-5]xx (246) | Can run on any node with (at least) 900 GB scratch on HDD/SSD. |
Note: Many of the original properties are no longer supported.
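For example, to restrict a job to the 32 GB nodes via the any32g property listed above (script name and walltime are again placeholders):

```bash
qsub -l nodes=1:ppn=4:any32g,walltime=12:00:00 jobscript.sh
```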
OpenMP
The installed Intel compilers support at least the relevant parts of recent OpenMP standards. The compiler recognizes OpenMP directives if you supply the command line option -openmp or -qopenmp. This option is also required for the link step.
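A typical compile-and-run sequence could look like this (the source file name is a placeholder; icc is shown for C, the Fortran compilers accept the same flag):

```bash
module load intel64                 # provides the Intel compilers
icc -qopenmp -o omp_app omp_app.c   # compile and link with OpenMP enabled
export OMP_NUM_THREADS=4            # one thread per physical core on these 4-core nodes
./omp_app
```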
MPI
Although the cluster is basically able to support many different MPI versions, we maintain and recommend using Intel MPI. Intel MPI supports different compilers (GCC, Intel). If you use Intel compilers, the appropriate intelmpi module is loaded automatically upon loading the intel64 compiler module. The standard MPI wrapper scripts mpif77, mpif90, mpicc and mpicxx are then available. By loading an intelmpi/201X.XXX-gnu module instead of the default intelmpi, those scripts will use GCC instead.
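A sketch of building and running an MPI program with Intel MPI (source file name and process count are illustrative; since only single-node jobs are allowed, 4 processes fill one node):

```bash
module load intel64              # also loads the matching intelmpi module
mpicc -O2 -o mpi_app mpi_app.c   # MPI wrapper around the Intel C compiler

# Inside a batch job: start one MPI process per core of the node
mpirun -np 4 ./mpi_app
```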
Further Information
History
The cluster was originally delivered at the end of 2006 by the companies Bechtle and HP, with 180 compute nodes, each with two Xeon 5160 “Woodcrest” chips (4 cores) running at 3.0 GHz with 4 MB shared Level 2 cache per dual core, 8 GB of RAM, 160 GB of local scratch disk, and a half-DDR/half-SDR high-speed InfiniBand network. The cluster was expanded to 212 nodes within a year. However, those nodes were replaced over time and turned off one by one; none of them remain today. At that time it was the main cluster at RRZE, intended for distributed-memory (MPI) or hybrid parallel programs with medium to high communication requirements. It was also the first cluster at RRZE that employed a parallel file system (HP SFS), with a capacity of 15 TB and an aggregated parallel I/O bandwidth of more than 900 MB/s. That file system was retired in 2012.
The system entered the November 2006 Top500 list at rank 124 and was ranked number 329 in November 2007.
In 2012, 40 single-socket compute nodes with Intel Xeon E3-1280 processors (4-core “SandyBridge”, 3.5 GHz, 8 GB RAM and 400 GB of local scratch disk) were added (w10xx nodes). These nodes are only connected by GBit Ethernet; therefore, only single-node (or single-core) jobs are allowed in this segment. These nodes were shut down in October 2020.
In 2013, 72 single-socket compute nodes with Intel Xeon E3-1240 v3 processors (4-core “Haswell”, 3.4 GHz, 8 GB RAM and 900 GB of local scratch disk) were added (w11xx nodes). These nodes are only connected by GBit Ethernet; therefore, only single-node jobs are allowed in this segment. They replaced three racks full of old w0xxx nodes, providing significantly more compute power at a fraction of the power usage.
In 2016/2017, 64 single-socket compute nodes with Intel Xeon E3-1240 v5 processors (4-core “Skylake”, 3.5 GHz, 32 GB RAM and 900 GB of local scratch disk) were added (w12xx/w13xx nodes). Only single-node jobs are allowed in this segment.
In autumn 2019, 112 single-socket compute nodes with Intel Xeon E3-1240 v6 processors (4-core “Kabylake”, 3.7 GHz, 32 GB RAM and 900 GB of local scratch SSD) were added (w14xx/w15xx nodes). Only single-node jobs are allowed in this segment, too.
Although Woody was originally designed for running parallel programs using significantly more than one node, its communication network is weak compared to our other clusters and today’s standards. It is therefore now intended for running single-node jobs. In other words, you cannot reserve single cores; the minimum allocation is one node. As an exception, single cores can be requested in the w10xx segment.