13. March 2025

Downtime of all our clusters on Monday, 17.03.

There will be a scheduled downtime of all the HPC systems of NHR@FAU starting on

Monday, 17.03. at 9:00 and expected to last until about 17:00.

As usual, jobs that would collide with the downtime will automatically be postponed until the downtime is over. Frontends and filesystems will be available most of the time, but there will be short interruptions on all clusters, and longer interruptions on Fritz and Alex.

The reason for the downtime is general hardware and software maintenance.

We will keep this post updated with progress reports.

Update 17.03. 15:15: Batch processing has been resumed on Woody.

Update 17.03. 16:00: Batch processing has been resumed on Alex.

Update 17.03. 16:40: Batch processing has been resumed on Fritz.

Update 17.03. 16:50: Problems with atuin not behaving properly after the maintenance were reported and have been fixed. However, jobs that started between 15:15 and 16:50 may have experienced problems accessing /home/atuin (which is $WORK for some users).
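To check whether one of your jobs falls into that window, a small script along the following lines may help. It is only a sketch, not an official tool: it assumes you have already exported your jobs as `JobID|Start` lines with ISO timestamps (for example from Slurm's accounting output), and the helper names `started_in_window` and `affected_jobs` as well as the job IDs in the usage note are made up for illustration.

```python
from datetime import datetime

# Window taken from the 16:50 update; the year comes from the post date.
WINDOW_START = datetime(2025, 3, 17, 15, 15)
WINDOW_END = datetime(2025, 3, 17, 16, 50)


def started_in_window(start: str) -> bool:
    """Return True if an ISO timestamp such as '2025-03-17T15:42:10' lies in the window."""
    return WINDOW_START <= datetime.fromisoformat(start) <= WINDOW_END


def affected_jobs(lines):
    """Yield job IDs from 'JobID|Start' lines whose start time lies in the window."""
    for line in lines:
        job_id, _, start = line.strip().partition("|")
        try:
            if started_in_window(start):
                yield job_id
        except ValueError:
            # Skip entries without a parseable start time (e.g. 'Unknown').
            continue
```

For example, `list(affected_jobs(["1|2025-03-17T15:42:10", "2|2025-03-17T14:00:00"]))` yields only job `1`. Jobs flagged this way did not necessarily fail; they are merely candidates worth checking for I/O errors on /home/atuin.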

Update 17.03. 16:55: Batch processing has been resumed on TinyGPU (except for the dedicated tg10x nodes).

Update 17.03. 18:30: Batch processing has been resumed on part of TinyFat. The rest will have to wait until tomorrow.

Update 17.03. 18:30: Batch processing on Meggie will probably not be resumed before tomorrow due to issues with Slurm.

Update 17.03. 20:30: Batch processing on Meggie has been resumed.

Update 18.03. 09:40: Batch processing has been resumed on all of TinyFat.

Update 18.03. 09:40: We’re finally done with the maintenance.

Last update: 2025-03-18 - 10:49 AM