Unscheduled downtime of most of our systems from Friday, Oct. 10 until Oct. 16
Due to complications encountered during maintenance work on the cooling infrastructure, the cooling is currently operating at severely reduced capacity.
To ensure cooling availability for critical infrastructure, we have to shut down most of the clusters. Jobs that are already running will probably be able to finish, but no new jobs will be started.
Affected clusters are:
- parts of Alex
- Meggie
- large parts of TinyGPU (the RTX2080Ti nodes are available)
- TinyFat
- Testcluster
- small parts of Woody
There is more bad news: The planned repair date is currently Thursday, October 16!
Moreover, Fritz & Helma are also down due to scheduled infrastructure work, cf. https://hpc.fau.de/2025/10/06/scheduled-downtime-of-fritz-and-helma-clusters-from-october-06/
Update 2025-10-11: 1/4 of Alex is back
Update 2025-10-14: 1/2 of Alex is back
Update 2025-10-16 15:30: Cooling is back at full capacity. We will now power up the clusters.
Update 2025-10-16 15:45: Alex and Woody are back.
Update 2025-10-16 17:00: Meggie, parts of TinyFat (all but the 256 GB Broadwell nodes), parts of TinyGPU (only the RTX3080 and A100 nodes and some Jupyter nodes; the RTX2080Ti and V100 nodes are not available), and Testcluster are back.
Update 2025-10-17 08:15: RTX2080Ti nodes in TinyGPU are back
Update 2025-10-17 18:00: V100 nodes in TinyGPU and Alex are back
Update 2025-10-19 15:00: 256 GB Broadwell nodes of TinyFat are back

