Systems

Today at about 10:45 a.m., a large-area power outage in Erlangen has brought all our clusters down. All running jobs have been terminated. Some frontends have been switched off manually to lower the load on the uninterruptible power supply infrastructure. We are working to get the systems running...

Category: General, HPC, Systems

At around 12:30, a big water pipe burst in the basement of the building that houses the w2xxx nodes. As a result, the cooling water that used to be in the pipes formed a pretty little swimming pool in the basement, and the cooling for the w2xxx nodes (i.e. the nodes with 32 Intel IceLak...

Category: HPC, Systems

There will be scheduled downtimes of our HPC systems: Monday, March 11, from 06:50 to 16:00, affecting ONLY Fritz; Wednesday, March 20, from 00:00 to the end of the day, affecting ALL our clusters, including frontends and fileservers. For the March 20 downtime, the reason is general maintenan...

Category: HPC, Systems

On Friday, February 9th @ 19:30, one of the network file servers (NFS) stopped working. As a consequence, many login and compute nodes became unresponsive because a major file system ($WORK for NHR projects) could not be accessed. The NFS server (atuin) has been rebooted and operation seems to be stabl...

Category: HPC, Systems

Last night (Monday, August 28 @ 23:00), one of the network file servers (NFS) stopped working. As a consequence, many login and compute nodes became unresponsive because a major file system ($WORK for NHR projects) cannot be accessed. We are working on resolving the issue. Batch processing on Alex an...

Category: HPC, Systems