Outage of HPC services due to file system issues [SOLVED]
Last night (Monday, August 28 @ 23:00) one of the network file servers (NFS) stopped working. As a consequence, many login and compute nodes have become unresponsive because a major file system ($WORK for NHR projects) cannot be accessed.
We are working on resolving the issue.
Batch processing on Alex and Fritz was halted this morning (Tuesday, August 29 @ 07:30). Jobs that were already running at 23:00 yesterday, or that started after that time, may be affected.
UPDATE: Batch processing on Alex and Fritz resumed at 08:45.