Systems

At around 12:30, a big water pipe burst in the basement of the building where the w2xxx nodes are housed. As a result, the cooling water that used to be in the pipes formed a pretty little swimming pool in the basement, and the cooling for the w2xxx nodes (i.e. the nodes with 32 Intel IceLak...

Category: HPC, Systems

There will be scheduled downtimes of our HPC systems: Monday, March 11, from 06:50 to 16:00, affecting ONLY Fritz; and Wednesday, March 20, from 00:00 to the end of the day, affecting ALL our clusters, including frontends and fileservers. For the March 20 downtime: Reason is general maintenan...

Category: HPC, Systems

On Friday, February 9th @ 19:30, one of the network file servers (NFS) stopped working. As a consequence, many login and compute nodes became unresponsive because a major file system ($WORK for NHR projects) could not be accessed. The NFS server (atuin) has been rebooted and operation seems to be stabl...

Category: HPC, Systems

Last night (Monday, August 28 @ 23:00) one of the network file servers (NFS) stopped working. As a consequence, many login and compute nodes have become unresponsive because a major file system ($WORK for NHR projects) cannot be accessed. We are working on resolving the issue. Batch processing on Alex an...

Category: HPC, Systems

As announced at the HPC Cafe in January as a plan for the summer, we are now reinstalling the RTX2080Ti and V100 nodes in TinyGPU with Ubuntu 20.04 (instead of Ubuntu 18.04) and integrating them into the Slurm batch system of the RTX3080/A100 GPU nodes. The first RTX2080Ti and V100 nodes have already been reinstalled and moved to Slurm in the past few days. The remaining nodes will follow gradually until the end of October to allow a smooth transition.

Category: HPC, Systems