Job monitoring with ClusterCockpit


Introduction

System monitoring of cluster systems is a crucial task for system administrators, but the users of these systems are also interested in part of the collected metrics. In addition, users care about metrics such as floating-point performance or memory bandwidth. NHR@FAU has provided job-specific monitoring for its clusters for quite some time, but with the installation of the Fritz and Alex clusters, the whole system has been rebuilt. Development is led by NHR@FAU, and other NHR centers contribute by enhancing or simply using (and thereby testing) the framework. The whole stack is called ClusterCockpit and consists of multiple components (a conceptual sketch of their interplay follows the list):

  1. Node agent on each compute node: cc-metric-collector
  2. In-memory short-term and file-based long-term storage: cc-metric-store
  3. Webfrontend with authentication for all users: cc-backend
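These components form a simple pipeline: the node agent samples metrics on each compute node and forwards them to the metric store, and the web frontend reads them back to render the plots described below. The following toy Python sketch only illustrates that data flow; the real ClusterCockpit components are separate services with their own configuration, metric names, and network protocols, so none of the identifiers used here reflect the actual implementation.

    import time
    from collections import defaultdict

    # Toy stand-in for cc-metric-store: samples keyed by (node, metric name).
    metric_store = defaultdict(list)

    def collect_node_metrics(node):
        # Toy stand-in for cc-metric-collector: sample a few metrics on one node.
        # Metric names and values are made up for illustration.
        return {"flops_any": 123.4, "mem_bw": 56.7, "cpu_load": 0.93}

    def push_metrics(node, samples):
        # Forward one round of samples to the (toy) metric store.
        now = time.time()
        for name, value in samples.items():
            metric_store[(node, name)].append((now, value))

    def job_timeseries(nodes, metric):
        # Roughly what the web frontend needs for a job plot: the time series
        # of one metric for every node that belongs to the job.
        return {node: metric_store[(node, metric)] for node in nodes}

    # One collection cycle for a two-node job.
    for node in ("node01", "node02"):
        push_metrics(node, collect_node_metrics(node))
    print(job_timeseries(("node01", "node02"), "mem_bw"))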

Setup at NHR@FAU

The main point of access for users is monitoring.nhr.fau.de. Authentication requires your HPC account, not your IDM account.

Integrated clusters:

  • Fritz
  • Alex
  • Woody
  • Meggie

Scope of user accounts

Users can only see their own jobs in the monitoring.

Different Views

ClusterCockpit provides different views of the systems depending on the scope of your account.

Job list

The job list contains all currently running jobs, with job information such as the requested resources and a limited set of plots that give a first impression of a job's quality.

Clicking on the job ID on the left opens the job-specific page with more information and plots.

It takes a few minutes after a job has started before it appears in the list of running jobs.

User section

In the user section, each user can check the history of their jobs, including some statistics.

Tag section

Users can enrich the information of a job with tags, i.e., key/value pairs describing the job. In the tag section, you can select tags and get a list of all jobs that carry the selected tags.
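As a purely illustrative Python sketch, a tag can be thought of as a small key/value record attached to a job, and selecting by tag as a filter over the tagged jobs; the actual field names and tag handling are defined by ClusterCockpit itself, so the job IDs and tag keys below are made up.

    # Hypothetical tag records attached to hypothetical job IDs.
    tagged_jobs = {
        "1234567": [{"key": "application", "value": "gromacs"}],
        "1234568": [{"key": "testcase", "value": "benchmark"}],
    }

    def jobs_with_tag(key, value):
        # Return the IDs of all jobs that carry the requested tag.
        return [
            job_id
            for job_id, tags in tagged_jobs.items()
            if {"key": key, "value": value} in tags
        ]

    print(jobs_with_tag("application", "gromacs"))  # ['1234567']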

Reporting problems with ClusterCockpit

If you have problems with the setup at NHR@FAU, please contact the HPC support at hpc-support@fau.de.

For general questions about ClusterCockpit and its development, there are two separate Matrix chats:

  • ClusterCockpit General
  • ClusterCockpit Develop

Dr. Jan Eitzinger

Head of Software & Tools

Erlangen National High Performance Computing Center
Software & Tools

  • Phone number: +49 9131 85-28911
  • Email: jan.eitzinger@fau.de

Thomas Gruber

Development of LIKWID, ClusterCockpit & MachineState

Erlangen National High Performance Computing Center
Software & Tools Division

  • Phone number: +49 9131 85-28911
  • Email: thomas.gruber@fau.de