Continuous Integration / Gitlab Cx

HPC4FAU and NHR@FAU are happy to provide continuous integration for HPC-related software projects developed on one of the Gitlab instances at RRZE (gitlab.rrze.fau.de or gitos.rrze.fau.de). Access to the Gitlab Runner is restricted. Moreover, every job on the HPC systems has to be associated with an HPC user account.

The Cx jobs run on the Testcluster provided by HPC4FAU and NHR@FAU.

Prerequisites:

  1. Valid HPC account at HPC4FAU and NHR@FAU (Getting started guide)
  2. SSH key pair for authentication of the Gitlab Runner. General information about SSH access is provided here. We recommend creating a separate SSH key pair without a passphrase for Gitlab CI only, e.g. by running ssh-keygen -t ed25519 -f id_ssh_ed25519_gitlab, which generates id_ssh_ed25519_gitlab and id_ssh_ed25519_gitlab.pub.
  3. Request Cx usage by e-mail to the HPC user support (hpc-support@fau.de), including:
    • your HPC account name
    • the URL to the repository
    • the public key (like id_ssh_ed25519_gitlab.pub)

Preparing Gitlab repositories:

  1. Configure SSH authentication for the HPC Cx service. In the repository, go to Settings -> CI/CD -> Variables and add two variables:
    1. AUTH_USER: The name of your HPC account.
    2. AUTH_KEY: The content of the private SSH key file (like id_ssh_ed25519_gitlab). The key is not shown in the logs but is visible to all maintainers of the project!
  2. Enable the HPC runner for the repository at Settings -> CI/CD -> Runners and flip the switch Enable shared runners for this project. The HPC runner has the testcluster tag.

Define jobs using the HPC Cx service

Jobs for CI/CD in Gitlab are defined in the file .gitlab-ci.yml in the top level of the repository. In order to run on the HPC system, the jobs need the tag testcluster. The tag tells the system on which runner the job can be executed.

job:
  tags:
    - testcluster
  [...]

To define where and how the job is run, the following variables are available:

Variable          Default    Changeable                                    Description
SLURM_PARTITION   work       NO                                            Set of nodes used for the job. Cx jobs are currently only allowed in the work partition.
SLURM_NODES       1          NO                                            Only single-node jobs are allowed at the moment.
SLURM_TIMELIMIT   120        YES (values 1-120 allowed)                    Maximum runtime of the job in minutes.
SLURM_NODELIST    phinally   YES (any hostname in the system, see here)    Host on which the job runs.

You only need to specify a host in SLURM_NODELIST if you want to test different architecture-specific build options or optimizations.

To change one of these settings for all jobs, set it globally in the variables section:

variables:
  SLURM_TIMELIMIT: 60
  SLURM_NODELIST: rome1

job1:
  [...]
  tags:
    - testcluster

job2:
  [...]
  tags:
    - testcluster

The options can also be specified for each job individually; per-job settings override the global ones.

job:
  [...]
  variables:
    SLURM_NODELIST: rome1
  tags:
    - testcluster

The Cx system uses the salloc command to submit the jobs to the batch system. All environment variables supported by salloc can be used here; an example is SLURM_MAIL_USER to get notified by the batch system.
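
A minimal sketch of setting such a variable for a single job; the e-mail address below is a placeholder and should be replaced with your own:

job:
  [...]
  variables:
    SLURM_MAIL_USER: user@example.org
  tags:
    - testcluster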

If you want to run on the frontend node testfront instead of a compute node, set the variable NO_SLURM_SUBMIT: 1. This is usually not what you want!
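
A minimal sketch of a job that skips the batch system and runs directly on the frontend node (the job name and script line are illustrative):

prepare:
  variables:
    NO_SLURM_SUBMIT: 1
  script:
    - echo "Running on the frontend node testfront"
  tags:
    - testcluster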

If the requested node is occupied by other jobs for more than 24 hours, your CI job may fail. In that case, simply restart the CI job.

Examples:

A basic build-and-test pipeline; the test job lowers the time limit to 30 minutes:

stages:
  - build
  - test

build:
  stage: build
  script:
    - export NUM_CORES=$(nproc --all)
    - mkdir $CI_PROJECT_DIR/build
    - cd $CI_PROJECT_DIR/build
    - cmake ..
    - make -j $NUM_CORES
  tags:
    - testcluster
  artifacts:
    paths:
      - build

test:
  stage: test
  variables:
    SLURM_TIMELIMIT: 30
  script:
    - cd $CI_PROJECT_DIR/build
    - ./test
  tags:
    - testcluster

A pipeline that globally requests nodes with the hwperf constraint and runs a preparation stage directly on the frontend node:

variables:
  SLURM_CONSTRAINT: "hwperf"

stages:
  - prepare
  - build
  - test

prepare:
  stage: prepare
  script:
    - echo "Preparing on frontend node..."
  variables:
    NO_SLURM_SUBMIT: 1
  tags:
    - testcluster

build:
  stage: build
  script:
    - export NUM_CORES=$(nproc --all)
    - mkdir $CI_PROJECT_DIR/build
    - cd $CI_PROJECT_DIR/build
    - cmake ..
    - make -j $NUM_CORES
  tags:
    - testcluster
  artifacts:
    paths:
      - build

test:
  stage: test
  variables:
    SLURM_TIMELIMIT: 30
  script:
    - cd $CI_PROJECT_DIR/build
    - ./test
  tags:
    - testcluster

A pipeline that pins all jobs to the node broadep2 and sets a global time limit of 10 minutes, which the test job raises to 30 minutes:

variables:
  SLURM_NODELIST: broadep2
  SLURM_TIMELIMIT: 10

stages:
  - build
  - test

build:
  stage: build
  script:
    - export NUM_CORES=$(nproc --all)
    - mkdir $CI_PROJECT_DIR/build
    - cd $CI_PROJECT_DIR/build
    - cmake ..
    - make -j $NUM_CORES
  tags:
    - testcluster
  artifacts:
    paths:
      - build

test:
  stage: test
  variables:
    SLURM_TIMELIMIT: 30
  script:
    - cd $CI_PROJECT_DIR/build
    - ./test
  tags:
    - testcluster

A pipeline that defines hidden template jobs (.build, .benchmark) and extends them to build and benchmark on two different nodes:

stages:
  - build
  - benchmark

.build:
  stage: build
  script:
    - export NUM_CORES=$(nproc --all)
    - mkdir $CI_PROJECT_DIR/build
    - cd $CI_PROJECT_DIR/build
    - cmake ..
    - make -j $NUM_CORES
  tags:
    - testcluster
  variables:
    SLURM_TIMELIMIT: 10
  artifacts:
    paths:
      - build

.benchmark:
  stage: benchmark
  variables:
    SLURM_TIMELIMIT: 20
  script:
    - cd $CI_PROJECT_DIR/build
    - ./benchmark
  tags:
    - testcluster

# broadep2

build-broadep2:
  extends: .build
  variables:
    SLURM_NODELIST: broadep2

benchmark-broadep2:
  extends: .benchmark
  dependencies:
    - build-broadep2
  variables:
    SLURM_NODELIST: broadep2

# naples1

build-naples1:
  extends: .build
  variables:
    SLURM_NODELIST: naples1

benchmark-naples1:
  extends: .benchmark
  dependencies:
    - build-naples1
  variables:
    SLURM_NODELIST: naples1

 

Disclaimer

Be aware that

  • the private SSH key is visible to all maintainers of your project. It is best to have only a single maintainer and make all other members developers.
  • the CI jobs can access data ($HOME, $WORK, …) of the CI user.
  • BIOS and OS settings of Testcluster nodes can change without notification.

 

Mentors

  • T. Gruber, RRZE/NHR@FAU, hpc-support@fau.de
  • L. Werner, Chair of Computer Science 10, Chair of System Simulation
  • Prof. Dr. Harald Köstler (NHR@FAU and Chair of System Simulation)