VASP
Vienna Ab initio Simulation Package (VASP) is a computer program for atomic scale materials modeling, e.g. electronic structure calculations and quantum-mechanical molecular dynamics, from first principles.
Availability / Target HPC systems
VASP requires an individual license.
Notes
- Parallelization and optimal performance:
  - (try to) always use full nodes (PPN=20 on Meggie)
  - NCORE=5 or NCORE=10 together with PPN=20 gives optimal performance in almost all cases; in general, NCORE should be a divisor of PPN (see the INCAR sketch after these notes)
  - OpenMP parallelization is intended to supersede NCORE
  - use KPAR if possible
- Compilation:
  - use -Davoidalloc
  - use the Intel toolchain and MKL
  - for very large jobs with high memory requirements, add '-heap-arrays 64' to the Fortran flags before compilation (only possible with Intel ifort); see the makefile.include sketch after these notes
- Filesystems:
  - Occasionally, VASP users reported failing I/O on Meggie's $FASTTMP (/lxfs); this might be a problem with Lustre and Fortran I/O. Please try the fix described here: https://github.com/RRZE-HPC/getcwd-autoretry-preload
  - Since VASP does not do parallel MPI I/O, $WORK is more appropriate than $FASTTMP
  - For medium-sized jobs, even node-local /dev/shm/ might be an option
- Walltime limit:
  - VASP can only be stopped gracefully by creating the file "STOPCAR" (https://www.vasp.at/wiki/index.php/STOPCAR); its automatic creation is shown in the example scripts below
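As a concrete illustration of the parallelization notes above, a minimal INCAR fragment for a full-node Meggie job (PPN=20) might look as follows; the values are examples only, and a sensible KPAR depends on the number of k-points in your job:

# INCAR fragment -- parallelization settings only (example values)
NCORE = 5      # should be a divisor of PPN; NCORE=5 or 10 works well with PPN=20
KPAR  = 4      # distribute k-points over 4 groups; only useful if the job has at least 4 k-points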
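For the compilation notes, the fragment below sketches where the recommended options would go in a makefile.include based on the Intel templates shipped with VASP; variable names and the remaining options follow those templates and may differ for your version, so treat this as an orientation rather than a complete file:

# makefile.include excerpt (Intel toolchain) -- illustrative, adapt the arch template of your VASP version
CPP_OPTIONS = -DMPI -Davoidalloc                       # add -Davoidalloc to the template's existing precompiler options
FC          = mpiifort
FFLAGS      = -assume byterecl -w -heap-arrays 64      # append -heap-arrays 64 only for very large, memory-hungry jobs (Intel ifort)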
Sample job scripts
Parallel Intel MPI job on Meggie
#!/bin/bash -l
#
#SBATCH --nodes=4
#SBATCH --tasks-per-node=20
#SBATCH --time=24:00:00
#SBATCH --job-name=my-vasp
#SBATCH --mail-user=my.mail
#SBATCH --mail-type=ALL
#SBATCH --export=NONE
unset SLURM_EXPORT_ENV

# enter submit directory
cd $SLURM_SUBMIT_DIR

# load modules
module load intel
module load intelmpi
module load mkl

# set PPN and pinning
export PPN=20
export I_MPI_PIN=enable

# define executable
VASP=/path-to-your-vasp-installation/vasp

# create STOPCAR with LSTOP 1800 s before reaching the walltime limit
# (LSTOP = .TRUE. lets VASP finish the current ionic step before stopping)
lstop=1800

# create STOPCAR with LABORT 600 s before reaching the walltime limit
# (LABORT = .TRUE. makes VASP stop after the current electronic step)
labort=600

# automatically detect how much time this batch job requested and adjust the
# sleep accordingly
TIMELEFT=$(squeue -j $SLURM_JOBID -o %L -h)
HHMMSS=${TIMELEFT#*-}
[ "$HHMMSS" != "$TIMELEFT" ] && DAYS=${TIMELEFT%-*}
IFS=: read -r HH MM SS <<< "$HHMMSS"
[ -z "$SS" ] && { SS=$MM; MM=$HH; HH=0 ; }
[ -z "$SS" ] && { SS=$MM; MM=0; }

# timer for LSTOP = .TRUE.
SLEEPTIME1=$(( ( ( ${DAYS:-0} * 24 + 10#${HH} ) * 60 + 10#${MM} ) * 60 + 10#$SS - $lstop ))
echo "Available runtime: ${DAYS:-0}-${HH:-0}:${MM:-0}:${SS}, sleeping for up to $SLEEPTIME1, thus reserving $lstop for clean stopping/saving results"

# timer for LABORT = .TRUE.
SLEEPTIME2=$(( ( ( ${DAYS:-0} * 24 + 10#${HH} ) * 60 + 10#${MM} ) * 60 + 10#$SS - $labort ))
echo "Available runtime: ${DAYS:-0}-${HH:-0}:${MM:-0}:${SS}, sleeping for up to $SLEEPTIME2, thus reserving $labort for clean stopping/saving results"

(sleep ${SLEEPTIME1} ; echo "LSTOP = .TRUE." > STOPCAR) &
lstoppid=$!
(sleep ${SLEEPTIME2} ; echo "LABORT = .TRUE." > STOPCAR) &
labortpid=$!
mpirun -ppn $PPN $VASP
pkill -P $lstoppid
pkill -P $labortpid
Hybrid OpenMP/MPI job (multi-node) on Fritz
#!/bin/bash -l
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=18
#SBATCH --partition=multinode
#SBATCH --time=01:00:00
#SBATCH --export=NONE
unset SLURM_EXPORT_ENV
module load vasp6/6.3.2-hybrid-intel-impi-AVX512
# set number of threads to requested cpus-per-task
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK
export OMP_PLACES=cores
export OMP_PROC_BIND=true
srun /apps/vasp6/6.3.2-hybrid-intel-AVX512/bin/vasp_std >output_filename
Performance tests for VASP-6 on Fritz
The calculations were performed with the binary from the module vasp6/6.3.2-hybrid-intel-impi-AVX512 for the rocksalt ground-state structure of sodium chloride, downloaded from The Materials Project. To enforce the same number of SCF iterations in every run and still ensure convergence, both of which affect the measured timings, we set NELMIN=26 and NELM=26.
- System I:
  - Single point calculations with PBE exchange-correlation functional
  - Supercell containing 64 atoms
  - 2x2x2 k-points
  - ALGO=FAST, ENCUT=500, PREC=High, LREAL=Auto, LPLANE=True, NCORE=4, KPAR=4
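For reference, an INCAR corresponding to the System I settings listed above, together with the fixed SCF iteration count mentioned earlier, could look like this; the structure and the 2x2x2 k-mesh are supplied via POSCAR and KPOINTS, which are not shown:

# INCAR for the System I benchmark (settings as listed above)
ALGO   = Fast
ENCUT  = 500
PREC   = High
LREAL  = Auto
LPLANE = .TRUE.
NCORE  = 4
KPAR   = 4
NELMIN = 26
NELM   = 26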
Per-node speedup is defined as the reference time divided by the product of the run time and the number of nodes, i.e. T_ref / (T * nodes), where T_ref is the time of the calculation on a single node with MPI only; higher is better.
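For example, with purely hypothetical timings: if the single-node MPI reference takes T_ref = 1000 s and a run on 4 nodes takes T = 300 s, the per-node speedup is 1000 / (300 * 4) ≈ 0.83, i.e. the 4-node run is faster in absolute terms but uses each node less efficiently than the reference.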
- System II:
  - Single point calculations with PBE exchange-correlation functional
  - Supercell containing 512 atoms
  - No k-points
  - ALGO=FAST, ENCUT=500, PREC=High, LREAL=Auto, LPLANE=True, NCORE=4
Per-node speedup is defined as for System I; higher is better.
- System III:
  - Single point calculations with HSE06 exchange-correlation functional
  - Supercell containing 64 atoms
  - 2x2x2 k-points
  - ALGO=Damped, TIME=0.4, ENCUT=500, PREC=High, LREAL=Auto, LPLANE=True, NCORE=4, KPAR=4
  - Please note that in hybrid OpenMP/MPI runs of VASP with HSE06, the default OpenMP stack size is insufficient and must be increased explicitly, otherwise the run may crash. The calculations in this section were run with "export OMP_STACKSIZE=500m" added to the submit script.
Per-node speedup is defined as for System I; higher is better.
Further information
Mentors
- Dr. A. Ghasemi, NHR@FAU, hpc-support@fau.de
- T. Klöffel, RRZE, hpc-support@fau.de
- AG A. Görling (Chair of Theoretical Chemistry)