GROMACS

GROMACS (GROningen MAchine for Chemical Simulations) is a molecular dynamics package primarily designed for simulations of proteins, lipids and nucleic acids.

Availability / Target HPC systems

  • TinyGPU: best value if only one GPU is used per run – use a recent version of GROMACS, as newer releases offload more and more work to the GPU
  • parallel computers: experiment to find a proper setting for -npme
  • throughput cluster Woody: best suited for small systems

New versions of GROMACS are installed by RRZE upon request.

Notes

GROMACS can produce large amounts of data in small increments:

  • Try to reduce the frequency and amount of data as much as possible.
  • It may also be useful to stage the generated output in the node’s RAMdisk (i.e. in the directory /dev/shm/) first and copy it back, e.g. to $WORK, only once just before the job ends.
  • The high output frequency of small amounts of data is NOT suitable for $FASTTMP.
  • For serial and single-node simulations you have to use gmx mdrun;
    for multi-node simulations, the binary to use with mpirun is mdrun_mpi or mdrun_mpi+OMP. See the sample scripts below!
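The RAMdisk staging mentioned above can be sketched as follows. This is only an illustration: the file names, the my-gmx-results directory, and the placeholder input are assumptions, not part of any cluster setup.

```shell
#!/bin/bash
# Sketch of staging GROMACS output in the node's RAMdisk (/dev/shm).
# All file and directory names here are illustrative placeholders.
: > my.tpr                          # stands in for your real run input
WORK=${WORK:-$PWD}                  # on the clusters, $WORK is already set
JOBTMP=/dev/shm/gmx-job-$$          # per-job directory in the RAMdisk
mkdir -p "$JOBTMP"
cp my.tpr "$JOBTMP"                 # stage the input into the RAMdisk
cd "$JOBTMP"

# ... gmx mdrun -maxh 10 -s my.tpr ...   (all output lands in the RAMdisk)

# copy the results back once, just before the job ends
mkdir -p "$WORK/my-gmx-results"
cp -r "$JOBTMP"/. "$WORK/my-gmx-results/"
cd "$WORK" && rm -rf "$JOBTMP"
```

This keeps the many small writes off the parallel file system; only one bulk copy hits $WORK.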

Sample job scripts

#!/bin/bash -l
#PBS -lnodes=1:ppn=4,walltime=10:00:00
#PBS -N my-gmx
#PBS -j eo

cd $PBS_O_WORKDIR

module load gromacs/2019.3-mkl

### the argument of -maxh should match the requested walltime!
gmx mdrun -maxh 10 -s my.tpr

### try automatic restart (adapt the conditions to fit your needs)
if [ -f confout.gro ]; then
   echo "*** confout.gro found; no re-submit required"
   exit
fi
if [ $SECONDS -lt 1800 ]; then
   echo "*** no automatic restart as runtime of the present job was too short"
   exit
fi
qsub $0

#!/bin/bash -l
#PBS -lnodes=4:ppn=40,walltime=10:00:00
#PBS -N my-gmx
#PBS -j eo

cd $PBS_O_WORKDIR

module load gromacs/2019.3-mkl-IVB

### 1) The argument of -maxh should match the requested walltime!
### 2) Performance can often be improved by specifying a proper number of PME tasks via -npme #;
###    experiment or use gmx tune_pme to find the optimal value.
###    Using the SMT threads can sometimes be beneficial but requires testing.
/apps/rrze/bin/mpirun [-pinexpr S0:0-19@S1:0-19] mdrun_mpi [-npme #] -maxh 10 -s my.tpr

### try automatic restart (adapt the conditions to fit your needs)
if [ -f confout.gro ]; then
   echo "*** confout.gro found; no re-submit required"
   exit
fi
if [ $SECONDS -lt 1800 ]; then
   echo "*** no automatic restart as runtime of the present job was too short"
   exit
fi
qsub $0

#!/bin/bash -l
#PBS -lnodes=1:ppn=4,walltime=10:00:00
#PBS -N my-gmx
#PBS -j eo

cd $PBS_O_WORKDIR

module load gromacs/2019.3-mkl-CUDA101

### 1) the argument of -maxh should match the requested walltime!
### 2) optional arguments are: -pme gpu -npme 1
###                            -bonded gpu
gmx mdrun -maxh 10 -s my.tpr

### try automatic restart (adapt the conditions to fit your needs)
if [ -f confout.gro ]; then
   echo "*** confout.gro found; no re-submit required"
   exit
fi
if [ $SECONDS -lt 1800 ]; then
   echo "*** no automatic restart as runtime of the present job was too short"
   exit
fi
qsub $0

The performance benefit of using multiple GPUs is often very low! You get much better throughput by running multiple independent jobs, each on a single GPU, as shown above.

If you request specific GPU types and their nodes support SMT (currently the case for the v100 and rtx2080ti nodes), request ppn=16:smt, as SMT typically gives a small performance boost.

Even when using multiple GPUs, do not use the MPI-parallel version (mdrun_mpi) but the thread-MPI version (gmx mdrun) of GROMACS. -ntmpi # should usually match the number of available GPUs.
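One way to derive -ntmpi from the hardware is sketched below. Using nvidia-smi for this is an assumption about the node environment, and the -ntomp value is illustrative; hard-coding the values as in the sample scripts is equally valid.

```shell
#!/bin/bash
# Sketch: set -ntmpi to the number of visible GPUs (one thread-MPI rank
# per GPU). Assumes nvidia-smi is available; falls back to 1 otherwise.
NGPU=$(nvidia-smi -L 2>/dev/null | grep -c '^GPU')
[ "$NGPU" -ge 1 ] 2>/dev/null || NGPU=1     # fallback when no GPU is visible

# -ntomp (threads per rank) should then be chosen so that
# ntmpi * ntomp matches the cores requested via ppn
echo "gmx mdrun -ntmpi $NGPU -ntomp 4 -maxh 10 -s my.tpr"
```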

#!/bin/bash -l
#PBS -lnodes=1:ppn=16,walltime=10:00:00
#PBS -N my-gmx
#PBS -j eo

cd $PBS_O_WORKDIR

module load gromacs/2019.3-mkl-CUDA101

### 1) The argument of -maxh should match the requested walltime!
### 2) Typical optional arguments are: -pme gpu -npme 1
###                                    -bonded gpu
gmx mdrun -ntmpi 4 -ntomp 4 -maxh 10 -s my.tpr
### if :smt is requested, the following line should typically be used
# gmx mdrun -ntmpi 4 -ntomp 8 -maxh 10 -s my.tpr

### try automatic restart (adapt the conditions to fit your needs)
if [ -f confout.gro ]; then
   echo "*** confout.gro found; no re-submit required"
   exit
fi
if [ $SECONDS -lt 1800 ]; then
   echo "*** no automatic restart as runtime of the present job was too short"
   exit
fi
qsub $0

This is an example script for running a metadynamics simulation with 32 walkers using GROMACS patched with PLUMED on four of our RTX2080Ti GPUs. Transfer to other GPU hardware is possible, but may require adjusted settings (e.g. MPS server on/off, mpirun flags, and GROMACS program flags).

Please note: The run input file (*.tpr) of each walker must reside in its own directory and must have the same file name in every directory.
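The required directory layout can be prepared as sketched below. The placeholder file name.tpr stands in for the real per-walker run input (in practice each walker's .tpr comes from its own grompp run); the brace expansion matches the 32 walker directories used in the job script.

```shell
#!/bin/bash
# Sketch: create the per-walker layout expected by mdrun -multidir.
# name.tpr is a placeholder; each walker's real .tpr must carry the
# SAME file name inside its own directory.
TPR=name
: > "$TPR.tpr"                      # placeholder for the real run input
for d in dir{0..9} dir{1..2}{0..9} dir3{0..1}; do    # dir0 .. dir31
    mkdir -p "$d"
    cp "$TPR.tpr" "$d/$TPR.tpr"
done
```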

#!/bin/bash -l
#PBS -l nodes=1:ppn=16:rtx2080ti:smt,walltime=24:00:00

module load gromacs/2020.2-mkl-CUDA102-plumed2.6.1

# change into submit directory if that's the working directory
cd $PBS_O_WORKDIR

TPR=name

# not necessary, but makes sure the directories are in correct order
directories=$(echo dir{0..9} dir{1..2}{0..9} dir3{0..1})

# these variables are needed to start the MPS-server
# Select a location that’s accessible to the given $UID
export CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps.$PBS_JOBID 
export CUDA_MPS_LOG_DIRECTORY=/tmp/nvidia-log.$PBS_JOBID
# Start the daemon.
nvidia-cuda-mps-control -d 

# these variables need to be placed directly before the Gromacs invocation
# these variables are needed for halo exchange and 
# optimized communication between the GPUs 
export GMX_GPU_DD_COMMS=true 
export GMX_GPU_PME_PP_COMMS=true 
export GMX_GPU_FORCE_UPDATE_DEFAULT_GPU=true

# --oversubscribe is necessary, otherwise mpirun aborts
# -s is needed, otherwise gromacs complains
# -pme -nb -update -bonded make sure everything is offloaded to the GPU
# -pin -pinstride pin the threads to fixed cores; otherwise thread
#  placement on the CPU is left uncontrolled
# -plumed ../plumed_in.dat needs to point to where the file is relative 
#  to the directory the .tpr is in
mpirun -np 32 --oversubscribe gmx_mpi mdrun -v -s $TPR -pme gpu -nb gpu -update gpu -bonded gpu -pin on -pinstride 1 -plumed ../plumed_in.dat -multidir ${directories} -cpi $TPR

# this will stop the MPS-server
echo quit | nvidia-cuda-mps-control

Further information

Mentors

  • Dr. A. Kahler, RRZE, hpc-support@fau.de
  • AG Böckmann (Professur für Computational Biology, NatFak)