The power of modern GROMACS versions on modern hardware
Background
Characterization of molecular diffusion in electrolyte systems [note: copied from application]
The main aim of the intended research project is to gain a fundamental understanding of diffusive mass transport in electrolyte systems. For this, dynamic light scattering (DLS) experiments are combined with equilibrium molecular dynamics (EMD) simulations to characterize the diffusive mass transport via the determination of molecular diffusion coefficients of systematically selected electrolyte systems. Diffusion coefficients for mixtures containing ions and molecules of varying size, structure, and charge distribution are to be studied over a broad temperature and composition range at macroscopic thermodynamic equilibrium. For electrolytes containing ions with a permanent electrostatic charge, special focus lies on evaluating the influence of intermolecular electrostatic interactions on the diffusive process. While EMD simulations are able to quantify this contribution, they strongly rely on the underlying force fields (FFs). In the further development and testing of existing FFs within this project, a special focus lies on the combination of compatible FFs for the different species of interest, allowing a sound description of the interactions relevant in electrolyte mixtures. To access accurate diffusion coefficients by DLS, which also serve for the validation and development of the simulations, the technique should be further developed for the reliable measurement of diffusivities in electrolyte systems.
HPC necessity as given in the application [note: copied from application]
To obtain statistically sound data points, comparatively long simulations of about 200–300 nanoseconds must be performed. For this task, the GROMACS software package is used, which has been shown to be one of the best-performing MD simulation packages for HPC systems. Since GROMACS employs advanced parallelization algorithms, MD simulations can be run efficiently on HPC clusters. In addition, the GROMACS developers have been able to offload time-consuming calculations such as the non-bonded interactions to GPUs, making the package attractive for GPU clusters like Alex. However, to improve the performance of our simulations in ns/day, benchmarking must be performed. Thus, to improve the performance and/or scalability of our simulations, we are highly interested in seeking the support of NHR@FAU. Based on benchmark simulations, the HPC cluster Alex or Fritz will be selected for the production simulations. The thermophysical properties of interest for the current research project, namely density, mutual diffusivities, the thermodynamic factor, and viscosities, are obtained in a post-processing step from the trajectories. For this, post-processing scripts, which take up to about 45 minutes, were developed and parallelized over multiple threads/nodes for an efficient use of compute time. These post-processing scripts are planned to run on RRZE's Woody cluster.
Problem definition
The usage of molecular dynamics (MD) simulations in the proposed project appears to be reasonable; however, several aspects should be taken into careful consideration beforehand. Firstly, the addition of new molecules to already existing force fields (FFs) or the combination of parameters from different FFs requires careful evaluation and testing of the new parameter set. The parametrization strategy as well as the compatibility of the individual FFs must be considered in all cases. Secondly, the simulation of charged species requires reasonable settings for the simulation parameters that handle the electrostatic and van der Waals interactions as well as the neighbor-list searches (e.g., cutoffs, update frequency). Arbitrary choices can lead to numerically unstable simulations and unphysical results. Thirdly, GROMACS offers a plethora of methods for MD simulations, ranging from different time-step integration algorithms and several barostats and thermostats to the handling of bond constraints. Only some of these methods are implemented to run on GPUs; thus, temperature-coupling methods or constraint settings could be adjusted to gain performance without influencing the dynamics of the system. Finally, GROMACS runs out of the box on a large variety of hardware with relatively good performance, but certain run-time parameters can speed up the simulation significantly if chosen correctly. Furthermore, depending on the planned post-processing analyses, output-control parameters have to be adjusted to avoid re-running simulations in case certain data turn out to be needed.
Methods
To support the group on the aforementioned points as well as possible, simulation and post-processing protocols, previous publications, and related literature were requested. The setup of the production simulations was straightforward: 20,000 interaction sites, an integration time step of 2 fs, temperature and pressure settings in line with the experimental setups, and a simulation length of about 300 ns. OPLS-AA was chosen as the force field, and constraints were placed on all bonds using the LINCS algorithm. The equilibration, performed with a 1 fs time step before the production run, was split into two phases: 0.5 ns in the canonical ensemble with different initial velocities, followed by 3.0 ns with isothermal-isobaric ensemble settings to attain the target pressure.
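For orientation, the key production settings translate into a GROMACS mdp file along the following lines. This is only a sketch: all values not stated above (cutoffs, reference temperature and pressure, coupling constants, run length) are illustrative assumptions, not the group's actual parameters.

integrator              = md
dt                      = 0.002        ; 2 fs time step
nsteps                  = 150000000    ; 300 ns production (assuming dt = 2 fs)
constraints             = all-bonds    ; the group's protocol; h-bonds is the alternative tested below
constraint-algorithm    = lincs
cutoff-scheme           = Verlet
coulombtype             = PME          ; long-range electrostatics for the charged species
rcoulomb                = 1.2          ; assumed cutoff (nm)
rvdw                    = 1.2          ; assumed cutoff (nm)
tcoupl                  = v-rescale    ; alternative: nose-hoover (see benchmarks below)
tc-grps                 = System
tau-t                   = 1.0
ref-t                   = 298.15       ; assumed temperature (K)
pcoupl                  = Parrinello-Rahman
tau-p                   = 5.0
ref-p                   = 1.0          ; assumed pressure (bar)
compressibility         = 4.5e-5
nstenergy               = 10           ; energies every 10 steps, needed for the post-processing

The constraints and tcoupl entries are exactly the two settings that are varied in the four benchmarks described below.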
The main properties of interest, calculated in post-processing, were: Maxwell-Stefan diffusivities (requiring the center-of-mass (COM) positions of the molecules), the thermodynamic factor (requiring radial distribution functions with respect to the molecular COM), and the viscosity (requiring output from the energy files and the atomic positions).
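Part of the raw data for these analyses can be extracted with standard GROMACS tools, while the actual evaluation of the Maxwell-Stefan diffusivities and the thermodynamic factor remains in the group's own scripts. The following is only a sketch; file names, the index file, and the selections are placeholders:

# COM coordinates of selected index groups for the diffusivity analysis
gmx traj -f production.xtc -s production.tpr -n index.ndx -com -ox com_positions.xvg
# COM-based radial distribution function for the thermodynamic factor (placeholder selections)
gmx rdf -f production.xtc -s production.tpr -selrpos mol_com -seltype mol_com -ref "resname SOL" -sel "resname ION" -o rdf_com.xvg
# energy terms (including the pressure tensor) for the viscosity evaluation
gmx energy -f production.edr -o energy.xvg

For a first viscosity estimate, gmx energy also offers a -vis option based on the Einstein relation, although the group's own viscosity evaluation may differ.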
The group sent four scientific papers detailing simulation parameters and calculation approaches:
- Klein et al. [1], published by the group, describes the determination of mutual diffusivities directly from MD simulations. This paper provides the relevant fundamentals as well as the simulation protocols and the parameters involved.
- García-Melgarejo et al. [2] describes a systematic way to develop OPLS non-bonded FF parameters for solvents used in electrolytes, enabling a reasonably reliable determination of transport properties.
- Krishnamoorthy et al. [3] is relevant for studying structural as well as some dynamic properties of electrolyte solutions.
- Liu et al. [4] discusses the Maxwell-Stefan diffusivities of electrolyte systems containing ionic liquids.
Since the literature provided convincing evidence that the protocols and force field choice lead to reliable results for this type of simulation, it was decided to focus on finding the optimal thermostat and best constraint settings to yield maximum performance.
The group agreed to send simulation files that allowed us to set up four types of simulation benchmarks: (1) all-bond constraints with Nosé-Hoover thermostat, (2) all-bond constraints with v-rescale thermostat, (3) h-bond constraints with Nosé-Hoover thermostat, and (4) h-bond constraints with v-rescale thermostat. The reason for running four benchmarks is that not all algorithms involved in an MD step have been ported to GPUs, i.e., depending on the settings, data has to be exchanged between GPU and CPU more often. For common simulation settings, the short-range non-bonded interactions, the long-range non-bonded interactions (Particle Mesh Ewald), the bonded interactions, and the update and constraints step can be offloaded to a GPU.
In this case, simulations conducted with the Nosé-Hoover thermostat require the update and constraints step to be calculated on the CPU, which can lead to an overall performance decrease due to the larger communication overhead. Furthermore, the constraints setting in the GROMACS molecular dynamics parameter (mdp) file controls which bonds in the topology are converted to rigid holonomic constraints; most often, constraints are only placed on bonds containing hydrogen atoms, because these bonds vibrate with a very short oscillation period of less than 10 fs. Converting all bonds between atoms to constraints keeps the bonded atoms at fixed distances from each other, but it leads to longer computation times and can, depending on the force field used, yield unreliable results: covalent bonds are usually parameterized so that atoms adopt an optimal distance depending on their chemical environment.
Support & Performance analysis
Four benchmarks were performed on a single A40 GPU and on up to three Fritz nodes with GROMACS 2021.5. According to a blog post on GROMACS performance on different GPU types, systems with a relatively small number of atoms (20,000 to 30,000 atoms) do not benefit from the professional A100 GPUs; we therefore did not run benchmarks on this type of hardware. The data from the benchmark runs are listed below.
| | allbonds_nosehoover | hbonds_nosehoover | allbonds_vrescale | hbonds_vrescale |
| --- | --- | --- | --- | --- |
| A40 | 480 ns/day | 586 ns/day | 759 ns/day | 775 ns/day |
| 1 Fritz node | 382 ns/day | 417 ns/day | 369 ns/day | 418 ns/day |
| 2 Fritz nodes | 581 ns/day | 598 ns/day | 562 ns/day | 608 ns/day |
| 3 Fritz nodes | 645 ns/day | 724 ns/day | 645 ns/day | 720 ns/day |
Notably, a slight decrease in performance is expected for the production runs due to the overhead of writing output files, as their mdp files specify "nstenergy = 10" (energies are written every 10 MD steps). The scaling on Fritz appears reasonable for all benchmarks; however, the performance of the v-rescale benchmarks on a single A40 is higher than on a total of three Fritz nodes. The performance of the hbonds benchmarks is higher than that of the allbonds benchmarks, consistently across all choices of hardware and thermostat. Both equilibration steps show lower performance on CPU and GPU (data not shown); allbonds_nosehoover shows the lowest performance on the GPU, followed by hbonds_nosehoover. On Fritz, all benchmark versions reach a similar performance for the equilibration steps. For running on the A40 GPUs, we recommended setting the following environment variables to increase performance:
export GMX_GPU_PME_DECOMPOSITION=1   # allow PME decomposition across GPUs (relevant for multi-GPU runs)
export GMX_USE_GPU_BUFFER_OPS=1      # perform coordinate/force buffer operations on the GPU
export GMX_DISABLE_GPU_TIMING=1      # disable GPU timing instrumentation to avoid its overhead
export GMX_ENABLE_DIRECT_GPU_COMM=1  # enable direct GPU communication (newer GROMACS versions)
export GMX_GPU_PME_PP_COMMS=true     # direct GPU communication between PME and PP ranks
export GMX_GPU_DD_COMMS=true         # GPU halo exchange for domain decomposition
For the Nosé-Hoover thermostat, the update step has to be kept on the CPU; for v-rescale, it can be offloaded to the GPU. The benchmarks on the GPU ran for 200,000 steps. The command used is:
gmx mdrun -s <benchmark> -nb gpu -pme gpu -bonded gpu -update <gpu or cpu> -ntomp 16 -ntmpi 1 -pin on -pinstride 1
For running on Fritz nodes, the optimal runtime parameters have to be determined for each system; they differ between the four benchmark systems and with the number of nodes used, so it is best to determine them once the constraints and thermostat for this project have been decided. The command used on the CPU is:
gmx_mpi mdrun -npme <npme> -resethway -nsteps 30000 -noconfout -maxh 0.1 -ntomp <ntomp> -dlb yes
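Such a scan can be scripted as a simple loop over candidate values, as in the following sketch; the parameter ranges and file names are examples only and have to be adapted to the chosen benchmark system and node count, and GROMACS additionally provides the gmx tune_pme tool to automate parts of the -npme scan.

# sketch of a runtime-parameter scan inside a Fritz job allocation (values are examples)
for npme in 0 8 16; do
  for ntomp in 1 2 4; do
    srun --cpus-per-task=$ntomp gmx_mpi mdrun -s benchmark.tpr \
      -npme $npme -ntomp $ntomp -resethway -nsteps 30000 -noconfout -maxh 0.1 -dlb yes \
      -g bench_npme${npme}_ntomp${ntomp}.log
  done
done
# compare the achieved ns/day reported at the end of each log file
grep Performance bench_*.log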
Moreover, links to an example GROMACS job script on Alex, to the general Slurm documentation, and to the documentation on batch processing on Alex were provided. Since the group has previous experience with running MD simulations on CPUs on the Emmy cluster, they were informed about the changes in available environment variables between Torque and Slurm. A comprehensive overview of how the commands have changed from Torque to Slurm can be found on the Woody transition announcement page.
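Put together, a job script for a single A40 on Alex could look like the following sketch; the Slurm resource specification (GRES string, time limit) is an assumption based on typical Alex settings, and the linked documentation and example scripts remain authoritative.

#!/bin/bash -l
#SBATCH --job-name=gmx-a40
#SBATCH --gres=gpu:a40:1
#SBATCH --time=24:00:00
#SBATCH --export=NONE

unset SLURM_EXPORT_ENV
module load gromacs/2021.6-gcc11.2.0-mkl-cuda

# environment variables recommended above
export GMX_GPU_PME_DECOMPOSITION=1
export GMX_USE_GPU_BUFFER_OPS=1
export GMX_DISABLE_GPU_TIMING=1
export GMX_ENABLE_DIRECT_GPU_COMM=1
export GMX_GPU_PME_PP_COMMS=true
export GMX_GPU_DD_COMMS=true

# v-rescale/h-bonds setup, so the update step can stay on the GPU
gmx mdrun -s production.tpr -nb gpu -pme gpu -bonded gpu -update gpu \
  -ntomp 16 -ntmpi 1 -pin on -pinstride 1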
As a final measure, the group was informed that the latest version of the GROMACS 2021 series was available on Alex (2021.6, gromacs/2021.6-gcc11.2.0-mkl-cuda), that it was released in mid-2022, and that it fixed a (very rare) bug in the pressure coupling of simulations run with "-update gpu" enabled. The results of the benchmark tests should not be affected by switching to the 2021.6 release.
References
[1] T. Klein et al., J. Phys. Chem. B 125, 5100–5113 (2021).
[2] García-Melgarejo et al., J. Phys. Chem. B 124, 4741–4750 (2020).
[3] Krishnamoorthy et al., Phys. Chem. Chem. Phys. 20, 25701–25715 (2018).
[4] X. Liu et al., J. Phys. Chem. B 115, 8506–8517 (2011).