# gromacs on GPUs
Nam Pho
Director for Research Computing

info
During the January 12, 2021 mox maintenance period, long overdue package updates will be applied. The most user-impactful upgrade is the GPU driver, from 418.40.04 to 460.27.04, which allows for CUDA 11 support (up from CUDA 10).
The second most widely used GPU-enabled workflow on HYAK (besides machine learning) is molecular dynamics (MD), so we wanted to test one of the most popular MD codes, gromacs [source], and ensure this driver upgrade wouldn't negatively impact our researchers. I couldn't find gromacs compiled with GPU support in our module collection, so I used it as an opportunity to create one for you all. Read on!
warning
This is an exercise to demonstrate support for molecular dynamics on GPUs as a proof-of-concept. Scientific verification of the software compile options (e.g., single-precision) and its results is the responsibility of the researcher.
## Using gromacs

I'll start with the end result for those of you who just want to use it, but following that I'll dive into the nuts and bolts of how we created the module so you can perform additional optimizations.
This is a GPU-enabled version of gromacs, so we need a GPU first (you can verify one is present with `nvidia-smi`).
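For example, you can request an interactive GPU session through Slurm; this is a minimal sketch, and the partition, account, and time values are placeholders you should substitute for your own.

```bash
# Request an interactive session with a single GPU
# (placeholder partition/account values; substitute your own).
srun -p ckpt -A mylab --gres=gpu:1 --time=2:00:00 --pty /bin/bash

# Confirm a GPU is visible and check the driver version.
nvidia-smi
```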
## gromacs-2020.4 module

Once we have a GPU, we use modules to load gromacs-2020.4 and all of its required dependencies (e.g., CUDA 11).
All packages are sub-commands of the `gmx` binary, so you can verify the module that way.
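For example, loading the module and checking the binary might look like this (using the module name described at the end of this post):

```bash
# Load the GPU-enabled gromacs module and its dependencies.
module load gromacs/2020.4-cuda11.1

# Verify the install: version info and the list of gmx sub-commands.
gmx --version
gmx help commands
```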
## Test simulation of Lysozyme

I used a tutorial from the gromacs website here to show it runs processes on GPU(s). The tutorial runs an MD simulation on a lysozyme, but that's the extent of my study there. The commands below are a summary of the tutorial, with a note that the `genbox` subcommand is now replaced by `solvate`.
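A condensed sketch of the tutorial's preparation steps follows; the file names mirror the tutorial, and the `.mdp` parameter files come from the tutorial as well.

```bash
# Generate a topology from the lysozyme PDB structure.
gmx pdb2gmx -f 1aki.pdb -o processed.gro -water spce

# Define the simulation box.
gmx editconf -f processed.gro -o boxed.gro -c -d 1.0 -bt cubic

# Fill the box with water (this step used to be genbox).
gmx solvate -cp boxed.gro -cs spc216.gro -o solvated.gro -p topol.top

# Add ions to neutralize the system.
gmx grompp -f ions.mdp -c solvated.gro -p topol.top -o ions.tpr
gmx genion -s ions.tpr -o solvated_ions.gro -p topol.top -pname NA -nname CL -neutral

# Assemble the final run input file.
gmx grompp -f md.mdp -c solvated_ions.gro -p topol.top -o md.tpr
```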
The final gromacs command below starts the fun. The documentation suggests it will automatically identify the available GPUs and send work to them; however, there are more explicit GPU arguments we encourage you to explore.
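Something like the following, where `-deffnm md` assumes the run input files from the previous step are named `md.*`:

```bash
# Let gromacs auto-detect and use the available GPUs.
gmx mdrun -deffnm md

# Or be more explicit: offload non-bonded and PME work to the GPU.
gmx mdrun -deffnm md -nb gpu -pme gpu
```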
You can `ssh` into the node you're using in a separate window and run `nvidia-smi` in parallel so we can monitor the load on the GPU(s).
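For example (the node name here is a placeholder; use whatever `squeue` shows for your job):

```bash
# From a second login-node terminal: hop onto the compute node
# and refresh nvidia-smi every 2 seconds.
ssh n2345
watch -n 2 nvidia-smi
```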
We can see a process occupying each GPU, so it works! At least, gromacs uses GPUs... the GPUs themselves weren't stressed heavily, and fixing that requires the user to increase the number of rank processes and match them with the available GPUs. You can do this by adding arguments to the `gmx mdrun` command, but by default it ran 2 ranks per detected GPU, which is not a lot.
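As a sketch, the rank count and GPU mapping can be set explicitly; the values here are illustrative, so tune them to your node.

```bash
# 8 thread-MPI ranks spread across 2 GPUs (device ids 0 and 1),
# with non-bonded work offloaded to the GPUs.
gmx mdrun -deffnm md -ntmpi 8 -nb gpu -gpu_id 01
```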
## (Optional) Compile Notes

You need CUDA 11, the GNU compiler, and the OpenBLAS library for the version I put together, but I was focused on a proof-of-concept rather than squeezing out every last drop of performance. There's a lot of further optimization to be done, and that's left as an exercise for the reader:
- Try the Intel compiler and see if it provides further optimization for non-GPU parts of the workflow.
- Try other math libraries (e.g., MKL) and see if it speeds things up.
- Add in MPI support if you want to use multiple GPUs across multiple nodes.
- Add in modules (e.g., PLUMED).
- Other stuff I can't think of; the full list of compile flags is [here].
## Download Source

From the login node, I staged a folder in the modules directory.
Grab the regression tests.
Download gromacs-2020.4 [source].
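A sketch of those three steps is below; the staging path is illustrative, and the download URLs should be checked against the gromacs downloads page.

```bash
# Staging folder in the modules directory (path is illustrative).
mkdir -p /sw/gromacs && cd /sw/gromacs

# Regression tests.
wget https://ftp.gromacs.org/regressiontests/regressiontests-2020.4.tar.gz

# gromacs 2020.4 source.
wget https://ftp.gromacs.org/gromacs/gromacs-2020.4.tar.gz
```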
## Get a GPU and Code

I used the shared `build-gpu` node for an interactive session, but if you are affiliated with a group that has its own GPU node you can use that instead.
Once you get a session with a GPU (run `nvidia-smi` to confirm you see one), extract the regression tests. Do the same for the gromacs code and enter the directory.
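For example:

```bash
# Unpack the regression tests and the source, then enter the source tree.
tar xvzf regressiontests-2020.4.tar.gz
tar xvzf gromacs-2020.4.tar.gz
cd gromacs-2020.4
```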
## Pre-requisite Modules

The modules below are loaded individually for readability, but you could load them all in one command. Get a refresher on modules here.
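The exact module names and versions in this sketch are illustrative; check `module avail` on your cluster for what's actually installed.

```bash
# Compiler, CUDA toolkit, build tool, and math library.
module load gcc/10.1.0
module load cuda/11.1.0
module load cmake/3.18.4
module load openblas/0.3.12
```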
## Compile

I created a subdirectory within the source tree to compile in.
Use `cmake` to create the `Makefile`. Note: if you copy-and-paste the `cmake` command below, you will have to modify the referenced paths for your environment.
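A minimal sketch of such a `cmake` invocation is below. The install prefix, BLAS library path, and regression test path are examples only; the `GMX_*` options are standard gromacs build flags, and note the default build is single-precision, per the warning above.

```bash
# Out-of-source build directory inside the source tree.
mkdir build && cd build

# Example paths: adjust the install prefix and BLAS library location.
cmake .. \
  -DCMAKE_INSTALL_PREFIX=/sw/gromacs/2020.4-cuda11.1 \
  -DGMX_GPU=ON \
  -DGMX_BUILD_OWN_FFTW=ON \
  -DGMX_EXTERNAL_BLAS=ON \
  -DGMX_BLAS_USER=/sw/openblas/lib/libopenblas.so \
  -DREGRESSIONTEST_PATH=../../regressiontests-2020.4
```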
With the `Makefile` ready, you can run `make -j 4` (replace 4 with however many cores you have in your session) and then `make install`. I created the module file separately so you can load it with `module load gromacs/2020.4-cuda11.1` and run the single `gmx` binary.
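Put together, the final steps look something like this:

```bash
make -j 4      # match -j to the cores in your session
make check     # optional: run the regression tests
make install

# With the module file in place, users can then simply:
module load gromacs/2020.4-cuda11.1
```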