GROMACS 4.6

Philip Fowler, 18th October 2013; 23rd September 2018

GROMACS is a scientific code designed to simulate the dynamics of small boxes of stuff that usually contain a protein, water, perhaps a lipid bilayer and a range of other molecules depending on the study. It assumes that all the atoms can be represented as points with a mass and an electrical charge and that all the bonds can be modelled using simple harmonic springs. There are some other terms that describe the bending and twisting of molecules and all of these, when combined with two long range terms, which take into account the repulsion and attraction between electrical charges, allow you to calculate the force on any atom due to the positions of all the other atoms. Once you know the force, you can calculate where the atom will be a short time later (often 2 fs), but of course the positions have then changed, so you have to recalculate the forces. And so on.

Anyway, I use GROMACS a lot in my research and the most recent major version, 4.6, was released in January 2013. In this post I’m going to briefly describe my experience with some of the improvements.

First off, so much has changed that I think it would have been more accurate to call this GROMACS 5.0. For example, version 4.6 is a lot faster than version 4.5. I typically use three different benchmarks when measuring the performance. One is an all-atom simulation of a bacterial peptide transporter in a lipid bilayer (78,033 atoms). The other two are both coarse-grained models of a lipid bilayer using the MARTINI forcefield; the difference is that one has 6,000 lipids (137,232 beads) and the other 54,000 (2,107,010 beads).

Ok, so how much faster is version 4.6? It is important here to bear in mind that GROMACS was already very fast, since a lot of effort had been put into optimising the loops that the code spends most of its time running. Even so, version 4.6 is between 20% and 120% faster when using either of the first two benchmarks, and in some cases even faster.
How? Well, it seems the developers have completely re-written those loops using SIMD instructions. One important consequence of this is that it is vital to use the best compiler and, since you have to specify which SIMD instruction set to use, you may need several different versions of the key binary, mdrun. For example, you may want a version compiled using the AVX SIMD instruction set for recent CPUs, but also a version compiled using an older SSE SIMD instruction set. The latter will run on newer architectures, but it will be slower. You must never run a version compiled with no SIMD instruction set as this can be 10x slower!

The other big performance improvement is that GROMACS 4.6 now uses GPUs seamlessly. The calculations are shared between any GPUs and the CPUs, and GROMACS will even shift the load to try and share it equally. Erik Lindahl, one of the GROMACS developers, gave an interesting NVIDIA webinar on this subject in April 2013. A GPU here just means a reasonable consumer graphics card, such as an NVIDIA GTX 680, that has a compute capability of 2.0 or higher.

So, how much of a performance boost do we see? I typically see a boost of 2.1-2.7x for the atomistic benchmark and 1.4-2.2x for the first, smaller coarse-grained benchmark. Just for fun, you can take a version of GROMACS compiled with no SIMD instructions and run it with and without a GPU; then you can get a performance increase of 10x.

Before I finish, I was given some good advice on running GROMACS benchmarks. Firstly, make sure you use the -noconfout mdrun option, since this prevents mdrun from writing a final .gro file, which takes some time. Secondly, set up a .tpr file that will run for a long (wallclock) time even on a large number of cores, and then use the -resethway option in combination with a time limit, such as -maxh 0.25. This resets the timers after 7.5 minutes, so the reported performance only covers the steps calculated between 7.5 and 15 minutes.
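As a rough illustration of that advice, here is a minimal Python sketch that assembles such a benchmark command line and works out which part of the run is actually timed. The helper names, the file name bench.tpr and the bare mdrun binary name are assumptions made for the example; the only mdrun options used are -s (the input .tpr file), -noconfout, -resethway and -maxh.

```python
import shlex


def benchmark_command(tpr="bench.tpr", max_hours=0.25, mdrun="mdrun"):
    """Build an mdrun command line for a timed benchmark run.

    The tpr file name and the mdrun binary name are placeholders;
    substitute whatever your own build and inputs are called.
    """
    return [
        mdrun,
        "-s", tpr,                 # input run file
        "-noconfout",              # do not write the final .gro file
        "-resethway",              # reset the timers halfway through
        "-maxh", str(max_hours),   # wallclock time limit in hours
    ]


def timed_window_minutes(max_hours):
    """Return (start, end) in minutes of the part of the run that is timed."""
    total = max_hours * 60.0
    return total / 2.0, total


if __name__ == "__main__":
    print(shlex.join(benchmark_command()))
    print(timed_window_minutes(0.25))
```

The command is built as a list so that it can be passed straight to subprocess.run, or prefixed with whatever MPI launcher your cluster uses, without worrying about shell quoting.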
From experience, a bit of time spent writing some good BASH scripts to automatically set up, run and analyse the benchmarking simulations really pays off in the long run. In future posts I’ll talk about the scaling of GROMACS 4.6 (that is where the third benchmark comes in) and also look at the GPU performance in a bit more detail.