
GROMACS in DOCKER: First Steps

DOCKER is cool. But what is it? From the DOCKER webpage

Docker containers wrap up a piece of software in a complete filesystem that contains everything it needs to run: code, runtime, system tools, system libraries – anything you can install on a server. This guarantees that it will always run the same, regardless of the environment it is running in.

I like to think of it as somewhere in between virtualenv and a virtual machine. Although the DOCKER website is focussed on commercial software development, and so talks about building and shipping applications, DOCKER could be of huge use to me as a computational scientist. For example, rather than make a series of input files for my simulations available, along with a list of which software versions I used, I could instead simply make a DOCKER image available that contains all the compiled software I used along with all the input files. Then anyone should, in principle, be able to reproduce my research.

 

Make no mistake: reproducibility is, rightly, a coming trend. But surely all scientific results are reproduced? It turns out that if the experiment or simulation was difficult to do, the answer is: not so much. And when concerted efforts have been made to reproduce results reported in high-impact journals, the answer is often, well, disconcerting at the very least. In a now famous study, Begley & Ellis from a pharmaceutical company, Amgen, reported that their in-house scientists were unable to reproduce 47 out of 53 landmark experimental studies in haematology and oncology. They were looking at novel, exciting findings, which are more likely to be challenging to reproduce (although the pressure to over-sell is also stronger). I have no reason to think computational studies are much better. Over the past few years there has been a flurry of papers, comments and best practices. One can even now make a DOCKER image available via GitHub with a DOI so it can be cited independently of an article.

As I’d like to do this in the future, I’ve started to play with DOCKER and GROMACS. Since my workstation is a Mac, the DOCKER host has to run within a lightweight Linux virtual machine. First I installed DOCKER. Then I opened a DOCKER Quickstart Terminal and checked everything was working by downloading the hello-world image and running it

$ docker run hello-world
Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world

4276590986f6: Pull complete 
a3ed95caeb02: Pull complete 
Digest: sha256:4f32210e234b4ad5cac92efacc0a3d602b02476c754f13d517e1ada048e5a8ba
Status: Downloaded newer image for hello-world:latest

Hello from Docker.
This message shows that your installation appears to be working correctly.

Let’s try something more real, like an Ubuntu 16.04 Server image.

$ docker run -it ubuntu bash

This drops me inside the Ubuntu image. Let’s compile GROMACS!

root@4b511a41dbf0:/# apt-get update -y
root@4b511a41dbf0:/# apt-get upgrade -y
root@4b511a41dbf0:/# apt-get install build-essential cmake wget openssh-server -y
root@4b511a41dbf0:/# wget ftp://ftp.gromacs.org/pub/gromacs/gromacs-5.1.2.tar.gz
root@4b511a41dbf0:/# tar zxvf gromacs-5.1.2.tar.gz 
root@4b511a41dbf0:/# cd gromacs-5.1.2
root@4b511a41dbf0:/# mkdir build
root@4b511a41dbf0:/# cd build
root@4b511a41dbf0:/# cmake .. -DGMX_BUILD_OWN_FFTW=ON
root@4b511a41dbf0:/# make 
root@4b511a41dbf0:/# make install
root@4b511a41dbf0:/# cd

Now let’s copy over a TPR file to see how fast GROMACS is within a DOCKER container

root@4b511a41dbf0:/# scp fowler@somewhere.else:benchmark.tpr .
root@4b511a41dbf0:/# source /usr/local/gromacs/bin/GMXRC
root@4b511a41dbf0:/# gmx mdrun -s benchmark -resethway -noconfout -maxh 0.1

Note that this is a single-CPU DOCKER image. I was worried that, since the DOCKER host was running inside a Linux VM, it would be slow compared to running natively in Mac OS X, so I ran three repeats of each and DOCKER was only 1.7% slower…
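
Something like the following would do it; the bench-$i output names are just examples and -deffnm simply keeps each repeat's log separate so the ns/day figure can be grepped out afterwards.

for i in 1 2 3; do
    gmx mdrun -s benchmark -resethway -noconfout -maxh 0.1 -deffnm bench-$i
done
grep "Performance:" bench-?.log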

To save this DOCKER image locally, quit the session

$ docker commit -m "Installed GROMACS 5.1.2 for benchmarking" -a "Philip W Fowler" c5f1cf30c96b philipwfowler/gromacs-5.1.2
$ docker images
REPOSITORY                    TAG                 IMAGE ID            CREATED             SIZE
philipwfowler/gromacs-5.1.2   latest              73e44c120bfa        6 seconds ago       809 MB
ubuntu                        latest              c5f1cf30c96b        2 weeks ago         120.8 MB
hello-world                   latest              94df4f0ce8a4        3 weeks ago         967 B
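
Incidentally, rather than building interactively and then committing, the same steps could be captured in a Dockerfile so the image can be rebuilt (and shared) from a single text file. A rough, untested sketch based on the commands above would be

cat > Dockerfile <<'EOF'
FROM ubuntu:16.04
RUN apt-get update -y && apt-get upgrade -y && \
    apt-get install -y build-essential cmake wget
RUN wget ftp://ftp.gromacs.org/pub/gromacs/gromacs-5.1.2.tar.gz && \
    tar zxvf gromacs-5.1.2.tar.gz && \
    cd gromacs-5.1.2 && mkdir build && cd build && \
    cmake .. -DGMX_BUILD_OWN_FFTW=ON && \
    make && make install
EOF
docker build -t philipwfowler/gromacs-5.1.2 .

docker build then produces the same tagged image as the docker commit above, but in a way anyone can rerun.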

Done. More soon on multiple cores, can-we-use-the-GPU? and using DOCKER on Amazon Web Services.

Setting up a GROMACS cluster

Recently I’ve moved to the John Radcliffe hospital and my old lab kindly let me have some old servers that were switched off. This pushed me to learn how to set them up as a compute cluster with a scheduler for running GROMACS jobs. I’ve wanted to learn this for years, having used many clusters myself, but hadn’t plucked up the courage until now.

This post is a detailed walk-through on how I chose to do this. During the process I did a lot of Googling and have written the post I would have liked to have found; long, comprehensive and a bit verbose.

Since it is long, let’s break it down into four tasks.

Fingerless gloves and a woolly hat can be useful in a cold, noisy machine room

1. Install Ubuntu on each machine

2. Setup networking, including sharing directories on the headnode via NFS

3. Using environment modules, compile GROMACS into one of the shared directories so all the machines in the cluster can run gmx

4. Install SLURM

How to set up a Gramble

This is a Gramble, which of course is short for a GROMACS Bramble or, in other words, a Raspberry Pi 2 model B cluster running GROMACS. Given that the ARM processor in a Raspberry Pi 2 does not support the SIMD instructions found in more complex (and expensive) Intel chips, why would I want to do such a thing? Well, I wanted to learn how to set up a simple compute cluster.

And this is what I did. Unless stated otherwise, you need to do this on both machines (or however many you are using).

1. Install Ubuntu 14.04 LTS Server onto the microSD cards

Each Raspberry Pi 2 runs off a microSD card; on a computer with a microSD slot (I used an iMac), download the Ubuntu 14.04 LTS image and copy it onto the microSD card as described here. Then you simply push it back into the slot on the Raspberry Pi and power it up. Note that Ubuntu does not run on the model A and, at the time of writing, only runs on the model B. If your microSD card is bigger than 2GB you might want to resize the partition.
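
On the Mac that boils down to something like the following sketch – the device name (/dev/disk2) and the image filename are only examples, so do check which device really is the microSD card with diskutil list first, as dd will happily overwrite the wrong disk.

diskutil list                                             # identify the microSD card, e.g. /dev/disk2
diskutil unmountDisk /dev/disk2                           # unmount it so it can be written to
sudo dd if=ubuntu-14.04-server.img of=/dev/rdisk2 bs=1m   # write the image (rdisk is the faster raw device)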

2. Update

First, let’s update the installed software and also install the ssh server so we can remotely connect.

sudo apt-get update -y
sudo apt-get upgrade -y
sudo apt-get install openssh-server

3. Setup the network

Now my setup was a bit strange. I was using an old Apple Time Capsule I had; both Raspberry Pis were connected to this via ethernet cables and the Time Capsule itself was in “Extend Wireless Network” mode since our main wireless router is somewhere else in the house. Ideally, I’d want a dual-homed headnode with one public IP address and then a private network for communication within the cluster. Instead, each of the two Raspberry Pis has its own IP, dynamically assigned by my router, but this will do for now. Edit the hosts file on each machine

sudo nano /etc/hosts

so it reads

127.0.0.1 localhost
192.168.0.12 rasp0
192.168.0.18 rasp1

Also edit the hostname

sudo nano /etc/hostname

so it matches /etc/hosts

rasp0

and reboot

sudo reboot

4. Add an MPI user

We will need a special user that can log in without passwords to all the nodes that SLURM will use later on. As I understand it, giving it a uid less than 1000 stops the user appearing in any login GUI.

sudo adduser mpiuser --uid 999

5. Install NFS

We will share a folder on the headnode with all the compute nodes using the NFS protocol. This means we’ll only need to install applications on the headnode and they will be accessible from any compute node. Also this is where the GROMACS output files will be written.

on the headnode (rasp0)

sudo apt-get install nfs-kernel-server

on the compute node (rasp1)

sudo apt-get install nfs-common

on the headnode (rasp0) add the following to /etc/exports

/home/mpiuser *(rw,sync,no_subtree_check)
/apps *(rw,sync,no_subtree_check)

This will export the folders /apps and /home/mpiuser on rasp0 to all the compute nodes (in this case just rasp1). You need to make sure all folders shared by NFS exist on both machines. So on both machines

sudo mkdir /apps

You don’t need to mkdir /home/mpiuser as creating this user will have automatically created a home directory for it. Now on the headnode

sudo service nfs-kernel-server start
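
If you change /etc/exports again later, the standard way to re-read it without restarting the whole service is

sudo exportfs -ra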

On the compute node (rasp1),

sudo ufw allow from 192.168.0.0/24
sudo mount rasp0:/home/mpiuser /home/mpiuser
sudo mount rasp0:/apps /apps

The first line allows traffic from the local network through the firewall, although I’m not entirely sure I needed to do this. The last two manually mount the NFS shares from the headnode (rasp0). To set it up so this happens automatically at boot

sudo nano /etc/fstab

and add

rasp0:/home/mpiuser /home/mpiuser nfs
rasp0:/apps /apps nfs

then we can force a remount via

sudo mount -a
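
A quick sanity check that both shares really are mounted on the compute node:

mount | grep nfs             # should list rasp0:/home/mpiuser and rasp0:/apps
df -h /apps /home/mpiuser    # both should show rasp0 as the filesystem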

6. Create an SSH key pair to allow passwordless login

Because we now have /apps and /home/mpiuser shared with all nodes of the cluster (ok, just rasp1, but you know what I mean), we can simply create an ssh keypair as mpiuser on the headnode and it will be shared with all the compute nodes. So on rasp0

su mpiuser
ssh-keygen -t rsa
cd .ssh/
cat id_rsa.pub >> authorized_keys

I didn’t use a passphrase during key generation. I expect this is a bad thing and I did read you could use a keychain, but as this is a toy cluster I’m going to stick my fingers in my ears and pretend I didn’t read that. If you haven’t created an ssh keypair before, it is fairly simple – it creates a public and a private key. This is described in more detail here. The key things are that the private key (.ssh/id_rsa) should only be readable by mpiuser and no-one else. In Linux-land, this means it should have permissions of 600 – this is how ssh-keygen creates it. Secondly, any remote machine will allow a passwordless login if the public key for that user is in .ssh/authorized_keys; this explains the last line above.

Let’s test it. Since we are already the mpiuser and we are on rasp0

ssh rasp1

This should automatically log you into rasp1. If you try the same thing as the default ubuntu user, it will prompt you for your password as that user doesn’t have an ssh keypair set up.

7. Compile GROMACS

As we are thinking about NFS, let’s compile GROMACS in /apps so the gmx binary can be run from any of the compute node(s). We need a few things before we begin.

sudo apt-get install build-essential cmake

The first package contains the compilers you’ll need to, well, compile GROMACS and cmake is the build tool GROMACS uses. So as the mpiuser,

cd /apps
mkdir src
cd src/
wget ftp://ftp.gromacs.org/pub/gromacs/gromacs-5.1.2.tar.gz
cd gromacs-5.1.2/
mkdir build-gcc48
cd build-gcc48
cmake .. -DGMX_BUILD_OWN_FFTW=ON -DCMAKE_INSTALL_PREFIX='/apps/gromacs/5.1.2' -DBUILD_SHARED_LIBS=off
make -j 4
sudo make install

Note the unusual -DBUILD_SHARED_LIBS=off flag in the GROMACS cmake command; this is to get around an error when compiling GROMACS on the Raspberry Pi. You shouldn’t normally need this flag. Now the make command will take at least ten minutes, so put the kettle on.

Because of dynamic linking, you’ll also need to install the compilers on the compute nodes via

sudo apt-get install build-essential

I suspect using environment modules might avoid this issue; I’m going to play with these and if I can get it to work will write another post.

Once you’ve done this, then on any machine you should be able to run GROMACS via

source /apps/gromacs/5.1.2/bin/GMXRC
gmx mdrun

8. Install a cluster management and job scheduling system (SLURM)

Despite the fact the machines I’ve used in the past have tended to use PBS or SGE (and so my fingers can type qstat really, really fast), I chose to use SLURM as

  • it is available as an Ubuntu package
  • our University high performance computing centre have recently started using it and they recommended it
  • it is actively developed and I can’t work out, or at least remember for more than five minutes, what is going on with SGE
  • it has documentation! and tutorials!
  • it is open source (GPL2)
  • I liked the name

To install on rasp0

sudo apt-get install slurm-llnl

This also installs MUNGE as a pre-requisite (see the next section). Now SLURM appeared to want to use /usr/bin/mail and complained when it couldn’t find it so I also installed

sudo apt-get install mailutils

which drops you into a setup screen and I chose the “local” option.

Also on rasp1

sudo apt-get install slurm-llnl

9. Get MUNGE working

MUNGE creates and validates credentials and SLURM uses it. On the headnode

sudo /usr/sbin/create-munge-key

This creates a key /etc/munge/munge.key. Now copy this key to /etc/munge/ on all nodes (you may need to fiddle with permissions etc to use the NFS share). There appears to be a bug with Ubuntu and MUNGE, but the workaround is to do the following on all nodes

sudo nano /etc/default/munge

and add the line

OPTIONS="--force"

now start the service

sudo service munge start

Check it is running

ps -e | grep munge
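
A more direct check is MUNGE's standard round-trip test – encode a credential and decode it, both locally and on the compute node (both should report STATUS: Success):

munge -n | unmunge
munge -n | ssh rasp1 unmunge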

10. Get SLURM working

This was the bit I wasn’t looking forward to as job schedulers have, frankly, scared me. But it turns out this was one of the easiest steps. Let’s work as the mpiuser on rasp0. SLURM comes with a very simple example configuration file that we can edit.

cp /usr/share/doc/slurm-llnl/examples/slurm.conf.simple.gz .
gunzip slurm.conf.simple.gz
nano slurm.conf.simple

All I did was change the lines so they read

ControlMachine=rasp0
..
NodeName=rasp[0-1] Procs=4 State=UNKNOWN
PartitionName=test Nodes=rasp[0-1] Default=YES MaxTime=INFINITE State=UP

You’ll notice I’ve identified rasp0 as both the ControlMachine (i.e. headnode) and also a Node (compute node) belonging to the test Partition. On a regular cluster you probably don’t want the headnode also being a compute node, but I only had two Raspberry Pis so I thought why not? This also shows the syntax for referring to multiple nodes. If you want a more complex configuration an online configuration tool is provided. There is also an even more complicated online configurator. A note of caution: these may not work with the version of SLURM installed by apt-get (2.6.5) since the current version is 15.08. That doesn’t mean 2.6.5 is old; they’ve changed the numbering system recently.

Finally copy the file to the right place on all nodes

 sudo cp slurm.conf.simple /etc/slurm-llnl/slurm.conf

and (on the headnode, rasp0)

sudo service slurm start

on the compute node

sudo slurmd -c

Test!

srun -N1 hostname

or

sinfo

11. Submit a GROMACS job to the queue

I’m going to assume you have prepared a TPR file called md.tpr and have copied it into /home/ubuntu (and we are now the default user, ubuntu).

Let’s do some simple benchmarking – remember a Raspberry Pi has 4 cores. So first, let’s create a series of TPR files

cp md.tpr md-1.tpr
cp md.tpr md-2.tpr
cp md.tpr md-4.tpr

Now let’s create some SLURM job submission files. This is the one for running on two cores – you’ll need to change the --cpus-per-task, the --job-name SBATCH flags and the -deffnm and -ntmpi GROMACS flags depending on the number of cores.

 sudo nano md-2.slurm.sh

and copy in

 #!/bin/bash
 #SBATCH --nodes=1
 #SBATCH --ntasks-per-node=1
 #SBATCH --cpus-per-task=2
 #SBATCH --time=00:15:00
 #SBATCH --job-name=md-2

 source /apps/gromacs/5.1.2/bin/GMXRC

 srun gmx mdrun -deffnm md-2 -ntmpi 2 -ntomp 1 -maxh 0.1 -resethway -noconfout

This will run for 6 minutes, resetting the GROMACS timers after 3 minutes. It won’t write out a final GRO file as this can affect the timings. Hopefully you’ll find that, whilst useful and fun machines, Raspberry Pis are really slow at running GROMACS! To submit the jobs

sbatch md-2.slurm.sh
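
Assuming you have created md-1.slurm.sh and md-4.slurm.sh in the same way, the whole set can be submitted in one go:

for n in 1 2 4; do
    sbatch md-$n.slurm.sh
done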

To check the queue we can issue

squeue

and to cancel we can use scancel.

Ta da!

New Publication: Predicting affinities for peptide transporters

PepT1 is a nutrient transporter found in the cells that line your small intestine. It is not only responsible for the uptake of di- and tri-peptides, and therefore much of your dietary protein, but also the uptake of most β-lactam antibiotics. This serendipity ensures that we can take (many of) these important drugs orally.

Our ultimate goal is to develop the capability to predict modifications to drug scaffolds that will improve or enable their uptake by PepT1, thereby improving their oral bioavailability.

In this paper, just published online in the new journal Cell Chemical Biology (and free to download, thanks to the Wellcome Trust), we show that it is possible to predict how well a series of di- and tri-peptides bind to a bacterial homologue of PepT1 using a hierarchical approach that combines an end-point free energy method with thermodynamic integration. Since there is no structure of PepT1, we then tried our method on a homology model we published in 2015. We found that the method lost its predictive power. By studying a range of homology models of intermediate quality, we showed that it is highly likely that an experimental structure of hPepT1 will be required for accurate in silico predictions of transport.

This is the second paper that Firdaus Samsudin has published as part of his DPhil here in Oxford.

GROMACS on AWS: compiling against CUDA

If you want to compile GROMACS to run on a GPU Amazon Web Services EC2 instance, please first read these instructions on how to compile GROMACS on an AMI without CUDA. These instructions then explain how to install the CUDA toolkit and compile GROMACS against it.

The first few steps are loosely based on these instructions, except that rather than downloading the NVIDIA driver, we shall download the CUDA toolkit, since this includes an NVIDIA driver. First we need to make sure the kernel development files for the running kernel are installed

sudo yum install kernel-devel-`uname -r`
sudo reboot

Safest to do a quick reboot here. Assuming you are in your HOME directory, move into your packages folder.

cd packages/

And download the CUDA toolkit (version 7.5 at present)

wget http://developer.download.nvidia.com/compute/cuda/7.5/Prod/local_installers/cuda_7.5.18_linux.run
sudo /bin/bash cuda_7.5.18_linux.run

It will ask you to accept the license and then asks you a series of questions. I answer Yes to everything except installing the CUDA samples. Now add the following to the end of your ~/.bash_profile using a text editor

export PATH="/usr/local/cuda-7.5/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda-7.5/lib64:$LD_LIBRARY_PATH"
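
Before building it is worth checking the toolkit and driver are actually visible (after re-sourcing your profile). Note that nvidia-smi will only find a GPU if you are on a GPU instance such as a g2, so don't worry if it complains on a CPU-only build machine.

source ~/.bash_profile
nvcc --version      # the CUDA compiler from the toolkit should now be on the PATH
nvidia-smi          # lists the GPU(s) the driver can see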

Now we can build GROMACS against the CUDA toolkit. I’m assuming you’ve already downloaded a version of GROMACS and probably installed a non-CUDA version of GROMACS (so you’ll already have one build directory). Let’s make another build directory. You can call it what you want, but some kind of consistent naming can be helpful. The -j 4 flag assumes you have four cores to compile on – this will depend on the EC2 instance you have deployed. Obviously the more cores, the faster, but GROMACS only takes minutes, not hours.

mkdir build-gcc48-cuda75
cd build-gcc48-cuda75
cmake .. -DGMX_BUILD_OWN_FFTW=ON -DCMAKE_INSTALL_PREFIX=/usr/local/gromacs/5.0.7-cuda/  -DGMX_GPU=ON -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda
make -j 4
sudo make install

To load all the GROMACS tools into your $PATH, run this command and you are done!

source /usr/local/gromacs/5.0.7-cuda/bin/GMXRC

If you run this mdrun binary on a GPU instance it should automatically detect the GPU and run on it, assuming your MDP file options support this. If it does, you will see this zip by in the log file as GROMACS starts up

1 GPU detected:
  #0: NVIDIA GRID K520, compute cap.: 3.0, ECC:  no, stat: compatible

1 GPU auto-selected for this run.
Mapping of GPU to the 1 PP rank in this node: #0

Will do PME sum in reciprocal space for electrostatic interactions.

Depending on the size of your system and the forcefield you are using, you should get a speedup of at least a factor of two, and realistically three, using a GPU in combination with the CPUs. For example, see these benchmarks.

GROMACS on AWS: compiling GCC

These are some quick instructions on how to build a more recent version of GCC than is provided by the devel-tools package on the CentOS-based Amazon Linux AMI (currently GCC 4.8.3). You may, for example, wish to use a more recent version to compile GROMACS – that is my interest. If so, then these instructions assume you have done all the steps up to, but not including, compiling GROMACS in this post. Compiling GCC needs several GB of disk space, so if you use the default 8 GB for an EC2 AMI it will run out; increasing this to 12 GB is sufficient.

First let’s find out what versions of GCC are available.

[ec2-user@ip-172-30-0-42 ~]$ svn ls svn://gcc.gnu.org/svn/gcc/tags | grep gcc | grep release
...
gcc_4_8_3_release/
gcc_4_8_4_release/
gcc_4_8_5_release/
gcc_4_9_0_release/
gcc_4_9_1_release/
gcc_4_9_2_release/
gcc_4_9_3_release/
gcc_5_1_0_release/
gcc_5_2_0_release/
gcc_5_3_0_release/

As you can see, when I wrote this 5.3.0 was the most recent stable version, so let’s try that one. I’m going to compile everything inside a folder called packages/, so let’s create that and then use subversion to check out version 5.3.0 (this is going to download a lot of files so will take a minute or two)

[ec2-user@ip-172-30-0-42 ~]$ mkdir ~/packages
[ec2-user@ip-172-30-0-42 ~]$ cd ~/packages
[ec2-user@ip-172-30-0-42 packages]$ svn co svn://gcc.gnu.org/svn/gcc/tags/gcc_5_3_0_release/
A    gcc_5_3_0_release/config-ml.in
A    gcc_5_3_0_release/libitm
...
A    gcc_5_3_0_release/fixincludes/fixopts.c
A    gcc_5_3_0_release/install-sh
A    gcc_5_3_0_release/ylwrap
 U   gcc_5_3_0_release
Checked out revision 232268.
[ec2-user@ip-172-30-0-42 packages]$ cd gcc_5_3_0_release/

GCC needs some prerequisites which are installed by this script.

[ec2-user@ip-172-30-0-42 gcc_5_3_0_release]$ ./contrib/download_prerequisites 
--2016-01-12 13:24:23--  ftp://gcc.gnu.org/pub/gcc/infrastructure/mpfr-2.4.2.tar.bz2
       => ‘mpfr-2.4.2.tar.bz2’
Resolving gcc.gnu.org (gcc.gnu.org)... 209.132.180.131
...
isl-0.14.tar.bz2    100%[=====================>]   1.33M   693KB/s   in 2.0s   

2016-01-12 13:24:39 (693 KB/s) - ‘isl-0.14.tar.bz2’ saved [1399896]

Go up a level, make a build directory and move there.

[ec2-user@ip-172-30-0-42 gcc_5_3_0_release]$ cd ..
[ec2-user@ip-172-30-0-42 packages]$ mkdir gcc_5_3_0_release_build/
[ec2-user@ip-172-30-0-42 packages]$ cd gcc_5_3_0_release_build/

Now we are in a position to compile GCC 5.3.0. This took about 50 min using all eight cores of a c3.2xlarge instance, so this is a good moment to go and have lunch. Note that since the instance I am compiling on has 8 virtual CPUs, I can use the -j 8 flag to tell make to use up to 8 threads during compilation, which will speed things up. If you are using a micro instance, just omit the -j 8 (but good luck as that would take a long time).

[ec2-user@ip-172-30-0-42 gcc_5_3_0_release_build]$ ../gcc_5_3_0_release/configure && make -j 8 && sudo make install && echo "success" && date
checking build system type... x86_64-unknown-linux-gnu
checking host system type... x86_64-unknown-linux-gnu
checking target system type... x86_64-unknown-linux-gnu
checking for a BSD-compatible install... /usr/bin/install -c
checking whether ln works... yes
...

Hopefully you now have a newer version of GCC to compile binaries with. With any luck it might even give you a performance boost.
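
By default GCC installs under /usr/local, so a quick check that the shell is now picking up the new compiler rather than the old yum-installed one is

hash -r           # make bash forget the cached location of the old gcc
which gcc         # should now be /usr/local/bin/gcc
gcc --version     # should report 5.3.0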

GROMACS on AWS: Performance and Cost

So we have created an Amazon Machine Image (AMI) with GROMACS installed. In this post I will examine the sort of single-core performance you can expect and how much this is likely to cost compared to other compute options you might have.

Benchmark

To test the different types of instances you can deploy our GROMACS image on, we need a benchmark system. For this I’ve chosen a peptide MFS transporter in a simple POPC lipid bilayer solvated by water. This is very similar to the simulations found in this paper. Or to put it another way: 78,000 atoms in a cube, most of which belong to water, some to lipids and the rest to protein. It is fully atomistic and is described using the CHARMM27 forcefield.

Computing Resources Tested

I tried to use a range of compute resources to provide a good comparison for AWS. First, and most obviously, I used the workstation on my desk, which is a late-2013 Mac Pro with 12 Intel Xeon cores. In our department we also have a small compute cluster, each node of which has 16 cores. Some of these nodes also have a K20 GPU. Then I also have access to a much larger computing cluster run by the University. Unfortunately, since the division I am in has decided not to contribute to its running, I have to pay for any significant usage.

Rather than test all the different types of instances available on EC2, I tested an example from each of the current (m4) and older generation (m3) of non-burstable general purpose instances. I also tested an example from the latest generation of compute instances (c4) and finally the smaller instance from the GPU instances (g2).

Performance

[Figure: single-core GROMACS performance (ns/day) for each compute resource]

The performance, in nanoseconds per day for a single compute core, is shown in the figure above (bigger is better).

One worry about AWS EC2 is that, for a highly-optimised compute code like GROMACS, performance might suffer due to the layers of virtualisation but, as you can see, even the current generation of general purpose instances is as fast as my Mac Pro workstation. The fastest machine, perhaps unsurprisingly, is the new University compute cluster. On AWS, the compute c4 class is faster than the current general purpose m4 class, which in turn is faster than the older generation general purpose m3 class. Finally, as you might expect, using a GPU boosts performance by slightly more than 3x.

 

Cost

[Figure: cost per core-hour for each compute resource]

I’m going to do a “real” comparison. So if I buy a compute cluster and keep it in the department, I only have to pay the purchase cost but none of the running costs. So I’m assuming the workstation is £2,500 and a single 16-core node is £4,000, and both of these have a five-year lifetime. Alternatively I can use the university’s high performance computing clusters at 2p per core hour. This obviously is unfair on the university facility as this does include operational costs, like electricity, staff etc., and you can see that reflected in the difference in costs.

So is AWS EC2 more or less expensive? This hinges on whether you use it in the standard “on demand” manner or instead get access through bidding on the spot market. The latter is significantly cheaper but you only have access whilst your bid price is above the current “spot price”, so you can’t guarantee access and your simulations have to be able to cope with restarts. Since the spot price varies with time, I took the average of two prices at different times on Wed 13 Jan 2016.

As you can see, AWS is more expensive per core hour if you use it “on demand”, but is cheaper than the university facility if you are willing to surf the market. Really, though, we should be considering the cost efficiency, i.e. the cost per nanosecond, as this also takes into account the performance.
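
In other words, for each resource the number we care about works out as roughly

cost per ns = (price per core-hour × 24) / (performance in ns/day per core)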

 

Cost efficiency

[Figure: cost per nanosecond (cost efficiency) for each compute resource]

When we do this an interesting picture emerges: using AWS EC2 via bidding on the market is cheaper than using the university facility and can be as cheap as buying your own hardware, even if you don’t have to pay the running costs. Furthermore, as you’d expect, using a GPU decreases the cost per nanosecond and so should be a no-brainer for GROMACS.

Of course, this assumes lots of people don’t start using the EC2 market, thereby driving the spot price up…

GROMACS on AWS

In this post I’m going to show how I created an Amazon Machine Image (AMI) with GROMACS 5.0.7 installed for use in the Amazon Web Services cloud.

I’m going to assume that you have signed up for Amazon Web Services (AWS), created an Identity and Access Management (IAM) user (each AWS account can have multiple IAM users), created an SSH key pair for that user, downloaded it, given it an appropriate name with the correct permissions and placed it in ~/.ssh. Amazon have a good tutorial that covers the above actions. One thing that confused me is that if you already have an amazon.com or amazon.co.uk account then you can use this to sign up for AWS. In other words, depending on your mood, you can order a book or 10,000 CPU hours of compute. I felt a bit nervous about setting up an account backed by my credit card – if you also feel nervous, then Amazon offer a Free Tier which at present permits you to use up to 750 hours a month, as long as you only use the smallest virtual machine instance (t2.micro). If you use more than this, or use a more powerful instance, then you will be billed.

First, log in to your AWS console. This will have a strange URL like

https://123456789012.signin.aws.amazon.com/console

where 123456789012 is your AWS account number. You should get something that looks like this.

AWS Management Console

Next we need to create an EC2 (Elastic Compute Cloud) instance based on one of the standard virtual machine images and download and compile GROMACS on it. In the AWS Management Console, choosing “EC2” in the top left should bring you here

AWS EC2 dashboard

Now click the blue “Launch Instance” button.

Step 1. Choose an Amazon Machine Image (AMI).

Here we can choose one of the standard virtual machine images to compile GROMACS on. Let’s keep it simple and use the standard Amazon Linux AMI.


Step 2. Choose an Instance Type.

The important thing to remember here is that the image we create can be run on any instance type. So if we want to compile on multiple cores to speed things up, we can choose an instance with, say, 8 vCPUs, or if we don’t want to be billed and are willing to wait, we can choose the t2.micro instance. Let’s choose a c4.2xlarge instance, which has 8 vCPUs. You can at this stage hit “Review and Launch” but it is worth checking the amount of storage allocated to the instance, so hit Next: Configure Instance Details. I’m not going to fiddle with these options. Hit Next: Add Storage.


Step 4. Add storage.

What I have found here is that if you use the version of gcc installed via yum (4.8.3) then 8 GB is fine, but if you want to compile a more recent version you will need at least 12 GB.
I’m going to accept the defaults for the remaining steps, so will click “Review and Launch” now.


Step 7. Review Instance Launch.

Check it all looks ok and hit “Launch”. This will bring up a window. Here it is crucial that you choose the name of the key pair you created and downloaded. As you need a different key pair for each IAM user in each Amazon Region, it is worth naming them carefully as you will otherwise rapidly get very confused. Also, Amazon don’t let you download a key pair again, so you have to be careful with them. You can see mine is called

PhilFowler-key-pair-euwest.pem

This contains the name of my IAM user and the name of the AWS region it will work for, here EU West, which is Ireland. Hit Launch.


Launch Status

This window gives you some links on how to connect to the AWS instance. Hit “View Instances”. It may take a minute or two for your instance to be created. During this time the status is given as “Initializing”. When it is finished, you can click on your new instance (you should have only one) and it will give you a whole host of information. We need the public IP address and the name of our SSH key pair so we can ssh to the instance (note that the default user is called ec2-user).


lambda 508 $ ssh -i "PhilFowler-key-pair-euwest.pem" ec2-user@54.229.73.128
The authenticity of host '54.229.73.128 (54.229.73.128)' can't be established.
ECDSA key fingerprint is SHA256:N+B3toLxLE3vRuuzLZWF44N9qb3ucUVVU/RD00W3iNo.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '54.229.73.128' (ECDSA) to the list of known hosts.

__| __|_ )
_| ( / Amazon Linux AMI
___|\___|___|

https://aws.amazon.com/amazon-linux-ami/2015.09-release-notes/
11 package(s) needed for security, out of 27 available
Run "sudo yum update" to apply all updates.
[ec2-user@ip-172-30-0-42 ~]$

Installing pre-requisites

Amazon Linux is based on CentOS so uses the yum package manager. You might be more familiar with apt-get if you use Ubuntu but the principles are similar. It is worth following their recommendation and applying all the updates – this will spew out a lot of information to the terminal and ask you to confirm.

[ec2-user@ip-172-30-0-42 ~]$ sudo yum update
Loaded plugins: priorities, update-motd, upgrade-helper
Resolving Dependencies
--> Running transaction check
---> Package aws-cli.noarch 0:1.9.1-1.29.amzn1 will be updated
---> Package aws-cli.noarch 0:1.9.11-1.30.amzn1 will be an update
---> Package binutils.x86_64 0:2.23.52.0.1-30.64.amzn1 will be updated
---> Package binutils.x86_64 0:2.23.52.0.1-55.65.amzn1 will be an update
---> Package ec2-net-utils.noarch 0:0.4-1.23.amzn1 will be updated
...
sudo.x86_64 0:1.8.6p3-20.21.amzn1
vim-common.x86_64 2:7.4.944-1.35.amzn1
vim-enhanced.x86_64 2:7.4.944-1.35.amzn1
vim-filesystem.x86_64 2:7.4.944-1.35.amzn1
vim-minimal.x86_64 2:7.4.944-1.35.amzn1

Complete!

This instance is fairly basic and there is no version of gcc, cmake etc. But we can install them via yum

[ec2-user@ip-172-30-0-42 ~]$ sudo yum install gcc gcc-c++ openmpi-devel mpich-devel cmake svn texinfo-tex flex zip libgcc.i686 glibc-devel.i686
...
texlive-xdvi.noarch 2:svn26689.22.85-27.21.amzn1
texlive-xdvi-bin.x86_64 2:svn26509.0-27.20130427_r30134.21.amzn1
zziplib.x86_64 0:0.13.62-1.3.amzn1

Complete!

Next we need to add the openmpi executables to the $PATH. These settings will only persist for this session; to make them permanent, add them to your .bashrc.

export PATH=/usr/lib64/openmpi/bin:$PATH
export LD_LIBRARY_PATH=/usr/lib64/openmpi/lib
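
For example, to make them permanent you could append the same two lines to your .bashrc (a one-off step):

echo 'export PATH=/usr/lib64/openmpi/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/lib64/openmpi/lib' >> ~/.bashrc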

Now we hit a potential problem. The version of gcc installed by yum is fairly old

[ec2-user@ip-172-30-0-42 ~]$ gcc --version
gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9)
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Having said that, 4.8.3 should be good enough for GROMACS. I’ll push ahead using this version, but in a subsequent post I also detail how to download and install gcc 5.3.0.

Compiling GROMACS

First, let’s get the GROMACS source code using wget. I’m going to compile version 5.0.7 since I’ve got benchmarks for this one, but you could equally install 5.1.X.

[ec2-user@ip-172-30-0-42 ~]$ mkdir ~/packages
[ec2-user@ip-172-30-0-42 ~]$ cd ~/packages
[ec2-user@ip-172-30-0-42 packages]$ wget ftp://ftp.gromacs.org/pub/gromacs/gromacs-5.0.7.tar.gz
[ec2-user@ip-172-30-0-42 packages]$ tar zxvf gromacs-5.0.7.tar.gz
[ec2-user@ip-172-30-0-42 packages]$ cd gromacs-5.0.7

Now let’s make a build directory, move there and then issue the cmake directive

[ec2-user@ip-172-30-0-42 gromacs-5.0.7]$ mkdir build-gcc48
[ec2-user@ip-172-30-0-42 gromacs-5.0.7]$ cd build-gcc48
[ec2-user@ip-172-30-0-42 build-gcc48]$ cmake .. -DGMX_BUILD_OWN_FFTW=ON -DCMAKE_INSTALL_PREFIX='/usr/local/gromacs/5.0.7/'

The compilation step will take a good few minutes on a single core machine, but as I’ve got 8 virtual CPUs to play with I can give make the “-j 8” flag which is going to speed things up.

[ec2-user@ip-172-30-0-42 build-gcc48]$ make -j 8
...
Building CXX object src/programs/CMakeFiles/gmx.dir/gmx.cpp.o
Building CXX object src/programs/CMakeFiles/gmx.dir/legacymodules.cpp.o
Linking CXX executable ../../bin/gmx
[100%] Built target gmx
Linking CXX executable ../../bin/template
[100%] Built target template

This took 90 seconds using all 8 cores. Now we can install the binary. Note that I told cmake to install it in /usr/local/gromacs/5.0.7 so I can keep track of different versions, rather than just having /usr/local/gromacs

[ec2-user@ip-172-30-0-51 build-gcc48]$ sudo make install
...
-- Installing: Creating symbolic link /usr/local/gromacs/5.0.7/bin/g_velacc
-- Installing: Creating symbolic link /usr/local/gromacs/5.0.7/bin/g_wham
-- Installing: Creating symbolic link /usr/local/gromacs/5.0.7/bin/g_wheel

To add this version of GROMACS to your $PATH (add this to .bashrc to avoid doing this each time)

[ec2-user@ip-172-30-0-51 build-gcc48]$ source /usr/local/gromacs/5.0.7/bin/GMXRC

Now you have all the GROMACS tools available!

Analysing Simulation Data CECAM Workshop, Jülich, 14-15 October 2015


This two day workshop on Analysing Simulation Data was part of the larger CECAM Macromolecular Simulation Software Workshop at the Forschungszentrum Jülich that I co-organised. It was the second workshop and immediately followed an introductory Software Carpentry workshop.

Prior to a few years ago I analysed all my simulation data using either VMD, often by writing Tcl scripts, or one of the GROMACS g_tools, if one was available. Then I started using MDAnalysis, a python module. This enabled me to do two things: first, MDAnalysis has its own analysis routines and therefore you could often do all the analysis you needed in a simple python script. More powerfully, since it can read many different simulation formats, it can also act as a gatekeeper to the huge range of powerful python modules. The net result is I have been able to analyse my data in ways that previously would not have been possible (i.e. I would have had to write C code).

For example, we presented a paper (open access) at a Faraday Discussion meeting where we used the image-processing tools in scikit-image to analyse whether the presence of a small cell-signalling protein retarded the rate at which a three-component lipid bilayer phase separated. I posted some example code on GitHub. In other work, not all published, I have used scipy and numpy to, for example, calculate the power spectra of fluctuations in lipid bilayers (using fast Fourier transforms).

Aim

The workshop brought together researchers, especially PhD students and postdoctoral researchers, and academic software developers.

The hope was that the researchers would come out of it feeling not only more confident about developing their own software and maybe even start contributing to an academic open source project but also that they could use the python ecosystem to analyse their data in new and interesting ways.

For the developers, the hope was they would get to talk to a range of current and prospective users and gain a better understanding of how people are using their code (and maybe pick up some contributors along the way).


Structure

I felt that a traditional didactic approach wouldn’t work, so no sessions of talks + questions. In the end I stole shamelessly from the excellent series of Collaborations Workshops run by the Software Sustainability Institute in the UK. The workshop worked towards, and culminated in, a HackDay. I now believe HackDays are a great way not only of teaching but also of building teams — I am writing a post on this for the SSI and will link to it here when it is up.

I invited developers from two biomolecular python projects: MDAnalysis and pmx. Given more budget I would have loved to invite other developers, e.g. from mdtraj. On the first day each project gave a short talk followed by around two hours of guided tutorial. Then at the end of day one, I invited participants to present analysis problems drawn from their own research that they would like to solve. Teams were allowed to form around six ideas. On day two these teams had around six hours to solve their problem, before presenting their solution to the rest of the workshop. The winning project, MDARTINI, aimed to make MDAnalysis more aware of the coarse-grained forcefield, MARTINI.

Feedback

Overall,

  • 94% of participants enjoyed the workshop,
  • 100% learnt something useful that will help their research,
  • 100% would recommend a workshop like this one to other researchers,
  • 88% feel confident enough to contribute to an academic open source project.

“I now understand enough to try using the following tools”

I then asked “I now understand enough to try using the following tools”. Given most participants had heard of MDAnalysis, but only a few had used it – and very few had heard of pmx – this is an encouraging shift. This was then followed up by: “I intend using the tools and methods to help my research”. Usually the answers are a bit more pessimistic, as people might understand a tool but not have any intention of using it. Here, though, it goes the other way.


“I intend using the tools and methods to help my research.”

To try and understand which parts of the workshop went well, I asked participants to rate the statement “I enjoyed the following components of the workshop”. The talks were ok, then the HackDay, but the tutorials and meeting other researchers were the most highly rated.


“I enjoyed the following components of the workshop.”

Finally, to find out if there were any practical problems I asked “The following elements contributed to making the workshop a success”.


“The following elements contributed to making the workshop a success.”

The big problem here was the network; we had better connectivity in the small hotel in Jülich. It turned out there was a problem with the wireless router in the room and this was fixed a few days after the workshop. Not many people liked the location in Jülich either; however, the various coffee breaks – which we were grateful to the Software Sustainability Institute for sponsoring – and the general social atmosphere were appreciated.

 

 

Lessons for next time

This type of workshop is very complicated and plenty can go wrong. Always have a Plan B. For example, assume that not everyone will be able to install all the necessary software on their laptops, so come prepared with a (Linux) virtual machine image that will work in all the tutorials. And don’t assume that the network will “just work”.

CECAM Macromolecular simulation software workshop

I’m co-organiser of this slightly-different CECAM workshop in October 2015 at the Forschungszentrum Jülich, Germany. Rather than following the traditional format of 3-4 day populated by talks with the odd poster session, this is an extended workshop made up of six mini-workshops. Since it is focussed on python-based tools for biomolecular simulations, of which there are an increasing number, the first mini-workshop will be a Software Carpentry bootcamp that I will be lead instructor on (helped by David Dotson from ASU). I’m also leading the next mini-workshop on analysing biomolecular simulation data.