
Accelerating Oxford Nanopore basecalling

It looks innocuous sitting on the desk, an Oxford Nanopore MinION, but it can produce a huge amount of data from a single sequencing run. Since the nanopore works by inferring which base is in the pore from how much it reduces the flow of ions (and hence current) through the pore, the raw data are commonly called “squiggles”. Converting each of these squiggles into a sequence of nucleotides is “base-calling”. Ideally, you want to do this in real time, i.e. as the squiggles are being produced by the MinION. Interestingly, this is becoming a challenge for the molecular biologists and bioinformaticians in our group, since the flow of data is now high enough that a high-spec laptop struggles to cope. It may be becoming obvious that this is not my field of expertise – and you’d be right – but I do know something about speeding up computational jobs through the use of GPUs and/or computing clusters. There appear to be two main use-cases we have for base-calling the squiggles. I’m only going to talk about nanonetcall.

1. Live basecalling.

Nick Sanderson is leading the charge here in our group (that is his hand in the photo above). He has built a workstation with a GPU and an SSD and was playing around with the ability of nanonetcall to use multiple threads. The folder sample_data/ contains a large number of squiggle files (fast5 files). Our base case is a single process with one OpenMP thread, which by definition has a speedup of 1.0x.

OMP_NUM_THREADS=1 nanonetcall sample_data/ --jobs 1 > /dev/null 
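The speedups quoted below are simply ratios of wall-clock times against this base case. A minimal way of recording them is to wrap each command with the shell’s built-in time (a sketch, assuming bash):

time ( OMP_NUM_THREADS=1 nanonetcall sample_data/ --jobs 1 > /dev/null )
# speedup = (real time of the base case) / (real time of the run being compared)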

Let’s first try using 2 OpenMP threads and a single process.

OMP_NUM_THREADS=2 nanonetcall sample_data/ --jobs 1 > /dev/null 

This makes no difference whatsoever. In common with running GROMACS simulations, OpenMP doesn’t help much, in my hands at least. So let’s rule out additional OpenMP threads and instead simply increase the number of jobs run in parallel:

OMP_NUM_THREADS=1 nanonetcall sample_data/ --jobs 2 > /dev/null    
OMP_NUM_THREADS=1 nanonetcall sample_data/ --jobs 4 > /dev/null 
OMP_NUM_THREADS=1 nanonetcall sample_data/ --jobs 8 > /dev/null 
OMP_NUM_THREADS=1 nanonetcall sample_data/ --jobs 12 > /dev/null 

These lead to speedups of 1.8x, 3.0x, 3.2x and 3.2x respectively. These were all run on my early-2013 MacPro, which has a single 6-core 3.5 GHz Intel Xeon CPU and so can run up to 12 processes with hyperthreading. I don’t know exactly how nanonetcall is parallelised, but since a good chunk of it is Python it is no surprise that it doesn’t scale that well: Python will always struggle here due to the limitations inherent in being an interpreted language (which in Python’s case means the GIL). Now, parts of nanonetcall are cleverly written using OpenCL so it can make use of a GPU if one is available. My MacPro has two AMD FirePro D300 cards. Good graphics cards, but I would have chosen better GPUs if I could. Even so, using a single GPU gives a speedup of 3.3x.

nanonetcall sample_data/ --platforms Apple:0:1 Apple:1:1 --exc_opencl > /dev/null

I suggested we try one of my favourite, apparently little-known unix tools, GNU Parallel. This is a simple but truly awesome command that you can install via a package manager, such as apt-get on Ubuntu or MacPorts on a Mac. The simplest way to use it is in a pipe like this:

find sample-data/ -name '*.fast5' | parallel -j 12 nanonetcall {} > /dev/null

This needs explaining. The find command produces a long list of all the fast5 files. Parallel then consumes this list, first launching 12 nanonetcall jobs, each running on a single core. As soon as one of these finishes, parallel launches another nanonetcall job to process the next fast5 file in the list. In this way parallel ensures that there are 12 nanonetcall jobs running at all times and we rapidly work our way through the squiggles. This results in a speedup of 4.8x – not linear, but certainly better than trying to use the ‘internal’ parallelisation of nanonetcall.
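If you want to keep an eye on a long run, parallel can also estimate the remaining time and record how long each individual job took (a sketch; --eta and --joblog are standard GNU Parallel options, and basecall.log is just a name I have made up):

find sample-data/ -name '*.fast5' | parallel --eta --joblog basecall.log -j 12 nanonetcall {} > /dev/null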

But we can do better because we can use parallel to overload the GPU too.

find sample-data/ -name '*fast5' | parallel nanonetcall {} --platforms Apple:0:1 --exc_opencl > /dev/null

This yields our fastest speedup yet of 5.7x. But perhaps the GPU is getting too overloaded, so let’s try sharing the load across both GPUs:

find sample-data-1/ -name '*.fast5' | parallel -j 6 nanonetcall {} --platforms Apple:1:1 --exc_opencl > /dev/null &
find sample-data-2/ -name '*.fast5' | parallel -j 6 nanonetcall {} --platforms Apple:0:1 --exc_opencl > /dev/null

where I’ve split the data equally into two folders. Sure enough, this now gives a speedup of 9.9x. Now, remember there are only 12 virtual cores, so if we try running more processes, we should start to see a performance penalty, but let’s try!

find sample-data-1/ -name '*.fast5' | parallel -j 12 nanonetcall {} --platforms Apple:1:1 --exc_opencl > /dev/null &
find sample-data-2/ -name '*.fast5' | parallel -j 12 nanonetcall {} --platforms Apple:0:1 --exc_opencl > /dev/null

Unexpectedly, this ekes out a bit more speed at 10.9x! So by ignoring the inherently poor scalability built into nanonetcall and using GNU Parallel in harness with two GPUs, we have managed to speed up the base-calling by a factor of nearly eleven. I expect Nick to manage even higher speedups using more powerful GPUs.

2. Batch basecalling.

I sit in the same office as two of our bioinformaticians and, even with a good setup for live basecalling, it sounds like there are still occasions when they need to base-call a large dataset (~50GB) of squiggle files. This might be because part of the live basecalling process failed, or because the MinION wrote files to a different folder after some software update, or perhaps you simply want to compare several different pieces of basecalling software, or even just compare across versions. You want to load the data onto some “computational resource”, press go and have it chew its way through the squiggle files as quickly and reliably as possible. There are clearly many ways to do this; here I am going to look at a simple Linux computing cluster with a head node and a large number of slave nodes. Jobs are distributed across the slave nodes using a batch scheduler and queuing system (I use SLURM).

Hang Phan, another bioinformatician, had a large dataset of squiggles that needed 2D base-calling and she wanted to try nanonetcall. To demonstrate the idea, I simply installed nanonetcall on my venerable but perfectly serviceable cluster of Apple Xserves. Then it is just a matter of rsync-ing the data over and writing a simple bit of Python to (a) group together sets of fast5 files (I chose groups of 50), (b) create a SLURM job submission file for each group and (c) submit each job to the queue, as sketched below.
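I won’t reproduce the Python here, but the idea is simple enough that a minimal bash sketch conveys it (the squiggles/ folder, the compute partition and the output filenames are all placeholders you would change):

# split the full list of fast5 files into groups of 50
find squiggles/ -name '*.fast5' | split -l 50 - group.
# write a SLURM submission script for each group and submit it
for filelist in group.*; do
    cat > ${filelist}.sh <<EOF
#!/bin/bash
#SBATCH --job-name=${filelist}
#SBATCH --ntasks=1
#SBATCH --partition=compute
while read f; do nanonetcall "\$f"; done < ${filelist} > ${filelist}.fasta
EOF
    sbatch ${filelist}.sh
done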

The advantages of this well-established “bare metal” approach are that
– it is truly scalable: if you want to speed up the process, you add more nodes
– it is reliable; my Ubuntu/SLURM/NFS cluster wasn’t switched off for over half a year (and this isn’t unusual)
– you can walk away and let the scheduler handle what runs when

As you can see from my other posts, I am a fan of cloud and distributed computing, but in this case a good old-fashioned computing cluster (preferably one with GPUs!) fits the bill nicely.

bashthebug.net alpha launch

I’m planning to launch a citizen science project, bashthebug.net, in 2017, which will have two distinct ways anyone can help combat antibiotic resistance. I’ve revamped and relaunched what will ultimately become the public-facing project website – please have a look.

The first strand is closer to seeing the light of day and will help the international Tuberculosis consortium, CRyPTIC. This global group of researchers, of which I am a part, will be collecting over 100,000 samples from patients with TB. Each sample will be tested to see which antibiotics are effective, as well as having the genome of its M. tuberculosis bacterium sequenced. In practice, because each sample is measured at least three different times, that means looking at 300,000 96-well plates. Step forward Zooniverse! This type of large-scale image classification is exactly the sort of thing Zooniverse citizen science projects excel at. I hope to launch this project in early 2017.

The second citizen science project is more complex and I have recently applied for funding. As described in my Research, I am developing methods that can predict whether novel or rarely-observed mutations cause resistance to an antibiotic (or not). These require a lot of computer resource and the idea is to build a volunteer computing project, like climateprediction.net, using the BOINC framework, so that volunteers can download a program onto their laptop or desktop. When they’re not using their computer, the program will retrieve part of a problem and run the simulations on their machine before returning the results over the internet. This type of project is more complicated and requires more infrastructure to be set up but, with some luck, I’d hope to have a soft launch late in 2017.

GROMACS in DOCKER: First Steps

DOCKER is cool. But what is it? From the DOCKER webpage

Docker containers wrap up a piece of software in a complete filesystem that contains everything it needs to run: code, runtime, system tools, system libraries – anything you can install on a server. This guarantees that it will always run the same, regardless of the environment it is running in.

I like to think of it as somewhere in between virtualenv and a virtual machine. Although the DOCKER website is focussed on commercial software development, and so talks about building and shipping applications, DOCKER could be of huge use to me as a computational scientist. For example, rather than make a series of input files for my simulations available, along with a list of which software versions I used, I could instead simply make a DOCKER image available that contains all the compiled software I used along with all the input files. Then anyone should, in principle, be able to reproduce my research.

 

Make no mistake: reproducibility is, rightly, a coming trend. But surely all scientific results are reproduced? It turns out that if the experiment or simulation was difficult to do, the answer is: not so much. And when concerted efforts have been made to reproduce results reported in high-impact journals, the answer is often, well, disconcerting at the very least. In a now famous study, Begley & Ellis from a pharmaceutical company, Amgen, reported that their in-house scientists were unable to reproduce 47 out of 53 landmark experimental studies in haematology and oncology. They were looking at novel, exciting findings, which are more likely to be challenging to reproduce (although the pressure to over-sell is also stronger). I have no reason to think computational studies are much better. Over the past few years there has been a flurry of papers, comments and best practices. One can even now make a DOCKER image available via GitHub with a DOI so it can be cited independently of an article.

As I’d like to do this in the future, I’ve started to play with DOCKER and GROMACS. Since my workstation is a Mac, the DOCKER host has to run within a lightweight Linux virtual machine. First I installed DOCKER. Then I opened a DOCKER Quickstart Terminal and checked everything was working by downloading the hello-world image and running it:

$ docker run hello-world
Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world

4276590986f6: Pull complete 
a3ed95caeb02: Pull complete 
Digest: sha256:4f32210e234b4ad5cac92efacc0a3d602b02476c754f13d517e1ada048e5a8ba
Status: Downloaded newer image for hello-world:latest

Hello from Docker.
This message shows that your installation appears to be working correctly.

Let’s try something more real, like an Ubuntu 16.04 Server image.

$ docker run -it ubuntu bash

This drops me inside the Ubuntu container. Let’s compile GROMACS!

root@4b511a41dbf0:/# apt-get update -y
root@4b511a41dbf0:/# apt-get upgrade -y
root@4b511a41dbf0:/# apt-get install build-essential cmake wget openssh-server -y
root@4b511a41dbf0:/# wget ftp://ftp.gromacs.org/pub/gromacs/gromacs-5.1.2.tar.gz
root@4b511a41dbf0:/# tar zxvf gromacs-5.1.2.tar.gz 
root@4b511a41dbf0:/# cd gromacs-5.1.2
root@4b511a41dbf0:/# mkdir build
root@4b511a41dbf0:/# cd build
root@4b511a41dbf0:/# cmake .. -DGMX_BUILD_OWN_FFTW=ON
root@4b511a41dbf0:/# make 
root@4b511a41dbf0:/# make install
root@4b511a41dbf0:/# cd

Now let’s copy over a TPR file to see how fast GROMACS is within a DOCKER container

root@4b511a41dbf0:/# scp fowler@somewhere.else:benchmark.tpr .
root@4b511a41dbf0:/# source /usr/local/gromacs/bin/GMXRC
root@4b511a41dbf0:/# gmx mdrun -s benchmark -resethway -noconfout -maxh 0.1

Note that this was run on a single CPU inside DOCKER. I was worried that, since the DOCKER host was running inside a Linux VM, it would be slow compared to running natively in Mac OS X, so I ran three repeats of each: DOCKER was only 1.7% slower…

To save this DOCKER image locally, quit the session (type exit) and then commit the container:

$ docker commit -m "Installed GROMACS 5.1.2 for benchmarking" -a "Philip W Fowler" c5f1cf30c96b philipwfowler/gromacs-5.1.2
$ docker images
REPOSITORY                    TAG                 IMAGE ID            CREATED             SIZE
philipwfowler/gromacs-5.1.2   latest              73e44c120bfa        6 seconds ago       809 MB
ubuntu                        latest              c5f1cf30c96b        2 weeks ago         120.8 MB
hello-world                   latest              94df4f0ce8a4        3 weeks ago         967 B

Done. More soon on multiple cores, can-we-use-the-GPU? and using DOCKER on Amazon Web Services.

GROMACS on AWS: compiling against CUDA

If you want to compile GROMACS to run on a GPU-equipped Amazon Web Services EC2 instance, please first read these instructions on how to compile GROMACS on an AMI without CUDA. The instructions below then explain how to install the CUDA toolkit and compile GROMACS against it.

The first few steps are loosely based on these instructions, except that rather than downloading just the NVIDIA driver, we shall download the CUDA toolkit, since this includes an NVIDIA driver. First we need to make sure the kernel development package matching the running kernel is installed:

sudo yum install kernel-devel-`uname -r`
sudo reboot

Safest to do a quick reboot here. Assuming you are in your HOME directory, move into your packages folder.

cd packages/

And download the CUDA toolkit (version 7.5 at present)

wget http://developer.download.nvidia.com/compute/cuda/7.5/Prod/local_installers/cuda_7.5.18_linux.run
sudo /bin/bash cuda_7.5.18_linux.run

It will ask you to accept the license and then ask a series of questions. I answer Yes to everything except installing the CUDA samples. Now add the following to the end of your ~/.bash_profile using a text editor:

export PATH="/usr/local/cuda-7.5/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda-7.5/lib64:$LD_LIBRARY_PATH"

Now we can build GROMACS against the CUDA toolkit. I’m assuming you’ve already downloaded a version of GROMACS and probably installed a non-CUDA version (so you’ll already have one build directory). Let’s make another build directory inside the GROMACS source folder. You can call it what you want, but some kind of consistent naming is helpful. The -j 4 flag below assumes you have four cores to compile on – this will depend on the EC2 instance you have deployed. Obviously the more cores, the faster, but GROMACS only takes minutes, not hours, to compile.

mkdir build-gcc48-cuda75
cd build-gcc48-cuda75
cmake .. -DGMX_BUILD_OWN_FFTW=ON -DCMAKE_INSTALL_PREFIX=/usr/local/gromacs/5.0.7-cuda/  -DGMX_GPU=ON -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda
make -j 4
sudo make install

To load all the GROMACS tools into your $PATH, run this command and you are done!

source /usr/local/gromacs/5.0.7-cuda/bin/GMXRC

If you run this mdrun binary on a GPU instance, it should automatically detect the GPU and run on it, assuming your MDP file options support this. If it does, you will see something like this zip by in the log file as GROMACS starts up:

1 GPU detected:
  #0: NVIDIA GRID K520, compute cap.: 3.0, ECC:  no, stat: compatible

1 GPU auto-selected for this run.
Mapping of GPU to the 1 PP rank in this node: #0

Will do PME sum in reciprocal space for electrostatic interactions.
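If instead mdrun reports that the GPU cannot be used, the first thing to check is the cutoff scheme, since GPU acceleration in GROMACS 5.x requires the Verlet scheme. A minimal MDP fragment (the rest of your MDP options can stay as they are):

; GPU acceleration needs the Verlet cut-off scheme
cutoff-scheme = Verlet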

Depending on the size of your system and the forcefield you are using, you should get a speedup of at least a factor of two, and realistically three, using a GPU in combination with the CPUs. For example, see these benchmarks.

GROMACS on AWS: compiling GCC

These are some quick instructions on how to build a more recent version of GCC than is provided by the devel-tools package on the CentOS-based Amazon Linux AMI (currently GCC 4.8.3). You may, for example, wish to use a more recent version to compile GROMACS – that is my interest. If so, these instructions assume you have done all the steps up to, but not including, compiling GROMACS in this post. Compiling GCC needs several GB of disk space, so if you use the default 8 GB for an EC2 AMI it will run out of room; increasing this to 12 GB is sufficient.

First let’s find out what versions of GCC are available.

[ec2-user@ip-172-30-0-42 ~]$ svn ls svn://gcc.gnu.org/svn/gcc/tags | grep gcc | grep release
...
gcc_4_8_3_release/
gcc_4_8_4_release/
gcc_4_8_5_release/
gcc_4_9_0_release/
gcc_4_9_1_release/
gcc_4_9_2_release/
gcc_4_9_3_release/
gcc_5_1_0_release/
gcc_5_2_0_release/
gcc_5_3_0_release/

As you can see, when I wrote this 5.3.0 was the most recent stable version, so let’s try that one. I’m going to compile everything inside a folder called packages/, so let’s create that and then use subversion to check out version 5.3.0 (this is going to download a lot of files, so will take a minute or two).

[ec2-user@ip-172-30-0-42 ~]$ mkdir ~/packages
[ec2-user@ip-172-30-0-42 ~]$ cd ~/packages
[ec2-user@ip-172-30-0-42 packages]$ svn co svn://gcc.gnu.org/svn/gcc/tags/gcc_5_3_0_release/
A    gcc_5_3_0_release/config-ml.in
A    gcc_5_3_0_release/libitm
...
A    gcc_5_3_0_release/fixincludes/fixopts.c
A    gcc_5_3_0_release/install-sh
A    gcc_5_3_0_release/ylwrap
 U   gcc_5_3_0_release
Checked out revision 232268.
[ec2-user@ip-172-30-0-42 packages]$ cd gcc_5_3_0_release/

GCC needs some prerequisites which are installed by this script.

[ec2-user@ip-172-30-0-42 gcc_5_3_0_release]$ ./contrib/download_prerequisites 
--2016-01-12 13:24:23--  ftp://gcc.gnu.org/pub/gcc/infrastructure/mpfr-2.4.2.tar.bz2
       => ‘mpfr-2.4.2.tar.bz2’
Resolving gcc.gnu.org (gcc.gnu.org)... 209.132.180.131
...
isl-0.14.tar.bz2    100%[=====================>]   1.33M   693KB/s   in 2.0s   

2016-01-12 13:24:39 (693 KB/s) - ‘isl-0.14.tar.bz2’ saved [1399896]

Go up a level, make a build directory and move there.

[ec2-user@ip-172-30-0-42 gcc_5_3_0_release]$ cd ..
[ec2-user@ip-172-30-0-42 packages]$ mkdir gcc_5_3_0_release_build/
[ec2-user@ip-172-30-0-42 packages]$ cd gcc_5_3_0_release_build/

Now we are in a position to compile GCC 5.3.0. This took about 50 minutes using all eight cores of a c3.2xlarge instance, so this is a good moment to go and have lunch. Note that since the instance I am compiling on has 8 virtual CPUs, I can use the -j 8 flag to tell make to use up to 8 threads during compilation, which speeds things up. If you are using a micro instance, just omit the -j 8 (but good luck, as that will take a long time).

[ec2-user@ip-172-30-0-42 gcc_5_3_0_release_build]$ ../gcc_5_3_0_release/configure && make -j 8 && sudo make install && echo "success" && date
checking build system type... x86_64-unknown-linux-gnu
checking host system type... x86_64-unknown-linux-gnu
checking target system type... x86_64-unknown-linux-gnu
checking for a BSD-compatible install... /usr/bin/install -c
checking whether ln works... yes
...

Hopefully you now have a newer version of GCC to compile binaries with. With any luck it might even give you a performance boost.
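One final gotcha: cmake will not necessarily pick up the new compiler automatically when you build GROMACS. A minimal sketch of how to point the build at it, assuming the default GCC install prefix of /usr/local:

export CC=/usr/local/bin/gcc
export CXX=/usr/local/bin/g++
gcc --version   # should now report 5.3.0, provided /usr/local/bin precedes /usr/bin in your PATH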

GROMACS on AWS: Performance and Cost

So we have created an Amazon Machine Image (AMI) with GROMACS installed. In this post I will examine the sort of single-core performance you can expect and how much this is likely to cost compared to other compute options you might have.

Benchmark

To test the different types of instance you can deploy our GROMACS image on, we need a benchmark system. For this I’ve chosen a peptide MFS transporter in a simple POPC lipid bilayer solvated by water. This is very similar to the simulations found in this paper. Or, to put it another way: 78,000 atoms in a cube, most of which belong to water, some to lipids and the rest to protein. It is fully atomistic and is described using the CHARMM27 forcefield.
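A benchmark of this sort only needs a short run. As a sketch of the kind of invocation I mean, reusing the mdrun flags from the DOCKER post above (with benchmark.tpr being the 78,000-atom system just described):

gmx mdrun -s benchmark.tpr -resethway -noconfout -maxh 0.1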

Computing Resources Tested

I tried to use a range of compute resources to provide a good comparison for AWS. First, and most obviously, I used the workstation on my desk, a late-2013 MacPro with 12 Intel Xeon cores. In our department we also have a small compute cluster, each node of which has 16 cores; some of these nodes also have a K20 GPU. I also have access to a much larger computing cluster run by the University. Unfortunately, since the division I am in has decided not to contribute to its running, I have to pay for any significant usage.

Rather than test all the different types of instance available on EC2, I tested an example from each of the current (m4) and older (m3) generations of non-burstable general-purpose instances. I also tested an example from the latest generation of compute-optimised instances (c4) and, finally, the smaller of the GPU instances (g2).

Performance

[Figure: GROMACS performance (ns/day per core) for each compute resource tested]

The performance, in nanoseconds per day for a single compute core, is shown in the figure above (bigger is better).

One worry about AWS EC2 is that, for a highly-optimised compute code like GROMACS, performance might suffer due to the layers of virtualisation but, as you can see, even the current generation of general-purpose instances is as fast as my MacPro workstation. The fastest machine, perhaps unsurprisingly, is the new University compute cluster. On AWS, the compute-optimised c4 class is faster than the current general-purpose m4 class, which in turn is faster than the older-generation general-purpose m3 class. Finally, as you might expect, using a GPU boosts performance by slightly more than 3x.

 

Cost

[Figure: cost per core hour for each compute resource]

I’m going to do a “real” comparison: if I buy a compute cluster and keep it in the department, I only have to pay the purchase cost and none of the running costs. So I’m assuming the workstation costs £2,500 and a single 16-core node £4,000, and that both have a five-year lifetime. Alternatively, I can use the university’s high-performance computing clusters at 2p per core hour. This is obviously unfair on the university facility, as its price does include operational costs such as electricity and staff, and you can see that reflected in the difference in costs.
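As a rough check on those numbers, the implied cost per core hour of the owned hardware is easy to work out (a sketch; it assumes the hardware is fully used for its entire five-year life and takes the MacPro as 12 cores, as above):

# workstation: £2,500 / (12 cores x 5 years x 365 days x 24 hours)
echo "scale=4; 2500 / (12*5*365*24)" | bc    # roughly £0.005, i.e. about 0.5p per core hour
# 16-core node: £4,000 / (16 cores x 5 years x 365 days x 24 hours)
echo "scale=4; 4000 / (16*5*365*24)" | bc    # roughly £0.006, i.e. about 0.6p per core hour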

So is AWS EC2 more or less expensive? This hinges on whether you use it in the standard “on demand” manner or instead get access by bidding on the spot market. The latter is significantly cheaper, but you only have access whilst your bid price is above the current “spot price”, so you can’t guarantee access and your simulations have to be able to cope with restarts. Since the spot price varies with time, I took the average of two prices at different times on Wed 13 Jan 2016.

As you can see, AWS is more expensive per core hour if you use it “on demand”, but is cheaper than the university facility if you are willing to surf the market. Really, though, we should be considering the cost efficiency, i.e. the cost per nanosecond, as this also takes the performance into account.

 

Cost efficiency

[Figure: cost per nanosecond for each compute resource]

 

When we do this, an interesting picture emerges: using AWS EC2 via bidding on the spot market is cheaper than using the university facility and can be as cheap as buying your own hardware, even when you don’t have to pay the running costs. Furthermore, as you’d expect, using a GPU decreases the cost per nanosecond and so should be a no-brainer for GROMACS.

Of course, this assumes lots of people don’t start using the EC2 market, thereby driving the spot price up…

GROMACS on AWS

In this post I’m going to show how I created an Amazon Machine Image (AMI) with GROMACS 5.0.7 installed for use on the Amazon Web Services cloud.

I’m going to assume that you have signed up for Amazon Web Services (AWS), created an Identity and Access Management (IAM) user (each AWS account can have multiple IAM users), created an SSH key pair for that user, downloaded it, given it an appropriate name and the correct permissions, and placed it in ~/.ssh. Amazon have a good tutorial that covers the above actions. One thing that confused me is that if you already have an amazon.com or amazon.co.uk account then you can use this to sign up to AWS. In other words, depending on your mood, you can order a book or 10,000 CPU hours of compute. I felt a bit nervous about setting up an account backed by my credit card – if you also feel nervous, Amazon offer a Free Tier which at present permits you to use up to 750 hours a month, as long as you only use the smallest virtual machine instance (t2.micro). If you use more than this, or use a more powerful instance, then you will be billed.

First, log in to your AWS console. This will have a strange URL like

https://123456789012.signin.aws.amazon.com/console

where 123456789012 is your AWS account number. You should get something that looks like this.

[Screenshot: AWS Management Console]

Next we need to create an EC2 (Elastic Compute Cloud) instance based on one of the standard virtual machine images, then download and compile GROMACS on it. In the AWS Management Console, choosing “EC2” in the top left should bring you here:

[Screenshot: AWS EC2 dashboard]

Now click the blue “Launch Instance” button.

Step 1. Choose an Amazon Machine Image (AMI).

Here we can choose one of the standard virtual machine images to compile GROMACS on. Let’s keep it simple and use the standard Amazon Linux AMI.

[Screenshot: choosing an AMI]

Step 2. Choose an Instance Type.

The important thing to remember here is that the image we create can be run on any instance type. So if we want to compile on multiple cores to speed things up, we can choose an instance with, say, 8 vCPUs, or if we don’t want to be billed and are willing to wait, we can choose the t2.micro instance. Let’s choose a c4.2xlarge instance, which has 8 vCPUs. You could at this stage hit “Review and Launch”, but it is worth checking the amount of storage allocated to the instance, so hit “Next: Configure Instance Details”. I’m not going to fiddle with these options. Hit “Next: Add Storage”.

[Screenshot: choosing an instance type]

Step 4. Add storage.

What I have found here is that if you use the version of gcc installed via yum (4.8.3) then 8 GB is fine, but if you want to compile a more recent version you will need at least 12 GB. I’m going to accept the defaults for the remaining steps, so will click “Review and Launch” now.

[Screenshot: adding storage]

Step 7. Review Instance Launch.

Check it all looks OK and hit “Launch”. This will bring up a window where it is crucial that you choose the name of the key pair you created and downloaded earlier. As you need a different key pair for each IAM user in each Amazon region, it is worth naming them carefully, as you will otherwise rapidly get very confused. Also, Amazon don’t let you download a key pair again, so you have to be careful with them. You can see mine is called

PhilFowler-key-pair-euwest.pem

which contains the name of my IAM user and the name of the AWS region it will work for, here EU West (Ireland). Hit Launch.

[Screenshot: selecting a key pair]

Launch Status

This window gives you some links on how to connect to the AWS instance. Hit “View Instances”. It may take a minute or two for your instance to be created; during this time the status is given as “Initializing”. When it is finished, you can click on your new instance (you should have only one) and it will give you a whole host of information. We need the public IP address and the name of our SSH key pair so we can ssh to the instance (note that the default user is called ec2-user).

[Screenshot: the new instance in the EC2 console]

lambda 508 $ ssh -i "PhilFowler-key-pair-euwest.pem" ec2-user@54.229.73.128
The authenticity of host '54.229.73.128 (54.229.73.128)' can't be established.
ECDSA key fingerprint is SHA256:N+B3toLxLE3vRuuzLZWF44N9qb3ucUVVU/RD00W3iNo.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '54.229.73.128' (ECDSA) to the list of known hosts.

__| __|_ )
_| ( / Amazon Linux AMI
___|\___|___|

https://aws.amazon.com/amazon-linux-ami/2015.09-release-notes/
11 package(s) needed for security, out of 27 available
Run "sudo yum update" to apply all updates.
[ec2-user@ip-172-30-0-42 ~]$

Installing pre-requisites

Amazon Linux is based on CentOS, so uses the yum package manager. You might be more familiar with apt-get if you use Ubuntu, but the principles are similar. It is worth following their recommendation and applying all the updates – this will spew out a lot of information to the terminal and ask you to confirm.

[ec2-user@ip-172-30-0-42 ~]$ sudo yum update
Loaded plugins: priorities, update-motd, upgrade-helper
Resolving Dependencies
--> Running transaction check
---> Package aws-cli.noarch 0:1.9.1-1.29.amzn1 will be updated
---> Package aws-cli.noarch 0:1.9.11-1.30.amzn1 will be an update
---> Package binutils.x86_64 0:2.23.52.0.1-30.64.amzn1 will be updated
---> Package binutils.x86_64 0:2.23.52.0.1-55.65.amzn1 will be an update
---> Package ec2-net-utils.noarch 0:0.4-1.23.amzn1 will be updated
...
sudo.x86_64 0:1.8.6p3-20.21.amzn1
vim-common.x86_64 2:7.4.944-1.35.amzn1
vim-enhanced.x86_64 2:7.4.944-1.35.amzn1
vim-filesystem.x86_64 2:7.4.944-1.35.amzn1
vim-minimal.x86_64 2:7.4.944-1.35.amzn1

Complete!

This instance is fairly basic and there is no version of gcc, cmake etc. installed, but we can install them via yum:

[ec2-user@ip-172-30-0-42 ~]$ sudo yum install gcc gcc-c++ openmpi-devel mpich-devel cmake svn texinfo-tex flex zip libgcc.i686 glibc-devel.i686
...
texlive-xdvi.noarch 2:svn26689.22.85-27.21.amzn1
texlive-xdvi-bin.x86_64 2:svn26509.0-27.20130427_r30134.21.amzn1
zziplib.x86_64 0:0.13.62-1.3.amzn1

Complete!

Next we need to add the OpenMPI executables to the $PATH. These settings will only persist for this session; to make them permanent, add them to your .bashrc.

export PATH=/usr/lib64/openmpi/bin:$PATH
export LD_LIBRARY_PATH=/usr/lib64/openmpi/lib

Now we hit a potential problem: the version of gcc installed by yum is fairly old.

[ec2-user@ip-172-30-0-42 ~]$ gcc --version
gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9)
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Having said that, 4.8.3 should be good enough for GROMACS. I’ll push ahead using this version, but in a subsequent post I also detail how to download and install gcc 5.3.0.

Compiling GROMACS

First, let’s get the GROMACS source code using wget. I’m going to compile version 5.0.7 since I’ve got benchmarks for this version, but you could equally install 5.1.x.

[ec2-user@ip-172-30-0-42 ~]$ mkdir ~/packages
[ec2-user@ip-172-30-0-42 ~]$ cd ~/packages
[ec2-user@ip-172-30-0-42 packages]$ wget ftp://ftp.gromacs.org/pub/gromacs/gromacs-5.0.7.tar.gz
[ec2-user@ip-172-30-0-42 packages]$ tar zxvf gromacs-5.0.7.tar.gz
[ec2-user@ip-172-30-0-42 packages]$ cd gromacs-5.0.7

Now let’s make a build directory, move there and then issue the cmake command:

[ec2-user@ip-172-30-0-42 gromacs-5.0.7]$ mkdir build-gcc48
[ec2-user@ip-172-30-0-42 gromacs-5.0.7]$ cd build-gcc48
[ec2-user@ip-172-30-0-42 build-gcc48]$ cmake .. -DGMX_BUILD_OWN_FFTW=ON -DCMAKE_INSTALL_PREFIX='/usr/local/gromacs/5.0.7/'

The compilation step would take a good few minutes on a single-core machine but, as I’ve got 8 virtual CPUs to play with, I can give make the -j 8 flag, which is going to speed things up.

[ec2-user@ip-172-30-0-42 build-gcc48]$ make -j 8
...
Building CXX object src/programs/CMakeFiles/gmx.dir/gmx.cpp.o
Building CXX object src/programs/CMakeFiles/gmx.dir/legacymodules.cpp.o
Linking CXX executable ../../bin/gmx
[100%] Built target gmx
Linking CXX executable ../../bin/template
[100%] Built target template

This took 90 seconds using all 8 cores. Now we can install the binaries. Note that I told cmake to install into /usr/local/gromacs/5.0.7 so that I can keep track of different versions, rather than just having /usr/local/gromacs.

[ec2-user@ip-172-30-0-51 build-gcc48]$ sudo make install
...
-- Installing: Creating symbolic link /usr/local/gromacs/5.0.7/bin/g_velacc
-- Installing: Creating symbolic link /usr/local/gromacs/5.0.7/bin/g_wham
-- Installing: Creating symbolic link /usr/local/gromacs/5.0.7/bin/g_wheel

To add this version of GROMACS to your $PATH, run the following (add it to your .bashrc to avoid having to do this each time):

[ec2-user@ip-172-30-0-51 build-gcc48]$ source /usr/local/gromacs/5.0.7/bin/GMXRC

Now you have all the GROMACS tools available!