Docker containers wrap up a piece of software in a complete filesystem that contains everything it needs to run: code, runtime, system tools, system libraries – anything you can install on a server. This guarantees that it will always run the same, regardless of the environment it is running in.
I like to think of it as somewhere in between virtualenv and a virtual machine. Although the DOCKER website is focussed on commercial software development, and so talks about building and shipping applications, DOCKER could be of huge use to me as a computational scientist. For example, rather than make a series of input files for my simulations available, along with a list of which software versions I used, I could instead simply make a DOCKER image available that contains all the compiled software I used along with all the input files. Then anyone should, in principle, be able to reproduce my research.
Make no mistake: reproducibility is, rightly, a coming trend. But surely all scientific results are reproduced? It turns out that, if the experiment or simulation was difficult to do, the answer is: not so much. And when concerted efforts have been made to reproduce results reported in high-impact journals, the answer is often, well, disconcerting at the very least. In a now famous study, Begley & Ellis from a pharmaceutical company, Amgen, reported that their in-house scientists were unable to reproduce 47 out of 53 landmark experimental studies in haematology and oncology. They were looking at novel, exciting findings, which are more likely to be challenging to reproduce (although the pressure to over-sell is also stronger). I have no reason to think computational studies are much better. Over the past few years there has been a flurry of papers, comments and best practices. One can even now make a DOCKER image available via GitHub with a DOI so it can be cited independently of an article.
As I’d like to do this in the future, I’ve started to play with DOCKER and GROMACS. Since my workstation is a Mac, the DOCKER host has to run within a lightweight Linux virtual machine. First I installed DOCKER. Then I opened a DOCKER Quick Terminal and checked everything was working by downloading the hello-world image and running it:
$ docker run hello-world
Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
4276590986f6: Pull complete
a3ed95caeb02: Pull complete
Digest: sha256:4f32210e234b4ad5cac92efacc0a3d602b02476c754f13d517e1ada048e5a8ba
Status: Downloaded newer image for hello-world:latest

Hello from Docker.
This message shows that your installation appears to be working correctly.
Let’s try something more real, like an Ubuntu 16.04 Server image.
$ docker run -it ubuntu bash
This drops me into a shell inside a container running the Ubuntu image. Let’s compile GROMACS!
root@4b511a41dbf0:/# apt-get update -y
root@4b511a41dbf0:/# apt-get upgrade -y
root@4b511a41dbf0:/# apt-get install build-essential cmake wget openssh-server -y
root@4b511a41dbf0:/# wget ftp://ftp.gromacs.org/pub/gromacs/gromacs-5.1.2.tar.gz
root@4b511a41dbf0:/# tar zxvf gromacs-5.1.2.tar.gz
root@4b511a41dbf0:/# cd gromacs-5.1.2
root@4b511a41dbf0:/# mkdir build
root@4b511a41dbf0:/# cd build
root@4b511a41dbf0:/# cmake .. -DGMX_BUILD_OWN_FFTW=ON
root@4b511a41dbf0:/# make
root@4b511a41dbf0:/# make install
root@4b511a41dbf0:/# cd
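For reproducibility it may be worth recording these steps in a Dockerfile, so the image can be rebuilt from scratch rather than only from an interactive session. The following is a minimal sketch, not what was done in this post: the `ubuntu:16.04` tag, the `/opt` working directory and the `gromacs-sketch` image name are assumptions, and `openssh-server` and the `apt-get upgrade` step are left out for brevity.

```shell
#!/bin/sh
# Write a Dockerfile that repeats the interactive build steps above.
# Assumptions (not from the post): the ubuntu:16.04 tag and the /opt directory.
# The heredoc delimiter is quoted so the backslashes are written out literally.
cat > Dockerfile <<'EOF'
FROM ubuntu:16.04
RUN apt-get update -y && \
    apt-get install -y build-essential cmake wget
WORKDIR /opt
RUN wget ftp://ftp.gromacs.org/pub/gromacs/gromacs-5.1.2.tar.gz && \
    tar zxf gromacs-5.1.2.tar.gz && \
    mkdir gromacs-5.1.2/build && \
    cd gromacs-5.1.2/build && \
    cmake .. -DGMX_BUILD_OWN_FFTW=ON && \
    make && \
    make install
EOF

# "gromacs-sketch" is a hypothetical image name used only for illustration.
echo "Wrote Dockerfile; build it with: docker build -t gromacs-sketch ."
```

The script only writes the file; the build itself still needs DOCKER and network access, and will take about as long as the interactive compile.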
Now let’s copy over a TPR file to see how fast GROMACS is within a DOCKER container:
root@4b511a41dbf0:/# scp email@example.com:benchmark.tpr .
root@4b511a41dbf0:/# source /usr/local/gromacs/bin/GMXRC
root@4b511a41dbf0:/# gmx mdrun -s benchmark -resethway -noconfout -maxh 0.1
Note that this is a single-CPU DOCKER image. I was worried that, since the DOCKER host was running inside a Linux VM, it would be slow compared to running natively on Mac OS X, so I ran three repeats of each and DOCKER was only 1.7% slower…
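The comparison itself is simple arithmetic: average the ns/day figure from each set of repeats and express the difference as a percentage of the native mean. Here is a rough sketch of that calculation. The function names are hypothetical, `ns_per_day` assumes the log contains the usual `Performance:` line with ns/day in the second column, and the numbers in the example are placeholders, not my measurements.

```shell
#!/bin/sh
# ns_per_day: pull the ns/day figure out of a GROMACS log file.
# Assumes a line of the form "Performance:  <ns/day>  <hour/ns>".
ns_per_day() {
    awk '/^Performance:/ { print $2 }' "$1"
}

# mean: arithmetic mean of the numeric arguments (one per repeat).
mean() {
    printf '%s\n' "$@" | awk '{ s += $1 } END { printf "%.3f\n", s / NR }'
}

# slowdown: percentage by which the second rate is lower than the first.
slowdown() {
    awk -v ref="$1" -v cand="$2" 'BEGIN { printf "%.1f\n", (ref - cand) / ref * 100 }'
}

# Placeholder rates in ns/day, chosen only to make the arithmetic obvious.
native=$(mean 100 100 100)
container=$(mean 98 98 98)
echo "container is $(slowdown "$native" "$container")% slower"
```

With real runs, the arguments to `mean` would come from calling `ns_per_day` on each repeat's log file.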
To save this DOCKER image locally, quit the session and commit the container:
$ docker commit -m "Installed GROMACS 5.1.2 for benchmarking" -a "Philip W Fowler" 4b511a41dbf0 philipwfowler/gromacs-5.1.2
$ docker images
REPOSITORY                    TAG       IMAGE ID       CREATED          SIZE
philipwfowler/gromacs-5.1.2   latest    73e44c120bfa   6 seconds ago    809 MB
ubuntu                        latest    c5f1cf30c96b   2 weeks ago      120.8 MB
hello-world                   latest    94df4f0ce8a4   3 weeks ago      967 B
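Once the image has been committed, the benchmark could in principle be rerun without an interactive session by mounting a directory containing the TPR file into a fresh container. This is a sketch under assumptions, not something tested in this post: the `/data` mount point is hypothetical, and it relies on `benchmark.tpr` sitting in the current directory on the host.

```shell
#!/bin/sh
IMAGE="philipwfowler/gromacs-5.1.2"   # repository name given to docker commit above
TPR="benchmark.tpr"                   # input file used in the benchmark above

# Run mdrun in a throwaway container (--rm), with the current directory
# mounted at /data (a hypothetical mount point) and used as the working dir.
run_benchmark() {
    docker run --rm -v "$PWD":/data -w /data "$IMAGE" \
        bash -c "source /usr/local/gromacs/bin/GMXRC && gmx mdrun -s ${TPR%.tpr} -resethway -noconfout -maxh 0.1"
}

# Only attempt the run when docker and the input file are both available.
if command -v docker >/dev/null 2>&1 && [ -f "$TPR" ]; then
    run_benchmark
else
    echo "skipping: need docker on the PATH and $TPR in the current directory"
fi
```

Because the working directory is the mounted one, the log and any other output files end up on the host rather than disappearing with the container.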
Done. More soon on multiple cores, can-we-use-the-GPU? and using DOCKER on Amazon Web Services.